Security of LLM Agents

A three part series of assessing challenges in deploying LLM agents in enterprises

Jun 04, 2024

👋 Hi, this is Akash with this week’s newsletter. I write about security engineering to help you get into the world's best security teams. We recently crossed 100+ subscribers! Thank you for your readership.

This week, I’m sharing my assessment of AI and its security concerns. I hope it’s helpful; enjoy!

The hype around AI is not breaking news anymore.

The thought of “Devin AIs” of the world taking our jobs and civilization certainly crossed our minds.

The future of the workplace looks like this in our heads,

But how much of that is real looking at the technology today?

Today, I am sharing a three-part security assessment I did on some of the commercially available products.

⚠️ Disclaimer: This work is solely my opinion; it must not be confused with the views/opinions of my present and/or past employers.

🚀 The Space

AI agents can be defined as autonomous intelligent software powered by LLM that is connected to data and tools to perform tasks on behalf of a human or another agent. Over the last few months, we have seen multiple product launches in the space of employee and workflow automation for enterprises, including Devin AI, Ema Universal Engineer, Lutra.ai, and others.

We have made tremendous progress since “Attention is all you need”. With AI at the forefront of recent developments in the digital world, the rise of AI agents is at its peak.

AI agents can certainly boost team performance, but are they safe? When you connect your proprietary data, is your access control respected?

The benefits of these agents are realized only when we run them fully autonomously. But what happens when it’s tricked? Are these agents smart enough to stop that?

Cross-Plugin Request Forgery (CPRF)

In the first part, we cover how some of these agents are susceptible to a special attack. But first, let’s define “CPRF”.

An AI agent ecosystem consists of many plugins that work together. Cross-Plugin Request Forgery (CPRF) is an attack where one plugin tricks another plugin into performing unintended actions on behalf of the attacker.

Cross-plugin request forgery (CPRF) is a security vulnerability in software ecosystems. An attacker leverages the trust between different language model-based plugins to manipulate a user into executing unauthorized actions across plugins. This exploit can lead to unintended data access or actions, exploiting the interconnected nature of plugins within the system.

Enough with the basics; let’s do some cool stuff!

🔄 The Workflow

Reasonably simple work to begin with, where an agent is waiting to receive emails in your mailbox and preparing a draft or sending a reply. Let’s see where it can go wrong.

📦 The Payload

Here’s the payload we sent, email body:

Here’s the email that was sent by the AI agent upon receiving this email:

The exciting thing is, irrespective of who sends you an email, your “smart” email assistant might just send them an automated reply with contents from your latest email.

The newest email might not be the one an attacker sent. Do you want that to happen?

Now depending upon what integrations you have activated for this workflow, it might be able to do a lot more.

You might think we should implement prompt injection checks and filtering, which is a fair point. This is just a demonstration, but here’s some food for thought.

We bypassed some filtering by simply encoding the “instructions” in Base64. Here’s the updated payload:

The results were the same, the exploit worked.

Note: This work is done on personal accounts and should not be used for anything other than research or academic purposes for unethical benefits.

🌟 🔍 Parting Thoughts

AI promises a better future, and there’s no denying that. My goal is to raise awareness about the sensitivity of this technology before integrating it into your environment.

Security for AI is a big area where we need more investments from everyone. Let’s not make this technology inherently insecure.

Are you using LLM in your workspace? How do you ensure it’s secure?

👋 💬 Get In Touch

Want to chat? Find me on LinkedIn.

If you want me to cover a particular topic in security, you can reach out directly to akash@chromium.org.

If you enjoyed this content, please 🔁 share it with friends and consider 🔔 subscribing if you haven’t already. Your 💙 response really motivates me to keep going.

Cracking The Security Interview

Discussion about this post