👋 Hi, this is Akash with this week’s newsletter. I write about security engineering to help you get into the world's best security teams. We recently crossed 160+ subscribers! Thank you for your readership.
This week, I’m sharing a demonstration of the confused deputy in AI agents. I hope it’s helpful; enjoy!
Thinking of integrating LLM agents with your enterprise data?
Yes, it can make you more efficient.
Do you work with sensitive engineering designs?
What will happen if they get leaked outside?
Engineers work with documents all the time. Some of them are extremely sensitive, and some are public.
Working at Chrome, I’d often write two announcement emails. One would go out internally, and the other would go to the external chromium developer community.
The underlying message stays the same, but we often must redact sensitive information.
There’s no denying that copy-paste isn’t uncommon.
Now, imagine an AI agent is doing this work for you. That agent is controlling what goes out. Do you have that trust?
Previously, we looked at how multiple AI plugins can cause unintended outcomes with minimal prompt engineering. Read more:
Today, we explore another classic security issue in the LLM world, the “Confused Deputy Problem”.
👮 Confused Deputy Problem
“In information security, a confused deputy is a computer program that is tricked by another program (with fewer privileges or less rights) into misusing its authority on the system. It is a specific type of privilege escalation.”
- Wikipedia
Imagine a trusted program doing something on behalf of another user. If it's not careful, it might accidentally use its own permissions to do something harmful, like accessing private data or making unauthorized changes.
The program gets confused about who asks for what, leading to potential security issues.
A program taking action on behalf of a user? Sound familiar?
The keyword here is “trusted”.
🧑🔬 The Test
Let’s get onto the interesting part. In the next part, we will explore the intrinsic inability of AI agents to protect unintended information flow.
🔄 The Workflow
Today, we will have an automated, LLM-powered document summarizer. The goal here is to read a given document and summarize it for public consumption.
📦 The Payload
Here’s the payload we sent:
How you want to trigger this automation is up to you. As a result, we could bypass an instruction-tuned model’s safety features to copy sensitive information.
The results,
Data from this document:
were summarized and copied to,
or in an email,
The problem here is “access control”. The security boundary defined on these documents was not translated to LLM layers.
Note: This work is done on personal accounts and should not be used for anything other than research or academic purposes for unethical benefits.
🌟 🔍 Parting Thoughts
Can we add a DLP module and sanitize LLM’s outputs?
Yes, we can. But, if you want to copy between private documents, isn’t that annoying?
Let’s hear your thoughts on how to solve this problem. Can we translate the access control to LLM?
Share in the comments!
👋 💬 Get In Touch
Want to chat? Find me on LinkedIn.
If you want me to cover a particular topic in security, you can reach out directly to akash@chromium.org.
If you enjoyed this content, please 🔁 share it with friends and consider 🔔 subscribing if you haven’t already. Your 💙 response really motivates me to keep going.