Zero Trust for Multi-Agent Systems // Surendra Narang | Venkata Gopi Kolla // Agents in Production 2025
speakers

I am a seasoned cybersecurity and cloud infrastructure leader with over 20 years of experience in cybersecurity, driving innovation at the intersection of AI Security, Zero Trust, and enterprise security. I currently lead advanced security initiatives at Palo Alto Networks, where I oversee the deployment of secure architectures for large-scale systems, including AI-powered infrastructure and Zero Trust models.
I am the author of an upcoming book on AI Security and a regular reviewer for IEEE cybersecurity conferences for AI papers from a security perspective. My current focus includes evaluating the security and scalability of multi-agent systems (MAS), where I explore risks such as prompt injection, rogue agent behavior, data leakage, and adversarial manipulation. I am also focusing on integrating trust mechanisms, enforcing secure communication channels between agents, and deploying strict access controls to secure interactions within decentralized agent environments.

Venkata Gopi Kolla is an Edge Computing & Network Security Specialist with over a decade of experience designing and scaling distributed systems. At Salesforce, he has led major initiatives at the intersection of edge architecture, cybersecurity, and high-availability content delivery — including secure multi-CDN failover, bot protection, and authenticated caching using platforms like Cloudflare Workers and Lambda@Edge.
He also architected a platform-wide API caching framework that enables intelligent caching across CDNs and browsers, achieving over 90% performance improvement for cacheable endpoints. His work spans ETag optimization, origin lockdown strategies, and secure edge-routing at global scale.
Venkata actively serves as a reviewer for several AI and Security conferences, and is currently co-authoring a book on Security in AI Systems, exploring threats, architectures, and real-world mitigation strategies.
SUMMARY
Don’t trust AI agents. Just because an agent is in your system doesn’t mean it should have overly permissive privileges. Restrict access. Defining a clear role for each agent from the beginning. Give each agent only the tools and information access it really needs. Monitoring of the agent's activity. Keep an eye out for odd behavior or agents that step out of line. The sooner you catch it, the easier it is to fix. Keep things safe without making them slow. Good security shouldn’t get in the way of your agents doing their job. You can have both speed and safety.
TRANSCRIPT
Surendra [00:00:00]: Hi everyone, my name is Surendra and I work with Palo Alto Networks as a senior manager for Enterprise Security team. I am here with my co author and partner Venkata. We both come from the Zero Trust deployment experience in the infrastructure and we thought of bringing that experience experience with the AI deployments and started doing some research work keep us learning with all the things required from Zero Trust and mass integration perspective. We are also coming up with our book in sometime in August 10th where we both are co author. So today we are very excited to share our thoughts and learning journey. It's really very different for all of us. We are learning with this journey. So Venkata to you.
Venkata [00:01:00]: Yep thank you Surendra. Hello everyone, I'm Venkata from I work as an application Security engineer at Salesforce and as Surendra mentioned we've been authoring the book about how security comes into picture in the AI world and we are very much excited about it and without any further ado we can actually start our we can dive into the topic. So yeah, all right, so let's first start with the introduction of multi agents. So a multi agent system is a network of independent autonomous software agents. It's designed to perform a specific task or a function based on its assigned role. So agents can make decisions and take actions on their own using the data they observe or receive. The agents don't just work in silos, they coordinate and collaborate using a central orchestrator or a communication framework. The orchestrator manages the workflows, handles conflicts and dynamically reallocates responsibilities among them.
Venkata [00:02:05]: It's just like how an emergency response team would work. Why we need that? What are the key advantages? The advantages are pretty obvious which are like speed, its scalability and file tolerant and real time collaboration and modularity and task specific specialization. It's more like a delegation delegating to whoever does the job in a best way. So yeah.
Surendra [00:02:32]: Okay, before we go and deep dive from the security perspective of multi agent system, it's important that we take some time and talk about the reference architecture. As we can see in the architectural diagram itself, we have several components. If we look at it, each layer has its own architecture playing a distinct role independently and but still collaboratively with keeping security, observability and flexibility in consideration. Now, the key component of the architecture multi agent system is Orchestrator. I call it brain and soul of multi agent system. Considering its role like it coordinates SAS distribution, messaging, workflows, info sequencing and handle dependency, I can keep going around it. Now who all are relying on this Architecture it's mostly agents. Agents which are small software units working or designed to focus on a specific business logic or function such as inventory management system or from the technical perspective running query or a specific time on or a database based on customer request scanning the Internet.
Surendra [00:03:49]: From security perspective a lot of them is there. Then comes the communication layer. If we talk about communication layer, each agent need to exchange message and share the context securely. This mostly being accomplished using APIs or Pub Sub models for flexibility. The important thing which we need to understand is the shared memory concept where agents record the state, store the session context and the log inside and the whole autonomous system is depending on it. I would like to talk about policy plane in detail a little bit because that is where the zero trust deployment or the principle is being taking place. Where you enforce identity, access, control or policy rule. Agent only do what they are permitted based on the policy plane, what sort of policies you have deployed.
Surendra [00:04:42]: That goes with the least privilege. Then comes the observability allows for real time monitoring, debugging and auditing.
Venkata [00:04:54]: Now let's take a real world example of how a multi agent system in a large scale CDN or edge environment would show up. I mean how it would work. So in such platforms you often have a specialized agents like each handling a different specialized task. For example like one manages TLS certificates, another handles DNS routing and another takes care of bot mitigation or even content caching and so on. So these agents operate independently, but still they need to coordinate seamlessly, especially during dynamic events like domain migration or policy changes. But with this power there is a risk as well risk always associated with this kind of power. For instance, if the certificate manager fails or is compromised, we are running it at the risk of man in the middle attack. Or if the routing agent is misconfigured, then we could end up leaking customer metadata or even we create some outages, even observability agents.
Venkata [00:05:56]: If we don't secure them properly, they can be abused to inject or tamper logs. So to prevent this we apply zero trust principles at agent level. That means like each agent only gets a scoped permission and communication is mutually authenticated and every action is logged and signed as a telemetric purpose. And malicious or malfunctioning agents can be quarantined by dynamically. And this kind of agent level security posture helps us scale safely even in a highly dynamic distributed edge systems.
Surendra [00:06:35]: Let's take another use case. We are taking most of the security focused use case because we thought of giving the examples or the Reference of the security, leveraging mass and then talking about the zero trust Here the biggest challenge for almost every enterprise or organization is securing the attack surface in the cloud environment irrespective of any cloud service provider. If you take a look, what are the critical components like API storage bucket or services which are accidentally get exposed to the Internet. Why does this happen? It's usually because of the misconfigured infrastructure code or drift in the ise or someone deploying unapproved change unintentionally or intentionally knowing the consequences of it. How do we solve it? We have certain type of agents. They are playing critical role in real time and making the assessment of the infrastructure like with the name we can easily figure out like cloud asset agent. Then we have service map agent Drift watcher agent. Asset agent looks for anything in your cloud that has been publicly export.
Surendra [00:07:50]: If there is a new delta, it will look whether it has gone through the proper process pr. Whether there is an API exposed, doesn't have any login or any other information being called out. Then you have a kubernetes agent which is monitoring your kubernetes cluster time to time. Now how does it work as a whole? Let's take a small example. We have a DevOps engineer, he accidentally deployed a billing API without any authentication. Now your agent go and identify it within fraction of minutes and see that we don't have the right API controls or the TLS version or cipher suite. It take a look what was the pr? Whether it goes through a proper approval process, it shut down the whole thing like blocking public access, fixing the code automatically with a pull request. It takes maximum three minutes and then our system is again secure.
Surendra [00:08:54]: This ensures that we are having a complete visibility into attack surface area, which traditional tools cannot do. Yeah.
Venkata [00:09:05]: So now let's talk about why security in multi agent system can get really tricky. Like each agent in a multi agent system is typically autonomous, meaning it makes its own decisions, often based on partial knowledge. While that gives us agility and scalability, it also introduces the new class of risks. So for example, like some agents may start with too much privilege either due to misconfiguration or poorly scoped permissions. So that can also lead to privilege escalation or even unauthorized access to critical systems. So here we are talking about unauthorized access by agent, not even by external party. And another problem is like prompt injection, the classical problem in security, right? So especially in ll empowered agents, these agents are really easily gets manipulated if the inputs they receive are not filtered or validated. So that Input could come from a user or another agent, or even an external data.
Venkata [00:10:02]: We also see risks like internal data leaks where one agent inadvertently shares sensitive information with another agent even if that agent is not authorized to see it. And collusion between agents is also another concern. In theory, multiple compromised agents could collaborate to bypass policies. And then there is the classic insider threat. An agent deployed with good intent starts behaving abnormally due to a bug or a logical flaw. So while multi agent system unlocks a lot of flexibility, it also expands the attack surface across like communication paths, memory and decision making processes. And that's where the zero trust becomes really essential.
Surendra [00:10:52]: Okay, now this slide particularly reflect how zero trust principle can be mapped with the mass deployment in the infrastructure. Here each agent is acts within the guardrails of zero trust from identity to access to communication. Now mass and zero trust together create a framework where massagent can work securely within their boundaries. The biggest thing which we are trying to address is unauthorized access and privilege escalation. It's very different from our legacy identity and access management. Now we are talking about non human identities when they are trying to validate and how this can be accomplished using just in time concept risk privileges. And this was being enforced using attribute based and policy based access control. The next one is like the biggest risk for our LLM model is prompt injection and data poisoning.
Surendra [00:11:48]: And basically our API gateways, the perimeter API gateways are the front defense mechanism to protect us from the prompt injection attack. Because most of the prompt injection attack are coming from the untrust environment where with the zero trust we have a principle verify trust and then move give the excel. So that's how it works. Another issue is like data leakage between the agents. This is bigger problem. When the whole system is automated and new human intervention it's very difficult to deploy the micro segmentation. It's not like just you are doing it in the based on the different network. It go with the granular access control and limit the lateral movement.
Surendra [00:12:33]: Data is stacked and smoked like scope limiting flow to authorization only. The biggest target for the zero trust or sorry for the mass system is like bad actors are always trying to perform the civil attacks and collision. When civil attacks like the rogue agents are sitting in the infrastructure and trying to impact the working criteria of the legit attack. So sorry legitated and each to remediate it or I will say the guardrails around it. We need to have a cryptographical verifier build identity which can be like mtls where the sender and receiver both have served to authenticate and that also used for the enabling the encryption collision detection is basically by the behavior based monitoring and traffic recognization. When we have a collision detection, it means the bad actor is already sitting in our infrastructure and we are already in the verge of compromise.
Venkata [00:13:36]: The next one is like unauthorized access and privileges. So let's imagine an agent responsible for provisioning resources. It starts by deploying containers, but then due to overly broad permissions, it ends up deleting firewall rules. And that's not a bug, that's a privilege escalation waiting to happen. In multi agent systems it's common to over permission agents for convenience. But this becomes dangerous fast. So the answer is tight and just in time access, which is like where each agent gets exactly what it needs and only when it needs it. And even during long running sessions we keep reevaluating the access policies dynamically by the policy enforcer.
Venkata [00:14:24]: An agent's environment changes then so as the permissions that agent have and that's how we break the set and forget model we traditionally use and replace it with adaptive trust.
Surendra [00:14:42]: Again we come to the same prompt injection and the data poisoning which is like the biggest problem. Why does the biggest problem in the multi agent system we have multiple entry points where agents are fetching the data from the external sources either using via URLs, API or communicate with other LLM prompts. So they are very much exposed to this sort of attack. Now the question comes how we can prevent it. We already discussed in a brief that our API gateways can be our first defense to protect against the prompt based attack. Here the API gateways are like intercepting each input given by the either by the other agent or by the human by validating and filtering gateways for blocking malfunction. The another way can be context aware sanitization. The middleware agent can be playing a very vital role over here because it does not clean the data, but it also make decision who is the sender and what's the intent.
Surendra [00:15:48]: And based on intent it takes a decision. So it's like a smart agent doing the work on behalf of human. Then we have input schema, how users can what sort of data structure or the defined structure it can take. All this sort of things plays a vital role when it comes to defend against the prompt injection. And the last one which is very important is quarantine and sandboxing the systems or the agents which we don't trust and we put it into an isolated environment and all their communication is being analyzed by either by AI model or by human verification.
Venkata [00:16:31]: Okay, the next one is like the data leak, right? So data doesn't just leak to outsiders, it can also leak between the agents too. Let's say an agent trained on billing data, it starts helping out a support agent. Now we have accidentally exposed financial records in a support workflow. So these kind of cross agent leaks happen when agents are not isolated clearly either through my either in memory or in scope or in communication. We treat every agent like it's operating in a different trust zone. We use access tags, data sends to labels and also role based filters. Even when the agents collaborate, they can only pass along sanitized payloads. Think of it as a clean room for interagent messaging.
Surendra [00:17:21]: The biggest problem with the multi agents or the target for the bad actors civil attacks collision. And that all can happen if you are not using the secure communication. Multi agent system can easily be target via advisory creating multiple fake agent to mislead the system or flood in like false signals. To address these we all should always use like cryptographically verify identity which works on the concept of mutual tls where we encrypt the traffic and as well as both the sender and receiver will have the search which will be used to authenticate and authorize. And this sort of search should be, you know, revoked from time to time. Even when the communication is happening depend on the intent it should happen. Next one is the collision attempt. Where is the agents are that the attacker has taken over.
Surendra [00:18:16]: They may act like a normal agent but are secretly trying to exploit the system. Example shared stolen credentials or sending the hidden messages. Modifying the policy enforcement logic and even avoiding detection by coordinating their behavior. Every message is secured through end to end encryption. If you want to provide the protection against this sort of collision, attack this layer defense ensure that even if the network is breached and the bad actors sitting attackers cannot impersonate agents or hijack the workflow and compromise the communication.
Adam Becker [00:18:56]: Okay guys, I'm gonna. Maybe this is a final slide. Maybe if you want to wrap up, we need to keep going. So if you want to.
Venkata [00:19:06]: Yeah, so yeah, yeah. We have the insider threat, how we. How we actually will mitigate that. And the other one is like we have third party risks. And finally we'll be concluding. So do you want us to directly go to conclusion?
Surendra [00:19:22]: Adam?
Adam Becker [00:19:23]: Yeah, please.
Venkata [00:19:24]: Okay. So yeah, to conclude like so to bring it all together. Multi agent systems are not a future concept. They are already here. And from customer support to orchestration to all the way to cloud defense, Multi agent systems are already in production today. But autonomy at scale introduces risk at scale too. And the old security models like trust once assume forever no longer would work. And that's why we need zero trust at the agent level.
Venkata [00:19:51]: Identity isolation, context aware access and strong observability when done right. This approach lets you safely scale intelligence across your systems without letting Kaya scraping. And that's what we are all in here for. Right. So building a secure, intelligent and self defending systems for real world use. And yeah, that's all we have.
Surendra [00:20:13]: Thank you.
Adam Becker [00:20:15]: Beautiful. Thank you very much guys. This is again, Diego just began to allude to the risks involved here and how the way we think about security and trust needs to evolve. And I think you guys are essentially embodying what that evolution is to at look look like. So I really appreciate the thoroughness and the thoughtfulness. It's very clear that you guys have spent a good amount of time thinking about how these things should operate. What's a good way to stay in touch with you and to follow your work and or connect with you?
Venkata [00:20:49]: LinkedIn. I'm happy. We are happy to post our LinkedIn profile. Cool.
Adam Becker [00:20:54]: If you want to drop it in.
Venkata [00:20:55]: The chat, we'll make sure in the private chat.
Adam Becker [00:21:01]: Yeah, that's fine. I'll drop it for the audience. Thank you very much folks.
Venkata [00:21:07]: Thank you. Thank you so much guys. Thanks for the opportunity.
