MLOps Community

Building Multi-Agent Systems // Shrivu Shankar // Agent Hour

Posted Feb 19, 2025 | Views 13
# Multi-Agent
# AI Systems
# Security
Shrivu Shankar
Staff Machine Learning Engineer @ Abnormal Security

GenAI Engineer at Abnormal Security specializing in AI-driven cybersecurity solutions. Passionate about leveraging cutting-edge technologies to protect organizations from advanced cyber threats.

SUMMARY

As Large Language Models (LLMs) evolve, the challenge shifts from raw capability to structuring them into reliable, scalable systems. Many real-world AI products struggle with robustness, complexity management, and evaluation, especially in enterprise contexts. This talk explores how multi-agent systems can help overcome these obstacles by decomposing large monolithic agents into specialized subagents working together in structured architectures. We'll cover:

- Why enterprises struggle to integrate LLM agents effectively.
- How multi-agent architectures (Assembly Line, Call Center, and Manager-Worker) improve scalability, modularity, and reliability.
- Practical trade-offs and implementation strategies from real-world applications.

(Planning to adapt my post https://blog.sshh.io/p/building-multi-agent-systems)

TRANSCRIPT

Demetrios [00:00:00]: We will go to our next speaker. Where you at? There you are. Do you want to share your screen real fast?

Shrivu Shankar [00:00:11]: Hey guys, talking about multi-agent systems, not just single agents, but now kind of stringing multiple together and how this solves some of the real-world problems that I, and I'm sure many other people, face whenever building these kinds of LLM-based systems in the wild. A little bit about me: I am a machine learning engineer turned prompt engineer at Abnormal Security, where I work on a lot of LLM and agentic systems for cybersecurity purposes. In my free time, when I'm not building models or agents at work, I'm building stuff on the side in open source to really explore what are some of the limitations of LLMs and some of the cool ways to use them. One core question that I've been thinking a lot about as I use AI products in the wild and as I build with LLMs is: if LLMs are so smart, when we see all these benchmarks coming out of these top-tier models, why are a lot of AI products that I interact with not actually that useful, or why do they have very limited capability? Some of my ideas for why this is the case: while 90% accuracy might work for a ChatGPT-like product where people already know the limitations of the technology, it might not cut it for enterprise products that are being used by people who have higher expectations. You also have this issue where the efficacy of an agent might degrade as you introduce enterprise-specific complexity. So the basic prompts might work, but as you add more and more constraints about how your business specifically operates, especially if that business wasn't in the training data, it starts to work less and less. Another issue you have is enterprise data is often very messy. Maybe it's not a single PDF document that describes your business, but a variety of different data sources with different nuances and different data formats.

Shrivu Shankar [00:01:46]: And the last thing is, the larger and more capable the agent, often, just as an engineer building these kinds of systems, it's much harder to evaluate, make low-risk changes, and also parallelize improvements if you have multiple people working on the same project. So kind of a motivating example that I'm sure many people who are building agents or these LLM GPTs might be familiar with. So on the left I have what I consider, you know, this simple weather assistant agent. You know, here's the Get Weather tool and you have a location, and it can handle pretty simple things where it's like, you know, hey, compare the weather between ETSAP and Seattle, and use that tool. But when we move to the right, and even this is actually a simple example, but you can imagine, you know, dozens more or even hundreds more tools and instructions. We say, hey, you are a forecasting assistant for Cloud Incorporated. Here's all these different tools for getting the weather now and forecasting and warnings. And you know, maybe you have all these interesting edge cases where, you know, you don't want to accidentally have your GPT or your agent or LLM suggest alternative weather products.

Shrivu Shankar [00:02:43]: You want a specific format, you might not have data for all locations, you want to make sure that it doesn't, like, suggest random data that you're not pulling. And so over time, as you connect your agent with more and more data sources to build, like, an agent for your company or for your application, you run into this issue where you have tons of tools and instructions, but they just end up being too much and the LLM is unable to really understand them all. The more instructions you add, the more instructions it sort of forgets. So one solution I have, and this is kind of similar to a lot of engineering problems, is to modularize it and build multi-agent systems, in this case basically splitting the problem into smaller pieces. So I'll talk a little bit about some of the advantages of this, some high-level design patterns, and then the remaining open questions. So at a high level, you're breaking this into sub agents. So what became one prompt is maybe multiple calls to an LLM with different contexts. The idea here is that each individual subagent can own and abstract away the complexity of its subdomain.
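To make that idea concrete, here is a minimal sketch of turning one monolithic prompt into multiple scoped LLM calls. The prompts, the weather subdomains, and the `call_llm` helper are all hypothetical stand-ins for illustration, not code from the talk:

```python
# Stand-in for whatever chat-completion client you actually use.
def call_llm(system_prompt: str, user_message: str) -> str:
    raise NotImplementedError("wire up your model provider here")

# Before: one monolithic prompt carrying every tool description and edge case.
MONOLITHIC_PROMPT = "You are a forecasting assistant. <hundreds of lines of tools, rules, and edge cases>"

# After: each subagent owns a small, focused slice of the instructions.
CURRENT_WEATHER_PROMPT = "Answer questions about current conditions only. <current-weather tools and rules>"
FORECAST_PROMPT = "Answer questions about forecasts only. <forecast tools and rules>"
FRONTEND_PROMPT = "Talk to the user: clarify the request, then present results in the company's format."

def answer(user_message: str) -> str:
    # Each call only sees the context relevant to its own subdomain.
    current = call_llm(CURRENT_WEATHER_PROMPT, user_message)
    forecast = call_llm(FORECAST_PROMPT, user_message)
    combined = f"Request: {user_message}\nCurrent: {current}\nForecast: {forecast}"
    return call_llm(FRONTEND_PROMPT, combined)
```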

Shrivu Shankar [00:03:40]: And I like to compare this a lot to actual human organization design, where a SaaS company might be made up of lots of different people with different specializations. In this case, a software engineer might own the complexity and nuance of a code base, whereas an account executive might own the complexity or knowledge about a specific account. You also have this idea where they communicate with each other not through RPC calls, but through this semi-structured natural language, which is akin to how we actually communicate, with maybe tickets or structured meetings or channels. Then you have this idea where, by breaking it up into smaller pieces, you can evaluate and improve parts individually without risking as much of a degradation to the system as a whole. So rather than having a single prompt that you're maintaining, maybe you have multiple smaller prompts that you can individually fix issues with, rather than messing with the full one. Going back to some of those pain points, this allows you to manage complexity by keeping the per-subagent complexity low. And so rather than one big prompt, you sort of have multiple small prompts that are a more bite-sized amount of instruction for an LLM. And then you have reliability, which is improved through the ability to now make changes and evaluate kind of locally and isolate faults to specific parts of the whole system. So I'll kind of break this into three agent design patterns.

Shrivu Shankar [00:04:56]: I almost like to think of this as, kind of, you know, you have a lot of engineering system design patterns. Potentially, more and more, we'll see this field evolve into actual agentic design patterns, or how do you actually get LLMs to systematically perform these more complex tasks? First I'll say that there are sort of two different kinds of subagents. When you're breaking a problem down, there's what I call the front end subagents, which is maybe kind of similar to the idea of front end code and back end code. The front end subagents are the ones that interact with users outside the organization. They handle things like understanding what the person is actually trying to ask, handling the right tone and structured outputs. And they sort of own the complexity involved with customer interaction and customer-facing outputs. Then you have the idea of backend sub agents, who interact only internally and own various subproblems.

Shrivu Shankar [00:05:43]: So maybe you have specific data sources that have nuances, or instructions related to internal workflows. Those are handled by your backend sub agents. And so we'll talk a little bit about some of these design patterns, and you'll kind of see the blue and green in some of them. But the first one is this idea of an assembly line. So this is the first kind of way: you have a single agent, it's not working, how do I break it into these smaller pieces? And so this one you might see in other frameworks called, like, vertical or sequential. It's this idea where you take an input, you maybe do some planning at the very first layer here, and then you break down the problem into individual stages that are handled by separate chat completion calls.

Shrivu Shankar [00:06:22]: And so this is great for when you have an agent that has a lot of very well-defined steps that you want to break apart. And so the example is building a website, where maybe the first sub agent does some amount of planning, figuring out what the user was actually asking for, clarifying things. The next subagent maybe is responsible for building the database schema. The next one might be front end, the next one might be back end, and so on. So you have a very stage-based approach, and if you want to add features, you just add additional stages to this workflow. The con here is that when you break something up like this, it doesn't really handle out-of-sequence or unexpected requests very well. If people don't ask, hey, build me a website, but instead say, hey, maybe fix an existing website, those kinds of questions will utterly fail if you try to structure it like this.
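A rough sketch of what that staged flow can look like in code, using the same kind of hypothetical `call_llm` stand-in as above; the stage names simply mirror the website example and are not from any real framework:

```python
# Stand-in for whatever chat-completion client you actually use.
def call_llm(system_prompt: str, user_message: str) -> str:
    raise NotImplementedError("wire up your model provider here")

# Assembly line: fixed stages, each owned by its own scoped subagent.
STAGES = [
    ("planner", "Clarify what the user wants and produce a short build plan."),
    ("db_schema", "Given the plan so far, design the database schema."),
    ("frontend", "Given the plan and schema, describe the front end."),
    ("backend", "Given everything so far, describe the back end."),
]

def build_website(request: str) -> str:
    context = f"User request: {request}"
    for name, instructions in STAGES:
        # Each stage carries only its own instructions plus the shared, growing context.
        output = call_llm(instructions, context)
        context += f"\n\n[{name}]\n{output}"
    return context
```

Adding a feature means appending another stage; the trade-off, as noted above, is that a request that doesn't fit the fixed sequence has nowhere to go.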

Shrivu Shankar [00:07:01]: But it works for the very staged applications. You also have the call center agent, which is kind of the transpose of that, where you have agents that are specific to different domains. And this is what I think I see a lot when it comes to a lot of current agent frameworks, or even things like OpenAI Swarm. It's this idea where you have kind of different, almost, GPTs for different use cases, and you're routing the question to a specific one that handles that request. So this is great when the user's request is obviously matched to a single subdomain. Maybe you have an agent for every product that you have that answers questions on that, and if you have a new product, you just simply add another agent and route to it. The downside of something like this is it's very hard to do cross-domain queries. And so you can imagine a case where maybe we have a TripAdvisor assistant kind of thing, and we have hotel questions, flight questions, and those are different domains. If you wanted to say, hey, book me a flight that's similar to this hotel or located near it.

Shrivu Shankar [00:07:55]: You now have to cross-reference different agents, and it kind of fails within this pattern. But it's a great way, when you have these clear zones, domains, or divisions of knowledge, to kind of put them into this pattern and split it up. Then you have the manager-worker architecture here. And so this one, you take the input, you give it to a single kind of orchestrator manager sub agent at the top that handles pulling information from different data sources. And this is actually very akin to a lot of other agents that we see, where, you know, maybe we have an agent and it has all these different tools. However, the difference here is these tools themselves are other agents that it's asking for information from. So you can imagine, in that hotel TripAdvisor type of application, all the complexity related to hotel bookings, what hotels are available, and what hotel APIs you have access to is within that kind of hotel sub agent. And it's not something that the main manager orchestrator sub agent needs to worry about, which helps reduce the complexity of any individual LLM call or set of instructions.
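Here is a compact sketch of that manager-worker shape, again with a hypothetical `call_llm` stand-in and made-up worker names; the key point is that the "tools" the manager sees are themselves scoped agents:

```python
# Stand-in for whatever chat-completion client you actually use.
def call_llm(system_prompt: str, user_message: str) -> str:
    raise NotImplementedError("wire up your model provider here")

# Worker subagents: each one owns the tools and quirks of a single domain.
def hotel_agent(question: str) -> str:
    return call_llm("You know the hotel booking APIs, inventory, and their quirks.", question)

def flight_agent(question: str) -> str:
    return call_llm("You know the flight search APIs and fare rules.", question)

WORKERS = {"hotels": hotel_agent, "flights": flight_agent}

def manager(request: str) -> str:
    # The manager only decides which workers to consult and joins their answers;
    # it never needs to know the internals of any single domain.
    chosen = call_llm(
        "List which of these workers to consult, comma-separated: " + ", ".join(WORKERS),
        request,
    )
    reports = [f"[{name}] {agent(request)}" for name, agent in WORKERS.items() if name in chosen]
    return call_llm("Combine the workers' reports into one answer for the user.", "\n".join(reports))
```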

Shrivu Shankar [00:08:56]: So what's nice about this architecture: it's very flexible, modular, and can handle complex queries, because we can kind of just pull and join information from all these different data sources. But the main issue is, eventually your manager does become a bottleneck. You can imagine even routing to so many different sub agents eventually becomes complex, and typically you might need to even break this subagent into multiple subagents that can kind of help with that complexity. So going back to that weather agent example, it really depends on what you're trying to do and what you expect the customers to be asking for as to how you should actually break it down. But let's say this prompt was not working for you and you had a couple different use cases. Maybe you're building detailed weather forecasts that pull from all the different pieces of data. Maybe it makes sense to put this into an assembly line pattern where you have different agents that own specific parts.

Shrivu Shankar [00:09:44]: Maybe the weather now is in one agent, the weather forecast is in another agent, and they kind of iteratively add to this building report. Another option is potentially, you know, if you have a lot of different, you know, Weather Incorporated products for live weather, forecast, and alerting, you put this into the call center agent, and you just sort of route the user's request only to an agent that has a subset of that full set of tools and answers questions there. Or maybe you kind of want a more complex, ask-anything join agent that you want to work on. And so for this one, you would put this within a manager-worker kind of architecture, where the manager itself might not really even know about the different nuances of data or even the specific weather tools. Instead, it sort of asks individual subagents who really understand those different domains. So some open questions, I think, with breaking things down into these modular architectures are, you know, what do the costs look like? On one hand, the costs are reducing, because you no longer have to spend all those tokens sending a massive prompt with all your instructions every time. It's sort of more optimized: for certain questions that require certain knowledge, you'll spend tokens on those.

Shrivu Shankar [00:10:49]: But then at the same time, there's a lot of cost associated with so many different LLM calls that you're doing, rather than just a single LLM call. There's also a question of, like, what do GenAI engineering teams look like? How do you split this problem among multiple engineers who might be working on an AI product? On one hand, maybe you want to break it up and assign each part, each subagent, to different engineering teams or different engineers. But then there's also this question of how much of this you actually need to orchestrate by hand versus, with future agents that are also fairly good at coding, can they also do some of this splitting and breaking up into multi-agent structures? There's of course the question of what are the actual tools and frameworks to build these. I think I've seen a lot of applications using LangGraph or CrewAI or some of those other ones. I don't think any of them right now are, I would say, super mature, and I think we're still figuring out exactly what are the right ways to build these and, you know, how we interact with these models going forward. But I think those are great frameworks to start building and thinking about these different architectures.

Shrivu Shankar [00:11:51]: And then there's: how does this evolve with bigger and more capable models? So, you know, a lot of the motivation for building on this design was because the models themselves could only really understand and use so many instructions at a time. When we look at reasoning models and more advanced, larger ones, potentially the need to do this is less. My argument would still be that breaking it into different little pieces is still important, regardless of the model's efficacy, because the ability as engineers to understand and evaluate the system in a modular way is still very useful. So I think as the models get bigger and better, the point at which you need to switch to a multi-agent architecture probably moves farther and farther away, but there's still a need for very complex applications to be able to break it down. And yeah, that's it. Hope this was useful.

Demetrios [00:12:40]: Bro, that was awesome. All right, cool. I am always going to have questions, but I also want to make sure that everybody that is here knows: feel free to blast the questions off at any moment in time. You can raise your hand or you can just go off mute. I wanted to ask about.

Demetrios [00:12:59]: So I have two questions just to start us off on the front end versus back end, which I find fascinating. How have you seen it done on the front end to make sure, like sometimes when you have certain agents, you need X amount of information before you can go off and get whatever task needs to be done. How have you been doing it so that you get that information with that front end agent?

Shrivu Shankar [00:13:36]: Yeah, so I think ultimately it comes down to, like, what would your prompt have looked like before as a single sub agent, and what were the instructions in that prompt to handle that kind of gathering-requirements sort of phase? In this case, I would say the front end subagent is solely an agent that has those instructions for, hey, we need these pieces of information to get started, or, before we actually jump into the question, ask these clarifying questions to the user. The front end agent is where you put all of those instructions around human interaction or user interaction.

Demetrios [00:14:11]: Nice. Yeah, that makes sense. And then when you are talking about all of these different design patterns, you're designing each one of the, like, let's take the assembly line, for example. Each one of these squares, you are specifically creating each one of these agents. You're not having the agent spawn off new agents, right?

Shrivu Shankar [00:14:36]: Yeah. So there's definitely different variants. The most basic example is you are hard-coding, basically, these different agent structures. And so, like, in the website example, you're saying, you know, I want the backend to be handled by one agent; that's one bundle of complexity. And then the front end of the website is handled by another sub agent. There's definitely versions where the agent itself spins up different sub agents to kind of help handle that. Definitely.

Shrivu Shankar [00:15:01]: Pros and cons to both. And I think that comes back to the level of autonomy you want to give the agent. I think it's definitely a trade-off with the efficacy and the understandability. I can imagine, if you're trying to debug why a customer request resulted in such a weird query, it's harder to look at the agent spawning its own sub agents to debug that, versus saying, oh, the backend sub agent is the one that failed; we just need to continue iterating on that specific piece of the system.

Demetrios [00:15:28]: Yeah, yeah. And it gives you that modular ability for debugging. I like that you call that out. I mean, I got more questions, but I know other people have questions too, so I want to shut up and make sure that I don't just talk the whole time. Just in case, we could put on some elevator music too while we wait, or the Jeopardy music meanwhile.

Demetrios [00:15:59]: All right, somebody just jump in and cut me off, because I'm gonna keep asking these questions. For the different design patterns that you had, I think one thing that was super clear to me is I've seen all of those different design patterns smushed into one design, like a hybrid. Have you experimented much with those? And the complexity versus actual reliability, I imagine, is a huge trade-off there too.

Shrivu Shankar [00:16:30]: Yeah, yeah. I mean, definitely. I think the high-level idea for me is, you know, you start with the simple agent, and then you see how it's failing, and then you kind of handle some of those failures by breaking it into these different modules. So definitely, in some of the cases where I built some of the more complex agents, there's cases where I go with the manager sub agent, worker kind of paradigm. And then I see that the routing itself is now becoming kind of complex and there's a lot of business logic there. And then we break the manager itself into maybe an assembly line agent, where maybe it does a planning stage; it kind of looks at the different data sources to understand, how should I be routing this kind of data, how should I be structuring it? So it's almost like breaking the manager-worker architecture into, like, an assembly-line manager and then, you know, keeping the workers. And so it's sort of like, as things fail, breaking it into the right pieces.

Shrivu Shankar [00:17:19]: And so then it becomes a much more complex graph as you sort of iterate on things.

