MLOps Community

Building Reliable AI Agents

Posted Jun 28, 2023 | Views 1.2K
# AI Agents
# LLM in Production
# Stealth
SPEAKERS
Travis Fischer
Founder @ Agentic

Technical founder with two prior exits, building OSS product experiments at the intersection of AI agents and devtools. Previously at Amazon and Microsoft.

Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in machine learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building Lego houses with his daughter.

SUMMARY

Autonomous AI agents have gotten a lot of attention recently, but they're mostly just toys. What are the primitives that we need to build more reliable agents, and what are the main business use cases that agentic automation will enable over the next few years?

TRANSCRIPT


And I'm gonna bring on Mr. Transitive Bullshit himself. What's up, Travis? How you doing, dude? I'm good, Demetrios. Thank you very much for having me. It's a pleasure to be here. Dude, can I just say this in front of everyone? I'm going to declare my love for your email. Oh, thank you. I don't know if we want to put you on blast and say it to everyone, but let's just say that your email is probably the best that I have ever seen.

And if anyone is not following you on Twitter, I highly recommend it. You are such an incredible follow, and you do all kinds of Twitter Spaces to talk about what's happening in AI and ML all the time. Yeah, it's something we're all learning about together; this space is so new. We just have a good group of people who like to discuss the latest weekly news, working with the Ben's Bites newsletter as well.

There we go. And we just try to distill it down and make it accessible for people. That's it, man. So it's super cool to have you here. As you know, I'm a fan, and now you're on this agent kick. I'm gonna give you 10 minutes on the clock, as if I follow the clock strictly, as you know. But that's the fun of having big cushion breaks at the end: I can be a little bit more liberal.

I'll share your screen for you. You should be good to go. I'll be back in 10 minutes, man. Awesome. Thank you, Demetrios. So, as Demetrios said, I am Transitive Bullshit, otherwise known as Travis Fischer. A quick background on me and why I'm chatting about agents today.

I have a strong background in open source, and the AI space has been changing exponentially, as I'm sure most of you are aware. When ChatGPT was released six months ago, I released the ChatGPT NPM package and then released a ChatGPT bot on Twitter that now has about 130,000 followers.

I also run a Discord called ChatGPTHackers.dev, which has about 10,000 AI developers in it. I've been building a lot of open-source experiments and demos to maximize my rate of learning in this space, and more recently I've started to really focus on AI agents. So let's talk a little bit about that.

There's been a lot of hype around agents recently. Auto-GPT, as this GitHub star chart shows, is the fastest-growing GitHub repo of all time, which is just insane. A month after Auto-GPT launched, it had more stars than Docker, Kubernetes, and some of these other massive, traditional open-source projects. There's a reason why there's so much hype around it.

People are enraptured by the possibilities. At the same time, the vast majority of these AI agents are, right now, actually just toys. So let's be really clear about what we're talking about: what actually is an agent? To be clear, there are a few other definitions of agents coming from traditional machine learning and reinforcement learning.

"Agent" is a term that is not super well defined, but within the scope of this talk, and for the type of agents that I feel are the most promising, I'll define agents as autonomous or semi-autonomous programs that, in particular, use the reasoning abilities of AI models to accomplish tasks.

Currently, the AI models we're talking about are LLMs, and it only became possible to start viewing these language models as reasoning engines fairly recently, with large foundation models like GPT-3.5 and GPT-4 pushing the bar forward exponentially in terms of their ability to reason.

Certain prompting techniques, such as chain of thought or self-reflection, then add even more robustness to their ability to actually reason. I would define an agent as having a couple of key components. First, you give an agent a very well-specified task.

You also give it some resources for accomplishing that task. Those resources could be things like compute time. They could be actual money. They could be tools, such as access to your browser or your user accounts, or more traditional tools, like access to APIs or proprietary data.

So you give an agent a task, and you give it some resources to accomplish that task. Generally, the implementation of the agent will involve some planning, or decomposition of the task into subtasks. When you do that, you need to do some scheduling and prioritization of those subtasks.

Ultimately, it ends up looking very similar to orchestration or scheduling in a traditional operating system, where you have a lot of threads, or a lot of work that needs to happen, and you need to prioritize and schedule those tasks for execution.
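To make the operating-system analogy concrete, here is a minimal Python sketch of decomposing a task into prioritized subtasks and running them against a resource budget. The hard-coded `plan` function, the priorities, and the per-subtask `cost` units are hypothetical stand-ins; a real agent might ask an LLM to produce the plan.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Subtask:
    priority: int                       # lower number = scheduled sooner
    name: str = field(compare=False)
    cost: int = field(compare=False)    # hypothetical units of the agent's resource budget


def plan(task: str) -> list[Subtask]:
    # Hypothetical, hard-coded decomposition; an LLM planner could generate this instead.
    return [
        Subtask(2, f"summarize findings for: {task}", cost=3),
        Subtask(0, f"search sources for: {task}", cost=2),
        Subtask(1, f"extract key facts for: {task}", cost=4),
    ]


def run(task: str, budget: int) -> list[str]:
    queue = plan(task)
    heapq.heapify(queue)                # min-heap on priority, like an OS run queue
    done = []
    while queue:
        sub = heapq.heappop(queue)
        if sub.cost > budget:
            break                       # out of resources: stop rather than overspend
        budget -= sub.cost
        done.append(sub.name)           # placeholder for actually executing the subtask
    return done


print(run("compare vector databases", budget=6))
# ['search sources for: compare vector databases', 'extract key facts for: compare vector databases']
```

With a budget of 6, the agent runs the two highest-priority subtasks and stops before the third would overspend.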

Now, this can get very complicated very quickly. Oh, and there's also external memory: things like existing databases or vector databases. The main difference is that vector databases are kind of fuzzy, which suits a lot of the work that agents will handle well, where their strengths are going to shine.

The key advantage is that you don't need the kind of specific, rule-based, exhaustive programming that makes everything explicit. Agents shine in areas where you can be really flexible, and that's also why vector databases, even with all the hype, are useful: there's a flexible aspect to using them for retrieval.
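The "fuzzy" retrieval idea can be sketched as nearest-neighbor search by cosine similarity. This is a toy, not a real vector database: as an assumption, `embed` uses bag-of-words counts in place of a learned embedding model, so it only matches shared words, whereas real embeddings would also relate words like "like" and "prefers".

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in for a learned embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)   # Counter returns 0 for missing words
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Stores (text, vector) pairs and returns the k most similar texts to a query."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self.items.append((text, embed(text)))

    def search(self, query: str, k: int = 1) -> list[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


memory = ToyVectorStore()
memory.add("the user prefers morning meetings")
memory.add("invoice 1042 was paid in march")
memory.add("the api key rotates every 90 days")
print(memory.search("when does the user like meetings"))  # ['the user prefers morning meetings']
```

The query never has to match a stored record exactly; the closest memory wins, which is the flexibility the talk attributes to vector retrieval.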

So I view agents very much as a spectrum. When a lot of people say "agents," they jump straight to the right side of this graph and think about fully self-driving programs. At the very bottom of the spectrum, think of the large language model as a CPU, or as a reasoning engine.

For the purposes of this talk, we can treat LLMs as a black box. They're great at understanding and generating natural language, but what's really game-changing about more recent LLMs is their ability to reason. That's where you can start to think of them as CPUs inside a fundamentally new paradigm of compute.

Then you can ask yourself: what does the world of programs look like when they're built on top of those language models as reasoning engines, on top of those CPUs? I view those programs as agents. At one end, you have very traditional deterministic programming.

You can think of these as programs written by a human; the human is definitely driving the bus. At the other end of the spectrum, you have fully self-driving programs, things like Auto-GPT or BabyAGI. The idea is that you give the fully self-driving, or fully autonomous, agent a task, and it goes off and accomplishes that task completely on its own.

One of the things I really want to drive home is that, just like with self-driving cars, trying to jump straight to self-driving programs is a mistake. It's amazing from a marketing perspective, and it's amazing for showing people what will be possible soon, but this is LLMs in Production.

We're talking about what's possible today. My main point is that we really want to start more toward the left-hand side of this graph, but it's really a spectrum. So the question is: how can we start with more deterministic agents, in slightly more constrained areas, just as we've done with self-driving cars (there's a spectrum there too), and then gradually move toward more self-driving programs over time?

Let me go through some of the key challenges quickly. First, the more autonomous agents get, the more they tend to get stuck in loops or diverge from the original path.

And they often lack a good ability to reflect and actually get back on track. Second, there's something known as the compositionality gap: even if an agent is really good at solving one subproblem, its reliability on the original problem, which is composed of many subproblems, decreases very, very quickly.
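A back-of-the-envelope way to see why reliability drops so fast: if each subtask succeeds independently with probability p, a chain of n subtasks succeeds with probability p to the power n. Independence and equal per-step reliability are simplifying assumptions made here, not figures from the talk.

```python
def chain_success(p_step: float, n_steps: int) -> float:
    """Probability that all n_steps succeed, assuming independent steps of equal reliability."""
    return p_step ** n_steps


def required_step_reliability(target: float, n_steps: int) -> float:
    """Per-step reliability needed for the whole chain to reach `target` (same assumptions)."""
    return target ** (1 / n_steps)


for n in (1, 5, 10, 20):
    print(f"{n:>2} steps at 95% each -> {chain_success(0.95, n):.1%} end to end")
# 95.0%, 77.4%, 59.9%, 35.8%
```

Running it the other way, a 10-step chain that must succeed 90% of the time end to end needs each step at roughly 99%.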

Another issue: if you just give an agent a task and say "go off and do this," it becomes a very challenging UX problem. So the difficulty isn't only algorithmic or on the developer side. From the user's standpoint, understanding what these agents are doing, being able to interpret them, and being able to keep a human in the loop are really important.

That requires a lot of work on the UX side. Another challenge is latency and cost. Most current agent frameworks default to large, hosted foundation models, and when you make recursive calls, the cost can add up really, really quickly.
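To see how recursive calls add up, this sketch counts the LLM calls in a full decomposition tree: with branching factor b and depth d, one call per node gives 1 + b + b² + … + b^d calls. The token count and per-1K-token price below are placeholder numbers for illustration, not real pricing, and real agents rarely build perfectly full trees.

```python
def total_calls(branching: int, depth: int) -> int:
    """LLM calls in a full recursive decomposition: one call per node of the tree."""
    return sum(branching ** level for level in range(depth + 1))


def estimated_cost_usd(branching: int, depth: int, tokens_per_call: int, usd_per_1k_tokens: float) -> float:
    return total_calls(branching, depth) * tokens_per_call / 1000 * usd_per_1k_tokens


# Placeholder numbers, not real pricing: 2,000 tokens per call at $0.06 per 1K tokens.
for branching in (2, 3, 5):
    calls = total_calls(branching, depth=4)
    cost = estimated_cost_usd(branching, 4, tokens_per_call=2000, usd_per_1k_tokens=0.06)
    print(f"branching={branching}: {calls} calls, ~${cost:.2f}")
# branching=2: 31 calls, ~$3.72
# branching=3: 121 calls, ~$14.52
# branching=5: 781 calls, ~$93.72
```

Going from 2 to 5 subtasks per step at the same depth multiplies the bill by about 25, which is why capping depth and fan-out matters.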

And finally, safety is of paramount importance. If you are actually giving an agent write access to the world on your behalf, keeping some guardrails on it is super important. So let me go through some advice for building toward reliable agents.
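One possible shape for guardrails on write access, as an assumption rather than anything shown in the talk: read-only actions pass freely, unknown actions are denied by default, and write actions need approval and count against a hard cap. The action names and the `approve` policy are hypothetical; in practice `approve` could prompt a human.

```python
from typing import Callable

# Hypothetical action names; a real agent would map these to actual tools.
READ_ONLY_ACTIONS = {"search_web", "read_calendar"}
WRITE_ACTIONS = {"send_email", "make_payment"}


class Guardrail:
    """Lets read-only actions through; gates writes behind approval and a hard cap."""

    def __init__(self, approve: Callable[[str, dict], bool], max_writes: int = 3) -> None:
        self.approve = approve          # e.g. a human confirmation prompt or a policy check
        self.max_writes = max_writes
        self.writes = 0

    def allow(self, action: str, args: dict) -> bool:
        if action in READ_ONLY_ACTIONS:
            return True
        if action not in WRITE_ACTIONS:
            raise PermissionError(f"unknown action {action!r} is denied by default")
        if self.writes >= self.max_writes:
            raise PermissionError("write budget exhausted; stopping the agent")
        if not self.approve(action, args):
            return False
        self.writes += 1
        return True


# Example policy standing in for a human reviewer: auto-approve only small amounts.
guard = Guardrail(approve=lambda action, args: args.get("amount", 0) <= 50, max_writes=2)
print(guard.allow("search_web", {"query": "flights to SF"}))  # True
print(guard.allow("make_payment", {"amount": 500}))           # False
print(guard.allow("make_payment", {"amount": 20}))            # True
```

Denying unknown actions by default means a new tool has to be classified explicitly before the agent can use it.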

The first and most important piece of advice is to constrain the types of tasks you set out to accomplish with agents. With anything too generic or too large, I think you're going to run into problems. In my opinion, agents are a really good fit today for very repetitive tasks.

That means robotic process automation (RPA) type tasks, tasks that you'd want to be always on in the background, working on your behalf. Another approach is to constrain the set of tools you use. Instead of giving an agent access to a thousand tools, if you know you only need a couple of tools for a given task, just constrain it that way.
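Constraining tools can be as simple as an allowlist per task type, so the agent never sees, and can't call, tools outside it. The registry, task types, and lambda "tools" below are hypothetical stand-ins for real API wrappers.

```python
from typing import Callable

# Hypothetical tools; stand-ins for real API wrappers.
ALL_TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda q: f"results for {q!r}",
    "add_numbers": lambda expr: str(sum(int(x) for x in expr.split("+"))),
    "send_email": lambda body: "email sent",
    "delete_file": lambda path: f"deleted {path}",
}

# Each task type only sees the handful of tools it actually needs.
TASK_TOOLS = {
    "expense_report": ["add_numbers"],
    "research": ["search"],
}


def tools_for(task_type: str) -> dict[str, Callable[[str], str]]:
    return {name: ALL_TOOLS[name] for name in TASK_TOOLS[task_type]}


def call_tool(tools: dict[str, Callable[[str], str]], name: str, arg: str) -> str:
    if name not in tools:
        raise PermissionError(f"tool {name!r} is not available for this task")
    return tools[name](arg)


expense_tools = tools_for("expense_report")
print(sorted(expense_tools))                             # ['add_numbers']
print(call_tool(expense_tools, "add_numbers", "12+30"))  # 42
```

A smaller toolset also means a shorter tool description in the prompt, which leaves the model fewer wrong choices.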

Keep a human in the loop as part of the feedback process. This is one area where agents can fundamentally improve on traditional workflows: because they're built on top of machine learning models, you can build in human feedback loops so that they learn and get better at solving the task over time.
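A minimal human-feedback loop, sketched under the assumption that "learning" means reusing reviewed answers as few-shot examples in the next prompt; fine-tuning on the same records is another option. The `FeedbackStore` class and prompt layout are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FeedbackStore:
    """Records human reviews of agent outputs and reuses them as few-shot examples."""
    examples: list[tuple[str, str]] = field(default_factory=list)

    def record(self, task_input: str, agent_output: str, human_output: str) -> bool:
        # Whatever the human approved or corrected becomes the reference answer.
        self.examples.append((task_input, human_output))
        return human_output != agent_output  # True if the human had to correct the agent

    def build_prompt(self, new_input: str, k: int = 3) -> str:
        shots = [f"Input: {i}\nOutput: {o}" for i, o in self.examples[-k:]]
        return "\n\n".join(shots + [f"Input: {new_input}\nOutput:"])


store = FeedbackStore()
corrected = store.record("Invoice from ACME, $120", "category: misc", "category: software")
store.record("Uber ride to airport", "category: travel", "category: travel")
prompt = store.build_prompt("Flight to SFO")
print(prompt)
```

Tracking how often `record` returns True gives a simple correction rate, which should fall as the examples accumulate.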

I think this is a fundamental, key primitive here. Another approach is to build up an ecosystem of very reliable primitives; I'll talk about this in a second. Wherever possible, prefer deterministic code. You might start off with a very generic model, say GPT-4, but move toward deterministic code over time.
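One way to "prefer deterministic code" is to try a cheap, predictable parser first and call the model only when it fails. The date-extraction task, the regex, and `fake_llm` are illustrative assumptions; `fake_llm` stands in for a hosted model call.

```python
import re
from typing import Callable

ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract_date(text: str, llm: Callable[[str], str]) -> tuple[str, str]:
    """Try deterministic parsing first; only fall back to the LLM when it fails."""
    match = ISO_DATE.search(text)
    if match:
        return match.group(0), "regex"
    answer = llm(f"Extract the date from this text as YYYY-MM-DD: {text}")
    return answer.strip(), "llm"


def fake_llm(prompt: str) -> str:
    # Hypothetical stub standing in for a hosted model call.
    return "2023-06-28\n"


print(extract_date("Posted 2023-06-28 by Travis", fake_llm))  # ('2023-06-28', 'regex')
print(extract_date("Posted last Wednesday", fake_llm))        # ('2023-06-28', 'llm')
```

Logging which path ran shows which inputs still need the model; recurring fallbacks are candidates for new deterministic rules.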

One thing I want to say is that multi-agent systems are distributed systems, and there's a lot we can learn from traditional distributed systems in that context. One example is a recent paper from NVIDIA and other researchers called Voyager, which solves various Minecraft tasks.

One of the key insights there was that the AI would generate code-based skills on the fly, evaluate those skills as subroutines, and build up a library of skills over time that it could keep referring back to.
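The skill-library idea can be illustrated with a simplified Python analogue. This is not Voyager's implementation: here a generated function is kept only if it passes a check, and it is retrieved later by keyword overlap with its description. The source strings stand in for LLM output, and running generated code with `exec` should only ever happen inside a proper sandbox.

```python
from typing import Callable, Optional


class SkillLibrary:
    """Keeps generated skills only after they pass a check, then retrieves them by description."""

    def __init__(self) -> None:
        self.skills: dict[str, Callable] = {}
        self.descriptions: dict[str, str] = {}

    def try_add(self, name: str, description: str, source: str, check: Callable[[Callable], bool]) -> bool:
        namespace: dict = {}
        try:
            exec(source, namespace)     # only ever do this inside a proper sandbox
            fn = namespace[name]
            if not check(fn):
                return False
        except Exception:
            return False                # code that crashes never enters the library
        self.skills[name] = fn
        self.descriptions[name] = description
        return True

    def find(self, query: str) -> Optional[Callable]:
        words = set(query.lower().split())
        scored = [(len(words & set(d.lower().split())), n) for n, d in self.descriptions.items()]
        best = max((s for s in scored if s[0] > 0), default=None)
        return self.skills[best[1]] if best else None


library = SkillLibrary()
# Pretend these source strings came from an LLM.
ok = library.try_add("total", "sum a list of numbers",
                     "def total(xs):\n    return sum(xs)\n", check=lambda f: f([1, 2, 3]) == 6)
bad = library.try_add("first", "get the first item",
                      "def first(xs):\n    return xs[0]\n", check=lambda f: f([]) is None)
print(ok, bad)                                    # True False
print(library.find("sum these numbers")([4, 5]))  # 9
```

Verification before storage is what keeps the library reliable: a skill that fails its check never gets reused.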

I'm running out of time here, so the last thing I want to say: if you view the ideal agent as a fully autonomous, single entity given a single task, I'm instead advocating for more of a scripted agent workflow that breaks a task up into subtasks. In particular, think of the solution here as a series of nodes.

Every one of these nodes is something you can reason about. No matter how intelligent the underlying language models become, the ability for humans to reason about the steps is fundamental. Whether the graph on the right is generated statically or on the fly,

I still think it's very important to have this level of interpretability, because you can then start to think of this graph as a higher-level programming language. We're moving away from "I'm writing Python code" to "I'm writing in these higher-level primitives."
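A scripted workflow can be represented as an explicit graph of nodes. This sketch uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) to run nodes in dependency order and print every intermediate result. The three node functions are hypothetical; `summarize` is a trivial stand-in for an LLM call. The `deps` graph could be written by hand or generated on the fly, as the talk notes.

```python
from graphlib import TopologicalSorter
from typing import Callable

# A node reads the outputs of the nodes it depends on and returns its own output.
Node = Callable[[dict[str, str]], str]


def run_workflow(nodes: dict[str, Node], deps: dict[str, set[str]]) -> dict[str, str]:
    outputs: dict[str, str] = {}
    for name in TopologicalSorter(deps).static_order():  # dependencies always run first
        outputs[name] = nodes[name](outputs)
        print(f"[{name}] {outputs[name]!r}")              # every step is visible and debuggable
    return outputs


nodes: dict[str, Node] = {
    "fetch": lambda out: "agents are mostly toys today",      # deterministic, e.g. an HTTP call
    "summarize": lambda out: out["fetch"].split(" are ")[0],  # stand-in for an LLM call
    "format": lambda out: f"Topic: {out['summarize']}",       # deterministic templating
}
deps = {"fetch": set(), "summarize": {"fetch"}, "format": {"summarize"}}
result = run_workflow(nodes, deps)
```

Because each node is a plain function with a named input and output, any single node can be swapped, inspected, or tested without touching the rest of the graph.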

In that world, we can apply lessons from traditional software engineering at the node level of this graph. What does a unit test look like? What does writing an eval look like? Maybe a node starts off with GPT-4, a very generic model.

You run it a thousand times and collect a thousand inputs and outputs, then distill that down into a smaller, fine-tuned model. These are all lessons that I think we can take from traditional software engineering and apply in this context. Anyway, that was a super-quick version of this lightning talk, but that's it on my end.
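A node-level eval can double as a data collector for distillation. In this sketch, exact-match scoring, the sentiment task, and `generic_node` (a stub standing in for a large general-purpose model) are illustrative assumptions; passing input/output pairs are exported as JSON Lines, with prompt/completion as one common layout for fine-tuning data.

```python
import json
from typing import Callable


def evaluate_node(node: Callable[[str], str], cases: list[tuple[str, str]]) -> tuple[float, list[dict]]:
    """Run a node on labeled cases; return its pass rate and every input/output record."""
    records = []
    for text, expected in cases:
        output = node(text)
        records.append({"input": text, "output": output, "passed": output == expected})
    pass_rate = sum(r["passed"] for r in records) / len(records)
    return pass_rate, records


def to_distillation_jsonl(records: list[dict]) -> str:
    """Keep only passing pairs as training data for a smaller model."""
    return "\n".join(
        json.dumps({"prompt": r["input"], "completion": r["output"]}) for r in records if r["passed"]
    )


def generic_node(text: str) -> str:
    # Hypothetical stand-in for a call to a large, general-purpose model.
    return "positive" if "great" in text else "negative"


cases = [("this is great", "positive"), ("terrible service", "negative"), ("not great at all", "negative")]
rate, records = evaluate_node(generic_node, cases)
print(f"pass rate: {rate:.0%}")  # pass rate: 67%
print(to_distillation_jsonl(records))
```

The same cases then serve as a regression test: a distilled replacement for the node should match or beat the generic model's pass rate before it ships.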

So, yeah. Awesome, dude. Thank you so much. This was incredible, as I knew it would be. Agents are so hot right now, and it's so hard to figure out how to get them to work, and you are deep in the trenches. I was talking to a friend of mine, Brian, and he was like, "Yeah, I tried Auto-GPT, and all I was left with was a $200 OpenAI bill. I never got it working."

So it's good to see. Another friend of mine was like, "Pretty sure AI Twitter is gaslighting me, because this shit never works when I try it." Yeah, there's definitely a lot of hype and a lot of excitement about what will be possible in the future. For today, I think you want to be more on the left-hand side of that spectrum, writing more deterministic, code-driven programs and using LLMs as little pieces instead of big, fully autonomous ones, at least for the time being.

Yeah, we're not there yet. All right, man, I'm gonna kick you off because we're keeping it rolling. Thank you, Travis. I'll talk to you later, man, and maybe I'll see you in San Francisco, if you're... Yeah, yeah. I'm over at HF0. Would love to meet you when you're out in SF.

All right. Sweet. Talk to you later, man.
