MLOps Community

Building Stateful Agents with Memory // Sarah Wooders // Agent Hour

Posted Feb 06, 2025 | Views 107
Sarah Wooders
Co-Founder and CTO @ Letta

Sarah Wooders is the co-founder and CTO of Letta, which is building the platform for stateful AI agents. She has a PhD from UC Berkeley, where she was advised by Ion Stoica and Joseph Gonzalez and focused on systems for AI. Prior to Berkeley, she graduated from MIT with a degree in Computer Science and Math, and co-founded a startup using AI for e-commerce product categorization, which was part of the YC W20 batch.

SUMMARY

We are currently in the midst of a paradigm shift from stateless LLM workflows to stateful LLM agents. Today, developers are responsible for managing state (e.g. message history across sessions) and memory (e.g. with RAG and a vector DB) themselves. Letta is an agents framework where the agent service, rather than the client-side application, is responsible for state and memory management. This dramatically simplifies the experience of building stateful agentic applications: Letta uses memory management techniques (extending the ideas from MemGPT) to automatically ensure the most relevant information is passed into the LLM context window, and also to avoid context overflow errors. In this talk, we'll cover Letta's high-level architecture and explain the details of state and memory management. We'll also go over how to use Letta to build stateful, reasoning agents with support for custom tools, secure tool environments, and personalized memory.

TRANSCRIPT

Demetrios [00:00:00]: Sarah, I will hand it over to you and let's get rock and rolling.

Sarah Wooders [00:00:09]: So hi everyone, I'm Sarah. I'm the CTO and co-founder of a company called Letta, and we're basically working on agents that learn. Letta is an open source framework and also a platform for powering stateful agents. We actually just released a blog post on this today explaining what this means, but basically it's agents that are persistent, and that also have better reasoning, memory, and personalization. The company was spun out of UC Berkeley's Sky Computing Lab, and we worked on the MemGPT project prior to starting the company. The first three authors of MemGPT are part of the founding team, along with our advisors. So to contextualize where Letta is on the stack, I think we've actually seen a lot of maturity coming into the AI agent stack.

Sarah Wooders [00:00:57]: Before, there wasn't really any tooling, but now we're seeing a ton of really specialized tooling. I would categorize Letta as one of many different agent frameworks, whereas MemGPT was more focused on just the memory management aspect. Our view of what an agent framework should do is that we want to solve this key engineering challenge: what data do we feed to the LLM, and how? We call this context management. We think this is the most important problem in terms of making capable agents, and it's what the agent framework should be responsible for: compiling your context and interfacing with the LLM, so that developers can get the best performance they possibly can out of agents that can call tools and have persistent memory. Another aspect of an agent framework is deployment. A lot of us have probably built agents just in Python scripts.

Sarah Wooders [00:01:53]: It's very simple and straightforward to understand how you might run an agent in a notebook or a Python script. But the vision we all have of agents is really something that can run autonomously, maybe something that just runs 24/7 and goes and does work on our behalf. Obviously, the things we have in our notebooks are not easily going to become these kinds of services. So the other aspect we work on is how to put agents in production: how do we take agents that live in example notebooks, and that are maybe just pretty thin wrappers around LLMs, and actually build production grade services? I think this has created a new category as well, which is products for agent hosting and serving. There are many different players here, but the idea is: how can we provide agents not just as something that lives in a notebook, but as an API service? So I'm going to quickly go over what Letta is and what the Letta stack looks like.

Sarah Wooders [00:02:53]: Letta is an open source project built on top of things like Postgres and FastAPI, and it lets you build agents in a very service oriented way. The idea is that if you have some kind of application, whether it's an AI SDR, a companion, or some kind of workflow automation, your application interacts via REST APIs or one of our SDKs with a Letta service. This Letta service is what's responsible for managing things like memory, agent state, tool execution, and any additional data sources, and then, using all that state, it interfaces with the LLMs to holistically create these stateful agents. In terms of what the stack looks like, we have a service that's basically an agents API that you can interact with. Under the hood, we manage things like memory and extended chain of thought to make sure we can squeeze the best performance possible out of LLMs. So we do things like prompting for reasoning, and automatically managing memory by adding recursive summaries, evicting messages when the message list gets too long, et cetera, to make sure you avoid context overflow and context pollution. And then we're also model agnostic.
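
To make the service oriented pattern concrete, here is a minimal sketch of a client talking to a locally running Letta server over REST. The port, endpoint paths, and payload fields are illustrative assumptions, not a verified API reference; check Letta's docs for the actual schema.

```python
import requests

BASE_URL = "http://localhost:8283/v1"  # assumed default local server address

# Create a stateful agent; the server persists its state from here on.
agent = requests.post(
    f"{BASE_URL}/agents",
    json={"name": "demo-agent", "model": "gpt-4o-mini"},  # hypothetical fields
).json()

# Send a message. Memory and context management happen server-side, so the
# client never re-sends conversation history across sessions.
reply = requests.post(
    f"{BASE_URL}/agents/{agent['id']}/messages",
    json={"messages": [{"role": "user", "content": "Hi, my name is Ada."}]},
).json()
print(reply)
```

The point of the pattern is that the client is stateless: any process that can reach the server can resume the same agent.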

Sarah Wooders [00:04:06]: Even if you have an agent that already has a long message history or its own memories, you can just swap out the backend LLM provider, because we store all the state and persist it in a model agnostic way. Our actual product suite is this agents framework, which is open source. It handles things like memory, reasoning, and persistence, and it also integrates with over 7,000 tools through our Composio integration. And then we have the Letta Agent Development Environment. We think of this as an IDE for agent lifecycle management: we let developers visually understand what's actually going into the context window of their agents, and do live iteration on their prompts, tools, and model configurations, all within the same interface. In terms of the challenges we're focused on, there are a couple of things, but I think the most fundamental one is the idea that the intelligence of your agent is bounded by what you put into the LLM context window. The LLM context window is in some ways the only input we actually control with these models.
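
As a toy illustration of the eviction-plus-recursive-summary idea described above (not Letta's actual code), here is the core loop in Python. The tokenizer and summarizer are crude stand-ins; in a real system the summarize step would be an LLM call.

```python
def count_tokens(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer


def summarize(previous_summary: str, evicted: list[str]) -> str:
    # In practice an LLM call: "revise the summary to cover these messages".
    return (previous_summary + " | " + " ".join(evicted)).strip(" |")


def compact(messages: list[str], summary: str, budget: int) -> tuple[list[str], str]:
    """Evict oldest messages into a running summary until we fit the budget."""
    evicted: list[str] = []
    while messages and sum(count_tokens(m) for m in messages) + count_tokens(summary) > budget:
        evicted.append(messages.pop(0))  # evict oldest first
    if evicted:
        summary = summarize(summary, evicted)  # fold evictions into the summary
    return messages, summary
```

Each eviction folds old messages into the running summary, so the in-context history stays bounded however long the conversation runs, which is what prevents context overflow errors.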

Sarah Wooders [00:05:11]: So I think it's really effective to think of the context window as the program that we're passing into the LLM. It's everything we have to control and contextualize things for the LLM. And it's not as straightforward as just stuffing whatever you have into the context window. If you put in too much data, LLM performance can actually drop: there's a lot of work showing that the bigger the context, the more likely the LLM is to get confused about what to pay attention to, to get distracted, or to summarize overly generally without making fine grained observations about the details of the data. And if you have too little data, then of course the LLM just doesn't have the relevant context, will be missing a lot of information, and will generally be more likely to hallucinate, because when an LLM doesn't know something, its reaction is often to just make something up, which I'm sure you've all seen. And for a lot of applications we have a ton of information. For a recommendation engine, say, we have product offerings, company policies, user transaction histories, data about the users. There's a ton of data in general. So the question is: how do we actually connect our LLMs to all this information? How do we program our LLMs the way we want to when we have such limited context windows? Because of these challenges, I think what we've seen is that a lot of agents, or just LLMs, don't have long term memory.

Sarah Wooders [00:06:33]: There's this finite context window, and there hasn't been a lot of work on how to give agents memory that goes beyond what's in their context. Most systems limit memory to a single session that fits into the context window, so agents are often treated as throwaway, as opposed to things that are actually persistent and learn. And in general, if you have a huge amount of existing data, there's not really a way to connect your agent to it. There is RAG, but it's not really the same: it's a very weak retrieval mechanism compared to what you would imagine with a human being able to actually read and develop an understanding of large amounts of data. So at Letta we think of ourselves as having a context management runtime, or, as we sometimes refer to it, the LLM OS. The idea is that we have model agnostic state, things like the experience of the agent, the messages, the tools, the runtimes, and we manage the process of packing this into the context window in the most effective way.
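
Here is a sketch of what that "context compilation" step might look like, assuming a simplified state layout (the field names are illustrative, not Letta's actual schema): persisted, model agnostic state is packed into a fresh prompt on every step, for whichever backend model is configured.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    """Model-agnostic persisted state (illustrative fields)."""
    system_prompt: str
    memory_blocks: dict[str, str] = field(default_factory=dict)  # e.g. "human", "persona"
    summary: str = ""  # recursive summary of evicted history
    recent_messages: list[dict] = field(default_factory=list)


def compile_context(state: AgentState) -> list[dict]:
    """Pack persisted state into a chat-completion style message list."""
    memory = "\n".join(f"<{label}>\n{value}\n</{label}>"
                       for label, value in state.memory_blocks.items())
    system = (
        f"{state.system_prompt}\n\n"
        f"[Core memory]\n{memory}\n\n"
        f"[Summary of earlier conversation]\n{state.summary}"
    )
    return [{"role": "system", "content": system}, *state.recent_messages]
```

Because the context is compiled from neutral state on every step, swapping the backend LLM provider is just a configuration change, which is the model agnosticism Sarah describes.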

Sarah Wooders [00:07:39]: We sometimes do additional processing with an LLM to compact that data down, to make sure we're putting only the most important things, in really concise ways, into the context window. Then we manage interfacing with the LLM: we pass this compacted context window to the LLM, and the LLM will do something like generate a tool call, which we can then translate into an actual update of the state. So this is the loop we imagine Letta agents having. I'm also going to go over some use cases we've seen, just to make this more concrete. One company we worked with was building an AI SDR, basically an agent that can write very personalized emails to each client. With Letta, we can have an agent review all the interactions with a user, or review any existing data about the user, and use this to construct a very customized memory about the user, to write highly engaging and personalized emails. In addition to this, because Letta is a general purpose agents framework, it can call a bunch of different tools.

Sarah Wooders [00:08:47]: We worked with the Composio team to provide connectors to over 7,000 pre-made tools across hundreds of different providers, with authentication all handled through Composio. This makes it really easy to do things like integrate with Google or Slack or Calendly. And finally, because Letta is actually a service, an API, it's very easy to integrate your end application with the Letta service. You don't need to use Letta's UI; that's just for debugging. You can talk to the agents through the API service, and you don't need to worry about turning your notebook agent into an actual service. Another aspect we've been really interested in is how to get agents to learn from data in a way that goes beyond RAG. Say you have company data that's millions of tokens; there's no way it fits into the context window.

Sarah Wooders [00:09:36]: How can you actually have your agents be educated about all this information? We have a ton of prior data, maybe our product offerings, transactions, et cetera, and we want the agent to somehow derive insights from it, like the user's preferences or historical trends. Another customer we worked with was a payments platform. They wanted to improve their offer engagement by giving more personalized recommendations. The challenge was that they had all this very unstructured data, like transaction histories, user data, marketing guidelines, et cetera, and they didn't really have a way to provide all that information to the LLM, because obviously LLMs have limited context windows. So what we ended up doing for them was to create an agent that can do offline processing of the data to learn these profiles.

Sarah Wooders [00:10:35]: The profiles capture things like user preferences and historical trends over time. The agent will iteratively look at the data, and each time it does, it updates the things it has learned about the user or other historical trends. This is related to some academic work we're doing, which will hopefully be coming out in a month or two. Once we bootstrap the agent's memory with offline learning, the agent can then use the memories it has formed to make real time recommendations. If we ask the agent to recommend an offer, because it has learned all this information from the existing data, it's able to make very precise, very tailored recommendations. Finally, I just want to go over what deploying Letta actually looks like. I went over something similar to this before, but we have a model where every single agent is an agent as a service. From the get go, your agent is always a service.
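
That offline learning loop can be sketched in a few lines. This is a toy rendering of the idea, not the customer's actual pipeline; llm stands in for any chat-completion call.

```python
def llm(prompt: str) -> str:
    raise NotImplementedError  # substitute your chat-completion call here


def learn_profile(batches: list[str]) -> str:
    """Iteratively revise a learned profile as the agent walks the data."""
    profile = "(empty profile)"
    for batch in batches:
        profile = llm(
            "You maintain a concise profile of a user's preferences and "
            "historical trends.\n"
            f"Current profile:\n{profile}\n\n"
            f"New data:\n{batch}\n\n"
            "Return the revised profile, keeping only durable insights."
        )
    return profile  # later used to bootstrap the online agent's memory
```

The resulting profile is what gets loaded into the agent's memory before it starts serving real time recommendation requests.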

Sarah Wooders [00:11:36]: There's no separate step for deploying the agent. We have a very granular API, and we also have auto-generated Python and TypeScript SDKs, so it should work very well in both languages. This API not only lets you send messages and create agents, it also gives you granular control over all the agent's memories. You can do things like query the entire conversation history and the entire step history, and even modify the agent state through the API. I would encourage people to check it out in the Letta docs. We also have the Agent Development Environment, which I mentioned before. Unfortunately this is a little bit small here, but you can download it as well. It's compatible both with the open source server and with our cloud server.
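
Here is a hedged sketch of that granular control: reading and editing an agent's memory through the same HTTP API used for messaging. As with the earlier example, the routes and fields are assumptions for illustration; consult the Letta API reference for the real ones.

```python
import requests

BASE_URL = "http://localhost:8283/v1"
agent_id = "agent-123"  # hypothetical agent ID

# Read the full server-side conversation history for this agent.
history = requests.get(f"{BASE_URL}/agents/{agent_id}/messages").json()

# Directly edit an in-context memory block, e.g. to correct a stored fact.
requests.patch(
    f"{BASE_URL}/agents/{agent_id}/core-memory/blocks/human",  # assumed route
    json={"value": "Name: Ada. Prefers concise answers."},
)
```

Both operations act on the same server-side state the agent itself uses, so edits take effect on the agent's next step.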

Sarah Wooders [00:12:29]: It's a one stop shop for building and monitoring your agents. In terms of Letta open source, that's where we have our open source implementations of agent memory, personalization, reasoning, et cetera. It's all model agnostic and white box, and you can deploy these agents as services by running the Letta server. We also recently released Letta Desktop, so you can run agents fully locally. If you're using an open source model like R1 or one of the Llama models, you can run agents with no Internet connection whatsoever. The desktop app has a version of the Agent Development Environment built in, so it's a place where you can locally access your agents and test different prompts and models. And then of course we also have Letta Cloud.

Sarah Wooders [00:13:17]: This is a place to manage the entire development lifecycle, so you can build and test your agents, and then also version and deploy things in the cloud without any infrastructure. That's all I have for today. Thank you so much for listening. Please feel free to reach out to us at Letta, and you can also follow us on our GitHub, Discord, Twitter, and other socials.

Demetrios [00:13:43]: Excellent. Thank you Sarah.

Sarah Wooders [00:13:46]: Yeah, thank you.

Demetrios [00:13:48]: I've got one question. You flew through that, which, thank you for doing that. There's a big piece that I wasn't super clear on, and if anybody else wants to drop questions in or come off the microphone, feel free. But for me: how does it keep memory? I feel like I blinked and missed the part where you talked about how the agents keep that memory. Do you fine tune the memory, or does it stay in the state? What was it exactly?

Sarah Wooders [00:14:25]: Yeah, so the way we manage memory is very similar to what's described in the MemGPT paper. We automatically persist all message histories, and we give the agent the ability to read and write to an external vector store. We also allocate a section of the context window for in-context memory. The agent uses a combination of tools to write to both its in-context and external memory, and we basically use the power of the LLM to orchestrate memory over time.
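
A toy sketch of that split, with tool names following the MemGPT paper: a small in-context memory the agent edits directly, plus an external archival store it reads and writes through tools. The vector search is faked with substring matching to keep the example self-contained.

```python
class AgentMemory:
    """MemGPT-style memory: in-context blocks plus an external archive."""

    def __init__(self) -> None:
        self.core: dict[str, str] = {"human": "", "persona": ""}  # in-context
        self.archival: list[str] = []  # stands in for a vector DB

    # Tools exposed to the LLM:
    def core_memory_append(self, label: str, text: str) -> None:
        """Append a fact to an in-context memory block."""
        self.core[label] = (self.core[label] + "\n" + text).strip()

    def core_memory_replace(self, label: str, old: str, new: str) -> None:
        """Edit an in-context memory block in place."""
        self.core[label] = self.core[label].replace(old, new)

    def archival_memory_insert(self, text: str) -> None:
        """Write to external memory (embed + upsert to a vector DB in reality)."""
        self.archival.append(text)

    def archival_memory_search(self, query: str) -> list[str]:
        """Retrieve from external memory (vector search in reality)."""
        return [t for t in self.archival if query.lower() in t.lower()]
```

The LLM orchestrates all of this itself: it decides, via tool calls, what is important enough to pin in context and what to push out to the archive.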

Demetrios [00:14:56]: I see. Okay, I see another hand going up. Eric, you around here? You want to drop?

Eric B [00:15:03]: Yeah, hey. Thanks Sarah, this is fantastic stuff you have here. The question I have relates a little bit to this notion of enterprise grade agents. What we see with my clients is that a lot of them are looking for things like discoverability, so some version of a registry or a marketplace. They're looking for things like a trust framework that lets someone actually certify, for lack of a better word, that their agents are doing what they need to do. And the other piece that's puzzling folks is the ability to integrate security, so that every agent, as they talk, would use mutual TLS, or every agent is enforced through things like OAuth 2.

Eric B [00:15:48]: Is that something you guys are thinking about or. And again, I missed the first three minutes or so of that presentation. Maybe you already have it. And I'll pause there, let you respond. Thank you.

Sarah Wooders [00:15:58]: Yeah, so in terms of the OAuth stuff for tool calling, I think Composio is really great. It's one of the partners we work with, and it lets you specify the authentication scopes of the tools you're connecting to your agent. In terms of security on our end, we run untrusted tools in a sandbox automatically, to ensure you don't run dangerous tools on your server. And for really security focused applications, for example the customer dealing with transaction data, you can just deploy a Letta server yourself. It's a Docker image, and it's very compatible with standard infrastructure. You can run just a single image if you want, or you can run it on Kubernetes as well.

Sarah Wooders [00:16:44]: So that's kind of like how we've been thinking about security so far.

Demetrios [00:16:47]: Nice.

Eric B [00:16:48]: Okay, great. Thank you.

Demetrios [00:16:51]: And Jason, I see you got a question you want to come off mute? Yeah, There he is.

Jason [00:16:58]: So, apologies if this was already answered and I missed it, but we're talking about memories, right? How are the memories actually constructed? And is there any ability within Letta to influence that?

Sarah Wooders: Yeah, so the memories are actually all written by the LLM itself. The LLM has these tools called core memory replace and core memory append, and the LLM decides what to store. If you want to control how that's done, you can first of all do it by prompting. But you can also override those tools. If you always want the memory written in a certain structure or something like that, you can override the set of tools Letta provides with your own custom tools, to manage memory in different ways.

Jason: Correct me here, but to some degree you're asking the LLM: hey, I've got all this stuff, come up with some memories?

Sarah Wooders: In short, yeah. We're basically telling it to remember important things.

Jason [00:18:00]: Yeah, got it.

Sarah Wooders: Yeah. Because I think the LLM is basically the most powerful tool we have. I don't think there's any model better than the LLM, so why not use it to also manage memory?
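
To make the override idea concrete, here is a hedged sketch of a custom structured memory tool of the kind Sarah describes. The function and its registration with Letta are hypothetical; the point is that a memory write becomes an ordinary tool whose schema you control.

```python
import json

MEMORY: dict[str, list[str]] = {}  # toy store; a real tool would persist this


def structured_memory_write(category: str, fact: str) -> str:
    """Store a fact under a fixed category such as 'preferences' or 'history'.

    Exposed to the LLM as a tool; the docstring doubles as the tool
    description the model sees when deciding what to remember.
    """
    MEMORY.setdefault(category, []).append(fact)
    return json.dumps({"stored": fact, "category": category})
```

Registered in place of the default core memory tools, this forces every memory the LLM writes into your chosen structure.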

Eric B [00:18:10]: Yeah.

Demetrios [00:18:10]: Okay, so that's a fascinating idea. You're allowing the model to be the one that chooses what it remembers.

Sarah Wooders: Yeah, exactly.

Demetrios: Have you found that it will remember things it doesn't necessarily need to, or that it misses the point sometimes? Or does it generally work well at forming those memories?

Sarah Wooders [00:18:36]: I think it works pretty well, yeah. I mean, the MemGPT paper was written about a year ago, and it has kind of stood the test of time in terms of just continuing to work really well. I think part of why it works so well is that LLMs just speak text, right? It sounds a little janky to write plain text as your memory, but that is in some ways the most friendly memory representation for LLMs. We do have some additional work in this space: we're introducing a new way to manage memory with a multi-agent setup, where one agent specialized in memory management manages the memory of the other agent. We've seen better results from that over the long term, because that agent has the ability to go back and revise the entirety of its memory, whereas the MemGPT agent can only make incremental changes to its memory.
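
A toy sketch of that multi-agent setup, under the same caveats as the earlier examples (an illustration of the idea, not the actual system; llm is a placeholder model call):

```python
def llm(prompt: str) -> str:
    raise NotImplementedError  # substitute your chat-completion call


def memory_manager_pass(current_memory: str, recent_transcript: str) -> str:
    """One pass by a dedicated memory-manager agent: rewrite, don't append."""
    return llm(
        "You manage another agent's memory. Rewrite the memory below from "
        "scratch: merge duplicates, drop stale facts, keep durable insights.\n"
        f"Current memory:\n{current_memory}\n\n"
        f"Recent conversation:\n{recent_transcript}\n\n"
        "Return the full revised memory."
    )
```

The contrast with the MemGPT-style tools above is that this agent can revise the whole memory at once, rather than making only incremental appends and replacements.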
