
The Facts Flywheel // Devin Stein // Agents in Production 2025

Posted Jul 28, 2025 | Views 66
# Agents in Production
# Facts Flywheel
# Dosu

SPEAKER

Devin Stein
Founder & CEO @ Dosu

Devin is the CEO and Founder of Dosu. Prior to Dosu, Devin was an early engineer and leader at various startups. Outside of work, he is an active open-source contributor and maintainer.


SUMMARY

Agent memory and organizational knowledge are actually one and the same. Organizations remember the whys and how-tos by writing down learnings in a centralized place. For agents to remember, they need to do the same. But both suffer the same shortcoming: the knowledge is nearly impossible to keep up to date. Can we solve both problems at once?


TRANSCRIPT

Devin Stein [00:00:00]: Well, hello everyone, I'm Devin Stein, founder of Dosu, and thank you to Demetrios and the MLOps community for putting this on. Today I'm going to be talking about what we call the Facts Flywheel internally at Dosu. It's a preview of our new agent architecture that we've been rolling out over the past few months, based on our learnings from having Dosu in production for the past two years, and it's really focused on solving the pain point of the speed-performance trade-offs you often have with agents. A quick agenda: I'll give some context on Dosu for those of you who aren't familiar with us, walk through the challenge and our solution, and then cover future work, the broader vision for this facts flywheel and how we're thinking about it at Dosu. So to start: what is Dosu? Dosu is an intelligent knowledge base for software teams and agents. Basically, we make it really easy to generate, share, and maintain knowledge on engineering teams.

Devin Stein [00:01:19]: And so this means rethinking documentation in the AI era: making documentation more of a byproduct of building, keeping it up to date, and then sharing it wherever it's needed, like helping answer questions or triage issues. For those of you who have seen Dosu, you'll most likely know us from our activity within open source. Dosu is well known for being a GitHub app that helps with open source maintenance, because open source maintenance, as many of you know, involves a lot of support and documentation. The interesting thing about Dosu that maybe isn't obvious at face value is that Dosu has always been, since day one almost two years ago, an asynchronous agent. When someone creates a GitHub issue or a ticket in Jira, Dosu does an investigation, looking at code commits, conversations, and tickets, like an engineer would. Response times can be two to ten minutes, depending on the complexity of the task. Within GitHub issues, this was actually considered really, really fast; the typical response time on GitHub is usually around three days.

Devin Stein [00:02:47]: So getting a response from an agent within a few minutes is actually amazing and very fast. The reason we've always focused on this async modality is that, by the nature of the product, we're answering questions that typically aren't documented, so they're pretty complicated questions. We always wanted to focus on quality over speed, and having the time to do research through all the code and the conversations around it allowed us to come up with very high quality responses. It worked really, really well within GitHub. But as the product evolved, people wanted Dosu in more places. They wanted to chat with it outside of GitHub, in our app, in Slack, Discord, and other more synchronous modalities. People want to ask questions in Slack and get responses immediately.

Devin Stein [00:03:44]: And in Slack you know, the expectations for response times is much, much faster. And so we had this problem of we wanted to preserve the quality of the DOSU agent and kind of the overall architecture, but we want to make it faster. And there's this inherent tension and trade off between speed and performance within agents on one hand, and this is a bit of an oversimplification, but you have, you know, your typical RAG pipelines, they are pretty fast, they're relatively cheap. Basically just the amount of context that you shove in the window, the performance is pretty much correlated with retrieval results. Like if you can find the information when you search for it, you know, you'll have pretty good performance. But because it's dependent on retrieval results, it gets harder as you know, the data sources size increases beyond the context window or kind of the different types of data sources get involved. And it's also worse for multi step queries. And a lot of the questions that we get are more multi step type of problems which where you need to figure something out and then ask follow up questions.

Devin Stein [00:05:01]: But RAG is by far the most common approach for your typical AI chatbots. On the other hand, there's the agent modality, which was Dosu's original design. It's slower and relatively expensive, because you're churning through more tokens, and performance is determined by tool quality: does your agent have the right types of tools to find the information and iterate? But it's really scalable, flexible, and adept at complex queries, and it's really good for background tasks or asynchronous workflows like what we were describing before, where a ticket comes in and Dosu generates a response. So the problem we faced was: how do we support both modalities without having two completely different agents? First we asked, okay, how do other AI products solve this? In general, the most common way people solve this problem is through model selectors: saying, hey, I want to chat versus do something like deep search, or deeper search in the case of Grok, or ChatGPT's infamous model selection menu, which has, I mean, a comical number of models for you to choose from based on the type of task you want. And Perplexity does something similar.
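
To make the contrast concrete, here is a rough structural sketch of the two modalities. The helpers (`retriever`, `llm`, `tools`) are hypothetical stand-ins, not Dosu's code:

```python
# Sketch of the two modalities being contrasted (hypothetical helpers).

def rag_answer(question, retriever, llm):
    # One retrieval pass, one generation: fast and cheap, but quality is
    # bounded by whatever the retriever surfaces up front.
    docs = retriever(question)
    return llm(f"Context: {docs}\n\nQuestion: {question}")

def agent_answer(question, tools, llm, max_steps=10):
    # Iterative tool use: slower and more tokens, but it can decompose
    # multi-step questions and go find what it is missing.
    findings = []
    for _ in range(max_steps):
        action = llm(f"Question: {question}\nFindings: {findings}\nNext tool, or 'done'?")
        if action not in tools:  # model answered 'done' (or picked an unknown tool)
            break
        findings.append(tools[action](question))
    return llm(f"Question: {question}\nFindings: {findings}")
```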

Devin Stein [00:06:26]: Basically there's search and then there's research. At Dosu, we didn't want a model selector, especially because people often interact with Dosu in channels where we don't control the UI, so you can't really select a model. And from a first-principles perspective, we've always strived for what we call a teammate UX. When you talk to teammates, you don't ask them whether they should answer fast or slow. You expect them to figure it out. We wanted to do something similar. So we asked, okay, how else can we solve this problem? First, along the lines of the teammate analogy: how do humans do this? Take the example of an engineer onboarding. Day-zero engineering hires are typically pretty slow. You give them a task, they have no internal knowledge, and they're figuring everything out from scratch, which is very much like how agents work today.

Devin Stein [00:07:32]: They go do the problem, they look through the code base conversations, try to figure out what's going on to complete the task or answer a question. And then at some point they have a deliverable, maybe it's an answer, maybe it's a pr. And they get feedback on the quality of their work. And based off feedback, they remember things for the future, for the next task, if it's in a similar domain, they know where to go in the code base for the next time. And if it's incorrect, they revise their understanding and try again. Then over time, an engineer on boards and after a certain number of days they become really fast, much, much faster than they were day zero. You know, they're asked to solve a problem or a question and they immediately, you know, almost implicitly, you know, people don't really do this consciously. They take stock of what you know, they already know about the domain.

Devin Stein [00:08:25]: You know, where does this code live in the code base? Have I seen recent pull requests related to this topic? What do I know generally about these topics? The user's asking about what conversations have I seen related to this that might be useful? And if the answer is top of mind, they'll just respond, you know what the answer is. So there's no point in doing research. You Just respond. Or if you don't know or you're unsure, you can do research to fill in the gaps. But crucially, you're not starting from scratch. You're just filling in the gaps between what you know today and what you need to know in order to answer the question or solve the problem. So looking at humans as sort of analogy, we're like, can we do the same thing on the where you know, it has no internal knowledge beyond like the model context. But as it does work to answer questions similar to a person, it's like learning things.

Devin Stein [00:09:35]: It's observing things about your code base, about your slack conversations or PRs that relate to the problem at hand. That context is really helpful for framing a piece of code. Piece of code without any context, it's hard to really know what that means. But in the context of a problem, you're able to understand maybe the bigger picture of what that code snippet might do. And then on positive feedback so the agent gets the answer correct or is able to generate a PR successfully. Can we persist these learnings that were built up along the, you know, agent run and actually save them to knowledge? Or on the flip side, you know, if you know user says you're completely wrong. Can we prune these kind of facts that we thought we learned and then try again and kind of repeat this process? And so, you know, the goal being that day an agent is actually fast where the same way a human onboards can this agent onboard by learning these facts over time where next time it gets an input, we look at the set of facts that we already know about this topic and if the answer exists in that set of facts, we can just do rag and return immediately. If not what is the gap? What is the gap in our understanding given our current knowledge and set of facts and what we need to know in order to answer the user's question and and then do research to figure out the missing knowledge and then repeat the process from like the day zero agent where if we're able to answer the question or problem correctly, we can then persist those new learned facts to knowledge and grow and be smarter and faster for the next question.

Devin Stein [00:11:38]: And so internally we call this fact based reasoning. And it in theory gives you the best of both worlds. You know, over time the agent's knowledge base grows like a human, it gets smarter over time and grows with usage. So kind of this kind of very virtuous flywheel effect from a user's perspective where the more you use the product, the smarter it gets, the better, better it gets the faster it gets. And it's faster because you're not repeating duplicate work. Right. I mean, how many times does like Claude code list the same directory, you know, as like the first step of its process? You know, it's likely that that directory structure hasn't changed. So does it need to run that tool again? And you know, therefore we're basically caching information learned from tools between requests and because or doing less work, it's also more efficient, more cost effective.

Devin Stein [00:12:35]: And you only need to research like the incremental knowledge. And so, you know, we have a similar thing to humans where you ask a person a question and the answer is fast when it's known and then it's slow for harder requests or where there are unknowns that need to be solved. And so it's a system that is both Prague and agentic and kind of combines the best of both. So it's amazing, right? Yes, but it's, you know, actually very complicated, unfortunately. So for those maybe paying close attention, you might have seen some parallels to caching in the way that, you know, we are treating facts. And caching is a notoriously hard problem in engineering. There's some saying about majority of bugs related to cache invalidation. I think more broadly a way to think about cache invalidation is really about maintaining knowledge or knowledge management Organizations are not static systems.

Devin Stein [00:13:51]: They're constantly changing. And so these facts that we're learning also need to be evolving with your organization. They need to be pruned, they need to be reconciled, they need to be updated. And you know, that means removing facts that are no longer true, reconciling facts that maybe contradict each other, and consolidating facts across different sources. And this process, you know, is complicated, but fortunately is this is happens to be a problem that we are already solving within our product. So, you know, we're focused on helping, you know, teams maintain knowledge in their knowledge base, typically in documentation format. So why shouldn't we extend this concept to facts as well, where we can treat the agents learn facts as just another form of documentation that needs to be kept up to date. It's information with a source and we can monitor, pull requests for changes that conflict with information or knowledge base.

Devin Stein [00:15:03]: So looking at a PR or looking at a conversation, we say, what is the impact of this change? Does it require me to update or prune knowledge? And if there's contradictions, can we surface that to users to figure out what is truth? And what's really cool about this is that it sort of creates the foundation for a living knowledge base and also reframes agents not as just like a means of generating a response, but also as like a means of generating knowledge or documentation for you. Right. Like the same way a engineer is going to document its learnings when it does, you know, an investigation of a hard problem. Now you kind of get this implicitly out of the box from the agent as well. So we're really excited about this direction for Dosu and for agents generally. And one kind of closing thought that, you know, maybe is in people's minds is something we think about is like, okay, what? Facts have a lot of similarities to memories, right? People talk about agent memory in agents. How do we have agents that like remember things so they aren't starting from scratch every time? And so are facts memories, Are they different? From our perspective, they're very similar, but I think different. And maybe, you know, there are teams that have better ontologies of, you know, different types of memory then you could argue it's a type of memory.

Devin Stein [00:16:40]: But really I think what's unique about facts is that they are learnings internal to the agent during kind of the research process and are not necessarily directly like visible to the user. It's like this intermediate representation of how it's figured, coming to the conclusion. And users usually just see, you know, the document or the response from the agent versus memories. The way we think about them are typically learnings from external interactions or explicit feedback. So you know, learning from a conversation that hey, you know, this user likes to be, you know, likes to have their responses formatted this way or they really like concise things or someone corrects DOSU to say, hey, actually when you're solving this type of problem, you should look in this part of the code base or these parts of the our documentation or talk to these users are more akin to like the memories that we typically see. So it comes from external sources versus facts are very much internal as part of the response generation process. Cool. I know that's a lot of information, so would love to open this up to questions, thoughts, comments.

Demetrios [00:17:58]: Yeah, I've got one right off the bat, and then I see somebody else chiming in in the chat. First one from me: how are you making sure that when an agent doesn't know something, it doesn't just go on infinitely looking for the answer?

Devin Stein [00:18:18]: Yeah, great, great question. I mean, I think this is an interesting general problem with agents and also from like a product perspective as well, where the worst case for agents is also the most expensive case, where if there you Ask a question where there isn't an answer or the agent isn't capable of finding the answer, it's gonna, you know, keep spinning its wheels. We internally address in a few different ways and like we're actually constantly iterating on new approaches. We basically have a hard coded limit in terms of the number of loops we're willing to tolerate for certain types of queries. And then after each sort of loop within the agent, we do a reflection step, almost like a critique of like, hey, how are we progressing on our trajectory? Like are we making progress? Yes or no. And if we're not making progress for a certain number of steps, then we also early exit. But we're also kind of doing research into trying to have a better like failure mode categorization for agents or our agents specifically and then see if we can have like an early detection system for those failure modes. So that's very much work in progress.

Demetrios [00:19:38]: So if I understood that correctly, it's like you're hitting a place where you're saying these. It feels like you've gone on too long. Let's sanity check it with another LLM call and say, is something wrong?

Devin Stein [00:19:54]: Exactly, exactly. And you know, saying like right now kind of we do it in the generic form which is like, is something wrong? Are we making progress? Does it look like we're stuck? And what we're kind of moving towards is almost more of a classification task of like, hey, we know that these are the types of failure modes that we see. Are any of these happening? So is the agent using a tool incorrectly or is it kind of using the tool with very different variations of the same query but not getting any new information? But it's like persisting to do that. So trying to get a lot more specific to improve the quality of that task.

Demetrios [00:20:36]: I like that a lot. And I also think there's, it's a fascinating problem that you're working on because you can really think deeply about when to talk to the user and when to try and just like shoot out and then come back to the conversation almost. So I, I think it's super cool to hear about the research. I want to like have you back on here in a year and hear what you learn, you know, because you're like in the middle of it right now. So there's a question that came through in the chat though. Where do the learned facts persist?

Devin Stein [00:21:14]: Great question. I mean essentially in our database. So you know, whether it's in our kind of. We use like both have it stored in our actual like Application level database as well as our vector store. So it becomes, in and of itself kind of a data source is one way to think about it.

Demetrios [00:21:35]: Oh, fascinating. And have you seen folks that interact with your agent, with their agents almost? It's like.

Devin Stein [00:21:48]: Well, I do think there's something interesting there. I mean, it's a bit of an aside, but like, because we're, you know, deployed within, you know, thousands of open source projects, a lot of open source projects and companies like, they all depend on each other. And so there is like a future which hopefully we'll get to where DOSU deployed on Airflow can talk to DOSU deployed on, I don't know, Apache Superset, for example. If there's a project that uses both and kind of like using the knowledge base from different projects.

Demetrios [00:22:22]: Oh, dude, imagine how many headaches you could save people. That, that would be cool. All right, I, I like that vision. Last thing that I wanted to show you was from Ricardo, which is an awesome little meme that he created for us. And here you go. Wait, let me get this survey out of the way so we can really see it. But I want to say Ricardo, Bravo on this. Ah, let's full screen it.

Demetrios [00:22:56]: That's been our day today. That's what we have been up to. Who owns the data store? The client or Dosu?

Devin Stein [00:23:06]: It's a good question. So ownership, I guess, is a complicated question in the sense, like it lives in Dosu, but you know, we like, you know, like a security and privacy perspective, it's like isolated by tenants and we don't train on customer data. So but like data resides within Dosu but from like a product perspective or angle. You know, DOSU is sort of your knowledge base as well. So these facts, the idea is it's part of your documentation, so, you know, like, it's just another knowledge base for you.

Demetrios [00:23:42]: Yeah, there's a, there's a fuzzy line there because it, it's, it's like, it's yours. But yeah, I could see how, whatever these are fine. These are details that you can hash out later in a way, so. All right, dude, I gotta keep it rocking and rolling. We've got more talks to get to. Devin, it was an absolute blast to chat with you and also, of course, to learn from you. As always. Thanks, dude.

Demetrios [00:24:17]: I absolutely am a fan of what you're doing at Dosu and I appreciate you coming on here and chatting.

Devin Stein [00:24:22]: Yeah, I mean, likewise. Love what you're doing with the MLOps community and thank you so much for having me.
