Knowledge is Eventually Consistent
SPEAKERS

Devin is the CEO and Founder of Dosu. Prior to Dosu, Devin was an early engineer and leader at various startups. Outside of work, he is an active open-source contributor and maintainer.

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.
SUMMARY
AI as a partner in building richer, more accessible written knowledge—so communities and teams can thrive, endure, and expand their reach.
TRANSCRIPT
Devin Stein [00:00:00]: So code is like a really, really good system of record. And maybe you can extend the analogy to other types of systems of record internally, where they are truth, and you can monitor changes on those and reflect those back into the knowledge base. But unless you have a digital system of record, it's really hard to reconcile things. So there's still going to be meeting notes and other types of documents that might be floating around, but for certain types of knowledge, like how does our product actually work, I think there should always be a very clear answer. I'm Devin Stein, CEO and founder of Dosu, and I drink my coffee black, generally pour-overs in the morning, but we have an espresso machine at the office, so I also drink quite a bit of espresso.
Demetrios Brinkmann [00:00:53]: Let's talk about the facts agent. Can you break it down for me? What is it exactly before we go into like the details?
Devin Stein [00:01:02]: Yeah, so before we get into the details, some context: Dosu is a product, and we got our start helping out with open source maintenance. The premise of Dosu was that as an engineer, I actually didn't spend that much time coding, especially as I grew more senior in my career, and as an open source maintainer, a lot of my time was spent answering questions and triaging issues. So I started Dosu to focus on: hey, can we answer questions and triage issues like engineers can, by looking at code, commits, conversations, and tickets, everything around the code base and the product? Engineers are unique in organizations because code is truth. It really tells you how your product actually works. And so whenever there's ambiguity, you need to escalate something to an engineer.
Devin Stein [00:01:52]: And so we built Dosu, our initial agent, to answer questions and triage issues, whether they're incoming in Slack or on GitHub issues in the case of open source maintenance. And Dosu has been very popular within open source, helping answer questions and triage incoming GitHub issues. And we just launched the second iteration of our agent, which we're calling our fact-based reasoning agent. And the premise is exactly this idea that, hey, users ask very related questions over and over again. And Dosu does an investigation every time, starting from scratch like most agents do: okay, we're going to search the code base, we're going to look at recent PRs, we're going to scroll through Slack. But a lot of that work is duplicated across different agent runs. And so this new design has it so that as Dosu is doing research in a given run, it's learning what we call facts, which are claims supported by evidence it found, and it uses those facts when generating its response.
Devin Stein [00:03:02]: And if the response is correct, which we learn from either direct user feedback or from a maintainer or expert that jumps into the thread, then those facts are committed to its knowledge base. And then next time someone asks a similar question about related topics, Dosu first takes stock: hey, what do I know about these topics in terms of the facts in my knowledge base? Do I have enough information to respond? If not, what am I missing? And then it'll do additional research to find that information and respond. And it's this nice learning loop where the more you use the product, the more facts it learns, the better it gets, and the faster it gets.
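The loop Devin describes, where research produces facts, confirmed facts get committed, and future runs recall them before doing fresh research, can be sketched roughly like this. The data structures and the `research` callback are hypothetical, not Dosu's actual implementation:

```python
from dataclasses import dataclass, field


@dataclass
class Fact:
    claim: str
    evidence: list[str]       # sources the claim was drawn from
    topics: set[str]
    confirmed: bool = False   # set once feedback validates the answer


@dataclass
class KnowledgeBase:
    facts: list[Fact] = field(default_factory=list)

    def commit(self, facts: list[Fact]) -> None:
        # Only facts behind a confirmed-correct response are persisted.
        for f in facts:
            f.confirmed = True
            self.facts.append(f)

    def recall(self, topics: set[str]) -> list[Fact]:
        # Take stock of what we already know before researching.
        return [f for f in self.facts if f.topics & topics]


def answer(kb: KnowledgeBase, topics: set[str], research) -> tuple[str, list[Fact]]:
    known = kb.recall(topics)
    covered = {t for f in known for t in f.topics}
    missing = topics - covered
    fresh = research(missing) if missing else []  # only research the gaps
    response = "; ".join(f.claim for f in known + fresh)
    return response, fresh
```

After a response is validated, the caller would invoke `kb.commit(fresh)`, so the next run over the same topics skips the research step entirely.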
Demetrios Brinkmann [00:03:47]: Okay, so the facts flywheel is fascinating to me. And also the whole idea of how and when you get the agent to jump in is probably the most fascinating, because how does it get solidified that a decision has been made? How does an agent know that a decision has been made, and that it's not just continuing a conversation, or that it needs to be picked up again when XYZ happens? There's so many X factors and intangibles that it feels like a very hard problem for an agent to solve. Or maybe not.
Devin Stein [00:04:26]: Yes, in the general case, I think it is very hard to have that human intuition of when I should jump into this, when is my expertise needed. I think in the domain we work in, it's usually about who is asking the question; the audience actually matters a lot. If the ticket or issue is coming from a maintainer, they probably know generally about the problem, and maybe a good response would then be pointers to where in the code base or to recent work, or no response at all, because they probably have it under control. Versus if a user is asking a question and they are maybe non-technical, or less technical, or new to the project, then usually any information to help them get from where they are closer to where they want to be, in terms of resolving or answering their question, is welcome. So I think audience plays a pretty important role when thinking about whether an agent should jump into a conversation.
Demetrios Brinkmann [00:05:30]: So do you have to have user profiles built up that the agent is aware of?
Devin Stein [00:05:37]: Yes, we have a pretty simple structure right now, where we have basically experts, who are users in the app, the curators of knowledge, and then normal users who are less familiar with the domain. In the future, I think there's a lot of cool things we could do around different types of audiences: whether you're fully non-technical and can only understand things in product terms, versus you're an engineer who is just unfamiliar with this code base but can read and write code.
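As a toy illustration of that audience-aware behavior, the decision might look like the following. The policy and the return labels are made up for illustration, not Dosu's actual logic:

```python
def choose_reply(asker_is_expert: bool, topic_known: bool) -> str:
    """Pick a response style based on who is asking and what we know."""
    if asker_is_expert:
        # Maintainers likely have it under control: brief pointers, or silence.
        return "pointers" if topic_known else "no_reply"
    # Less familiar users get a fuller answer when the topic is known,
    # and a human escalation when it is not.
    return "walkthrough" if topic_known else "escalate_to_human"
```

The key point of the sketch is that the same question can warrant anything from silence to a full walkthrough depending on the asker.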
Demetrios Brinkmann [00:06:09]: Yes. And where does it triage items from? Do you plug into Jira? Do you plug into Confluence, Notion, ClickUp? Is it all of that? Because I know documentation, and I was expressing this before we hit record, can be such a pain, and a lot of times it's a pain because it's just so dispersed.
Devin Stein [00:06:35]: Yes. I think that's one of the really interesting things about LLMs, and about humans, right? If you think about human memory, something that a senior engineer at a company, or an open source maintainer who's been on a project for a while, does that is really, really unique is they're able to see the activity across all these different apps: see what's going on in Slack, maybe in ClickUp and Jira, across PRs, what has been merged, what issues have come up, and then actually connect the dots between these different data sources, see the connections in terms of topics and how they relate together. And we try to take a similar approach at Dosu, basically trying to figure out what the product or engineering ontology for your organization is. So what are the key concepts, topics, product features, components on the engineering side? And then how can we relate the conversations happening across these disparate apps, and the documentation that lives across these disparate apps, back to those core concepts and start building the connections between them? It makes it easier for us then, when we're talking about a specific part of the product, to know where that information might live across everything.
Demetrios Brinkmann [00:07:55]: This episode is brought to you by MLflow, the open source platform trusted by teams worldwide to manage the entire ML and gen AI lifecycle. And with free managed MLflow on Databricks, you get all the benefits of MLflow plus automated infrastructure, unified experiment tracking, model versioning, observability, and enterprise-grade governance, all in one place. Ship reliable models to production faster with less hassle. Get started at mlflow.org, or check out the recent talk that we had from Eric Peter at our SF Agent Builder Summit. mlflow.org, links in the description.

Demetrios Brinkmann [00:08:42]: We had on here a few months ago Donay, who created a data analyst agent. And one of the hardest challenges that she talked about was how words don't always mean the same thing. Words are hard, basically, was the long and short of it. And she used it in the context of: if you're a data analyst, you have all this jargon that you use and that you ask questions to an agent about, but you have to explain to the agent what that jargon means. And I always go back to the most simple example that I can wrap my head around, which is an MQL, a marketing qualified lead. It can mean one thing at this company, but at another company, you have to do five different things to become an MQL.
Demetrios Brinkmann [00:09:44]: So even though we're using the same word, it is not the same when the data analysts are asking questions of the LLM. And what they did to fix that is they essentially had to create a glossary of terms, so that when a data analyst would use this jargon, the LLM could reference that and say, okay, cool, I understand what an MQL is at this company. How have you dealt with that? Because I can only imagine in your world it is multiplied times a hundred.
Devin Stein [00:10:16]: Yeah, I mean, we have a very similar approach, where topics have aliases or synonyms that they're also referred to as. It's kind of like that relationship. We have a lot of examples of this; I think every company does, where we call something one name in the front end, but in the back end it actually has a historic name that's completely different. And so we just go back and forth internally using either the front end term or the back end term. Another example is the word task. We have three or four different terms for task in our backend code base. It could be a Celery task, it could be an LLM task, it could be this background task. And given the same term, how should we differentiate?
Devin Stein [00:11:08]: What. What is the meaning of this term in this context is also like a related and hard problem. Another thing that's kind of fun and relates back to like auto reply or jumping into conversations is conversation implicature. What is the implied meaning of a comment in Slack, especially when someone will post in Slack, how do we do this thing? And what is we in this case? Is it the team they're on? Is it the engineering team of the channel? They're. There's a lot of implied meanings of the way people converse in shared channels, you kind of know the domain of discourse, like what people are generally talking about and having the LLM do the same, sort of figuring out what is the implied meaning, what is this person actually looking for is fun and challenging.
Demetrios Brinkmann [00:12:06]: It's so fuzzy, isn't it? It's just like yeah, we. Who you calling we?
Devin Stein [00:12:11]: Yeah, right. What do you mean by that?
Demetrios Brinkmann [00:12:14]: And how do you think about adding value but not being too noisy? Because an LLM is always going to say that it knows something and it's probably always going to want to jump into a conversation. But that actually adds cognitive load. At the end of the day, if I have to read a one page report for a simple question that I have and it really isn't getting to the essence of the question, I'm going to be pretty pissed at that tool.
Devin Stein [00:12:44]: Yep, I think it's one of the hardest problems generally in the agent space. So I think there's two pieces to that. One is that conciseness is generally very hard with LLMs; partially because of the way they're both pre-trained and fine-tuned, they tend to be more verbose than what a human counterpart would actually say. And so we've invested a lot in reducing verbosity, keeping responses concise, because you're right: it's one thing to be wrong and have a two-sentence answer, but if you're wrong with a two-paragraph or one-page answer, then it takes a lot of time and cognitive load to figure out, hey, the LLM's not correct. So the way you deliver information, we've thought a lot about it, and I think we still have a long way to go. And in order to be concise, pointing back to reference sources can help. So instead of restating what exists in the reference source or fact, linking back to it can help shorten things.
Devin Stein [00:13:49]: The other side of it is: when do LLMs know if they know the answer? Are they confident enough to respond? That's also an open research problem. I actually just saw a really interesting paper, I can't remember the name, something to do with abundance was in the title, that came out of FAIR at Meta yesterday, where they actually found that reasoning agents are worse at knowing that they don't know the answer. It's almost this overthinking problem, where even in their chain of thought they might say they don't know, but by the time they finish thinking they're like, I got this.
Demetrios Brinkmann [00:14:28]: They talk themselves into it.
Devin Stein [00:14:30]: Yeah, exactly. It's like they're coaching themselves: okay, I can respond to this. And so there's ways in which, given a specific domain, you can better figure out what your confidence level is. One is, again, audience. Who is asking the question dictates quite a bit what an appropriate response is and whether they want help. And the other side, going back to the fact-based agent, is: can you ensure that all the statements you make are coming from facts, and that those facts have been confirmed before, either by a human expert or from previous conversations? So the quality of your knowledge, I think, also dictates the quality of your responses.
Demetrios Brinkmann [00:15:25]: Let's put a pin in the quality of your knowledge, because I want to dive into how you think that can improve over time. The last thing while we're on this topic of fact-based agents is: do I need to give explicit consent or approval when a fact is immortalized, so that it can go and be rerouted and put into a knowledge base? Or is that just happening in the background, constantly updating, if it's been insinuated enough? Because that feels like something that could easily go off the rails too, right? You just start going on this circular, false, fake-news type of vicious cycle, and next thing you know you're like, wait, how did that become something that, quote unquote, we do here?
Devin Stein [00:16:28]: Yes, this is something we think a lot about. I think it relates generally to UI, or I guess UX, for agents. When we started rolling this out, we actually said, hey, we're going to just generate facts based off all the data we ingest. So no human in the loop, fully automated. And where we ended up is we would generate a lot of knowledge. I think the knowledge is correct, but you don't know. And so it is this sort of scary middle ground where maybe you're propagating some false claim, and that's actually a cascading failure. So where we've ended up, and I think from a product philosophy it just makes sense generally for agents, is that there are sort of three stages to automation. First is human in the loop.
Devin Stein [00:17:22]: So you know, like in the open source case, an open source maintainer says, you know, like great response either by Dosu or they added a response and they're like, I want to make sure that Dosu knows about this topic and they can take an explicit action to save it to their knowledge base. And so there's like, you know, the human is. They then see like a preview of what gets saved and they can edit it. So there's like, you know, very high quality knowledge that the human, you know, took the initiative to say, save this to my knowledge base and reviewed it.
Demetrios Brinkmann [00:17:54]: It feels like that is you're asking a lot of someone to do that.
Devin Stein [00:17:59]: Yes and no. I think what's somewhat unique about the domain we're in is that Dosu usually operates in public forums, whether that's an internal Slack or an open source project, where there are a lot of people asking questions or looking for help, and then there are a few experts. Even if Dosu doesn't answer, someone generally will; someone's going to respond to you in Slack, hopefully, and if they don't, you'll keep pinging them. And so we actually do usually get a resolution on all the threads that we're on, whether it's a Dosu-driven resolution or a human-driven resolution. And so it's a lot less work to run a command to save this to the knowledge base than it is to go and update your documentation, which is the alternative. And there's also an incentive to do so, because if you do this, then next time Dosu is going to get it, and you won't have to answer that same question. So I think we're fortunate in our domain that our users, our experts, are very incentivized to make the product better. And so that's the human-in-the-loop modality.
Devin Stein [00:19:05]: Then the next one is more, I would say, AI-driven. In the crawl, walk, run progression, we're at the walk stage, where Dosu is reviewing threads, whether it's a PR or a conversation, and it is saying, hey, you said something that either conflicts with what is in my knowledge base, or that I don't have in my knowledge base but seems important. And then it can actually reach out to that person, either in the thread, in a direct message, or in the app, and say, should I save this to my knowledge base, based off what you suggested? So it's a little bit proactive: we're not saving it directly, but we're doing the work to get everything ready in a preview draft state, and then you just have to approve it.
Demetrios Brinkmann [00:20:00]: So it's like a sleeper agent that's there in the background, recognizing. And how does it know when something seems important? Is that just my use of bad words and caps lock?
Devin Stein [00:20:15]: Yeah, a lot of it has been tuning. I think we're still iterating. Again, we're kind of fortunate in that the domain being discussed is well defined. Usually what you want to save to a knowledge base, or to your documentation, is, hey, there was some investigation that seemed like a lot of work, a lot of back and forth, or a question that was very relevant to the domain, like how do I get started, how do I troubleshoot. So there's a class of stuff that we're looking for that is likely what you want to put back into the knowledge base.
Demetrios Brinkmann [00:20:51]: Yeah, nice. And so then what's.
Devin Stein [00:20:55]: Yeah, you know, the fully automated is kind of where we originally thought we wanted to start, which is we get so confident and like we get enough approvals on all these drafts that we can within, you know, start generating knowledge where we're confident automatically. You don't have to be in the loop. And then maybe if we're not confident, then that gets moved to a draft state for you to review. But generally it's like hands off. Dosu is extracting knowledge for you generating previews when it needs your help. And then important then is actually maintaining that knowledge, which is the second piece of the puzzle.
Demetrios Brinkmann [00:21:35]: So because of the rise of all the coding agents, how have you seen things change?
Devin Stein [00:21:45]: Things are changing in a few different ways. Ironically, I think the importance of written knowledge is only increasing, because unlike humans, AIs, at least currently, do not have good memories. There's the intern analogy made often, like, oh, agents are like having a thousand interns or a hundred interns, but interns hopefully will get better over time. Agents are very, very smart, but they are forgetful; they typically don't learn from doing in the same way. And so what we've seen from our users and from talking to customers is that agents do much, much better when they have written knowledge to reference: I'm working on billing, where is billing, how does it work in this code base, or how should I think about billing conceptually for this product?
Devin Stein [00:22:46]: And so they, you know, agents need more documentation and also the format in which they consume them is different. You know, agents, like, unlike people, can like read giant blobs of text and that's good docs for them. You know, it's like very information dense. Yeah. It doesn't have to be aesthetically pleasing. And also I think similarly, like connecting, like the best docs for agents connect product concepts back to the code base. So it's not just talking about like billing in the abstract, but it has references back to like where it's implemented in what directories, which is, you know, important for people, but I think even more important for agents because they're often like operating on the code base and they want to. They don't want to learn about a concept and then have to figure out where it lives in the code base.
Devin Stein [00:23:36]: They just want to know where to go in the code base immediately. So that's one piece, I would say. The other side is that coding agents excel in smaller projects, the vibe coding examples, but at larger kind of add scale. It can be very scary to use a coding agent because if you are an engineer and you're asked to make a change and you don't know how to make the change, and then you ask an agent to do it, it's really playing with fire. Yeah, you're playing with fire. And so kind of a prerequisite to making changes is generally understanding the system, like, what is the impact of your changes? And so that also is kind of where documentation or just knowledge becomes more important. But for humans, just like understand, like, what is the impact of this change? Where should it change? So then you can actually like review and be a sort of co pilot for the coding agent.
Demetrios Brinkmann [00:24:40]: So incredible about the ways that they consume information because that lines up with everything. And it's funny how making the link so giving them almost like grounding them in code gives them a better chance of success and then throwing as much information at them as possible really lines up with, yeah, like give all the context you can and they can suss out what is actually important for them. Now this feels like a perfect segue into the quality of knowledge and how you're thinking about making knowledge higher quality in general. Because as I was mentioning before we hit record again was I love documentation. I'm a huge fan of trying to write. I think a writing helps you clarify your thoughts. It helps you get down the most important stuff that you want to then take forward. But what I've noticed is you have to constantly be updating docs as things change.
Demetrios Brinkmann [00:25:48]: And you also have to recognize that a lot of things are just going to not be relevant after a certain amount of time. And you kind of need to know which ones are relevant in those moments of time. So how does that. Knowing that, like, if I look through my notion, which is where I keep all of my documentation for the community and all of that stuff find, I look through that. I'm not going to say like 80% is obsolete, but it's not. It's definitely a majority is obsolete. And that's because, like, every podcast that we've had for the last five years, you know, I have a notion page on them. I'm not using those anymore, but they're there and maybe they can be referenced, maybe it can be something.
Demetrios Brinkmann [00:26:38]: Or there's like, strategy documents that I've written up in 2022, and those are not relevant or like reflection documents. All of this stuff feels like it would muddy up the waters on documentation. And if you want the highest quality documentation for your business, how do you think about, like, keeping it high quality?
Devin Stein [00:27:01]: So I think you're not alone in that 80% of your Notion is stale. I think that's probably the norm. What's the saying? The instant you write docs, they're stale. And I think that's really true. The way we think about it, and I think it's generally a good framing, is you want to lean on humans for what they're good at and AI for what it's good at. And humans are really good at knowing what is actually important. So I think writing is still a very useful and important exercise: you, usually as an expert on some topic, or having just done research, know what someone in the future needs to know, and maybe the story that you want to tell around it. And AI can help you write. But at the end of the day, the nuggets of expertise or knowledge that you as the expert put into it, that's what's really, really important about having the human in the loop.

Devin Stein [00:27:59]: And so we focus on: how do we make tools so that when you have something you think is important, you can get it down in a written format, saved to a knowledge base or to your documentation, as easily as possible? Because at least from my experience, and I know from a lot of other engineers, there are a lot of things I would like to document, but I just don't have time. And so part of the puzzle is really making it easy, 10x, 100x easier, to get information out of experts. And then the second part of the story is: okay, how do we maintain it, and how do we know what is truth? The way we're approaching it is, I think, unique to our product, and at least currently it's easiest in the product engineering domain, because you have code, which is a source of truth, and it's very clear how and when it changes. And so the way we think about it, there are two types of documentation or knowledge. Some of it is more notes; it's temporal in nature.
Devin Stein [00:29:06]: It's maybe a meeting note. You don't want to update a meeting note because it's a record of what happened. But there's pieces of that that maybe you do want reflected as canonical as source of truth. And so we're very focused on the knowledge that you want to be source of truth. How do we keep that up to date? And the nice thing is that code is another source of truth that's very clear what it is. And so as code changes we can compare those to the state of your knowledge base and try detect inconsistencies. So you have this in your source of truth, but your code is now saying this, or used to say this, but now says that you should probably update these sets of documents or these facts. So I think making the distinction between what is information you want to last versus something that is kind of a record of a point in time I think is very important.
Demetrios Brinkmann [00:30:05]: It's so fascinating to try and think through that.
Demetrios Brinkmann [00:30:10]: Let's take a second to talk about our sponsors of today's episode. Hyperbolic's GPU cloud delivers NVIDIA H100s at just $1.49 per hour. No sales calls, commitments, or surprise fees. Spin up one GPU or scale to thousands in minutes, with virtual machines and bare metal multi-node clusters featuring high-speed networking, attachable storage, and auto top-ups. You only pay for compute when you need it. Hyperbolic costs up to 75% less than legacy providers.
Demetrios Brinkmann [00:30:45]: You need longer runs. Well get reserve clusters at predictable rates with instant quotes and fast onboarding. Try Hyperbolic's H100 power on demand. Try it now at app.hyperbolic.ai and just so you know the links in description. Let's get back to the show.
Demetrios Brinkmann [00:31:07]: I've got a friend Willem who is working on, like, agents. SRE agents. Right. It feels like this is in some ways the first time that I've thought about, wow, both of these agents could be best friends. Like, if the Dosu agent is talking with Willem's cleric agents and, and he's trying to root cause and doing more or less the same thing that you're doing, but with root cause analysis and trying to really figure out, wow, something went wrong, or the. Whatever, something is super saturated and servers are failing around the world, or datadog is showing this, blah, blah, blah, and here's why. And then it can sync with those who and get. Maybe it uses it for knowledge to.
Demetrios Brinkmann [00:32:04]: To help reference different things and. And do its job better. Or when a root cause analysis is made and then something is committed, it's thrown into Dosu. I wonder if you see that as a way that, like, the two agents will play together or do you feel like it is something that Dosu will eventually start doing, it's just not there yet.
Devin Stein [00:32:29]: So I know Willem as well, is great.
Demetrios Brinkmann [00:32:31]: Shout out to Willem.
Devin Stein [00:32:32]: Shout out to Willem. So, actually, I think you're spot on. Like, we want Dosu to be other agents best friend. So, like, you know, Dosu can be a knowledge provider for cursor, for cleric. Because, like, you're saying, knowledge is really important for root cause investigations in the SRE world. And then importantly, like, you know, you have runbooks. How do you make sure runbooks don't go out of sync and go stale? Because the worst thing you could do is give Cleric, like a runbook that isn't correct, and then it's going off and doing something that it shouldn't be or that used to be true. And it's confused, which is one of the things.
Demetrios Brinkmann [00:33:10]: Yeah, it was also a question that I had in my mind is how are you making sure that you're giving the most relevant information? Is it just that you are looking for the most recent? Because sometimes maybe the most recent isn't the most. The thing that is the most relevant, or I guess it's maybe not necessary to say relevant because that's more of a search problem. But if you want to update information, how are you going about updating it and saying, this is the source of truth now?
Devin Stein [00:33:40]: Yeah, so I think the way we approach it. So there is a search element to it, like, kind of. But the. I think the. There's an interesting, you know, comparison between, like, search and documentation, but where you can almost think of documentation as like a knowledge cache in some ways, where Someone's done the searches, they've compiled the information and then they've written it down so you don't have to do those searches again. And so the way we kind of think about it is when we are learning when just was like learning about a topic or that front book is associated with, we are trying to keep that topic as like a source of truth document. So we're, you know, as, you know, conversations are happening, we are like kind of reflecting those pieces back into the document. And so as long as you can find the relevant document, you can trust, you know, it's correct, if that makes sense.
Devin Stein [00:34:33]: So you don't have to go through the like work of searching through Slack or all of these other documents to try figure out like, okay, well this one happened like three months ago and they said this, but then this was one month ago, but then they said this again. Exactly.
Demetrios Brinkmann [00:34:48]: Yeah, man, that feels like the headache. But sorry, I cut you off. Okay, you're now other agents' best friend. And it makes complete sense. Cursor does much better, I imagine, if it gets fed the stuff that Dosu can give it.
Devin Stein [00:35:05]: Yeah, exactly. And I think there are interesting things there. Agents are probably going to become the majority consumer of documentation, if they're not there already in the coding domain, and so the format of docs, how you think about docs, starts to change. The APIs you might want to expose also change. A big challenge, at least in our Notion and on projects I've worked on, is information hierarchy. It's really hard to get right in traditional documentation knowledge bases, but maybe it doesn't matter as much in a world where agents are the main consumers of information. And maybe there's a better experience we can build around how information is organized for the discovery modality, because that's when it's useful to see everything laid out: you're just trying to figure things out, you're trying to learn. Maybe you don't have a question yet.
Demetrios Brinkmann [00:36:04]: It's fascinating. Yeah. Information hierarchy is such a headache. I can't tell you how many docs I have lost because they were nested inside of like ten other docs. This happened to me just the other day and I was so frustrated. I'm like, where the hell is that database? I know it's around here somewhere, and I couldn't find it. And I used all the key search terms I thought I would have called it, but since I hadn't referenced it for like a year, I had no idea. I was just going off a feeling of, I think it was somewhere in here, and then I spent too much time on it and eventually gave up.
Devin Stein [00:36:41]: Yeah. I think there are some interesting experiences there that we haven't built yet. If you think about documentation, there are really two modalities to it. One: you have a question and you want an answer. Search and chat are actually pretty good for that. The other side is you're new to something and you're just trying to explore what is possible.
Devin Stein [00:37:10]: How should I be thinking about things? And that's where it's really nice to have that directory structure you can poke through. So I wonder if there are more generative experiences to be built out on the learning side. Like, hey, maybe you can visualize your knowledge this way and be guided through it in a specific way that's even better than the traditional docs hierarchy we have today.
Demetrios Brinkmann [00:37:34]: Yeah, it just makes me think about those onboarding experiences being so much more custom, so much more tailored to the way that you like to learn. And then you can be getting exactly what you're talking about: I want to see it this way, or I really want to know the main principles that we're working off of. And you don't have to go and click through the "start here" readme, or watch this video or this Loom that somebody put together a few months or years ago.
Devin Stein [00:38:06]: Exactly.
Demetrios Brinkmann [00:38:08]: Well, what else do we want to hit on? Is there anything else that, for you specifically, is really top of mind?
Devin Stein [00:38:22]: I think one piece we touched on a bit is knowledge maintenance. But I do think it's worth emphasizing that it's just not a job for humans. The effort to maintain knowledge, to monitor all the different changes and conversations happening in an organization and reflect them in documentation, is near impossible. Some organizations have technical writers whose job is to try to keep up, distill what is important from all the activity, and make sure that's reflected. But even then it's very, very hard to keep up. For LLMs, though, monitoring all these changes and then analyzing how they impact your current state of knowledge is a much more straightforward, routine operation.
Devin Stein [00:39:17]: So AI actually can keep up. And I'm just excited about the implications of that: when you can actually have knowledge that you trust, where things are always up to date. It's something that is really built for machines, and it has only become possible recently.
Demetrios Brinkmann [00:39:38]: Well, it feels like really low-hanging fruit, which you probably already do, to give executive summaries: here are the changes that happened last week, or on whatever cadence the person is looking for. Hey, here's everything you should be in the loop on. And you can almost subscribe: okay, I want to know all the stuff that's happening, just so I can keep one eye on it and, if I can be valuable, throw in my two cents too.
Devin Stein [00:40:11]: Yeah, I actually think that's an interesting one we haven't explored much: the interplay of, okay, I want to be aware of changes as an expert, so I know when I should be involved. Or, as a saboteur, to be like...
Demetrios Brinkmann [00:40:27]: No, we're not doing that. Just a total block on everything. But yeah, I think that is a fascinating piece. You know how you have the ability to watch when things happen on repos or whatever? I can just be lurking in the background and recognizing when things are going on. I was also thinking through how specifically you're going about evals, because it feels like you can get a lot of signal from people, like if a PR is merged. We talked through all these different areas that you're touching on, which can give you a ton of different signals and ways to evaluate if the agent is correct, or if it's...
Demetrios Brinkmann [00:41:24]: Doing what it needs to do. Have you found that certain signals should be weighted more than others?
Devin Stein [00:41:34]: So evals, I mean, are always top of mind for us. I do think, like you're saying, we're pretty fortunate in that, unlike maybe ChatGPT, where it's a very one-on-one interaction, we're operating in public forums on repositories where PRs are merged, so there's usually truth, and that helps us get signal on how we're doing and when we make mistakes. Something we've been doing recently that I think is kind of cool, and is sort of dogfooding our product, is how we do evals. We're calling them "living evals" internally. At the end of the day, when we run the agent because someone has a question or an issue, we produce an answer or a document, and that question and answer is a piece of knowledge. So can we save the answers users have asked for, and then detect when our answers on the eval side have gone out of date? Before, what we were doing was saving a version of every dataset for the specific point in time when someone asked a question. That works, but it's a lot of operational overhead.
Devin Stein [00:42:54]: Wouldn't it be nice if we could just save evals as they are today, and only when relevant pieces of information change do we actually have to update that eval? We're very early in this process, but I'm kind of excited about it, because it reduces the friction of collecting evals a ton, and we also get signal on the product side as well.
Demetrios Brinkmann [00:43:17]: It's a great way to save effort. It just reminds me that the best engineers I know are lazy, and this feels like that, no shade to you. The smart way to do it is to think: how can I make sure we don't have to do this all the time? Are you thinking about it in terms of taking the delta? Okay, this changed, so we need to update our eval set just on that part.
Devin Stein [00:43:47]: Yeah, exactly. If you think about an answer as a document or a knowledge artifact, our job should be: a user saves this question-answer pair to our knowledge base, and that is a good eval until the document goes stale. Then, can we update that eval, or should we prune it from the knowledge base?
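One way to implement the "living evals" pattern described here: each saved question-answer pair records a fingerprint of the documents it was grounded in, and only evals whose source documents changed need re-review. This is a sketch of the delta idea under assumed names (`LivingEvalSet`, `stale`), not Dosu's actual code.

```python
import hashlib


def fingerprint(text):
    """Stable content hash used to detect when a source document changes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class LivingEvalSet:
    def __init__(self):
        # Each entry: (question, answer, {doc_id: fingerprint at save time})
        self.evals = []

    def save(self, question, answer, source_docs):
        """Save a Q/A pair along with fingerprints of the docs that ground it."""
        fps = {doc_id: fingerprint(body) for doc_id, body in source_docs.items()}
        self.evals.append((question, answer, fps))

    def stale(self, current_docs):
        """Return Q/A pairs whose underlying documents changed (or vanished).

        Only these need updating or pruning, instead of re-snapshotting the
        whole dataset at every point in time.
        """
        out = []
        for question, answer, fps in self.evals:
            if any(fingerprint(current_docs.get(d, "")) != fp
                   for d, fp in fps.items()):
                out.append((question, answer))
        return out
```

The key property is the one in the transcript: an eval is trusted for free until a relevant document changes, which is when it surfaces for update or pruning.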
Demetrios Brinkmann [00:44:10]: How are you seeing folks want to deploy this inside their companies? I imagine you get a lot of data on what kind of integrations they want, and there's probably the 80/20 principle: yeah, we've got to have Jira, we've got to have GitHub, and we've got to have Slack. That's probably the stack I'd imagine the majority of folks run with. But do you see folks that want Dosu deployed in their private cloud, because documentation is, in a way, the special sauce?
Devin Stein [00:44:52]: Yes, security is top of mind for us. We're SOC 2 compliant, but right now Dosu is a SaaS product. We go to great lengths to make sure customer data is partitioned in our multi-tenant setup, but we currently don't offer self-hosting, at least out of the box. We're willing to work with customers on it because of the sensitivity of knowledge. On the integration side, it is a long tail. The way we think about it is: at the core, we need code commits, conversations, tickets, and documentation. Those are the key integrations, so Confluence, Notion, Jira, Linear, Slack, Teams, Discord, and GitHub cover the majority of people. But then there's a really long tail of information that lives in other places that people want access to. Something we've been exploring is when it makes sense for a data source to have a formal integration, versus something where we can do more just-in-time authentication: someone gives Dosu an access token on their behalf to go and look at, say, the logs in a service, or some private internal thing, as long as it's OAuth 2 compliant. So we're trying to figure out whether there's a set of first-party integrations, and then for the long tail, people just authorize Dosu to do just-in-time access.
Demetrios Brinkmann [00:46:27]: Yeah, it's funny how email is not in there at all. Nobody's figuring out their documentation over email.
Devin Stein [00:46:35]: It's true. Maybe at some companies, but for most of the companies we work with, email matters a lot less. We really focus on the product and engineering domain, and a lot of those conversations are happening in shared channels or on PR reviews, less so in formal back-and-forth over email.
Demetrios Brinkmann [00:46:59]: Yeah, email is externally facing, and all the Slacks and Discords and Teams, or whatever it may be, are internally facing. So you wouldn't expect email to be that. I wanted to ask about knowledge sprawl. We don't have to specifically call out Glean, but we can talk about the different ways knowledge sprawl happens. Knowledge just keeps multiplying, and things get more complex every time you add a new employee, a new service, a new feature. That sprawl just continues. And it's not like anybody ever comes and says, we have less documentation this year than last year. I think it's the same with data. Nobody ever says, great news: we had this data sprawl, but we managed to get it in order, and we don't have as much data this year.
Demetrios Brinkmann [00:48:13]: You've got a vision for a flywheel of data that you just talked about. I think there are other folks trying to go about it and say, look, data, slash, documentation is messy; let's find the best ways to sift through the mess or give you access to the mess. You're taking a bit of a different approach. Can you walk through that vision?
Devin Stein [00:48:44]: Yeah. Right now, how do we deal with knowledge sprawl as an organization grows? The answer ends up being something like search. That's what you see: people building better and better tools for sifting through this knowledge, trying to let people, or agents, reason about what is truth. But if we look forward to a world where companies are started with LLMs and AI-first tools for knowledge management, you can imagine having a store of knowledge that is the source of truth and grows with you. You don't end up in the situation where you have six copies of a similar document, you're not really sure which one is truth, and you have to go ping the authors to ask, hey, is this still true? You can actually have a system that does that for you in real time as you scale. And at least in our view of the world, this can be true for product and engineering knowledge, or at least wherever you have a system of record. Code is a really, really good system of record. Maybe you can extend the analogy to other types of systems of record internally, where they are truth, and you can monitor changes on them and reflect those back into the knowledge base. But unless you have a digital system of record, it's really hard to reconcile things.
Devin Stein [00:50:18]: So there are still going to be meeting notes and other types of documents floating around. But for certain types of knowledge, like how does our product actually work, I think there should always be a very clear answer.
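The "monitor the system of record" idea can be sketched as a mapping from each document to the source paths it describes; a commit touching those paths flags the document for review. The names (`DocSyncMonitor`, `on_commit`) and the path-level granularity are assumptions for illustration.

```python
class DocSyncMonitor:
    """Watch changes to a system of record (here, file paths in a repo)
    and flag the documents derived from those files as possibly stale."""

    def __init__(self):
        self.doc_sources = {}  # doc_id -> set of source paths it describes

    def register(self, doc_id, paths):
        """Declare which source-of-record paths a document is grounded in."""
        self.doc_sources[doc_id] = set(paths)

    def on_commit(self, changed_paths):
        """Return doc IDs whose source files were touched by this commit.

        These are the documents that need to be re-checked (by a human or
        an LLM) and reconciled with the new state of the code.
        """
        changed = set(changed_paths)
        return sorted(doc_id for doc_id, paths in self.doc_sources.items()
                      if paths & changed)
```

A real system would hook this into webhook events from the repo host and hand the flagged docs to an LLM for the actual reconciliation; this only shows the "code is truth, docs follow" direction of the sync.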
Demetrios Brinkmann [00:50:33]: Do you also think about the differences between internal and external documentation?
Devin Stein [00:50:42]: Yes, this comes up a lot. There are some interesting differences today, and I don't know how it's going to be in the future. Today, I think the big difference is that external docs are usually a representation of your product. They should be very polished, with your brand, style, and tone; those little details really matter. Internal knowledge is really about: does it exist, is it correct, and can I find it? And the topics they cover are different.
Devin Stein [00:51:21]: External docs are very focused on the user-facing side, what users want to do with a product. Internally, you want to know all the messy stuff: why do we have this hacky code, why can't we handle this integration, what should I not do so I don't break the product? Those things are never going to be written publicly, but they're really important for internal collaboration and communication, and for people to do roadmapping effectively in the future.
Demetrios Brinkmann [00:51:54]: Oh, sorry to interrupt, but this makes me think about one of those horror stories you hear: an engineer is sifting through the code, trying to figure out how to optimize or refactor something, and gets to this function and thinks, why do we need this function? We do not need this function. And there's only one comment above it that says, do not ever change this function, you will deeply regret it. And they're like, whatever; it actually doesn't do anything, I'm getting rid of it. And they delete the whole thing, and they come back and realize they just crashed the whole product. And then they recognize that it was them that put that comment there five years ago.
Demetrios Brinkmann [00:52:47]: And the real story is even funnier, because I think the language they used was very colorful. Wouldn't it be great if those comments could just link to the Slack conversation, or the place in the documentation where that decision was made? Or maybe a link to the last outage caused by the person who tried to get rid of that function. It just made me think of that. But sorry, I cut you off.
Devin Stein [00:53:25]: No, exactly. I think that's a great example of how machines can have better memory than humans. I could totally see myself doing something like that: I made a change, everything broke, and I said, okay, never do this again. There are no links in the comments, the git blame is buried so far deep you can't find it, and then you repeat the same thing, and all of a sudden the memories come back and you're like, oh my God, I can't believe I did that again. With LLMs both making those connections for us and helping link disparate conversations or reviews back to code, and making that really accessible, hopefully we can avoid that. I also think there's something interesting about knowledge continuity there. That example is funny because it's the same engineer doing the same thing five years later, but more often that engineer leaves after four years, and then there are a lot of engineers looking at that code thinking, can we delete it? We don't know.
Devin Stein [00:54:32]: Sarah left and no one knows what happens if this is deleted, and someone says, I'm just going to try it. And the cycle continues. I think as we make it easier for people to get their knowledge into a store, and that store is kept up to date, you can have better continuity as people come and go from companies and communities, and that can help them be longer-lived over time.