MLOps Community

Context Engineering, Context Rot, & Agentic Search with the CEO of Chroma, Jeff Huber

Posted Nov 21, 2025 | Views 5
# Context Rot
# Search
# AI Agents
# AI Engineering
# Chroma

SPEAKERS

Jeff Huber
Cofounder, CEO @ Chroma

Jeff Huber is the CEO and cofounder of Chroma. Chroma has raised $20M from top investors in Silicon Valley and builds modern search infrastructure for AI.

Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.


SUMMARY

Jeff Huber drops some hard truths about “context rot” — the slow decay of AI memory that’s quietly breaking your favorite models. From retrieval chaos to the hidden limits of context windows, he and Demetrios Brinkmann unpack why most AI systems forget what matters and how Chroma is rethinking the entire retrieval stack. It’s a bold look at whether smarter AI means cleaner context — or just better ways to hide the mess.


TRANSCRIPT

Jeff Huber [00:00:00]: And I think also people don't want to believe that intelligence, AI intelligence, artificial intelligence, is spiky. The most insidious, distracting information is the information that looks relevant, but it's not, for some subtle reason. This is why subagents, I think, are a powerful idea: literally, it's because context rot implies the existence of context window encapsulation. And that is, by the way, maximizing recall, maximizing precision. Maybe the largest install base of Chroma in 10 years is going to be inside of 5 billion robots.

Demetrios Brinkmann [00:00:37]: What's context rot?

Jeff Huber [00:00:39]: Are we talking now? Yeah. Okay, I know if you're gonna be like, hey everybody, like, you know, I.

Demetrios Brinkmann [00:00:44]: I don't get that quiet on it. You guys put out that blog and it caught fire.

Jeff Huber [00:00:49]: Yeah, yeah. I think we're really passionate about helping developers build useful stuff with AI, and there's a lot of misinformation flying around. I think that AI is really hypey. That's both because it has a lot of promise, but also that ends up meaning there's a lot of snake oil too. And in particular, understanding what's actually working for people and what's not actually working for people: at times there's asymmetry in that information. So we were hearing from a lot of developers that these million-plus token context windows, while kind of advertised as being perfect, because look, they're very good on the needle-in-a-haystack benchmark. They're clearly very good.

Jeff Huber [00:01:34]: It seems like most developers' actual intuition was, yeah, I don't trust Claude past 40,000 tokens, or OpenAI, or Gemini, or any of these models. Right. We don't need to pick on anybody here. That's interesting. That seems really important. We should help builders understand that. And so we launched this technical report, a multi-month investigation. We tested, I think, 17-plus models across a suite of benchmarks and tasks and tests to see when models stop working the way that you expect.

Jeff Huber [00:02:07]: And model behavior is not invariant to the length of the context window. When you use more context, both its ability to pay attention goes down and its ability to reason goes down. And obviously needle in a haystack is the easiest task: you have to pay attention to a needle. And if you ever look at the needle-in-a-haystack benchmarks, it's basically all lexical matching. There's 18 words in common between the search query and the needle. And so it's a very easy task, zero reasoning power required. But of course most real world stuff is like, oh, you have to connect these 18 things together.

Jeff Huber [00:02:42]: And then you need to also reason about all that. And that's what we wanted to figure out. And so, yeah, I think the technical report is at like 140,000 views on YouTube now, or something like that. And you know, you go to any event and people have heard about it. And I think it was also popular because it ratified things that people were already feeling.
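
To make the shape of that test concrete, here is a minimal sketch of a needle-in-a-haystack style probe. It is illustrative only, not Chroma's actual harness: `call_model` is a stand-in for whatever chat-completion client you use, and the needle, filler, and sizes are arbitrary choices.

```python
NEEDLE = "The best thing to do in San Francisco is eat a sandwich in Dolores Park."
QUESTION = "What is the best thing to do in San Francisco?"
FILLER = "The quick brown fox jumps over the lazy dog. "  # 9 words per repeat

def build_haystack(n_words: int, depth: float) -> str:
    # Bury the needle at a relative depth (0.0 = start, 1.0 = end) of the filler.
    words = (FILLER * (n_words // 9 + 1)).split()[:n_words]
    cut = int(len(words) * depth)
    return " ".join(words[:cut] + [NEEDLE] + words[cut:])

def run_probe(call_model) -> None:
    for n_words in (1_000, 10_000, 100_000):
        for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
            answer = call_model(build_haystack(n_words, depth) + "\n\n" + QUESTION)
            # Pure lexical scoring: the zero-reasoning check described above.
            print(n_words, depth, "Dolores Park" in answer)
```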

Demetrios Brinkmann [00:03:02]: That's what I was going to say. It was like we all knew that was happening, but we didn't have any data to point to it.

Jeff Huber [00:03:08]: Yeah.

Demetrios Brinkmann [00:03:09]: And so then when you came out with that, it's like, yeah, exactly. It's also funny that needle in a haystack is the only thing we had. It feels like we shouldn't only have that. And props to Greg for creating it, because I love needle in a haystack. But at the end of the day, there should be 20 of these to really stress test the context window and figure out.

Jeff Huber [00:03:31]: Yeah.

Demetrios Brinkmann [00:03:32]: What is going on and how we can take advantage of it to the max.

Jeff Huber [00:03:36]: Yeah. It's unclear to me. Maybe the labs. I think the labs probably already knew this, or at least had an intuition about it. Do they have their own internal benchmarks that they're training against, or that they know about? I think the reality is that when you're marketing something, you invariably have to pick the things where you do the best and kind of ignore the things where you don't do the best. And of course, if there's something that you don't do well on that nobody else measures themselves on publicly either, then you're definitely not going to talk about that. Why would you? And I think also just how viciously competitive the state-of-the-art large language model world is today creates a negative incentive against being as transparent as possible about a model's strengths and weaknesses. And I think also people don't want to believe that intelligence, AI intelligence, artificial intelligence, is spiky.

Jeff Huber [00:04:26]: We want to believe that it's AGI. We want to believe that artificial superintelligence is coming, that these things are going to be better than humans in all ways. And while that may happen someday, I'm not sure there's really strong evidence today to prove that that's the case or not the case. Again, it may happen, but today we don't know. But I think the best way to think about these models is that they are very spiky and what they are good at and what they are not good at. And of course this is like classic computers as well. Classic computers are incredibly good at arithmetic, much better than humans at arithmetic. And of course they're way worse at other things.

Jeff Huber [00:04:56]: And so to me the most intellectually honest way to approach models and their capabilities is to honestly try to pursue, you know, the map of their strengths and weaknesses.

Demetrios Brinkmann [00:05:10]: I also want to talk about search and just break down search, because it has been something that I know you think about a bunch. And we can kind of contextualize it: we had keyword search, then we got semantic search, then we started doing hybrid search, and now we're thinking about search within agents. All of that feels like problems that you need to have a strong grasp on if you're building with AI.

Jeff Huber [00:05:45]: Yeah, maybe a quick addendum to that. There's this term that's been thrown around now for a few years: vector database. And we've always thought that's a dumb term. These are just VCs trying to meme a category of enterprise software, as they do. Right. But that's not actually the problem to be solved, and that's also not what developers really actually want to buy.

Jeff Huber [00:06:08]: These things are very obviously correlated: the problem to be solved and what people want to buy. They want to solve the problem of information retrieval broadly. And of course, dense vector search happens to be one useful tool in the toolbox. We'll unpack why in a second, but it's certainly not a panacea for all of your search problems, and we've never claimed that it is, either. The way that I describe what Chroma is, then, is that it's not a vector database. What we're building is modern search infrastructure for AI, or maybe modern retrieval infrastructure for AI.

Jeff Huber [00:06:40]: And the "modern" piece is in contrast to legacy search systems: Chroma scales much better, is much easier to run, and is much more cost effective. We've really been able to put the last 10 years of distributed systems research, both on the theoretical side and the applied side, into this architecture to make it that much better. And then the "for AI" part I think is really interesting to unpack, because it means many things. It means the AI developer: it's no longer just a search engineer who's implementing these systems, it's everybody, every single engineer on earth. And frankly, we're even seeing semi-technical and non-technical users starting to wade into these waters. "For AI" also means AI search and AI workloads. AI workloads are different than classic search workloads.

Jeff Huber [00:07:25]: In classic search workloads, you have one giant index that everybody's querying. In AI workloads you have many indexes: every team, every workspace, every user will have one or many search indexes that they want to query over. And so that's actually a much different shape of workload at the technical level. And then I think the other observation here is that in the past, humans were the ones doing the querying, and humans were the ones doing the last mile of search, which is digesting the search results. And so much of search for so long has been about 10 blue links, because that's all that a human brain can really assimilate and absorb and process at a time. But now with LLMs, they are doing the majority of the querying, and they're not just doing one query. They could be doing 100 queries, and for every query they're not just processing 10 blue links, they could be processing tens or hundreds of thousands of pages of documents.

Jeff Huber [00:08:14]: And so that's a much different thing.
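
To make the many-small-indexes point concrete, here is a minimal sketch using Chroma's Python client with one collection per workspace. The naming scheme and documents are illustrative, not a Chroma convention.

```python
import chromadb

client = chromadb.Client()  # in-memory; persistent and hosted clients exist too

def workspace_collection(workspace_id: str):
    # One small index per workspace instead of one giant shared index.
    return client.get_or_create_collection(name=f"workspace-{workspace_id}")

docs = workspace_collection("acme-eng")
docs.add(
    ids=["doc-1", "doc-2"],
    documents=[
        "Deploy runbook for the search cluster.",
        "Postmortem: index compaction stalled under load.",
    ],
)

# An agent may fire off many such queries per task, each scoped to its own index.
hits = docs.query(query_texts=["how do I deploy the search cluster?"], n_results=2)
print(hits["documents"][0])
```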

Demetrios Brinkmann [00:08:18]: At the end of the day, it's funny, just to interject, how the shape of search is so different now, with these many teams doing different queries on different corpuses of data. Sometimes it overlaps, sometimes it doesn't. I hadn't thought about that. And then the next thing: search has transformed from ten blue links, where we as humans interact with it by looking for that headline, is this relevant to me, we click into it, we kind of look, maybe, maybe not, to now: just process as much information as you can and then give it back out, and see if we can squeeze the juice out of this grapefruit as much as possible.

Jeff Huber [00:09:03]: Yep, exactly. And the way to reason about this pipeline, of how do you bring information to the model, or how do you give the model a tool that it can use to go get information: I think a two-stage process is a good way to think about it. The first stage is you want to maximize recall. You want to get all possible relevant information, because if you miss one piece of relevant information, you could get it wrong. And of course, if you're maximizing recall, you're not going to be maximizing precision, sort of by definition. Right. So the first stage is maximize recall; we'll unpack that in a second. And the second stage is maximizing precision, because, also, context rot: context windows are not perfect.

Jeff Huber [00:09:38]: They're very sensitive to distractors as well. And so you want to basically amass a pool of relevant information, maximizing recall, and then cull that pool down to only the most relevant bits before you have the model do a final reasoning pass over it. And so this kind of two-stage pipeline, I think, is a helpful way to think about the goals.
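
A minimal sketch of that two-stage shape, assuming a Chroma collection like the one above for the recall stage and a public cross-encoder for the precision stage. The pool size and model name are illustrative choices, not recommendations from the episode.

```python
from sentence_transformers import CrossEncoder

def retrieve_then_rerank(collection, query: str, final_k: int = 5) -> list[str]:
    # Stage 1: cast a wide net (maximize recall). 100 is an arbitrary pool
    # size; in practice cap it at the size of the collection.
    pool = collection.query(query_texts=[query], n_results=100)
    candidates = pool["documents"][0]

    # Stage 2: cull the pool down (maximize precision) with a cross-encoder.
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    scores = reranker.predict([(query, doc) for doc in candidates])
    ranked = sorted(zip(scores, candidates), key=lambda p: p[0], reverse=True)
    return [doc for _, doc in ranked[:final_k]]
```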

Demetrios Brinkmann [00:09:56]: Do you still see that? As you mentioned, it's very prone to distraction. I've noticed that sometimes you give it more information and then that just fucks everything up.

Jeff Huber [00:10:09]: Yeah, oh yeah. We've seen that for sure; we've done some experiments in kind of agent learning, and they intuitively make sense: oh, if you give agents access to prior similar situations where it did a thing and was successful, kind of give it a few examples of that, dynamically few-shot prompting it when it comes to a similar situation again, oh, this would be really helpful. I, the human, would probably find that helpful. And what we find is the models just tend to slide straight into that local minimum, and they're like, oh my gosh, thank you, you already gave me the answer, I don't have to think about this anymore. And the most insidious distracting information is the information that looks relevant but is not, for some subtle reason. So yeah, this is definitely a hard problem.
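
A sketch of the dynamic few-shot pattern described here, with the caveat baked into a comment. The collection name, helpers, and metadata layout are hypothetical.

```python
import chromadb

client = chromadb.Client()
episodes = client.get_or_create_collection("successful-episodes")

def record_success(episode_id: str, task: str, solution: str) -> None:
    # Store the task description as the searchable document; keep the
    # solution alongside it as metadata.
    episodes.add(ids=[episode_id], documents=[task], metadatas=[{"solution": solution}])

def few_shot_examples(new_task: str, k: int = 3):
    hits = episodes.query(query_texts=[new_task], n_results=k)
    # The caveat from the conversation: an episode that merely *looks* similar
    # can lure the model into reusing a wrong answer, so frame these as
    # reference material in the prompt, never as the answer itself.
    return list(zip(hits["documents"][0], hits["metadatas"][0]))
```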

Demetrios Brinkmann [00:10:50]: Wow, okay. And that, just a bit of a tangent, goes into this whole thing on memory of task completion, and how you can help the agent complete tasks that it's already done before, or that it has kind of done before. And so you're like, hey, you've done this before, remember how you did it? But if it is slightly different and it looks the same, then you send the agent off and it doesn't do it. And you're like, why? You were so close, you had everything you needed.

Jeff Huber [00:11:21]: Yeah, yeah, it's definitely not a solved problem. I think that kind of the data pipelines of retrieval are a part of the solution of what people want to call memory, which I think is context engineering in some sense. Right. There's sort of a write path: how do you glean insights from prior contexts, and how do you store those? And then there's a read path, a query path: how do you use those prior insights in the next generation, either for the continuation of the same trace or for historical traces? But, yeah, it's still an unsolved problem. I think the data piece is certainly an important part of it, but, again, not a panacea either.

Jeff Huber [00:11:53]: There probably needs to be some learned representation to really reach the level of expressiveness that we want these things to have. And I think the goal really is to be able to communicate tacit knowledge to models, and for models themselves to glean tacit knowledge as they're doing things and kind of build up a library of skills. And I think the library of skills, again, can both exist in documents loaded into a search index, but also probably needs to be somehow encoded into the weights of something.

Demetrios Brinkmann [00:12:18]: All right, so I got you off track with precision and recall. And you're talking about how basically you want to maximize both of these. First it's recall, then it's precision, and then.

Jeff Huber [00:12:28]: Yeah, so maximizing recall. I think it's worth unpacking that because you asked a question about dense vector search, text search, lexical search, hybrid. What is all this?

Demetrios Brinkmann [00:12:37]: So many terms, that's true.

Jeff Huber [00:12:41]: Much misinformation on the Internet about this. And, you know, I'll make the point: basically, unless you have data, you're just an opinion. And there are just so many opinions flying around. And it's just so mimetic too, you know: oh, so-and-so does this, therefore I should do that. And you're like, have you even thought for a single moment about either why.

Demetrios Brinkmann [00:13:01]: They do that or their state, their scale?

Jeff Huber [00:13:03]: Exactly.

Demetrios Brinkmann [00:13:04]: I don't know if you remember, back in the day, we used to always say in the community, when people would come in and be like, I'm gonna build this gigantic MLOps platform, I need these tools and this and that, and then we're gonna throw it all on Kubernetes. And it goes to the meme of, I have 12 concurrent users, I guess I'm ready for Kubernetes. And then we would share the same article, which is You Are Not Google. It's that classic blog of, don't trick yourself into thinking that you need something that Google needs, because Google's scale is a whole different level.

Demetrios Brinkmann [00:13:42]: Their company, their business is a whole different level. And so back to this point of, you need to really be discerning about what you see and why they're doing it that way; don't adopt it just because they do it. Take inspiration, but don't treat it as the Bible.

Jeff Huber [00:13:56]: Exactly. And so unpacking this goal, this first stage or first step of maximizing recall: how to think about that? Well, first of all, you shouldn't just copy and paste what somebody else has done. You should spend the five minutes to think about it. And for some applications, it's going to be which information or which files are currently open. For other applications, it's going to be which files or information were recently touched or recently created; recency can be important. For other situations, it could be semantic similarity, it could be text matching, lexical matching. And I think there's a good way to reason about the strengths and weaknesses of dense and sparse.

Jeff Huber [00:14:35]: Of dense and lexical search. Dense is useful when you don't know the language of the corpus. So maybe I'll say it the other way: full-text search is very good if you know the words to search for. And actually in most enterprise search workloads, you'll see lexical full-text search being the primarily useful tool. Because I'm looking for a document that I made in the past. I kind of already know where it is.

Jeff Huber [00:15:02]: I can't remember where it is, but I kind of know what I named it. You know, I know what words are in it. So it's actually not that hard to find the document using lexical search. Semantic search becomes really helpful when you need that semantic expansion, that semantic matching: I don't even remember what the words were, but I need to go and find it.

Demetrios Brinkmann [00:15:21]: Yeah, it was around this idea.

Jeff Huber [00:15:23]: Exactly, exactly. And so these tools are, again, complementary. In most applications, it's not an either-or. Now, it might be 90/10 or 10/90 depending on the use case, right, which of those is the most helpful. But I think it's almost never an either-or. And of course that's what people call hybrid search, which I think is a dumb phrase, because nobody knows what it means; it means 10 different things.

Jeff Huber [00:15:43]: And so we should just not call it that. It should just be: oh yeah, I'm using dense vector search and lexical search. Just say that, don't call it hybrid search.
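
A sketch of what "dense vector search and lexical search" can look like side by side, fused with reciprocal rank fusion; RRF is one common recipe, not something prescribed in the episode. It assumes the same corpus was added to a Chroma collection with string indices ("0", "1", ...) as ids.

```python
from rank_bm25 import BM25Okapi

def fused_search(corpus: list[str], collection, query: str, k: int = 10) -> list[str]:
    # Lexical leg: BM25 over whitespace tokens (real systems tokenize better).
    bm25 = BM25Okapi([doc.split() for doc in corpus])
    scores = bm25.get_scores(query.split())
    lexical = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)

    # Dense leg: the Chroma collection holding the same docs, ids "0", "1", ...
    dense = [int(i) for i in collection.query(
        query_texts=[query], n_results=len(corpus))["ids"][0]]

    # Reciprocal rank fusion: reward documents that rank well on either leg.
    fused: dict[int, float] = {}
    for ranking in (lexical, dense):
        for rank, doc_id in enumerate(ranking):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (60 + rank)  # 60: usual RRF constant
    best = sorted(fused, key=fused.get, reverse=True)[:k]
    return [corpus[i] for i in best]
```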

Demetrios Brinkmann [00:15:51]: So it's a mouthful though.

Jeff Huber [00:15:53]: It is a mouthful. But at least people know what you're talking about. Yeah. So, going back to the dense and sparse and lexical search: there's also this recent paper from Google on the limits of single dense embeddings. Great paper, mathematically provable. I think everybody kind of already knew it.

Jeff Huber [00:16:11]: But again, it was good: it put math around what we already kind of knew, and proved it, which is great. Dense embeddings have their limit. And of course, all over the Internet there's this graph that shows BM25 search as this perfect flat line, and all these dense vector approaches failing at much lower quality. And again, that's fine, that's not wrong. But if you go and look at the actual data that they used for that experiment, and you look at the queries, they are all lexical searches.

Jeff Huber [00:16:49]: And so of course BM25 is going to freaking crush it at that, because BM25 is full-text search, term-frequency based. And so of course BM25 is going to be very good at that task. And, again, the nuance is important and helpful, but the meme that went out onto the Internet was, oh my gosh, all this dense vector stuff is BS.

Demetrios Brinkmann [00:17:10]: It's useless.

Jeff Huber [00:17:11]: You just need BM25. And maybe that's true for your use case, but maybe it's not. And again, I'm just really now ranting about why people aren't thinking enough. So I'll stop there.

Demetrios Brinkmann [00:17:21]: I want to bring in one other piece which I think is important with search, which got brought up a few weeks ago on the podcast with my buddy Nishi. He was saying how, for him, a lot of his search woes and problems come in when someone searches for "romantic dinner." Semantic search kind of can do that; the idea is there, but it's not quite there. And he was having the hardest time, because what was happening is you would come in and be like: all right, well, what does romantic mean for you? You need a bit of personalization. So if you're talking to the LLM and you're saying, plan me a romantic dinner with my wife: all right, but romantic for you is one thing.

Demetrios Brinkmann [00:18:10]: For me, it's another thing. And then he was saying you can really have problems with how we've UXed the new type of.

Jeff Huber [00:18:21]: Yeah.

Demetrios Brinkmann [00:18:21]: Experience when you're trying to bring like, old stuff into the new UX that we have where people can ask anything and say anything.

Jeff Huber [00:18:30]: Yeah.

Demetrios Brinkmann [00:18:31]: And I wonder if you've seen that, where these two types of searches are not breaking down completely, but you don't quite have them there. And another example that he gave, which I still think about, of where semantic search didn't really work: if you are a vegetarian and you say, I want vegetarian pizza, it's not normally going to give you cheese pizza as an option, because that's not tagged as vegetarian, you know. And even in vector space, vegetarian pizza.

Jeff Huber [00:19:07]: Yeah, yeah, yeah.

Demetrios Brinkmann [00:19:08]: Cheese pizza is way over here.

Jeff Huber [00:19:09]: Totally, totally, totally. Or even more complicated: if they say, plan me a romantic dinner tonight, you have to know which businesses are open and which ones are close to you, and that you're vegetarian. It becomes a pretty complicated query. I think it is something that an LLM can internally process fairly well; it doesn't have to be all rigid, explicit system. But the romantic dinner piece, I think it's possible to get that out of vector space. Here's kind of how I would approach it: let's say I'm Yelp, right, and I want to support "romantic dinner." I'm going to take all the reviews, and then I'm going to have an LLM generate maybe 20 to 30 tags for every review. And then those 20, 30 tags can become clusters, so I understand which clusters of ideas are commonly associated with this establishment.

Jeff Huber [00:20:03]: And then I can associate either the centroid of those clusters, or maybe all the information in those clusters, with that business. And now when I search "romantic dinner," and somebody else said, whatever, a synonym for romantic, "cozy and intimate evening meal," you know, you might actually be able to get that from the reviews. So I don't know that that is impossible to get out of vector space. But obviously, the level of forethought, that you want to support that kind of a query, and then the level of work required to make that query possible, again requires you to think. It's not all just for free. Yeah, yeah.
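
A sketch of that tagging idea under stated assumptions: `generate_tags` is a placeholder for an LLM call returning 20 to 30 short tags per review, and the clustering step is reduced to letting the embedding space group similar tags implicitly. Names are illustrative.

```python
import chromadb

client = chromadb.Client()
tags = client.get_or_create_collection("business-tags")

def index_business(business_id: str, reviews: list[str], generate_tags) -> None:
    for n, review in enumerate(reviews):
        for m, tag in enumerate(generate_tags(review)):  # e.g. "cozy", "candlelit"
            tags.add(
                ids=[f"{business_id}-{n}-{m}"],
                documents=[tag],
                metadatas=[{"business": business_id}],
            )

def search_businesses(query: str, k: int = 20) -> set[str]:
    # "romantic dinner" can land near tags like "cozy" or "intimate evening"
    # in embedding space even when no review contains the word "romantic".
    hits = tags.query(query_texts=[query], n_results=k)
    return {meta["business"] for meta in hits["metadatas"][0]}
```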

Demetrios Brinkmann [00:20:39]: You can't just slap semantic search on it and be like, we're good now, type whatever you want.

Jeff Huber [00:20:43]: I mean, you can, but your competitor is doing more work, and therefore your competitor's product is better.

Demetrios Brinkmann [00:20:48]: Yeah.

Jeff Huber [00:20:48]: And that means they're going to win.

Demetrios Brinkmann [00:20:49]: Yeah, the experience is going to be much better.

Jeff Huber [00:20:51]: Exactly, exactly.

Demetrios Brinkmann [00:20:52]: Yeah. I tend to these days when I look for coffee establishments because I like V60 and I like the, you know, the specialty coffee.

Jeff Huber [00:20:59]: Yeah.

Demetrios Brinkmann [00:21:00]: I don't type in cafe because that can give me anything from Starbucks to Dunkin Donuts, you know.

Jeff Huber [00:21:07]: Yep.

Demetrios Brinkmann [00:21:08]: And what I'll do is I try to search V60 as a term, and you have the cafes that come up, and they'll show. But this is super easy, because a review has V60 in it.

Jeff Huber [00:21:21]: Right.

Demetrios Brinkmann [00:21:22]: But sometimes I'll notice that the review doesn't have V60. It's just like, oh, this is a specialty coffee place, or this is a coffee roasters. And so there I'm like, hey, this is an interesting search issue.

Jeff Huber [00:21:34]: Totally, totally. And that shows how you've adapted to the strengths and weaknesses of the system. Like, you know, I think one of the interesting UX challenges and opportunities now with LLMs is because we all became very good at Googling.

Demetrios Brinkmann [00:21:48]: Yeah.

Jeff Huber [00:21:48]: And, like, how do I compose this question I have into a set of terms and minus signs, all that friction. We all learned how to search, kind of, but within the limits of Google's ontology.

Demetrios Brinkmann [00:22:02]: And there were professional search Googlers. Right.

Jeff Huber [00:22:04]: I mean, you know, people that are way better at it, people that are way worse, "filetype:PDF," you know, the whole shebang. And now with LLMs, it's viewed as more of a magical box. And that's a good thing, because users don't have to do the mental gymnastics of how do I change the question that I have into this particular kind of domain syntax that Google is going to accept. That's actually a better user experience. But of course that means users are going to ask a lot more questions, with a lot wider variance. And in some ways, I think the cool part about that, from a user research perspective, is that applications now do have a larger surface area of how they can help their users. Yeah. And of course that's a blessing and a curse.

Jeff Huber [00:22:47]: Right? The blessing is you can do a lot more; it's pretty cool. The curse is that you have to do a lot more. And, again, it's not all for free.

Demetrios Brinkmann [00:22:54]: Yeah. And you're now supporting all these different use cases, and all these different ways that users are doing it. And if one of them is kind of a shitty experience, you're struggling. Yeah. I, as a user, am not going to look at this new AI feature that you created as something that is worth my time.

Jeff Huber [00:23:13]: Right.

Demetrios Brinkmann [00:23:13]: Right. Actually, it kind of brings me back to context in a way, because if we're now talking to these AI features so much, and we're continuously going back and forth with them, you put yourself in the position to have that context. Right. But you have to almost be like a janitor of the context, and really get good at cleaning it.

Jeff Huber [00:23:38]: Yeah.

Demetrios Brinkmann [00:23:38]: And one thing that I saw last night from a guy at the event: he was like, I'm trying to create a Jupyter-notebook-style thing for the context window, and I want to be able to turn off different pieces of my context when they're not needed. Because sometimes I'll just ask something, and then it's there in the context and it's always being referenced.

Jeff Huber [00:24:00]: Yeah.

Demetrios Brinkmann [00:24:00]: And so now it made me think, oh man, context etiquette, or not etiquette, but context cleanliness, is a thing in a way.

Jeff Huber [00:24:11]: Yeah, a hundred percent. You see it both in multi-turn user-model conversations, ChatGPT, et cetera, and inside of agentic loops, where there's an LLM running in a loop, getting some context, generating some output, and then it needs to receive that output back into itself and generate more. And I think your intuition is exactly right, which is that this eventually breaks down. We've seen stuff inside of Claude Code: there's the /compact command, which does a summarization and tries to clean it up. I think I saw something from the Claude Code team on Twitter recently where they don't use compact, they actually just.

Jeff Huber [00:24:53]: They just dump it. They just clear it entirely.

Demetrios Brinkmann [00:24:55]: They clear it.

Jeff Huber [00:24:56]: Yeah, yeah, yeah. And actually we've done some research that has verified that result: naive summarization of the context history is no better than just starting from scratch. Yeah, and of course, if you start from scratch, it's also cheaper and faster, because you're flushing all that baggage. This begs the question, though: how should we think about prior state being available to the current iteration, for, again, either a multi-turn user conversation or an agentic loop? And I think one way to think about this is this term compaction, context compaction, or context distillation, or some word there. And it is not a solved problem. I don't think it will be a solved problem for a while, but there's a few interesting ideas.

Jeff Huber [00:25:44]: One is that you use a prompt of some kind to help the model isolate the actually relevant, high-signal pieces and then pass those down to the next step. Because there's a lot of cruft, especially with coding agents; you just see these huge dumps of logs and these huge dumps of test results, and it can't pay attention to all that. So give the model the ability to snip out the pieces that are actually important for the next turn. A detailed prompt, I think, is more helpful than a generic prompt. And then the other interesting approach that I've seen some people take is to give the next turn of the loop, whether it be the next model or the agent, the ability to search the history of that actual individual conversation or trace or whatever. And these things are not mutually exclusive either; you can do both, obviously. But it's not easy.
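
Two sketches of the ideas above: a compaction prompt that asks the model to keep only the high-signal pieces, and a searchable index over the turn history. `call_model` is a placeholder for your LLM client, and the prompt wording is illustrative.

```python
import chromadb

# Idea 1: a compaction pass before handing state to the next step.
COMPACTION_PROMPT = (
    "From the transcript below, keep only the facts, decisions, and open tasks "
    "the next step needs. Drop raw logs, dumps, and dead ends.\n\n{history}"
)

def compact(call_model, history: str) -> str:
    return call_model(COMPACTION_PROMPT.format(history=history))

# Idea 2: index every turn so a later step can search the trace instead of
# carrying all of it forward. The two approaches compose.
client = chromadb.Client()
trace = client.get_or_create_collection("trace-history")

def log_turn(turn_id: str, text: str) -> None:
    trace.add(ids=[turn_id], documents=[text])

def recall_from_trace(question: str, k: int = 5) -> list[str]:
    return trace.query(query_texts=[question], n_results=k)["documents"][0]
```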

Demetrios Brinkmann [00:26:36]: How are you seeing the search done within the context or within the model doing that? Because that does seem like the obvious step to take.

Jeff Huber [00:26:46]: Right?

Demetrios Brinkmann [00:26:47]: Like, hey, if we just make search the big piece here. Yeah, it's kind of the theme of what we've been talking about this whole time: how important search is, and how unsolved it still is; there's still a lot to be done in the new paradigm.

Jeff Huber [00:27:02]: Yeah. I'd say the way that most people are doing this right now, the predominant strategy, you could call it search, it is kind of search, but it's also maybe not search, is they'll just use a file, and the file will have the list of to-dos.

Demetrios Brinkmann [00:27:18]: Right.

Jeff Huber [00:27:18]: Like, I want to build this feature, I need to do these 15 things in this order. And then each checkbox gets encapsulated by one agent loop, one agent trace, which minimizes, or helps cut down on, the baggage of all the context from all of the different steps.

Demetrios Brinkmann [00:27:38]: It's like the subagent idea.

Jeff Huber [00:27:40]: The subagent idea, exactly. This is why subagents, I think, are a powerful idea: literally, context rot implies the existence of context window encapsulation. That's why a subagent is a good idea in many cases. Of course, there are challenges with subagents: all of a sudden, the subagent has to be able to report the right compaction information up to the orchestrator agent as well.
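
A sketch of that encapsulation pattern: each checklist item gets a fresh context window, and only a compacted report crosses back to the orchestrator. `run_agent` stands in for one agent loop; it is not a real API.

```python
def run_subagent(run_agent, task: str) -> str:
    # The subagent starts from a fresh context window: no inherited baggage,
    # no rot carried over from earlier steps.
    transcript = run_agent(task)
    # It must compact before reporting up; the orchestrator never sees raw logs.
    return run_agent(
        "Summarize only the outcome and key findings of this transcript:\n"
        + transcript
    )

def orchestrate(run_agent, todos: list[str]) -> list[str]:
    reports: list[str] = []
    for todo in todos:
        # Each checkbox is one encapsulated loop; the compact reports so far
        # are the only shared state, which keeps every window small.
        context = "\n".join(reports)
        reports.append(run_subagent(run_agent, f"{todo}\n\nReports so far:\n{context}"))
    return reports
```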

Demetrios Brinkmann [00:28:00]: It's like, I got your next paper, man: how many subagents until you get subagent rot?

Jeff Huber [00:28:08]: We've looked at this a little bit. We've done some experiments with deep research agents, which are this shape of thing, a little bit different. It's not a checklist that you're running down, but the orchestrator says: okay, the user asked this question, I want to look in these 85 places, so I'm going to have 85 subagents go out and do a bunch of research. But obviously those 85 subagents can't bring back all the research that they find. They have to compact before they boil it up to the orchestrator. Right.

Jeff Huber [00:28:34]: And I think, I mean, state of the art today is maybe the Deep Research implementations inside of Claude, OpenAI, et cetera. And I don't know if you've used those on topics that you know about, but you're usually like, this.

Demetrios Brinkmann [00:28:53]: Is not very good. Exactly. Yeah. You're like, kind of.

Jeff Huber [00:28:57]: I guess. But I saw a great tweet that was like: I used Deep Research the other day for something that I know a lot about, and it wasn't very good, so I'll just use it for everything else instead. Yeah. Which hits the nail on the head, right? It's a good way to put it.

Demetrios Brinkmann [00:29:12]: But you know what I think is also something that never happens is when you have these sub agents and they go and do their research and then they come back.

Jeff Huber [00:29:20]: Yeah.

Demetrios Brinkmann [00:29:20]: They're never going to tell you, I didn't find shit. It wasn't relevant. Like, don't use me as a sub agent.

Jeff Huber [00:29:26]: Yeah.

Demetrios Brinkmann [00:29:26]: Out of all the possible outcomes, I can guarantee you that's not one of them.

Jeff Huber [00:29:32]: Broadly speaking, the ability for models to know what they don't know still seems unsolved. And of course, you can put into the prompt: if you don't know, please reply "I don't know." And that can work, somewhat. Some models are more cautious than others: Claude is much more cautious about claiming to know things, whereas OpenAI is willing to take more leaps, for example. And it depends on the context whether that's a strength or a weakness.

Jeff Huber [00:29:58]: But I think the conservativeness of Claude is probably a reason that developers tend to really like Claude, because it doesn't.

Demetrios Brinkmann [00:30:09]: Blow up your database. It doesn't just go delete shit.

Jeff Huber [00:30:12]: I mean, I'm sure there's examples of it deleting shit.

Demetrios Brinkmann [00:30:15]: So then I feel like I got you off track again. What were we just talking about? So you're spawning these subagents and they're coming back with their information. I've heard it put as like an accordion, where it goes out and it gets all this air and all of this stuff, and then it comes back in and you condense it down to these, like, five slits.

Jeff Huber [00:30:39]: Yeah, yeah. And that is, by the way, maximizing recall, maximizing precision.

Demetrios Brinkmann [00:30:42]: I like that. That's a great visual: all right, we're just going to get everything, and then we're going to figure out what's important.

Jeff Huber [00:30:49]: Exactly.

Demetrios Brinkmann [00:30:50]: So you recently put out Chroma for Android, or what is it exactly?

Jeff Huber [00:30:55]: Last week we released Chroma Swift into beta, and then, imminently, Chroma Android.

Demetrios Brinkmann [00:31:01]: Is it somebody that uses Chroma on their phone, or is it that now you have a Chroma instance on the edge? Give me the breakdown.

Jeff Huber [00:31:10]: I think the intuition here is that intelligence will be everywhere. Certainly in the cloud, where it already is; increasingly on devices that you own, your laptop, your phone; and, in the future, inside of every device as we know it. Maybe we'll have devices that were previously inanimate become animated. We could talk to the plant behind me or something, I don't know.

Demetrios Brinkmann [00:31:36]: Or the cameras. I'm sure there's a world where we don't have to hit autofocus and we just say, all right, focus.

Jeff Huber [00:31:44]: Yeah, that would have been great this morning. And then, within that, right, there are trade-offs, there are pros and cons, to doing search and retrieval locally on whatever device that is versus in the cloud. And they don't have to be mutually exclusive; you can also do both. And actually, in most use cases that we see, people doing stuff on the phone end up also having a cloud syncing story going on. But of course, local is privacy preserving, so you can give the user those guarantees. Local is also going to be, ipso facto, faster, because you're not going over the network. And then local is also going to.

Demetrios Brinkmann [00:32:30]: Be, well, you don't have to worry about it; you can be in a place where there's no reception.

Jeff Huber [00:32:36]: Right. So, yeah, offline connectivity. And then also, sorry, cheaper as well: if you're doing the compute on the device you already own, you've already paid the money for that compute, and you can now use it as much as you want. So those are very good reasons to do things locally. I think the privacy-preserving reason is probably the biggest use case there. It's just really great to be able to have users do things on their own device and not have to worry about the data being egressed unless they explicitly want it to be. But I think there's a lot of really exciting stuff there.

Jeff Huber [00:33:08]: I mean, we're a little early still in the arc of this. But also, Chroma is written in Rust, and because Chroma is written in Rust, it can run anywhere. There's some kind of early, early work in getting Chroma to run on robots, because they're also going to need their own level of memory. So maybe the largest install base of Chroma in 10 years is going to be inside of 5 billion robots. You could think about it that way.

Demetrios Brinkmann [00:33:41]: And what was the engineering feat that had to happen to make it super lightweight, so that people could just grab it and have it on their device without any worries that it's going to brick their phone?

Jeff Huber [00:33:56]: Yeah. I mean, Rust gives you a lot of safety and guarantees, which is useful. We now have a single-node version of Chroma, which you can run as a server. We also have a fully distributed version of Chroma, which can run across many nodes in a cloud environment and is also serverless. And then that single-node version can also be run as a library, embedded into any given context. So today, people very frequently use Chroma in Python, with Python bindings to all of the underlying Rust, and they'll use that inside of Jupyter notebooks or Python scripts or demos or all kinds of other contexts. And it's really useful to not have to run a server if you're just doing something lighter weight. But of course, when you're ready to go to a server, the API is the same and you just change how you connect to it.

Jeff Huber [00:34:52]: And then when you're ready to scale to petabytes of data in the cloud, again, you just change how you connect to it. And I think that has always been our goal. It's aspirational, it's not easy to do, but we wanted to make sure that wherever you have a search or retrieval workload, Chroma is there to serve you.

Demetrios Brinkmann [00:35:11]: I like that.

Jeff Huber [00:35:12]: And you don't have to pick and choose, do I want this or that? The answer is you can have both, and it's the same API.
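
A sketch of that same-API point with Chroma's Python client. The in-process and on-disk clients run as-is; the HTTP client assumes a Chroma server is already listening at the illustrative host and port.

```python
import chromadb

# Embedded and in-process: good for notebooks, scripts, demos.
local = chromadb.Client()
# Embedded but durable on disk (path is illustrative).
durable = chromadb.PersistentClient(path="./chroma-data")
# Single-node server; assumes a Chroma server is running at this address.
server = chromadb.HttpClient(host="localhost", port=8000)

for client in (local, durable):  # add `server` once one is actually running
    col = client.get_or_create_collection("notes")
    col.add(ids=["1"], documents=["same code, different deployment"])
    print(col.query(query_texts=["deployment"], n_results=1)["documents"])
```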

Demetrios Brinkmann [00:35:19]: What made you think this was necessary? Were you hearing people ask you for it? They were like, I'm building this app, but I need something super small, I need to have it on device.

Jeff Huber [00:35:32]: Yeah, I mean, not everybody, I guess. But the ideal version of running a company is you do have a vision of what you're building.

Demetrios Brinkmann [00:35:37]: Pulling the visionary card on me.

Jeff Huber [00:35:39]: I'm not going to, actually. I'm going to walk it back immediately. But hear me out. You have a vision of what you're building 10 years in the future, but the reality is that in business, you get zero points for being right 10 years from now.

Demetrios Brinkmann [00:35:50]: Yeah, that's true.

Jeff Huber [00:35:51]: Does not matter. Does not matter at all. What you have to do is be right six months from now. And ultimately, this is cliché, to be clear, but you have to be obsessed with your customers and what they are ready for, what they need and want. So there's no reason that we couldn't have done Chroma on mobile a long time ago, but nobody cared until very recently. It feels like in the last few months we've started to see more people really want that.

Demetrios Brinkmann [00:36:23]: But why, do you think? What apps are being built?

Jeff Huber [00:36:26]: I mean, it's a lot of AI chat stuff. It's a lot of media you have on your phone that you want to search and retrieve over. I can't claim to predict why the exact timing of these things happens. Similarly, as we were chatting off camera before we got rolling here: accuracy, we've always thought, is incredibly important. You can only manage what you measure; why would you not want to know how this thing does in the real world? And then nobody has really seemed to care that much. A niche of developers and engineers have always cared, obviously, but the masses haven't. And so we've sort of held off doing much work in that domain, particularly since most of it is not research, most of it is product.

Jeff Huber [00:37:13]: Right. We've kind of held off not because we didn't think it was important, but because our customer base didn't yet believe it was important.

Demetrios Brinkmann [00:37:22]: There weren't enough asks from the community, like, hey, we really need to get this accuracy thing solved.

Jeff Huber [00:37:28]: I mean, some. Yeah, exactly. Sometimes people say they want a faster horse when what they actually want is a car. Maybe that's true, but when they say they want a faster horse, you can't jump all the way to selling them a supersonic aircraft. And so I think being very sensitive to the timing is kind of an underrated skill. And you can use the word taste, I think. Yeah, taste. Or maybe another.

Jeff Huber [00:37:54]: The Lindy version of the word taste is just wisdom, you know. And it's hard, right? Because on one hand, again, you have a vision of what you want to build, and you need to stay very true to that vision and believe in it incredibly deeply. At the same time, you kind of need to be very, I won't use the word reactive, but you need to hold that loosely, especially your near-term plans. And, you know, no plan survives the first minute of battle. That is absolutely the case.

Demetrios Brinkmann [00:38:23]: Yeah. That's such a great point, because you get all these inputs, I imagine, every day: people that are asking for things, and ideas that you're seeing out there. And then you think, wow, we're gonna do it like this. And when you bring it to the masses, it's like, actually, we might need to change how it's done. But it's through those iterations that things happen.

Jeff Huber [00:38:48]: This is why I tell myself that the time I spend online is a good use of my time. The Twitter time taps me into the public consciousness, if you will. Yeah.

Demetrios Brinkmann [00:39:00]: So, this is very visionary, the basically Chroma-running-anywhere-at-any-scale piece. What else are you thinking about, as far as the vision of how things are going to change? Is it on the precision side? Are you going to start building for precision now, because it feels like people are interested in it?

Jeff Huber [00:39:20]: I think that's very good intuition. We don't have the specifics planned out, or ready to share, yet. But I'll share a high-level observation, which is the reason Chroma started: I had worked in applied machine learning and developer tools for 10 years, building technology with machine learning, which we now call AI. Thankfully we've dropped calling it GenAI. I'm very happy about that.

Demetrios Brinkmann [00:39:48]: Oh, did we? I still see some shit. Somebody the other day was like, GenAI Ops. And I'm like, oh, really? That's a term?

Jeff Huber [00:39:58]: It's not a term.

Demetrios Brinkmann [00:39:59]: Yeah, we're just going to say it right now.

Jeff Huber [00:40:01]: That's not.

Demetrios Brinkmann [00:40:02]: No, sorry, I digress.

Jeff Huber [00:40:03]: No, no, yeah, yeah. The goal of Chroma, going back to that experience, and honestly the pain that I felt building stuff with applied machine learning, a lot of it in the computer vision domain: this is really powerful, it could change the world, but it feels a lot more like alchemy than it does engineering. And I'm sure you've seen the XKCD comic where the person's like, oh, this is your machine learning pipeline, and there's a guy standing on a pile of garbage. And the follow-up question is, well, what if it doesn't work? And the guy on the garbage is like, oh, you just, like, you know.

Demetrios Brinkmann [00:40:39]: You just mix it.

Jeff Huber [00:40:40]: Yeah, mix it up and try it again, you know. And that's, sadly, still the case, right? This actually is still kind of the case today.

Demetrios Brinkmann [00:40:47]: Yeah, that's true. We just change some prompts, we tune the temperature, we try and figure out the recall or whatever. Yeah, that's so true.

Jeff Huber [00:40:55]: And I think a lot of the labs, maybe because of their fundraising requirements, have to market this idea that we're building God. Like, we are literally building a deus ex machina: a god will emerge from the machine, and it will solve all of our problems. And we don't have to toe that party line, because, frankly, we don't care about that. And I think it's wrong. And so our term has always been: AI is useful. And we have stickers. I'll give you one later.

Jeff Huber [00:41:28]: AI is useful. And of course, it's only useful if you give it the right information and the right context. And it's only really useful if it can learn from that information, too. And so I think that's the stuff that we are really excited about, and it really seems to be where the people on the bleeding edge of agents are running into the edge of current capabilities. We have these large, state-of-the-art reasoners; they're powerful in many ways, they have their own weaknesses, obviously, but you can kind of work around them. We also have these retrieval systems, search systems; they are very good at what they do as well. But it kind of feels like the map's not complete yet, because, almost by definition, we're not getting the capabilities that we want.

Jeff Huber [00:42:15]: And again, the capability that I want is: if I pick up a cup one time and I put it back down, I know how to do it again. I don't have to think for 45 seconds about how to do that. And that ability to learn from experience is something that would be incredibly useful for making useful stuff, because the real world is fractally complex and has so many edge cases. And I think for people that have worked in machine learning a long time, the phrase "getting mugged by reality" will resonate quite deeply. So we need systems that can learn to adjust to edge cases. Otherwise, we're just doing what we did in computer vision and autonomy in the first era, where we were putting 100 band-aids on top of and around this thing to try to make it work.

Demetrios Brinkmann [00:43:07]: Well, that's why it became so hard to put any ML model into production: it's exactly that. If you're in the enterprise, there's just so much alignment that needs to happen, and so many meetings that you need to get cleared, because it is a lot more risky than software. It's not running the way that you think it's going to run every single time, and so you have to be okay with that for your use cases. Right.

Jeff Huber [00:43:35]: You probably remember this phrase from, I think it was maybe 2017, maybe 2016 or 2018. I think it was either the title of a Google paper or it was inside of a Google paper: machine learning is the high-interest credit card of technical debt.

Demetrios Brinkmann [00:43:49]: I still quote that. We had D. Sculley on here back in the day, and it was legendary. He was one of the first people that I interviewed, and I was like, how am I on this call right now?

Jeff Huber [00:44:00]: Yeah.

Demetrios Brinkmann [00:44:01]: And it is that. And you know what's funny is there's that new way of looking at it. Do you remember the diagram in that paper?

Jeff Huber [00:44:09]: Oh, yeah.

Demetrios Brinkmann [00:44:10]: Where the model was a small piece and then everything around it. Yes, it's the same thing.

Jeff Huber [00:44:14]: Yeah.

Demetrios Brinkmann [00:44:15]: It's just that now the stuff, the boxes around it have changed.

Jeff Huber [00:44:19]: Yep.

Demetrios Brinkmann [00:44:20]: But that's where the hard part is. And it's a little bit easier now, because we can hit an API for the model; we don't have to create the model. But still, it's the same idea.

Jeff Huber [00:44:31]: Yeah, yeah, yeah. What's old is new again. Yeah.

Demetrios Brinkmann [00:44:43]: Did you get a professional to design this?

Jeff Huber [00:44:46]: Well, TJ is a professional, so.

Demetrios Brinkmann [00:44:47]: No, I mean, like, the office.

Jeff Huber [00:44:49]: Oh, the office. Yes. We also had a professional do that, but not somebody on our team, obviously.
