
The Current State of Agentic Retrieval - Qdrant Roundtable

Posted Jul 01, 2026 | Views 13

# Agentic Retrieval

# AI Agents

# Qdrant


Speakers

Neil Kanungo

Head of Developer Relations @ Qdrant

Neil Kanungo is an experienced professional with expertise in data science, developer relations, and product growth. Currently serving as the Head of Developer Relations at Qdrant, Neil previously held the position of VP of Product Led Growth & Developer Relations at KX, where significant increases in product registration and user activation were achieved. At TIBCO, Neil managed a team focused on enhancing the adoption of TIBCO Spotfire through various initiatives, including tutorial videos and live webinars. With a strong technical background, Neil has developed innovative solutions in analytics, machine learning, and data visualization across multiple roles, including Engineering Data Analyst and Asset Integrity Engineer at Enterprise Products. Neil holds a Bachelor of Science in Radiation Physics from The University of Texas at Austin, a Master of Science in Mechanical Engineering from Texas Tech University, and is pursuing a Master in Applied Data Science from the University of Michigan.


Ewa Szyszka

DevRel Engineer @ Qdrant

Ewa Szyszka is a Developer Relations professional based in San Francisco with a background in Computer Science and Hardware Engineering, passionate about bridging the gap between technology and the developer community. She holds a BSc in Computer Science and an MSc in Electronics, bringing a strong blend of deep technical foundations and communication skills to her work.

Her DevRel experience spans technical content creation, community engagement, and public outreach - with a focus on delivering high-quality content at pace. Ewa is also a polyglot, speaking Polish, English, German, French, Spanish, Japanese, and Korean, enabling her to connect authentically with developer communities around the world.

Outside of tech, she enjoys aviation, motorcycles, and tennis - always chasing the horizon.


Evgeniya Sukhodolskaya

Senior Developer Advocate @ Qdrant

Developer Relations at Qdrant with 8 years of IT experience across software engineering, machine learning, and technical management, and 4 years in Developer Relations. Holds a Master’s in Machine Learning, Data Analytics, and Data Engineering. Passionate about NLP, data-centric AI, and the role of vector search in advancing AI technologies.


Dylan Couzon

DevRel Engineer @ Qdrant

Dylan Couzon is based in New York City, and he helps developers build better AI applications. He is passionate about AI, programming, open source, and robotics, and enjoys sharing what he’s building and learning along the way.

He’s always on the lookout for great networking events and meetups, especially in NYC, and loves connecting with the developer community.


Andrei Cristea

DevRel Engineer @ Qdrant

Andrei is a Berlin-based Developer Relations Engineer at Qdrant, a prominent open-source vector database. With a Master’s degree in Artificial Intelligence from TU Munich, his expertise bridges AI, data infrastructure, and knowledge engineering.


Demetrios Brinkmann

Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.


SUMMARY

AI agents are only as good as the information they can find, retrieve, and remember.

In this community roundtable with the Qdrant team, we explored the latest advances in agentic memory, vector search, retrieval systems, and production AI architectures.

As AI agents move beyond simple chatbots into systems that can reason across large amounts of information, retrieval is becoming one of the most important layers in the AI stack. The discussion covered the real-world challenges of building agents that remember what matters, forget what doesn't, and consistently retrieve the right context at the right time.

If you're building AI agents, RAG systems, or production AI applications, this conversation offers practical insights into where retrieval is headed and what it takes to build reliable, scalable agentic systems.


TRANSCRIPT

Demetrios: [00:00:00] What is up, everyone? Hope you were jamming to that music as much as I was. I can see the chat is already getting off to a good start. I'm excited because we've got a packed house today to talk all about good old state of agentic retrieval. There's so much that we're gonna get into, and we've got so many incredible folks here today.

Demetrios: But I wanna set the scene before we bring out the guests of honor, and I wanna talk to you about why this session feels so important, and it is probably why you actually came. You know? There's so many different challenges and changes that we've had with retrieval over the past two [00:01:00] years. I have seen such an evolution, and so I thought, "Why don't we get together some people who have been knee-deep in this whole retrieval game to talk to us about what it was like, what did we have back in the day, and what do we have now, and what are ways that we can optimize it so that we are finding success right now?"

Demetrios: So without further ado, I'm gonna bring on the team from Qdrant. We've got the DevRel folks coming out. First up is my man Dylan. Where you at? Hey, Dylan. How you doing, dude? Hi, everyone. I'm pretty good. We've got Jenny. We've got Ewa, Neil, and Andrei. I think this is the most people that we've ever had on a round table.

Demetrios: So let's just kick it off, start strong with some hard-hitting questions. [00:02:00] I'm gonna ask: what has changed in the last two years? I, I've seen so much. What do we need to focus on?

Neil Kanungo: Yeah. Uh, sure, yeah, I can take that. Uh, hi everyone, Neil here. Um, yeah, thanks for joining the session and thanks for the intro, Demetrios.

Neil Kanungo: Um, there's so much happening in retrieval and search. Um, I, uh, before I joined Qdrant, I didn't fully appreciate the, the field of search and, um, we kind of just expect search to work and we definitely notice when search doesn't work well. And we notice it through just getting like terrible results and, um, or it takes a long time to get results.

Neil Kanungo: We can't find what we're looking for. And the more data that we capture and from the real world, the more we need to get that information, uh, get, get the data, the right data we need, get the information out of our, the data we're [00:03:00] storing. So what, back to the question, what's changed? Um, one of the biggest things that's changed is agents and the topic of today's, uh, round table.

Neil Kanungo: Um, agents, a lot of them don't really search well, uh, I would say in my opinion. Um, there's a lot of... They, they kind of search like a, like a, uh, a beginner would just search by just, like, kind of throwing a lot of things into the, into the hat and just, like, trying to, like, pull, uh, pull, pull results, and they go through lots of turns, and they burn a lot of tokens, and they, um, try to find this context.

Neil Kanungo: But they're not using all the tools available for search. And this-- And if you don't have the right context in your AI, that's gonna really make your AI less efficient, less effective. Um, it's gonna prevent you from reaching your goals. So agents in h- uh, in one of the things that agents are doing that's also interesting is, like, a human might [00:04:00] search, like, once or twice a minute or so.

Neil Kanungo: Agents are searching thousands of times per minute, so they're just going boom, boom, boom, boom, boom, hitting it, and they're not able to evaluate those search results the same way a human can because, um, humans have a lot of more judgment that they can apply. So yeah, I'll stop there, but those are s- those are some things that have really changed.

Demetrios: Dude, you're preaching to the choir on the different ways that agents search and then the ways that they interpret that data. And so I, I have some questions here about, like, hey, now that we've got agents and these agents are figuring things out, they're kind of... We've all seen it. They will stumble through different problem statements, or they'll try and figure it out, and they don't necessarily go about it the most efficient way possible.

Demetrios: I wonder if you all have found tricks on nudging the [00:05:00] agents to be like, "Oh yeah, by the way." Like, there's obviously the go look in that file trick, but there potentially could be plenty more that you've seen because this, at the end of the day, is a search and retrieval problem, and the faster that you can get those agents that knowledge, the faster they can hopefully do what you've asked them to do.

Neil Kanungo: Yeah. Uh, I'm, I'm actually gonna take, jump in here again, but I'm gonna pass to Dylan on this, so Dylan, get ready. Um, so, um, just to set, like, a little more context there is, um, you know, it's not just the efficiency, but it's the effectiveness of search. It's like finding the most re- like relevant response, uh, relevant results, and I think that, um, it goes into evaluation.

Neil Kanungo: How do you evaluate that the results are good enough to use? Should you try again? Should you change your query? And if you [00:06:00] should try again, when do you stop? When, uh, like how, how long does the agent go before it stops? What tools can it use available to it? And Dylan's actually done, um, uh, self-correcting agent loops, um, that, uh, he can talk a little bit about.

Neil Kanungo: So Dylan, I'm gonna hand to you to talk a little bit about what you've found.

Dylan Couzon: Um, yeah, absolutely. Thank you. Um, so my first point, uh, that I'm gonna talk about is that, Neil, you just mentioned, uh, evaluations, and I feel like a lot of people are familiar with, like, the general, you know, evaluation frameworks and, uh, evals that you can run on an agent.

Dylan Couzon: Uh, but also in the information retrieval space, we have a lot of our own, like, evals and metrics like NDCG, MRR, that are, like, really relevant for, uh, you know, search pipelines and that, that we use extensively. Uh, but going into your, uh, what, what you mentioned about like self-evaluating agents. Um, so, you [00:07:00] know, there's a lot of techniques, uh, for, for that.

Dylan Couzon: You know, the most obvious one is to spin up like a small agent and say, "Hey, was that result relevant? Did it contain, you know, all the information that, um, the user was looking for?" Um, but the problem here is that, you know, you add another like LLM round trip, so it is extra latency, extra cost, and, and it makes your pipeline slower and more complex.

Dylan Couzon: And so something that we've been working on was, uh, computing statistical, uh, signals that, you know, very cheap s- signals, uh, based on the, the results. And so, so for example, you know, when, when you retrieve, um, you know, using like dense or sparse embeddings, basic- basically, uh, you get like a, uh, a score a- as a result.

Dylan Couzon: So for like dense embeddings, it is between, uh, you know, like zero and one, with one being like the most confident. [00:08:00] And, you know, you re- return like the top K results, so the top five or the top 10. And basically, uh, you can do some statistics based on like the spread of those scores or, uh, how f- the difference between the top, the top one and the second result.

Dylan Couzon: And basically from here, try to create programmatic, um, you know, signals that will, that, that will be able to tell you if, uh, the quality of your retrieval was good or not without adding virtually any compute to your pipeline. And then o- once you, you have that, that signal that tells you, you know, the, the retrieval was g- good or not, then you can basically route your retrieval agents, uh, to like different paths.

Dylan Couzon: Uh, and you can choose, for example, to do like, um, more expensive retrieval method, like a cross encoder or late interaction model, [00:09:00] or if the result was not good enough, uh, or not, like not good at all, at all, it's like, you know, there was like huge divergence in, in between like all the scores that you retrieved.

Dylan Couzon: This is when you can i- invoke an additional LLM. So the f- uh, in our testings, this was like really a good way to tell apart like good from bad retrieval in a way that is really cheap and allows you to spend more money only, only when, uh, it was necessary. Uh, and then, uh, we also have some- something, uh, on that topic on skills, so I'll pass it over to Jenny.
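The score-based routing Dylan describes could be sketched roughly as follows. This is a minimal illustration, not the implementation used at Qdrant: it assumes similarity scores where higher is better, and the function names, thresholds, and route labels (`accept`, `rerank`, `llm_fallback`) are all hypothetical.

```python
from statistics import pstdev

def retrieval_signals(scores):
    """Cheap statistics over the similarity scores of the top-k hits."""
    ranked = sorted(scores, reverse=True)
    top1 = ranked[0]
    top2 = ranked[1] if len(ranked) > 1 else 0.0
    return {
        "top1": top1,
        "gap": top1 - top2,        # margin between the best and second-best hit
        "spread": pstdev(ranked),  # how much the scores diverge overall
    }

def route(scores, good_top1=0.75, min_gap=0.05, max_spread=0.2, weak_top1=0.4):
    """Pick the next pipeline step. All thresholds are illustrative assumptions."""
    if not scores:
        return "llm_fallback"
    s = retrieval_signals(scores)
    if s["top1"] >= good_top1 and s["gap"] >= min_gap:
        return "accept"            # confident, clearly separated top hit
    if s["spread"] > max_spread or s["top1"] < weak_top1:
        return "llm_fallback"      # weak or highly divergent scores: pay for an LLM check
    return "rerank"                # ambiguous: try a cross-encoder or late-interaction model

print(route([0.91, 0.72, 0.70, 0.69, 0.65]))
print(route([0.62, 0.61, 0.60, 0.58, 0.57]))
print(route([0.70, 0.10, 0.08, 0.05, 0.02]))
```

In practice the thresholds would need to be calibrated per embedding model and dataset, since score distributions differ between models.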

Evgeniya Sukhodolskaya: I think it's super funny we're working like, you know, a, a, actually a multi-agent chain out there because we're passing tasks to each other. Mm. And Demetrios, you're the evaluator of how the pipeline is going, right?

Demetrios: Nice.

Evgeniya Sukhodolskaya: So we're just emulating search and information retrieval from our heads on the go.

Evgeniya Sukhodolskaya: And yeah, like, the [00:10:00] little addition is I think everybody knows about the agentic skills also, right? Um, so they're also applicable in search. The problem is, um, agents need to know how to use search, as humans actually sometimes also do it badly because they also don't know how to search properly for something, but they do it differently.

Evgeniya Sukhodolskaya: With agents, there is a good solution to teach them how to approach it, which is to show them skills. For example, we, um, designed a bucket of search engineer skills which we're trying to maintain in Qdrant, which explain to an agent that approaches our vector search engine what to use and how to combine it to get different outcomes.

Evgeniya Sukhodolskaya: For example, if you have a problem that your search doesn't show you, as an agent, relevant results at the top and you feel like there is something more to it, then there is a specific markdown part of the skill which [00:11:00] tells, "Hey, you should do this, this, this with us, with our APIs." Um, I think we can share the link later, or I could try to share the screen, but I'm so afraid to share.

Evgeniya Sukhodolskaya: Mm. You know- ... it's the hardest thing. That's why, like, I'm senior developer relations, but I never manage to share a screen without sharing my private chat, so let's see. Worst case, we're just dumping a link and showing how the skill evolution for the search pipelines is also helping our agents to use our search infrastructure properly.

Demetrios: Classic. Okay, I've got some follow-up questions, and there are some awesome questions that are coming through in the chat. But, um, Ewa, I wanted to give you some time in case there's anything you wanted to tag on here.

Ewa Szyszka: Uh, sure. So, um, Jenny mentioned a little bit of how we can teach agents how to know, um, search.

Ewa Szyszka: So essentially when we [00:12:00] added the, uh, agents, we moved from a very static system of RAG to something much more dynamic. But actually the search primitives, uh, that are underlying remain the same. So one research paper, uh, that came quite recently, um, last month, that I found really interesting on the topic is called SIRA, which is the Super Intelligence Retrieval Agent.

Ewa Szyszka: And the whole idea there is, um, achieving super intelligence, which is when the multi-round search is compressed into a single corpus retrieval. So we've essentially avoided the latency that Dylan was, uh, uh, talking about, the, the latency lags. And, um, what they've done that was quite relevant to, to our topic today is they figured out how to use the LLM to enrich the documents, so have this iterative loop, uh, whenever the search vocabulary was missing.

Ewa Szyszka: So they looked at the [00:13:00] queries that were generated, um, evaluated what queries that were missing, and essentially predicted omitted vocabulary and added to it before, um, they passed it on to more traditional search paradigms like, uh, BM25.
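The document-enrichment idea Ewa summarizes can be illustrated with a small sketch: terms that queries use but documents lack are appended to the documents before lexical scoring. This is not the paper's implementation. The `predicted_terms` mapping is a hard-coded, hypothetical stand-in for the LLM step, and the scorer is a plain-Python version of a common BM25 variant.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a common BM25 variant."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    doc_freq = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def enrich(docs, predicted_terms):
    """Append predicted missing vocabulary to each document before indexing."""
    return [
        (doc + " " + " ".join(predicted_terms.get(i, []))).strip()
        for i, doc in enumerate(docs)
    ]

docs = [
    "Reset your password from the account settings page",
    "Invoices are emailed on the first day of each month",
]
# Hypothetical stand-in for the LLM step: vocabulary users search with that the documents lack.
predicted_terms = {0: ["login", "credentials"], 1: ["billing", "receipt"]}

query = "billing receipt"
before = bm25_scores(query, docs)
after = bm25_scores(query, enrich(docs, predicted_terms))
print(before, after)
```

Before enrichment neither document shares a term with the query, so lexical search returns nothing useful; after enrichment the invoice document becomes retrievable in a single pass.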

Demetrios: Okay. There's a lot to unpack with all three of you saying this stuff, because basically

Demetrios: Let me just say what I caught real fast, and then you can correct me if I'm wrong. There ... Dylan, you're talking about different ways that you can use statistics to leverage and get better results with the retrieval, and with the agents recognizing what to grab and what to retrieve and what is useful.

Dylan Couzon: Yep.

Dylan Couzon: Um-

Demetrios: Tell me what-- Yeah, tell me what you were-

Dylan Couzon: Sorry, sorry. Uh, yeah, so basically it was like a cheap way to, uh, get early [00:14:00] signs into, uh, bad or good retrieval without, um, you know, a-adding any latency or cost to your pipeline, and then route dep-- uh, based on those signals.

Demetrios: All right. Then from there, Jenny, you mentioned using skills and how you all at Qdrant have created a bunch of skills.

Demetrios: Are those open source? I didn't catch that one. That was the quick

Evgeniya Sukhodolskaya: Everything that we do, everything-ish that we do is open source. Right. So yeah, obviously.

Demetrios: Drop that into the chat here, and I can relay it to the, um, chat or put it in wherever, whatever chat you want, but we definitely... I wanna see all of those skills.

Demetrios: And lastly, Ewa, you were saying that there is a paper on the super intelligent retrieval agents and the way that they're able to leverage, uh... I didn't catch. Now, now my memory's starting to [00:15:00] fade, uh, trying to hold all these things in my hand, head. The super- Yeah ... intelligent retrieval agents were doing what now?

Ewa Szyszka: Um, so it's just a framework of how do we know that we've built an agent that essentially does successful, um, retrieval.

Demetrios: Incredible. Okay. Cool. So before I hit some of these questions from the chat, and, and I'm just, like, looking at the chat about... Big question that's coming up is like, "Hey, what about ground truth?

Demetrios: How do we figure that out? What is..." 'Cause w- w- to know if the retrieval is correct, right, we also need to know what the, the actual thing is, the ground truth that we're looking for. Does anyone have opinions there?

Demetrios: I see some heads shaking

Evgeniya Sukhodolskaya: I would say, I would say a proper bunch and, uh, each of us [00:16:00] has, like I would say, different, uh, sides of it and also about what ground truth is. I think, um, our problem is that, uh, we got the agentic systems and we got super excited and we're like, "Are they solving search fully?" But the problem is that evaluations weren't fully solved before with the classical search.

Evgeniya Sukhodolskaya: Now with agentic search, it gets even more tricky because you're like, is the ground truth that the agent successfully fulfilled the whole task and arrived at the right point, or is there a ground truth at each point of the iteration of what it does in the process? And there are several approaches to it.

Evgeniya Sukhodolskaya: There is, uh, ground truth in retrieval, as probably people know, that can be created from some golden datasets per se. So some from the data that you already have and you know that this question should get this answer from your dataset. [00:17:00] Usually, um, in production it's not so easy to get it but, uh, LLMs got pretty good at helping to generate the synthetic datasets with golden truth.

Evgeniya Sukhodolskaya: The one tip and opinion that I have as the person who previously worked in crowdsourcing, um, as a DevRel for a crowdsourcing approach, so where the humans were gathering these datasets, is: don't expect that immediately dumping your production data into an LLM will create your ideal golden set.

Evgeniya Sukhodolskaya: Evaluation and creating the golden truth is still work where you need to be in the dialogue, like you're in a dialogue developing a code project. So the relevance notion, the ground truth notion still comes from you as a domain expert, and that can be, like, an LLM could be used as a tool to develop the ground truth dataset that you can then inject in your evaluations.

Evgeniya Sukhodolskaya: But when it comes... I know I have too much to [00:18:00] say. One little thing, one little- No, no. Keep talking ... one, one little thing. Um, but when it comes to the whole thing as, um, the whole pipeline to be successful for you, like solving the task, um, there are very different approaches on how to teach a model to do the correct search trajectories, which goes a little bit more into the reinforcement learning domain.

Evgeniya Sukhodolskaya: And I feel like there is a lot emerging there, teaching agents to make the right decisions in search. And there the ground truth is the result being correct given the task, given the search task, the search input. Maybe your audience knows Ralph Loops, so they're also to some extent applicable to this search.

Evgeniya Sukhodolskaya: But Dylan here is coming from Arize, and he was doing evaluations as, as the bread and butter, so...

Demetrios: Well, so Dylan, this [00:19:00] dovetails nicely into one of the questions that's in the chat. Um, someone was asking about how teams are versioning and evaluating retrieval pipelines in production.

Dylan Couzon: Um, yep, absolutely. Um, so you know, something that I want to preface for, preface with first is that we are a vector search engine, so you know, we only provide only one of the cogs in, in the machine.

Dylan Couzon: And you know, this is really by purpose. We don't want to be selling, like, or providing, like, a retrieval agent. We really want to s- stay very close to the metal and provide, like, the be- the best retrieval engine possible. Uh, but when it comes to those kinds of, like, evaluations and versioning, um, I will usually suggest to use, um, like, an, uh, evaluation framework or platform.

Dylan Couzon: Of course, you know, I cr- come from Arize [00:20:00] AI, so I'm, um, I... my, my opinion might be biased there. Uh, but there's a bunch of, uh, free, uh, evaluation tool- tooling that can help you, um, do that, that versioning there.

Demetrios: Awesome. Uh, and I think, Andrei, you had some other, uh, thoughts about this too. Where are you? I gotta change the view to see you.

Demetrios: Hi. Hey, there he is.

Neil Kanungo: There we go, yeah. Andrei's been doing, uh-

Andrei Cristea: I've got a lot of, uh, space, uh- Yeah ... on the screen. So I, I would like to say that, uh, from my experience, what can also help if we jump back for the, for the golden set, is when you kinda have some kind of assumptions based on the human, uh, uh, on some human manual labor work, what can be good for your golden set, so that the LLM has some kind of, uh, criteria that it can evaluate against.

Andrei Cristea: And I would like to say that it's like an iterative [00:21:00] process in which you try to improve your LLM as a judge, yeah, from time to time, so that the golden set that it is creating and the evaluation in general becomes better. So it's a long process. It's an iterative process that just becomes better as you train it more.

Andrei Cristea: Yeah.
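As a concrete reference for the golden-set discussion, the retrieval metrics mentioned earlier (MRR and NDCG) could be computed against such a set along these lines. This is a minimal sketch: the golden judgments and retrieved rankings are made-up illustrative data, and the NDCG here uses linear gain, which is one of several common conventions.

```python
import math

def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant document, or 0 if none was retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked_ids, gains, k):
    """gains maps doc id -> graded relevance; ids missing from gains count as 0."""
    dcg = sum(gains.get(d, 0) / math.log2(i + 2) for i, d in enumerate(ranked_ids[:k]))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical golden set: query -> graded relevance judgments per document id.
golden = {
    "q1": {"d1": 3, "d4": 1},
    "q2": {"d2": 2},
}
# Hypothetical output of the retrieval pipeline under evaluation.
retrieved = {
    "q1": ["d4", "d1", "d9"],
    "q2": ["d7", "d2", "d3"],
}

mrr = sum(reciprocal_rank(retrieved[q], set(golden[q])) for q in golden) / len(golden)
mean_ndcg = sum(ndcg_at_k(retrieved[q], golden[q], k=3) for q in golden) / len(golden)
print(f"MRR={mrr:.3f} NDCG@3={mean_ndcg:.3f}")
```

Tracking these two numbers per pipeline version, on a fixed golden set, is one simple way to compare retrieval changes over time.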

Demetrios: Awesome. Well, we've got more questions in the chat. Uh, see the... Let me just, let me just grab this. So Ewa, "Missing search terms are grafted in dynamically. How does that handle prompt injection shenanigans, if at all?"

Ewa Szyszka: Ooh, this is a good question. So in the paper, and, uh, feel free to share the link in the chat, um, hopefully you have it.

Ewa Szyszka: I, I haven't found the cybersecurity aspect of it, but I think this would be like a really nice segue [00:22:00] to, um, agentic harnesses. Uh, and as you design your system, um, retrieval will not solve everything. You still need to add tools onto, um, your agent that would make it more secure. But I think this is a really important design principle to, to consider, uh, because prompt injection is definitely a very serious, uh, serious threat.

Ewa Szyszka: And I think especially recently we had, um, OpenClaw and Hermes, um, agent that came out, and with OpenClaw there was a lot of hype, but also we saw when we did not properly sandbox it, that it was able to just go completely wild and access all of our data. Um, so yeah, I think that is not specifically mentioned in that paper, but consideration that, um, every single good, uh, agentic retrieval system should have.

Demetrios: Yes. So let [00:23:00] me continue with a few more questions from the chat, and then we will go on to the next topic that I had in my notes. There is, um, my biggest pain point from John is he's saying, uh, "Biggest pain point I've had with RAG stacks is defining edges and entities dynamically with LLM analysis. Can you speak to different approaches to that problem?"

Demetrios: And I'll just throw that to the crowd. Anybody have strong thoughts?

Evgeniya Sukhodolskaya: I agree because I've heard it a lot at conferences, but I believe it's, uh, more of a graph ontology construction problem. Uh, and we as the vector search engine don't do the ontology construction. We usually combine, uh, because in many, many domains it's a very good complementary thing. For example, in memory of agents, [00:24:00] it makes sense to mix both graph and vector search.

Evgeniya Sukhodolskaya: That's how many agentic memory providers do it, um, for example, Cognee. Um, but, uh, I've heard at least from practitioners at conferences, and don't quote me on that, I would ask Neo4j guys on that, that, um, if you atomize the task of ontology construction in the sense you don't try to build the whole, um, ontology of the domain, but you do it bit by bit, uh, in the more atomized setting, that actually performs much better, like the outcome is much better.

Evgeniya Sukhodolskaya: Um, but I also know that Andrei before worked with ontology, so maybe you have like a little point of view on that.

Andrei Cristea: Uh, yeah. So can you repeat the question again so that I, I have the latest, uh-

Demetrios: The biggest pain point [00:25:00] I've had with RAG stacks is defining edges and entities dynamically with LLM analysis.

Demetrios: Do you have any approaches on that?

Andrei Cristea: Oh, I would say that, uh, yeah, the problem is that, uh, these are, uh, kinda as, uh, as Jenny already said, some kind of ontologies that you try to provide, uh, to the vector, uh, search and to the semantics. So you, you, you kinda try to put the explicit semantics into the vector semantics. And I think these are laying down two different, uh, layers, yeah.

Andrei Cristea: One is the application layer, and another one is, uh, is, uh, more of an implementation, right, of the technology. So I would say, uh, oh, right now what you can do is basically just, uh, like the, the current, uh, state of the art we can say is that [00:26:00] there are lots of, uh, uh, GitHub, uh, profiles and GitHub, uh, projects that are trying to combine the Neo4j approach, yeah, with explicit entities and, uh, uh, also nodes with, uh, something that is, uh, vector-based.

Andrei Cristea: And there are some, uh, successes there because, uh, what does the graph tell you? It can tell you the relations, it can tell you the dependencies, and in some sense it can give you additional context that semantics not always can get, you know. Because like you can have something that is 2000-- from 2001 and something from 2014, and you want something specifically that is from 2014 to be more, uh, aligned on top.

Andrei Cristea: Uh, you can, you can fix it of course with, uh, some custom, um, some custom scoring, some [00:27:00] custom, uh, uh, relevance metrics that you provide by yourself maybe. Uh, a-again, it depends on the domain. But I would say there are these approaches that they basically try to combine, uh, vector and graph, uh, in one, uh, stack.

Andrei Cristea: Yeah. Or you can try to encode in some way, uh, uh, information into the RAG, but yeah, it's quite hard I would say. Yeah.
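The custom scoring Andrei mentions for the 2001 versus 2014 case might look something like the sketch below, which blends the vector similarity score with a bonus for being close to a target year. The field names, the `weight`, and the `decay_years` window are illustrative assumptions, not a Qdrant API.

```python
def rescore(hits, target_year, weight=0.2, decay_years=10):
    """Blend vector similarity with a year-proximity bonus. Parameters are illustrative."""
    rescored = []
    for hit in hits:
        distance = abs(hit["year"] - target_year)
        # Bonus falls linearly from 1 (same year) to 0 (decay_years or more away).
        bonus = max(0.0, 1.0 - distance / decay_years)
        final = (1 - weight) * hit["score"] + weight * bonus
        rescored.append({**hit, "final": final})
    return sorted(rescored, key=lambda h: h["final"], reverse=True)

hits = [
    {"id": "report-2001", "score": 0.82, "year": 2001},
    {"id": "report-2014", "score": 0.80, "year": 2014},
]
ranked = rescore(hits, target_year=2014)
print([h["id"] for h in ranked])
```

Pure semantic similarity ranks the 2001 report slightly higher here; the year bonus moves the 2014 report to the top without needing a graph at all, which is the lighter-weight option Andrei contrasts with combined vector and graph stacks.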

Demetrios: Awesome. Well, let's keep it cruising because I wanna get into memory, and it feels like that is a perfect segue into memory and how folks are approaching this, how you all have seen the best in the business do it.

Demetrios: And especially like I know there is a lot of talk about long-term memory and the architectures that you have for those, and also if they are that valuable because sometimes I will not necessarily want something to be [00:28:00] remembered, but it gets remembered and then for every single question that I'm asking my agent, and I'll give you a concrete example of this.

Demetrios: I told, uh, I told the chatbot that I was vegetarian, and now for things that have nothing to do with food, it says, "Well, given your vegan lifestyle..." And I'm like, "I asked you a question about my taxes. Get the fuck out of here. Why would you need to reference my vegan lifestyle?" So anyway, that's just a little bit of a tangent.

Demetrios: Maybe we can talk long-term memory or we can also-- I'm gonna throw different things out there like episodic memory. Do you wanna go event-based versus semantic memory? How do you do the factual versus conceptual? Who wants to take this one? I'll throw, throw it up there and let anybody rise to the challenge

Evgeniya Sukhodolskaya: Let me try to jump in because I relate [00:29:00] a lot to this part about the vegan taxes. I think it's the classical, uh, you know, LLMs like, "Please don't think about elephants." I think Dylan taught me that. And then immediately whatever you do is going to recall these elephants. That's why it's also important to remember, when you do anything around memory, to also think about forgetting as a concept, but it's a very hard one, an interesting one, and there are like several toolings on how to do that.

Evgeniya Sukhodolskaya: Um, so basically whatever I wanted to say very quick is that I think people out there probably tried Hermes or, uh, OpenClaw agents, because who didn't? It was my first wow effect, to be honest. Um, and we recently, recently being like day before yesterday actually, uh, it's, you know, hype-driven development a little bit, released a plugin which allows to store episodic and semantic memory, um, of a Hermes agent in Qdrant.

Evgeniya Sukhodolskaya: [00:30:00] And I don't have a super sick demo or anything, um, but I chat with my Hermes agent a little bit, and some of the memories of me trying to prepare for this MLOps meetup got actually stored in my Qdrant cluster. I can show it, but before also, like a quick note in case somebody just is not very well-versed with the concept of different memories.

Evgeniya Sukhodolskaya: There is a working one, which is the context window, the chat you're in. There is a semantic or factual memory. It's the one that something is true about you. For example, that Demetrios is, uh, vegan and likes to do taxes. Uh, do you? No? Okay, doesn't like to do taxes. Then, uh-

Demetrios: Yeah, half that is right.

Evgeniya Sukhodolskaya: Happens to all of us.

Evgeniya Sukhodolskaya: Then there is a procedure, procedural one. It's actually more like skills, so what to do in order to achieve something. And the fourth one would be the episodic one, this perfect thing for the vector search actually. It's all the recalls of what you have discussed in the [00:31:00] past, and that it could help your agent to understand how to deal with this stuff better considering all its private, uh, previous knowledge.

Evgeniya Sukhodolska: So I wired my Hermes agent to my Qdrant cluster specifically for that. To show you, let me try to demonstrate it. It's going to be a very lame demo, but we have a cooler one, so don't get, don't get discouraged. Um-

Demetrios: This is nice because people were asking for demos in the chat, and so I think I just asked in our background chat if anybody else has one.

Demetrios: Dylan, you have one too, huh, that we can throw later on? All right, cool.

Dylan Couzon: Yeah, I'll follow up with a, with a demo as well.

Evgeniya Sukhodolska: Okay, I'm g- I'm gonna tell you, you're gonna love Dylan's one and you're gonna maybe like mine. What am I currently showing? Is it my Slack or is it like UI of collections in [00:32:00] Qdrant?

Demetrios: Hold on, I gotta check it out. Uh, I see this Oh, no, us. So yeah, Qdrant. You're good.

Evgeniya Sukhodolska: Qdrant. I'm good.

Demetrios: You're good.

Evgeniya Sukhodolska: Okay. Uh, so as you see, this is a very impressive memory of 37 points approximately in Qdrant. But basically what happens, uh, all of the facts that I have in the conversation with my Hermes agent, which I'm not gonna obviously share because I ask embarrassing questions all the time, um, that's why I wipe the memory.

Evgeniya Sukhodolska: So all the facts and all the turns in the conversations are saved, uh, with different metadata about the sessions that they happened in, and that will help the agent to recall some similar information. For example, I'm a big fan of this JEPA approach, which Yann LeCun has recently been saying is the next big thing for embeddings.

Evgeniya Sukhodolska: So the agent can find similar memories. You can see [00:33:00] that with dense vector search, for example, about this method, you will be able to find some other memories which are kinda about the same fact. Um, and this plugin, um, you can also visualize kinda the memories. There is not so much to see yet because there are not too many memories.
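The store-and-recall flow described here (each turn saved with its session metadata, then similar memories found by vector search) can be sketched without any external service. This is an illustration under stated assumptions: `embed` is a toy bag-of-words stand-in for a real dense embedding model, and `MemoryStore` is a hypothetical in-memory class, not the Qdrant client API.

```python
import math
from collections import Counter
from typing import Optional


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real setup would call a dense embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class MemoryStore:
    def __init__(self):
        self.points = []  # each point is (vector, payload)

    def add(self, text: str, session_id: str) -> None:
        self.points.append((embed(text), {"text": text, "session_id": session_id}))

    def search(self, query: str, limit: int = 3, session_id: Optional[str] = None):
        """Rank stored memories by cosine similarity, optionally filtered to one session."""
        q = embed(query)
        hits = [
            (cosine(q, vec), payload)
            for vec, payload in self.points
            if session_id is None or payload["session_id"] == session_id
        ]
        hits.sort(key=lambda h: h[0], reverse=True)
        return hits[:limit]


store = MemoryStore()
store.add("JEPA is an embedding approach worth reading about", "s1")
store.add("attention window matters for context windows", "s1")
store.add("Demetrios likes doing taxes", "s2")

best = store.search("what was that embedding approach", limit=1)
only_s2 = store.search("taxes", session_id="s2")
```

The session filter plays the role of the per-session metadata mentioned above: the same query can be scoped to one conversation or run across all of them.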

Evgeniya Sukhodolska: But we can see that, for example, with the semantical embeddings, there are similar memories about me preparing some smart facts about the transformer's attention window really being very important for context windows, being very important for vector search, and that's why vector search, um, won't be dead as much as RAG can become dead at some point.

Evgeniya Sukhodolska: Um, and if I would start chatting now with my Hermes agent, for example, that, uh, I will add the fact that, uh, well, that Demetrios is vegan, so now all of [00:34:00] my inputs are also going to be polluted by this.

Demetrios: No, it was the other one, uh, that was true. I like doing taxes.

Evgeniya Sukhodolska: And Demetrios likes to do... I'm not showing you my Telegram where I'm chatting with Hermes, but that's what's happening.

Evgeniya Sukhodolska: So technically, if the demo gods are nice to me, we should see that the points are gonna get updated at some point. But if they are not, the demo gods are just not nice to me, and you will see the nicer demo.

Demetrios: We can come back.

Evgeniya Sukhodolska: We can come back because I think my Hermes agent obviously decided to sleep in the moment I decided to demonstrate something.

Evgeniya Sukhodolska: But the general concept is that basically you have your memory bank because information about you that you do now will grow and grow and grow. At some point, MD files won't be enough to recall all of the information, and then you need some, well, index which will organize these memories, [00:35:00] let them be forgotten, and let them be surfaced in the right moments.

Evgeniya Sukhodolska: And I think this is very important, where we are going, because the information, as Neil said at the very beginning, grows and grows and grows. Uh, the demo gods absolutely obliterated me, but I know that Dylan has a very cool one, so everybody can forget this. And- Yeah.

Neil Kanungo: I, oh, go ahead. Sorry. Were you finishing?

Demetrios: I think, um, we've got...

Demetrios: Well, I wanted to talk for a minute about forgetting and that whole thing, because I know that can be very difficult, and there's also a question in the chat that I wanna bring up, uh, about, like, vector databases still being the preferred choice for long-term memory. So maybe, Ewa, can you talk about forgetting real fast, and then if you have any thoughts about vector databases.

Demetrios: I can imagine I know which way you're gonna fall on that one, but, uh-

Ewa Szyszka: Let's, let's go. Um, hopefully you won't forget that [00:36:00] one.

Demetrios: Yeah.

Ewa Szyszka: So, um, yeah, I mean, Jenny showed what's, like, the benefit of, of memory and, uh, dealing with all of that. But, um, I think memory is also polluting our context window a lot. So, uh, forgetting is actually a massive and really interesting topic of how do we even decide.

Ewa Szyszka: Um, being vegetarian might not be relevant when you file your taxes, but on your next session you might be trying to optimize your diet, and that information will be very important. Um, so this is the world of decay functions and also relevance feedback. So agents, um, have this nice thing that you can iteratively, um, tell them what exactly in that particular search qu- query is relevant.

Ewa Szyszka: Um, and you can boost that. So this is the ability where I think vector search actually really shines because you can, uh, [00:37:00] figure out for that particular session what information should I include in my decay function and just forget, and that can be based on a timestamp or that could be filtered by, um, keywords, et cetera.
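One way to picture the decay-plus-boost idea is a re-ranking step applied to search hits. The sketch below assumes an exponential decay with a 30-day half-life and a fixed 1.5x keyword boost; both numbers, the function names, and the sample hits are illustrative choices, not values from the talk.

```python
import math


def decayed_score(similarity: float, age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential time decay: the score halves every `half_life_days`."""
    return similarity * math.exp(-math.log(2) * age_days / half_life_days)


def rerank(hits, session_keywords, boost=1.5):
    """Apply time decay, then boost memories that mention a keyword relevant to this session."""
    scored = []
    for hit in hits:
        score = decayed_score(hit["similarity"], hit["age_days"])
        if any(keyword in hit["text"].lower() for keyword in session_keywords):
            score *= boost
        scored.append((score, hit["text"]))
    return sorted(scored, reverse=True)


# Toy hits, invented for illustration.
hits = [
    {"text": "User is vegetarian", "similarity": 0.80, "age_days": 5},
    {"text": "User filed taxes in April", "similarity": 0.70, "age_days": 10},
]
ranked = rerank(hits, session_keywords=["taxes"])
```

In a tax-filing session the tax memory ends up first despite its lower raw similarity; with `session_keywords=["diet"]` neither memory is boosted and the vegetarian fact would lead again.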

Ewa Szyszka: Uh, and then there's another thing, which is some memories actually over time that you're putting into the sections, uh, into your sessions, might, uh, be duplicates. So that's another really important thing, is in order not to completely jam-pack your context window and actually get out of your engi- um, agent what you want, you can use vector search to de-duplicate those memories so you have very clean context window, and next time you're filing your taxes it's just that, and you can boost it and decay based on certain filters and scores the information that's not relevant for the session.
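De-duplicating memories with vector search can be as simple as a greedy similarity check. This is a minimal sketch: the 0.95 cosine threshold and the three hand-written vectors are assumptions for illustration, and a real system would take the vectors from its embedding model.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def deduplicate(memories, threshold=0.95):
    """Greedy near-duplicate removal: keep a memory only if it is not too similar to one already kept."""
    kept = []
    for text, vector in memories:
        if all(cosine(vector, kept_vector) < threshold for _, kept_vector in kept):
            kept.append((text, vector))
    return kept


# Toy memories with hand-written vectors.
memories = [
    ("User is vegetarian", [0.9, 0.1, 0.0]),
    ("The user doesn't eat meat", [0.89, 0.12, 0.01]),
    ("User likes doing taxes", [0.0, 0.2, 0.95]),
]
unique = deduplicate(memories)
```

The second memory is dropped because it sits almost on top of the first in embedding space, which keeps the context window from carrying the same fact twice.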

Demetrios: That's awesome. Neil, I feel like you had some things to say.

Neil Kanungo: Yeah.

Demetrios: I [00:38:00] haven't talked to you in a while.

Neil Kanungo: Yeah. Um, I don't have anything insightful. I just wanted to um, hand over to Dylan with a little bit of context. We're talking about memory and forgetting and search, and we, we weren't planning on showing this demo on...

Neil Kanungo: But since there were requests for demos, we'll go ahead and show it. Um, this is a demo Dyma- Dylan's gonna show on actually on-device search. And so when we think about memory and forgetting in AI, like physical AI is becoming a really interesting area, and how agents and AI on physical devices need to be able to remember certain context, um, be able to use that context, and then how humans can interact with that is all something really cool that Dylan can, uh, can show.

Neil Kanungo: So yeah, Dylan, handing over to you.

Dylan Couzon: Absolutely. Thank you. And yeah, so you know, we're kind of like steering away from like the agentic topic. [00:39:00] Uh, but this is the good thing about vector search is that it's not only limited to, to RAG and agentic. And so we're, today we're, uh, so I'm gonna show- showcase you a project that I built for a robotic use case.

Dylan Couzon: Um, so can you guys see my screen okay?

Demetrios: Not yet. Not yet. Hold on.

Dylan Couzon: Without you.

Demetrios: Let me do my job. There we go. Now. All right.

Dylan Couzon: Perfect. Um, so this is like, just like the, the GitHub project. It is completely, uh, public and free and open source, so y- if you guys want to, to run this yourself. Um, so today we're gonna run it on like a prerecorded video, but it works with like any, any kind of input.

Dylan Couzon: You can connect your, your camera, and it will start, uh, working live.

Demetrios: Can you make it a little bigger? Y-

Dylan Couzon: yes, absolutely. There we go. Now we're getting there. All right. Um, so basically we're starting with zero. Uh, that agent knows nothing about the world. It does not [00:40:00] have, like, a database of, like, labels of what i- items looks like, of what is a chair, what is a floor lamp, what is a coffee table.

Dylan Couzon: And so we have, like, three kinds of models that run in parallel. So it's all, all running on device. It's all running locally. Um, you know, if I turn off my network right now, you would lose me, but I would not lose, uh, th- that product. And so basically what- what's happening here is that we have first a, an image recognition, uh, an object recognition model that is called YOLO, and then a second model that is an image-to-text model, and this model basically creates a label and also a description for each item that it sees.

Dylan Couzon: And then what we can do is that basically we can do semantic search on every item that, uh, the robot has seen before. And so, you know, it started with, like, a completely clean memory. When I s- you know, when I open up that, that app, that [00:41:00] robot has never seen anything before, and now it's able to recognize every single object and memorize them and then recall when and where it saw those objects.
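The memory side of this pipeline (start empty, store a label and description per detected object, then recall when and where it was seen) can be sketched as follows. Everything here is hypothetical: the detection and captioning models are not included, the `room` field is an invented stand-in for location, and `recall` uses word overlap where the demo uses semantic search over embeddings.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    label: str        # short label from the captioning model, e.g. "mirror"
    description: str  # longer caption used for search
    frame_index: int  # when the object was seen
    room: str         # where it was seen (hypothetical field)


class RobotMemory:
    """Starts empty and grows as the robot sees objects."""

    def __init__(self):
        self.observations = []

    def remember(self, observation: Observation) -> None:
        self.observations.append(observation)

    def recall(self, query: str):
        """Stand-in for semantic search: rank by word overlap between query and description."""
        words = set(query.lower().split())
        scored = [
            (len(words & set(o.description.lower().split())), o)
            for o in self.observations
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [o for score, o in scored if score > 0]


memory = RobotMemory()
memory.remember(Observation("mirror", "a large mirror on the hallway wall", 30, "hallway"))
memory.remember(Observation("coffee table", "a low coffee table near the sofa", 75, "living room"))
memory.remember(Observation("mirror", "a round mirror above the bathroom sink", 120, "bathroom"))

found = memory.recall("mirror")
```

Each hit carries its frame index and room, which is what lets the robot answer "when and where did I see a mirror" rather than just "have I seen one".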

Dylan Couzon: So, you know, the applications for roboti- robotics are pretty, pretty much endless. And here, basically you can... So, you know, this graph is a 2D representation of the actual embedding space. And so you can see the m- uh, the robot, the robot's brain being built in real time, and you can see, uh, all the memory of basically all the concepts that are close together.

Dylan Couzon: So you see, like, all the hallways are grouped together. Here we can see, like, there's, like, the dining table and chairs that is grouped together. Here, you know, we can see that we, we went into the bathroom so that m- those memories are pretty far away in the embedding space. And then you can really re- recall, um, you know, everything that you has, that you have seen before, [00:42:00] and the, the robot will be basically to, uh, able to recall every single mirror that he has, uh, seen.

Dylan Couzon: And yeah. And so, you know, this is just to... So, you know, we're, we're not selling, uh, embedding models. We're not selling vision r- uh, uh, models. We're really just that memory layer. We're just, we have a model that creates embeddings ba- based on those image and text, and we allow for very quick search and recall on those memories.

Demetrios: That's so awesome. Whose house is that? Is that your house?

Dylan Couzon: Oh, I wish.

Demetrios: I was gonna say, you negotiated a pretty good salary if that is your house.

Dylan Couzon: No, that, that was just, you know, like, an online tour from, like, a, a leasing agent or a realtor.

Demetrios: Oh, that's awesome. That is so cool. The fact that it's on device too with these, the [00:43:00] small models, is really impressive.

Dylan Couzon: Um, yep. So, so you know, this was, like, a way to demo, like, Qdrant Edge, which is the embedded version of Qdrant, and this version can run on a Raspberry Pi, and not even, like, the top-of-the-line newest one. I have one from, like, five years ago that runs it just fine.

Demetrios: Hmm. Awesome. Now, there was a question that I said in the chat I was going to ask, and I totally skipped over it, um, I told folks.

Demetrios: So we're- I wanna get to it. It's, um, asking about vector databases still being the preferred choice for long-term memory, or are knowledge graphs and structured memory stores gaining traction?

Neil Kanungo: Um, I, I would say, um, from what I'm seeing, it's not an either/or, but a both. Um, [00:44:00] where, where th- they both have their uses, um, in different contexts, but, you know, like, dense, dense vectors, dense embeddings are gonna give you a lot of like semantic meaning, but like vector stores are not really the best for like relationships between, uh, those different entities.

Neil Kanungo: And so using like we, we have a great partnership with Neo4j, um, and, uh, there's a video I put in the private chat, maybe you can sh- share Demetrios, but like GraphRAG and using knowledge graphs with vector, uh, stores together, uh, can make a really good substrate for memory overall. And just to kinda like shout out a couple other, um, companies doing this is like Cognee and Mem0.

Neil Kanungo: Um, Cognee has, um, they use, um, both like transactional data or, well, they use like transactional data. They use, uh, vector data [00:45:00] stores, then they use, uh, knowledge graphs all together. So you can check those out too.
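The "both, not either/or" point can be illustrated with a two-step lookup: vector similarity finds the entry entity, then an explicit graph supplies the relationships that dense vectors do not capture well. The entities, vectors, and the `LOCATED_IN` relation below are toy data, and `hybrid_lookup` is a hypothetical helper, not an API of Qdrant, Neo4j, or any library named in the talk.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Vector side: one toy embedding per entity.
entity_vectors = {
    "coffee table": [0.9, 0.1],
    "living room": [0.2, 0.9],
    "floor lamp": [0.7, 0.4],
}

# Graph side: explicit relationships between entities.
edges = {
    "coffee table": [("LOCATED_IN", "living room")],
    "floor lamp": [("LOCATED_IN", "living room")],
    "living room": [],
}


def hybrid_lookup(query_vector, top_k=1):
    """Step 1: vector search picks entry entities. Step 2: graph expansion adds their relationships."""
    ranked = sorted(
        entity_vectors,
        key=lambda entity: cosine(query_vector, entity_vectors[entity]),
        reverse=True,
    )
    results = []
    for entity in ranked[:top_k]:
        for relation, neighbor in edges.get(entity, []):
            results.append((entity, relation, neighbor))
    return results


facts = hybrid_lookup([1.0, 0.0])
```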

Demetrios: Nice. Uh, a question came through about the demo, and it was mainly about how detailed all of these captions are.

Demetrios: Like, are you getting things down to the level of types of material and style? Or is it just coffee table?

Neil Kanungo: For Dylan.

Dylan Couzon: Sorry, can you repeat that question?

Demetrios: You were looking at the chat, huh? You were, were busy or you were rewatching the demo thinking, "Damn, this is a really good product that I made."

Demetrios: Pat yourself on the back, man. He's like a really cool house. H- h- how do you know me so well? That's well played. So, so basically, how much detail do you get from those different captions or whatever it's being labeled? Is [00:46:00] it going down to the like, hey, this is a postmodern coffee table made out of glass, or is it just like coffee table?

Dylan Couzon: Yep. Uh, so it really depends on like, you know, the image to text model, uh, that you, you want to use. Uh, so I use, uh, SigLIP 2, if I remember correctly, which is a very like, you know, basic like open source model that can, that can run on your, on your device. And so that model will usual- And I did not try to ask for much in-depth definition.

Dylan Couzon: Um, you know, I was just looking for coffee table, table here. So maybe that model has the capability to have a more in-depth definition. Uh, but if not this one, I'm sure that, um, you know, image to text models are very good these days. And if you want a really like detailed, um, explanation of, or like description of the object, I'm sure this is something that can be done. [00:47:00]

Demetrios: Nice. So we've got, um... Uh, I wanna bring us back a little bit to the topic du jour, which is agentic retrieval. We were going hard on that, and the chat was loving it, and then we also, uh, the chat was asking for some demos, and we got some cool stuff. But there's a lot that we can talk about still with agentic retrieval, and so maybe we can veer in that direction.

Demetrios: Neil, I feel like you have some things you wanted to say.

Neil Kanungo: Yeah, sure. Um, I can, you know, I, I think we got, like, roughly 10 minutes left, and so, um, as we center back on agentic retrieval, some general things that I think the audience should, you know, some ground-setting, ground-leveling things that I think the audience should all be aware of [00:48:00] with vector search is that, um, even whether it's pertaining to retrieval, uh, agentic retrieval or just any retrieval, it's not only semantic search, and this goes back to the Neo4j and GraphRAG kind of question, too.

Neil Kanungo: But, um, vector search is really vectors and embeddings are gonna be the highest density format in which you can store information. You're taking, in dense embeddings, you're taking tons of unstructured data, and yes, you're storing that into a fixed dimensional, uh, vector, but there's also sparse embeddings.

Neil Kanungo: There's also-- There's different embedding models. All vectors are is a format, and you, as you use different embedding models for images and videos and memory conversations and all this, like, embeddings are just a way, they're a vehicle for this. And we can do keyword search and lexical search, and we can do all these different exotic things or maybe [00:49:00] exotic's the wrong word, just advanced skilled ways to find the needles in the haystack.
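When dense, sparse, and keyword searches each return their own ranking, the results need to be merged. The talk does not name a fusion method; reciprocal rank fusion (RRF) is swapped in here as one widely used option, with the conventional constant `k=60`. The document ids and both input rankings are invented.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of ids: each list contributes 1 / (k + rank) per document."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


dense_ranking = ["doc_a", "doc_b", "doc_c"]    # e.g. from dense (semantic) search
keyword_ranking = ["doc_b", "doc_d", "doc_a"]  # e.g. from sparse or lexical search
fused = reciprocal_rank_fusion([dense_ranking, keyword_ranking])
```

Documents that appear near the top of both lists rise, so `doc_b` (second in dense, first in keyword) edges out `doc_a` (first in dense, third in keyword).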

Neil Kanungo: And, um, that becomes really important for agentic retrieval because just centering back on agentic retrieval, you want to have access to information the most efficient way. And we, we have looked at, um, file search and other approaches, and there are places for file search. There is not necessarily you must use vector search for everything.

Neil Kanungo: Um, you must set up an embedding model, and you s- you must set up your chunking. You must do that for everything. It may not make sense in certain scenarios. But when you-- As we start collecting more data and our, our agents become more sophisticated, our, um, their search tools become better, I think that, um, vector search has a really strong place there.

Neil Kanungo: And I also, like, intentionally use that term vector search instead of vector database because, um, you want a [00:50:00] system that optimizes for not just storing and, um, not just storing and having like a blind retrieval method to get your vectors out. You wanna actually have something that is focused on the retrieval side of the equation, the search side of the equation.

Neil Kanungo: So like that's why at Qdrant we consider ourselves a vector search engine. Um, but yeah, I'll stop kind of rambling there. I don't know if anyone else wants to add thoughts to, um, where they see vector search fitting into agentic retrieval overall.

Ewa Szyszka: I can jump in on the scale. So, uh, Neil mentioned how, like, we're operating at millions or billions of, of vector scale, and search will certainly get much, much bigger. So early search primitives, like the BM25s of, of the world are already five orders of magnitude smaller in terms of memory that they consume on the [00:51:00] machine, uh, with the latest one.

Ewa Szyszka: And, um, another research that I wanted to share with you comes from, uh, the group, research group called SID 1. Um, they have a really interesting approach, and they think that reinforcement learning, uh, for agentic retrieval is the way to get to that next five order of magnitude efficient, um, agentic retrieval when the context window gets bigger, when the number of vectors that we have to parse through gets much, much bigger.

Ewa Szyszka: Uh, so I'll share the link so you can read, um, into their approach, uh, and why reinforce- reinforcement learning exactly.

Demetrios: Yeah. I know, Jenny, you had mentioned that earlier too. Maybe, um, Ewa, can you talk a little bit more to that? Because I like this idea. We were talking about RL too for evals and being able to create these, uh, environments which are [00:52:00] super trendy these days, I guess you could say.

Demetrios: And so you have these environments, and then you can eval them, but the RL for retrieval, what does that even... Like, break that down for me a little more.

Ewa Szyszka: Sure. So what I got out of that paper is that the TLDR is basically they're using a multi-turn RL, and it's a mixture of synthetic and real questions. So Jenny mentioned before the golden dataset, and the reinforcement learning approach would be not that we have, like, human-generated golden dataset, but it is the environment where you learn, you interact with over time, um, with some real questions that resemble that golden dataset, and some of them are completely, um, AI generated.

Ewa Szyszka: And there needs to be a reward system, uh, designed in place that rewards every single time that our agentic [00:53:00] system hits the correct answer and punishes it whenever, um, the learning step was not made correctly. Uh, so the paper outlines how you can design those, but, uh, I, I won't be diving too much into detail there.

Ewa Szyszka: So that's the, um, that's the high-level overview of, uh, what exactly they're doing there.
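The reward idea (reward a correct answer, punish an incorrect step) can be made concrete with a toy episode reward. This is not the reward design from the paper, which the talk does not spell out; the +1/-1 values and the per-turn penalty are assumptions added to illustrate how a multi-turn search episode might be scored.

```python
def episode_reward(retrieved_ids, relevant_ids, num_turns, turn_penalty=0.05):
    """+1 if any relevant document was retrieved, -1 otherwise, minus a small cost per search turn."""
    hit = any(doc_id in relevant_ids for doc_id in retrieved_ids)
    base = 1.0 if hit else -1.0
    return base - turn_penalty * num_turns


# A successful two-turn episode and a failed four-turn episode (toy ids).
good = episode_reward(["d7", "d2"], relevant_ids={"d2"}, num_turns=2)
bad = episode_reward(["d9"], relevant_ids={"d2"}, num_turns=4)
```

The per-turn cost is one hypothetical way to push the agent toward finding the answer in fewer searches; the relevant ids could come from either real or synthetic questions.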

Demetrios: Yeah. That is so cool to see, and I feel like there's so much potential there. The thing that, uh, I always wonder, and I also see that folks in the chat are asking this, um, it's more along the lines of like, "Hey, I've got this agent.

Demetrios: How do I either set up a framework or encourage it to do different kinds of searches?" or search techniques at different times. So I wanna do some kind of semantic search when there's XYZ. Like, is there a framework? Is there a way that, i- [00:54:00] is it a skill that we were talking about earlier that you can say like, "Hey, agent, here's...

Demetrios: Whenever you need to retrieve something, here's what you should go through"? And that feels like probably the easiest win. I see, Jenny, you're shaking your head. You might have some thoughts.

Evgeniya Sukhodolska: No, because I think it's exactly, you answered the question. There are two ways, right? There is, like, the way, zero shot way.

Evgeniya Sukhodolska: You teach it procedurally, which is, like, the set of skills, the set of search skills, and it has its pros because it's easy enough and you can adapt it easily enough because it's just writing, uh... I mean, it's not just, but it's composing the set of instructions, and we kinda know how to approach that. And the second branch that we are seeing now emerging is to actively teaching agents to search through these reinforcement learning environments.

Evgeniya Sukhodolska: I don't see now easy ways of any practitioner to do that, but I feel like [00:55:00] that might be the next thing in the future because everything becomes more accessible now, right? Uh, so maybe we will get our RL gyms. Do you see my bi- biceps? Yeah. I'm saying the RL gym. But we will get them soon where you can actually give the set of what you wanna achieve and the set of the search tools and your problem, and it will converge to the, like, the small search agent will be able to converge to a path which actually is, well, what you wanna see in your specific pipelines.
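The first, procedural route can be pictured as a written skill plus the decision it leads to. In the sketch below, `SEARCH_SKILL` is an invented example of such instructions and `choose_search_tool` is a rule-based stand-in for the choice an agent would make after reading them; the tool names and regular expressions are assumptions, not part of any framework mentioned in the talk.

```python
import re

# Hypothetical skill text an agent could be given as instructions.
SEARCH_SKILL = """When you need to retrieve something:
1. If the query contains an exact identifier or a quoted phrase, use keyword search.
2. If the query restricts results by date, use filtered search.
3. Otherwise, use semantic search."""


def choose_search_tool(query: str) -> str:
    """Rule-based stand-in for the decision an agent would make after reading SEARCH_SKILL."""
    if '"' in query or re.search(r"\b[A-Z]{2,}-\d+\b", query):
        return "keyword"
    if re.search(r"\b(before|after|since)\s+\d{4}\b", query.lower()):
        return "filtered"
    return "semantic"


tool_for_id = choose_search_tool("find ticket ABC-123")
tool_for_date = choose_search_tool("notes since 2024 about retrieval")
tool_default = choose_search_tool("how do agents forget things")
```

Because the skill is just text, adapting it means editing the instructions, which is the ease-of-change advantage mentioned for this route; the RL route would instead learn the routing from rewards.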

Evgeniya Sukhodolska: Um, we're gonna stay and watch that because, uh, we represent kinda the default layer that humans or agents build upon. I wouldn't discard humans still from all of this picture, by the way, because I'm human. So I still, I still sometimes want to search stuff myself. So I think it's just important to have the tooling which will be usable by all the [00:56:00] categories of the search users.

Evgeniya Sukhodolska: I think it was a tautology, but you get me, I guess. Mm. And I think what we were also trying to say that, um, vector search is also so much more than just retrieving you a semantically similar fact. I feel like we kind of fell into this trap of the RAG cage, where you just think about it as the machine that spits out you the text chunk based on your question.

Evgeniya Sukhodolska: But I think there are other emerging interesting parts where it could be used for anomaly detection, data analysis at scale, image-to-text search, videos to whatever, audios, so all of that stuff. And at some point, agents will be also able to use it 3D, and I'm really looking forward to see how they're going to approach this part of vector search.
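Anomaly detection is one of the non-RAG uses that needs nothing beyond nearest-neighbor search. A minimal sketch, assuming a simple scoring rule (mean distance to the k nearest neighbors) and five hand-written 2D vectors; real embeddings would have many more dimensions and the neighbors would come from an index rather than a full scan.

```python
import math


def knn_anomaly_score(point, others, k=2):
    """Mean Euclidean distance to the k nearest neighbors; larger means more anomalous."""
    distances = sorted(math.dist(point, other) for other in others)
    return sum(distances[:k]) / k


# Four points in a tight cluster and one far away.
vectors = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]]
scores = [
    knn_anomaly_score(vector, vectors[:i] + vectors[i + 1:])
    for i, vector in enumerate(vectors)
]
most_anomalous = scores.index(max(scores))
```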

Demetrios: Well said. I think that is a perfect way to wrap it up. I [00:57:00] know that there are still folks in the chat that are stoked and asking questions. I will just mention that everyone who is in the chat and watching this right now, on your left-hand sidebar, there is a button that you can press that says Match. If you go there and more than two people do it, you will randomly get put together with somebody else that is watching this.

Demetrios: So you can meet someone that is just about as crazy on retrieval as we are, and you can talk to somebody that is watching this too. So it's a great way to bond with the rest of the community if you wanna stick around for more time. But for this session, I think we are gonna wrap. I know that there are some great places the Qdrant team hangs out.

Demetrios: You all have an awesome Discord, so I'll drop a link for that in the [00:58:00] chat. And then of course, if you weren't, um, already bought into w- everything that Qdrant's doing and following these good folks that are here with us today, you should definitely do that. Go and give Qdrant a star if you haven't already on GitHub, and stay up to date with what all their amazing DevRel is doing.

Demetrios: Follow them on LinkedIn and X and all the places. So thanks everyone for this excellent session. I will bid you farewell. And for those that wanna hang around, click the Match button on the left-hand side bar, and I'll see y'all later. Thank you.

Neil Kanungo: Thanks everyone. Thank you. Yeah, thanks. Cheers. Bye.
