MLOps Community

Before Building AI Agents Watch This (Deep Agent Expertise)

Posted Sep 05, 2025
# Context Engineering
# AI Engineering
# Prosus Group

SPEAKERS

Nishikant Dhanuka
Senior Director of AI @ Prosus Group

Data Science Leader with 15 years of experience in the field of AI. Passionate about building products that are not just innovative and AI-focused, but also customer-centric. Having led and been part of various AI/engineering teams over the years at some of the best organizations, I'm a strong advocate for velocity while also following high-standard engineering best practices. Recently I've found myself completely immersed in Generative AI. With the rapid pace of development in the field, I am excited about the potential it holds. Always open to connecting with professionals who share a similar passion for AI and Data Science.

Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.


SUMMARY

Nishikant Dhanuka talks about what it really takes to make AI agents useful—especially in e-commerce and productivity. From making them smarter with context (like user history and real-time data) to mixing chat and UI for smoother interactions, he breaks down what’s working and what’s not. He also shares why evals matter, how to test with real users, and why AI only succeeds when it actually makes life easier, not more complicated.


TRANSCRIPT

Nishikant Dhanuka [00:00:00]: People talk a lot about the system prompt, tweaking the prompt, or choosing a different model: what's the best model out there? One of our hard-earned lessons is that I spent so much time on context engineering with the team. Let me explain what context means. In my role I build AI agents in these different industries, and these agents are used by 2 billion customers around the world.

Demetrios Brinkmann [00:00:24]: Talk to me about what agents you've been building.

Nishikant Dhanuka [00:00:28]: So in general there are two classes of agents that I've been working on. One is agents for productivity. Internally we built a tool called Toqan. You know about it. This tool is used by 15,000 users internally at Prosus. These are people from finance, designers, product managers, engineers, and people use it for different things. We started building this tool three years back. The technology has evolved so much.

Nishikant Dhanuka [00:00:51]: Initially we did a lot of things to make the LLM work, so there was an intent detection step at the beginning and so on. It blows my mind that now you don't need these things; the model itself is so much better. So one category is agents for productivity: Toqan. The second is agents for e-commerce. There I have experience building agents for an online shopping assistant. OLX is one of our portfolio companies.

Nishikant Dhanuka [00:01:19]: So with OLX we built a shopping assistant. The idea is that it helps understand the user intent. You can say, I want the latest headphones, or, I'm going for a hiking trip and I don't know what to buy, help me. It can take very broad requests from the user, understand them, and connect to the OLX catalog. So that's one example. And currently I'm working on a project within our food delivery business where we are reimagining how people will order food in the next year or two. Usually, if you think about how people order food right now, they go to the search bar.

Nishikant Dhanuka [00:01:59]: You enter "burger", right? But how about this: you can also go in and say, I want to have a romantic dinner with my wife, and then it understands you, it connects to the catalog, and so on.

Demetrios Brinkmann [00:02:10]: Okay, so we've got four themes that we want to touch on.

Nishikant Dhanuka [00:02:14]: Yeah.

Demetrios Brinkmann [00:02:15]: First one which is on everyone's mind is the context engineering piece. You've got some takes on that.

Nishikant Dhanuka [00:02:20]: Yeah, totally. So Andrej Karpathy recently published a tweet where he said that he likes the term context engineering more than prompt engineering. I was reading it, and in my experience building these agents, I've spent so much time together with the engineering teams working on the context. The moment he published the post, I think the term context engineering went viral. When I think about context engineering, it reminds me of the days of data engineering and data science. I was a hands-on data scientist at some point in my career, and we always said that data science or AI is the shiny thing, but it's garbage in, garbage out if you don't work on your data. Context engineering for me is like that.

Nishikant Dhanuka [00:03:03]: We are doing a podcast. I see people talking on Twitter, on different podcasts, and people talk a lot about the system prompt, tweaking the prompt, or choosing a different model; what's the best model out there; recently about MCP, tools, and so on. But if you think about the latest models, for my use case it's not really about the model. If I use model A versus model B and both of them are state of the art, they already do a good job. What makes the difference is context: if A is with context and B is without context, then A will be much better than B. Let me explain what context means. Imagine a food delivery chatbot. Aside from people asking for food, when you put a chatbot in the wild, people ask about many different things. Imagine you want to order food. Free food.

Demetrios Brinkmann [00:03:55]: That's what I would instantly.

Nishikant Dhanuka [00:03:56]: Exactly, exactly. Different people care about different things. There are people who care a lot about free food, so cheap food, promotions. There are people who care about a particular coupon or their payment card. A lot of the time people come to our chatbot and say: okay, do you accept this payment card? Show me sushi on promotion right now. Or maybe it's breakfast time and I'm asking for pizza and none of the pizza places are open.

Nishikant Dhanuka [00:04:24]: And you don't know about my use case as a user. Maybe I'm an executive assistant already planning for lunch, or maybe I just want to have pizza. So in this case, for example, the opening and closing hours of a restaurant become the context. We have tried this before: if your agent is just good at searching for pizza but it doesn't know about the opening and closing hours, the promotions, the payment information, then when you put it in the real world it's not useful, because users care about a lot of these things. So one of our hard-earned lessons is that I spent so much time on context engineering with the team. I cannot stop talking about it.

Demetrios Brinkmann [00:05:10]: And I don't want to brush over. There's a few different places where it's a very difficult problem that you were just saying when it comes to ordering the food and the promotions, for example, and you were saying that it's not like there's a database that you can query that has all the most up to date promotions. So the data challenge in real time, knowing what promotions are live from what companies and then feeding that into the context window is where the real challenge is. And it's almost, I look at it a little bit like when you're crafting that prompt, you put in variables and getting the proper data into those variables is the unsexy job of the data engineer. Or it's more data engineering like than anything. But now I guess we're calling it context engineering.

Nishikant Dhanuka [00:06:06]: I think we all agree that most of the enterprise data, it might look good from outside, but it's messy. So unfortunately if you go in an enterprise, it's not like there's a database where you can say just give me everything on promotion. So a lot of data is real time. Maybe a restaurant is running a real time promotion during lunch from 12 to 3 on sushi. Right. It's very difficult for a database to have that. So a lot of that information is in real time databases. It's scattered over different databases.

Nishikant Dhanuka [00:06:33]: So the unsexy work here is that I have data engineers and engineers in the team who spend a lot of time connecting all of this. Imagine a user request comes in and they say, show me sushi on promotion. You kick in these data pipelines, you bring in the right context, you make it part of the prompt, and then the response you get is the one the user wants to hear.
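
The fan-out Nishikant describes, several scattered real-time sources merged into one record before prompting, can be sketched roughly like this. The source functions and their data are hypothetical stand-ins, not iFood's actual services:

```python
# Hypothetical stand-ins for the scattered real-time sources; in production
# each would query a different database or service.
def live_promotions(restaurant: str) -> list[str]:
    return ["20% off sushi rolls, 12:00-15:00"] if restaurant == "Sushi Place" else []

def opening_hours(restaurant: str) -> str:
    return "11:30-22:00"

def accepted_cards(user_id: str) -> list[str]:
    return ["Visa", "Mastercard"]

def gather_context(user_id: str, restaurant: str) -> dict:
    """Fan out to each source and merge the results into one record
    that can be injected into the agent's prompt as live context."""
    return {
        "restaurant": restaurant,
        "promotions": live_promotions(restaurant),
        "opening_hours": opening_hours(restaurant),
        "accepted_cards": accepted_cards(user_id),
    }

ctx = gather_context("u123", "Sushi Place")
```

The point is the shape, not the storage: each field answers one of the "context" questions from the conversation (is it open, is it on promotion, which cards work), and the merged record travels into the prompt as a unit.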

Demetrios Brinkmann [00:06:57]: And there's not a way to do this with search that is more simple.

Nishikant Dhanuka [00:07:02]: No. Search is the next topic I want to talk about, but search solves a different problem. For me, search is about when you're searching for dishes or restaurants: burger, vegetarian burger, pizza, or McDonald's. But when you're asking, does McDonald's have a promotion, does McDonald's accept this payment card, is McDonald's open right now: that is context. That's the difference for me between search and context. Context is made up of four things.

Nishikant Dhanuka [00:07:33]: In my opinion. I spoke about one of them. There's another one which I want to cover.

Demetrios Brinkmann [00:07:38]: One of them being the dirty data, getting the right context in.

Nishikant Dhanuka [00:07:41]: Yeah, it's composed of four things. Two of them are simple; everyone talks about them: the system prompt and the user message. I think we have heard enough about needing the best system prompt. The user message is the dynamic message the user sends, so you put that in the prompt as well. We already spoke about the third: bringing in the enterprise context, the dirty data pipelines. And the fourth piece of context, which is very important, is user history. This is where, for example, it's connected to memory.

Nishikant Dhanuka [00:08:08]: There's been a lot of discussion on long-term memory, short-term memory. The way I think about it is, there are so many GenAI agents and products out there. Everyone is building a shopping assistant, everyone is building a food ordering assistant. Even if I think about me as a user, I don't have loyalty to a particular product. I'm using ChatGPT today, I'm using another product the next day. For me, what creates stickiness to a product is if that product knows about me. So if I've been using ChatGPT for the last 15 days and now you give me a new tool, I would use it, unless ChatGPT knows so much about me that...

Nishikant Dhanuka [00:08:49]: Okay, I.

Demetrios Brinkmann [00:08:50]: The switching cost is high.

Nishikant Dhanuka [00:08:51]: Yeah, the switching cost is high, and that comes from memory. And that is also part of context: you put the user history as part of the context.
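
A minimal sketch of how the four layers named here (system prompt, enterprise context, user history, user message) might be assembled into one prompt. The layer order and wording are illustrative assumptions, not a described implementation:

```python
def assemble_context(system_prompt: str, user_message: str,
                     enterprise_context: str, user_history: str) -> str:
    """Join the four context layers into one prompt string."""
    return "\n\n".join([
        system_prompt,                           # 1. static system prompt
        "Live context:\n" + enterprise_context,  # 2. enterprise data pipelines
        "About this user:\n" + user_history,     # 3. user history / memory
        "User: " + user_message,                 # 4. the dynamic user message
    ])

prompt = assemble_context(
    system_prompt="You are a food-ordering assistant.",
    user_message="Show me sushi on promotion",
    enterprise_context="Sushi Place: 20% off rolls until 15:00; open 11:30-22:00.",
    user_history="Usually orders vegetarian dishes; pays by Visa.",
)
```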

Demetrios Brinkmann [00:09:00]: Yes, it's funny you mentioned that because somebody said when they were joining the community they were very interested in the problem set of being able to port that memory and that context from one provider to the next.

Nishikant Dhanuka [00:09:12]: Yeah, that's amazing. If you do that then you could switch, right? Because if you do that.

Demetrios Brinkmann [00:09:18]: But it's not like you can just hit like download CSV and then upload it to the next provider.

Nishikant Dhanuka [00:09:23]: I think a lot of data privacy issues come with that, because it contains a lot of information about you which you might or might not want to share. And one last point there. I'm sure you have seen these diagrams that people share about memory: long-term memory, short-term memory, episodic memory, and so on. Memory can be handled in many different ways. It's almost like, when you're having a conversation, maybe there's a model running behind the scenes noting facts, and that's a part of memory. But to be honest, one of the things that we tried, which is much simpler than any of this and works, is this: let's say you're building a shopping assistant or a food ordering bot and you already have a product. There's already OLX or iFood, so there are already users on OLX or iFood who haven't had a conversation yet.

Demetrios Brinkmann [00:10:17]: You've got that data, you got the data.

Nishikant Dhanuka [00:10:19]: You already know what food they ordered, you already know what items they browsed. You can easily use that as a simple cold-start context, and I have seen that create wonders when you launch your product. My advice here would be: when you launch, there are already so many things to solve, and your obsession should be product-market fit, not having the best technical solution out there. Just use the context that you have from your existing app and put that in the memory, in the context, or in the system prompt, and that already does wonders. That will give you a runway of three months. Then, when people start having more conversations and using your product more, you get dynamic memory. That's one of the lessons I've learned: the way people think about it from day one, it's complicated.

Nishikant Dhanuka [00:11:15]: There's a simple solution to cold start.

Demetrios Brinkmann [00:11:18]: Okay, so that is context engineering.

Nishikant Dhanuka [00:11:20]: Yeah, that is context engineering. That's it. Now, search. My experience has been building AI agents in e-commerce: shopping, food. And search is a fundamental tool when it comes to that. It doesn't apply to all the AI agents out there. If you're building an AI agent for the supplier side, for a car dealer or for a restaurant, search is not that important. But when you're building an e-commerce agent, search is the most fundamental thing.

Nishikant Dhanuka [00:11:55]: It's actually the start of the user journey. If you don't get the search right, let's say I search for burger and I get pizza, I drop off at that point. It breaks the trust. Maybe you have other tools to manage my cart and do other things, but I will never go further in the journey. So search is very important, and I spend a lot of time with the engineering team fixing search. Let me talk about a few things that we have learned. Most enterprise search is still keyword-based.

Nishikant Dhanuka [00:12:27]: And not to put it down, it's for the right reason, because keyword-based works. If I say burger, there's a taxonomy defined for burger: show me burgers. You don't need fancy semantic search to do that. So keyword-based works. But if you imagine people typing in a search bar versus people talking to an agent, to the chatbot you put out there, and especially when you have voice enabled, the way people express themselves to an agent is so different. The kinds of queries are so different. I already gave these examples, right? I want to have a romantic dinner with my wife. Press enter.

Nishikant Dhanuka [00:13:06]: Very, very broad. Or if I'm building a shopping assistant, we have seen people just go and say, I'm going for a hiking trip, I am a beginner, I don't know what to buy, help me. Or: help me furnish my house. These are the kinds of queries that you get, and you cannot serve them with keyword search. So we spent a lot of time thinking about, first, semantic search. And semantic search is not new. It has been around. It's search based on embeddings.

Nishikant Dhanuka [00:13:33]: For the search engineers out there, there are already standard ways of doing these things. Semantic search is hard; it's still hard to crack. And keyword search is still important. So most of the time the solution is something hybrid, where you get a query, you check if keyword search can answer it, and if not, you go to semantic search. It's not new, but it's difficult, and we spend a lot of time talking about it. Let me explain with an example.

Nishikant Dhanuka [00:14:02]: So, I'm vegetarian. When I say "vegetarian pizza", if I have keyword search, then I get items that have "vegetarian pizza" mentioned either in their title or description. But pizza margherita is vegetarian. Maybe it's not obvious to mention vegetarian in the title of a pizza; maybe the restaurant didn't mention it. And then all of a sudden pizza margherita will not show up. That is the limitation of keyword search.

Nishikant Dhanuka [00:14:28]: Now let's say you move to semantic search. This can be solved, because then you have embeddings, and the pizza margherita embedding is very close to the embedding of vegetarian. It's solved. Now come to my other example: romantic dinner. That itself cannot be solved. What is the embedding for romantic? Romantic is different for you, romantic is different for me. That cannot be solved by semantic search.
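
A toy sketch of the hybrid layering just described: try exact keyword matching first, and fall back to embedding similarity only when keywords find nothing. The two-dimensional "embeddings" are hand-made stand-ins for a real embedding model, chosen so the margherita example works out:

```python
import math

# Toy catalog with hand-made 2-D "embeddings"; a real system would use an
# embedding model and a vector index.
CATALOG = {
    "pizza margherita": (0.9, 0.2),
    "pepperoni pizza":  (0.8, 0.9),
    "veggie burger":    (0.3, 0.1),
}
QUERY_EMBEDDINGS = {"vegetarian pizza": (0.95, 0.15)}

def keyword_search(query: str) -> list[str]:
    """Exact keyword layer: every query token must appear in the item title."""
    tokens = query.lower().split()
    return [item for item in CATALOG if all(t in item for t in tokens)]

def cosine(a: tuple, b: tuple) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def semantic_search(query: str, top_k: int = 1) -> list[str]:
    """Embedding layer: rank catalog items by cosine similarity to the query."""
    q = QUERY_EMBEDDINGS[query]
    ranked = sorted(CATALOG, key=lambda item: cosine(q, CATALOG[item]), reverse=True)
    return ranked[:top_k]

def hybrid_search(query: str) -> list[str]:
    hits = keyword_search(query)                      # keyword layer first
    return hits if hits else semantic_search(query)   # fall back to embeddings
```

Here "vegetarian pizza" matches no title by keyword, so the fallback returns pizza margherita, while a plain "pizza" query never touches the embedding layer.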

Nishikant Dhanuka [00:14:51]: And we see a lot of these queries which are broad, ambiguous, very fuzzy. So this is where semantic search is only the first layer. We spent a lot of time building a search pipeline where, when these queries come in, there is a step before search and a step after search. The step before search we call query understanding, query personalization, query expansion, call it whatever, where again you use an LLM. You already have an LLM for the agent, but now you have search as one of the tools, and within that tool you have a pipeline where the first step uses an LLM to understand the query. So: romantic dinner. It understands.

Nishikant Dhanuka [00:15:30]: And maybe, if it has my user profile, it says this is what romantic means for Nishi, or what romantic means in general. The LLM says, okay, let's break it down into, I don't know, a cupcake is romantic, candles, and so on.

Demetrios Brinkmann [00:15:43]: Yeah, come on, cupcake.

Nishikant Dhanuka [00:15:45]: What the hell?

Demetrios Brinkmann [00:15:46]: What kind of romance are you talking about?

Nishikant Dhanuka [00:15:49]: We have different definitions of romance, apparently. I mean.

Demetrios Brinkmann [00:15:56]: Out of all the ways that I would describe romantic dinner, cupcake was not one of them.

Nishikant Dhanuka [00:16:02]: I hope my wife is not listening to this, but yeah, guilty as charged. So anyway, there's a query understanding step which breaks down your query, maybe into multiple queries. Then you run it through your search pipeline: keyword search, semantic search. Then there's a re-ranking step where again you use an LLM, because initially you get a lot of candidates. Re-ranking is an old concept from the machine learning world; we had these algorithms, LTR, learning to rank, and so on. Those algorithms are still important in the new world of LLMs, but in my experience I've seen them fail on these new kinds of queries where you have a lot of context from the user. So there you can have another re-ranking layer where again you use an LLM. You say: this was the query from the user.

Nishikant Dhanuka [00:16:56]: These are the 1,000 candidates from the first two steps, this is the context about the user, now re-rank them. And then you get the three or ten options that you present to the user. So that's what our typical pipeline looks like. And I can't stress it enough: it sounds simple, but search is something that has haunted me in each of my projects. It's difficult, it's messy.
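
The three-stage pipeline, query understanding, retrieval, then LLM re-ranking, can be sketched with stand-in functions in place of the LLM calls and the search index. The expansions, catalog, and user-profile fields here are invented for illustration:

```python
def understand_query(query: str, user_profile: dict) -> list[str]:
    """Stand-in for the LLM query-understanding step: expand a fuzzy
    request into concrete sub-queries."""
    expansions = {"romantic dinner": ["pasta", "dessert", "wine"]}
    return expansions.get(query.lower(), [query])

def retrieve(sub_queries: list[str]) -> list[str]:
    """Stand-in for the keyword + semantic retrieval layers."""
    index = {"pasta": ["truffle pasta"], "dessert": ["tiramisu"], "wine": ["house red"]}
    candidates = []
    for q in sub_queries:
        candidates.extend(index.get(q, []))
    return candidates

def rerank(query: str, candidates: list[str],
           user_profile: dict, top_k: int = 3) -> list[str]:
    """Stand-in for the LLM re-ranking step: order the candidate pool
    using user context (here simply dropping disliked items)."""
    disliked = set(user_profile.get("dislikes", []))
    kept = [c for c in candidates if c not in disliked]
    return kept[:top_k]

def search_pipeline(query: str, user_profile: dict) -> list[str]:
    subs = understand_query(query, user_profile)      # step before search
    candidates = retrieve(subs)                       # keyword + semantic
    return rerank(query, candidates, user_profile)    # step after search

results = search_pipeline("romantic dinner", {"dislikes": ["house red"]})
```

In production each stand-in would be an LLM call or a real index query, but the control flow (expand, retrieve per sub-query, re-rank the merged pool with user context) is the pipeline described above.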

Demetrios Brinkmann [00:17:18]: It feels like a new paradigm of search too, or just like a building on what was already there and trying to leverage the new technology of LLMs.

Nishikant Dhanuka [00:17:29]: Yeah, totally. Now, with all the advancements in LLMs and generative AI, you have things like agentic search, LLMs helping search, LLMs augmenting search. You're not replacing search; semantic search is still important, but you do things before and after it. That's the new paradigm. So you're right, search is really being reinvented with LLMs. Very few people talk about it, but if you're building an e-commerce agent, search is fundamental and it's very important to crack.

Demetrios Brinkmann [00:18:02]: Yeah, let's talk UI.

Nishikant Dhanuka [00:18:06]: Let's talk UI first.

Demetrios Brinkmann [00:18:07]: Let's talk UI.

Nishikant Dhanuka [00:18:07]: Let's talk UI. Okay, so again, I have launched many agents, many GenAI experiences, and typically the way we launch them is we A/B test. And I have burned my hands so many times. There's a lot of pain there. I'll give you an example. We want to build an online shopping assistant. The first thing we do is say, let's build a ChatGPT for everything, right? Let's build a conversational experience.

Nishikant Dhanuka [00:18:36]: We build that, we test it internally. I tested it. I spent long hours at night. It's super interesting. It knows you. It's connected to the catalog. You can say, I want to furnish my house, and it shows you furniture.

Nishikant Dhanuka [00:18:50]: It shows you chairs, sofas, separate sections, and so on. It's beautiful. It works. You're so excited that you're also so...

Demetrios Brinkmann [00:18:56]: Benevolent when you're using it, right. You aren't thinking, like, how can I prompt this to give me free furniture? You're thinking, how can I prompt this to get the right furniture? You also know what prompting is, so you probably explain it differently.

Nishikant Dhanuka [00:19:09]: Totally.

Demetrios Brinkmann [00:19:10]: I'm already seeing how this is going to fail miserably.

Nishikant Dhanuka [00:19:13]: Yeah, totally. So we are so excited, we think we've built the best product out there, we launch it, we A/B test it, and it really falls flat on its face. I'm refreshing the results and it's terrible. It's not even bad. It's terrible.

Nishikant Dhanuka [00:19:33]: Then you think, you know, but it's.

Demetrios Brinkmann [00:19:34]: Terrible because nobody's using it, or...

Nishikant Dhanuka [00:19:36]: The way they're using it. No, people are using it, but they...

Demetrios Brinkmann [00:19:38]: Don't like it and they're giving you that feedback right away, or you don't...

Nishikant Dhanuka [00:19:43]: You don't immediately get feedback, but you can see them bounce off of it. You can measure the conversion numbers. That's the magic of A/B testing, right? You have a conversion number on A, you have a conversion number on B, and you can compare them.

Demetrios Brinkmann [00:19:55]: Something's broken in this data.

Nishikant Dhanuka [00:19:58]: Yeah. You look at the agent and everything is amazing. It's still amazing. So why does it not work? And this has happened many times. I'm talking about failures here; it's also important to embrace failures. My learning, and I'm a technologist, not a designer or a user researcher, so I learned it the hard way, is this.

Nishikant Dhanuka [00:20:23]: There are two dimensions. One is technology, the second is user adoption. Technology: I feel that technology today is more advanced than the use cases. It was not always like that. Four years back I tried to do a project where I felt I was so ambitious with my idea, but the technology was not ready. These days, whenever I try to do a project, I feel the technology is a few steps ahead: how can I use it? That's the problem.

Nishikant Dhanuka [00:20:48]: So that's technology. But user adoption is not moving at the same pace as technology. Think about it. We all use ChatGPT. It's been three years now and I think it's been internalized. People are now understanding, we're learning new...

Demetrios Brinkmann [00:21:02]: Ways to use it. Every day we're practicing saying, oh, this actually I should ask ChatGPT instead of just doing what I normally would do, like ask a friend or ask a doctor.

Nishikant Dhanuka [00:21:12]: Yeah. And you and me, we are probably a biased sample; we are both in this field and maybe we are in a bubble. But even my daughter uses it, my mom does something with it, and when we go into social circles with people who are not in technology, I see it has already penetrated outside. So I won't say that it's hype or just a bubble. It's going.

Nishikant Dhanuka [00:21:34]: So chatbots for productivity, chatbots for these kinds of general use cases, they're a thing in the world we live in today, and more people know about them. But think about it. Do you use a chatbot to buy stuff? I don't. Do you use a chatbot to order food? I don't. And imagine, it's been three years. It's surprising.

Nishikant Dhanuka [00:21:57]: It's surprising, you know. And there are so many demos, people are talking about it. We are at the race summit, the hackathon; people build this stuff. Booking.com, there are these assistants, travel planners, all these ideas in our heads. But as a user, I eat food, I shop for things, I book flights, I book hotels.

Nishikant Dhanuka [00:22:17]: And so far, and I'm in this field, I'm obsessed with this technology, I haven't done a single one of these through a chatbot.

Demetrios Brinkmann [00:22:24]: And you think that's a UI problem?

Nishikant Dhanuka [00:22:26]: I think it's user adoption. It's a UI and user adoption problem.

Demetrios Brinkmann [00:22:30]: It feels like the UI inadvertently introduces so much friction. And it's not the way that we're used to doing things on the Internet for our shopping experience that we say, you know what, it's easier to do it the way that I know how to do it.

Nishikant Dhanuka [00:22:47]: Yeah, exactly. So after we failed, we did a bunch of user research. By the way, I have a newfound respect for designers and user researchers. I've worked closely with designers and user researchers over the last few years, and I think I understand this field much better now; a lot of respect. In fact, these days when we put together a project, you need a designer and a user researcher, because technology is only one side; in the end, you want to solve a user problem.

Nishikant Dhanuka [00:23:18]: Right. So after the test failed, we did user research: we did surveys, we called in some users. Some of our learnings. One: if you're the user and I give you a new way of doing things, that's friction. You're familiar with the UI, you use it every day, and now I give you a new UI. You will not use it naturally. You will use it only if it's really solving a fundamental problem for you, if it's something fundamentally different, maybe something you used to struggle a lot with.

Nishikant Dhanuka [00:23:57]: I don't know, maybe you're looking for a house, and you had a lot of constraints you were not able to express with the filters and the search bar. Now if I give you a voice experience and you just speak to it and it just understands you, then you'll use it. But if it's just incremental, if it's just a better way of doing search...

Demetrios Brinkmann [00:24:16]: Yeah. And it's a whole new interface.

Nishikant Dhanuka [00:24:18]: Yeah.

Demetrios Brinkmann [00:24:19]: That's a pain in the ass.

Nishikant Dhanuka [00:24:20]: Yes. You would not use it.

Demetrios Brinkmann [00:24:22]: Yeah, I would be pretty pissed, too, that you changed the interface on me.

Nishikant Dhanuka [00:24:25]: Yeah, exactly. So that's our first learning: when you give people something new, when you change the UI, the value for them has to be very clear, and they should know it immediately, in the first 30 seconds: okay, this is the value for me. Because otherwise they feel, why am I even doing this? Why don't I just go to the search bar and use my filters?

Demetrios Brinkmann [00:24:47]: Especially when you're trying to introduce AI into it. Because it feels like, oh, these guys just want to be able to say to their stakeholders or their stock. What do they call it? The shareholders.

Nishikant Dhanuka [00:25:00]: Yep.

Demetrios Brinkmann [00:25:00]: These guys just want to be able to say to their shareholders, yep, they're using AI, so the stock price goes up.

Nishikant Dhanuka [00:25:06]: It's so important to handhold the users. Often we build a tool and just assume, you know the saying: if you build it, they will come. Doesn't happen. You build, then you need to handhold: onboarding, guiding the users. A great example that I always like: I have an Alexa in my house. It's a black box that sits there. It's so inviting.

Nishikant Dhanuka [00:25:28]: It says, talk to me about anything. Then you talk to Alexa, and out of 10 things I talk to Alexa about, eight fail and two work. That's a design problem. And that's the case with many conversational chatbots too.

Nishikant Dhanuka [00:25:44]: We say it's a plain screen, right? And a plain screen is nice, neat, I like it. But at the same time, if I'm the user and you give me this thing, and behind the scenes there's an agent which can do a lot of things, maybe there are 20 tools connected to that agent: I don't know that. I didn't build it. So you need to onboard me. You need to guide me.

Nishikant Dhanuka [00:26:01]: And over the last few years, working with designers, I've seen some excellent ways to do this, and they're not new in the world of design. Maybe when I enter, you can already have some boxes that I can interact with. You can have tooltips as I go. It should be a guided journey for the user. So that's the second learning. I've tried both. A chatbot: you immediately go for the chatbot, it fails.

Nishikant Dhanuka [00:26:28]: Then you try UI. You say, okay, this is the UI. Then let me try a new UI powered by GenAI. Make it different. You know, there are a few things happening on the screen, it's much more dynamic. That has better results than a chatbot. But you know, that is also constrained, that's limiting. A chatbot has more flexibility.

Nishikant Dhanuka [00:26:46]: Right. So, okay, that also doesn't work. So I'm kind of coming to the conclusion that the best interface is a mix of UI and chat. So these days there's this concept we discuss a lot with my colleagues: generative UI. Even the UI is generative. Let me first tell you what this means. It's like I'm talking to an agent, I'm having a conversation. Instead of just replying to the conversation, it also presents me some UI elements.

Nishikant Dhanuka [00:27:18]: Sometimes it might be, you know, show me some items, some carousels, sometimes related items and so on. And these UI components can be dynamic. So the agent needs to decide based on my user, based on my previous message. Maybe, you know, with the design team I build 10 UI components, and then based on the user request, sometimes you get similar dishes, or similar items, sometimes an item carousel, sometimes something else. And what we are seeing, we still need to test it more, but what we are seeing is that this experience works, because buying is a very visual experience. Conversation is very limiting. Sometimes, you know, even me as a user, I want to scroll, I want to click, I want to swipe left, right.

Nishikant Dhanuka [00:28:04]: I make a decision about what I want to eat based on the image. You know, this image makes me feel hungry, so I want it.
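The generative UI idea described here can be sketched roughly as follows. This is an illustrative Python sketch, not code from the talk: the component registry, the JSON shape, and the stubbed model call are all invented. In a real system the `fake_llm` stand-in would be a structured-output LLM call that picks one of the designer-built components.

```python
# Generative UI sketch: the agent replies with both text and a structured
# choice from a fixed set of designer-built UI components.
import json

# Registry of the (hypothetical) components the design team built.
UI_COMPONENTS = {
    "item_carousel": {"max_items": 10},
    "similar_dishes": {"max_items": 6},
    "comparison_table": {"max_items": 2},
    "plain_text": {},
}

def fake_llm(user_message: str) -> str:
    """Stand-in for a structured-output LLM call.
    Returns JSON: {"reply": ..., "component": ..., "items": [...]}."""
    msg = user_message.lower()
    if "similar" in msg:
        component = "similar_dishes"
    elif "compare" in msg:
        component = "comparison_table"
    else:
        component = "item_carousel"
    return json.dumps({"reply": "Here you go!", "component": component, "items": ["a", "b"]})

def render_turn(user_message: str) -> dict:
    turn = json.loads(fake_llm(user_message))
    # Fall back to plain text if the model names a component we didn't build.
    if turn["component"] not in UI_COMPONENTS:
        turn["component"] = "plain_text"
    return turn

print(render_turn("show me similar dishes")["component"])  # similar_dishes
```

The key design choice is that the model only *selects* from a fixed component set; the fallback keeps the UI safe when the model hallucinates a component name.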

Demetrios Brinkmann [00:28:11]: But you're dynamically creating these different widgets on the fly only depending on the input that I give you from chat or also if I click on something then you can show me more like that.

Nishikant Dhanuka [00:28:28]: Both. That's a great point.

Demetrios Brinkmann [00:28:29]: It feels like that is very mix of traditional machine learning and new agents in a way. Because if I'm clicking on something That's a recommender system problem.

Nishikant Dhanuka [00:28:40]: Yep, yep, totally. If you're clicking on something, that's a recommender system problem. You know, the traditional world still applies, and I think that helps. That's how TikTok works, for example: you're looking at these feeds, you swipe left, right, and their recommendation algorithm gets better. But if you think about this movie, Iron Man, Jarvis. Jarvis is an assistant who's watching you, what you're doing in that environment, and it feels so natural. So we have built, you know, it's still not live, but it's at a proof of concept, an agent who's watching the screen. So it's not just answering you, but it's also watching your actions on the screen, and then it talks back to you depending on those actions. So whenever you add an item to the cart, it says, ah, good choice.

Nishikant Dhanuka [00:29:28]: Or, you know, I knew that you would like this item, or something like that. And that interaction feels so much more natural than you typing everything into chat, because it's just watching you.

Demetrios Brinkmann [00:29:41]: It's almost like you having to suck that idea out of your mind and put it into the interface is a lot of friction.

Nishikant Dhanuka [00:29:48]: Yep.

Demetrios Brinkmann [00:29:49]: And if the agent can just watch you scroll and click, then it can be there with you. And it's much more of a copilot experience.

Nishikant Dhanuka [00:29:57]: Yep. And I've seen that in action. We don't have a product live yet, but this is something, this is an idea we are playing with right now.

Demetrios Brinkmann [00:30:02]: But also the user adoption problem there is, I imagine you're going to get a lot of folks that are like, I don't want you watching everything that I look at.

Nishikant Dhanuka [00:30:10]: Yeah, yeah, yeah.

Demetrios Brinkmann [00:30:12]: So there's that trade off.

Nishikant Dhanuka [00:30:13]: Yeah. So we have, we have not cracked it yet. Totally, you're right. So this is something which we are exploring and every user is different. So you need to, I don't think there's a silver bullet that I can share.

Demetrios Brinkmann [00:30:23]: There are people that are fully okay with that.

Nishikant Dhanuka [00:30:26]: It's like, I'm fully okay with that. The way I think about data privacy: of course it's important, but my mentality is, take my data if you can give me value. So if I see value in return, I'm happy to give my data. And different people think differently. So, last learning on UI before we move to the next topic: being contextual. Often we try to build a chatbot which is like the solution for everything, right? We give that as an interface, and this creates a lot of friction, because it's a new interface, you don't know its capabilities, and so on; you're used to the regular interface. One thing that has worked better for us is this: imagine your regular UI, and maybe there's a floating button there, and depending on what you're doing, maybe you spend five minutes looking for items.

Nishikant Dhanuka [00:31:26]: It pops up at the right time, contextual. And it helps you with a very narrow task, very micro task. I'll give you an example. Maybe I want to buy a headphone and I'm looking at a headphone and it pops up. It says, do you want to compare this headphone with the latest headphone from Apple? It's amazing. That's an aha moment for me. If that happens, I would click on that. And you don't need an entire conversation.

Nishikant Dhanuka [00:31:51]: This opens and the comparison happens using an LLM. And by the way, that's a very simple. You don't need agent for that. It's a simple LLM call connected with tool.

Demetrios Brinkmann [00:31:59]: Well, I guess the hard part is if you want to put it into a table dynamically creating that table. And what this checks this box. This doesn't check this box.

Nishikant Dhanuka [00:32:08]: Especially on a mobile screen, that doesn't look good. But you know, we have designers, and there are different design ways to solve it. But the point is, instead of having just a chatbot that does everything, where the user has no idea about its capability, if you define some micro job to be done and you help the users at the right time, we find that much more effective in our experiments.

Demetrios Brinkmann [00:32:35]: Maybe real fast we can detour into how you make sure it's coming up at the right time and giving you the right suggestions.

Nishikant Dhanuka [00:32:43]: This is where evals comes in.

Demetrios Brinkmann [00:32:45]: Ah, perfect segue.

Nishikant Dhanuka [00:32:47]: Yeah. This is where evals come in. This, this is again, this, you can compare it with the traditional world of push notifications. So all of us get these notifications from different apps at different times. You know, I use Domino's app, I like pizza and Domino sends me notification. And a lot of times the notifications are bad. So and if the first few notifications are bad, then you stop. Either you silence them or, you know, even if you receive it Your brain doesn't process it because you know it's bad if you get a good notification.

Nishikant Dhanuka [00:33:18]: If you start with on a good. If you have a good start where the notification actually maybe it's 1:30pm and I'm in a meeting and I'm really hungry and you know, the app knows that I'm vegetarian and it pops up that you know this, it knows that I like falafal. That's the context. So if it pops up at 1:30pm that you know, Nishi, you, you haven't had your lunch. I noticed that you haven't had your lunch. I don't know how it will know it, but.

Demetrios Brinkmann [00:33:43]: Well, you didn't order anything from our app, so we're assuming you haven't had your lunch.

Nishikant Dhanuka [00:33:47]: Yeah, so, but that's, that's the thing. That's the notification, which I would love. So again, I think this is an old problem. It applies to push notifications, it applies to other things. And now it's still true in the world of LLM where if we pop up contextually and again, the idea here is to ab test and try. So the idea, again, the idea is that don't do too much, don't overdo it. Don't send 10 messages, but you can. So if we take food as a context, so there's lunch, dinner.

Nishikant Dhanuka [00:34:19]: So breakfast, lunch and dinner, maybe you optimize, you say that you know, at the right time when it's lunchtime, you pop up something and then you use the profile of the person to whatever information you have about the person to write something which the person can relate to. And again, this is where LLM, you know, we have done some experiments with the LLM and it does well. But again, you know, no silver bullet. These are few ideas which we are trying that don't be, don't overdo it. And whenever you do it, make it personal. You know, people say don't make it personal. I say make it personal.

Demetrios Brinkmann [00:34:56]: I could see how you get people that are. In my case, for example, I'm checking out on one of the apps, looking at some food and at this particular restaurant, I've already added a few things to my basket and it says, oh, hey, I know you like vanilla milkshakes. These guys make an incredible vanilla milkshake and that pops up at just the right time. And do you want to add that to your basket? So again, it just reminds me of like traditional recommender system in a new way. And so it's almost like both with search and with this, you're doing these old tasks that we did in Predictive ML. You're just adding an extra layer on top of it to make it even more personalized and even better.

Nishikant Dhanuka [00:35:44]: Ideally, we talk a lot about this internally, that the problems are still the same. Right. So we are still recommending. Netflix organized this recommendation system competition, I don't know, 20 years from now, 10 years from now. So we still talk about recommendation, we still talk about search. And recommendation is not completely solved. Search is not completely solved. Sending notification.

Nishikant Dhanuka [00:36:03]: So it's the same problems, but now the toys are different, the tools are different, and those tools are helping solve the same problem in a better way. Yeah.

Demetrios Brinkmann [00:36:11]: Okay, going back to evals, what is your take on those?

Nishikant Dhanuka [00:36:16]: Yep. So when I think about evals, so we, we talk, you know, Process is also an investment firm, so I speak to a lot of founders and we talk a lot about moat, you know, what is the moat of the product? How do you differentiate your product? And you know, a lot of time, you know, you see people say that, you know, people don't share their system prompt. You know, system prompt is moot. And you know, I understand that you spend a lot of time to engineer that, but I believe in evals so much that I think, you know, and I'm not a founder, but if I would be a founder and if my system prompt leaks, I would not be worried. But if my evals leak, I believe so much in evals, then I, if I would be a founder someday, I would say that evals is the real moat of your product and not your system prompt. A lot of times when we launch these products, before we launch them, how do you know if it's good enough? Right. And again, this is the same problem that you can think of in a regular software engineering world where you build software and software development, life cycle is much mature now, where there's quality assurance, there's a field, the entire field, quality assurance, testing, testing and production, regression test and so on. So there's this entire thing.

Nishikant Dhanuka [00:37:27]: And now with AI applications and now when the entire product is about AI, we come to this new world where and especially now, LLMs are non deterministic. So it's not your regular AI. So how do you know if something is good enough? And we spend a lot of time, we debate a lot. So there are people in the team who disagree and everyone is passionate and someone says that, okay, we did not think about this case. It doesn't work. Other people say that, okay, maybe you're being too particular. It works. It generally works 80% of the time.

Nishikant Dhanuka [00:37:57]: So the answer to that is evals. So you can do that in a systematic way through evals, where the idea is that you build a system to check. Evals is nothing for me, but a system to check if what you build is good enough. That's the first case. Do you launch it? And second, once you launch it, once you start getting real user queries because it can also degrade. Users can also ask different things which we are not prepared for. Evals is that the traffic light that keeps you informed so that you can. So there's a part of it which is pre development, during development.

Nishikant Dhanuka [00:38:33]: And there's a part of it which happens like in production.

Demetrios Brinkmann [00:38:36]: Yeah, I've heard it explained as again going back to like traditional predictive ML. You have offline online, there's like the offline training, batch jobs type thing and then there's the online like boom, real time type of predictions.

Nishikant Dhanuka [00:38:52]: So few mistakes which I have made myself and I see a lot of people make. So I'll talk about that in evals. One, I think it's a mistake to wait for your product to be launched to write your evals because you know you want real user data, right? So evals need data. So imagine you have a chatbot and you need to see how people are using it in the wild, but it's too late already because you launch it, maybe your chatbot sucks and you will know it much later. So this is where we do a lot of simulation, synthetic data generation, where before it goes live and it doesn't have to be again, it can be simple. It's like you get together in the team and everyone is playing with the chatbot and you create evals based on that. Or you use another LLM where maybe you are the loan engineer in the team, you give it 10 queries and then you use an LLM that generate 100 more queries and that's your synthetic data.

Demetrios Brinkmann [00:39:50]: Or you give it different Personas.

Nishikant Dhanuka [00:39:51]: So a lot of people, they wait for, they wait a lot before evals. But I think first step of evals is simple. It's even a person in the room just, you know, firing queries. You create a data set of 20, 50, 100, and then that's a data set. And then you see how your agent responds. And maybe you manually go through it. And if you go through it manually, you know, in half an hour you'll realize that, okay, these are the scenarios where it does okay, these are the scenarios where it doesn't do good. That Influences your thinking that you know what metrics are important.

Nishikant Dhanuka [00:40:22]: For me, how should I use an LLM as a judge? LLM as a judge is a popular concept, right? For evals. Because human labels are expensive. I'll assume the audience knows about it. Another thing that you know, which I've noticed, is that people immediately run to LLM as a judge. But again, there are a lot of low hanging fruits. I'll give you an example. So LLM, so these days, you know, we are a bit spoiled. So LLM as a judge, let's just, you know, users.

Nishikant Dhanuka [00:40:49]: And so you, you get the input, synthetic data, you get the output from the agent. Now you ask an LLM, you prompt it a bit that, okay, did this input satisfy the user intent? And it's easy, you can do that and immediately people run to that because it's easy. But there's something which is even easier, which a lot of people miss is lot of time you have more deterministic because you never know that maybe the LLM itself is making mistake, right? And then you have the final label which is a mistake. A lot of time there are more deterministic metrics. So I'll give you an example from food ordering. So imagine a food ordering experience. User comes in, you have the agent conversation. In the end, if the user generates a cart and usually we send a link to the cart, then that's, that means that conversation was amazing.

Nishikant Dhanuka [00:41:36]: It was positive.

Demetrios Brinkmann [00:41:38]: They bought something.

Nishikant Dhanuka [00:41:38]: They bought something.

Demetrios Brinkmann [00:41:39]: So the final metric that you're looking at is did they convert or not?

Nishikant Dhanuka [00:41:43]: Did they convert? But even, you know, it's getting a little bit into the details. But you know, sometimes people come to the cart stage, but they still don't order. So there's a.

Demetrios Brinkmann [00:41:52]: So it's not always.

Nishikant Dhanuka [00:41:54]: Yeah, but hey, but I'm saying that.

Demetrios Brinkmann [00:41:55]: Was there a cart?

Nishikant Dhanuka [00:41:57]: If you think about a funnel that you know, a user making search, a user going couple of steps. And if you think about a later stage in the funnel, could be about making order, could be about adding to the cart, could be about something else depending on what the use case is. But those are positive signals that this conversation actually went through different stages of funnel. And it's a deterministic metric because I can see that, okay, I have these hundred conversations, which of these conversations went through this stage of funnel and that's a positive conversation.

Demetrios Brinkmann [00:42:24]: So grab that for the evals, grab.

Nishikant Dhanuka [00:42:26]: That for the evals. And that is so much better than using because LLM as a judge will make mistakes. Look for those metrics, whatever Those metrics are for. And don't run immediately to LLM as a judge. There's a step before that. Another thing I want to talk about is so if you. Again, if you look on Internet and if you see advice from people out there on evals, a lot of good advice. But very soon it gets complex where people talk about, you know, conversation level analysis, singleton, multiple turn analysis, the silent failures.

Nishikant Dhanuka [00:43:00]: On the silent failure, there are a lot of eval frameworks. Tool calling, you know, did we call the right tool? Did we call the. Did we call the tool with the right parameters? Did. If you have multiple tools, did you go. If you can even do a state management. Because maybe a user query calls for five tools. So you can even map that. Okay.

Nishikant Dhanuka [00:43:21]: From tool 1 to tool 2. Did the maximum mistakes. So it can very easily blow up and you'll be kind of paralysis analysis, you'll be scratching your hair. Often we present these evals to business folks and imagine you have a meeting and you have these 12 different levels. People get lost and you lose the audience, you lose everything. So for me, for us, what we have seen when we build product, set, process, it's like the first level of eval is very simple. It's like this is the query from the user. This is the entire conversation.

Nishikant Dhanuka [00:43:52]: You know, multiple queries. Maybe there are 10 messages exchanged you use. And maybe you tried the usual metrics. Now you come to LLM as a judge. Before you go to turn by turn analysis, before you even look at what tools were called, just give the entire conversation to an LLM and ask some basic questions. Ask it that did it satisfy the user intent? Did the conversation went in a direction where you were trying to. The agent was trying to close the order because in the end you care about making a sale. And these are so simple.

Nishikant Dhanuka [00:44:24]: And this business, people, they relate to it. And then an LLM will make a judgment, you know, true, not true, partially true, and so on. And there's already so much information there. If you now see that, okay, these are the things that work. These are the things that don't work. To be honest, a lot of time our evals stop there. We don't because there's already so much information there. There's already so many things for us to fix based on what doesn't work that we never go to level two because level two is, you know, tools.

Nishikant Dhanuka [00:44:50]: Did you call.

Demetrios Brinkmann [00:44:51]: It's so there's. There's almost like a hierarchy.

Nishikant Dhanuka [00:44:53]: Yep.

Demetrios Brinkmann [00:44:54]: And in your eyes, the hierarchy is first and foremost. There's like Looking at the metrics of did they try and actually make the sale? Yeah, because that's our top level metric that we're trying to affect the needle on.

Nishikant Dhanuka [00:45:09]: Yep.

Demetrios Brinkmann [00:45:09]: That's really all that matters at the end of the day. So if the agent isn't closing.

Nishikant Dhanuka [00:45:14]: Yep. Then, yeah, totally.

Demetrios Brinkmann [00:45:16]: You gotta be, who was it in the Color of Money? What was that? Who's the actor that was in it? You gotta always be closing. And you gotta teach your agent put that in the prompt.

Nishikant Dhanuka [00:45:27]: So, yep, that's one way to look at it. But of course, you know, on a lighter side, does the agent follow answer? So if you were the user on the other side, did the conversation went in a positive direction or, you know, the agent asked for a burger and you give them pizza. So did it satisfy the user intent? Did it go in the right direction?

Demetrios Brinkmann [00:45:48]: So there's. Those are like very strong evals that you need to be focusing on first. And then if you need to, then you can start peeling back the layers. Like, did it call the right tools? Did it use the right parameters when it called the right tools? I can see that. It's almost like there's an 8020 here. That Freo's principle.

Nishikant Dhanuka [00:46:09]: Yeah, totally.

Demetrios Brinkmann [00:46:10]: We've got these evals, they're going to give us 80% of the important stuff and it's only these two evals that we got to look at.

Nishikant Dhanuka [00:46:19]: Yep, totally. So that has been our experience. So we also made this mistake where we already got into complicated evals. It doesn't resonate with business. It leads to analysis paralysis. So these days, whenever we do evals, we ask it practical business questions first and then there's already a lot of insights there to act on and only then there is value in going deeper.

Demetrios Brinkmann [00:46:42]: But it's not as actionable, I imagine. And like you say, analysis paralysis, where you're like, there's so much that we need to do here. So coming back and saying, is it answering the user's question? Is the user experience nice? Is the metric being moved like the metric we care about, is it trying to move that?

Nishikant Dhanuka [00:47:02]: Yep. Yep.

Demetrios Brinkmann [00:47:04]: Those are very, very basic and very useful.

Nishikant Dhanuka [00:47:08]: Yep. There are two more lessons when it comes to evals. One is about so at process, something we do. This is a recipe that we have tried many times and it has been very successful. Is labeling party. So we call it labeling party, where, you know, we invite. You invite your team, you invite some stakeholders. It's important to invite, you know, business folks as well.

Nishikant Dhanuka [00:47:31]: You get 15 people in the room, you order Some pizza. That's why it's a party. It's important you book one and a half hours, you spend time with these people to actually go through real conversation. And then this is what the agent responded. Now if I'm imagine and then the question you ask them is that imagine you are the user on the other side and then you ask a couple of these questions that okay, did the conversation go in the right direction? Did it try to close the order? And a couple of other things. Did it kind of break any guardrails or whatever? It depends on, you know, what the agent is trying to do. So you define a couple of questions, easy questions. So don't talk about technical stuff like tool calling and so on.

Nishikant Dhanuka [00:48:14]: Business related questions. And what would come from that is that you spend one and a half hours and at the end of one and a half hour you'll have, I don't know, every person will label 10 data points. You'll have 150 data points. And when you look at that 150 data points and now run your LLM as a judge, the prompt that you wrote and ask it to label these 150 data points, you'll see that there's a difference between what LLM says as human. Obvious, there'll always be a difference.

Demetrios Brinkmann [00:48:39]: Human label data and the LLM is a judge.

Nishikant Dhanuka [00:48:41]: And then you will see that depending on your use case, sometimes the LLM will be more lenient. Maybe the LLM always said that ah, your agent is perfect. Or maybe sometimes the LLM would be stricter. So depending on the use case, I've seen both. But now when you see the difference where the discrepancy is between human and LLM label, you look at it again manually. All this is manual. And then the way to use this data is you use this discrepancy to inform yourself that okay, this is where the LLM is making mistake. You know, human is right here.

Nishikant Dhanuka [00:49:13]: Now go back to your LLM as a judge prompt. Give it examples. So few short labeling. Give it example that when this question comes in, this is what you do. So I think often we also use, we take think LLM as a judge. Very easy. We just write a prompt, we expect magic to happen and get something but our experiences, you need to iterate on that prompt.

Demetrios Brinkmann [00:49:34]: Dude, we're going full circle. It's context engineering.

Nishikant Dhanuka [00:49:37]: Yeah, yeah, it's context engineering. All these things are related. But you need to improve your LLM as a judge. So we spend a lot of, so we do these labeling parties, we improve the Prompt we give it few short examples based on whether this. And now your LLM as a judge, you run it again. We see that, okay, now it's closer. Earlier it they mismatched, I don't know, 30% of times. Now the mismatch is 15%.

Nishikant Dhanuka [00:49:57]: And again you need to keep doing it every two weeks do a labeling party. So that is a recipe that we have tried and it's been very successful.

Demetrios Brinkmann [00:50:04]: So yeah, that's an actionable insight.

Nishikant Dhanuka [00:50:06]: Yeah. And it's a manual process, team building. Yeah, it's rewarding. It's nice as the builders of the system to sometimes look into the data.

Demetrios Brinkmann [00:50:15]: Yourself and hear from the business side of the house. I imagine, hey, this makes no sense. Why would we care about this? Or why would the LLM say this?

Nishikant Dhanuka [00:50:26]: Yeah, you discover a lot of other things which you were not even maybe hoping to hear. But that happens. Last point there. It's actually kind of, it's related with labeling party. So when you do a labeling party, how do you offer. It's very important how you offer people, how they can look at the conversation. And this is where for example we use a lot of these external tools, Langsmith Langfuse. So they are these observability tools where it captures the interaction.

Nishikant Dhanuka [00:50:56]: But at the same time, if you're looking at Langsmith Trace, it's very user unfriendly. It's like, you know, bunch of jsons. If you have multiple tool calls, it's like, you know, I love looking at it, you know, I'm a technical person, but the business person gets lost. So one thing again which we do, which works very well is with now with very low effort, by using V0vercel and so on, you create custom annotation tool. There are standard tools as well in market for labeling data like Label Studio. There are other tools out there, they're good. But our learning is that every use case is different. You need to visualize different things for the user.

Nishikant Dhanuka [00:51:32]: I'll give you an example. If I'm looking at food conversation or a shopping assistant conversation. If I'm the user who's evaluating it, aside from the conversation, I also need to see that, okay, these are the items which were shown to the user at that point. These items and I need to see it visually, otherwise how will I judge these items are from these restaurants? Maybe these restaurants are open or closed at that time. Because if you don't give me that information. So annotation tool plays a big role and what we find ourselves doing a lot is you know, with a couple of. It's like hacking. We just quickly put together a tool in half a day using V0vercel, and people love using it.

Nishikant Dhanuka [00:52:10]: And then you get. The audience is much more involved than going through a trace in Langsmith. So, again, it's a small point, but it makes a big difference.

Demetrios Brinkmann [00:52:18]: User experience.

Nishikant Dhanuka [00:52:19]: Yep. User experience. Full circle.

Demetrios Brinkmann [00:52:25]: Sa.
