MLOps Community

From Zero to AILO: Lessons learned from building iFood's AI agent // Nishikant Dhanuka & Chiara Caratelli

Posted Nov 27, 2025
# Agents in Production
# Prosus Group
# iFood

Speakers

Nishikant Dhanuka
Senior Director of AI @ Prosus Group

Data Science Leader with 15 years of experience in the field of AI. Passionate about building products that are not just innovative and AI-focused, but also customer-centric. Having led and been part of various AI/engineering teams over the years in some of the best organizations, I'm a strong advocate for velocity while also following high-standard engineering best practices. Recently I've found myself completely immersed in Generative AI. With the rapid pace of development in the field, I am excited about the potential it holds. Always open to connecting with professionals who share a similar passion for AI and Data Science.

Chiara Caratelli
Data Scientist @ Prosus Group

I'm a Data Scientist with a PhD in Computational Chemistry and over nine years of experience using data to solve complex problems in both academia and industry. Currently, my work at Prosus involves developing practical solutions using advanced machine learning methods, with a particular focus on AI agents and multimodal models.

I thrive in fast-paced environments where I can collaborate across different fields to create impactful, real-world applications. I'm passionate about exploring new technologies and finding creative ways to integrate them into meaningful solutions.

Outside my day job, I enjoy experimenting with machine learning projects, automation, and creating content to share my experiences and insights. I'm always eager to learn from others, exchange ideas, and build connections within the data science community.


SUMMARY

In this session, we share the development journey of AILO, iFood's conversational AI agent. We cover the practical challenges and victories of navigating from concept to production, highlighting how robust MLOps practices and the integration of our proprietary Large Commerce Model (LCM) enable us to interpret complex intents and create the best personalization experience for our users.


TRANSCRIPT

Nishikant Dhanuka [00:00:05]: Hi everyone. So we want to talk about from zero to AILO. AILO is the name of the food ordering agent we built at iFood. Through this session we want to share some of the lessons we learned while building it. Who are we? So I'm Nishikant, I'm Senior Director of AI at Prosus. I have 17 years of experience in AI. At Prosus, we are building a lot of agents in production.

Nishikant Dhanuka [00:00:31]: I'm particularly very interested in the topic of agents for E commerce. And AILO is actually one of the projects, one of the real examples of using agents in E commerce. I'm joined here by my colleague Chiara.

Chiara Caratelli [00:00:44]: Hi everyone, I'm Chiara. I'm a data scientist on the Prosus Ignition team, and I'm building agents with e-commerce applications together with our portfolio companies. And today we're going to talk about iFood.

Nishikant Dhanuka [00:00:57]: Okay, let's get started. So if you're in Brazil, you don't need this slide: iFood is the most popular, the biggest food delivery company in Brazil. For everyone else, just to give you an idea of the scale of the company: iFood handles 160 million monthly orders, has 400,000 delivery drivers and 55 million monthly users, and is present in 1,500 cities in Brazil. And it's not just a food delivery company — iFood is also a Brazilian tech company. And again, in the GenAI wave it's much easier for companies to get started with AI.

Nishikant Dhanuka [00:01:42]: But iFood has been doing AI for the last 10 years and growing exponentially. Again, just to share some numbers: there are 150 proprietary AI models, and a lot of them are non-GenAI models — the model for logistics, for example. So whenever someone makes an order, this model decides how to assign a driver to that order. 14 billion real-time predictions are made every month, there are 16 petabytes of data in the data lake, and so on. So iFood is a food delivery company, but it's truly a Brazilian tech company. Now, how often does it happen that, wherever in the world you are, you open a food app late at night but you're not sure what to eat? Let's say it's Friday, your day is over and you're hungry. You open the app, you're anxious, the app is too crowded, there are too many options.

Nishikant Dhanuka [00:02:42]: So you end scrolling for many minutes and you're still not sure what to eat. So this is where anxiety kicks in. You can have different thoughts. So it's Friday, you had an intense week. You think you deserve something special, but you don't know what to eat. You want a pizza, but you're tired of eating the same pizza all the time, so you're looking for other option. You just want. You don't know what to eat.

Nishikant Dhanuka [00:03:11]: You just want your food to arrive quickly. So these are the different emotions a user can go through. And a lot of this might sound simple, but imagine that every user is different. For example, here we share three real profiles. Let's say there's a user like me who enjoys Brazilian dishes. There's a user like Chiara who's focused on health and wants to eat extra protein. There's a user like Maggie who enjoys eating sophisticated dishes. So if each of us comes and says we are hungry, we each want different food items.

Nishikant Dhanuka [00:03:50]: So this is where it gets more challenging, because we want to create a food agent which is hyper-personalized. So this is what we made with AILO. AILO is the name of the food agent. It learns from your user behavior on iFood and it also interprets complex intents. Of course, if you know what to eat, you can use it like a search bar: you can go and say burger or pizza and so on. But it also understands very complex intents. So you can say I'm hungry, or you can say I'm with two friends and I want to eat, so what should I order? And it's important to understand.

Nishikant Dhanuka [00:04:35]: So AILO is not a chatbot, it's an agent. There are various tools. It cannot just search for food, but it can also take action on behalf of the user. So it can apply certain coupons, it can add to the cartoon, it can also make the payment on behalf of the user. Plus, a lot of these decisions are happening autonomously. So it's kind of an agent behind the scene. And we'll share some of these details about the agent. So on the screen you see so AILO.

Nishikant Dhanuka [00:05:03]: So this is kind of a quick demo. AILO is present on the iFood app. There's a floating button, and if you click on that floating button, you get a screen. And when you interact with it, you stay on the same screen — in this case a user is saying, surprise me. And then you get a much richer interaction, with a richer UI, with some follow-up questions. And then you can add the item to the cart and make the payment. But AILO is also present on WhatsApp.

Nishikant Dhanuka [00:05:32]: So again, for people who are not familiar with Brazil, Brazil is one of the biggest. Brazil has one of the biggest WhatsApp active user base. So more than 150 million users are active in WhatsApp and not just active. So people are using WhatsApp in Brazil to make e commerce decisions. So not just to chat with friends. So it's very important to be on WhatsApp. And again, it has been an interesting journey. The learnings are very different for app and WhatsApp.

Nishikant Dhanuka [00:05:59]: So for example, just to give you an idea, we found that users are much open to have longer conversations on WhatsApp because primarily it's a messaging channel, whereas on app users prefer short interactions. They already expect AILO to kind of show the dishes that they would be interested in. These are some comments from the users that you know, ifood has so many options like Netflix. And then this assistant helps us decide what to eat. Now I'll pass on to Chiara to talk a little bit about how AILO works behind the scene.

Chiara Caratelli [00:06:37]: So this is our agent architecture. We have a single agent with a twist. We added some complexity here to be able to process flows in a more independent way. So a user message comes in, and then our LLM gets a system prompt that depends on the state. So depending on what tool gets used, what flow you're in, we get a slightly different behavior. So this is kind of a multi-agent setup in a single agent. We also load information about the context, information about the user that we need in order to give the best suggestions and handle the user requests. And then we have a collection of tools that are related to food, like managing the cart, ordering, searching for food.

Chiara Caratelli [00:07:25]: Some of these tools are AI workflows. So they're intelligent and based on the main agent tasks, they can operate independently and get the best predictions and suggestions for the user. And these are also connected to UI element because one of the things we learned is that in this kind of setup, users don't really want to type much. We need to limit their work as much as possible. So those tools are also connected to UI carousels. They generate buttons depending on the user request, so the user has a much shorter route to responding to the agent. You also have follow up suggestions. So this is in a nutshell how we handle user input.

Chiara Caratelli [00:08:12]: I'm going to share a couple of lessons that we learned while building AILO. The first is how to handle personalization. Our users, as Nishi said before, come from different backgrounds and have different preferences, and in food delivery this really matters, because we want to get users suggestions as fast as possible and we want users to like these suggestions. Right. So when a user searches for something, it can be something as simple as a pizza, but it's actually not so trivial. What we do is get a representation of the user preferences that we collect based on their behavior in the app. So there is some offline processing that we do on the user profiles.

Chiara Caratelli [00:08:57]: And based on these representations we can build the context. So the agent takes into account the conversation context, whether the user has indicated some preference, but also their order history, their preferred food categories, the time of the day, if it's breakfast or dinner, it's going to change of course, and we put that in the context that we send to the tool. So the tool has a self contained workflow that has all the information it needs to be to be able to retrieve the best options. And this is an example of the input and output that we get. I'm going to guide you through it a little bit. So what we get is converting the user input into search queries for something simple as pizza. We going to do some semantic and exact type of search. But we can have much more complex scenario like generic food requests like I'm hungry or something like that, where we really need to expand into different queries that represent the user preference as best as possible.

Chiara Caratelli [00:09:57]: Then we get results and we use this user context to re rank and pick the best options for the user. So these results end up as a tool response, but also as UI elements. So we display them directly in the app interface and we also give them to the agent of course, because it has to know what options did we show to the user because this the user might have more questions after that. We also manage this context smartly of course, because it can grow a lot. And I'm going to tell you in a moment. So you can see we have three different profiles. In the case of pizza we can show very different options. If a user likes meat, we're going to show you pizza with more meat heavy options.

Chiara Caratelli [00:10:45]: We have low carb or the sophisticated ones. And this is not just one option that we show. We show multiple ones of course. So we need to make sure that this catch the user interest as best as possible. Second lesson, managing latency. So users are hungry, they don't want to wait. I said before, we don't want them to type too much. We want to have the simplest flow as possible.

Chiara Caratelli [00:11:10]: So how did we manage latency at the beginning we built a simple agent, we didn't pay too much attention to this. We wanted the flows to work, so the latency was pretty high. We had around 30 seconds, P95 and we decreased that up to 10 seconds by looking at what flows we could simplify. So we tried to make sure that we could have a flow to handle complex requests. But when things were simple, we didn't need to do all of that. So we created some shortcuts that were faster. And this could be searching for food, like getting preferences, searching for promotions available. We try to separate those cases.

Chiara Caratelli [00:11:53]: The other is context handling. I think you've heard this enough during today's event, but it's very important. And the other is compressing the prompt and also using smaller models when possible. I'm going to give you some examples of what we did for these three steps. It's not exhaustive, but I selected some of the actions we took. So first of all, we looked at all the context that was processed during our user requests and realized that part of it could be moved to asynchronous processes. For instance, compacting the context of the previous messages or picking the best information from the user behavior. All these processes didn't need to be done synchronously.

Chiara Caratelli [00:12:44]: So what we did was moving tokens that were processed during this flow to asynchronous processes. So that didn't decrease the total amount of tokens, but it decreased the total amount of tokens that run in the slowest flow, so that decreased latency. And you can also see here that the total token also decreased because of some prompt optimization that we did. And I will show you in a moment. This is a topic that we've investigated. This is a research that was done by Paul van der Boer from our team. And it's about token tariffs in different languages. Unfortunately, languages that are non English are penalized and this leads to higher latency and also quicker context routes because you have more tokens.

Chiara Caratelli [00:13:31]: So we made sure that all our prompts were in English and to save those token compared to the Portuguese version. The other is we learn how to deflate blowdown prompts. I think every agentic project has this at some point. It's very easy to add rules to the system prompt. So whenever there is a bug in production or user complain or we find some error in evaluations, it's very easy to add an edge case to the system prompt to cover that. But this gives bloated prompts and it's usually a code smell when you see this. So we tried to understand why this was happening and the first step we did was creating evaluations for all these cases. So if there was a case in production that led to the error, that led to the edge case being added to the front, we added an evaluation case for that.

Chiara Caratelli [00:14:28]: And then we did two things. Most important is to improve the name of the tools and name of all variables that we use. And a good rule of thumb, a thought experiment is to get a person who is not familiar at all with your agent and show them the list of tools, the names, and ask them, do you understand what they do? Would you be able to use them with these instructions? And if not, it means that the tool names are not correct. It can be names that are very specific for our application, but don't make sense in the context of an agent who doesn't know anything about it. So this leads to all these edge cases to be explicitly mentioned. So we try to simplify, improve tool naming and at the end remove these edge cases and all the time running evaluations on those scenarios. And that led us to us simplifying prompt significantly and reducing tokens, therefore reducing latency. It was not the only thing that we did, of course, but I think this is most relevant in this conference.

Chiara Caratelli [00:15:36]: Yeah, I'm going to show you also how we did evaluations. So we had evals on production traces, of course, and regular tests that we run. But this is something new that we experimented on. What we did was defining scenarios with natural language. And these scenarios include instructions on what needs to happen, some steps to set up the scenario, and also the expected behavior of the agent. We did this because it's sometimes hard to specify in a single LLM judge how the agent should behave. But it's very easy if you have something that doesn't work to pinpoint what, what is wrong and what should happen. And we did this in natural language so that it would be easy to maintain for non developers as well.

Chiara Caratelli [00:16:25]: And this by the way, is an agent that runs through these scenarios and behaves like a user. So it pings our endpoint for the agent and it evaluates the responses and also the ui. So you can do a lot of things here. For instance, testing guardrails, you can try for different turns to evade guardrails. So yeah, this helped us a lot to improve the prompt because we had a reliable way to test exactly the scenario we wanted. And this is not the only evils we did. This complements of course, all the production evils that we do. So yeah, I'm going to give the word to Nishi again.

Nishikant Dhanuka [00:17:06]: So Ilo is live to millions of users in Brazil, both on app and WhatsApp. But it's also in the process of building. Right. So we are continuously doing a lot of experiments to make it more agentic every day, just to give you some examples. So again, as I said before, AILO can take action in real world on behalf of the user. So it's not something which just searches for food. It can apply coupons, it can figure out if a customer has a loyalty program with ifood what discounts to apply. We are doing also some experiments so it can check out.

Nishikant Dhanuka [00:17:46]: We are also doing some experiments where AILO is able to make payments on behalf of the user and so on. It is contextual. So the idea is that, you know, so it knows about the weather, it knows whether you are in a new city, and if you go to a new city, it can say that, you know you're in a new city and you know, these are the restaurants in the new city that matches your usual choice. It remembers you. So that's when it starts becoming kind of a true buddy. So in this example, so you know, if you. It can say that, you know, it knows your taste, it says, you know, you ordered this particular food twice this week. So do you want recommendation from similar places? Because one of the pattern that we see, I think it's a pattern throughout the world that people just go for the food, they reorder the food that they usually order.

Nishikant Dhanuka [00:18:39]: But also the user behavior is changing. So I'm sure people would still, a lot of them would still keep reordering, but maybe a lot of people reorder because they find it difficult to express their current needs or the current needs are very vague to an agent. And if an agent can understand the current needs, then people would be open to try new items. It's proactive. So currently. So the idea here is how many times have you installed a chatbot or agent? Initially you're excited, you send a couple of messages, but then the excitement dies because you're not sure, for example, what to ask and how to continue the conversation. So one principle we are baking into AILO is that it's proactive. It doesn't wait for the user to reach out.

Nishikant Dhanuka [00:19:26]: It reaches out to the user at the right moments. So we try to do that at different. So AILO is listening to the events as well on the app. And for example, after a few minutes of an activity, it reaches out to the user, it offers certain items and so on. So these are a few experiments which we are running to Make AILO more agentic every day. If you're in Brazil, you can see it in action. You can scan this QR code and you immediately have AILO on WhatsApp. Or you can go to the iFood app, you can click on the floating button and you can interact.

Nishikant Dhanuka [00:20:03]: And though today it's two of us standing in front of you presenting ilo, but it's a team effort, so we have a team between process and iFood, so sits here in Amsterdam and Brazil. So I want to make sure that everyone is on the screen. That's it. Thank you.

Demetrios Brinkmann [00:20:23]: That was pretty far out, dude. Pretty radical, man. I like it. I got questions. I'm sure the chat has questions. Let me start with the most interesting slide that I saw, which was about the tax you've got to pay. Tell me more about that — the tax, the language tax.

Nishikant Dhanuka [00:20:48]: The tax. Want to talk about it? You don't need those things.

Chiara Caratelli [00:20:52]: Yeah. Okay. So, yeah, our colleague did an analysis on different prompts that are run through the main models that are available right now. And there is a clear discrepancy between the number of tokens used in English and in different languages, which. Which leads to a higher number of tokens for the same amount of information. And it's because these models are mainly trained in English. The only exception is Chinese models, which are more efficient in Chinese. But for Most models like OpenAI models, Anthropic and open source ones as well, we see that if you want to write the same information in a language that is not English, you're going to have to use more tokens and it can be up to 50% or more tokens.

Chiara Caratelli [00:21:43]: So, yeah, it's a problem.

Demetrios Brinkmann [00:21:46]: It's a really cool idea and research around it. There is a few things. It's nice that you gave this talk right after Donay, because it feels like there's a few parallels that you all are doing, whether it comes to the context window and really being the janitor of a context window and making sure that that is primary focus. And then second, secondly, the idea of proactively giving the user what they want and figuring out how to make that intuitive for the user. So I like both of those. I do wonder about the context window because it is such a strong theme throughout this whole day, especially on this track. What are some don'ts? What are some things that you're like, oh, if you want to blow up your context window, do this for sure.

Chiara Caratelli [00:22:31]: Yeah. So first one is tool outputs. Tool outputs can easily blow the context, especially when you upload Data and you let the LLM choose the data. Long conversations as well, where there are multiple tool iterations can also blow up the system prompt. So we need to make sure to have right summarization in place and only select the relevant context window. There are different strategies for that. The other is number of tools. If there are a lot of tools, the LLM needs to have a lot of tool descriptions, needs to make a lot of decisions as well.

Chiara Caratelli [00:23:11]: So yeah, there are ways around that. Of course, multi agent setup is a way if possible, we try to put tools that work together together. So if a tool is always called after another, there is no point having two separate tools, right? You can make it available only if the first one is cold or maybe have a single workflow for both of them. So yeah, these are some learnings we got. We're getting our hands dirty just to.

Nishikant Dhanuka [00:23:38]: Just to add to it, right. So I think models are getting bigger — there's more and more context window. For example, I think Claude models have a 200K context window, and I think the biggest one is Gemini, with a 1-million-token context window. But it also comes at a cost. More context doesn't mean put everything in the context. One of our learnings is that it's a needle-in-a-haystack problem. Though you can put more in the context — and to be honest, even if you make mistakes, it still fits in the context window many times —

Nishikant Dhanuka [00:24:13]: But the performance of the LLM, the output you get is poor. So it's very important to manage the context properly. And I think it's getting even worse with mcp because I think MCP is great. But one of the problems of MCP is that GitHub MCP has 93 tools. So if you connect GitHub you get 93 tools for free, which completely bloats your context. So yeah, so though we don't use GitHub MCP in this project, but I think we spent a lot of time managing the context, as Kiara said, by being efficient with our tools. I think that's the trick.

Demetrios Brinkmann [00:24:48]: Yeah, there's a great question coming through here about WhatsApp and does WhatsApp pose any unique security challenges?

Nishikant Dhanuka [00:25:05]: I can see that for some industries and for some markets, for example, you know, if you are, you know, I can see, for example one of our companies is Zoolix, which is in the second hand marketplace and a lot of scams happen on WhatsApp. So at some point we discussed that with Olex and then we were not sure. Because it's a place where kind of, you know, people reach out and they want to scam. In Brazil, WhatsApp is the way of life. Even if, you know, Ifood ilo is not there. The way people order food is they send a voice note, even a voice note to a restaurant, and then that's kind of a confirmed order. So that's awesome. Yeah, it does have security issues.

Nishikant Dhanuka [00:25:52]: WhatsApp has other problems in terms of, you know, UI limitations as well. But there are some countries where you cannot escape WhatsApp. It's a way of life and it also allows you to. It has a lot of benefits. It allows you to reach out to a wider audience. Because everyone has WhatsApp on the phone, they don't need to install a new app.

Chiara Caratelli [00:26:08]: So, yeah, what we try to make sure is also that authentication is done properly. So if we have a new user or we don't know the phone number of the user, then they're going to go through a flow where they need to authenticate and validate this phone number. So they will need to go through a browser, that's for sure.

Demetrios Brinkmann [00:26:26]: Yeah, yeah.

Nishikant Dhanuka [00:26:27]: So in a way, for example, I think that's a great point. Currently, kind of WhatsApp is connected to the iFood app. So it's like. So it's not. We are trying to make WhatsApp more standalone, and then it would be more kind of open to scam. But currently, if you open, if you interact with AILO on WhatsApp, there's a point where you need to validate on the iFood app that you are who you are seeing. So it goes through kind of an authorization. So there's a continuous movement between app and WhatsApp, which adds to security.

Nishikant Dhanuka [00:26:57]: But at the same time, we also want. We are also experimenting with WhatsApp being a more standalone.

Demetrios Brinkmann [00:27:01]: It's fascinating that your mind instantly goes to more like social engineering as opposed to the technical side. And thinking about, yeah, there's probably an easier way to have security vulnerabilities than breaking WhatsApp.

Chiara Caratelli [00:27:15]: Cool.

Demetrios Brinkmann [00:27:16]: Well, I think there is one more question here that I want to ask you. Is tool router an agent connected to an LLM or is it just a wrapper on all the underlying tools?

Chiara Caratelli [00:27:30]: So it's a wrapper on top of the tools, and we. We also have some code that controls what ends up in the user interface after the tool is called. So we have some infrastructure built around that as well.

Demetrios Brinkmann [00:27:48]: Excellent. All right, folks, I think that's it. Thank you for delivering me food. I am now full and ready to rock and roll for the second half. This is awesome. Thank you, Kiara and Nishi.

Nishikant Dhanuka [00:27:59]: Thank you.
