MLOps Community

The Latency Goldilocks Zone Explained

Posted May 12, 2026
# Conversational AI
# iFood
# AI Agents
# Prosus Group

Speakers

Rafael Borger
Head of Innovation @ iFood

Rafael Borger is the Head of Innovation at iFood, where he focuses on building and scaling innovation initiatives inside one of Latin America’s largest technology companies. His work centers around experimentation, rapid prototyping, AI-driven products, open innovation, and what iFood internally calls “Jet Skis,” small, fast-moving teams designed to test new ideas quickly before scaling them.

Daniel Wolbert
Data and AI Manager @ iFood

Daniel Wolbert is a Data & AI Manager at iFood, where he works on large-scale artificial intelligence, data science, and machine learning initiatives powering one of Latin America’s biggest delivery and commerce platforms. His work sits at the intersection of data infrastructure, AI products, experimentation, and personalization systems.

Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.


SUMMARY

Rafael (Head of Innovation, iFood) and Daniel (Data and AI Manager, iFood) pull back the curtain on ILO-Agent — iFood's conversational AI ordering system built for 200 million users across Latin America. Recorded live at AI House Amsterdam, this conversation goes deep on the engineering and product decisions behind building recommendation systems, agentic AI, and why the speed of your AI's response might actually be destroying user trust.


TRANSCRIPT

Daniel Wolbert: [00:00:00] Recommending something that I know about the user is relatively easy.

Demetrios: Mm-hmm.

Daniel Wolbert: But how do I recommend something that I don't know if the user will like or not?

Demetrios: So who are we here with today? Fellas, what are you guys doing?

Rafael Borger: I'm Rafael. I'm Head of Innovation at iFood.

Daniel Wolbert: Daniel Wolbert, Brazilian, 37 years old, born and raised in Belo Horizonte. I have two kids, Davi and Isabelle, and I'm the husband of Maria Elisa. I'm a Data Science Manager at iFood. I've been with iFood since, uh, 2025, right?

Daniel Wolbert: And I have a background in data science from before data science or AI was even called that, right? So I started my career as a developer, and then I had the opportunity to go all the way through the recent history of data science, [00:01:00] like analytics, machine learning, optimization.

Daniel Wolbert: And now I'm working on the intelligence part of iLo, which is our hyper-personalization for users' requests.

Demetrios: Yeah, tell me more about iLo.

Daniel Wolbert: Well, iLo is an experience that's different from when you go to the iFood app, right? It's a conversational agent where you can ask for anything: even something very simple like a pizza, or, say, a romantic dinner for two where my wife doesn't eat onions.

Daniel Wolbert: So iLo is a solution that takes your request, understands who you are, what your preferences and desires are, and gives you the best options at the moment.

Demetrios: Is it doing that proactively already? Or is it still me actively asking iLo things, question and answer?

Daniel Wolbert: Right now we are doing this reactively.

Daniel Wolbert: Yeah, reactively. Right? But we have plans for the future. There [00:02:00] is one specific part of the process that we are already using, which is the undecided-user flow. Some users already use that, right? You are on the app, browsing around, and you receive some suggestions and recommendations based on this same intelligence.

Demetrios: Mm.

Daniel Wolbert: So we are starting to think about how to proactively recommend things to users.

Demetrios: Digging under the hood, I'm always fascinated, especially with recommender systems, by how different this is from a traditional recommender system.

Daniel Wolbert: Well, recommendation systems are a whole world, right?

Daniel Wolbert: I mean, every single company is trying to see: how can I understand you, how do I know who you are, and what is the best product for you-

Demetrios: Mm-hmm ...

Daniel Wolbert: at that moment, right? It's not that different. I mean, recommendation systems are [00:03:00] something people all around the world have been trying to crack, right?

Daniel Wolbert: So we take all the knowledge that we have from our data, and we experiment with different things on top of it, right? We have a model we call the LCM- Mm-hmm ... which is a kind of summarization, a joining of lots of different ways we can recommend. At iFood, we use the LCM as the base; it has the full characteristics of the user, so we can understand what the best request is, what the best dish is for you at that time.

Demetrios: Yeah.

Daniel Wolbert: Right? So, of course, there is the innovation part, where we are trying things people have never tried before, and we are constantly evolving and thinking about how we recommend things to the user, right?

Demetrios: But is it like you're using a traditional machine learning model, and [00:04:00] then adding a little LLM on top, sprinkling some flavor on it?

Daniel Wolbert: Not necessarily, right? We are using everything that's available right now, plus the innovation part. And what is the innovation part? AI itself, right? I mean, people who have never learned what AI is think that AI is only LLMs.

Daniel Wolbert: It's only this- Yeah ... generative AI, right? ChatGPT. Yeah. But AI is beyond that. There are a lot of techniques. Gen AI is just one part of this whole world of AI, and we use a set of different techniques to figure out how we can make a recommendation, right? But that's tricky, right?

Daniel Wolbert: It depends. For example, if you're a guy who has only ordered Japanese food in your life, right? What would we expect- You like that

Demetrios: sushi. Yeah.

Daniel Wolbert: Yes. What would you expect us to [00:05:00] recommend to

Demetrios: you? Sushi all day. Miso soup, and maybe a little bit of ramen.

Daniel Wolbert: All right. But does that mean that you are a guy who only eats sushi 24 hours a day?

Daniel Wolbert: That's the challenge of the recommendation, because-

Demetrios: You need a California roll with that cream cheese for

Daniel Wolbert: breakfast

Demetrios: Well,

Daniel Wolbert: in Brazil, we put cream cheese, we put, like, snacks on top of the sushi. Yeah. We put some sweets, some cheese.

Demetrios: Mm. Uh-huh.

Daniel Wolbert: But, uh-

Demetrios: Granola. You put some granola on top with a little yogurt with your sushi.

Daniel Wolbert: Well, we put, like, guava, the guava thing. Yeah. And I tried it. It's

Demetrios: really good. I've heard about Brazilians disrespecting all kinds of traditional cuisine with the pizza. That's true. Yeah.

Daniel Wolbert: We have enough pizza, which is good, by the

Demetrios: way. You are. You have, like, the Nutella and banana pizza.

Demetrios: I've heard about that. Yeah. Really. Yeah. Brigadeiro pizza is, like- It's famous there, right?

Rafael Borger: Yeah, pretty

Demetrios: famous. Yeah.

Daniel Wolbert: But that's the thing. We got one case where the user really liked Nutella and banana. I mean, most of the orders from that user were related to [00:06:00] sweet things, right?

Daniel Wolbert: Mm-hmm. And the recommendation we provided was a sweet pizza, right? The user liked it, of course, but the thing is: how can you recommend something beyond your existing knowledge of the user? How can you- Step out ... extrapolate that? Step out, right. Yeah. And this is what we are doing.

Daniel Wolbert: This is what we are constantly testing,

Demetrios: Right? Oh, I see. You've built a profile about someone, because that's all you know about them when they interact with your app. But you wanna know: if we were to offer them something new, how can we do that in a way that's gonna be a little bit more successful?

Demetrios: Or at least we're not just firing blind.

Daniel Wolbert: Correct. Recommending something that I know about the user is relatively easy.

Demetrios: Mm-hmm.

Daniel Wolbert: But how do I recommend something that I don't know if the user will like or not? That's the difference we are working on. It's a bunch of different techniques, different tools, trial and error.

Daniel Wolbert: How [00:07:00] do we find the fit? How do we make sure that the person who likes only sushi is gonna like this? How, for the person who likes gourmet dishes, for example, will it differentiate? It's trial and error, right? Mm-hmm. It's always trial and error in the end.

Daniel Wolbert: And, um-

Demetrios: But are you creating new clusters?

Daniel Wolbert: In the recommendation world, right, your challenge is to figure out how close I am as a person to a group of people.

Demetrios: Yeah.

Daniel Wolbert: Right? And not only that: how close I am to a person who is close to a restaurant, who is close to another dish.

Demetrios: Yeah.

Daniel Wolbert: So, how do I get the shortest path between me and a burger close to my city, or here in Amsterdam, right?

Demetrios: Mm-hmm.

Daniel Wolbert: And there is no rocket science here. There are no-

Demetrios: Easy way ...

Daniel Wolbert: bullets.

Demetrios: Silver bullet. Yeah,

Daniel Wolbert: silver bullet. There's no silver [00:08:00] bullet for that. It's trial and error, right? Mm-hmm. You see, the thing about iFood is: how can we build a product for hundreds of millions of people who have so many different food tastes, so many different characteristics, in what he or she likes, what they're gonna order?

Daniel Wolbert: Some people are more price sensitive than others.

Demetrios: Mm-hmm.

Daniel Wolbert: You cannot build just one product that fits all. You must make sure that parts of your product will be adjusted depending on the profile of the user, right?

Demetrios: Yeah.

Daniel Wolbert: So if I'm a guy who doesn't like sushi but likes pizza, for example, and I only have 10 reais to spend today, you cannot recommend a pizza that costs, like, 200 reais, because I'm not gonna buy it.

Daniel Wolbert: So I need to understand not only the [00:09:00] characteristics of your tastes, but your economic profile, what you are aiming to buy, what your purchase history is, and how you compare to other users, in order to get you the best recommendation.
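The kind of multi-signal scoring Daniel describes can be sketched as a weighted combination of per-item features. Everything below (feature names, weights, the `price_sensitivity` knob) is illustrative, not iFood's actual system:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    taste_match: float   # 0..1, similarity to the user's taste profile
    price: float         # in reais
    distance_km: float
    rating: float        # 0..5

def score(c: Candidate, budget: float, price_sensitivity: float) -> float:
    """Combine taste, price fit, distance, and rating into one score.

    Weights are illustrative; a real system would learn them per user.
    """
    # Penalize items far above what this user typically spends.
    price_fit = max(0.0, 1.0 - price_sensitivity * max(0.0, c.price - budget) / budget)
    distance_fit = 1.0 / (1.0 + c.distance_km)
    return 0.5 * c.taste_match + 0.3 * price_fit + 0.1 * distance_fit + 0.1 * (c.rating / 5.0)

candidates = [
    Candidate("artisanal pizza", taste_match=0.9, price=200.0, distance_km=8.0, rating=4.9),
    Candidate("neighborhood pizza", taste_match=0.8, price=45.0, distance_km=2.0, rating=4.5),
]
# For a price-sensitive user with a 50-reais budget, the cheaper pizza wins
# even though the expensive one matches their taste slightly better.
best = max(candidates, key=lambda c: score(c, budget=50.0, price_sensitivity=1.0))
print(best.name)  # neighborhood pizza
```

The point of the sketch is the interaction Daniel calls out: a high taste match cannot rescue a 200-reais pizza for a 10- or 50-reais budget, because the price-fit term collapses to zero.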

Demetrios: So, okay. I also wanna talk about the idea of the jet skis that you were mentioning before, and how these jet skis are, in a way, the experimentation and innovation arm of iFood.

Demetrios: Break down what it, what this concept is for me, and then also, like, what has come out of it.

Rafael Borger: Jet skis are part of our culture, you know. It's a big thing for iFood. The concept relates to the ambidextrous organization. You know, you have iFood doing hundreds of millions of orders per month, so it's a big ship, and a big ship is very hard [00:10:00] to maneuver.

Demetrios: Mm.

Rafael Borger: So the jet skis are about testing things very fast, and if it doesn't work out, that's fine. You go to another thesis and try again. The thing is, you need to do it fast, cheap, and a lot. But once you find something, you invest big.

Demetrios: Mm-hmm.

Rafael Borger: Always dream big. Start small, but start right now and move very fast.

Rafael Borger: So we created big revenue streams for the company in the past years by doing this. Whole business units such as the fintech business, the grocery business, iFood Clip, they were all jet skis in the past. So- It's part of the way we do things.

Demetrios: Yeah.

Rafael Borger: The core business sustains the whole thing, of course, but you need to invest part of the resources and expect big returns [00:11:00] over the long term.

Rafael Borger: So that's the whole thing.

Demetrios: Yeah, it's a little bit like the power law, like VC investing where they'll invest in 100 different companies and one or two of them hit, but they hit big and there's not really a cap on how big it can go. And so you were able to create the whole fintech unit and now iLo is one of these big bets that you're making?

Rafael Borger: Yeah, exactly. We are shifting from using iFood to search for an item to being suggested an item. We're trying it with half a million users right now.

Demetrios: Mm-hmm.

Rafael Borger: And we started finding some interesting things, like it's 16% faster to complete an order from a search to a checkout-

Demetrios: Wow ...

Rafael Borger: with iLo than with iFood [00:12:00] product search, when we benchmark.

Rafael Borger: 'Cause you can send an audio saying, "Okay, I want a pizza," and it knows what pizza you like, and- Wow, okay ... you can pay by Pix in WhatsApp very fast, and the pizza appears at your house. So it's faster, and, as we benchmark it, the probability that a search becomes a cart addition is 35% higher as well.

Rafael Borger: So we are finding some interesting results. We still have some things we are struggling with, like scalability and some costs, and I think everybody working with AI is facing, you know, challenges. And of course, there's the product-market fit: we still need to shape behavior, because people use iLo the way they use iFood, and they're not extracting the whole potential of what this thing could be, like [00:13:00] complex queries such as, "I want a pizza that can be delivered in 40 minutes, should cost less than 50 reais, and I actually want an order for two people.

Rafael Borger: One is vegetarian, the other wants meat."

Demetrios: Yeah.

Rafael Borger: So if you wanna query this within the apps, it's impossible.

Demetrios: Yeah. That's more just like filters, right? Don't you think that could be done with some kind of traditional filtering UI?

Rafael Borger: Maybe, but, you know, I think the graphic interface does not bring the same value as conversational in this particular case, because you can send an audio-

Demetrios: Yeah

Rafael Borger: saying whatever you want, and in four seconds, this will come back as four images of exactly what you want.

Daniel Wolbert: Yeah. Yeah. And that is the other part, which is the concept that what is important [00:14:00] to me depends on the user, right?

Rafael Borger: Uh-huh.

Daniel Wolbert: So say you just ask, "I want a cheap burger," for example.

Daniel Wolbert: If you just use the traditional filter, we're just gonna sort and get the lowest-priced burger we have. But that's not the case for most users. The concept of what is cheap to me is different from what is cheap to you.

Demetrios: Yeah.

Daniel Wolbert: And there is always a boundary between what is cheap and what is good quality, right?

Daniel Wolbert: So if I want a cheap snack, for example, and I just sort by price, I will receive this very cheap $1, €1 thing, and the user will say, "This is not what I like. This is not what I want."

Demetrios: No, I like those 200 reais pizzas.

Daniel Wolbert: Depending on where you are, that's not so expensive, right?

Daniel Wolbert: But there is this boundary between quality and price, and everyone has a different... Yeah. Let's just say boundary, right? So if you just use the traditional filters to sort, the user will take a long time: "Well, I want to filter by [00:15:00] 30 reais," but then they don't like this.

Daniel Wolbert: Yeah. And, "Well, I don't want something that is far from my home." So when you use iLo, every single aspect of what is good for the user is already combined and analyzed to give you the best solution, right? So some people like things that are close to their house.

Demetrios: Mm-hmm.

Daniel Wolbert: Some people don't.

Daniel Wolbert: Some people are very rating-oriented for the mission. So iLo has the capability to understand every single parameter of your decision and combine them to give you the best solution.

Demetrios: You mentioned that you don't have those complex search queries, or people don't understand that they can ask all of those things of it.

Demetrios: So you almost have to do that on your side.

Daniel Wolbert: A very substantial share of users still ask for simple things, right? We do analysis of our data all the time. So- Yes, this is correct. The users are still not able to understand [00:16:00] the full capability of what iLo can provide, because they are used to a different interface.

Daniel Wolbert: Yeah. So most users, for example, when they are asking for a pizza, just say, "Pizza." What do we do in the background? We take the pizza, we understand your weights for different types of parameters, we go to your profile, we go to your history, and we understand how you are positioned in the relationship between restaurants, items, and personas.

Daniel Wolbert: And we say, "Well, this guy is price sensitive. This guy doesn't like something that is far from his home, he doesn't like this type of pizza, and these are the possibilities for him."

Demetrios: Mm.

Daniel Wolbert: And then, when the user chooses something, we learn from that: "Well, this is what the user asked for, whether they liked it or not.

Daniel Wolbert: If we made a mistake, we will correct it and fix it until we get- Yeah ... the perfect solution for you."
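The correct-and-fix loop Daniel mentions can be sketched as a simple online update: when the user picks one of the shown candidates, nudge the per-feature weights toward the chosen item's features and away from the rejected ones. This is a toy sketch under those assumptions, not iFood's actual method:

```python
def update_weights(weights, chosen, shown, lr=0.1):
    """Move feature weights toward the chosen item's features,
    away from the average of the rejected ones."""
    rejected = [s for s in shown if s is not chosen]
    new = dict(weights)
    for f in weights:
        avg_rejected = sum(r[f] for r in rejected) / len(rejected)
        new[f] += lr * (chosen[f] - avg_rejected)
    return new

# Hypothetical per-user feature weights and two shown candidates.
weights = {"taste": 0.5, "price_fit": 0.3, "distance": 0.2}
shown = [
    {"taste": 0.9, "price_fit": 0.2, "distance": 0.5},
    {"taste": 0.6, "price_fit": 0.9, "distance": 0.7},
]
# The user picks the cheaper option: the price_fit weight goes up.
weights = update_weights(weights, chosen=shown[1], shown=shown)
print(round(weights["price_fit"], 2))  # 0.37
```

Each interaction shifts the profile a little, which is the "we will correct and fix it" behavior described above, expressed as arithmetic.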

Rafael Borger: And you know, one [00:17:00] thing that is kind of funny: in the first PR launch we did with iLo, there's a query you can type, "Tell me about me." And iLo will answer who you are inside iFood-

Demetrios: Mm-hmm

Rafael Borger: based on your preferences and so on. So it's kind of creepy when you do this on WhatsApp, because you don't need to do anything. You just add the number and say, "Tell me about me." We get that number, the whole auth structure knows who you are, and from the LCM profile, we answer. And people had, like, a whoa moment, you know, got super excited-

Demetrios: Yeah

Rafael Borger: when they saw their profile, even more than the recommendation itself: this thing really knows who I am.

Demetrios: It knows my dirty secrets, that I order- It knows

Rafael Borger: something like

Demetrios: that ... chocolate at 12 at night when I'm really hungry. Yeah. Yeah. Do you feel like consumer behavior is going to change? Like, are you making a [00:18:00] bet on the fact that in a year, in two years, after we've been experiencing chatbots across all of our app usage, we're going to start asking more of it?

Demetrios: Because I notice with myself that I'm constantly trying to push the limits of what I can get out of my chat experience. So do you think folks are now going to start saying, "I want a pizza, but I want it delivered in the next 30 minutes, and I wanna make sure that it's a, you know, Neapolitan pizza"? And that search filter, or that deep query understanding, is going to be put onto the user?

Demetrios: Or do you feel like we're always going to have that work of figuring it out and making it quick?

Rafael Borger: You know, that's the billion-dollar question, I guess. I actually don't know if consumer behavior's gonna shift [00:19:00] that way.

Demetrios: Mm.

Rafael Borger: I bet some part of it, yes, but some part will just, you know, be at home on a Friday night, scrolling the app, not knowing what to eat.

Rafael Borger: They do this on a bunch of platforms- Yeah ... Netflix, iFood.

Demetrios: Totally. TikTok.

Rafael Borger: There are too many options, you know? So, in the end, it's about facilitating this. Like, I don't know what to eat. That's okay, just open this and type anything, or just say, "I don't know what to eat," and we'll suggest something, and you're gonna find something, you know?

Demetrios: Oh, you're the perfect person to talk to about this. I asked Nishi. I was so excited 'cause when he was working on, um... I can't remember what he was... maybe iLo? I was like, "Nishi, dude, do you know what I would love? A Tinder-for-food- ... experience."

Rafael Borger: Uh-huh.

Demetrios: Like, if I could just swipe right or swipe- Uh-huh ... left for food.

Demetrios: You give me a picture, and I say, "Yes," "No," and then at [00:20:00] the end-

Rafael Borger: Uh-huh ...

Demetrios: I get to say, like, "All right. I've swiped yes-" Uh-huh ... "on all this stuff. Which one do I actually wanna add to my cart?" Oh, yeah. And you're giving me all kinds of options. Yeah. Maybe there's Thai food, there's sushi- Yeah ... there's this or that.

Demetrios: And I'm there doing it.

Rafael Borger: We actually built this, you

Demetrios: know? No. Yeah. And did it work? Did it-

Rafael Borger: Yeah.

Demetrios: Was it

Rafael Borger: a complete failure? No, no. We are getting, like, very promising results with this. I can't disclose- Yeah ... because we are testing very small right now. Mm-hmm. But it's very cool, because we are combining the recommendation and the interface, the graphic interface-

Demetrios: Yeah

Rafael Borger: both in one experience, and, you know, when you say, "Recommend me something for lunch," it's very open. Yeah. You can do, like, a bunch of things, right? And people will start swiping, and in the end, there's kind of a battle, like a price compa- Oh,

Demetrios: [00:21:00] nice ...

Rafael Borger: you can see the price range. You can see, like- How long it takes

Rafael Borger: how soon to you can test it. Yeah. Yeah. How long it's gonna take, and in the end, you selected the one, the m- the favorite one, and it can go to sh- Mm ... to checkout. Yeah.

Daniel Wolbert: And when you think about that, let's go back to this hypothetical guy who always eats Japanese food, right? Yeah. The guy's not Japanese.

Daniel Wolbert: He only eats Japanese food. With the Tinder-mode experience, let's call it that, I can have the benefit of showing some things that are different from Japanese food, and checking whether the user likes it or not.

Demetrios: Mm-hmm.

Daniel Wolbert: So imagine the richness of information that we have.

Daniel Wolbert: We can break this barrier of the user always ordering Japanese food, so now we know that there are other types of food the user might like.

Demetrios: Mm.

Daniel Wolbert: So we also enrich and learn from this experience to give better recommendations the next time.
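The Tinder-mode idea of deliberately showing a few items outside the user's known tastes to learn from the swipes is essentially an explore/exploit trade-off. A minimal epsilon-greedy sketch (function and cuisine names are illustrative, not iFood's API):

```python
import random

def pick_suggestions(known_favorites, exploratory, n=4, epsilon=0.25, rng=random):
    """Mostly recommend from what we know the user likes, but reserve
    a fraction (epsilon) of slots for cuisines we have no signal on."""
    n_explore = max(1, int(n * epsilon))
    picks = rng.sample(known_favorites, n - n_explore)  # exploit known tastes
    picks += rng.sample(exploratory, n_explore)          # explore new cuisines
    return picks

favorites = ["sushi", "ramen", "miso soup", "temaki"]
new_cuisines = ["feijoada", "burger", "sweet pizza"]
suggestions = pick_suggestions(favorites, new_cuisines, n=4, epsilon=0.25)
# Three items from the known profile, one deliberate exploration.
print(len(suggestions))  # 4
```

Every swipe on the explored slot becomes a labeled data point, which is exactly the "enrich and learn" loop Daniel describes for breaking the always-Japanese-food barrier.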

Demetrios: Yeah. I also wanna talk for a minute about this idea of latency that you were mentioning yesterday, 'cause I think it's [00:22:00] fascinating how there's almost, like, a Goldilocks zone now when we work with LLMs: if it's too fast, we think we didn't get the best LLM, and so we don't trust the response.

Daniel Wolbert: Correct.

Demetrios: But if it's too slow, you get bored, or you just think, like, "Oh, it got stuck in a loop, and maybe it can't actually do it," depending on the task, I guess, because now we let Ralph do things for weeks.

Daniel Wolbert: Yeah, you see, the thing is that since AI is very general today and people are using it in their everyday life, it's the same as if you were talking to someone: you ask a question, and the person immediately reacts to it.

Daniel Wolbert: So if you ask a complex question, for example, "What disease do I have?" and I immediately say, "Well, you have a flu," you will look at me and say, "Well, are you [00:23:00] really sure about that? You didn't hear what I have; you didn't hear my symptoms," and so on, right? On the other hand, if you say, "Well, what type of disease do I have?"

Daniel Wolbert: And I keep looking at you and say, "Hmm, let me think about that. Give me a week." Then you say, "Well, does this guy really know what this is about?" Right? Mm-hmm. So it's the same analogy with AI, right? If you are really, really, incredibly fast, I think the general perception is that you did not think enough to bring me the answer, right?

Daniel Wolbert: Yeah. But if you're stuck on thinking, and thinking, and thinking, and the question is not that complex, you think, "Well, I cannot trust you, because you might bring me some bad results," right? Yeah. Of course, latency is the thing that anyone who works with AI goes to sleep and wakes up thinking, "Well, I should improve my latency. [00:24:00] I should improve my latency, because the user," uh- That's so true ... will drop.

Demetrios: Every engineer's dream.

Daniel Wolbert: Exactly, right?

Demetrios: Let's get faster.

Daniel Wolbert: But there's a boundary, right? There is a study from CSCW, Computer-Supported Cooperative Work, from 2022, one of the first surveys on this, that asked: what is the perception of the user?

Daniel Wolbert: So-

Demetrios: Yeah ...

Daniel Wolbert: from roughly zero to something like four, it feels a little bit robotic. From four to 16 is something that I can accept, because- Seconds,

Demetrios: four seconds?

Daniel Wolbert: Seconds, yeah.

Demetrios: Wow.

Daniel Wolbert: Well, four seconds is okay; it depends on your question. Yeah. Right? Say, "What is my name?" If you take eight seconds to reply with my name, you say, "Well, that's not a good AI."

Daniel Wolbert: But if we are talking about complex questions, if it takes a little bit of time, like eight seconds-

Demetrios: Mm-hmm ...

Daniel Wolbert: it's very acceptable, right? Of course, as I said, there is latency, and there is the [00:25:00] perception of latency.

Demetrios: Yeah.

Daniel Wolbert: So when you look at any kind of generalist AI platform, right, there's the streaming part, where the answer is being constructed. That takes more time, but you see the results coming onto your screen.

Daniel Wolbert: And some tools, even the AIs themselves, already do that: when you are looking for something, there is something showing, "Well, I'm looking for that, I'm collecting information, I'm collecting the data." The problem is, if you do not show anything to the user, not even an icon, the user thinks, "Well, it got stuck, it was a bug, there was a problem."

Daniel Wolbert: But if you show that something is actually going on, the perception changes completely, right? And it also depends on the channel you are working with. For example, WhatsApp, right? WhatsApp used to be a very asynchronous channel. Well, I just text [00:26:00] someone, then I scroll around and do something else.

Daniel Wolbert: Oh, I got a message, I'm gonna reply.

Demetrios: Yeah.

Daniel Wolbert: So if you are working on WhatsApp, you can take a little bit longer, because it's expected to be more asynchronous. However, if you are in a voice mode, for example, where you are calling an AI, that should be very fast.

Demetrios: Yeah.

Daniel Wolbert: But the challenge is: how can you be very fast with the same intelligence as if you were working on WhatsApp?

Demetrios: Yeah, you have to construct those systems differently.

Daniel Wolbert: Correct.
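The channel-dependent latency budgets the two describe, together with the rough 0-4 s ("robotic") and 4-16 s ("acceptable") perception bands Daniel cites from the CSCW study, could be encoded as a simple policy table. The specific numbers and function names here are illustrative assumptions, not iFood's configuration:

```python
# Rough latency budgets in seconds, per channel and query complexity.
# Numbers are illustrative, loosely based on the perception bands
# mentioned in the conversation.
BUDGETS = {
    ("voice", "simple"): 1.0,     # a call feels broken after ~a second of silence
    ("voice", "complex"): 3.0,
    ("app", "simple"): 2.0,
    ("app", "complex"): 8.0,      # acceptable if a progress state is shown
    ("whatsapp", "simple"): 4.0,  # asynchronous channel, more slack
    ("whatsapp", "complex"): 16.0,
}

def latency_budget(channel: str, complexity: str) -> float:
    """Look up how long this channel/complexity pair may take to answer."""
    return BUDGETS[(channel, complexity)]

def needs_filler(channel: str, elapsed: float) -> bool:
    """Should the experience layer show a typing indicator / filler message?"""
    return elapsed > latency_budget(channel, "simple")

print(latency_budget("whatsapp", "complex"))  # 16.0
print(needs_filler("voice", 2.5))             # True
```

A `needs_filler`-style check is one way the "experience squad" Rafael describes later could decide when to send a "thinking about it, one second" message while the intelligence side is still within its budget.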

Demetrios: The thing you talk about with perceived latency is so true. I actually have two data points on that. One is back in 2022, we had a product manager from You.com on the podcast.

Rafael Borger: Mm-hmm.

Demetrios: And he was saying that for them, user engagement went up, or users stuck around much more, if you asked the user a trivia question while the site was loading, and...

Demetrios: 'Cause You.com, I never actually used it, but I think it was kind of like Perplexity. Sorry, You.com, for that one. But the whole idea was: it's gonna take a while for us to populate this page. While we're doing that, so that you don't just see a spinning wheel and get bored after two seconds, we ask you some trivia questions, or we write a poem for you about your query- Mm

Demetrios: then you have something to do while you're waiting, and it distracts you.

Rafael Borger: Yeah. And it's so true that a lot of AI squads are divided into two teams: intelligence and experience.

Demetrios: Mm-hmm.

Rafael Borger: But they work toward the same goal, for example, user retention or order taking, et cetera. Mm-hmm.

Rafael Borger: So when we're talking about user retention, Daniel will say, "Okay, I can get [00:28:00] to four or five seconds of latency." And the experience squad will say, "Okay, how can I retain people while they are waiting?"

Demetrios: Yeah.

Rafael Borger: And they will create an experience for doing so. For example, on WhatsApp, you can say, "Hey, thanks for coming back.

Rafael Borger: I know that you like," et cetera, et cetera. "I'm thinking about it, just one second, I'll answer it."

Daniel Wolbert: Yeah.

Rafael Borger: Or, for example, in a graphic interface within the app, I can do this with the load bar going very fast, and people say, "Oh, that's fast." And then take a little bit more time- Mm-hmm ... while, like, an image of iLo is cooking something, you know, and then- Yeah

Rafael Borger: the food appears, so. Yeah.

Demetrios: Yeah, for me, I really like the experience side, because you gotta get creative- Yeah ... and you gotta think about, "How can I retain folks? What can I serve them to keep them engaged?"

Rafael Borger: That's very tricky because, you know, you can think about a bunch of things like, for example, voice tone.

Demetrios: Hmm. [00:29:00]

Rafael Borger: Like an agent voice tone. Yeah.

Demetrios: Voice is a whole different beast.

Rafael Borger: It can be, like, a- adapted to different scenarios, different, you know, uh, kind of needs, kind of-

Demetrios: Yeah ...

Rafael Borger: perceptions, right?

Demetrios: Yeah. You almost have to have two different systems, because what you were saying with the latency requirements and the experience of a voice agent versus the GUI or versus WhatsApp, you can't serve them in the same way.

Daniel Wolbert: For voice, you cannot take too long to answer, right? So you must find the boundary: what can I leave out, what can I simplify in terms of my whole intelligence, to make sure that the user will have a very good experience, but not the perfect one, right?

Daniel Wolbert: But still, an experience that the user will like, right? [00:30:00] When you think about the AI, you have your idea, you build your agent, but when you try to prototype it, you have much more than the AI agent. You have the back end that will consume your data.

Daniel Wolbert: You have the front end that will show your data. You have all the interfaces, you have the whole infrastructure, right? So one of the differences is how much data processing I need to push to the upper layers when I'm serving in an app versus in voice, for example, right?

Daniel Wolbert: I can simply say, "Well, this is the best item. This is the, the, the description of the information. Talk about it." Right? But if I want to render something, the front end layer, for example, I need to get the data to retrieve, to put in on a format that someone will consume that. This is one of the strategies between those channels, right?

Daniel Wolbert: On the other hand, I need to calibrate: be very meticulous about the best [00:31:00] recommendations and take the time you need on WhatsApp, for example. But when you look at voice, you have to relax all of that. So instead of the best-reviewed burger that has all the ingredients you need, which takes five seconds, I can relax it and say, "Well, this is still a burger the user likes, that has these characteristics, but there is only one item.

Daniel Wolbert: Just, uh, recommend that." Because for a voice, for example, that's the same analogy as the text, right? But I'm gonna talk about that, uh, eventually. But for a voice, I say, "Well, I want a burger. What do you expect?" "Well, I found a good burger for you. It has these characteristics." But if I re-read the entire description of the burger- Seems forever

Daniel Wolbert: you say, "Well, this is really boring. I don't like it." So for the voice, it's a challenge because you have to be assertive, of course, with not the, the amount of details because of the latency, but also you need to have, "Well, this is a good burger for you because it has a good price, and it's something that you like.

Daniel Wolbert: Would like to try it?" And you say, [00:32:00] "Yes," and then you go over the, the flow, right?

Demetrios: Yeah.

Daniel Wolbert: But-

Demetrios: I find that fascinating, especially with voice, 'cause you only have one channel, and with an interface, you have many different data points that can pop up. But like you're saying, with voice you only have one channel to add the most important things right there.

Daniel Wolbert: Yeah. And what happens if your voice experience describes six items? You will be listening for 30 seconds, and you won't like it. So you need to calibrate what you'll say once the results come back, something that will attract the user to buy, right?

Daniel Wolbert: As, as if it was a concierge in that, that case. But for the text, for example, and, uh, that depends on what the user expects. So I mean, the user wants convenience. The us- user wants to see things that are really fast, right? So if you buy... If you, uh, if you just build, like, a whole text interface [00:33:00] just to return text and the user can read this whole poem, the user won't do that.

Daniel Wolbert: That's why experience is important. Mm-hmm. That's why you have to get those results, and you have to show them, uh, on a very visual way that the user can understand and can proceed with the, the, the, the choice, right? But as I said, that depends on the channel.

Demetrios: Yeah.

Daniel Wolbert: Does that make sense?

Demetrios: Yeah. You guys probably saw yesterday, when I was showing the playground, there's a way to style the calendar card that would come up in my chat.

Demetrios: That was inspired from some of the work that you, uh, you all did with the carousels. Mm-hmm.

Rafael Borger: Mm-hmm.

Demetrios: And I think there's a whole world out there of, uh, rendering different components, and now it came out. You probably saw too, like MCP apps or- Mm-hmm ... I think Claude has apps now. Uh, OpenAI's been doing it for a while.

Demetrios: But just bringing a different experience to chat, so it's not [00:34:00] always text, like you're saying, because there's so many things that you can't really get across with only text.

Rafael Borger: Yeah. It's a combination of image and text, and you kind of explain the reasoning in a way that shows the user why you are doing it, without talking too much at the same time, you know?

Demetrios: Yeah. Yeah.

Rafael Borger: Saying, "Okay, that's what you want," image and say, "That's why this is good for you," you know?

Demetrios: Yeah. And you have to pique someone's interest, and then you can go deeper. Like you're saying, especially if it's the phone, you gotta just get on the call and tell them, "Okay, I got a burger 10 minutes away.

Demetrios: I got another burger, one you haven't tried," and then there's this other kind of option over here. "Any of these sound interesting?" And then you let the person go deeper, because if you try to list off everything [00:35:00] from the beginning, it would be a disaster.

Daniel Wolbert: Correct. Yeah. As I said, it depends on the channel, right?

Daniel Wolbert: On the voice tone, for example, if you are talking to, to your AI agent, you can have the benefit of asking f- some questions first. So I would like a burger. Which type of burger would you like? A chicken or a, or a, or a, a, uh, a beef burger- Vegetarian ... right? Vegetarian and so on. Well, I want the vegetarian.

Daniel Wolbert: Okay, I'm gonna find some options for you. If you do that on the text, like three, four message, probably the user won't like it. It's just that's the same, that's the same idea, same logic. That's

Demetrios: true.

Daniel Wolbert: But the channel is, is totally different. Mm-hmm. That's why on Ailo, for example, we have this, this balance between, um, asking things before refining and showing the results, or if I know enough the, based in on your intention, I can already show you this, these results.

Demetrios: But, yeah, that goes back to my question. In my head, I see it as completely different systems. You have the app experience, you have the [00:36:00] concierge, maybe via WhatsApp, and then you have the phone. Are you not building three different kinds of pillars?

Daniel Wolbert: I wouldn't say that, because of the way Ailo is structured. Ailo is ready to...

Daniel Wolbert: It's a multi-channel, uh, uh, solution, so it's not, uh, based on only, uh, if I want to be more verbose or less verbose. Everything, every aspect of the Ailo can be configured as a remote config, for example, and can be experimented. So think about that flow. When I'm building any type of system, I try to make sure that, uh, I do not do anything that is hard-coded, and I can easily flexibilize the characteristics, for example, of the, of what my agent does, what the number of the items, what is the, the weight that I'm doing between exploration vers- versus exploitation, right?

Daniel Wolbert: Mm-hmm. So Ailo [00:37:00] is a very robust solution that we can get a whole subset of parameters and say that, well, for this channel, I'm gonna... Ailo should perform like this either in the AI, in the, in the intelligence part. For the other channel, it should perform like that. So we can calibrate that, like, remotely and even test them.

Daniel Wolbert: So

Demetrios: it's not

Daniel Wolbert: necessarily- Yeah, so there's the

Demetrios: foundational pieces.

Daniel Wolbert: That's our fo- foundational pieces. Exactly. Right? I mean-

Demetrios: But at some point it branches out. At some point you have to make it very channel specific.

Daniel Wolbert: It's more on this experience part.

Demetrios: Mm.

Daniel Wolbert: As I said, the rendering and how I'm gonna... How can I say?

Daniel Wolbert: How I'm gonna show the results to the user. Yes, we have pillars that have been, had to be processed differently, but when you think about intelligence, I can say that, well, for the voice, you can have to be very fast on your response, and you have to not Talk too much, but for the WhatsApp, [00:38:00] you might have, uh, more results, have, have, have more text.

Demetrios: Yeah.

Daniel Wolbert: So.

Rafael Borger: The intelligence, overall, is almost the same, but the way we translate it into a user experience changes very much from channel to channel, you know?

Demetrios: But with voice, you have to use different models, right? So maybe you have the foundational pieces of the LLM, but then... or maybe I'm mistaken, the LLM doesn't have a voice mode, does it?

Daniel Wolbert: Well, that depends on the capability of the model you're working with. If you have a model that can understand your voice and do all the reasoning behind it at the same time, then you don't need these two layers.

Demetrios: Mm-hmm.

Daniel Wolbert: You can do that at once.

Demetrios: Oh.

Daniel Wolbert: Right?

Demetrios: I see.

Daniel Wolbert: So we've tried, we tried to, to, to... [00:39:00] That's, that's another, another hint, right? If you are working on a voice-to-voice message, please do not use one model to just to translate to your voice and then go to another model to interpret from voice to text. Try to do this directly, because otherwise you will get problems with that.

Daniel Wolbert: Yeah. So, uh, the, the, the, the generalistics models, they had it evoluted so they can get different types of inputs, text, image, and voice as well.

Demetrios: Yeah.

Daniel Wolbert: So, and, uh, our strategy, depending on the channel, can be relied on u- using different types of models to do the intelligence part.

Demetrios: You had some moments, some learnings of what you could have done better. Can you share what are some things that, knowing what you know now, you would do differently?

Daniel Wolbert: The first thing, from the technical perspective, always going back to latencies, is that it's always data at the end, right? [00:40:00] If you think about your agent, it's more like a way of taking your corporate data, processing it somehow, and using the outputs of that. But then think about this.

Daniel Wolbert: Most of the data that you're using, it's not, uh, up to the agent to generate. So most of the agents that are corporate, that we use, they are, they consume datas from other types of sources, from other types of, um, areas like finances and anything like... But they have their own data, they have their own structure.

Daniel Wolbert: So one good thing that it's, it's, it's a hint is that before you build your agents, try to make sure that when you're going to escalate your agent, you are totally aligning with, with your data, the other data owners, right? Because this is gonna be your problem, but this is gonna be the other's problems who serve that.

Daniel Wolbert: So escalation, uh, uh, when you do about-- talk about [00:41:00] scalability, right? You probably you're gonna take this data from the others. You're gonna go from other APIs, and they must be aware that you sh- you need to do that.

Demetrios: Mm.

Daniel Wolbert: So it's more like it's something between the technology part, but like the agreement between other types of areas that you're gonna consume.

Demetrios: Before you really scale, understand the way the data models work?

Daniel Wolbert: Not only the data models, but what the relationship is between your agent and other parts of the company. For example, if you're building an agent that needs to get the finance report from another area, but that area is not ready to serve your requests, or it takes too long, there is a bottleneck there. What would be the problem?

Daniel Wolbert: You are gonna have a light, high latency on your agent-

Rafael Borger: Mm-hmm ...

Daniel Wolbert: but because of an external party. And you cannot say, I cannot blame them and say, "Well, this is your fault," [00:42:00] because this is a, this is a whole solution. Right. So if you have a data from that, the finan-fi-finance report that takes a long time to, to retrieve, just make sure that you have some unsequence process, some- Ah, I see

Daniel Wolbert: um, some, uh, workflows that can run in the background before you get to the agent. So

Demetrios: the agent works fine, but then when we need to pull in different sources of data or when we need to pull in different intelligence, that doesn't bottleneck the agent.

Daniel Wolbert: Correct. And, uh, you have to have a good, a very good high quality agreement on that.

Daniel Wolbert: So I'm gonna consume your data, I'm gonna consume your API, I'm gonna do this information. Are you ready to receive that? No. Then I'm gonna try to, let's try to see how can we do the scalability without affecting the overall ecosystem.

Rafael Borger: Mm-hmm. I like that. One thing we found very difficult is the way you measure [00:43:00] how customers are interacting with your product.

Rafael Borger: For example, you can use the Sean Ellis score to find product-market fit. But if you are targeting a specific kind of user in Brazil, and you ask, "How are you gonna feel if this product doesn't exist anymore?" people do not understand this kind of question, and they just answer wrong, and you get the wrong feelings about your product.

Demetrios: 'Cause they answer like, "Oh, it's gonna be devastating to my life." They think it's gonna be too much, or they think it's gonna be less.

Rafael Borger: So there's a bunch of people saying they like, but they do not like, and people saying that they do not like, but they actually like.

Demetrios: Mm-hmm.

Rafael Borger: So the first thing user LLM as a, as a judge.

Rafael Borger: Then you need to deep dive the conversation itself, because people say, "I hate this recommendation." But why?

Demetrios: Yeah.

Rafael Borger: How the agents actually work with you For you hate this. Or even worse, [00:44:00] people, uh, you know, they have a problem with, I don't know, iFood, because every company has a problem with a customer, and they report this problem by using iLO.

Demetrios: Oh.

Rafael Borger: So it's-

Demetrios: So it's like a customer support bot all of a sudden.

Rafael Borger: Yeah.

Demetrios: Yeah. 'Cause-

Rafael Borger: And that's tricky also because what are you gonna do with this? If you answer nothing-

Demetrios: Yeah ...

Rafael Borger: probably the client will think you are dumb.

Demetrios: Did you learn how to create that handoff really quick to customer support or...?

Rafael Borger: We are, we are right now thinking about this, but, uh, you know, the main point is when you are, uh, building conversational agents, it's, uh, it's not, uh, easy as you might think to understand, like, truly understand how your customer really feels about your product, how, [00:45:00] like, for real, like- Yeah

Rafael Borger: what necessity you are, you know, truly, uh, addressing.

Demetrios: Mm.

Rafael Borger: You need to, like, interview, ask questions, see conversations, like on the detail. Yeah.

Demetrios: Yeah.

Rafael Borger: Yeah. We found this out the hard way.

Demetrios: and I could see from the customer's perspective why they would want to use iLO to report different problems that they're having with the app, because it's like the chat experience for me as a user, I think chat's chat, and when I interact with different chatbots, usually I ask it anything, and it will generally be able to handle what I'm asking it, right?

Demetrios: I don't think of, "Oh, there's a customer support chatbot that I have to go to." Mm-hmm. "There's a food chatbot that I have to go to." I don't verticalize it like that- Yeah ... as a user. I just think, [00:46:00] "Oh, I'm gonna chat and tell them that this is messed up."

Rafael Borger: And, and that's, like, a great thing for us to think about, which is for the user, it's, like, one thing iFood, right?

Rafael Borger: He doesn't care if he's talking to a agent or any names. All right? So in the end, in a agent-to-agent world that everybody's talking about, it's gonna be like, "Okay, I'm iLO. I'm here to take your order. If you want... If you have a customer problem, talk with my other friend here that's gonna help you." All right?

Rafael Borger: So, uh, gonna introduce to a bunch of friends-

Demetrios: Yeah ...

Rafael Borger: that is a bunch of agents that's gonna solve a bunch of problems. Or that's one option. Are you gonna orchestrate everything?

Demetrios: Yeah, that's what I was thinking.

Rafael Borger: In one experience, one voice tone, one kind of messaging? And say, "Tell me whatever you want, I'll sell it to you."

Rafael Borger: There's both options. I mean, I don't know which one's better. Maybe you [00:47:00] can, you know, have like a friendly approach saying, "Okay, let me introduce you to this agent that's, that can help you." Or you can just address everything like

Demetrios: Ailo does it all, but behind the scenes, and it's just the gateway to whatever you need from iFood, is how I'd feel it would play out.

Demetrios: But also, it doesn't really matter if it's just agents doing it. If I have my agent doing my bidding, I don't really care how they get it done, as long as it gets done.

Rafael Borger: Exactly. Like, does the customer really care?

Demetrios: No.

Rafael Borger: So, you know, he, he cares enough for you to standardize everything or just connect really quick and just solve the problem, you know?

Demetrios: Yeah. Yeah, right now we're in this period where they do care, because it is me as a human using my time to interact with it. But later on, if I get an agent that can do it for me, then it's a different [00:48:00] story.

Rafael Borger: Yeah.

Demetrios: Yeah.
