MLOps Community

Hardening Agents for E-commerce Scale: From RL Alignment to Reliability // Panel 2

Posted Nov 25, 2025 | Views 237
# Agents in Production
# Prosus Group
# E-commerce

Speakers

Paul van der Boor
Senior Director Data Science @ Prosus Group

Paul van der Boor is a Senior Director of Data Science at Prosus and a member of its internal AI group.

Arushi Jain
Senior Applied Scientist @ Microsoft

Arushi is a Senior Applied Scientist at Microsoft, working on LLM post-training for Computer-Using Agent (CUA) through Reinforcement Learning. She previously completed Microsoft’s competitive 2-year AI Rotational Program (MAIDAP), building and shipping AI-powered features across four product teams.

She holds a Master’s in Machine Learning from the University of Michigan, Ann Arbor, and a Dual Degree in Economics from IIT Kanpur. At Michigan, she led the NLG efforts for the Alexa Prize Team, securing a $250K research grant to develop a personalized, active-listening socialbot. Her research spans collaborations with Rutgers School of Information, Virginia Tech’s Economics Department, and UCLA’s Center for Digital Behavior.

Beyond her technical work, Arushi is a passionate advocate for gender equity in AI. She leads the Women in Data Science (WiDS) Cambridge community, scaling participation in her ML workshops from 25 women in 2020 to 100+ in 2025—empowering women and non-binary technologists through education and mentorship.

Swati Bhatia
Product Manager @ Google

Passionate about building and investing in cutting-edge technology to drive positive impact.

Currently shaping the future of AI/ML at Google Cloud.

10+ years of global experience across the U.S, EMEA and India in product, strategy & venture capital (Google, Uber, BCG, Morpheus Ventures).

Audi Liu
Senior Product Manager @ Inworld AI

I’m passionate about making AI more useful and safe. Why? Because AI will be ubiquitous in every workflow, powering our lives just like how electricity revolutionized our society - It’s pivotal we get it right.

At Inworld AI, we believe all future software will be powered by voice. As a Sr Product Manager at Inworld, I'm focused on building a realtime voice API that empowers developers to create engaging, human-like experiences. Inworld offers state-of-the-art voice AI at a radically accessible price - No. 1 on Hugging Face and Artificial Analysis, instant voice cloning, rich multilingual support, real-time streaming, and emotion plus non-verbal control, all for just $5 per million characters.

Beyond Work: • Fun fact: Yes, I’m named after my mom's car during pregnancy (yes, really!). • Unforgettable moment: Witnessing LeBron James break the all-time scoring record live at Staples Center — a testament to one’s passion and discipline

X: https://x.com/audiliu_

Isabella Piratininga
Director of Technology & Innovation @ iFood

Experienced Product Leader with over 10 years in the tech industry, shaping impactful solutions across micro-mobility, e-commerce, and leading organizations in the new economy such as OLX, iFood, and now Nubank. I began my journey as a Product Owner during the early days of modern product management, contributing to pivotal moments like scaling startups, mergers of major tech companies, and driving innovation in digital banking.

My passion lies in solving complex challenges through user-centered product strategies. I believe in creating products that serve as a bridge between user needs and business goals, fostering value and driving growth. At Nubank, I focus on redefining financial experiences and empowering users with accessible and innovative solutions.


SUMMARY

The discussion centers on highly technical yet practical themes, such as the use of advanced post-training techniques like Direct Preference Optimization (DPO) and Parameter-Efficient Fine-Tuning (PEFT) to ensure LLMs maintain stability while specializing for e-commerce domains. We compare the implementation challenges of Computer-Using Agents in automating legacy enterprise systems versus the stability issues faced by conversational agents when inputs become unpredictable in production. We will analyze the role of cloud infrastructure in supporting the continuous, iterative training loops required by Reinforcement Learning-based agents for e-commerce!


TRANSCRIPT

Paul van der Boor [00:00:05]: Great. Well, we have a couple of minutes. I can't believe that with this panel and this topic you only gave us 20 minutes. There's no way we're going to cover all of the depth and expertise we have here with us today. So we have Arushi Jain, Senior Applied Scientist at Microsoft; Swati Bhatia, Product Leader at Google; and Audi Liu, who by the way got his name from his parents, who named him after a car. That was incredible. I'll never forget that.

Paul van der Boor [00:00:32]: It's good to have you, Audi, Senior Product Manager at Inworld AI, and Isabella Piratininga, Director of Technology and Innovation at iFood. Thank you so much for joining us here today. We've got a lot of stuff to cover. The topic is a small one: hardening agents for e-commerce at scale, from RL alignment to reliability. So there's probably 500 things we want to cover today. But I thought, to kick it off, since we only have a couple of minutes, I'll ask each of you to briefly share a real example from the work you've done where an AI agent either exceeded your expectations or failed miserably in production, and in particular how that informs the work you do today on reliability. So maybe we can start with Swati, then we can go to Arushi, then to Audi, and then to Isa. Then we'll go into the questions and the content.

Paul van der Boor [00:01:26]: Over to you Swati, thank you for joining us.

Swati Bhatia [00:01:29]: Yeah, of course. I'm super happy to be here. Thanks for having me, and great to be on this panel and talking to everybody. I'd like to mention that, coming from the product perspective and especially leading a product at the scale of Google Cloud, reliability, risk, and ROI are the three Rs that I'm constantly focused on. I've actually got both a success and a failure story from all of our experimentation with agents. One key product that I work on is troubleshooting: the entire Google Cloud troubleshooting and support experience, specifically focused on Cloud compute, which is everything around GPUs and TPUs. We've had an influx of consumption there, and we had an agent deployed that was responsible for pre-qualifying complex technical support tickets that needed to be routed to a human. The base model itself was way too conversational for the kind of customers that we were targeting, and it often failed to adhere to strict internal service-level policies. What incredibly helped the team is when we brought in a DPO-based adapter trained purely on human-reviewed compliant rejections: we boosted that productivity policy adherence score from 45% to literally 90% in less than 48 hours.

Swati Bhatia [00:03:10]: And reliability really matters in the enterprise context, especially when you're dealing with really large workloads and really critical customers, large whales that are asking questions about what's going wrong with a specific deployment. I think in that enterprise context, with compliance, DPO giving us a quick advantage is really the success story I'd like to call out. There are plenty of failures in building on this, and we can dive in more going forward. But that's one big success the team had deploying DPO and PEFT adapters.
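
Illustrative sketch (not from the panel): the DPO objective Swati refers to can be written down for a single preference pair. This assumes the standard DPO loss, with the human-reviewed compliant response as the "chosen" sample and a non-compliant one as the "rejected" sample; that pairing is an assumption about how such data might be framed, not a description of Google's setup. The function name, the example log-probabilities, and `beta=0.1` are all hypothetical.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    Inputs are summed token log-probabilities of each response under the
    trainable policy and under the frozen reference model.
    """
    # How much more (or less) likely each response became relative to the reference.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) == log(1 + exp(-x)), computed in a numerically stable form.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))

# Policy identical to the reference: no preference learned yet, loss is log(2).
baseline = dpo_loss(-6.0, -6.0, -6.0, -6.0)

# Policy now favours the compliant response and disfavours the other: loss drops.
improved = dpo_loss(-5.0, -9.0, -6.0, -6.0)
```

In practice only a small adapter (for example, LoRA weights) would receive gradients from this loss while the base model stays frozen, which is what makes a fast turnaround plausible.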

Paul van der Boor [00:03:52]: Amazing. Thanks for sharing, Swati. Arushi, what's your favorite example of either exceeding or underperforming agents in production?

Arushi Jain [00:04:02]: I actually have two; I'll try to keep it short. I have been leading the language understanding and reasoning front for post-training, primarily of OpenAI models because we have a partnership with OpenAI, and I have seen tremendous change in terms of the variance of the outputs. Through post-training you can actually reduce a lot of the variance and the hallucinations that you see in the outputs, and make the models extremely grounded in M365 data, because we post-train on queries similar to the ones our customers ask in Copilot. That entire muscle has been built now over one and a half years, to the point that we don't do prompt engineering anymore. We heavily rely on post-training methods to ship our models, which are heavily grounded in Microsoft data. I would like to call that out as a pretty big success, because all of the horizontal-layer models that you see today in Copilot are post-trained models. None of them are just based on out-of-the-box prompt engineering.

Arushi Jain [00:05:16]: And the second one, which I am currently working on, is integrating computer-using agents and post-training for a lot of Microsoft-specific tasks. We've seen that the inherent knowledge in the base model is quite strong for normal day-to-day tasks like booking a cheap flight ticket or a restaurant reservation and things like that. But how can you take those models and post-train them to be better at an Outlook compose-email kind of scenario, or at "find the latest presentation that I just modified and add two more slides to it to get these two issues highlighted in the next meeting", because it has all of your data combined: the presentations, the meetings, and the emails? So how can I take more off the customer's plate and just automate it? We are seeing quite tremendous progress in computer-using agents right now with post-training. So yeah, in general I'm a big believer and optimist in post-training, so that's what I would like to call out.

Paul van der Boor [00:06:25]: Very exciting, because I think computer-use agents are a whole topic in themselves, one that has had somewhat underwhelming performance, right? So we are all excited for that to be possible. We know 1 out of 10 times it does magic, but getting it to work 9 or 10 out of 10 times has been hard. So good to see that we're on that trajectory. Audi, over to you.

Audi Liu [00:06:48]: This is so fun. Happy to be here. Great talk, Arushi and Swati; I think they touched on a lot of points about training. I want to maybe focus on model choice as a key factor in determining reliability. A quick intro about Inworld: at Inworld we provide text-to-speech models, and we're number one on Hugging Face and Artificial Analysis. And a lot of customers come to us to build real-time conversational agents. So think shopping agents, we part of Netflix and etc.

Audi Liu [00:07:16]: And a story I want to share, about a production agent that went from not performing so well to really great, is when they switched from an end-to-end speech-to-speech model to a cascaded architecture. Let me explain what happened there. Tool calling, as we all know, is very important in a lot of workflows. With the speech-to-speech model, at first the customer prioritized latency. They wanted the conversation to feel natural. But they later found that if they cascaded separate speech-to-text, LLM, and text-to-speech models, they could really customize the logic that way. And without ever fine-tuning any models, relying on Arushi and Swati training the models on their behalf, all they had to do was put the right pieces together. They saw a huge performance boost in terms of customer acquisition and retention because the tool calling was more accurate.

Audi Liu [00:08:07]: Customer data, for example balances, their memories, their preferences, is being pulled more accurately. So they sacrificed maybe 200 milliseconds in latency switching from the end-to-end model to something more cascaded, and they saw a huge uptick in customer happiness. So I think my only takeaway, working with a lot of customers, is that a lot of times you have to stand on the shoulders of the foundation companies training the models. At Inworld we also train our own TTS model, so we're also pushing that boundary. But at the same time, what you can do as a developer is choose the right architecture so that reliability keeps improving as foundation model providers improve.

Paul van der Boor [00:08:49]: Thanks, Audi. And maybe before we go to Isa, can you share what you mean by cascading? What's the difference between cascading and the end-to-end setup?

Audi Liu [00:09:00]: Absolutely. So imagine you have a speech-to-speech model like GPT-4o Realtime. It takes in speech as input, it does the LLM processing, and then it outputs speech as well. The model can also do tool calling, but it's on the model to do that, right? A cascaded architecture is where you choose an STT (speech-to-text) provider, think AssemblyAI or Deepgram; you pick an LLM provider, like OpenAI GPT-4o or Claude Sonnet; and you also pick a text-to-speech provider like Inworld, et cetera. With that, you have more flexibility in determining what function calls you want to make midstream, whether you want to add parallel processing, et cetera. The end-to-end one usually has lower latency because it's one model.

Audi Liu [00:09:55]: The cascaded one provides you with more flexibility. So that really depends on your use case and the experience you want to bring to your customer.
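
Illustrative sketch (not from the panel): the cascaded setup Audi describes can be pictured as three swappable stages with tool dispatch owned by application code in between. Everything here is hypothetical: the `CascadedVoiceAgent` class, the stub stages, and the `get_balance` tool stand in for real STT, LLM, and TTS providers, whose actual APIs are not shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class CascadedVoiceAgent:
    stt: Callable[[bytes], str]           # speech-to-text stage
    llm: Callable[[str], dict]            # returns {"tool": ..., "args": ...} or {"reply": ...}
    tts: Callable[[str], bytes]           # text-to-speech stage
    tools: Dict[str, Callable[..., str]]  # tool registry owned by the application

    def handle_turn(self, audio_in: bytes) -> bytes:
        text = self.stt(audio_in)
        decision = self.llm(text)
        if "tool" in decision:
            # The call is dispatched by application code, between stages,
            # so it can be validated, logged, retried, or run in parallel.
            reply = self.tools[decision["tool"]](**decision.get("args", {}))
        else:
            reply = decision["reply"]
        return self.tts(reply)

# Stub stages so the sketch runs without any external service.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")

def fake_llm(text: str) -> dict:
    if "balance" in text.lower():
        return {"tool": "get_balance", "args": {"customer_id": "c-1"}}
    return {"reply": "How can I help?"}

def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")

def get_balance(customer_id: str) -> str:
    return f"Balance for {customer_id}: 42.00"

agent = CascadedVoiceAgent(fake_stt, fake_llm, fake_tts, {"get_balance": get_balance})
spoken = agent.handle_turn(b"What is my balance?")
```

Each stage is a separate hop, which is where the extra latency comes from; in exchange, any one stage can be replaced without touching the others.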

Paul van der Boor [00:10:07]: Very cool. Thanks for sharing, Audi. And Isa, a completely different world from the two large cloud providers and the more startup, scale-up experience from Audi: a pure e-commerce player in Brazil.

Isabella Piratininga [00:10:20]: Yeah, hey guys, glad to be here. Super excited to share what we are doing here in Brazil. I'm Isa, I'm leading the innovation team at iFood, and we are building Ailo. Ailo is our multi-channel generative AI agent. Talking from the product perspective, I guess the main challenge that we have been facing, when we talk about reliability, is how we can create this connection with the customers. We understand that up to a certain point the agent can perform very well, once you understand how to manage the customer's message and find the right interpretation for it. But we are still trying to figure out how to manage the answer and how we provide all of that. As Audi said, okay, we can define the ranking and find the best offer for them, but that doesn't mean they want to have everything chosen by the agent. So we need to strike a balance in how far we go in offering something to the customers, and make all those offers available to them in a way that gives them confidence: okay, this is what I want, and I feel super comfortable with that. This is very difficult.

Isabella Piratininga [00:12:01]: So we are in the middle of figuring out how we can provide the best personalized offer here. We have the LCM, our Large Commerce Model created in partnership with Prosus, with which we can understand the whole of the customer's behavior and work on top of this personalization. But at the end of the day, once you have created something amazing for the customers, it doesn't mean that they want everything delivered in a way where they don't have any choice. So we are in the middle of the process, between the experience and managing the best offer organization for them, in a way that they can feel is very helpful. So I guess that having a good agent doesn't mean that in the real world, once it understands what you want, everything will perform in the best way. This is the main challenge that we are facing, because in the step between offer and add-to-cart we have a good conversion, but between cart and order, that's where the challenge begins. So it's a tricky process: managing personalization without over-personalizing, while still giving the customers a say in the process and managing the experience.

Isabella Piratininga [00:13:27]: Sometimes you have a voice-to-voice experience, sometimes they just want to text you, and you have to find the balance across the whole experience. I guess that has been the craziest thing that I have been facing so far.

Paul van der Boor [00:13:42]: I guess as we've heard so far today, it's not easy to build agents. Or rather, it's easy to build one for fun, but to make one that the world uses, that users like to use, is not so trivial. And one of those areas, Arushi, that you touched on and are doing research on is computer-using agents. I think a lot of the early demos of these types of agents using existing interfaces were very exciting. At the same time, the actual real-life experiences, at least in our case, were somewhat challenging and not quite there yet. So I'm curious, when you think about agents that can actually do things on computers, and let's take the e-commerce context, if that's something you look at specifically (you mentioned a couple of examples there): what makes e-commerce applications particularly tricky, and how do you think about what we need to do to solve those?

Arushi Jain [00:14:46]: I would say two aspects are really challenging at this point in time, and are not even solvable through post-training methods, which is what I primarily work on. For example, let's say you are on the Amazon website and you tell the agent, "find the cheapest cricket bat that I want to play with." For "cheapest" it has to go to that dropdown menu and sort by price, right? These are agent models which take screenshot after screenshot and process each one, so sometimes the model gets confused by these small dropdowns, or by small icons and buttons, because localization still needs to be optimized a lot in the inherent intelligence of the model. For some websites you can optimize this, but there is a vast variety of websites that have been developed not for agents to process but for humans to process. And sometimes even humans get confused if there are too many icons, small elements, and small text. When I go to some of the Indian websites in particular, they are very overwhelming because they have been designed in such a cluttered way. If humans struggle, can we expect agents to process that data and navigate it well? So I think that has been one of the biggest challenges we have seen, going from GPT-4o to o3 and now the more advanced ChatGPT Atlas models that OpenAI has developed; OpenAI has consistently worked on this problem so that the base model is extremely strong.

Arushi Jain [00:16:41]: And the second challenge, I feel, comes with the DOM versus the screenshot. Because of the way some websites have been written, the DOM text that we feed to the LLM sometimes does not reflect the current information, or reflects it with some lag. So the agent model, which processes the entire text of the DOM, is not able to answer accurately. In that case, do we make the vision capabilities stronger and process everything screenshot by screenshot, or do we also rely on the DOM for a particular website? So those have been two interesting challenges, I would say, in working with computer-using agents.
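
Illustrative sketch (not from the panel): one possible way to frame the DOM-versus-screenshot question Arushi raises is as a per-step choice of observation source based on how stale the DOM snapshot is. This is a hypothetical heuristic for illustration only; the `Observation` fields, the `choose_observation` function, and the 500 ms threshold are all assumptions, not how any production computer-using agent is known to work.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    dom_text: str      # serialized DOM text captured for this step
    dom_age_ms: int    # time elapsed since the DOM snapshot was taken
    screenshot: bytes  # raw screenshot of the same page

def choose_observation(obs: Observation, max_dom_age_ms: int = 500) -> str:
    """Pick which modality to feed the model: 'dom', 'screenshot', or 'both'."""
    if not obs.dom_text.strip():
        # The DOM produced nothing usable (e.g. content rendered to a canvas).
        return "screenshot"
    if obs.dom_age_ms > max_dom_age_ms:
        # The DOM may lag behind what is on screen, so cross-check with pixels.
        return "both"
    return "dom"

fresh = choose_observation(Observation("<select id='sort'>...</select>", 120, b"png"))
stale = choose_observation(Observation("<select id='sort'>...</select>", 2000, b"png"))
empty = choose_observation(Observation("   ", 120, b"png"))
```

A rule like this only routes inputs; it does nothing for the localization problem itself, which Arushi argues has to be solved in the base model.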

Paul van der Boor [00:17:28]: Yeah, that's a great point. I remember when we did some tests, some simple things in the e-commerce journey ended up being completely unaddressable by computer-use agents. So, like being able to click a dropdown and select a date on the calendar for when you're traveling from A to B.

Arushi Jain [00:17:48]: That with the date picking task. Yes, we've tried to improve that a lot.

Paul van der Boor [00:17:54]: Or scrolling, right, or adding toppings to your food. Those are things that these agents were traditionally not very good at. And I'm actually surprised you mentioned that post-training doesn't seem to work here. I would kind of expect that if you're looking at e-commerce, shopping, ordering food, travel, you would imagine.

Arushi Jain [00:18:18]: Localization is such a universal, vast problem that you end up optimizing for, let's say, 15 kinds of websites, but you won't be able to solve it fundamentally. That has to come through pre-training; it has to come from the base intelligence of the base model. Post-training is something where you can add a layer of knowledge, or where you want to let go of some of the properties of the pre-trained model to make it learn something new. For example, OpenAI models have been trained vastly on Internet data and they are accustomed to going to Google Flights. Now we can post-train them not to go to Google Flights but to be more unbiased and go to a lot of other websites in general, right? So those things can be very easily tackled through post-training. But say there is a dropdown menu, or there are so many Facebook friends' messages and I want to click on one particular friend: the model messes up the position all the time.

Arushi Jain [00:19:19]: So those are some of the things that we've recognized fundamentally need to be solved in the base intelligence of the model.

Paul van der Boor [00:19:32]: That's a good point. I think a lot of those actually should be underlying patterns in the way users expect to use websites, how we as humans do that. So that makes sense. So Audi, you mentioned you're working on the real-time voice experience and enabling more of those. In the simplest form you're just taking some user's voice input and translating that into an action. But I think the promise is much more: to fully be able to control an experience using your voice in real time. It requires a lot of stuff. I mean, again, in the early experiments... Arushi, you mentioned Indian websites; we have Isa in Brazil, I'm in Brazil.

Paul van der Boor [00:20:21]: Maybe others are also in parts of the world where the languages spoken aren't necessarily easily understood by these models. And so you have to deal with vernacular, with accents, with noisy backgrounds, if we're dealing with, you know, riders at iFood delivering food in the noisy streets of São Paulo, and having to understand what they mean. So I'm curious, from your point of view, what is the most difficult challenge you see in actually getting these e-commerce agents powered by voice to harden at scale?

Audi Liu [00:20:55]: That's a question directed to me, right? Awesome. I think I want to actually question the status quo of agency versus control. I think people now want the agent to do more: I want the agent to asynchronously find the cheapest flight, find me the best itinerary, and come back to me in three minutes. But on the consumer side of things, sometimes people might not want that. I talked to a founder who did customer research, and he said, "I want to build the best shopping experience for you. Do you want a support agent to tell you how great you look when you check out a dress, or do you want an experience where we automate things for you, where you're just buying toothpaste, et cetera?"

Audi Liu [00:21:43]: And based on the segment and the items they're buying, the experience that they want to design is completely different. For something you're buying for fun, where you want it to be interactive, see how you look, and have a voice agent talk to you and say "you look great; oh, actually this is better", tool calling is not as important. You need to be in the moment, right? Whereas something like finding the best trip with this or that restriction requires a lot of agency. So my call to action for a lot of developers out there is: you can differentiate not by building the longest-running agents but by providing an experience that gives the user satisfaction immediately. So better voices, more natural.

Audi Liu [00:22:29]: And I want to actually tie this back. I'm not sure if people have seen Claude Code, and then Cursor introduced planning mode and ask mode. People were so impressed by agents doing those long-running tasks. But if you look at Claude's own vibe coding guide, they tell you to always create a plan with the agent first. And Cursor also recently, in addition to introducing ask mode, introduced plan mode. They understood how agency can sometimes be a trade-off; it's not always better. So I think this is also true for e-commerce agents. I know Swati and Arushi are already doing the great foundational job of getting pre-training done and post-training done. But at the minimum feature level, how do we win customers? Like, Isabella, you want to increase that funnel for people to check out. People are already adding to their carts. How do you nudge them so they say, "Oh my God, this makes me feel good to buy something"?

Audi Liu [00:23:19]: It's not automating the process of adding to the cart; it's giving them that extra nudge. You look great. You made the right decision. I scraped 25 websites; this is the cheapest price you can find right now. That's the kind of thing that will get people to click on checkout. So I think different players have different roles to play. I think Arushi and Swati are great at making agency great. Our job at Inworld, or for Isabella as a consumer company, is to get those small wins initially, which is a little bump in the right direction.

Audi Liu [00:23:48]: So. Great question, Paul. This is so fun.

Paul van der Boor [00:23:52]: Great. So I think we're already nearing the end, so I'm going to ask each of you to share in 30 seconds: looking ahead 18 months from now, what is the thing each of you is hoping we solve, or is working on solving, when it comes to unlocking our ability to scale these agents for e-commerce in a reliable way? Swati, you want to go ahead, 30 seconds, and then Isa, Arushi, and Audi to close it out?

Swati Bhatia [00:24:26]: Sure. I think Arushi alluded to this briefly: there's a lot of talk of post-training right now and of trying to correct actions by utilizing methods like DPO, PEFT, etc. I think we're heading to a zone where there's more internalized trust within the agent itself, a learned confidence score: before executing a potentially high-risk or high-impact action, the agent itself understands the risk and puts a confidence interval on that score, along with a risk assessment. So I would call it more internalized trust and self-reflection on the part of the agent itself, rather than necessarily applying corrective actions as we do at the moment.

Audi Liu [00:25:15]: It.

Swati Bhatia [00:25:18]: Yeah, that's my 30 seconds.
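
Illustrative sketch (not from the panel): the "learned confidence score" Swati describes could, under simple assumptions, gate actions so that riskier ones require more confidence before the agent acts on its own. The `ProposedAction` fields, the `gate` function, and the linear threshold rule are hypothetical; in a real system both scores would come from the model or a calibrated estimator rather than being hard-coded.

```python
from dataclasses import dataclass

@dataclass
class ProposedAction:
    name: str
    confidence: float  # agent's own estimate in [0, 1] that the action is correct
    risk: float        # estimated impact if the action is wrong, in [0, 1]

def gate(action: ProposedAction, base_threshold: float = 0.5) -> str:
    """Return 'execute' or 'escalate'; the required confidence grows with risk."""
    # Risk 0 needs only the base threshold; risk 1 needs full confidence.
    required = base_threshold + (1.0 - base_threshold) * action.risk
    return "execute" if action.confidence >= required else "escalate"

low_risk = gate(ProposedAction("show_offer", confidence=0.7, risk=0.1))
high_risk = gate(ProposedAction("place_order", confidence=0.7, risk=0.8))
```

With the same confidence of 0.7, the low-risk action clears its 0.55 bar while the high-risk one falls short of 0.9 and is handed to a human or a confirmation step.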

Paul van der Boor [00:25:20]: Great. I love it. I absolutely love it. It's sort of in the direction of the verifiability, the self verifiability. Right. Or confidence is.

Isabella Piratininga [00:25:28]: Yeah, I guess my perspective is to have iFood connected across different channels, creating a good connection with the customers in a way that they just send something very simple to us, and we can understand it and deliver the best offer to them. Like, "Oh, I want my favorite lunch in 30 minutes." If I'm driving, I can be connected through my car. If I'm at home, I can talk to Alexa, or I can send a WhatsApp message. So I guess you create a huge possibility in terms of something that really changes the way people order something in Brazil, you know. So I guess this would be my vision and also my dream.

Paul van der Boor [00:26:13]: Super cool. iFood everywhere, right? Every channel. It's omnichannel in the truest sense. We've used that word for a long time, but I think maybe now it may really be possible.

Arushi Jain [00:26:25]: Yeah, I think techniques, data centers, data, and all of those things will go on and are quite established. What I am particularly interested in over the next one to one and a half years is how the product that we are trying to target compares with the actual, real gains in terms of customer efficiency. How does that play out? Because the per-token generation cost is still so high that none of the big techs have figured out how to balance the cost and the revenue, let alone companies like OpenAI and Perplexity, who are burning a lot more money. So for me it would be very interesting to see, in the next one to one and a half years, how we can deliver on the cost and the product efficiencies that we promise the customers we're targeting. So yeah, that would be very interesting to see.

Paul van der Boor [00:27:26]: And we're counting on you to do that, right? To drop the cost by another 90%. So we'll need you for that. Audi, what's your wish or requirement for scaling e-commerce agents reliably in the next 18 months?

Audi Liu [00:27:43]: I want the agent to be able to be personable and adapt in real time. Like speak slower, be more compassionate, change your voice to a different accent and it all just works. It works naturally in humans now. It's so hard to do that with LLMs at the moment. So I look forward to that feature.

Paul van der Boor [00:28:02]: Very cool. I do too. So thank you so much. We're at the end of our time here. I think Demetrios AKA the gorilla is probably going to join us in a minute.

Demetrios Brinkmann : There's no more gorilla. Moderate some audience.

Paul van der Boor [00:28:16]: That's disappointing, man. That's disappointing.

Demetrios Brinkmann : Just a normal guy trying to host a virtual conference. All right, any questions we have there from the room? From the room. Lots of them. But not respective of the conversation you were having.

Demetrios Brinkmann [00:28:37]: There were some good questions in the chat, and I instantly just had to know, and I'm glad you guys got into this. The elimination of prompt engineering is huge, so super cool. That's probably the biggest thing that made me break my frame. But everything you said was very exciting to hear. As you know, I'm a big fan of voice, so Audi, what you're doing is fun. And then all the iFood stuff and all the PM stuff. Great work, folks.
