MLOps Community

Inside Uber’s AI Revolution - Everything about how they use AI/ML

Posted Jul 04, 2025
# Uber
# AI
# Machine Learning

SPEAKERS

Kai Wang
Lead Product Manager - AI Platform @ Uber

Kai Wang is the lead product manager of Uber's AI platform team, managing Uber's end-to-end ML platform, Michelangelo. Today, 100% of Uber's most business-critical ML use cases, including GenAI use cases, are managed and served on Michelangelo, driving both Uber's top-line and bottom-line business metrics. Kai has 12 years of engineering and product management experience in high tech, with an EE PhD from the State University of New York at Buffalo.

Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.


SUMMARY

Kai Wang joins the MLOps Community podcast LIVE to share how Uber built and scaled its ML platform, Michelangelo. From mission-critical models to tools for both beginners and experts, he walks us through Uber’s AI playbook—and teases plans to open-source parts of it.


TRANSCRIPT

Kai Wang [00:00:00]: We actually tier all the different machine learning projects at Uber into four different tiers, with tier one being the most business-critical projects. We can measure some proxy metrics: how many models we trained last week, how many models deployed. For those users, we provide a way for them to directly access the infra layer.

Demetrios [00:00:20]: Michelangelo is the machine learning and AI platform for Uber. All of the AI and ML use cases are powered by Michelangelo, which is a feat. That's a freaking crazy thing to say just because of how many use cases you have.

Kai Wang [00:00:39]: So we are. Yeah, we're actually open sourcing Michelangelo. Do you want to touch on this?

Demetrios [00:00:43]: No.

Kai Wang [00:00:45]: We are.

Demetrios [00:00:46]: What?

Kai Wang [00:00:47]: Yeah.

Demetrios [00:00:48]: Holy shit.

Kai Wang [00:00:53]: Let's align on something first, which is the definition of AI, because nowadays everybody talks about AI. Even my mom asked me, hey, I heard you're working on this artificial intelligence thing. What exactly are you working on? So I think when a lot of people out there talk about AI, they think of, you know, ChatGPT. Yeah, that's what AI means to most of the people out there. But from the AI platform perspective, AI is not just ChatGPT. AI is not even just the large models for generative AI, which is the technology behind ChatGPT. AI actually covers the whole spectrum of machine learning, from the very simple linear models, to tree-based models like random forest and XGBoost, to the traditional deep learning models like convolutional neural networks and recurrent neural networks, all the way to generative AI.

Kai Wang [00:01:46]: So let's keep that in mind, and then let's talk about Michelangelo. Uber's AI journey started back in probably 2015, when a few teams like Maps, Pricing, and Risk started thinking about whether it was feasible to replace their rule-based systems with machine learning systems. When these teams started, they all built their own one-off workflows or infrastructure to support their machine learning needs. For training models or deploying models, this was random, ad hoc Python code sitting in notebooks. These notebooks were very hard to manage and very hard to share. So it's basically non-reusable or non...

Demetrios [00:02:37]: Shareable and non shippable.

Kai Wang [00:02:39]: Non-shippable, exactly. Very hard to productionize and impossible to scale, which led to inconsistency in performance and also duplicated effort across teams at Uber. That's actually where Michelangelo comes into play. Michelangelo provides this centralized platform for our machine learning developers to manage the whole machine learning lifecycle end to end, without worrying about the underlying infra and system complexities. That's why we actually started building Michelangelo back in 2016.

Demetrios [00:03:10]: Yeah. Get that scalability. And I think there's something super cool that you pointed out in the recent blog post. Well, recent, like it's now a few months old.

Kai Wang [00:03:21]: Yeah.

Demetrios [00:03:21]: On from predictive to Generative AI and how Michelangelo has had this journey and you talked about how folks want their own way of doing things. And so you have this centralized platform that's supposed to be helping them and streamlining things, but then you have those people that are pushing the boundaries and they're pushing the edges and they're asking for hey, we want more deep learning use cases or we want to be able to support deep learning use cases. And then you had to go and figure out how to make Michelangelo adaptive for that. And so the flexibility on that platform. Can you talk a little bit about how you thought about that?

Kai Wang [00:04:01]: I think one of the lessons we learned along this journey is that you should let the developers choose which tool they want to use. So what we did is provide an abstraction layer on top of all the infra complexities, with pre-built templates and pipelines for, I want to say, more than 80% of the users, so they can easily build machine learning applications. The rest, like 20% of users, are the advanced power users you mentioned. They want to build highly customized workflows; they want to use different trainers to train different models. For those users, we provide a way for them to directly access the infra layer. For example, we have a tool called Uniflow, which is a Python orchestration framework that allows our users to write their own code, customize their workflows, and directly access the Ray clusters to run their own training, and then deploy the models on Michelangelo. So that's how we think about this 80% and 20%.
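As a rough, hypothetical sketch of the "power user" path Kai describes (Uniflow is Uber-internal and its API is not public), a custom workflow that bypasses the pre-built templates and talks to a Ray cluster directly might look something like this, using plain Ray as a stand-in:

```python
# Hypothetical sketch: a power user's custom training workflow submitted
# straight to a Ray cluster, with the artifact handed back to the platform.
# Function names, paths, and metrics are invented for illustration.
import ray

ray.init(address="auto")  # connect to an existing Ray cluster

@ray.remote(num_gpus=1)
def train_custom_model(train_data_path: str, params: dict) -> str:
    """User-supplied training code; returns a URI to the model artifact."""
    # ... load data, build the model with whatever trainer the team prefers ...
    return f"s3://models/custom/{params['run_id']}"

@ray.remote
def evaluate(model_uri: str) -> dict:
    """User-supplied evaluation; returns metrics used to gate deployment."""
    return {"auc": 0.91, "model_uri": model_uri}  # placeholder metrics

model_ref = train_custom_model.remote("s3://features/rides/latest",
                                       {"run_id": "exp-42", "lr": 1e-3})
metrics = ray.get(evaluate.remote(model_ref))

if metrics["auc"] > 0.9:
    # Hand the artifact back to the platform's registry / serving layer.
    print("register and deploy", metrics["model_uri"])
```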

Demetrios [00:05:12]: Yeah. And speed that process up so much. I can imagine the time from when you have the idea to actually deploying the idea is unreal. Is that something that you look at as a key metric?

Kai Wang [00:05:25]: Yeah. If we talk about the North Star metric for Michelangelo, right, let's go back to the Michelangelo goal here, because the success metric is supposed to measure how well the product meets the product goal. Michelangelo's goal is to provide best-in-class machine learning capabilities and tools for Uber's ML developers so they can rapidly build and iterate on high-quality machine learning applications. So there are two keywords here. One is "rapidly," which measures developer velocity.

Kai Wang [00:05:56]: That's what you just mentioned; the ideal metric is time to production. Basically, from the ideation of a project, how long does it take for that project to actually launch into production? That's a very nice metric on paper. It turns out to be very hard to measure in practice.

Demetrios [00:06:18]: That was what I was going to ask. That's so funny that you say that.

Kai Wang [00:06:21]: Yep. I can tell you why. Right. First of all, use cases are all different. For one use case, you just need a linear model. For another use case, you probably need to train a larger model from scratch. I'm just giving an example. So you can imagine the time it takes for these two different projects to launch is totally different.

Kai Wang [00:06:45]: The second variable here is, how do you say, the team capabilities. An engineering team with five machine learning engineers, everyone with 10 years of experience, can probably do things much faster than a team consisting of one applied scientist who just graduated.

Demetrios [00:07:05]: Yeah.

Kai Wang [00:07:06]: So we also see this kind of difference based on our experience.

Demetrios [00:07:09]: Unless they're going on vibes these days, you know. That can take you a long way.

Kai Wang [00:07:15]: Yeah.

Demetrios [00:07:17]: Jokes aside. Yeah. Didn't mean to derail you. It's true that you do have these very different teams, capabilities and maturity levels, and ways they attack the problems. And then you also have these different use cases: if you're training a large language model, it's very different than if you're just using a random forest model. And you can't take the average of those and then think, oh, our time to velocity is decreasing, or over the last quarter it increased because we trained some large language models, you know.

Kai Wang [00:07:48]: Yeah, it's really hard to systematically measure this metric. But what we do here is two things. The first thing is we work with individual teams and just watch how they actually use Michelangelo to launch their projects, so we get some anecdotal feedback from these teams. For example, our rider pricing team shared feedback saying that by using Michelangelo, they actually reduced their engineering cycles by 80% compared to building everything by themselves.

Demetrios [00:08:20]: Interesting.

Kai Wang [00:08:20]: Yeah. So we have such anecdotal feedback from our teams.

Demetrios [00:08:23]: It's subjective in a way.

Kai Wang [00:08:25]: Subjective. And each team has their own way to measure speed or to measure engineering cycles. The second thing we do here, since this North Star metric is really hard to measure, is measure some proxy metrics: things like how many models we trained last week, how many models were deployed, how many evaluation pipelines we ran, how many eval reports were generated. All these proxy metrics can be indirect indicators of your developer velocity. So we track all of this systematically at the platform level.
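To make the proxy-metric idea concrete, here is a small hypothetical sketch (the event schema and numbers are invented, not Uber's) of the kind of weekly rollup a platform team could compute from its pipeline logs:

```python
# Hypothetical sketch: rolling up weekly proxy metrics for developer velocity,
# both platform-wide and per project.
from collections import Counter
from datetime import date

events = [
    {"week": date(2025, 6, 23), "project": "rider-pricing", "kind": "model_trained"},
    {"week": date(2025, 6, 23), "project": "rider-pricing", "kind": "model_deployed"},
    {"week": date(2025, 6, 23), "project": "eats-ranking", "kind": "eval_report"},
    # ... one record per training run, deployment, eval pipeline run, eval report ...
]

def weekly_proxy_metrics(events, week):
    """Counts per metric for the platform overall and for each project."""
    platform = Counter(e["kind"] for e in events if e["week"] == week)
    per_project = Counter((e["project"], e["kind"]) for e in events if e["week"] == week)
    return platform, per_project

platform, per_project = weekly_proxy_metrics(events, date(2025, 6, 23))
print(platform)      # platform-level view: models trained, deployed, evals run
print(per_project)   # per-project view feeding each project's dashboard
```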

Demetrios [00:09:01]: It's funny. Now is probably a good moment to bring up, when you talk about evaluation pipelines, you're probably talking about the predictive ML evaluation pipelines more than the new generative ones. Or... now you're talking about both, actually.

Kai Wang [00:09:16]: Both. Okay, but that's totally different. Okay, I got your point. I see where you're coming from. Yeah, they're totally different.

Demetrios [00:09:22]: So in a way, you're looking at all these proxy metrics. You probably have a page or a spreadsheet of them, and you can see if they're going up. More is good, right, and less is bad. That means there's something slowing people down, and you can correlate that to that North Star metric of velocity.

Kai Wang [00:09:42]: Yeah. So we track this at two levels. One is at the Michelangelo level. We look at all the training pipelines run last week or all the models deployed last week. That gives us a very good idea about the velocity, or how people are using our platform. We also track this at the project level. For each project, we have a dedicated dashboard to show all these metrics just for that dashboard.

Kai Wang [00:10:06]: Sorry, just for that project. So we know if there's something going wrong with a certain project that we need to fix. So we do it at both levels.

Demetrios [00:10:14]: It's worth noting that with all of these different projects, I'm sure you are powering a million different models that I can't even fathom, because I was telling you earlier, before we hit record, about a few blog posts that I read that were more on the predictive ML side. And it was a breath of fresh air to read these posts. One was around recommender systems in the app, and the other one was, I think, a multi-armed bandit scenario.

Kai Wang [00:10:40]: Yes.

Demetrios [00:10:41]: And they were so unique. These use cases were like, wow, this is so cool to see how mature and how advanced these use cases are. And it made me think, when you enable the teams, when you have something that's not blocking them and they can quickly iterate, like you were saying, evaluate if it's worth continuing forward and then get it into production, you can throw ML anywhere. It's like the more creative ideas, the better.

Kai Wang [00:11:13]: Yes, you can. Yes, for sure. You can throw ML everywhere. But do you want to do it? Because ML is expensive, right? Especially if you want to use deep learning. It's very expensive.

Demetrios [00:11:25]: Yeah.

Kai Wang [00:11:26]: Does your use case actually require machine learning, or even deep learning? That's the first question you need to answer, because the machine learning doesn't come for free.

Demetrios [00:11:36]: Yeah.

Kai Wang [00:11:36]: So there's a price to pay. It's so expensive. So do you actually want to do it? You want to evaluate on that front first, then start building the project: prepare data, train models, deploy models, things like that.

Demetrios [00:11:49]: And are you looking at the cost? Because I remember in the From Predictive to Generative AI blog you talked about how there's different tiers of model support. So if it's, obviously, the ride ETA, that's probably one of the most important models that can never go offline, ever. And then if it's something maybe a little bit more experimental, you are more relaxed about it.

Kai Wang [00:12:12]: Yes. When Michelangelo started back in 2016, at that time our mission was to enable machine learning for Uber, basically get Uber started with machine learning. And at that time, when Michelangelo started, we only had like three use cases on Michelangelo, but now we have thousands.

Demetrios [00:12:32]: Each one has their own dashboard.

Kai Wang [00:12:33]: Each one has their own dashboard and.

Demetrios [00:12:35]: You are looking at, hey, if the metrics aren't going right, we gotta go look and see why.

Kai Wang [00:12:41]: Yeah. For example, if your model performance degrades, it automatically sends out alerts to that team and then they look into why.

Demetrios [00:12:47]: And for you, if they're not shipping enough, then you're going to go look into why. Is there something blocking there in the platform?

Kai Wang [00:12:56]: Is there some system bug that's blocking development? Does the pipeline keep failing? And other things we need to look at.

Demetrios [00:13:05]: But anyway, sorry I distracted you. Thousands of dashboards, it just blows my mind. Yeah.

Kai Wang [00:13:11]: So whichever team wanted to do machine learning, come to Michelangelo and we'll help you get started. Fast forward a few years, and most teams had already incorporated machine learning into their core user flows. This was, I think, the end of 2019, early 2020. So at that time we actually pivoted our strategy a little bit. Instead of focusing on enabling machine learning, because that was already done, we pivoted to improving key project performance. That's when we introduced the machine learning tiering system. Based mostly on business impact, we tier all the different machine learning projects at Uber into four different tiers, with tier one being the most business-critical projects. Some examples, like you mentioned: pricing, rider-driver matching, and also ETA and fraud detection.

Kai Wang [00:14:11]: Right. If these models don't work, it will cause the highest-level outage for our services. So these are tier one projects, and it goes all the way to tier four. Those are some personal, experimental projects. Like you mentioned, users just want to try different things. So those are tier four projects. Today we have about 40 tier one projects, about 100 tier two, and 500 to 600 tier four projects. Wow.
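A minimal sketch of the tiering idea Kai describes, with invented project names and resource numbers, might look like this: projects carry a tier, and platform support and scarce capacity are allocated to the highest tiers first.

```python
# Hypothetical sketch of tier-based prioritization. Tiers follow the talk
# (tier 1 = core user flow, tier 4 = experimental); everything else is invented.
from dataclasses import dataclass
from enum import IntEnum

class Tier(IntEnum):
    TIER_1 = 1  # core user flow: pricing, matching, ETA, fraud detection
    TIER_2 = 2
    TIER_3 = 3
    TIER_4 = 4  # personal / experimental projects

@dataclass
class MLProject:
    name: str
    tier: Tier
    gpu_request: int

projects = [
    MLProject("rider-driver-matching", Tier.TIER_1, 64),
    MLProject("eats-carousel-tagging", Tier.TIER_3, 8),
    MLProject("personal-experiment", Tier.TIER_4, 4),
]

# Support queue: lowest tier number (most business-critical) first.
for p in sorted(projects, key=lambda p: p.tier):
    print(f"tier {p.tier.value}: {p.name} ({p.gpu_request} GPUs requested)")
```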

Kai Wang [00:14:38]: So that's how the tiering system works. And when it comes to prioritization, which project should we support first? It's very clear: we should focus on tier one first, then tier two, then move on to tier three and tier four. And a lot of times we just don't have the bandwidth to support tier three and tier four.

Demetrios [00:14:56]: To be honest, the majority of the use cases still, and I can bet that almost all of those tier one use cases, are still predictive ML versus generative AI.

Kai Wang [00:15:09]: Again, to be a tier one project, you have to be in Uber's core user flow. Meaning, when a user books a ride or orders food or buys some groceries, is your model actually in that critical path? If it's not, that means you're probably not a tier one project. So based on that definition, 100% of our tier one today is still predictive ML.

Demetrios [00:15:34]: Yeah. And I can imagine there's been a lot of people that have tried to think about ways to put generative AI into that core product. But when you think about the flow of me signing onto an app and then getting a car or getting food, where can you even put it in? Right?

Kai Wang [00:15:56]: Very good point, very good question here. There are some scenarios and use cases, I think, where we can actually apply GenAI to improve the experience. For example, we're actually working on two projects right now on the Uber Eats side. The first project is we use a large language model to improve the personalization experience, to generate better dish descriptions, to actually match the user's interest to what we show on the Eats home feed screen. That's one thing it can really help with. The second one is we actually use a large language model to improve search quality. When you search something in the Eats app, we use a large language model to improve that search quality.

Demetrios [00:16:54]: Because it can pick up if I'm saying I want something fancy, then it understands fancy a little bit more than.

Kai Wang [00:17:00]: If it were traditional semantic understanding. That's one thing. And also we can use the large language model to build a taxonomy of all the dishes and restaurants within Uber Eats. That's done with the large language model.

Demetrios [00:17:13]: In a way, it is a little bit of this recommending with LLMs, especially on that first use case you were talking about, where you understand me: I'm a vegetarian, I go onto Uber Eats, you don't show me any meat. And then when you talk about different restaurants, you're not going to highlight their meat dishes. You're going to highlight their vegetarian dishes.

Kai Wang [00:17:36]: Exactly. So in the past, I don't know if you noticed this, if you logged on to the Uber Eats app, we have these carousels, right? In each carousel we show different restaurants or dishes. And in the past, we only had, like, Korean food, Chinese food, burgers, hot dogs. Very boring. You look at it, you don't want to order, I don't want to order. It just says Korean, Chinese. It doesn't really intrigue me.

Kai Wang [00:18:01]: Right. So what we do is, first of all, we use the large language model to try to understand the menus and dishes from all our restaurants, then come up with a new set of carousel titles. Now we have, like, spicy Sichuan food, meat lovers, and all those catchy carousel titles. And then we use the large language model to tag each of the restaurants with these new carousel titles. Then we use this information to match our eaters with the restaurants.
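A hypothetical sketch of that tagging step, using the OpenAI Python client purely as a stand-in for whichever model actually sits behind Michelangelo (the carousel titles, prompt, and restaurant data are all invented):

```python
# Hypothetical sketch: ask an LLM which carousel title best fits a restaurant's
# menu, so the tag can later be used for matching eaters to restaurants.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

CAROUSEL_TITLES = ["Spicy Sichuan Food", "Meat Lovers", "Comfort Classics", "Plant-Based Picks"]

def tag_restaurant(menu_text: str) -> str:
    """Return the single carousel title the model thinks best matches the menu."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Pick exactly one carousel title from this list that best "
                        f"matches the menu: {', '.join(CAROUSEL_TITLES)}. "
                        "Reply with the title only."},
            {"role": "user", "content": menu_text},
        ],
    )
    return resp.choices[0].message.content.strip()

print(tag_restaurant("Mapo tofu, dan dan noodles, twice-cooked pork, chili oil wontons"))
```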

Demetrios [00:18:36]: Oh, fascinating.

Kai Wang [00:18:37]: So that's how the large language model is used for recommendations.

Demetrios [00:18:40]: Yeah, And I can imagine also you could do it with user reviews. Maybe not. In Uber's case, you're not giving reviews after you eat, right?

Kai Wang [00:18:49]: Normally we actually do, but not many people are doing that.

Demetrios [00:18:52]: Yeah, yeah, yeah, that's good. I didn't even know that you had that. But in a way, it's like that sentiment: you can see if somebody, or if there's many people, ordering the same dish, then you know that, all right, we really want to figure out how to message this dish correctly. It feels like it's a bit of predictive and generative ML. Because if I'm ordering the same dish over and over, you don't need an LLM to be able to figure that out.

Demetrios [00:19:24]: I guess that's very clear tabular data in a way.

Kai Wang [00:19:29]: Yeah, but that's a very good point, coming back to that review thing. Right? We just launched a GenAI project into an online experiment. What we do is we use the large language model to look at all the reviews for certain restaurants, and also the scores and other sources of data, then try to summarize what the eaters actually like or dislike about this restaurant. Then we give that feedback to the restaurant owners and they can make improvements to their dishes or their services based on that feedback. So that's one of the projects. We call it Customer Feedback Summary for Merchants.

Demetrios [00:20:09]: Are there any other interesting use cases that you like talking about or that you've seen and you kind of got your mind blown by.

Kai Wang [00:20:18]: I can talk broadly about how machine learning has been used at Uber, from predictive machine learning all the way to GenAI. So at any given moment we have more than 5,000 models in production, making 28 million real-time predictions per second at peak. That's the scale we are operating at, and everything happens on Michelangelo. Sorry, I have to brag. Yeah. So virtually every single interaction our users have with the Uber app involves machine learning under the hood. Let's take the Rider app, for example.

Kai Wang [00:20:59]: Right. This is all predictive machine learning so far. So when you try to log into your account, we actually use machine learning to detect if this is actually you trying to log into your account. We call this account takeover detection.

Demetrios [00:21:14]: Oh wow.

Kai Wang [00:21:14]: Yeah. This is part of our fraud detection mechanism. Once you log in, machine learning is used when you search for a destination; it's used for search and for ranking of search results. Once you identify your destination, then machine learning is heavily used to match you to the driver, for the pricing, for ETA calculation, even to recommend the right product to you. I don't know if you notice this: if you open the Uber app, it actually shows you a lot of different options, like UberX, Uber Black, Uber Pool, everything, right?

Demetrios [00:21:51]: Comfort, all that.

Kai Wang [00:21:52]: Yeah, actually that list is personalized. Everyone sees a different list.

Demetrios [00:21:56]: Uber thinks I'm way richer than I am because they're always recommending me comfort. I'm like give me that cheap option. Why are you giving me the more.

Kai Wang [00:22:03]: Expensive option? I know, I know. You don't like that.

Demetrios [00:22:07]: I don't like that at all.

Kai Wang [00:22:08]: Okay.

Demetrios [00:22:09]: I do choose comfort though. They got me. I'm a sucker.

Kai Wang [00:22:13]: Okay. I'll let our product recommendation team know. But coming back to this, right: all the way to when you are on a trip, we give you the real ETA. That's also machine learning. And all the way to payment fraud detection, and also, of course, customer support. Now we use GenAI for customer support. Then if we take the Uber Eats app, it's the same story. Machine learning is everywhere. We focus more on personalization, recommendation, and also Eats ETD.

Kai Wang [00:22:47]: So that's ETD, estimated time of delivery.

Demetrios [00:22:51]: Oh yeah. Okay.

Kai Wang [00:22:54]: So yeah, that's on the predictive machine learning side. Now let's talk about GenAI.

Demetrios [00:22:58]: Right.

Kai Wang [00:22:59]: If we look at the GenAI use cases at Uber, we can roughly divide them into three different categories. The first category we call magical user experiences. These are the generative AI applications that directly impact our end users. For example, any of the chatbots, our customer support chatbots. We also have this earner copilot: we're building a copilot chatbot for our drivers to answer their questions and guide them on where to drive so they can maximize their earnings.

Demetrios [00:23:27]: Oh, cool.

Kai Wang [00:23:27]: Things like that. It's still in the works, but yeah, that's something in progress. Something else, like the personalization experience I just mentioned, is one of the use cases that belongs to the magical user experience category. Then the second category we call process automation. We basically use generative AI to automate, or further automate, some of our internal processes. For example, we use AI to generate the menu item descriptions for restaurants, because a lot of times, if you log into the Uber Eats app, you will see a lot of menus where the dishes only have a title and there's no description at all.

Kai Wang [00:24:15]: So we're using GenAI to generate descriptions for 100% of all the dishes on Uber Eats. Wow. And what else do we do? We also use GenAI to try to identify fraudulent accounts. One of the key characteristics of these fraudulent accounts is that their usernames are usually gibberish.

Demetrios [00:24:37]: Interesting.

Kai Wang [00:24:37]: Yeah, like I love dog. That's probably not a real person.

Demetrios [00:24:41]: I thought it was just like F-X-Y-W or whatever.

Kai Wang [00:24:44]: Exactly, exactly. So we use a large language model to just scan through all the accounts and try to identify these potential fraudulent accounts. Something else in this category is the driver background check. When new drivers try to onboard to the platform, we do a very thorough, strict background check for each driver, and we use a large language model to accelerate that process. Oh wow. So that's the second category; we call this category process automation.

Demetrios [00:25:17]: And then the process automation doesn't have anything to do with, like, back-office processes, in a way the bureaucracy of when you need to do something. Like if I want to submit time off, maybe I need...

Kai Wang [00:25:35]: That's the third category.

Demetrios [00:25:36]: Oh, okay. What do you call that?

Kai Wang [00:25:37]: We call this internal employee productivity.

Demetrios [00:25:40]: Okay, yeah, that makes sense.

Kai Wang [00:25:41]: This is where we build all those tools, as you mentioned, the workflow automations. For example, we also built this Data GPT, because Uber has a vast amount of data and we have so many data scientists and product analysts that need to analyze data every day. Today they do that by writing SQL. So we built this Data GPT tool to help them, first of all, write better SQL, and second, just query data with natural language. So that's one of the internal employee productivity projects.
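Data GPT itself is internal, but the general text-to-SQL pattern it describes can be sketched hypothetically like this (the schema, prompt, and model choice are invented stand-ins; Uber's actual stack differs):

```python
# Hypothetical sketch of a "Data GPT"-style text-to-SQL helper: the model is
# given the table schema and asked to produce a query for a natural-language question.
from openai import OpenAI

client = OpenAI()

SCHEMA = """
Table trips(trip_id BIGINT, city VARCHAR, requested_at TIMESTAMP, fare_usd DECIMAL)
"""

def question_to_sql(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Write a single ANSI SQL query answering the question. "
                        "Use only this schema:\n" + SCHEMA},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

print(question_to_sql("Average fare per city last week?"))
# In practice the generated SQL would still be validated before running against the warehouse.
```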

Demetrios [00:26:18]: This text-to-SQL is becoming a very standard GenAI use case. You see it in so many companies. And the lift that you get by allowing the data analysts to write more queries more quickly, or just get answers quicker, is very valuable. And it's something where you can show value really easily. The other stuff, maybe it's debatable if...

Kai Wang [00:26:46]: You do it right. Let's keep that in mind. Yeah, you can do things really fast, but you have to give the correct answer. That's the key.

Demetrios [00:26:57]: Yeah. Because you can spend 20 minutes trying to get a simple answer because you didn't do it right.

Kai Wang [00:27:01]: Yeah, exactly.

Demetrios [00:27:02]: Yeah. So you're talking about doing it right from the end user perspective, not doing it right from the infrastructure side of things, and how you're architecting the agent to go in and write the SQL.

Kai Wang [00:27:12]: Because from the infra side, if you do it wrong, you're screwed. Then let's forget about any user experience.

Demetrios [00:27:19]: Exactly. There is another thing I wanted to talk about, which is the scale and all of this inference. What are some gnarly challenges that you've gotten into because of the amount of models that you're using? You're now using, what, not smaller models, but you're using all these predictive models, and those have a certain style and flavor of inference they need. And then you're using the generative models, and those are a different style and flavor. So how do you attack that within Michelangelo to be able to serve both of them?

Kai Wang [00:27:54]: Yeah, very, very good question here. So I think there are a few things we did to extend Michelangelo from supporting predictive machine learning to generative AI. First is the compute resource. The compute demand is totally different now. You need a lot more GPUs, and high-end GPUs: H100s and other GPUs.

Demetrios [00:28:24]: Tier 4 doesn't get that kind of stuff. Huh?

Kai Wang [00:28:25]: Tier 4, usually, no. Oh, by the way, that's pretty cool.

Demetrios [00:28:30]: Yeah.

Kai Wang [00:28:30]: So we prioritize higher tiers over lower tiers. For sure.

Demetrios [00:28:35]: Yeah.

Kai Wang [00:28:35]: We also prioritize production jobs over offline training jobs, batch jobs.

Demetrios [00:28:40]: That makes sense, right? Yeah, yeah, yeah. Okay.

Kai Wang [00:28:43]: So that's how we prioritize. But coming back to the infra changes for GenAI. So that's one thing: we needed to procure more compute resources. That's the prerequisite for doing anything with GenAI. And secondly, the tech stack has to change, has to evolve, for GenAI models, because now the model is much larger. It doesn't fit into one GPU.

Kai Wang [00:29:08]: Usually it doesn't fit into one GPU, so you need model parallelism. For traditional deep learning, you can usually get by using data parallelism; it's totally enough. So to enable GenAI, because of this restriction, we actually integrated DeepSpeed and use DeepSpeed in conjunction with Ray.

Demetrios [00:29:27]: Both together.

Kai Wang [00:29:28]: Oh yeah, you have to. One is the orchestration layer, the other is the model optimization layer. So we use both for large language model fine-tuning. We also enabled Triton from Nvidia for serving large models. And now we're actually using the same infra to serve some of our large traditional deep learning models, like our recommendation model.
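A minimal sketch of pairing Ray (orchestration) with DeepSpeed (ZeRO sharding / model parallelism) for large-model fine-tuning, assuming placeholder model, data, and config values rather than Uber's actual setup:

```python
# Hypothetical sketch: Ray Train launches GPU workers; inside each worker,
# DeepSpeed wraps the model so parameters, gradients, and optimizer state are
# sharded (ZeRO stage 3) instead of relying on plain data parallelism.
import deepspeed
import torch
import torch.nn as nn
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer

def train_loop_per_worker(config):
    model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))
    ds_config = {
        "train_micro_batch_size_per_gpu": 8,
        "zero_optimization": {"stage": 3},  # shard params/grads/optimizer state
        "optimizer": {"type": "AdamW", "params": {"lr": 1e-5}},
    }
    engine, _, _, _ = deepspeed.initialize(
        model=model, model_parameters=model.parameters(), config=ds_config
    )
    for _ in range(config["steps"]):
        batch = torch.randn(8, 1024, device=engine.device)
        loss = engine(batch).pow(2).mean()  # placeholder loss for illustration
        engine.backward(loss)
        engine.step()

trainer = TorchTrainer(
    train_loop_per_worker,
    train_loop_config={"steps": 10},
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True),  # 4 GPU workers
)
trainer.fit()
```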

Demetrios [00:29:52]: Okay.

Kai Wang [00:29:53]: So it's not just for GenAI. We also use that for some of our traditional machine learning, which is really cool.

Demetrios [00:30:01]: Wow. So it's almost like you had to add a few new tools into the tech stack, but then when you did that, you found those tools help for the...

Kai Wang [00:30:12]: Other stuff. It's actually useful for something else we've already been doing for years, for...

Demetrios [00:30:17]: Those bigger deep learning models, but not for the smaller models.

Kai Wang [00:30:20]: Not for tree models like XGBoost. You don't need that.

Demetrios [00:30:23]: Yeah, it's not needed. And then they can choose what they want as far as that. Or are you the one that's optimizing the inference?

Kai Wang [00:30:33]: On the inference side, it's us. But when we talk about optimizing model serving, it's always two things. One is the optimization on the infra side; that's what we do. The other is on the modeling side; that's what our users do. For example, quantization is something they should do, but we should support serving the quantized model.

Kai Wang [00:30:56]: That's on the infra side. It's always a collaboration. We also collaborate with other teams.
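As a small hypothetical sketch of the "modeling side" half of that split, using standard PyTorch (not Uber's actual tooling), the model team could produce a quantized artifact that the platform then serves:

```python
# Hypothetical sketch: post-training dynamic quantization of the Linear layers
# to int8, producing a smaller, cheaper-to-serve artifact for the infra side.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10)).eval()

quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

torch.save(quantized.state_dict(), "model_int8.pt")  # artifact handed to serving
```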

Demetrios [00:30:59]: I imagine they're coming to you all the time asking you for support for certain things like oh well now we want to distill this model. Can we figure out how to make it easier for us to distill it or compress it or quantize it? All of that.

Kai Wang [00:31:12]: Yeah.

Demetrios [00:31:13]: How do you go about choosing? Because there's no tiers there, is there? Or are you still bringing back in this idea of the impact?

Kai Wang [00:31:22]: It's still impact based. There are GenAI use cases with larger impact compared to the other, more experimental GenAI projects. They're still tier three today because they're not in the core user flow, but they can still prove their business impact. So as long as they can prove their business impact, we can prioritize them.

Demetrios [00:31:46]: We talked about generative AI use cases. How are you looking at the agentic use cases? When the agents are going and doing things, at the end of the day, it still is just, hey, you've got models and you're focused on the inference of those models. Are you adding extra infra support for the agents, quote unquote agents? Because, as we said before, it's probably good to define what you think of when you think of agents. But in my eyes, it's the LLMs, or the generative AI, that can go and actually do stuff as opposed to just giving you answers.

Kai Wang [00:32:26]: So starting this year, our focus has shifted to supporting exactly what you just mentioned. We want to actually enable Uber to build agentic AI systems going forward. So we're extending Michelangelo. In the past we had what is more like a Model Studio, where you manage your models and all the relevant components related to model training and serving. Now we want to also build an Agent Studio for agent ops, for you to actually build, evaluate, deploy, and manage your agents. So that's something actually in the works. I just got out of a meeting this morning with our engineering team, who is actually building this tool right now.

Demetrios [00:33:10]: That's very cool to know. So how are you looking at it and is it any different than the other stuff? Or if so, maybe the better question is how is it different?

Kai Wang [00:33:22]: So in the past, all we cared about was the model. We make sure you can train the model, then deploy the model to an endpoint, and now you can call the endpoint to make predictions. That's what we cared about as the platform team. But an agent is different. An agent is an application by itself. So we're now at the application level. We have the Model Studio here, and the Agent Studio on top of Model Studio.

Kai Wang [00:33:50]: The agents in the Agent Studio will leverage the models you build in Model Studio. So since we're at the whole application level now, how do you actually accelerate that agent creation, evaluation, and deployment flow? That's something we are actually evaluating and deciding: how do we allow our users to quickly spin up an agent, evaluate agent performance, then go back to iterate on the agent, evaluate again, and deploy. Same thing as what we do for models, but now it's more at the application level. That's a major difference, I think.

Demetrios [00:34:27]: How do you make sure that the agent has the right permissions, so you're not writing to some database that it shouldn't be writing to, and all that stuff?

Kai Wang [00:34:35]: Yes, very good point. Because one of the major value props of the Agent Studio on Michelangelo is that no matter what model you use, we allow you to access Uber internal data and also Uber internal tools. Then the security question you just mentioned comes into the picture: there's certain data you have access to, and certain data you don't. So to enforce that, we work with our engineering security team to build this security protocol for these agents.

Demetrios [00:35:13]: So if I create an agent, then it knows my permissions and it is only allowed to do those things.

Kai Wang [00:35:21]: Yeah, it's either yourself or your team. So it depends on your authentication.
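A minimal, hypothetical sketch of that idea (all names and scopes invented): the agent inherits the calling user's or team's entitlements, and every tool or data call is checked against them.

```python
# Hypothetical sketch of permission scoping for agents: access is granted based
# on the caller's authentication, not on anything baked into the agent itself.
from dataclasses import dataclass, field

@dataclass
class Principal:
    name: str
    scopes: set = field(default_factory=set)  # e.g. {"eats.reviews:read"}

@dataclass
class Agent:
    name: str
    required_scopes: set

def invoke(agent: Agent, caller: Principal, resource_scope: str) -> str:
    # The agent may only touch resources the *caller* is entitled to.
    if resource_scope not in caller.scopes:
        raise PermissionError(f"{caller.name} lacks {resource_scope} for agent {agent.name}")
    return f"{agent.name} accessed {resource_scope} on behalf of {caller.name}"

analyst = Principal("eats-insights-team", {"eats.reviews:read", "eats.menus:read"})
summarizer = Agent("merchant-feedback-summary", {"eats.reviews:read"})

print(invoke(summarizer, analyst, "eats.reviews:read"))   # allowed
# invoke(summarizer, analyst, "payments.ledger:write")    # would raise PermissionError
```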

Demetrios [00:35:29]: But then the utopian scenario here is that you have this agent studio and another team can come and grab my agent that I built and now start implementing it for their use case. Or they tune it and do a little bit more. And now, cool, I'm up and running with an agent. I didn't have to build it from scratch.

Kai Wang [00:35:50]: Exactly.

Demetrios [00:35:51]: But the big question in my mind is, does it automatically know that now it's another team, it's another person and now they have these permissions?

Kai Wang [00:36:00]: It should. So we're still building this; it's not in place yet. And the thing you just mentioned is actually what we call the agent registry. So we have a repo of all the agents built within Uber, and every single team can look at that agent repo to see if there's anything already existing that they can just reuse.

Demetrios [00:36:21]: I know you're going to have that because you have the model repo; that's a classical one. Most folks have that. But then I've read the Prompt Toolkit blog that you have.

Kai Wang [00:36:32]: Yeah, prompt repo. Yes.

Demetrios [00:36:33]: And you have the prompt repo, which is another great one where folks come and they see, oh, here's a prompt for this model, and it is for this specific thing. Somebody already spent the time to tune this prompt. I don't have to start from zero.

Kai Wang [00:36:47]: Yes, yes. And now we have Agent Repo.

Demetrios [00:36:51]: Yep.

Kai Wang [00:36:51]: We're also building the MCP repo.

Demetrios [00:36:54]: Oh, really?

Kai Wang [00:36:55]: MCP is a cool thing now. Yeah. We want to build MCPs for a lot of the Uber internal services so that they can be leveraged by users to build agents.

Demetrios [00:37:05]: You're thinking, all right, well, all of the internal Uber tools should have their own MCP server.

Kai Wang [00:37:10]: Not all of them, but some of them. We look at which tools are used a lot today by our users, then we'll build MCPs for those tools.
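A hypothetical sketch of what wrapping an internal service as an MCP server could look like, using the public `mcp` Python SDK; the service name and lookup function are invented, and Uber's internal implementation would differ:

```python
# Hypothetical sketch: expose an internal service as an MCP server so agent
# builders can discover and call it as a tool.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("restaurant-catalog")  # hypothetical internal tool name

@mcp.tool()
def lookup_restaurant(restaurant_id: str) -> dict:
    """Return basic catalog info for a restaurant (stubbed here)."""
    return {"id": restaurant_id, "cuisine": "sichuan", "rating": 4.6}

if __name__ == "__main__":
    mcp.run()  # serve over stdio so agents can list and invoke the tool
```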

Demetrios [00:37:21]: There's agent builders and then there's tool builders in a way. And maybe it's the same person.

Kai Wang [00:37:26]: It can be the same person.

Demetrios [00:37:27]: It just is that depending on the state or their moment in time, they're building tools or they're building agents. But at the end of the day, I think all of us want to be building agents.

Kai Wang [00:37:39]: Yes.

Demetrios [00:37:40]: And not as much tools.

Kai Wang [00:37:42]: But at Uber, usually the tool owners build the MCP server for that tool, and that'll be used by all the other agent builders.

Demetrios [00:37:50]: And then they'll say, hey, this is standard practice. This is what we want you to use the tool for.

Kai Wang [00:37:55]: And I publish my MCP server to the MCP repo. Now read the documentation to see if this is useful for you; if yes, follow this to use it. Yeah, that's how we envision it. It's still in the works.

Demetrios [00:38:07]: Yeah, I can imagine you'll come up with some cool stuff again because you can throw AI anywhere, right? Or ML on helping folks select tools.

Kai Wang [00:38:19]: Oh yeah.

Demetrios [00:38:20]: And helping that search, getting that search really clean on, oh, I want this. And so it's almost like I describe what I want and I see a world where that internal tool can look like, okay, maybe you want to use these agents or maybe you want to use these servers. MCP servers. And they have these two tools that you can leverage.

Kai Wang [00:38:42]: Yes.

Demetrios [00:38:42]: So we talked about supporting different inference. Right. But what about supporting different evaluations?

Kai Wang [00:38:48]: So I think evaluating GenAI applications or agents is a totally different story compared to evaluation for predictive machine learning, where you know exactly what metrics you're looking for: you measure the AUC, precision, recall, and you have standard pipelines and standard ground truth. Yeah, exactly. To measure the accuracy there. But for GenAI, it's totally different. I think this is still a problem the whole industry is trying to figure out: how to do the evaluation right.

Kai Wang [00:39:21]: But at Uber, we built our own, we call it the GenAI evaluation framework, which allows users to do two things. One is LLM as a judge: basically use another LLM to judge the output from these LLMs. I think that's what a lot of teams out there in the industry are using. The second is to include the human in the loop: basically, whenever you need a human to make a judgment, you involve the human to do evaluations. Those are the two major methods we're using today. And the third one is, yeah, of course you can provide a golden data set even for your GenAI use cases, right? Provide a golden data set, then use the golden data set to evaluate performance. Those are the few things we've been working on.
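A hypothetical sketch combining two of the methods Kai mentions, LLM as a judge scored against a small golden data set; the prompt, rubric, model, and data are invented stand-ins for whatever Uber's framework actually uses:

```python
# Hypothetical sketch: score candidate answers 1-5 against golden references
# using a judge model, one of the methods described for GenAI evaluation.
from openai import OpenAI

client = OpenAI()

golden_set = [
    {"question": "Summarize diner feedback for restaurant 123",
     "reference": "Diners love the noodles but complain about slow delivery."},
]

def judge(question: str, candidate: str, reference: str) -> int:
    """Ask the judge model for a 1-5 faithfulness score of candidate vs. reference."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Score the candidate answer 1-5 for faithfulness to the "
                        "reference answer. Reply with only the number."},
            {"role": "user",
             "content": f"Question: {question}\nReference: {reference}\nCandidate: {candidate}"},
        ],
    )
    return int(resp.choices[0].message.content.strip())

for row in golden_set:
    candidate = "Customers praise the noodles; delivery times draw complaints."
    print(judge(row["question"], candidate, row["reference"]))
```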

Demetrios [00:40:12]: Sweet. Dude, you wanted to touch on that.

Kai Wang [00:40:13]: So we're actually open sourcing Michelangelo. Do you want to touch on this?

Demetrios [00:40:17]: No.

Kai Wang [00:40:19]: We are.

Demetrios [00:40:20]: What?

Kai Wang [00:40:21]: Yeah.

Demetrios [00:40:22]: Holy shit.

Kai Wang [00:40:23]: Yeah.

Demetrios [00:40:23]: Really?

Kai Wang [00:40:24]: Yeah. Our plan is, in two years, to open source the full Michelangelo, starting with our orchestration framework called Uniflow.

Demetrios [00:40:34]: Wow, that's huge news. That is so wild.

Kai Wang [00:40:39]: Yeah.

Demetrios [00:40:40]: So you're open sourcing first the orchestration framework and then everything else. Yeah, and is it going to come out in spurts, or is it going to come out... You basically have to clean the code base.

Kai Wang [00:40:52]: Oh yeah. A lot of cleaning work is in the works. We have to clean the code base. And our plan is, at least for this year, 2025, to selectively work with some enterprise partners in a closed-source fashion. So we give them access to our open source repo and they can contribute. Wow. Then next year we probably want to fully open the repo to the whole community.

Demetrios [00:41:20]: That is so cool. That's why I was asking about the Prompt Toolkit and if it was ever going to be open source, then it...

Kai Wang [00:41:30]: Would be open source. Yeah, probably in H2 sometime. Q4 maybe.

Demetrios [00:41:35]: Yeah, this year.

Kai Wang [00:41:36]: This year.

Demetrios [00:41:37]: Oh, so there's going to be some things that will come out super, super fast.

Kai Wang [00:41:42]: We already have something. Oh. But again, this year is all closed source. Like only those few partner teams.

Demetrios [00:41:48]: Closed open. Yeah, closed open. Yeah.

Kai Wang [00:41:50]: Yeah. But next year we try to fully open.

Demetrios [00:41:53]: Wow, that's so cool. Why, why now?

Kai Wang [00:41:58]: Well, you know, when Michelangelo started back in 2016, there were not many options out there. So we had no choice; we had to build everything by ourselves from scratch. Of course we used open source technology like Spark and other things. But fast forward 10 years, nine or 10 years, and there are so many startups out there.

Kai Wang [00:42:22]: There are all the cloud players providing their own MLOps tools, and there is such a huge MLOps community out there, like yours. So we do think allowing external contributions to Michelangelo can actually drastically accelerate the innovation of Michelangelo. That's why we want to open source this, which in turn will benefit our Uber ML community. And also, the other thing is, now we are in this era of GenAI, right? If machine learning has been advancing fast, GenAI is even faster. Our team has 100 people, but it's still a small team compared to the whole community. It's hard for us, to be honest, to keep pace with all the advancements in the industry.

Kai Wang [00:43:13]: So again, that's why we want to allow external contributions, so that we can keep ourselves always at the forefront of the technology advancement.

Demetrios [00:43:23]: All right, so that's awesome. Now I always wondered what it would be like to work at Uber. I am not the big company type. I think I would get fired very fast. HR would have a problem with the things that I say. But I want to know like what do you like working at Uber and what do you not like about working there?

Kai Wang [00:43:48]: Sure. I think what I like, first things first: the Michelangelo engineering team is truly world class. They move really fast, they're highly capable, they can get shit done, let's put it that way, and they're incredibly collaborative, very supportive of my PM work. As a product manager, I could not ask for a better engineering partner. That's the thing I like most about working in my current position. Secondly, personally, I do believe product management is the best job in the world, if done right. If done right.

Kai Wang [00:44:32]: And I'm deeply passionate about AI and MLOps, so my current job is a perfect combination of both. I have really enjoyed every single minute of the past four years at Uber. What else? Yeah, as I mentioned, since all of Uber's ML and AI use cases are managed on Michelangelo, it gives me this front-row seat to see how machine learning is actually driving business impact across Uber, from pricing to recommendations on Eats to fraud detection, all the way to GenAI chatbots. It has been a fantastic experience, I want to say.

