MLOps Community

Real-time features, AI search, Agentic similarities

Posted Dec 28, 2025
# AI Search
# AI Agents
# Zipline AI

Speakers

Varant Zanoyan
Co-founder & CEO @ Zipline AI

Building out the Data Platform for ML at Airbnb. Making it easy to serve tens of thousands of real-time features to hundreds of models across every surface area of the product, while maintaining observability and governance.

Previously helping big companies solve tricky data problems at Palantir.

Nikhil Simha
CTO @ Zipline AI

Nikhil is CTO and co-founder of zipline.ai. Before that, he was a Senior Staff Engineer on the Machine Learning infrastructure team at Airbnb, where he built and open-sourced chronon.ai. Before that, he built a stream processing scheduler called Turbine and a stream processing engine called Stylus that powers real-time data use cases at Meta. Nikhil got his Bachelor's degree in Computer Science from the Indian Institute of Technology, Bombay.

Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.


SUMMARY

Feature stores might be the wrong abstraction.

Varant Zanoyan and Nikhil Simha Raprolu explain why Chronon ditched “store-first” thinking and focused on compute, orchestration, and real-time correctness—born at Airbnb, battle-tested with Stripe. If embeddings, agents, and real-time ML feel painful, this episode explains why.


TRANSCRIPT

Varant Zanoyan [00:00:00]: Like streaming aggregation, windowed aggregation. A lot of the compute stuff is where we were like, oh, man, once you look a level deeper, it's like you kind of see that it never quite got there. And so we were like, okay, we are the only ones that have cracked this nut.

Nikhil Simha Raprolu [00:00:25]: Feature platform is like how it started out, but it got like all kinds of usages. Now, generally, anything that goes into online metrics or rule engines or into context, at companies like Airbnb and Stripe.

Demetrios Brinkmann [00:00:39]: So the feature platform idea, I know there's like a few different ways of explaining it. How do you guys explain what the feature platform is?

Nikhil Simha Raprolu [00:00:50]: Varant does a great job.

Demetrios Brinkmann [00:00:53]: He's the pro.

Varant Zanoyan [00:00:54]: I'll try. Yeah, yeah. No, I think there are different definitions of it. And I think the first thing I like to say when it comes to Chronon, one thing that Chronon does differently, is that it, you know, it does the compute. So there's, like, feature stores that leave the compute onus on the users, and then they'll do storage, more as, like, a classic KV store with some stuff around it. But what we saw, and the reason why we built Chronon, is that compute was really the hard thing that people struggled with. Like, if you want streaming features, if you want to be doing streaming aggregations, if you want to be doing online serving and offline training and have consistency between those two things, you gotta be computing your features in a very particular way. So for us, what the feature platform is, is: takes raw data in, produces features, serves them online for inference, serves them offline for training, does both of those things in a very scalable way, and exposes a very easy API to the user to power those flows. And so that's the scope that we take of feature platform.
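
The scope described here, one definition that produces features for both online inference and offline training, is the kind of thing a declarative API can capture. A rough illustration only: the Python below is hypothetical, and the names `FeatureSet` and `Aggregation` are invented for this sketch, not Chronon's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Aggregation:
    column: str        # raw event column to aggregate
    operation: str     # e.g. "count" or "average"
    window_days: int   # sliding window length in days

@dataclass
class FeatureSet:
    source: str                 # raw event table or stream
    keys: list                  # entity keys to aggregate by
    aggregations: list = field(default_factory=list)

# A single definition like this is meant to drive both the online path
# (streaming compute plus serving for inference) and the offline path
# (batch backfills for training), which is what keeps the two consistent.
card_features = FeatureSet(
    source="events.card_swipes",
    keys=["card_id"],
    aggregations=[Aggregation(column="swipe_id", operation="count", window_days=7)],
)
```

The design point is that the user states intent once; the platform compiles it into the streaming, batch, and serving jobs.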

Varant Zanoyan [00:01:52]: And I think that's what made Chronon unique. And I think that's why kind of the open source has gotten some good adoption, because it kind of hits on the main pain point that people face.

Demetrios Brinkmann [00:02:00]: Yeah, and that was my big question, because I know that there's the feature stores that do the compute, the ones that don't do the compute, the ones that don't even store anything and just help you orchestrate things. And so you have all of those different iterations. It came from Airbnb. What made you want to build it there?

Nikhil Simha Raprolu [00:02:22]: When I joined Airbnb, there was already the earliest version of Chronon, called Zipline, that, like, Varant and Brad and the other guy built.

Demetrios Brinkmann [00:02:30]: I remember reading that blog too, by the way. And I was looking for the author. There was no author on the blog.

Nikhil Simha Raprolu [00:02:36]: And I was.

Demetrios Brinkmann [00:02:36]: I wanted to bring you guys on. And I was so pissed because I'm like, who created this? And me being in Germany, I couldn't just, like, ask anybody, you know. So I was going nuts, because I wanted to have, you know, you on the podcast, or at least on. Back in those days, in 2020, we were doing, like, the virtual meetups.

Nikhil Simha Raprolu [00:02:56]: Oh, yeah.

Demetrios Brinkmann [00:02:57]: And it was like, this is a great blog. I want to know more about Zipline, because Zipline did everything, right? It wasn't just the feature store.

Nikhil Simha Raprolu [00:03:03]: Yeah, yeah. So when I joined, I think the very first meeting I was in was like, there's this, like, fraud team sitting in the room and they're like, hey, we need to be able to engineer features and test them out in prod very quickly. Because the nature of fraud is adversarial. Right. Like, someone figures out a way to overwhelm the systems and, like, bleed money. And that was a big problem back then at Airbnb, and that was a big focus. So that's what, like, Chronon was. The early versions of Chronon were built to solve that.

Nikhil Simha Raprolu [00:03:30]: Like, it was all for fraud. Yeah, yeah.

Varant Zanoyan [00:03:33]: People think of Airbnb as a travel company, but the fact of the matter is it moves a lot of money across a lot of borders. So it looks like, I mean, there's a whole payments company within Airbnb, which is kind of from the outside, you wouldn't necessarily think that. And so all those payments challenges that people have around payments fraud and trust and all those things, you know, Airbnb had those exact same challenges.

Nikhil Simha Raprolu [00:03:52]: Basically, that was the nature of the beast. They needed to be in a position where they could fight fraud in a matter of hours. But people were trying to do a bunch of things. They have to figure out the batch processing side of it, work with Spark. Then they have to figure out the stream processing side of it. Then they figure out how to index it correctly so that they can serve it, you know, at scale. Then they have to figure out how to orchestrate it, you know, using Airflow. You know, it's pretty mind-bending just to get one signal all the way through to the application, and they have to do it with hundreds of signals.

Nikhil Simha Raprolu [00:04:29]: All right, so imagine like the complexity and the number of components that would be involved.

Demetrios Brinkmann [00:04:33]: And when you say signal, it's like basically a pipeline.

Nikhil Simha Raprolu [00:04:37]: Yeah. It's like a piece of information, like how many times a user used this credit card in the past X days, or used this IP in the past X days. It's a really simple piece of information. But to engineer this into the application and into the model training process, they have to, like, write massive pipelines. Yeah, right. And hundreds of signals is like the lower end of things. It's usually thousands of signals.
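
A signal like the one described here, how many times a card was used in the past X days, is conceptually just a windowed count. A toy Python sketch of the semantics (the hard part in production is computing this over streams at scale, not the logic itself):

```python
from datetime import datetime, timedelta

def count_in_window(events, key, as_of, window_days):
    """Count events for `key` in the `window_days` strictly before `as_of`."""
    start = as_of - timedelta(days=window_days)
    return sum(1 for e in events if e["key"] == key and start <= e["ts"] < as_of)

swipes = [
    {"key": "card_123", "ts": datetime(2024, 1, 1)},
    {"key": "card_123", "ts": datetime(2024, 1, 5)},
    {"key": "card_999", "ts": datetime(2024, 1, 6)},
]

# As of Jan 10 with a 7-day window, the Jan 1 swipe has fallen out:
count_in_window(swipes, "card_123", datetime(2024, 1, 10), 7)  # → 1
```

The same window evaluated a few days earlier gives a different answer, which is exactly why these values have to be recomputed continuously online and reconstructed historically for training.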

Demetrios Brinkmann [00:05:00]: Yeah, yeah. These models are very greedy.

Nikhil Simha Raprolu [00:05:03]: Oh, yeah.

Demetrios Brinkmann [00:05:04]: They need all the signals they can get.

Nikhil Simha Raprolu [00:05:06]: They get better, you know, the more you give it, the better it gets.

Demetrios Brinkmann [00:05:09]: That then became Chronon.

Nikhil Simha Raprolu [00:05:11]: That then became Chronon. Yeah.

Demetrios Brinkmann [00:05:12]: And you bolted it on to Zipline.

Varant Zanoyan [00:05:15]: I mean, we're largely taking Chronon and building Zipline around it.

Demetrios Brinkmann [00:05:19]: So basically the story is that you built Chronon, you spun that out, or you open sourced it from Airbnb, which I feel like Airbnb has open sourced other stuff in the past.

Nikhil Simha Raprolu [00:05:32]: Yeah, Airflow.

Demetrios Brinkmann [00:05:33]: Yeah, Airflow. That's it. I knew there was some big data thing that came out of there. Superset also. Oh yeah, Superset. Both from Maxime.

Nikhil Simha Raprolu [00:05:42]: Yeah.

Varant Zanoyan [00:05:43]: So just to clarify, the Chronon open source was actually in collaboration with Stripe. So before we open sourced, we had, like, a close partnership with Stripe, again, another payments company basically, and we collaborated with them on the repository for six, eight months or so before going open source. So it's jointly managed between the two companies.

Demetrios Brinkmann [00:06:03]: Well, it makes sense if it's very heavy on the fraud part, that Stripe would be interested.

Varant Zanoyan [00:06:09]: Yeah, yeah, yeah. And then actually the journey within Airbnb after payments is kind of interesting, because search was the next big one that came along. Right. So search personalization is another use case that's very sensitive to real-time data, has very, very high volume, and can move the needle on the business. So it's worth pushing that forward if you can. And so that was kind of like the next big partnership that we had. And what was interesting is that once we were working at search scale, we saw other interesting use cases within the company just start organically popping up. So if you go to Airbnb and you see any listing, you'll see, like, a 4.8 stars average or whatever. That's computed on Chronon.

Varant Zanoyan [00:06:44]: Because that team basically wanted that to be real time. You know, the same way that fraud and search want their signals to be real time. And they wanted it to be a windowed aggregation, so that really, really old things fall out of the window and you show a more recent, up-to-date view of that data to the user. And so it's the exact same data engineering challenge that ML faces, but just as a product-facing, you know, metric. So they were like, well, Chronon's a great tool for this. And it, like, simplified their life from running this crazy Flink job to simply saying, hey, here's my data source. I want an average over this time window, and I want to, you know, plug it into this thing here. And it went from months to, like, a couple of days to get that up and running.

Demetrios Brinkmann [00:07:20]: How did they find out about Chronon? They just were hearing others use it? Because I imagine things can get lost in an organization that big.

Varant Zanoyan [00:07:27]: That was very interesting for us. As Airbnb got bigger, we no longer talked to everyone.

Demetrios Brinkmann [00:07:31]: Yeah, I believe it.

Varant Zanoyan [00:07:33]: Yeah. Word gets out. I mean, you know, you struggle with the data engineering stuff enough, and word gets out to like, hey, there's this thing that'll make your life easier. So that was really cool for us to see. We were happy with that.

Nikhil Simha Raprolu [00:07:42]: I think the search use case succeeding played a big role in them finding us.

Demetrios Brinkmann [00:07:47]: Okay.

Nikhil Simha Raprolu [00:07:48]: Because it was such a tough use case at Airbnb. Like, a lot of people thought it wouldn't work on Chronon, like, very easily. But the fact that it worked was very surprising and very impactful inside Airbnb. And Homes, or, like, the team that did that use case, was very close to search.

Demetrios Brinkmann [00:08:06]: Yeah. Okay. Search is so funny to me, because a lot of the stuff that we're doing now with AI and all of the agents or RAG, it's like, this is a search problem. Even the context engineering now, it feels like there's a lot of search going on here, and you gotta be good at search in some way, shape or form.

Varant Zanoyan [00:08:30]: Yeah.

Nikhil Simha Raprolu [00:08:31]: Search is one of those things, like, where the number of signals is, like, in the order of thousands, and search tends to be very complex. Some of these things include sequence modeling, LLMs, like typical vector indexing for embeddings, all these kinds of things in, like, one monolithic system. Right. The more sophisticated search setups look like that, and data engineering for that means, like, at least 200 or 300 mini pipelines all together, pushing data into vector search.

Demetrios Brinkmann [00:09:07]: Yeah. All right, so back to the story. You open sourced Chronon with Stripe, and then you were like, I guess it's time that we start a company and go all in on this, huh?

Nikhil Simha Raprolu [00:09:21]: So we saw Stripe use it, and it wasn't very easy to get set up with Chronon, and they already had expertise with Flink and Spark and all of these other technologies, and it was still hard for them. So, like, okay, if someone else wants to use it, there needs to be a lot more work.

Demetrios Brinkmann [00:09:39]: Yeah.

Nikhil Simha Raprolu [00:09:39]: And like, for someone who's not familiar with any of these big data technologies, it's going to be very hard to use Chronon. At the end of the day, Chronon does a good job of abstracting these technologies away from the user. Right. Users write SQL-like stuff, and under the hood, all of this is happening. Like, there's Spark, Flink and Bigtable, and all of this technology comes up, even Airflow, right. They don't have to worry about Airflow or any of these technologies. They just write their queries and these things happen automatically. It does that job.

Nikhil Simha Raprolu [00:10:11]: But like, to set it up, you need to know how all of those things work.

Demetrios Brinkmann [00:10:15]: Of course, because before you abstract it away, you've got to have a deep understanding of it, I imagine basically it's.

Varant Zanoyan [00:10:21]: A technology that can help a ton of different teams out there, we think. But right now, the role of the business partly is to productize it and make it very easy to adopt. And yeah, I mean, I think it's great that it has this open core around it, so that teams can adopt it with us but then have that open source safety net there, which means no vendor lock-in and kind of that sort of easier-to-work-with business model for larger enterprises.

Demetrios Brinkmann [00:10:49]: I thought you guys were crazy when you came out and you were doing this. I was like, but Feast is out there, which is in kind of a similar space. And I'm not saying you're crazy because you're competing with Feast, but because of what happened with Feast, where it felt like it got bastardized a little bit. And then also Feathr, same thing. Like, LinkedIn released Feathr and that also got bastardized.

Varant Zanoyan [00:11:19]: So we did this exercise within Airbnb, right? Because we don't want to just be open sourcing anything for any reason. We were like, okay, what is this doing in the market? What gap is this solving? And really what it came down to is kind of what we started off with, which is the compute stuff, right? The compute is the hardest part of the actual data engineering challenge for AI and ML. Feast is an example of something that adds a lot of value kind of downstream of that, but it doesn't solve that main problem and never tried to. Feathr is really interesting. If you look at the documentation for it, it talks about all those problems, but then if you go and look at the implementation, there's a lot of TODOs that never really got done, like streaming aggregation, windowed aggregation. A lot of the compute stuff is where we were like, oh man, once you look a level deeper, you kind of see that it never quite got there. And so we were like, okay, we are the only ones that have cracked this nut.

Varant Zanoyan [00:12:10]: And so we need to be out there. And this is a very valuable sort of contribution to the community.

Demetrios Brinkmann [00:12:14]: Yeah, I guess with Feathr, they released it and they wanted to see if there would be any uptake, people using it, and how that would go. I just don't think they got what they were looking for.

Varant Zanoyan [00:12:26]: Yeah, I think they had the right vision. I mean, you read that documentation and I think they hit the nail on the head on a lot of stuff. But I think when it got out there, there just wasn't enough of it working really well. I mean, it took us a long time.

Nikhil Simha Raprolu [00:12:37]: It took us a long time.

Varant Zanoyan [00:12:37]: Really well. It's hard. I mean, I see why it never got there. It was a grind for many, many years. And then I think the opportunity to work with Stripe before getting out there into the open source gave us a chance to really battle-test it at a whole other company at scale and iron out all these things. And so by the time we went open source, the dream was real. I think Feathr had the vision, but by the time it got out there, from what we saw, the implementation wasn't quite there. But yeah, I mean, I don't mean to talk bad about any of these tools.

Varant Zanoyan [00:13:05]: Like, they're very impressive.

Demetrios Brinkmann [00:13:06]: No, I don't think you're talking bad. If anybody's talking about it, it's me calling it a bastard. Jon Snow over there.

Varant Zanoyan [00:13:14]: Yeah, I mean, they're all very interesting technologies. I think we were unique in that we took the time to really make sure we nailed the hardest part of the problem. And I think that's what was unique about our open source.

Demetrios Brinkmann [00:13:28]: And so why was it so hard to get that working?

Nikhil Simha Raprolu [00:13:33]: There is a series of problems, right? But I'll maybe talk about the two main ones, I suppose. One is generating training data that is point-in-time correct. Right. So people already have a series of observations, and labels associated with those observations.

Nikhil Simha Raprolu [00:13:53]: Like, I swiped my card, and way down the line, you know, someone said this is fraud, clawed money back through a chargeback or whatever. So they have these observations, logs of observations. All companies have these. And they're like, okay, when these observations happen, we need to create features at those points in time. So when I say point-in-time correct training set: like, I have a new feature idea and I need to create those features at the observation points in history. That is an N-cubed problem, essentially, if you do it the naive SQL way. If you write a SQL query saying give me all the raw data at that point and then aggregate, for every one of those observations, that blows up very quickly.

Nikhil Simha Raprolu [00:14:35]: And that's what people are doing. And they're like, okay, this is blowing up and I'm not going to figure out how to make this fast. I'll just use batch features.
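
The point-in-time requirement described above can be illustrated with an as-of join: for each observation, take the latest feature value at or before that observation's timestamp, never a future one. A small sketch using pandas `merge_asof`; it assumes the windowed feature values have already been snapshotted, which is the expensive part being pointed at here.

```python
import pandas as pd

# Precomputed feature snapshots: the feature value as of each update time.
features = pd.DataFrame({
    "ts": pd.to_datetime(["2024-01-01", "2024-01-04", "2024-01-07"]),
    "card": ["c1", "c1", "c1"],
    "swipes_7d": [1, 3, 5],
})

# Observations (label events) at arbitrary points in time.
observations = pd.DataFrame({
    "ts": pd.to_datetime(["2024-01-03", "2024-01-08"]),
    "card": ["c1", "c1"],
    "is_fraud": [0, 1],
})

# As-of join: for each observation, pick the latest feature value at or
# before the observation time, never a future value (no label leakage).
training = pd.merge_asof(
    observations.sort_values("ts"),
    features.sort_values("ts"),
    on="ts", by="card",
)
print(training["swipes_7d"].tolist())  # [1, 5]
```

The naive alternative, re-aggregating all the raw events for every single observation, is the blow-up described in the transcript.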

Demetrios Brinkmann [00:14:43]: That's true. That's why real time is so hard.

Varant Zanoyan [00:14:45]: So hard. And if you put yourself into the user's shoes, right, it's like, okay, I understand my data and I understand the transformation I want to run. And then to express that, you have to learn, like, Flink, Spark, all this stuff, because those technologies fundamentally were not built for these use cases. If you look at the history of batch, like OLAP, it's pretty much BI. I mean, MapReduce and Hive and Spark, all these things kind of are just evolutions of.

Nikhil Simha Raprolu [00:15:11]: BI powering pipelines and they're more geared for like you know, every day I need to generate a report that someone's.

Varant Zanoyan [00:15:17]: going to look at, the dashboard for the, you know, executives and stuff. And now the actual workflows for AI and ML, they cross these boundaries of streaming and batch, of online serving, of inference and training. They cross all these boundaries, and no tool really took a step back and addressed that sort of user need in a first-class way. Everyone started duct-taping stuff together, and that's where all these nightmare data engineering projects come from. And I think, yeah, one thing unique about us is that we took that step back and we were like, okay, what if we exposed the abstraction that the user is actually trying to express? Just: here's the data, here's the transformation. I need it offline in this way for model training, and I need it online with these very strict requirements for low-latency inference. And what if we try to make that magic? I mean, it's hard.

Demetrios Brinkmann [00:16:05]: Turns out there was a reason why nobody had done it.

Nikhil Simha Raprolu [00:16:08]: People have tried it. We have seen many, many others go at this problem.

Demetrios Brinkmann [00:16:12]: There's a graveyard full of people who have tried it.

Varant Zanoyan [00:16:14]: But Airbnb was a wonderful place for us to build it, because it gave us what we needed to try it a few times and make some mistakes. I think if we had turned back the clock seven years and started the startup from then, I don't think we would have had the leeway to make the mistakes that we needed to make to learn how to do this properly. So because we had Airbnb as a place to get this right, and Stripe as well, that's what let us come out of the gates with something that worked.

Nikhil Simha Raprolu [00:16:39]: Yeah.

Varant Zanoyan [00:16:40]: And that was huge for us.

Demetrios Brinkmann [00:16:41]: And it kind of felt like you could battle test it, but you could also recognize if it works in two companies that are very different, we could probably make it work in more.

Varant Zanoyan [00:16:53]: And that's huge. Right. And having the flexibility to do that, we were in a very unique place in time to be able to pull that off. And Airbnb, of course, has the kind of excellent engineering culture to support that and foster that and see the value in it and invest in those long-term projects, which is really rare.

Demetrios Brinkmann [00:17:18]: So now what are you thinking about now? It's the whole data life cycle, right? It's not just the feature part.

Varant Zanoyan [00:17:26]: Yeah. Oh, yeah.

Demetrios Brinkmann [00:17:27]: Because I should preface this: you won the pitch competition for San Francisco. Congratulations. As promised, we are on a podcast. I don't have what most pitch competitions give you, like 100 grand, all this very fancy stuff.

Varant Zanoyan [00:17:43]: Keep your money. I would take this.

Demetrios Brinkmann [00:17:46]: You get a podcast, my friend.

Varant Zanoyan [00:17:47]: Let it be known Demetrios pays his debts.

Demetrios Brinkmann [00:17:49]: Yes, exactly.

Varant Zanoyan [00:17:53]: Yeah, yeah. So what are we investing in? I mean, it's very interesting. Now we're out there, we're working with customers, we're seeing what the actual needs are on the ground, and we're seeing sort of where the main pain points are and how to get the most out of this technology that we started building. A couple of things are standing out. One is definitely governance, and just data privacy, data governance, having confidence in these systems that are serving the most critical use cases at these companies. And so we're thinking a lot about that and how to make that part of the platform. And the other one is, yeah, we've seen people still figuring out more foundational pieces of data infrastructure.

Varant Zanoyan [00:18:31]: A lot of teams are trying to move to iceberg. A lot of teams are trying to move to open source things from cloud specific offerings. And we're thinking about that too. So we're kind of following where the actual needs are on the ground. We're very plugged into trying to be plugged into real value at real use cases. And those two things are standing out to us right now.

Demetrios Brinkmann [00:18:53]: It's interesting what you were talking about with Iceberg earlier, how everyone wants Iceberg.

Varant Zanoyan [00:19:00]: But they're not sure how, for sure. It's hard. I mean. Yeah.

Demetrios Brinkmann [00:19:06]: Where are they getting stuck?

Nikhil Simha Raprolu [00:19:07]: There's a lot. I mean, a series of choices, right? It's like, you have an event bus or you have a database that's online, and you need a way to transport that into your offline file system, like S3 or GCS. And from there, once you have the Iceberg files or whatever, you need a catalog that you can use. Right. And depending on which cloud you're on, the catalog is not a solved problem. That's a very open problem. And that needs to play well with their existing data. If they're on BigQuery, it needs to play well with their existing thing.

Nikhil Simha Raprolu [00:19:40]: And you need to have an engine that can understand this iceberg file well and work with the optimizations that iceberg allows you to do. Iceberg has a huge depth of optimizations that you can plug into, which engines like bigquery do automatically under the hood. Right. Just because you have some data in iceberg doesn't mean you're getting the full benefit of having that data in iceberg.

Demetrios Brinkmann [00:20:04]: Oh, really?

Nikhil Simha Raprolu [00:20:05]: Yeah, it's not that straightforward. Like, you need to be able to do clustering, do views, do all of these things that the de facto offerings on things like Google Cloud simply don't support.

Demetrios Brinkmann [00:20:18]: Oh, so what do you have to do then?

Nikhil Simha Raprolu [00:20:19]: So people try all kinds of things. They try Polaris, they try Gravitino, all these, like, budding projects. They write their own catalog, and they figure out how to get their data in. They move from Pub/Sub into GCS as Avro, as-is, and then use some ETL to process that into Parquet and form Iceberg out of it, and then figure out this catalog thing and put it in there. We don't think a lot of people are successful with doing this kind of a dance around their data infrastructure.

Demetrios Brinkmann [00:20:52]: So then is the idea to just be like, hey, potentially there is a world where you can hit Zipline and you're instantly on Iceberg, and it is as easy as you would hope it would be?

Nikhil Simha Raprolu [00:21:05]: Eventually we want to get there. So there are two levels to that, I think. One is like, I already have my data in Pub/Sub. Right. You take care of this dance of getting this into Iceberg and a catalog, and you take care of leveraging the optimizations. Right. Then there are people who are like, I have a service. I'm not logging anything.

Nikhil Simha Raprolu [00:21:24]: But if you tell me there is a library that I can log with, I will do it. So there is that level. Eventually we want to get there: this is a Zipline library, you log with it, and under the hood we log it to Pub/Sub or Kafka and create the Iceberg warehouse, and you can leverage it. So that's eventually where we want to get to. But first we are trying to get to a point where you already have Pub/Sub.

Nikhil Simha Raprolu [00:21:45]: We are going to make iceberg warehouse for you.

Demetrios Brinkmann [00:21:48]: And what about the engine? Because the hardest part is compute.

Varant Zanoyan [00:21:52]: Yeah, well, so we started there, right?

Nikhil Simha Raprolu [00:21:54]: Yeah.

Varant Zanoyan [00:21:54]: Which is kind of funny, because it's a bit downstream of these problems. And now we're working upstream, which is kind of a funny journey. But somehow, coincidentally, not through any strategic planning of ours, it's kind of a good way to tackle this problem, actually: to start there in the middle, in between your ingestion and your online AI/ML use cases, and then kind of expand from there. But I think the cool thing is also, there's a lot of value in it, and people just struggle with best practices for how to do this ingestion, write to these daily partitioned files in a certain way that will scale really well and plug in really well with downstream compute engines. So as we put this stuff out, and we'll be putting it out into open source, by the way, this will not be vendor technology, it'll give people an example of, like, hey, we've seen this stuff a couple of times. We've scaled this from really, really bad practices to good practices at a couple of big companies. We can start you off on a good path.

Varant Zanoyan [00:22:53]: And I think there's a lot of value just in that as well.

Demetrios Brinkmann [00:22:56]: Starting on the right foot.

Varant Zanoyan [00:22:57]: Yeah, starting on the right foot. Like, I can't tell you how many times we've migrated from, like, huge monolithic JSON logging to, like, schema-enforced, you know, smaller topics and things like that. And it makes sense to start with a monolith, because it's a lot easier to do that right now. But what if it was just as easy to start with the right thing? I think that would save so much pain for teams. But anyway, hell, yeah.

Demetrios Brinkmann [00:23:18]: Yeah, I see that completely. And so it's almost like you're doing the things that are the easiest just because they're easy, even though, you know, later you're like, yeah, I'm just building my tech debt right now.

Varant Zanoyan [00:23:32]: Yeah, yeah, yeah. It's a game that Nikhil and I like to play: how should this work? We looked at the data engineering problem for ML. That was the question that we asked ourselves. And the answer was, well, very differently than how it works today. And I think with some of this Iceberg and ingestion stuff that we're seeing out there, it's like, well, this should be easier. And so I think there's some ideas on how to make it easier.

Demetrios Brinkmann [00:23:55]: All right, so you guys were also doing, like, agent stuff at Airbnb back in the day. Tell me more about that.

Nikhil Simha Raprolu [00:24:02]: Yeah, so the biggest thing at Airbnb that was using Chronon to feed data into LLMs was customer support. Customer support is a big cost center for Airbnb. Airbnb traditionally employs people in the US and pays a lot of money for that organization. And the cost is super high, and everything that they can save there translates to profit or translates to lower prices. So all of that was a big driver. And the biggest problem they had was populating context correctly. They already have the agent workflow figured out, and the workflow has decision points, which are like: either give some data to the LLM and see what it says, or gather data from around the organization, from different surfaces, and see what the user was doing in the previous day or previous week and figure out what they might want.

Demetrios Brinkmann [00:25:11]: So features, basically, it's basically features.

Nikhil Simha Raprolu [00:25:15]: It's contest engineering is what they call it now, but it's basically features.

Demetrios Brinkmann [00:25:18]: It is true.

Nikhil Simha Raprolu [00:25:19]: So they'd started to build out something on their own using LangChain and all of these, like, when they were prototyping. But when it came to prod, you know, it was a natural thing for them to be like, hey, let's, like, move all of these things to Chronon and have Chronon, like, drive the pipelines and the indexes and the endpoints that power this context. Right. So that's kind of where it started. And the search agent, which I don't think is, like, out there yet, is another one. Those are the two main ones. The search agent, it's like, you know, you're chatting with an agent that's helping you figure out how to book your next Airbnb and where to travel even.

Nikhil Simha Raprolu [00:26:02]: Yeah, right. So that's like the idea there. It's like a virtual travel agent. Those are the two main use cases at Airbnb.

Demetrios Brinkmann [00:26:12]: And so that was also just taking features from you and helping the search features.

Nikhil Simha Raprolu [00:26:18]: And like in this one, a bit of embeddings too.

Varant Zanoyan [00:26:21]: I was going to say we should talk about embeddings as well.

Nikhil Simha Raprolu [00:26:23]: Yeah, yeah, talk about that.

Varant Zanoyan [00:26:24]: That's been an interesting one for us. And coming up more and more recently. Yeah, even in places you wouldn't expect it, like fraud and personalization and search.

Demetrios Brinkmann [00:26:32]: Oh really?

Nikhil Simha Raprolu [00:26:32]: Yeah. RAG was a big one. Right. Like in customer support, for example, the user says something, and this something is about a particular set of policies, and this policy relates to all these paragraphs in the policy documents. And there's a huge corpus of, like, policy documents that we need to pull from. Right. So that was all driven by RAG. So, you know, it's like chunk these things and embed them and store them in a vector store and.

Nikhil Simha Raprolu [00:26:59]: And then pull it out whenever you need that information. There is another set of things which is like the user's conversations. As the thread is happening with the host or with the support agent, the intent of the conversation keeps changing, and that intent is captured as embeddings. So what you're trying to do is essentially take this intent side and match it with the policy side. There's two embeddings and you're doing a dot product to figure out what's the most relevant information that the LLM needs to get to make a decision.
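The matching step Nikhil describes — scoring pre-computed policy-chunk embeddings against a conversation-intent embedding with a dot product — could look roughly like this sketch. All vectors and names here are illustrative toys, not Airbnb's actual pipeline:

```python
import numpy as np

def top_k_chunks(intent_vec, chunk_vecs, k=2):
    """Score each policy chunk by dot product against the intent
    embedding and return the indices of the k best chunks."""
    scores = chunk_vecs @ intent_vec            # one dot product per chunk
    return scores.argsort()[::-1][:k].tolist()  # best-first indices

# Toy 3-dim embeddings standing in for real model output.
policy_chunks = np.array([
    [0.9, 0.1, 0.0],   # refund policy paragraph
    [0.0, 0.8, 0.2],   # cancellation policy paragraph
    [0.1, 0.1, 0.9],   # host payout paragraph
])
intent = np.array([0.0, 0.9, 0.1])  # conversation is about cancellations

print(top_k_chunks(intent, policy_chunks))  # → [1, 2]
```

In a real system the chunk embeddings would live in a vector index and the top-k search would be approximate, but the scoring idea is the same.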

Demetrios Brinkmann [00:27:32]: Fascinating.

Nikhil Simha Raprolu [00:27:34]: So we were on both sides of this, but mostly on the user side because that's harder because you need to move a lot of real time data, embed it and store it in a vector index and then surface it.

Varant Zanoyan [00:27:45]: Yeah, yeah. And so this is something that has some recent momentum in the Chronon open source, like first native embedding support within Chronon. So it's not really features anymore, it's like features and embeddings now, which is pretty exciting. And there's a ton of value in orchestrating more complex graphs combining features and embeddings together in end-to-end workflows. Which again was pretty much prohibitively difficult before for all but the biggest, most sophisticated teams. But we're trying to make that just as easy as anything else.

Demetrios Brinkmann [00:28:14]: In Chronon, but you're not creating embeddings.

Nikhil Simha Raprolu [00:28:20]: Out of features? Out of raw data, basically. So we create embeddings out of message history and we treat a window of messages as a feature.

Demetrios Brinkmann [00:28:34]: Okay.

Nikhil Simha Raprolu [00:28:36]: I mean, we treat that as a transformation, like that windowing operation on the message stream as a feature, essentially. And we feed that into an embedding model and get an embedding out.
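A rough sketch of that "window of messages as a feature" idea: keep the last N messages of a thread, join them, and hand the window to an embedding model. The `embed` function below is a deliberately fake stand-in (it just counts characters and spaces) so the example runs on its own; a real system would call an actual embedding model:

```python
from collections import deque

def embed(text):
    # Stand-in for a real embedding model call; returns a fake
    # 2-dim vector so this sketch is self-contained.
    return [len(text), text.count(" ")]

class ThreadEmbedder:
    """Windowed aggregation over a message stream: the feature is the
    last `window` messages, re-embedded whenever a message arrives."""
    def __init__(self, window=3):
        self.messages = deque(maxlen=window)  # deque drops old messages itself

    def on_message(self, text):
        self.messages.append(text)
        window_text = " | ".join(self.messages)  # the windowing "transformation"
        return embed(window_text)                # intent embedding for the thread

t = ThreadEmbedder(window=2)
t.on_message("hi, I need to cancel")
vec = t.on_message("what is the refund policy?")
print(vec)
```

Each new message re-derives the thread's intent embedding from the current window, which matches the "intent keeps changing as the thread happens" behavior described above.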

Varant Zanoyan [00:28:46]: User activities is another good example, right? Like user sequence modeling, where you take a bunch of user activities. Again, you're usually aggregating, like, the last 50 or 100 user activities, although now bigger windows have become an interesting thing that teams want to experiment with, and turning that into a user embedding and using that to drive recommendations or personalized experiences.
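One simple way to realize "aggregate the last N activities into a user embedding" is mean pooling over per-activity embeddings — a toy sketch, not what Airbnb or TikTok actually run (real sequence models learn the aggregation instead):

```python
import numpy as np

def user_embedding(activity_vecs, last_n=100):
    """Pool the most recent `last_n` activity embeddings into one
    user embedding by taking their element-wise mean."""
    recent = np.asarray(activity_vecs[-last_n:])  # keep only the window
    return recent.mean(axis=0)

# Four toy 2-dim activity embeddings; a window of 2 keeps the last two.
activities = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 0.0]]
print(user_embedding(activities, last_n=2))  # → [3. 1.]
```

The resulting vector can then feed nearest-neighbor lookups for recommendations or personalization, the same way the policy-chunk matching works for customer support.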

Nikhil Simha Raprolu [00:29:06]: TikTok does 5,000. 5,000 activities into an embedding.

Varant Zanoyan [00:29:11]: Right. And TikTok has the team to build that infra. Not a lot of companies have access to that, even though they have potentially very high value use cases. And I think that's where we're offering a compelling open source based product here that can actually power that kind of use case at scale.
