MLOps Community

Building Hyper-Personalized LLM Applications with Rich Contextual Data // Mike Del Balso // DE4AI

Posted Sep 17, 2024 | Views 1.1K
speaker
Michael Del Balso
CEO & Co-founder @ Tecton

Mike is the co-founder of Tecton, where he is focused on building next-generation data infrastructure for Operational ML. Before Tecton, Mike was the PM lead for the Uber Michelangelo ML platform. He was also a product manager at Google where he managed the core ML systems that power Google’s Search Ads business.

SUMMARY

In the era of AI-driven applications, personalization is paramount. This talk explores the concept of Full RAG (Retrieval-Augmented Generation) and its potential to revolutionize user experiences across industries. We examine four levels of context personalization, from basic recommendations to highly tailored, real-time interactions. The presentation demonstrates how increasing levels of context - from batch data to streaming and real-time inputs - can dramatically improve AI model outputs. We discuss the challenges of implementing sophisticated context personalization, including data engineering complexities and the need for efficient, scalable solutions. Introducing the concept of a Context Platform, we showcase how tools like Tecton can simplify the process of building, deploying, and managing personalized context at scale. Through practical examples in travel recommendations, we illustrate how developers can easily create and integrate batch, streaming, and real-time context using simple Python code, enabling more engaging and valuable AI-powered experiences.

TRANSCRIPT

Link to Presentation Deck: https://docs.google.com/presentation/d/1zi3N65ac_LQsE3v7xGLj4SJ2FlWWHyGs/edit?usp=drive_link&ouid=103073328804852071493&rtpof=true&sd=true

Demetrios [00:00:08]: Making sure you're on your toes there.

Mike Del Balso [00:00:10]: Hey, we're wearing the same hat for a dollar. Yeah, I am, dude.

Demetrios [00:00:17]: Full real estate here. I am branded by the team. I know you got a presentation, right? I feel like you got to share your screen, maybe, and then we're going to get to it. So in the meantime. Oh, there it is. So I'll get cracking, we'll throw this up, and I'll be back in 20, 25 minutes. Anyone that wants, feel free to add your questions in the chat. And as Mike knows, I think you've seen this one, Mike, but I'm going to share it anyway.

Mike Del Balso [00:00:54]: I can't see you. I can't see you right now because I'm doing this.

Demetrios [00:00:57]: Oh, yeah. Well, there's a QR code for our shirt that says, "I hallucinate more than ChatGPT." It's on the screen. And if anybody wants to grab that, there's the QR code for it. And help out the community in the meantime. But we've got Mike from Tecton coming up.

Mike Del Balso [00:01:17]: Cool. Thanks, Demetrios. Awesome. Appreciate it. Hi, everybody. This is really cool to have this conference, by the way. This concept of data engineering for AI is something that we've been thinking about forever. And just a little bit of background on me: I'm one of the co-founders and I'm the CEO of Tecton.

Mike Del Balso [00:01:40]: Tecton is all about getting better data into your models so your models can be better, and it's all about bridging the gap between your business's data and your models. And I'll talk a little bit about some of that with respect to personalization today. But it's really cool. It's just super cool to see that there's a whole conference where we're talking about this now, and it's a packed conference with a lot of people here. So this is definitely a topic, and it's aligned with, honestly, the founding insight of Tecton many years ago. Getting this right is really the bottleneck to getting business impact from AI.

Mike Del Balso [00:02:21]: You know, getting AI past the finish line into production, so that your company is getting value from it. So we find that people are typically blocked on, hey, how do I use better data for my models? And a company's data is really their competitive advantage, their proprietary secrets that their competitors don't have. So it's critical to get that well connected to the model, so your model doesn't have lowest common denominator type of behavior. So today what I'm going to talk about is actually an architecture that we see a lot of our customers starting to use to figure out how to combine LLMs with personalization and all of the data flows that are involved in doing that. So I'm going to talk through that and I would love to answer questions along the way, but I just can't see any of the chat right now. So I'll do that afterwards. So, yeah, I guess the first thing to mention is this concept of personalization.

Mike Del Balso [00:03:27]: So there was a consulting report a couple months ago and it mentioned that the expected impact of personalization on global GDP by 2030 is $5 trillion. So that means $5 trillion of value, with a t, not a b, is going to be created by better personalization over the next five years. Next six years. That's crazy. How is that going to happen? It's not going to happen by slightly better ads or slightly better recommendations on Amazon. There's going to be some qualitatively different thing that's happening, just a way different tier of personalization, of allowing the AI to get to know you. And as an industry, we're figuring out how to do that. I'm going to talk about one of these architectures right now.

Mike Del Balso [00:04:14]: So as a motivating example, let's imagine we are trying to build a new startup. It's a travel website. We're trying to crush booking.com, for example. And so the whole idea for this startup is you go to this website and boom, it gives you the best recommendation. How can we build this? Right? That's the product. You load it up and this is what you should do. Well, you know, we could ask ChatGPT behind the scenes, hey, where should I go on my summer vacation? And what that's going to give us is kind of a bland answer. Go to Greece, basically.

Mike Del Balso [00:04:50]: And then it gives us a bunch of, you know, kind of standard stuff that you would expect. The problem is, who knows if I care about Greece, maybe it has nothing to do with me. Maybe that's not the right recommendation for me. But what ChatGPT or any LLM is able to do is act kind of like an expert. ChatGPT knows a lot about different places and stuff, but it doesn't know you, right. So it's got low context, but it's got high expertise. That's why it's telling you to go to Greece. What we're really looking for, the type of product we want to build, is something that's like your best friend and a travel agent, something with high context and high expertise.

Mike Del Balso [00:05:31]: So we would want this product to be able to say something like, hey, you know, we know you love rock climbing and you're really into desert places. You should go to Utah. There's a special thing going on there, and there's a sweet national park there. That's a way more personalized recommendation. And the difference between delivering the left or the right can make or break your product, make or break your business. And it's my contention that every company has things like this, has potential product experiences that could be qualitatively up-leveled if we are able to help the company build the red guy instead of the blue guy. And so let's talk a little bit about how we can do this.

Mike Del Balso [00:06:16]: How can we do this? Right? Is this about fine-tuning? Do we need to fine-tune our model? Probably not. This helps improve the model's intrinsic knowledge, but it doesn't really help the model know you better. What about prompt engineering? Yeah, we should totally do that. Prompt engineering is just asking the question a different way. So it's good to do, but it's not going to actually solve this problem. Might help us do a little bit better. What about RAG? Well, what is RAG? RAG is really just allowing the model to go out and get more information and then reason based on that information. So this is intuitively the kind of thing we want.

Mike Del Balso [00:06:54]: But if we look at what a traditional or naive RAG architecture is like, this is just the most basic thing. You'll see it doesn't really solve the problem for us. So, okay, we get some query from the user, we embed it. It goes to a vector database. We do a similarity search on the embeddings, we get some somewhat similar items out, and then we pass those to a model, and then the model makes a recommendation. There's a couple problems here. So these are these candidates, right? These are the cities, the destinations, candidates that come out of some process and then are used for consideration by our model.

Mike Del Balso [00:07:35]: And so there's a problem here. What we're doing is we're injecting all of these candidates into a prompt so this is how we give it to the model. Right. They're used as context to enhance the prompt. So the prompt is, you're a travel advisor. You know, be an expert at making travel recommendations. Choose from among these different, you know, destinations. That's nice.
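The naive RAG flow described here (embed the query, run a similarity search, inject the bare candidates into a prompt) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-dimensional vectors are hand-written stand-ins for a real embedding model and vector database, and the names `DESTINATION_VECTORS`, `retrieve_candidates`, and `build_naive_prompt` are hypothetical, not from the talk.

```python
import math

# Toy, hand-written embeddings standing in for an embedding model + vector database.
DESTINATION_VECTORS = {
    "Greece": [0.9, 0.1, 0.2],
    "Paris": [0.7, 0.6, 0.1],
    "Utah": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_candidates(query_vector, k=2):
    """Return the k destination names most similar to the query vector."""
    ranked = sorted(
        DESTINATION_VECTORS,
        key=lambda name: cosine_similarity(query_vector, DESTINATION_VECTORS[name]),
        reverse=True,
    )
    return ranked[:k]


def build_naive_prompt(question, candidates):
    """Inject bare candidate names into the prompt, with no extra context."""
    lines = ["You are a travel advisor. Choose from among these destinations:"]
    lines += [f"- {name}" for name in candidates]
    lines.append(f"Question: {question}")
    return "\n".join(lines)
```

Note that the prompt carries only destination names, so the model can reason only from what it memorized about each place during training.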

Mike Del Balso [00:07:59]: And then that becomes some super prompt that goes to the LLM, and then the LLM right now gives us kind of a crappy answer. So how do we improve this? Well, we need better context, and context is really the relevant information that allows a model to understand and reason about a situation. So instead of just having this kind of lame data, where, when we talk about Paris, the model is only going to be reasoning about what it has memorized about Paris in the past, we can enrich this candidate with a lot of extra context. What is the weather there right now? What do people do there? What's the cuisine? What's the safety level? What's the budget? These are basically features, but just all kinds of structured or unstructured data that helps provide additional context for these candidates. Then we can take these enriched candidates and pass them into the model. The reason we can't just have the model know all of this ahead of time is because this data changes, and some of it may be new since the model was trained. So we need to actually pass it into the model at reasoning time.

Mike Del Balso [00:09:10]: Right, at inference time. And so now we've got a better situation. Right. The model is going to get richer candidates to reason over. We have to figure out how to join the destination data in with the candidates. So we're going to do some context enrichment, and then we have this kind of better prompt with destination context.

Mike Del Balso [00:09:34]: It's going into our model, and then we get a little bit of a better recommendation. Right? Hey, well, the Olympics are in Paris this summer. You should go there. That's a good recommendation for what you should do this summer. That's cool. But it's not. Not really personalized for me, though. Like, maybe.

Mike Del Balso [00:09:51]: Maybe I don't care about sports, right. Maybe I really don't like the Olympics and don't care. So how do we get this to be personal to me? Well, personalization requires contextual user data, so we also need to get this context about the person and about the user. What are their preferences? What are their interests? What is their history? What have they done in the past? What are some facts about them, like their budget and stuff like that, right? We can make this as long as we need to. If we can then inject that into the prompt, here's the things that we're considering, and this is the information and what the user is looking for, then the model can reason much better. It can have this prompt with destination information and user data.

Mike Del Balso [00:10:38]: But that means we have a little bit of a different problem, like a bigger problem now, because now we got to integrate with a vector database. We got to connect to our destination data, wherever that is, in our data lake, for example. We have to connect to all the user data and bring this all together. But then we get a much better recommendation. Hey, Utah is perfect for you because XYZ. And the moral of the story here is the higher the quality of the context, the better the quality of the response. That's really the name of the game. How do we get the highest quality context to our models? And if we do that, our application will go from something very lame to something really specific.
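As a rough sketch of the context enrichment and user-data join described above, the snippet below attaches destination context to each candidate and adds user context to the prompt. Everything here is an assumption for illustration: the lookup tables `DESTINATION_CONTEXT` and `USER_CONTEXT` stand in for a data lake and a user store, and the function names and prompt layout are hypothetical.

```python
# Hypothetical lookup tables; in practice these would be a data lake and a user store.
DESTINATION_CONTEXT = {
    "Paris": {"weather": "warm", "known_for": "museums, cuisine"},
    "Utah": {"weather": "hot and dry", "known_for": "rock climbing, national parks"},
}
USER_CONTEXT = {
    "user_42": {"interests": ["rock climbing", "deserts"], "budget": "mid-range"},
}


def enrich_candidates(candidates):
    """Attach destination context to each candidate; unknown ones get no extras."""
    return [{"name": c, **DESTINATION_CONTEXT.get(c, {})} for c in candidates]


def build_personalized_prompt(question, candidates, user_id):
    """Combine user context and enriched candidates into one prompt string."""
    user = USER_CONTEXT.get(user_id, {})
    lines = ["You are a travel advisor.", "User context:"]
    for key, value in user.items():
        text = value if isinstance(value, str) else ", ".join(value)
        lines.append(f"- {key}: {text}")
    lines.append("Candidate destinations:")
    for cand in enrich_candidates(candidates):
        details = ", ".join(f"{k}={v}" for k, v in cand.items() if k != "name")
        lines.append(f"- {cand['name']}: {details}")
    lines.append(f"Question: {question}")
    return "\n".join(lines)
```

The join itself is trivial here; the hard part the talk points to is keeping those two lookup tables fresh, correct, and fast to read at inference time.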

Mike Del Balso [00:11:23]: And in this example, we're able to say, hey, at this moment this class opened up right close to where you are right now, and it's a last minute opening. So nobody could have known about this before. And it's just a very unique opportunity, and it's something that aligns with your interests. That's a really cool recommendation for what you should do right now. A type of opportunity, type of product experience that doesn't really exist on any other travel websites. So what's going on here though, right? How do we do this? Well, a lot of this is a familiar problem, right? We just looked at a video about how hard it is to add features to predictive ML models. All of those problems are still the same in this world. We need to ingest data, we need to transform that data and extract the relevant signals.

Mike Del Balso [00:12:13]: We need to store it for later use. We also need to retrieve it at inference time in an efficient way, at scale with a level of reliability that's going to make sense for our product. And we need to join all of this stuff together into a nice prompt. It's a lot of work, and it tends not to be a one person type of job. This is a thing the industry is trying to figure out, like who does what parts. If we're trying to build like an actual productionized AI application that has all of these components and really high level of personalization, how do we actually do this? And to what extent does it get more and more complicated as we try to make the context better and better. So, you know, the problem we're trying to solve is how do you allow the context and the quality to get better, but keep the difficulty of implementing or the cost of implementing low? And so how do we build these amazing personalized contexts without needing a gigantic engineering team? Because most companies don't have a gigantic engineering team. This is not the case if you work at Google or Meta or something like that.

Mike Del Balso [00:13:16]: But for everybody else, it's a real problem. So I want to talk about four levels of context personalization today, and they're going to be increasingly moving us along that path. Right. The first is kind of a cheat. It's no context. This is what we were talking about. We pass the candidates along, we get a bad recommendation.

Mike Del Balso [00:13:35]: This is where it goes. I don't know. Go to Greece. I don't really know anything about you. That puts us right down in this bottom left corner. Pretty bad responses. So now let's go to level one. Level one is where we integrate a certain category of data.

Mike Del Balso [00:13:50]: That category of data comes from batch data sources. So this is stuff that you could have known from historical data. So personalized insights, your past behavior, your profile data, your preferences, stuff like that. And in this case, we'll connect to a data warehouse. We'll look at your trips history, your interests, your preferences that you stated, what activities you favorited in the past. And we have this problem where we still got to do this context construction. We have to join this data warehouse data in the right way with the vector database stuff to output the right prompt for the personalization. And this is hard to do.

Mike Del Balso [00:14:29]: This is a whole other data engineering challenge on top of the kind of like typical feature engineering type of stuff that folks have done traditionally. And a lot of the same problems exist. We have to build these data pipelines to retrieve, serve, join this data from warehouses. And we got to also be able to evaluate this stuff. So we got to generate data sets that work historically that I can backfill. So I can say, hey, how accurate would this new, how accurate is my product now if I incorporate this different data so I can evaluate the product. So it's tough to do. I'm going to walk through, like, one of the things actually just backtrack here.
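One way to picture the "data sets that work historically" requirement is a point-in-time join: for each past event, the feature is computed using only the data that existed before that event. The sketch below uses plain Python and an in-memory list in place of a data warehouse; the table contents and function names are hypothetical and this is not the Tecton SDK.

```python
from datetime import datetime

# Hypothetical trip history rows, as they might sit in a data warehouse.
TRIPS = [
    {"user_id": "u1", "destination": "Kyoto", "booked_at": datetime(2024, 1, 10)},
    {"user_id": "u1", "destination": "Rome", "booked_at": datetime(2024, 3, 5)},
    {"user_id": "u1", "destination": "Utah", "booked_at": datetime(2024, 6, 20)},
]


def past_destinations(user_id, as_of):
    """Batch feature: destinations the user had booked strictly before `as_of`."""
    return [
        trip["destination"]
        for trip in TRIPS
        if trip["user_id"] == user_id and trip["booked_at"] < as_of
    ]


def build_eval_dataset(events):
    """Compute the feature for each historical event as of that event's timestamp."""
    return [
        {**event, "past_destinations": past_destinations(event["user_id"], event["timestamp"])}
        for event in events
    ]
```

Filtering on `booked_at < as_of` is what keeps future information (here, the June trip to Utah) from leaking into an evaluation of an April recommendation.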

Mike Del Balso [00:15:11]: One of the things that we do at Tecton is make it really easy for people to do this, make it really easy for an individual team, one or two people, to build this type of workflow without needing a gigantic team of engineers or production application engineers. And we do that by providing an SDK to let you very simply just write a couple of lines of Python to define the signal or the snippet of data and the transformation that you want involved in your context. Then we make it easy for there to be one line for you to create this eval dataset from it: give me all the historical data for evals. One line to push it all to production, so it's production ready and it's serving in real time to your models, and you can throw, you know, 100,000 requests per second against it or whatever your business needs, and it'll be, you know, with the right level of reliability, et cetera. And then one line to actually read that data in real time, so at inference time, for you to retrieve all of this stuff. The whole point here is to provide a really slick user workflow, very simple workflow, to build all of the data pipelines to power a highly personalized application, and under the hood to manage all of the infrastructure. So there's very little overhead and operations management, and the whole concept of MLOps or LLMOps or data ops for ML or AI, all of this stuff, get that all out of the way and make it highly abstracted. So you're literally just writing the snippet of Python code. When you do that, then your recommendation can be much better than go to Greece. You can say, hey, since we know you like going to places with deep history, we recommend going to this ancient city, the ancient city of Kyoto.

Mike Del Balso [00:16:57]: It's a better recommendation. So that brings us to level one. So how do we go to the next level? Well, we want to go to over here, we're going to incorporate a different kind of data. Now this is streaming data. Streaming data allows us to make use of recent user activity. So what are the current interests of that user? In which ways have they recently interacted with our website? That allows us to capture a signal of intent. But this is really upping the game on the complexity of the system. Now we're connecting to streaming data, like Kafka pipelines, for example.

Mike Del Balso [00:17:38]: We're processing streaming data, so maybe we're connecting to the interactions and sessions. We want to look at trends. What videos were you just looking at? What were you just searching a couple minutes ago kind of thing. This is great. And then we can have much better contextualized candidates here so we can get a much better recommendation. But it's also super challenging. First of all, it's to the shame of the data engineering industry. It's still really hard to build streaming data pipelines

Mike Del Balso [00:18:10]: and operate them and maintain them for production. This is kind of an embarrassment for everyone who works in the streaming domain, but it's the truth. It's really hard to do this, and one of the reasons it's very scary to do it is because when you work with data at scale, it's very easy to waste a lot of money. And so you have a very big performance to cost trade-off implication. And if you're the guy trying to figure out the LLM, you're probably not a streaming engineering expert, and that can slow you down and open you up to some big risks of spending a lot more money than intended, and you don't want to blow your company's budget accidentally. So all of these are problems we help with at Tecton. I'm going to go fast through this example, but I'm sure the slides will be available later. But the idea is the same thing.

Mike Del Balso [00:19:02]: We want to build a streaming feature, our streaming signal. Cool. Write a couple of lines of Python. This feature helps us determine what topics the user watched a video about in the past hour. That's great. That's a very relevant signal for personalization. And then every other part of the workflow is the same. One line to create the eval dataset, one line to productionize, and one line

Mike Del Balso [00:19:31]: to pull the data for inference. It's the exact same. So the idea is just keep it really simple, hide all of the complexity, and make it really fast to just kind of get the data engineering stuff out of the way. When you do that, you get much higher quality recommendations. Right. We've included, hey, what locations have you recently viewed? Well, then now we can say, hey, since you've been looking at Japan and you've been looking at fine dining stuff, we have another more relevant recommendation for you. That gets us to here. And obviously we want to go up here. What's the last thing here? The last thing is incorporating real time data.
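The streaming signal mentioned above, "what topics the user watched a video about in the past hour," amounts to a sliding-window aggregation. A minimal in-memory sketch follows, assuming events arrive in timestamp order; the `RecentTopics` class is hypothetical, and a production system would run this on a stream processor rather than in a Python object.

```python
from collections import deque
from datetime import datetime, timedelta


class RecentTopics:
    """Keeps the topics of videos a user watched within a trailing time window."""

    def __init__(self, window=timedelta(hours=1)):
        self.window = window
        self.events = deque()  # (timestamp, topic), assumed to arrive in time order

    def add(self, timestamp, topic):
        """Record one video-watch event."""
        self.events.append((timestamp, topic))

    def topics(self, now):
        """Evict events older than the window, then return the distinct topics."""
        while self.events and self.events[0][0] <= now - self.window:
            self.events.popleft()
        return sorted({topic for _, topic in self.events})
```

For example, with watches recorded two hours, thirty minutes, and five minutes before `now`, only the last two topics would be returned.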

Mike Del Balso [00:20:15]: Real time data is about looking at where the user's phone is right now. What can we learn about their current context and the inputs they give us? Both the user context themselves, but also other information we can pull from other sources, like given that the user's there, what's the weather in that location like right now, what's the traffic like in that location? And we built all of the data and AI for ride sharing. For example, adding real time traffic information to an ETA model is hugely important for the accuracy of these types of systems. And so incorporating real time data gets a really good signal of the intent the user has and allows the model to give a much better recommendation. Way harder, though. We have to build and maintain real time data pipelines. This is kind of like code that's running at inference time in production. That's deep production engineering, data engineering.

Mike Del Balso [00:21:18]: And then we have to integrate the third party data sources, even have a contract, maybe, with the weather API or the traffic API company. And there's a whole new trade-off between speed and cost. Building these things within Tecton is the exact same. You write some Python code. This signal determines how close this user is to this destination. Are they in the same country? That's a really relevant type of signal. And if we do that, we can get a much better recommendation. This is that last minute, right beside you, there's a new opportunity.
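A real-time signal like "how close is this user to this destination, and are they in the same country" has to be computed at request time from the user's current location. The sketch below uses the haversine great-circle formula with a mean Earth radius of 6371 km; the input dictionaries and the `proximity_features` name are assumptions for illustration only.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def proximity_features(user, destination):
    """Request-time features derived from the user's current location."""
    distance = haversine_km(user["lat"], user["lon"], destination["lat"], destination["lon"])
    return {
        "distance_km": round(distance, 1),
        "same_country": user["country"] == destination["country"],
    }
```

Because the user's location is only known when the request arrives, this kind of feature cannot be precomputed in batch, which is part of why the talk calls this level "way harder."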

Mike Del Balso [00:21:51]: Okay, I'm running low on time here, so I'm going to end with one bonus example. The bonus level is level four. It's incorporating memory into this system. So wouldn't it be great if once we create that recommendation, the user can say, you know what, no, I want to go somewhere that's warmer this summer. Then we can take that into account as part of the history or as part of the context and pass that into the model. So the model can then reflect and react and have an iterative user experience. That's amazing. I think this is something we're working on, and we're going to be launching soon at Tecton.
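The talk does not describe how memory would be implemented, so the following is only one simple possibility: keep the last few conversation turns and render them as extra context for the next prompt. The `ConversationMemory` class and its `max_turns` parameter are hypothetical.

```python
class ConversationMemory:
    """Keeps the most recent conversation turns so they can be passed back as context."""

    def __init__(self, max_turns=5):
        self.max_turns = max_turns
        self.turns = []  # list of (role, text) pairs, oldest first

    def remember(self, role, text):
        """Store a turn and drop the oldest ones beyond max_turns."""
        self.turns.append((role, text))
        self.turns = self.turns[-self.max_turns:]

    def as_context(self):
        """Render the remembered turns as lines to prepend to the next prompt."""
        return "\n".join(f"{role}: {text}" for role, text in self.turns)
```

With this in place, a follow-up such as "I want to go somewhere that's warmer this summer" becomes part of the context the model sees on the next request.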

Mike Del Balso [00:22:35]: But I think this kind of architecture is where the world is going with modern architectures for personalization. So just wrapping up, what did we learn? Compelling products require a lot of personalization, and personalization is going to be huge. Doing personalization requires great context, and that is very hard to do. The concept of a context platform is something that Tecton is spending a lot of time on. Tecton helps you solve all of the data engineering problems with respect to building these types of architectures. And it's important to know that this applies across all types of use cases. It's not just travel recommendations, but every company has a lot of value to unlock if they are able to personalize their systems a lot more, show up more as a human, and provide a lot more of an experience that knows your customer. This is the architecture that we're starting to see across many of our customers.

Mike Del Balso [00:23:38]: This space is so early, though. I'm expecting this to change. So this is going to be an evolving thing. But I just wanted to give everybody a snapshot of where we are today. But don't forget that if you're doing this in production, you're going to have to figure out versioning. You're going to have to figure out how to collaborate with other people. How do you control which data is used where? How do you debug this thing along the way? How do you ensure that it stays correct over time and you get alerted when it's broken? These are all things that we solve in Tecton, so we help you build AI that knows your users.

Mike Del Balso [00:24:09]: So you can give this a try. Actually, today, go to Tecton AI explorer or just shoot me a note. I'd love to chat with you. Thanks for your time, everybody.

Demetrios [00:24:19]: All right, there's a few really good questions that came through in the chat, so I want to ask them real fast. The first one is about how can we make sure more weightage is given to the recent data of the users and various real time features, data perhaps coming within a specific timeframe compared to the previously fed batch data.

Mike Del Balso [00:24:47]: Good question. There's a step that I didn't really show here, but it's this step of integrating all of that data into the prompt that you pass to the LLM. And that's very much an open problem. We're doing a lot of research here to figure out what is the best way to compress all of that context and literally turn it into the tokens that go into the final prompt. And in that operation you can definitely apply some weight and you can even tell the model, hey, pay closer attention to the more recent information. Here's the more recent information. But it's definitely an open problem still.
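Since the answer above calls this an open problem, the sketch below is just one naive heuristic, not a recommended method: sort the context snippets from freshest to oldest, keep as many as fit in a character budget (a stand-in for a token budget), and tell the model to favor the recent ones. The snippet fields `source`, `age_seconds`, and `text`, and the `assemble_context` function, are hypothetical.

```python
def assemble_context(snippets, max_chars=200):
    """Order snippets from most to least recent and keep those that fit the budget."""
    ordered = sorted(snippets, key=lambda s: s["age_seconds"])
    lines, used = [], 0
    for snippet in ordered:
        line = f"[{snippet['source']}, {snippet['age_seconds']}s old] {snippet['text']}"
        if used + len(line) > max_chars:
            break  # budget exhausted; everything older is dropped
        lines.append(line)
        used += len(line)
    header = "Pay closer attention to the more recent information, listed first."
    return "\n".join([header, *lines])
```

When the budget is tight, the older batch-derived snippets are the first to be cut, which is one crude way of giving more weight to streaming and real-time data.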

Demetrios [00:25:26]: Yeah, and kind of along those lines. How do you ensure the context retrieved from the vector DB is aligned with the personalized user context and does not contradict it?

Mike Del Balso [00:25:42]: Well, the information you get from the vector database can be pretty lightweight. You can keep the vector database pretty lightweight, and you can override it with more accurate or more trusted data that you get from your first party sources.

Demetrios [00:25:58]: Okay, last one for you. Can I remove recommendations that are not relevant? You mentioned before you push the data to production. Can we mitigate data overload or the data doesn't remain there for the unlimited time?

Mike Del Balso [00:26:13]: Yeah, I mean, this is just like a template architecture. And so what I've shown here is, I mean, not on this slide, but you know, the thing I was showing a second ago is an ideal state architecture, but everyone who applies this stuff in production has their own nuances. They got some weird thing that happens in the data source side or actually their product then uses these recommendations and it feeds into another model and does something different. I've shown the templated version of this, but you can apply whatever filters you want along the way.

Demetrios [00:26:50]: This is like a composable system. And cold start problems?

Mike Del Balso [00:26:55]: Cold. Okay. Yeah. So one of the flaws of this whole talk, like you just said, is what if you just don't have this data? Right? If you don't have this data, it's really tough. But most people, like enterprises who are trying to actually use LLMs and add personalization to their products, they've had a data warehouse for a long time and they've collected a ton of data. And it actually is a product problem to collect more data. You actually want to have your product in some way encourage the customer to provide you more data, so you can accumulate a better set of context about that user that you can use to enhance the customer experience later on.

Mike Del Balso [00:27:40]: We help a lot of our customers do that part, too. But that's a very important step. There's not a magic, you know, source of user data that anybody can provide for you.

Demetrios [00:27:53]: Excellent. Well, Mike, this has been awesome, man. I really appreciate it. As always. I'm rocking the Tecton hat today as you are. I'm going to throw it backwards now that we're done.

Mike Del Balso [00:28:04]: Yeah. So thank you for having me. I'm really happy about this conference. This conference is awesome. So I'm going to be sitting by my computer watching for the rest of the day.
