MLOps Community

RecSys at Spotify

Posted May 14, 2024 | Views 6.4K
# LLMs
# Recommender Systems
# Spotify
speakers
Sanket Gupta
Senior Machine Learning Engineer @ Spotify

Sanket works as a Senior Machine Learning Engineer on a team at Spotify building production-grade recommender systems. Models built by his team are used in Autoplay, Daily Mix, Discover Weekly, and other products. His current passion is building systems that understand user taste: how to balance long-term and short-term understanding of users to enable a great personalized experience.

Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.

SUMMARY

LLMs with foundational embeddings have changed the way we approach AI today. Instead of re-training models from scratch end-to-end, we rely on fine-tuning existing foundation models to perform transfer learning. Is there a similar approach we can take with recommender systems? In this episode, we talk about: a) how Spotify builds and maintains large-scale recommender systems, b) how foundational user and item embeddings can enable transfer learning across multiple products, c) how we evaluate this system, and d) MLOps challenges with these systems.

TRANSCRIPT

Join us at our first in-person conference on June 25 all about AI Quality: https://www.aiqualityconference.com/

Sanket Gupta [00:00:00]: Hi, guys. I'm Sanket. I work as a senior machine learning engineer at Spotify and, yeah, I have an interesting coffee routine. I love lattes. I used to hit all these different coffee shops in New York City trying to find the best one, but after the pandemic I was like, this is not convenient to do. So I got this coffee machine, and I brew my own espresso, grind my own beans, pull a shot and stuff. So it's a lot of fun. I've been learning a lot about espressos, and there are a lot of connoisseurs out there, but yeah, learning a lot on that fun stuff.

Demetrios [00:00:37]: Welcome back to the MLOps Community podcast. I am your host, Demetrios, and today is a very special one for me because I am such a deep and dedicated Spotify user. We get to talk with Sanket all about the recommendation systems within Spotify and how they've built that out so that various teams can leverage the embedding store that they have. I was fascinated by the different problems and challenges he talked about, especially when it comes to evaluating and re-ranking and what their system looks like. There were some cool takeaways that I had from here. He broke down what the difference is between the vector store and the feature store and what kind of requirements are on each of these. He also talked to me about how I, as a person who is using Spotify, have an identity, a vector attached to me, and how they're constantly updating that. And for him, one thing that is very important is how they can continuously update that and make sure that the embedding, the place that I live in vector space, is as fresh and up to date as possible without blowing up the system.

Demetrios [00:01:50]: Because you can imagine, he said on here, there's 600 million monthly active users on Spotify. If you're constantly fine-tuning these embeddings and you are churning them out for each user, you really have to think about how much is too much. And so he broke down those problems. I had a wonderful time talking with my man Sanket, and it turns out that he's really into coffee like I am. Let's get into it. And if you haven't gotten your tickets to the conference, I would get them now because we're almost sold out. All right, man, let's start with this. You just told me a viciously hot take that I cannot let go too far from my brain.

Demetrios [00:02:38]: Before we get started. RecSys are RAGs. Break that down. Why do you think that?

Sanket Gupta [00:02:45]: Okay, sure. We're going right into the systems then. Right. So the reason I think recommender systems have a lot of parallels to RAGs is because of this thing called a vector database, right? It's a fancy word that we are now talking about, but these so-called embeddings and nearest neighbor lookups have been around for a long time, actually.

Sanket Gupta [00:03:08]: Right? It's just been applied to text and NLP now. But let me give an example, okay? And then we can see how it relates back to RAGs. Say you're listening to a song on Spotify, and you want to know what the ten most similar songs to that one are, right? So let's say it's Bad Habits by Ed Sheeran. So now you want to know, what are the ten songs closest to Bad Habits? How a human might do it is go into other songs by Ed Sheeran, or look at the pop genre and do something like that. But how would you do that at scale? We don't have humans sitting there, right? We have algorithms. So how would you do that? And that's where vector databases come in.

Sanket Gupta [00:03:49]: Right? So we have an embedding for Bad Habits, or a vector, whatever you want to call it, which is just a series of numbers, and we do embed.

Demetrios [00:03:59]: But it's off of the. Sorry, hold on. With the embedding. This is what I've always been curious to know, is it's off of a description of the song or it's off of, like, all of the metadata, like, the tempo, and if he's singing in falsetto. And so it's very much things that are describing the song, or it's off of the actual song itself.

Sanket Gupta [00:04:22]: It's the song itself, mostly. I mean, there are other aspects to it, but mostly it's kind of like the spectrogram analysis of the song. So, as you said, the tempo, the valence, the instrumentalness, the pitch, all of those kinds of features. And then you add in the collaborative aspect, which is which songs people put together in a playlist, right? So it's kind of like playlist co-occurrence. It's a combination of those two things which builds the embedding. It's a massive system we can get into, but the end output is an embedding.

Sanket Gupta [00:04:54]: So we uniquely represent each track on Spotify with an embedding. That embedding model is a black box for now. We can go dig into that, but it outputs an embedding. And then you load all of these embeddings in an index, which we now call a vector database. And basically now what you do is you do nearest neighbor lookups on that embedding. You get back 100 or 200 items. That's candidate generation. And then you do ranking.

Sanket Gupta [00:05:24]: And this is very similar to what RAGs do. They retrieve different documents with metadata and so on. And then they might rank them: if it's a New York Times article, it ranks higher. That's the kind of shenanigans we do with RAGs. And it's very similar here. You get back these 200 songs and you're like, okay, so which one are you likely to listen to next? So you do this personalization and user understanding thing. Then you rank based on a model.

Sanket Gupta [00:05:53]: You get back those ten songs, basically. So that's the approach, at least. The retrieval-augmented aspect is very much similar. The only thing, I guess, missing between RAG and RecSys is the generative aspect, right? You don't generate back a paragraph explaining why, say, Shivers is similar to Bad Habits. Like, oh, they're sung by the same artist. We don't do that. You just get the output, basically.

Sanket Gupta [00:06:17]: So that's kind of how I see RAGs and RecSys together.
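Editor's sketch: the two-stage flow described above (nearest neighbor lookup for candidate generation, then a personalized ranking step) can be illustrated in a few lines of Python. This is a toy under loud assumptions, not Spotify's system: the 2-d vectors, the track IDs, the brute-force cosine search standing in for a vector database, and the "rank by similarity to a user vector" rule are all hypothetical stand-ins for learned models and an approximate nearest neighbor index.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(seed_id, track_embeddings, k):
    """Candidate generation: the k tracks nearest to the seed (brute force)."""
    seed = track_embeddings[seed_id]
    scored = [(cosine(seed, vec), tid)
              for tid, vec in track_embeddings.items() if tid != seed_id]
    scored.sort(reverse=True)
    return [tid for _, tid in scored[:k]]

def rank(candidates, user_embedding, track_embeddings, n):
    """Ranking: reorder candidates by affinity to the user, keep the top n."""
    by_affinity = sorted(
        candidates,
        key=lambda tid: cosine(user_embedding, track_embeddings[tid]),
        reverse=True,
    )
    return by_affinity[:n]

# Hypothetical toy embeddings; real ones are learned and much wider.
tracks = {
    "bad_habits": [1.0, 0.1],
    "shivers": [0.9, 0.2],
    "pop_song": [0.8, 0.5],
    "jazz_tune": [0.1, 1.0],
}
user = [0.5, 0.9]  # hypothetical user taste vector

candidates = retrieve("bad_habits", tracks, k=2)
top = rank(candidates, user, tracks, n=1)
```

With these toy numbers, retrieval keeps the two tracks closest to the seed and the ranking step then reorders them for this particular user, which is why the final pick can differ from the nearest neighbor.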

Demetrios [00:06:21]: But how about when you're evaluating the RecSys, do you think that's a whole different can of worms too? Or do you feel like there are parallels there? Like if it's actually working, how you score that, how you get ground truth, what the golden dataset is?

Sanket Gupta [00:06:36]: Yeah, I think evals are a little bit the same and a little bit different. The similarities are in all those ranking metrics, like NDCG and MRR and ROC AUC, depending on how you set up the problem. Those metrics are classic. All of ML uses them. The differences lie in how we do text matching. In RAGs, I see they do matching of exact texts and tokens, BLEU score and ROUGE and things like that. Over here, because we're not generating text, we don't really go into that.

Sanket Gupta [00:07:08]: I think that's the difference.
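Editor's sketch: for readers unfamiliar with the ranking metrics named above, here is a minimal Python version of NDCG@k and MRR using their standard textbook definitions. The function names and the input format (a list of relevance values in ranked order) are choices made for this illustration; the episode does not say how these metrics are implemented internally.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k positions."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal ordering of the same list."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal else 0.0

def mrr(ranked_lists):
    """Mean reciprocal rank of the first relevant item in each ranked list."""
    if not ranked_lists:
        return 0.0
    total = 0.0
    for rels in ranked_lists:
        for i, rel in enumerate(rels):
            if rel:
                total += 1.0 / (i + 1)
                break
    return total / len(ranked_lists)
```

For example, `ndcg_at_k([0, 0, 1], 3)` penalizes a ranking whose only relevant item sits in third place, while `mrr` averages the reciprocal position of the first hit across queries.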

Demetrios [00:07:11]: Yeah, yeah, that makes sense. So you've been doing a lot of stuff at Spotify with recommender systems. You're working on a few different ones. But is it safe to say that all the new music that I've found recently on Spotify is in part because of the work that you've been doing.

Sanket Gupta [00:07:30]: Wow. I can take some of the credit, but not all, right? There are just so many teams working on it. So, yeah, there are a lot of amazing people on this problem.

Demetrios [00:07:41]: And where's your special sauce? Are you working mainly on the platform, on the models, on the optimization?

Sanket Gupta [00:07:48]: Yeah, so I work as a senior machine learning engineer in personalization. So kind of my focus has been quite a bit on user understanding. So, like, how do we actually understand users in a way that helps out everyone basically in the company, like, in terms of the models. So more on models, some ML, like, obviously doing a lot of ML ops because otherwise you can't ship anything. Right. And then evaluations, also a lot of evaluations.

Demetrios [00:08:15]: Yeah. I imagine you're playing with features. You're knee deep in features, and that's probably your day in, day out.

Sanket Gupta [00:08:24]: Yeah, for sure.

Demetrios [00:08:26]: Any features that surprised you, that have a whole lot of weight you would have never expected?

Sanket Gupta [00:08:32]: I think one of them is, say, account age, which is very interesting. Account age basically is how long the user has been on Spotify, right? And we have seen interesting patterns that you wouldn't expect. People who have been on Spotify longer have different kinds of patterns than the new users. So there are very interesting patterns there.

Demetrios [00:08:53]: Is it because of the UI features that you feel like the account is different or the account age makes a difference when it comes to features?

Sanket Gupta [00:09:03]: Yeah, that's fair. I mean, I'm just giving an example of one model, right? We have dozens of models, and every model works differently. So features are unique to the models. But speaking to your question, I think some of the older users may have certain habits, right? Some of the newer users may be getting onboarded due to podcasts, for example. Some of the older users may have joined because of music. So they have certain kinds of habits which are ingrained, and it's hard to change people's habits. So there are those kinds of correlations also. And we recently launched audiobooks, so that's another thing. All of those things play a role. Yeah.

Demetrios [00:09:44]: I've been loving the audiobooks, by the way. So there are some really interesting problems that you face, though, right? Like one being cold start and how you can recommend something to somebody who is first joining Spotify, which feels a little bit like that's probably the minority, I would imagine, because we've all used Spotify in one way, shape or form by now, I would assume, but potentially no, you have better ideas on that. And so how do you deal with the cold start problem?

Sanket Gupta [00:10:14]: Yep. So cold start is a very important problem. There are two main reasons why I think cold start is very difficult to solve in RecSys, and I think listeners who are from e-commerce would relate to what I'm going to say. One is that you don't have any history. That makes intuitive sense. You just don't know anything about the user. The second, less intuitive reason is because of how the systems are managed. We usually have batch systems.

Sanket Gupta [00:10:39]: These systems operate, maybe they infer once a day for all users, and they're running behind. We have almost 600 million plus users now. So it takes a while for us to churn through all of them. And there are so many dependencies and systems. So it can take one or two days for systems to actually catch up. So someone new joins. We don't know anything about them. That's bad, but you can't do much about it. But our systems themselves, as a person starts engaging, are also slow.

Sanket Gupta [00:11:10]: So we take about two, three days to respond to it. Right. So they're both of these problems which kind of make a massive difference in cold start.

Demetrios [00:11:19]: And so what do you do?

Sanket Gupta [00:11:21]: Yeah. So let's solve the second problem first, because it's controllable by us, which is to move to more near-real-time systems. So someone joins Spotify, they start listening to something like, say, eighties rock. And then our systems respond in minutes, and they create new user taste profiles, these new embeddings, for them. And then those embeddings can be used across the company by all the models. So that's the systems aspect, which is to go from batch to near real time or real time.

Sanket Gupta [00:11:55]: Right. But then for the first problem, we have this thing called taste onboarding where you kind of select, I don't know if you remember doing that, but you can select three artists or four artists when you sign up, and then you can also select languages and so on. So it's about kind of smartly using that information when it's available. Right. So it's kind of combination of both of these things.

Demetrios [00:12:18]: So then you have a bit more information about someone. You mentioned that there are a lot of models that you're playing with. I am understanding that as there's a lot of models in the whole recommender system.

Sanket Gupta [00:12:33]: Yes, there are a lot of models in the recommender systems. Just as an example, search would have its own model, and then home is going to have a different model. We have models that do ads, which are different models. We have Discover Weekly, which would be a different model. And, as I said, what are the next ten songs? That's more of a candidate generation thing. That's another one. So we have multiple systems and people. As a company, there are obviously going to be a lot of people, teams and dependencies.

Demetrios [00:13:09]: Now talk to me about this, because this is just to scratch my own itch. When I say go to radio for any song, is that a new model also?

Sanket Gupta [00:13:18]: It's going to be a different set of teams managing that. A lot of them actually tend to use the embeddings. So the team I was working on is building embeddings, and the commonality between all of that is the embeddings. That's what I call foundational embeddings, because the goal here is to not have every team do everything from scratch. The user is the same. So if it's Demetrios, it's the same user.

Sanket Gupta [00:13:47]: We don't want teams to interpret you differently across the whole stack. We want teams to have a unified and common understanding of you. So just to give an example, say you wake up in the morning and you listen to some music, and then you jump into a car, you might be listening to a podcast, and then you hit the gym, you're going to be listening to some high-energy music. So those are some of the patterns that can be used across teams. It's not just one team. As a user, you want a unified and common experience.

Sanket Gupta [00:14:17]: And these embeddings kind of help that. So there is this embedding layer in between which kind of branches out across all these models.

Demetrios [00:14:26]: Okay. So it feels like getting the embeddings right.

Sanket Gupta [00:14:29]: Yes.

Demetrios [00:14:30]: Is one of the most important things.

Sanket Gupta [00:14:33]: I think so. I think they play a massive role, especially for candidate generation. Because we were talking about vector databases, right? So we have hundred million plus items in our catalog. How would you take an item and bring back 100 out of 100 million? That's a very, very tough problem for any ML model to do.

Sanket Gupta [00:14:53]: So you need some kind of candidate generation. So that's basically the retrieval problem and that typically is solved by embeddings.

Demetrios [00:15:00]: So one thing that does come up quite a bit, even in RAG, is how you evaluate the retrieval. And so what are you doing to make sure that what you are retrieving is the best out of those hundred million? Did you say 100 million songs or 100 billion songs? How many?

Sanket Gupta [00:15:17]: Hundred million items. So it's a mix of podcasts, music, audiobooks, everything. That's a lot of items. And that's why evaluations play such a major role. Personally, as an engineer, I have spent more time evaluating systems than even building the model. I take evaluations very seriously, and it's a very hard problem.

Sanket Gupta [00:15:41]: It's like sometimes you need models to evaluate a model.

Demetrios [00:15:46]: Yeah.

Sanket Gupta [00:15:47]: Fantastic. So let me give an example, I think, to answer your question. You said, how would you know those ten songs are correct, right? So that's one model that produced the ten. Now you need, say, another model that is going to rank or test that. So you're going to see, out of the ten, which ones are you likely to play next, or play in the next 4 hours or seven days or whatever. And we take an intersection of that with what you have actually been listening to. So those are the kinds of things we do to evaluate some of the systems.
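Editor's sketch: the "intersection" check described above can be read as a simple overlap metric between what was recommended and what the user went on to play. The function below is one plausible reading, not the metric Spotify actually uses; `overlap_at_k`, its arguments, and the choice to divide by the number of recommendations are assumptions made for illustration.

```python
def overlap_at_k(recommended, listened, k):
    """Fraction of the top-k recommendations the user actually played."""
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    played = set(listened)  # set membership makes each lookup O(1)
    hits = sum(1 for track in top_k if track in played)
    return hits / len(top_k)
```

In practice `listened` would be restricted to a fixed window after the recommendation was made (the next few hours or days), so that the metric measures prediction and not memory of past plays.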

Demetrios [00:16:23]: Now, you mentioned as an engineer, you spent more time working on evaluation than even building the models, right. What is your time working on the evaluation system look like?

Sanket Gupta [00:16:37]: When I'm talking about evaluations, I'm talking about evaluating embeddings. Okay? So let me be clear there, because there are teams that do classification models where the evaluations are going to be not as difficult, right? That's going to be classic precision, recall, whatever. As long as you've set up the labels correctly, you can get pretty far with just the basic metrics. So I'm talking about these embeddings. And in LLM land, those embeddings may be coming from, say, the text-embedding-3-large model from OpenAI or whatever, where you pass in a text and you get back an embedding. So those are the embeddings I'm talking about. And these are very, very abstract. You can't look at it. No human can look at an embedding and say, okay, that's good or bad.

Sanket Gupta [00:17:18]: Basically, you can't do it. So now, you know.

Demetrios [00:17:21]: Yeah, it's computer language.

Sanket Gupta [00:17:22]: It's a computer language. Exactly. It's numbers. And these numbers have these nonlinear relationships with each other. So in terms of evaluations, we break it down into sub-facets, right? So one is intrinsic evaluations. We take these embeddings sitting in the vector space, and we say, okay, we have a Demetrios embedding. Who are the 50 users most similar to Demetrios? So we do the nearest neighbor lookup on that embedding. We get back 50 users. It may be, say, me and 49 other people. And then we have some kind of heuristic system which is going to be like, okay, Demetrios likes podcasts, mostly podcasts, and is a user in the US.

Demetrios [00:18:06]: Know me well. Have you been looking at my embedding?

Sanket Gupta [00:18:09]: No, we can't. We can't really look at anyone's.

Demetrios [00:18:11]: But I'm just too easy to understand.

Sanket Gupta [00:18:14]: I'm just guessing. It's just a guess. But based on those heuristics and matching, we are able to then say, okay, I think our embedding is not bad, right? It's able to catch, like, 30% of what we thought Demetrios's similar users look like. So these are the kind of intrinsic evals I'm talking about. Then we'd also do extrinsic evals, which take the embeddings into other models. So you build these mock models, which mock some aspect of Spotify, right? As I talked about, predicting future listens. So what you would do is you take Demetrios's embedding, and then you put it into a system and say, what would Demetrios listen to in the next seven days?

Sanket Gupta [00:18:55]: And then you see the actual listening, and you do overlaps of that. So you're really figuring out if the embedding is actually good, is even doing its job. Like that, you build these mock downstream models, and you try to evaluate these embeddings by passing them through those models. So you convert evaluating embeddings into a classification problem, essentially. That's kind of what I'm saying.

Demetrios [00:19:20]: So you're trying to predict and you don't actually do anything. You're just putting that side by side with my real actions, and you're seeing how well you were able to predict that.

Sanket Gupta [00:19:31]: Yep. If that makes sense. Right. Because embedding itself, you can't do, as I said, it's hard. Right. But then you can feed it into these systems. Right. And these systems are now classification or regression systems, which we know from all our love with classical ML.

Sanket Gupta [00:19:47]: Right. If they are classic techniques, you can easily evaluate them through ROC AUC, precision-recall curves and so on. So when you feed the embeddings in, you can evaluate the downstream systems. You're not evaluating the embeddings directly, you're evaluating the downstream systems. And if they improve, you can basically say the embeddings are doing better. So it's a two-stage process, which is what makes it an incredible problem. And it's very hard to tell people.

Sanket Gupta [00:20:16]: Okay, so you're now building embeddings, and then you have to evaluate them with these downstream mock models. So yeah, there's a lot going on there.
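Editor's sketch: the extrinsic evaluation described above (feed embeddings into a mock downstream task, score that task with a classic metric, and treat an improvement as evidence of better embeddings) can be illustrated with a pairwise ROC AUC in plain Python. Everything concrete here is hypothetical: the dot-product "mock model", the 2-d vectors, and the labels are invented for illustration and do not come from the episode.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This is the pairwise (Mann-Whitney) form of ROC AUC.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def downstream_auc(user_embeddings, item_embedding, labels):
    """Mock downstream task: score = user . item, label = did the user play it later."""
    scores = [dot(u, item_embedding) for u in user_embeddings]
    return roc_auc(labels, scores)

# Hypothetical data: four users, one item, and whether each user later played it.
labels = [1, 1, 0, 0]
item = [1.0, 0.0]
old_users = [[0.2, 0.9], [0.6, 0.1], [0.5, 0.5], [0.1, 0.8]]  # previous embedding version
new_users = [[0.7, 0.3], [0.9, 0.1], [0.4, 0.6], [0.1, 0.9]]  # candidate embedding version

auc_old = downstream_auc(old_users, item, labels)
auc_new = downstream_auc(new_users, item, labels)
```

The comparison of `auc_old` and `auc_new` is the point: the embeddings are never judged directly, only through how well a fixed downstream task performs when given each version.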

Demetrios [00:20:25]: And how are you able to test this stuff out? Easily. It feels like there's a lot of infrastructure in place to be able to support that.

Sanket Gupta [00:20:35]: So we have internal platform teams that help with a number of things. And, yeah, it's a whole stack of distributed systems like Cloud Dataflow and Bigtable. We use Kubeflow. So all of those things play a role there. A lot of the engineers' time obviously is also not just in coming up with an idea, but also productionizing it in these systems.

Demetrios [00:21:01]: Didn't one of these pipeline tools come out of Spotify way back when? Was it Luigi or something? Is that still in use?

Sanket Gupta [00:21:10]: Yep, my team actively used it. I know a lot of teams are now maybe moving to more modern tools, but yeah, love Luigi.

Demetrios [00:21:18]: Yeah. Really? Oh, that's awesome. That's great. And so it does feel like there are other ways to do this. What are different approaches? If you didn't use this, and correct me if I'm wrong, the style that you would call this is like transfer learning.

Sanket Gupta [00:21:38]: Yep. So transfer learning. Exactly. Again, let me take an analogy from LLMs. I think that will help here. So OpenAI comes up with a new embedding model, right? And then they create these foundational embeddings.

Sanket Gupta [00:21:50]: But then you work in a finance company or, say, you are in a law firm and you are analyzing a lot of these legal or tax docs or whatever. So you might want to fine-tune them on your data. So that's basically transfer learning. You are kind of tuning the underlying model. You're not changing it. And that's the same approach we take here, where we build these foundational embeddings. So our team would build these representation-learning-based models to build embeddings. And then all of the downstream teams would tune them on their own tasks, essentially.

Sanket Gupta [00:22:26]: So when you're talking about personalized DJ, like, they would maybe take the embedding and then they would tune it in the voice arena, voice setting. Like, what works, what doesn't work? A search team would then instead look at search features. What are you likely to search? They would still use the same embedding. They're just tuning it in their own models. Basically, that's the approach of transfer learning.
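Editor's sketch: one common way to realize the transfer-learning pattern described above is to keep the shared embeddings frozen and train only a small task-specific head on top of them. The Python below fits a logistic-regression head with plain gradient descent. It is an assumption-laden illustration: the episode does not say whether downstream teams freeze the embeddings or update them, and the data, labels, and hyperparameters are invented.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(embeddings, labels, epochs=200, lr=0.5):
    """Fit a logistic-regression head on frozen embeddings with gradient descent.

    Only `weights` and `bias` are learned; the embeddings are never modified.
    """
    dim = len(embeddings[0])
    n = len(embeddings)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for vec, y in zip(embeddings, labels):
            err = sigmoid(sum(w * x for w, x in zip(weights, vec)) + bias) - y
            for j, x in enumerate(vec):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict(weights, bias, vec):
    """Probability that the task label is 1 for this embedding."""
    return sigmoid(sum(w * x for w, x in zip(weights, vec)) + bias)

# Hypothetical frozen user embeddings and a task label
# (for example, "engaged with a given surface").
frozen = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
task_labels = [1, 1, 0, 0]
weights, bias = train_head(frozen, task_labels)
```

Two teams could call `train_head` on the same `frozen` vectors with different labels and end up with different heads, which is the sense in which one shared embedding can serve many products.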

Demetrios [00:22:47]: Are there ways that you think this could also work at Spotify without using transfer learning?

Sanket Gupta [00:22:53]: Everything has an alternative. It's how things used to be done, right? So the alternative is don't use embeddings, just build everything from scratch. We look at the last n days of history of a user. We look at what devices the user uses, the country, the listening. Every team starts to do everything from scratch, and you don't use any embeddings, you don't fine-tune anything. So again, the analogy to LLMs is the finance company builds their own LLM from scratch. They don't touch OpenAI, they don't touch Llama or whatever. They just build the whole encoder, decoder, transformer. It's possible.

Sanket Gupta [00:23:31]: It's not impossible. I mean, you can still do it, but the amount of difficulty, you might need to put in like 50 engineers for like six months to build that, right? Yeah, it's expensive. Right. And GPU costs and so on. So you just use someone else's learning and you transfer it to your own task. That's where a lot of the secret sauce and magic happens.

Demetrios [00:23:53]: So you're constantly updating me as a user in the embedding database or the vector database. And where do features come in?

Sanket Gupta [00:24:03]: Yes. So features come in during inference. So we train the model and say, I'm like, okay, Demetrios has started listening to jazz, and I want to update.

Demetrios [00:24:14]: You know me too well, man. What is that?

Sanket Gupta [00:24:16]: That's great. Okay. I haven't looked at Bill Evans, and.

Demetrios [00:24:20]: I am stoked on a Saturday afternoon. My, wow, my day is made. Anyways, I started listening in jazz. Yeah.

Sanket Gupta [00:24:29]: Yes. And our system is going to be like, okay, let's trigger an inference for Demetrios. And when you trigger an inference, you're going to be like, okay, let's build the feature set for Demetrios. So those features may be, say, demographics. They may be, what have you been listening to in the last six months? What have you been listening to over the last 28 days? What have you been listening to in the last week? So you have these different timeframes, and jazz comes into the mix, basically. So then you update the representation, but you're not moving that embedding to an extreme.

Sanket Gupta [00:25:04]: We don't just forget everything you've heard in the last six months and make it a jazz thing. We just want to dilute it a little bit. We are like, okay, let's just take Demetrios's pop and rock taste and put a tinge of jazz onto it. And if Demetrios starts enjoying jazz quite a bit, we will expand jazz in that taste and we'll remove some other things. So that's how it continues to update.
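Editor's sketch: the "tinge of jazz" idea above resembles a small interpolation step between the current taste vector and a vector for recent listening. In the system described, a trained model produces the updated embedding from features over several time windows; the fixed-weight blend below is only a hand-written analogy, and `blend`, `alpha`, and the toy vectors are assumptions made for illustration.

```python
import math

def blend(current, recent, alpha):
    """Move `current` a fraction `alpha` of the way toward `recent`, then renormalize.

    A small alpha adds a tinge of the new taste; a large alpha would overwrite history.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    mixed = [(1.0 - alpha) * c + alpha * r for c, r in zip(current, recent)]
    norm = math.sqrt(sum(x * x for x in mixed))
    return [x / norm for x in mixed] if norm else mixed

pop_rock = [1.0, 0.0]  # hypothetical long-term taste direction
jazz = [0.0, 1.0]      # hypothetical direction of recent listening
updated = blend(pop_rock, jazz, alpha=0.1)
```

Calling `blend` repeatedly as jazz listening continues moves the vector further toward jazz each time, which mirrors the gradual expansion described in the conversation.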

Demetrios [00:25:30]: And do you have a weight on what I'm listening to right now versus what I listened to six months ago or six years ago? Because I've been on Spotify for a while. Right. So how can you balance all of that music? I know there are some times that I'll see on the UI, something will pop up and it'll say, like, you an old classic or something. I can't remember the exact words, but it's kind of like you used to like this do you still like it kind of thing and you want to jump back in and listen to it?

Sanket Gupta [00:26:04]: No, that's a good. So, I mean, those are, again, some teams that may be like looking at raw data and doing that. Right. But in terms of, like the taste profile and embeddings, we do have certain factors where we take some of the time, but we let them. Mostly we let the models decide it. Right. So if you kind of are being very active and listening to something from five years back, like our model will weigh that higher automatically. Right.

Sanket Gupta [00:26:30]: Otherwise we're going to be by default, we are weighing the recent more than not, basically.

Demetrios [00:26:35]: All right, real quick, let's talk for a minute about the sponsor of this episode, making it all happen: LatticeFlow AI. Are you grappling with stagnant model performance? Gartner reveals a staggering statistic that 85% of models never make it into production. Why? Well, reasons can include poor data quality, labeling issues, overfitting, underfitting and more. But the real challenge lies in uncovering blind spots that lurk around until models hit production. Even with an impressive aggregate performance of 90%, models can plateau. Sadly, many companies prioritize model performance for perfect scenarios while leaving safety as an afterthought. Introducing LatticeFlow AI.

Demetrios [00:27:22]: The pioneer in delivering robust and reliable AI models at scale. They are here to help you mitigate these risks head on during the AI development stage, preventing any unwanted surprises in the real world. Their platform empowers your data scientists and ML engineers to systematically pinpoint and rectify data and model errors, enhancing predictive performance at scale. With LatticeFlow AI, you can accelerate time to production with reliable and trustworthy models at scale. Don't let your models stall. Visit latticeflow.ai and book a call with the folks over there right now. Let them know you heard about it from the MLOps Community podcast. Let's get back into the show.

Demetrios [00:28:08]: One thing that is fascinating, though, is thinking about those people that are like me. You mentioned there are over 100 million different items. Does that include people? Because you said there are 600 million users of the app.

Sanket Gupta [00:28:26]: 600 million MAUs worldwide. Yeah.

Demetrios [00:28:32]: So, yeah, I'm just wondering, there's got to be a ton of people that are very much like me. I think I'm a special snowflake, but if I look in your vector database, I'm going to find a whole lot of people.

Sanket Gupta [00:28:43]: That's the beauty of embeddings, right? Like everyone is a snowflake, right? No one person is going to get the same embedding. So that, that's kind of so special, right?

Demetrios [00:28:51]: Yeah, yeah, that is very true. And so then, as you're looking at these different embeddings and you're looking at ways to make it better in the future, what are things that you're focusing on?

Sanket Gupta [00:29:05]: Yeah, a few things, right? One is definitely feature spaces. So, like, how do we better represent a user, right? So for now, we have been focusing a lot on the music side, right? So we have a musical embedding and then we have a podcast embedding and so on. But you can imagine a cross-content kind of embedding where we take Demetrios's podcast listens along with music, right? So you may be listening to jazz on a day when you listen to, say, a history of music podcast, whatever, right? So there may be some correlations there which are actually very difficult to find algorithmically. So to do that, we need to combine your podcast, music, and all of that taste together and pack it together. That's definitely one aspect of improvement. Another one is how fast we respond. So you listen to something and, bang, we respond immediately, and all of your Spotify is updating immediately after that.

Sanket Gupta [00:30:03]: So I think these two things I mentioned are themselves very tough problems, and they take, I would say, several months or even years.

Demetrios [00:30:12]: Yeah. How hard is it to shift from batch, like you said, a one-to-two-day batch-type recommendation cycle, to going to that faster cycle?

Sanket Gupta [00:30:26]: Yep. Fascinating. Fascinating thing. Right. The way I see it is that there is a Goldilocks zone with how often you infer. So I call it the Goldilocks zone. Basically, I think a lot of people think it's either batch or real time, but actually, I think there may be something in the middle, which is near real time. So near real time basically means that you don't need to infer for Demetrios, when you start listening to jazz, in, like, 30 milliseconds.

Sanket Gupta [00:30:56]: Your Spotify listening would not, like, dramatically switch in less than a second. That's unlikely. So instead, you can put that jazz listen in a queue along with some other people's, and the systems process that as they go. So it can take a few minutes, but one or two minutes is acceptable latency there. That's kind of what I call the Goldilocks zone: a good trade-off between being responsive and, at the same time, not blowing up your systems. Otherwise, your engineering systems have to, like, scale up to massive scales, and then there's your whole on-call burden. You need to put in,

Sanket Gupta [00:31:39]: I don't know how many SREs, and you have to do all kinds of stuff to make that a reality. So I think that's kind of the dream: to train as little as possible, but infer as much as possible, but not in real time. That's kind of how I see this thing play out.
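
To make the "Goldilocks zone" idea above concrete, here is a minimal sketch of near-real-time updating: listens are queued and folded into an online store in small batches every minute or so, rather than per event. This is only an illustration under stated assumptions, not Spotify's implementation; the class name `NearRealTimeUpdater`, the `flush_interval_s` parameter, and the use of a plain listen list as a stand-in for a user embedding are all hypothetical.

```python
from collections import defaultdict, deque


class NearRealTimeUpdater:
    """Hypothetical micro-batching updater (illustration only)."""

    def __init__(self, flush_interval_s=60.0):
        self.flush_interval_s = flush_interval_s
        self.queue = deque()    # pending (user_id, track_id) listens
        self.online_store = {}  # user_id -> recent listens (stand-in for an embedding)
        self.last_flush = 0.0

    def record_listen(self, user_id, track_id, now):
        # Enqueue the listen; only process once the flush interval has elapsed.
        self.queue.append((user_id, track_id))
        if now - self.last_flush >= self.flush_interval_s:
            self.flush(now)

    def flush(self, now):
        # Group queued listens per user, then update the store once per user.
        batch = defaultdict(list)
        while self.queue:
            user_id, track_id = self.queue.popleft()
            batch[user_id].append(track_id)
        for user_id, tracks in batch.items():
            self.online_store.setdefault(user_id, []).extend(tracks)
        self.last_flush = now
        return len(batch)  # number of users updated in this batch


updater = NearRealTimeUpdater(flush_interval_s=60.0)
updater.record_listen("demetrios", "jazz_1", now=10.0)  # queued, store not yet updated
updater.record_listen("demetrios", "jazz_2", now=70.0)  # interval elapsed, batch applied
```

Reads from `online_store` stay fast regardless of the flush interval; the interval only controls how stale the stored understanding of a user can get.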

Demetrios [00:31:58]: So if I understand it correctly, too, it's that every time you want to update. So the closer you get to real time or near real time, that means that you have to create new embeddings each time.

Sanket Gupta [00:32:15]: Yeah. And let me just break down this nuance a little bit. Right. Okay. So our systems are fast. Like, the serving is all real time. Serving cannot be, like, you can't wait around two minutes to get back the ten listens, like what I said.

Sanket Gupta [00:32:32]: So it all has to be in 30 milliseconds. Right. So, I just want to be clear there. Right. So that's kind of the final layer. Right. But the question is: that online store, that feature store that is online, that is serving those recommendations, how fast is that being updated?

Demetrios [00:32:48]: Yeah.

Sanket Gupta [00:32:48]: So, like, we understand Demetrios, but that understanding may be from three days back. So whatever you were doing on Spotify three days back is what our understanding is at this moment. That's what I'm saying. That is still being served very fast. We are still serving that at 30 milliseconds, but it's still serving things that we knew about Demetrios three days back. So that is batch features in an online store. That's what I mean by that. It's still a batch system because your understanding is behind, but you are still serving online.

Sanket Gupta [00:33:19]: The question is how to move your understanding of features as close to real time as possible. So you want to move it to a level where it's features lagging by a few minutes, served in 50 milliseconds. So near-real-time inference served with real-time systems, if that makes sense.

Demetrios [00:33:39]: Basically, it's very clear now, 100%. You have an understanding of me that's three days old. I really like that. It puts it into perspective, because now I see that, okay, whatever I do today is going to be reflected in my Spotify recommendations in three days.

Sanket Gupta [00:34:02]: Yes.

Demetrios [00:34:03]: And so one other thing that I was thinking about is that the vector database that you have, is it just one gigantic database that everything goes into? Because you were saying that you want to be able to have any ML team, any recommendation team play off of it? Or is it a bunch of different ones depending on the location or whatever it is. How's the system look?

Sanket Gupta [00:34:29]: We leverage a lot of Bigtable, which is essentially a Google Cloud product. Amazing product. We have multiple different vector databases for different purposes. But for, say, tracks, it would be one vector database, because we want to keep all of those 100 million items, or whatever that may be, whatever we want to surface, in one vector database, and then we do nearest neighbor lookups on that, essentially.
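
As a rough illustration of the nearest neighbor lookup mentioned above, the sketch below scores every track embedding against a query embedding with cosine similarity and returns the top k. It is a brute-force toy under stated assumptions: the function names, track IDs, and two-dimensional vectors are made up, and a catalog of around 100 million items would need an approximate nearest neighbor index rather than a full scan.

```python
import math


def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_tracks(query, track_embeddings, k=2):
    # Brute-force scan: rank all tracks by similarity to the query, keep the top k.
    ranked = sorted(
        track_embeddings.items(),
        key=lambda item: cosine(query, item[1]),
        reverse=True,
    )
    return [track_id for track_id, _ in ranked[:k]]


# Hypothetical toy catalog: two jazz-like tracks and one metal-like track.
tracks = {
    "jazz_a": [1.0, 0.0],
    "jazz_b": [0.9, 0.1],
    "metal_a": [0.0, 1.0],
}
print(nearest_tracks([1.0, 0.05], tracks, k=2))  # ['jazz_a', 'jazz_b']
```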

Demetrios [00:34:56]: Okay. And then you know which ones to go for depending on what the use case is.

Sanket Gupta [00:35:03]: Exactly. Yep. And we have to scale it quite a way, as you can imagine. Like, all of the autoscaling, and it's a very, very complex system, just to get that to function so that you see no difference. Basically, as a user, you perceive nothing, is what our goal is. And that takes a tremendous amount of effort. So I do want to give a shout-out to all my team members in that regard.

Demetrios [00:35:26]: Yes. Give them a huge shout out. Yes. Correct me where I'm wrong. I'm gonna try and break down the cycle here, huh? And you tell me where I'm not understanding it. I go on to Spotify. I play my Bill Evans and Chet Baker jazz playlist that I absolutely love. This is the first time that I'm playing it on Spotify.

Demetrios [00:35:50]: I haven't played it on Spotify because I didn't realize it was on there. And I've always been playing it on YouTube. Now I play it and it's like, oh, new information about Demetrios. That information gets taken and it gets put into an embedding model to update my embedding. And so you're constantly gathering information about me. And if there's new things or whatever the information is, you're just adding a little bit to my embedding, which is your understanding of me as a user. And once that embedding is there, then it goes and it's able to feed that information and that description to the online feature store and be able to serve everything very quickly, so that a few days after I listen to my Bill Evans playlist, I now am starting to see more cool, great live Bill Evans tracks that I had never heard. And it's coming up maybe on my, sometimes I'll let Spotify just play and it will come up with random songs, and so it'll be spiced in there.

Demetrios [00:37:03]: And then I give it a like. And now there's, like, more signals that, oh, actually he's really getting into jazz, and it gets reincorporated. Where I think I'm losing it, or I'm not so clear, is how does the embedding and the vector store go to the feature store? Like, that pipeline or that route, what does it take and how does that look?

Sanket Gupta [00:37:31]: So you said, like, you listen to that playlist that you had heard on YouTube, and then you come on Spotify and listen. So our goal is to not take those five days or three or whatever, the N days, to respond. We want to respond ASAP. I think we are clear on that. Right. So that's kind of the goal of this new model: we update ASAP, or rather, I should not say ASAP. To be technically correct, a few minutes. So we want to respond in a few minutes.

Sanket Gupta [00:37:56]: Now, your question was, how do the vector database and feature stores kind of connect with each other? So it's actually very similar. It's kind of one and the same when you're talking about the online setting. Right. Because a lot of the time teams don't care about embeddings. So in a RAG system, a person will not care about what the embedding is. They want to know the documents. What are the nearest documents? Same thing applies here. Teams, they don't personally care what the embedding value is. So they call the vector database, which you can say is an online feature store, because they're just calling that and then getting back those documents, or tracks, basically.

Sanket Gupta [00:38:44]: So they don't really care as much about the raw embeddings when we're talking about these candidate generation cases. Now there is a second use case, which is more aligned to what you are saying. Say a team is doing ranking, and that ranking is where you might feed in demographics and whether you are a podcast listener. There, they fetch the raw embedding from our vector database. That's why I'm, like, mixing the words. So in candidate generation, it's a vector database; in a ranking setting, it's a feature store. They're one and the same thing, if that makes sense.

Demetrios [00:39:22]: Interesting.

Sanket Gupta [00:39:24]: Yeah. Okay. It's a little nuanced. But because the ranking team is building their own models there, they care about the real values of the embedding. They want the real value of the embedding. Demetrios's is like, whatever, a 128 or 256 dimensional embedding. And that goes into their model, which would then infer, okay, right, like, when you search for Q, is it Queen or is it some other artist? Right?

Demetrios [00:39:51]: Yeah.

Sanket Gupta [00:39:51]: So in that search model, your raw embedding matters. But when we were talking about the Bad Habits and Shivers example, there the raw embedding does not matter. You just want those 200 candidates or 200 tracks back. That's more of a vector database analogy. And in the ranking, it's more of an online feature store analogy. I don't know if that helps.
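
The two access patterns described above can be sketched as one store with two methods: a candidate generation call that returns only track IDs, and a ranking-style call that returns the raw user vector for a downstream model. This is a toy under stated assumptions, not Spotify's API; `EmbeddingStore`, `candidates`, `raw_embedding`, and the sample vectors are hypothetical names and data, and a plain dot product stands in for whatever similarity the real system uses.

```python
class EmbeddingStore:
    """Hypothetical store serving both candidate generation and ranking."""

    def __init__(self, user_vectors, track_vectors):
        self.user_vectors = user_vectors    # user_id -> embedding
        self.track_vectors = track_vectors  # track_id -> embedding

    def candidates(self, user_id, k):
        # "Vector database" view: the caller gets track IDs, never the embedding.
        user = self.user_vectors[user_id]

        def score(track_id):
            track = self.track_vectors[track_id]
            return sum(u * t for u, t in zip(user, track))

        return sorted(self.track_vectors, key=score, reverse=True)[:k]

    def raw_embedding(self, user_id):
        # "Feature store" view: the caller gets the raw vector as a model input.
        return list(self.user_vectors[user_id])


store = EmbeddingStore(
    user_vectors={"u1": [1.0, 0.0]},
    track_vectors={"t1": [0.9, 0.1], "t2": [0.1, 0.9], "t3": [0.5, 0.5]},
)
track_ids = store.candidates("u1", k=2)  # what a candidate generation caller sees
features = store.raw_embedding("u1")     # what a ranking model would consume
```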

Demetrios [00:40:16]: Yeah, I like it. It separates it very much, because I was under the impression that it would be like, okay, cool, the vector database now goes to the feature store, or the embedding now goes to the feature store. But it's more like, no, the feature store is kind of doing what the vector database does. And so there's a little bit of overlap there, and you still need the embedding, but it's not like you need the embedding for every use case. And so what I think about then is feature stores are very much known for being this power-up when you need, like, real time, right, when you're doing that very, very fast inference. With your use cases with these embeddings, trying to get the embeddings out and the candidate generation, how do you make sure that that is fast on inference? Because you're still having to go up against, whatever, 50 milliseconds inference time, right?

Sanket Gupta [00:41:20]: Yep. So we keep everything in these fast online stores, which are super fast, 50 milliseconds, and we update them in the backend. That can be one day, two days, three days, or that can be a few minutes. But we still need some notion of Demetrios in that online store so that the search model can call that in 50 milliseconds, essentially. Are they getting the best version of Demetrios? Maybe not. They are still getting one that's a couple of days behind. But in the majority of cases, that's fine. Even two days behind is not so bad.

Sanket Gupta [00:41:54]: You know, like, it's fine, for maybe 90% of the systems. They don't really care, right? They're like, oh, yeah, Q would mean Queen for Demetrios, and not some other artist starting with Q. That thing, to be honest, is going to be very similar for you, right? But for another team, say, one that is building out the home shelves, they really care about being up to date, because if you started listening to jazz, you'd better see jazz there. You want to see jazz right up on the home screen. That team cares about being updated fast. So it really is, again, very nuanced: which team, what is the use case, and how fast they want these embeddings to be updated.

Demetrios [00:42:33]: And so the next step on this journey feels very much like, how are you informed about how your product is going to be used within the Spotify app? Because it feels like the whole way you go about creating the product would change depending on where you're going to sit within the app.

Sanket Gupta [00:42:55]: Exactly. There are so many different use cases and complexities. I think that's where we were talking about evaluations. When you want to make a generalized, foundational embedding, you have to cover so many cases. How do you even evaluate that? We can't predict how it is going to be used. Right. So that's why, as I said, evaluating these systems sometimes is harder than even building them, right? I said you can just fine-tune, but not everyone can fine-tune.

Sanket Gupta [00:43:25]: Not all teams have the capacity or ability or desire to fine-tune. They're like, I'm just going to use your product. If it's good, we'll use it. Otherwise, not. Now you really have to step up and, like, build embeddings that can do something for everyone. And doing something for everyone in ML, you know how difficult that is, essentially.

Demetrios [00:43:48]: Exactly. Like, it wasn't hard enough already.

Sanket Gupta [00:43:50]: Yes.

Demetrios [00:43:51]: You've got three teams that you're servicing, and they all want you to go in different directions.

Sanket Gupta [00:43:57]: Yes.

Demetrios [00:43:59]: What are some of these different pulls? So I guess the way that it works within Spotify, on the organizational level, is that you're creating this product, and then the UI or UX teams come to you. Or do you have UI/UX people embedded in your team to help you and guide you with how it would be used? And I would imagine also the product managers are helping along with that?

Sanket Gupta [00:44:24]: I don't personally work with UI/UX as much, because a lot of these ML systems create candidates, and that can basically be a black box. So a UI/UX person is going to be like, okay, we will get, like, ten tracks. How are they created? We don't care. Right. So that's all of this ML that we talked about in this show, right? So for the UI/UX person, it's, again, a black box.

Sanket Gupta [00:44:50]: They don't technically need to care about it. They only care about, okay, once we know the ten tracks, what is the best layout? Do we allow Demetrios to skip past it? Do we allow Demetrios to, like, swipe up and add to playlist? There's a lot of those UI/UX elements that you actually interact with, but the way we play a role in that is through feature collection. Sometimes, as ML practitioners, we want to know if Demetrios is skipping past some of those things in radio or not. And that is something we want to get logged, and then we use those logs in our models. That's where I see the loop. It's going more from UI/UX to ML rather than the other way, if that makes sense. So it's all features. It's all kind of interconnected.

Demetrios [00:45:37]: Wow. So I'm trying to figure out where to go next, because this is so cool to think about how you're basically just saying, hey, here's what you get. If it's valid, take it, go have fun, plug it into the app, and if not, maybe request something else, or it's probably not for you. Go ask another team.

Sanket Gupta [00:46:01]: Something like that. I mean, it's more nuanced, but, yeah, something like that.

Demetrios [00:46:05]: Yeah. Okay. And it does feel like there are constantly new features that are coming up that you could potentially want to track. Like, I know that there is the playlist where you can kind of tune into a playlist by swiping through it and just hearing 30 seconds of a song. How do you stay up to date with. All right, there's this new feature that's going to be rolled out across the app. Now, you can incorporate this into your fine tuning methods.

Sanket Gupta [00:46:39]: Yeah, fantastic question. We have some amazing data engineers and data scientists. So we typically start this process at the data science level. Right. So instead of incorporating a new fancy feature right away, because you know how difficult productionization is, right, we first,

Sanket Gupta [00:46:56]: I mean, at least my personal philosophy is to first use those features in evaluations. What you could do is, say a fancy new audiobook thing is coming out. You could just use that feature to build a cohort. Right? So you would say, okay, we have been evaluating embeddings on podcast listeners, music listeners. How about audiobooks? And we use those audiobook features in that evaluation. And if we gain confidence in these kind of notebook-y environments, with analytics and A/B testing kinds of situations, then you know, okay, this feature is not bad. Like, this feature is pretty good, and it's helping us.

Sanket Gupta [00:47:33]: If it's helping us move our metrics in evaluation land, maybe it will help our model, and we bring it back. Right. So we will not typically jump onto anything new and fancy unless it's really, really cool. So that's kind of how I see it play out.

Demetrios [00:47:53]: And are the main metrics you're looking at right now, what you were talking about with speed and relevance?

Sanket Gupta [00:47:59]: Yeah, relevance is definitely one of them. So it would be like, essentially like, things like relevance, accuracy, like all those things. But also, we have these ranking metrics. So imagine you have home where there are multiple shelves. This is a ranking problem. Which one are you clicking or which one are you engaging with? That's where things like MRR and NDCG come into play. But, yeah, you could say that that's some form of relevance. That's more of a ranking relevance.

Sanket Gupta [00:48:37]: Yeah, yeah.
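
For readers unfamiliar with the ranking metrics named above, the sketch below computes MRR (mean reciprocal rank) and NDCG@k using their standard textbook definitions. It is only an illustration: the function names and toy sessions are assumptions, and the transcript does not say how Spotify configures these metrics (for example, which k or which gain values).

```python
import math


def reciprocal_rank(ranked, relevant):
    # 1 / position of the first relevant item, or 0 if none appears.
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(sessions):
    # sessions: list of (ranked_items, set_of_relevant_items) pairs.
    return sum(reciprocal_rank(r, rel) for r, rel in sessions) / len(sessions)


def ndcg_at_k(ranked, gains, k):
    # gains: item -> graded relevance; items missing from the dict count as 0.
    dcg = sum(
        gains.get(item, 0.0) / math.log2(position + 1)
        for position, item in enumerate(ranked[:k], start=1)
    )
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(position + 1) for position, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg else 0.0


# Toy example: two home-screen sessions, each with one engaged shelf.
sessions = [(["a", "b", "c"], {"b"}), (["x", "y"], {"x"})]
print(mean_reciprocal_rank(sessions))               # 0.75
print(ndcg_at_k(["a", "b"], {"a": 1.0, "b": 0.0}, k=2))  # 1.0
```

MRR only looks at the first engaged item per session, while NDCG rewards placing every relevant item high, with a logarithmic discount by position.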

Demetrios [00:48:40]: And that's also where that, like, that twin or that prediction simulation comes in, where you get to say, let's see if we do this or if we show these or whatever. Let's pretend like we're gonna simulate out what Demetrios is gonna do for the next three days, five days, and then I throw you a curveball. And I listen to Gregorian chants and you go, whoa, I was not expecting that from this guy.

Sanket Gupta [00:49:05]: Yes, exactly. Exactly. But some of our discovery teams will be very happy, right? They will be like, okay, Demetrios has discovered something new. Like, can we update Discover Weekly now? Because Discover Weekly, like, follows you, but then they want to give you something you've never heard before, right? So they love it. Okay, now you have heard this new genre. Like, let's pick up on that and start showing you. But some other team is going to be unhappy.

Sanket Gupta [00:49:31]: So, you know, it's just kind of how all these different systems interplay and work together to satisfy the user, basically.

Demetrios [00:49:39]: Yeah, man, it's been awesome talking to you. I really appreciate how transparent you are in all of this stuff that you're working on. It sounds like a really hairy question when it comes to evaluating this. Keeping it fast, but also making sure that what you're getting is up to date and it is interesting for the user is very fun.

Sanket Gupta [00:50:03]: Yep, absolutely fun stuff.

Demetrios [00:50:06]: Right on, man. Well, this has been a pleasure. Thanks for coming on here.

Sanket Gupta [00:50:09]: Thank you so much for inviting me. Pleasure.
