MLOps Community

Ads Ranking Evolution at Pinterest

Posted Feb 13, 2024 | Views 218
# Ads Ranking
# Machine Learning
# Pinterest
SPEAKERS
Aayush Mudgal
Senior Machine Learning Engineer @ Pinterest

Aayush Mudgal is a Senior Machine Learning Engineer at Pinterest, currently leading the efforts around Privacy-Aware Conversion Modeling. He has a successful track record of starting and executing 0 to 1 projects, including conversion optimization, video ads ranking, landing page optimization, and evolving the ads ranking from GBDT to DNN stack. His expertise is in large-scale recommendation systems, personalization, and ads marketplaces. Before entering the industry, Aayush conducted research on intelligent tutoring systems, developing data-driven feedback to aid students in learning computer programming. He holds a Master's in Computer Science from Columbia University and a Bachelor of Technology in Computer Science from the Indian Institute of Technology Kanpur.

Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.

SUMMARY

Lessons from the journey of scaling ads ranking at Pinterest through new machine learning algorithms and ML platform innovation. Learn how the team transitioned from traditional logistic regression to deep learning-based transformer models, incorporating sequential signals, multi-task learning, and transfer learning. Aayush shares the transformation of ads ranking at Pinterest, the hurdles the team overcame, and why ML platform evolution is crucial for algorithmic advancement.

TRANSCRIPT

Demetrios [00:00:00]: Hold up. Before we get into this next episode, I want to tell you about our virtual conference that's coming up on February 15 and February 22. We did it two Thursdays in a row this year because we wanted to make sure that the maximum amount of people could come for each day, since the lineup is just looking absolutely incredible. Let me name a few of the guests that we've got coming, because it is worth talking about. We've got Jason Liu. We've got Shreya Shankar. We've got Druv, who is product applied AI at Uber. We've got Cameron Wolfe, who's got an incredible podcast, and he's director of AI at Rebuy Engine. We've got Lauren Lochridge, who is working at Google, also doing some product stuff. Oh, why are there so many product people here? Funny you should ask that, because we've got a whole AI product owners track along with an engineering track. And then, as we like to, we've got some hands-on workshops, too. Let me just tell you some of these other names, just so you know, because we've got them coming, and it is really cool. I haven't named any of the keynotes yet either, by the way. Go and check them out on your own if you want. Just go to home.mlops.community and you'll see. But we've got Tunji, who's the lead researcher on the DeepSpeed project at Microsoft. We've got Holden, who is an open-source engineer at Netflix. We've got Kai, who's leading the AI platform at Uber. You may have heard of it. It's called Michelangelo. Oh, my gosh. We've got Faizaan, who's a product manager at LinkedIn. Jerry Liu, who created good old LlamaIndex. He's coming. We've got Matt Sharp, friend of the pod, and Shreya Rajpal, the creator and CEO of Guardrails. Oh, my gosh. The list goes on. There are 70-plus people that will be with us at this conference. So I hope to see you there. And now let's get into this podcast.

Aayush Mudgal [00:02:12]: Hi, I'm Aayush. I'm a senior machine learning engineer at Pinterest. And I like to have my coffee, but I think, as an Indian, I've not been used to black coffee and stuff. I think Indians are used to some kind of, I would say, fake coffee. It's like ready-made coffee. You add milk to it and that's it. And I think that's the coffee I still like.

Demetrios [00:02:30]: Welcome back to the one and only MLOps Community podcast. I am your host, Demetrios. And today we're talking with Aayush. This is what I would call a candid conversation. I appreciate everything about how honest and upfront Aayush was with his whole experience from 2018 till now, working on the ads team at Pinterest and bringing ML into their capabilities. He walked us through everything from the nascent ML project to the advancements that they've had. And I think one huge takeaway from my side was along the lines of how evolution happens when you're at a big company like this, because in 2018 they had to make decisions on what tech stack they were going to be using. And he talked about how they ended up having to get off of the pipelining tool that they were using, because it was something that came out of Twitter and Twitter ended up not supporting it anymore.

Demetrios [00:03:41]: There was no company behind it. And they realized that this might not be the safest bet. So they got off of it and they had to evolve. And they did this many times throughout their journey, as he goes on to explain. And so I like looking at this evolution and how he breaks down the evolution in terms of what the ROI is. Every time they upgrade or they make the choice to, quote unquote, lift and shift, the thing that kept coming back to me was, is the juice worth the squeeze? That's one of the sayings that my dad used to always tell me, you're doing this, but I don't know if the juice is worth the squeeze on that. And he thinks about it in that way. He really thinks, okay, what's the ROI? If we're going to rip out some technology and make sure that we have a better technology, is it going to be worth it? And are we going to be regretful two, three months down the line or a year down the line because a potentially better technology has come out.

Demetrios [00:04:52]: So he talks through the decision-making on that and what they did specifically for their ads optimization platform at Pinterest. Hope you enjoy this conversation. As always, if you do share it with one friend, it will mean the world to me, and I will see you on the other side. What are you doing in India? You've been traveling the world. What is going on?

Aayush Mudgal [00:05:22]: Yeah, I think Pinterest is very flexible as far as, like, I think they allow us to work like 90 days outside the country. And I thought it's a better time to enjoy that. I've been out since, I think, mid-November or so. I went to Europe for, like, two weeks, then been in India. I think this is a time when it's very auspicious to get married in India. And there are a couple of my friends who are getting married. So I thought, let's maybe just pay back.

Demetrios [00:05:51]: Oh, I thought you were going to tell me you were having a wedding.

Aayush Mudgal [00:05:55]: My wedding happened, like, two years ago.

Demetrios [00:05:58]: Well, congratulations. Belated congratulations.

Aayush Mudgal [00:06:01]: Yeah. Thanks so much. I think Pinterest allows that, and I think there are not many companies right now. Many companies are calling people back, but I think Pinterest has been in a good spot there, still allowing people to work remotely.

Demetrios [00:06:16]: So when it comes to the Indian weddings, be honest with me. Have you been to one with an elephant yet?

Aayush Mudgal [00:06:22]: Yes, just recently. Last week. Yeah.

Demetrios [00:06:28]: So wild.

Aayush Mudgal [00:06:29]: Yeah, it's fun. It's different than what used to be in America, but, yeah, it's fun to be here.

Demetrios [00:06:36]: Yeah. Whenever my Indian friends are getting married in the US, I ask if they're going to have an elephant, and they're like, dude, you know how expensive that would be? That's just not in the cards right now. Either I have the wrong friends or they need to go to...

Aayush Mudgal [00:06:53]: A good. It's a good time for India right now. It's, like, chiller than normal, though. It's pretty hot if you go in the summer.

Demetrios [00:07:00]: Yeah. So let's give people a little bit of background on what you're up to, what you've been doing, because you've been working at Pinterest, as you mentioned, you've been there for a while and you've been loving it. You haven't changed the team. So maybe we should start with that. What's the deal with this team you've fallen in love with?

Aayush Mudgal [00:07:18]: Yeah, I think that's a good question. I joined Pinterest in March of 2018, so I'm nearing six years. I've been on the same team. Initially, I never wanted to be on that team. I would say that's the funny part of it. On ads ranking, our job is to rank ads and show the best set of ads, connecting advertisers to users. But I was like, this was the last thing that I probably wanted to do in 2018.

Aayush Mudgal [00:07:48]: But then luckily, I ended up on this team, and I haven't changed teams ever. It has so much to offer. There are so many things that you can do, both from a technical and a product standpoint, and it has satisfied, I would say, what I was looking for in a job, and that's exciting.

Demetrios [00:08:08]: Well, it seems like you've gotten to create a lot of different projects over your eight years, or since 2018. I guess my math is not so strong. Six years. I knew I was wrong, but I didn't want to tell myself how wrong I was. I wouldn't admit it to myself, but I guess it was pretty wrong. So the thing about it is that you've created a whole slew of projects when it comes to working with this team. And if it's all right with you, I would love to just break down these different projects: how you created them, how you went about designing them, and what you would do differently now that we are in 2024 and obviously technology has advanced quite a bit.

Demetrios [00:08:53]: So ML in 2018 was a whole different beast than ML in 2020, and now we even call it AI. So it is in 2024 a whole different beast.

Aayush Mudgal [00:09:04]: Yeah. I think, looking back, it has been a long journey, I would say, in machine learning. I think in 2018, we were doing product building. Pinterest was growing at a faster pace, and advertising was something that Pinterest had just started. We were pretty much just building products, not caring that much about how to make the products better in terms of machine learning, but just getting from zero to one. And one of the major projects that I worked on was conversion optimization. And one thing about the ads business is you have industry leaders at Google and Facebook where, you know, these are the things that work for them. And it comes to the point of, how do you make it work for Pinterest?

Aayush Mudgal [00:09:53]: You have some direction, this is what you want to build. But how you build is totally different at different companies, depending on your users and your advertisers. I think we knew that the conversion business is big and it's going to work, because it works for our competitors. And that's what the advertisers also care about. So what it is mostly about is, when you start building products, the simplistic product we would build is: can you deliver more of these ads to users? Then the next product would be, can you drive more clicks on these products, for advertisers who care more about clicks, not just people viewing it? And then the next step would be, it's not only about clicks, but can you get users to convert more on those products, like buy something or add something to cart on their website, and all of those complex things which advertisers care about. So conversion product refers to this latter product. So that's where, when I joined, the other two products were already built up in general: the click product was there, and we had also the impression product, but the conversion product was not there. One thing about 2018, I would say, is it was not much machine learning at that point.

Aayush Mudgal [00:11:03]: It's about how do you connect the pipelines together? How do you get through that? And I think one of the pipelines...

Demetrios [00:11:09]: Look like, what was it?

Aayush Mudgal [00:11:10]: Yeah, man, that's totally different from what it is today. But in terms of basic ETLs at that time, we were using Cascading and Scalding, in Scala; that came from Twitter. I think until maybe the last one and a half years, we were still relying on that kind of legacy stuff. And I think over time you start removing those legacy technologies, because the technology was not getting updated, but also because finding engineers who are familiar with that kind of toolkit gets harder, since not many people are using it. So one thing, at least over time: Pinterest started moving those Cascading and Scalding jobs, which were in Scala, to more like Spark.

Aayush Mudgal [00:12:01]: Spark-based jobs. And I think that was something which back in 2018 was not that popular as such; what we had in 2018 was probably what we started with.
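
To make the shape of that migration concrete, here is a minimal, hypothetical sketch of the kind of Scalding-era ETL rollup rewritten as a PySpark job. The paths, table, and column names are illustrative assumptions, not Pinterest's actual schema.

```python
# Hypothetical example: a Scalding-style aggregation rewritten as a PySpark job.
# All paths and column names are made up for illustration.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("ads_event_rollup").getOrCreate()

# Read raw ad-event logs (impressions, clicks) from the warehouse.
events = spark.read.parquet("s3://warehouse/ad_events/date=2024-01-01")

# Per-ad rollup, the kind of aggregation the old Scalding pipelines produced.
rollup = (
    events
    .groupBy("ad_id")
    .agg(
        F.count(F.when(F.col("event_type") == "impression", 1)).alias("impressions"),
        F.count(F.when(F.col("event_type") == "click", 1)).alias("clicks"),
    )
    .withColumn("ctr", F.col("clicks") / F.greatest(F.col("impressions"), F.lit(1)))
)

rollup.write.mode("overwrite").parquet("s3://warehouse/ad_event_rollup/date=2024-01-01")
```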

Demetrios [00:12:12]: Well, this is fascinating that you mention that, just to drop in a line real fast: it wasn't necessarily because of the pain that the engineers were feeling on the pipelines when they were using Scala jobs in, whatever, like 2022 or 2023, when you said you replaced it, you ripped it out, or you slowly migrated over to Spark jobs. It was more because of the ability to find engineers that understood how this worked and were proficient in it. And then also you had a little bit of fear, I can imagine, if Twitter wasn't keeping tabs on this project anymore and there wasn't really any support behind it. That's scary.

Aayush Mudgal [00:12:57]: Yeah, I think that's a good point. And I think both of those reasons, and also, Spark provided a lot more flexibility and scalability, and Scalding was not providing that over time. So I think things that worked in 2018 might not work in 2024, depending on the progress that happens in the open source community specifically. And I think that fuels it. And then migrations are hard. So you need to make a choice at some point, like whether it's worth it to migrate or not.

Demetrios [00:13:28]: Yeah, the ROI of the migration too. That's a huge piece.

Aayush Mudgal [00:13:32]: Sometimes, like when we moved from TensorFlow 1 to PyTorch, we just changed the complete framework in general. I think it's paying off pretty well today that we made that decision, but it's always hard to make those kinds of decisions as such.

Demetrios [00:13:47]: Yeah. In the moment, you're not quite sure. You're like, I think this is the right move, but we'll find out in five years.

Aayush Mudgal [00:13:54]: Yeah. But I think I would say the key is to keep innovating and keep checking what's working well in the industry or what's working well in the research community. And I think having systems that can migrate faster to new technologies as needed, that's the key to keep iterating faster. At some point in 2018, we used to have a totally, like, hybrid stack of machine learning. Deep learning was picking up, for sure, in recommendation systems, but at that scale, the way we were doing it, we didn't do deep learning. In 2018 we started moving towards deep learning, but we used to have XGBoost-based GBDTs in a totally different language, trying to embed them into a TensorFlow-based logistic regression. Totally different.

Aayush Mudgal [00:14:47]: And then embedding that again into a C++ library that we maintained for serving, which is kind of like brittle systems. But that was what we had, I would say, at least till 2021. Somewhere in 2022, slowly, systems started moving towards more principled approaches of doing machine learning.
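
The transcript doesn't spell out the exact wiring of that hybrid, but a common pattern matching the description is feeding GBDT leaf indices into a logistic regression. Here is a hedged, self-contained sketch of that technique using xgboost and scikit-learn in place of the C++ serving library described above; the data is synthetic.

```python
# Sketch of the classic GBDT-into-logistic-regression hybrid the old stack
# resembled. This is an illustrative reconstruction, not Pinterest's code.
import numpy as np
import xgboost as xgb
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))                   # stand-in ranking features
y = (X[:, 0] + X[:, 1] * X[:, 2] > 0).astype(int)   # stand-in click labels

# Stage 1: train the GBDT.
booster = xgb.XGBClassifier(n_estimators=50, max_depth=4).fit(X, y)

# Stage 2: encode each example by the leaf it lands in per tree, then feed
# the one-hot leaf indices into a logistic regression.
leaves = booster.apply(X)                           # shape: (n_samples, n_trees)
encoder = OneHotEncoder(handle_unknown="ignore").fit(leaves)
lr = LogisticRegression(max_iter=1000).fit(encoder.transform(leaves), y)

print(lr.predict_proba(encoder.transform(booster.apply(X[:5])))[:, 1])
```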

Demetrios [00:15:07]: Yeah, well, sorry, I totally took you on a little bit of a sidebar on these pipelines, but get back to this conversion optimization. You were going for step three, right? To actually get people to buy things when they clicked through an ad. So you did step one, which was just showing people ads. You had step two also in there by the time you showed up in 2018, which was getting the right ads in front of people, and then it was your job to figure out how to get people to actually buy things once they clicked on the ads or get the ads to people who were in a buying mood.

Aayush Mudgal [00:15:43]: Yeah, I would say the latter: how to identify people who are in a buying state, and also which buying state. Like, what particular ad would they buy? In general, we have tens of thousands of ads. You can show any, but which one would be the best candidate? Finding that best candidate is important because it helps Pinterest to not show irrelevant ads, and it helps advertisers to get value by showing the right set of ads to the right set of people. I think all three need to be balanced in some sense.

Demetrios [00:16:18]: So how'd you go about doing this? What was your idea when you came in? You must have done something.

Aayush Mudgal [00:16:25]: So I think if you look back, all of these problems, like driving clicks, like driving conversions, are very correlated to each other in general. If you just look from a machine learning perspective, it's just about: okay, this is your training data set; if you show x to the user, do they buy or not? It's like a very simple binary classification problem. In terms of a machine learning model, it's not very complicated. But then if you look at it, you need to change your pipelines to get this data. Before, when we were getting click data sets, everything was happening on-site at Pinterest, and all of those kinds of logs are owned and governed by Pinterest architecture. But for conversions, this is not direct. You need to have an integration with your third-party providers.

Aayush Mudgal [00:17:13]: How do you get that data, how do you transform that data, identify it and then pass it for training? You had more complex setups in your pipeline, very similar to the ETLs that you have, but you'd have an additional pipeline, and then you make sure that this pipeline can connect to your model training. But model training is just one piece of it. How do you use those predictions of conversions when you're deciding on ads? Then there are the other components that you need to build; then train your sales team to sell this product; then there would be a lot of bugs, so you need a lot of visibility over it, like what's going wrong, so that you can figure it out. And then it needs to be performant, because you're now comparing this product to a product which is driving more clicks, and you need to be better than that for sure at this point. It needs to come with some level of personalization. It can't just come with some random previous priors or something, because that product would not work. So there are some baselines that you need to at least beat in general.
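
As a minimal sketch of the "simple binary classification" framing above, assuming precomputed dense features for a (user, ad) pair; the feature count and layer sizes are hypothetical.

```python
# Minimal sketch: conversion prediction as binary classification in PyTorch.
# Feature dimensions and architecture are illustrative assumptions.
import torch
import torch.nn as nn

class ConversionModel(nn.Module):
    def __init__(self, num_features: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_features, 128), nn.ReLU(),
            nn.Linear(128, 1),  # logit for P(conversion | user, ad)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

model = ConversionModel()
loss_fn = nn.BCEWithLogitsLoss()  # binary label: did the user convert?

features = torch.randn(32, 256)   # one row per (user, ad) impression
labels = torch.randint(0, 2, (32,)).float()
loss = loss_fn(model(features), labels)
loss.backward()
```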

Demetrios [00:18:15]: Well, it's in the name even: performance marketing, right? It has to be performant. And the other piece that I'm thinking about is you had to do this in real time, right? There wasn't... yeah, break that down.

Aayush Mudgal [00:18:31]: So even in 2018, when I joined, one thing that keeps ads different than non-ads is it needs to be real time. But also, some of those signals need to be computed, like, very real time, because ads govern how much you are spending, right? And then based on your predictions, you decide how much you want to charge your advertisers, and you don't want to overshoot their budgets as such, since their budgets are fixed. Like, they would say, okay, only $100. They won't give you more than $100. You need to make sure this calculation is real time, not overspending or underspending, for sure. If they have $100, you want to optimize for the $100 that you want to spend.

Aayush Mudgal [00:19:12]: And then, given the scale, Pinterest was still pretty big. All of these predictions, all of these models were making predictions in real time. And you have a low latency budget, I think something around, like, 300 to 400 milliseconds in that range; you need to maintain that latency. So the modeling architectures were not that complex at that time. Coming to 2024, I think we started to hit the latency budgets. Now, in general, models are becoming complex, and also we have this two-tiered kind of structure. You don't score all the ads. It goes through targeting, retrieval, and then ranking.

Aayush Mudgal [00:19:51]: So that controls the latency, too.
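
A hedged sketch of that funnel shape: a cheap retrieval pass prunes the targeted candidates so the expensive ranking model scores only the survivors within the latency budget. The stage sizes and scoring functions are illustrative assumptions, not Pinterest's actual numbers.

```python
# Illustrative two-stage scoring funnel: retrieval then ranking.
from typing import Callable

def rank_ads(
    candidates: list[dict],
    cheap_score: Callable[[dict], float],      # e.g. embedding dot product
    expensive_score: Callable[[dict], float],  # full ranking model
    retrieval_k: int = 400,
    final_k: int = 20,
) -> list[dict]:
    # Stage 1: retrieval — lightweight scoring over the full targeted set.
    retrieved = sorted(candidates, key=cheap_score, reverse=True)[:retrieval_k]
    # Stage 2: ranking — the heavy model runs only on the survivors.
    return sorted(retrieved, key=expensive_score, reverse=True)[:final_k]
```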

Demetrios [00:19:54]: Today it's 2024, right? You set this up in 2018. What has changed? What are you looking at now? You mentioned how you've migrated the pipelines, but what other things have evolved since?

Aayush Mudgal [00:20:11]: Yeah, I think one thing is that in 2018, machine learning was not that mature. There was a lot of plugging together happening in the system, many different heterogeneous systems coming together. In 2024, that homogeneity is coming in. Like, the entire pipeline is being rewritten in the same framework so that you can control more things. And one thing in 2018 that is much different from what we are doing today is data preparation. There were too many steps in preparing your data before a model could be trained on it. We want to reduce these intermediate steps and make sure that it's like one big data set and one model training, not like 5 or 15 different intermediary tables, where monitoring them becomes harder.

Aayush Mudgal [00:21:01]: So that's a big shift. The other shift is, I would say, in 2018, since we were building up systems, like traditional systems, monitoring and visibility were missing in most of those systems. It was working, and most of the time it would work pretty well. But developing that kind of monitoring to make sure that everything is right, from your data preparation to your model health and to your end-to-end pipeline health, those are things that are mostly treated as secondary as you're building new products. Thirdly, I would say features: features earlier used to be very nested Thrift objects. Everyone would have the flexibility to write something, and all teams would do it in silos. They would write their own logic to do whatever they wanted to do, and then there was translating.

Aayush Mudgal [00:21:51]: We had a lot of different components, so you needed to translate from one language to another and all of those kinds of things. So that is becoming standardized. And I think Pinterest has a blog post if people want to read about it, like MLEnv. But standardizing those processes, making feature engineering more standardized and shareable across different use cases, I think that is fueling faster iterations and making sure that you can iterate, but also share knowledge across all the use cases at Pinterest.

Demetrios [00:22:24]: And when you talk about monitoring, are you talking about monitoring on basically every level, like from the data flow level to the systems level to the model output level?

Aayush Mudgal [00:22:36]: Yeah. Today we are monitoring mostly everything in general, from the offline pipelines, where our data is being prepared, to our online pipelines, the way we are serving. And for every model, we monitor their predictions, like whether they fall within a specific range over a time frame, in general, like last week, or day-over-day kinds of comparisons. These are a lot of metrics, and you need to figure out how you set up alerts. But monitoring all of those things, monitoring your features, how they are logged and used in your training pipelines versus how they are being served: all of those things are being continuously monitored and also alerted on, to make sure we can catch things earlier on. And we have more automated pipelines. If something is off, things would not get promoted, and they would be stopped from being used. All of that monitoring is in place today to prevent many incidents that we used to have back earlier.

Aayush Mudgal [00:23:39]: So that has reduced in that sense. But also, it means incidents are now more complex than what they used to be. There could still be incidents, but I think they have become more complex.
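
As an illustrative sketch of the range-check idea described above: compare today's mean prediction to a trailing baseline and alert on drift. The window, tolerance, and alerting hook are assumptions, not Pinterest's actual thresholds.

```python
# Sketch of a prediction-range monitor with a day-over-day drift check.
import statistics

def check_prediction_drift(
    daily_mean_predictions: list[float],  # trailing window, e.g. last 7 days
    todays_mean: float,
    tolerance: float = 0.2,               # allow +/-20% movement vs baseline
) -> bool:
    baseline = statistics.mean(daily_mean_predictions)
    drifted = abs(todays_mean - baseline) > tolerance * baseline
    if drifted:
        # In a real pipeline this would page on-call and block model promotion.
        print(f"ALERT: mean prediction {todays_mean:.4f} vs baseline {baseline:.4f}")
    return drifted
```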

Demetrios [00:23:51]: Yeah. So we were talking about this before we hit record, and it's interesting when it comes to Pinterest, because even in 2018, when you joined, you had to be thinking about scale from day one. Right. There was already a whole lot of users. And so how did that factor into the decisions that you were making?

Aayush Mudgal [00:24:11]: Yeah, I think today Pinterest has about 418 million active users. So scale is definitely important. And what it means is you cannot push anything to production without testing it. So definitely that is something that needs to be taken care of. But also, we need to make sure our systems are healthy enough to handle that scale. In general, if you look at, let's say, holiday periods, the scale increases; in other parts of the year, there might be lower loads on your systems. But how do you manage this load? On the server side, we do have auto scaling and all of that, so that we can reduce the usage of our serving systems when the scale is low and increase it when the scale is high.

Aayush Mudgal [00:24:56]: So that handles some of that scale. But then also, you need to make sure about our modeling: whatever models we had earlier on, when the number of users was lower, we could probably have simple models. We cannot afford to have more complex models either, because the power that you might get from them might not be equivalent to the revenue that the model generates. So you need to make sure that your complexity is maintained with your business health; it goes with your business or your goals in general. And thirdly, I would say we need to take care of our pipelines over time. Back in 2018, our ETL pipelines could finish early because the scale was lower. But over time, your scale has increased, yet you want to have the same SLAs for your model trainings, and they're training every hour. So now you need to invest in optimizing your pipelines much more than before.

Aayush Mudgal [00:25:52]: You could do anything, like maybe make mistakes or maybe do slower joists. Your joists are not optimized. But now I think that becomes critical, given the scale, that your pipelines also need to be smarter than what it used to be before.

Demetrios [00:26:06]: So you also went on to do a whole slew of other things, one of which is the video ads ranking. Talk to me about how that was different from the conversion optimization, because it feels like there are some similarities there, but there's potentially a lot that was different.

Aayush Mudgal [00:26:26]: I think with conversion ads, many of the components were there, because you are dealing with the same assets, like images, in terms of your content understanding. But when you move to videos, the way you understand content changes. It's no longer a simple image; it's more a series of images. So you need to invest in that. In terms of the model, it probably remains the same. It's now trying to predict, instead of a conversion, whether the user is going to view this product for, let's say, more than 2 seconds or 5 seconds. So the problem from a model standpoint remains the same. But how do you integrate this into your systems? Like, do you have a good understanding of the video, what this video is about? The same features that you would do on images.

Aayush Mudgal [00:27:11]: One simplistic way to start this product is you can just say that, okay, for this video, you have a representative image that represents this video, and that is what you're going to use. By just doing that, you can treat this video as an image, and then everything can just flow through as what you used to have before. But to make it performant, you then slowly need to keep adding more kinds of features for video understanding, and can you make those features better, which is different from images? Then you need to be aware that this only impacts the video optimization. You cannot reuse these kinds of new features across the stack. So that is something where you need to balance how much is the effort to do that versus what's the potential gain that you would get from it.
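
A minimal sketch of that bootstrapping trick, representing a video by one representative frame so existing image feature pipelines keep working, with a later, more video-aware variant. The embedding function is a stand-in assumption, not Pinterest's actual content model.

```python
# Sketch: treat a video as its representative frame (v0), then add genuinely
# video-aware features later (v1). Embedding logic is a placeholder.
import numpy as np

def image_embedding(frame: np.ndarray) -> np.ndarray:
    # Stand-in for a production image understanding model.
    return frame.mean(axis=(0, 1))

def video_features_v0(frames: list[np.ndarray]) -> np.ndarray:
    # v0: the video "is" its cover frame; every downstream consumer of
    # image features keeps working unchanged.
    return image_embedding(frames[0])

def video_features_v1(frames: list[np.ndarray]) -> np.ndarray:
    # v1: pool embeddings across sampled frames for a video-level signal.
    return np.stack([image_embedding(f) for f in frames[::10]]).mean(axis=0)
```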

Demetrios [00:28:02]: I like how you come back to that idea. You said it before in the last question, and now you're saying it again. And I want to highlight it because I feel like it's very important, that trade-off of the effort: basically, is the juice worth the squeeze? Is it worth it for me to put all this time, energy, and effort into this in order to get how much of a payoff? And I guess my big question is, how can you quantify that payoff before you've actually done the effort and know about it?

Aayush Mudgal [00:28:40]: I think there are two things I would say. One thing is: are the systems ready to do that? Do you have systems in place where you can do that easily, versus is it time that you need to optimize your systems to do that? For example, if I want to add video features and the stack is too complicated. Let's say, back in 2018, the way we used to do it was we had a separate pipeline for creating our training data sets, and that was based on: if you show an ad to the user, we'll log all the features that we had at that particular instance, because everything was real time. And we didn't have a notion of, let's say, feature backfilling; we didn't have a way to add features historically back into your data set. Because of that, if you need to train models on, let's say, six months of data, or three months of data, you need to wait for three months after you add a feature into the system. That delays you by three months. So at that point, I think we started investing in improving our backfill capabilities, so in general we can do feature iterations faster. Because if you thought this feature was going to work and it doesn't work, you wasted three months thinking it would work, and then everything just goes down.

Aayush Mudgal [00:29:51]: So if systems are not ready, I think that's one factor in how much it takes. Because you can think about new things, but if the systems are not ready, you cannot go about doing them, and that reduces the return that you would get; it's three months delayed. Other than that, I think it's mostly intuition at times. If you do things simply, there is lower risk in general. Because there is a ton of research going around, you might think about good, complex ideas: okay, this sounds fancy. But then the payoff, if you haven't done the simple things, is lower in that sense, because it's more complex to build those complex, fancy things out in general, and there are more chances of having bugs in the system.

Aayush Mudgal [00:30:36]: So moving from a simplistic thing to more complex, I think that reduces a lot of the risk. And also, the product can keep growing iteratively over that.

Demetrios [00:30:49]: Okay, so if I'm understanding it correctly, it is figuring out what the easiest and most simple way to do something is, until you bump up against some kind of a bottleneck or some pain where you realize, oh, this is really causing us problems; let's now figure out how we can alleviate that pain, that specific pain, whether it is the feature engineering or waiting for a model to train three months later. And that must have been just painful, trying to wait three months to know, is this feature actually useful? And then you don't want to do anything else special, because you want to have, like, okay, I updated this feature, so let's see if that actually works. Then potentially you're constrained for like three months while it's training. And so you're working with that, but also you're waiting. It almost feels like you know that if we wait and see, technology in a way is going to kind of catch up with us. So the longer that we can abstain from getting things too complex, we can leverage time to be on our side, so that when we do go complex, we have better tools available.

Aayush Mudgal [00:32:06]: Yeah, I think that's a better way to put it. Thanks for doing that. But I think having places where you can evaluate moving to a better infrastructure or better tooling... I think the tooling is the key in general, like building that tooling, because when you move fast, you might not have the tooling, and that bites you back later.

Demetrios [00:32:28]: In general, yeah, because you've made those decisions and they're potentially, I'm not going to say irreversible, but it's a lot harder to rip out the technology than it is to just wait and choose a different one in a month or two.

Aayush Mudgal [00:32:45]: Yeah. And one example I can share: we were at this hybrid system where we were serving in C++, because that system touches users directly. It's in your critical path of serving. Moving that system was pretty hard in general. Many teams had to spend multiple quarters, because you need to move, and you need to move while making sure you're still improving your business metrics. It's not that you stop all the work. It needs to move in parallel.

Aayush Mudgal [00:33:15]: It's like you want to change the wings on your plane, but the plane needs to keep flying; you need to change them in flight. Having the right tooling and having the right prioritization helps at that point. The thing is, it took, I would say, at least two years, somewhere around there, to move from that kind of hybrid system, but it's paying off now, today. So having that future mindset to know, okay, this definitely is a bottleneck, and you can't go further in this direction without changing it so that you can be better in the future, that kind of calculation is important at some point. I know there are still companies who have such hybrid systems. They're like, okay, we want to move, but it's harder to move, and they don't make the switch.

Aayush Mudgal [00:34:04]: But I think it just slows you down in general for future.

Demetrios [00:34:08]: And do you have any advice for those migrations and how to do them gracefully?

Aayush Mudgal [00:34:13]: So I think, first, it's commitment. You for sure need to be committed; things might not look very beautiful in the beginning. So having that commitment from the top down, that we are committed to doing this. The other thing is building the right tooling. It's not about, okay, just changing the frameworks; but do you have the right tooling to make sure that between A and B, the old and new systems, everything is probably similar? Investing in the tooling is important to make sure that you can catch bugs earlier rather than later. Because when you're doing these kinds of system migrations, specifically for these large-scale models, even if one feature is not translated right, systems would just break and you would not know what's going on. But can you go back in your systems and replay what happened in the two systems and see where they diverged? Having that kind of understanding helps you migrate faster.

Aayush Mudgal [00:35:10]: And also, sometimes getting all teams aligned, or at least the major teams aligned, is better, because you don't want to have this migration as a chasing target always. If you keep iterating and improving the older system, you need to make sure that at least there's some consensus, like when a priority shift needs to happen just for the migration, because otherwise the goalposts will keep shifting, and that makes it very hard for everyone to move. But you want that kind of period to be very small in general. You don't want it to be like four months where nothing goes in and you're just migrating; you want to reduce the time for that kind of system migration.
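
A hedged sketch of that replay-and-diff tooling: run the same logged requests through the old and new serving paths and report divergences beyond a tolerance. The interfaces here are hypothetical stand-ins for real serving clients.

```python
# Sketch of migration parity testing via replay: diff two serving systems.
from typing import Callable, Iterable

def diff_systems(
    requests: Iterable[dict],
    old_system: Callable[[dict], float],
    new_system: Callable[[dict], float],
    tolerance: float = 1e-4,
) -> list[dict]:
    divergences = []
    for req in requests:
        old_pred, new_pred = old_system(req), new_system(req)
        if abs(old_pred - new_pred) > tolerance:
            # Log enough context to trace which feature translation broke.
            divergences.append({"request": req, "old": old_pred, "new": new_pred})
    return divergences
```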

Demetrios [00:35:51]: That's right, yeah, because it feels like the longer that you don't finish, the more chances you have for second-guessing yourself or trying to implement something else new and saying, oh, well, actually, this might be better. But if you have a short window of time, then that shorter window of time is not going to allow you to add all these curveballs into the project.

Aayush Mudgal [00:36:15]: Totally, because the migration is always on the infrastructure side. You don't see the benefits while you're doing it, but they come out eventually. So I think it needs a bit of support to make it happen, generally.

Demetrios [00:36:30]: The other thing that's coming through my mind really is going back to that idea that you were talking about in the beginning, when you said the technology that we had chosen was no longer being supported. So it was some open source technology that didn't have anybody backing it, let alone a company. Right. And then it was becoming harder to find people that understood this technology and could code in this language or use these frameworks. And so as you're thinking about migrating and making a new design decision, how are you making sure that that very same thing will not just happen again in four years? Or do you take that as: you know what, in four years we'll probably be looking at this again and we need to do it again, but for now, the best thing that we have is XYZ.

Aayush Mudgal [00:37:23]: Yeah, I would say it's always continually looking forward. You would say, okay, for now, this is the best thing, because specifically in machine learning, and also in other software engineering, things are moving pretty fast. Machine learning is moving pretty fast, even right now. Back in 2018, TensorFlow was the go-to production system. Everyone in the industry who was productionizing systems said, okay, TensorFlow is the best system. But as you were on TensorFlow 1, TensorFlow 2 came in. That was a total surprise, because you can't go from 1 to 2 directly. It's not like you can have that version upgrade seamlessly.

Aayush Mudgal [00:37:59]: And that was the point when Pinterest decided to evaluate, like, TensorFlow 2 versus PyTorch: what makes sense? And what we did around that time was we had a bunch of engineers working on translating a small piece of modeling code from 1 to 2, and those same engineers translating it also into PyTorch, to see what their experience was. And after all of those considerations, developer feedback, and stuff, we decided, okay, PyTorch makes sense. And that was total: we moved away from TensorFlow and moved to PyTorch. And I think it's paying off pretty well today. I think we see that industry, or, like, academia, definitely is more PyTorch-friendly today. So keep evaluating.

Aayush Mudgal [00:38:42]: I would say in that sense, over a longer time period.

Demetrios [00:38:49]: Was there ever the idea of trying to support both of them? Did that conversation come up?

Aayush Mudgal [00:38:58]: So I think the thing is, from a machine learning platform perspective, supporting fewer languages is better for them, because then they can focus on a particular language more. But I think it depends. There were a few use cases who were like, no, everything is in TensorFlow, we want to do TensorFlow. But given that TensorFlow 2 was totally different from TensorFlow 1, you needed to move to 2 anyway. So I think that made it easier. But over time, I would say, depending on the business use cases that exist, they might have a language decision. But luckily, for PyTorch at least, things were favorable, and most teams agreed.

Aayush Mudgal [00:39:40]: And once the major teams in the ecosystem agree, the other teams also need to eventually agree, because that becomes the ecosystem that gets iterated on more. At day zero, they might look the same, but as people start to work on those systems, you have more functionality, more things built up, which you can reuse more easily and cheaply if you stick with the framework that everyone is using.

Demetrios [00:40:05]: Man, TensorFlow 2.0, the beginning of the end. That is so wild to think about, to think back on the history of how that played out.

Aayush Mudgal [00:40:16]: Yeah, I think it depends. I think some companies are still using it. But at least at Pinterest, as of today, it's not.

Demetrios [00:40:25]: Yeah, that's wild. So now talk me through what things you would have done differently, knowing what you know now, and I don't want to say like, oh, yeah, live your life with a little bit of a regret, but for the rest of us that potentially are building systems and thinking through building these systems, what are some things that you know now that you would have done differently?

Aayush Mudgal [00:40:54]: Yeah, I think one thing I would say: back in 2018, improving the way you train your models... I would say in 2018, training models was not very easy. You needed to copy some code, check in that code, duplicate it, copy some code, check in the code, change some parameters, and that probably just slows you down. I know many companies have a similar thing even today: you cannot easily train multiple different models, having that flexibility. Pinterest had an internal system called EasyFlow that was developed; I think that was pretty useful. So now, if you want to change things, you can do, like, a no-code version of it, and you can just change parameters, change things, and not need to wait for a check-in. If we had had that earlier on, it could have been useful.

Aayush Mudgal [00:41:44]: The other thing I would say, and I think it's easy to say now: could we have removed the hybrid systems that we had, like training in a different language, serving in a different language? But I think it also depends, because that helped us get to market faster; that was what existed in 2015 and 2016 to get to the market, because the other systems were not mature. If PyTorch had been more mature in 2015-16, I think we could have saved a lot of time on those migrations. It's easy to say now, but practically, I think it might not have been the case, because those things were also still developing, I would say, in that sense. Yeah, I think C++ is one language... you don't have that many machine learning engineers who understand C++ still, I would say. I think that's where...

Aayush Mudgal [00:42:39]: In serving systems, yes, there are many; you can find people. But not everyone who is in academia doing machine learning would be comfortable doing C++. I think that's something. I don't know.

Demetrios [00:42:51]: I was expecting you to say C++ is a language that I never want to touch again in my life.

Aayush Mudgal [00:42:57]: Yeah. But I think for low latency serving, you still need people who know it to pull things off in general. But I think that need is reducing over time.

Demetrios [00:43:08]: Yeah. So then, now that we are in 2024, as you look ahead at all these different projects or whatever you're working on these days, what are some things that you are interested in implementing in your stack? Or not even implementing in your stack, just things that are getting you excited.

Aayush Mudgal [00:43:33]: So I think the thing I would say is, when we moved from GBDTs to, let's say, neural networks, we knew, okay, the first migration we did from GBDTs to neural networks was kind of neutral-ish. Everyone would think that, okay, bringing in neural networks, you'll get magic out of it. It was totally, like, neutral-ish. It was kind of building the backbone of what we are doing. But I would say the advancements that are happening in language processing, where you have a lot of sequences of words coming in: the same kind of advancement is happening in recommendation systems, where instead of words, you have interactions that the user is doing. And those sequence modeling techniques are becoming complex. And that's where I feel much excited.

Aayush Mudgal [00:44:14]: Even NLP is kind of influencing recommendation systems, at least in the back end. I would say things are coming together. In general, those transformers are in recommendation systems; transformers are in your computer vision systems. So they are not that separated out as fields. The use cases are different, but those techniques are getting adopted widely, even in recommendation systems. So that is exciting, to see how things are coming closer in that sense, and how that research is useful even for our models. The other thing is, I mentioned before that we used to have many intermediate systems, which slows you down. Let's say, if you wanted to train a new sampling strategy or do different sampling, it was harder. But we are moving into integrating most of our systems with Ray to do in-trainer sampling and moving things more flexibly into your model training flows.

Aayush Mudgal [00:45:13]: I think that is pretty exciting, because now you can do much more at a faster velocity and not be bottlenecked by your frameworks.

Demetrios [00:45:23]: Talk to me about this idea of transformers, basically for the ad rankings, because I think I understand it, but I want to hear you dive into that a little bit more.

Aayush Mudgal [00:45:39]: Yeah, so I think there are two ways you can think about transformers in general. How these ranking models work is you have a set of features that represent your user, some set of features that represent your content, or pin at Pinterest, and some set of features that represent the kind of interactions that you are having with them. Typically, these recommendation models would have a way to learn these feature interactions among themselves as part of the model architecture, and there are things like feature crossings or deep cross networks and other kinds of research there. So one way you can think of a transformer is as just learning how these features interact with each other. There's no positional kind of system here; a transformer encoder can just be used as a feature interaction layer, trying to learn whether feature A is related to feature B and by what magnitude. So that's one way.
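
A minimal PyTorch sketch of that first use: a transformer encoder over per-feature embeddings acting as a feature interaction layer, with no positional encoding since feature order carries no meaning. All dimensions are illustrative assumptions.

```python
# Sketch: transformer encoder as a feature interaction layer.
# Each feature's embedding is a "token"; self-attention learns interactions.
import torch
import torch.nn as nn

class FeatureInteraction(nn.Module):
    def __init__(self, num_features: int = 40, dim: int = 64):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(num_features * dim, 1)

    def forward(self, feature_embeddings: torch.Tensor) -> torch.Tensor:
        # feature_embeddings: (batch, num_features, dim) — no positional
        # encoding, because feature order is meaningless.
        crossed = self.encoder(feature_embeddings)
        return self.head(crossed.flatten(1)).squeeze(-1)

scores = FeatureInteraction()(torch.randn(8, 40, 64))  # (8,) ranking logits
```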

Aayush Mudgal [00:46:33]: The other way transformers are being used is: users, when they come to the platform, can interact with a lot of pins. They might click on some pins, hide some pins, save some pins. So this is like a sequence of interactions; you can think of it as a user journey. And if you look at natural language processing, you can think of this like a sentence of words, and you can translate it the same way, to say: what is the next probable word that would come in this sentence?

Demetrios [00:47:00]: So wild.

Aayush Mudgal [00:47:01]: If this is the sequence, what is the next action? Or what's the next content that you might interact with? Just model it very similarly to what is done in the text processing domain. And that's what a recommendation system is like: you have seen everything; what is the next thing you want to see? And I think transformers and NLP come very close to that. The transformer is the building block to make that kind of prediction or setup for training. And that's where most of the wins are coming from today: understanding the user better and making it more personalized.
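
And a companion sketch of that second use: treating a user's recent engagements as tokens and scoring a candidate against the encoded "journey". The vocabulary size, pooling, and scoring head are made-up assumptions.

```python
# Sketch: user engagement sequence as tokens; score a candidate as the
# plausible "next" interaction.
import torch
import torch.nn as nn

class UserSequenceScorer(nn.Module):
    def __init__(self, num_items: int = 100_000, dim: int = 64):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, history: torch.Tensor, candidate: torch.Tensor) -> torch.Tensor:
        # history: (batch, seq_len) item ids; candidate: (batch,) item ids.
        user_state = self.encoder(self.item_emb(history)).mean(dim=1)
        return (user_state * self.item_emb(candidate)).sum(-1)  # affinity logit

model = UserSequenceScorer()
logits = model(torch.randint(0, 100_000, (8, 50)), torch.randint(0, 100_000, (8,)))
```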

Demetrios [00:47:32]: Yeah. The sequence, and mapping out a sequence, and then trying to decide what the next token is, like you get with ChatGPT, except the next token is what ad you should show. That is incredible. And that's what I thought I was understanding. I know I've seen a few papers on it. I just wanted to make sure that I was getting it.

Aayush Mudgal [00:47:55]: Yeah, things are coming together. I think that's the good thing about machine learning. You take motivation from somewhere, you combine it together, and it mostly works in your domain too. It's probably just getting motivations and trying those things out.

Demetrios [00:48:10]: So the one thing that I have to ask about is: with recommendation systems, as you were saying earlier, there is a very stringent need for things to be fast. Transformers aren't necessarily known for being fast. How do you look at those two things, and how do they coexist?

Aayush Mudgal [00:48:32]: Yeah. So as we moved to transformers, we started hitting our latency budgets in general, because they are not that fast. One thing is, before transformers, we were just doing CPU serving. We had to move to GPU serving to unlock similar kinds of latencies. One thing is, it comes at a higher cost, but if you're personalizing better, you can recover some of that cost. That's one factor. But at some point you will also hit the ceiling, where you cannot unblock the same kind of personalization just by increasing infrastructure.

Aayush Mudgal [00:49:07]: So there, things like quantization for serving: you're reducing the model complexity. In terms of sequences, depending on how much benefit you get, you can control the length of the sequence. You can optimize your sequence, because sequence length contributes directly to how much cost you would have; the larger it is, the higher the cost. Then you can make the sequence smarter by not having much similar content in it. You can represent many similar contents with just, like, one representative content, so you can reduce your sequence and be smarter with it to reduce that length in general. Thirdly, you can start to put some of this processing into your offline systems, reducing the amount that you're computing online, because some of the longer sequences, beyond, let's say, one month or somewhere there, can also be computed offline and cached into your system, and you can just reference that on the fly when you're serving the model.

Aayush Mudgal [00:50:06]: So some of those kinds of complexities you can address and add into your system in that sense.
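
Two of those levers are easy to sketch under loose assumptions: post-training dynamic quantization of a PyTorch model's linear layers, and compacting a user sequence by collapsing consecutive duplicates before truncating to a budget. The dedup rule here is a crude stand-in for the "representative content" idea described above.

```python
# Illustrative serving-cost levers: quantization and sequence compaction.
import torch

def quantize_for_serving(model: torch.nn.Module) -> torch.nn.Module:
    # Post-training dynamic quantization of linear layers; a common cheap
    # latency/cost win for transformer-heavy models.
    return torch.ao.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )

def compact_sequence(item_ids: list[int], max_len: int = 100) -> list[int]:
    # Collapse consecutive repeats (a crude stand-in for "represent many
    # similar pins with one"), then truncate to the latency budget.
    deduped = [x for i, x in enumerate(item_ids) if i == 0 or x != item_ids[i - 1]]
    return deduped[-max_len:]  # keep the most recent interactions
```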

Demetrios [00:50:12]: Again, going back to your idea earlier, because it sounds like you are definitely not just hitting an OpenAI API. You've got a whole lot of stuff going on behind the scenes with your own models that have been trained on these sequences, and you're doing some pretty nice optimization to make sure that they're not too expensive and they're not too slow.

Aayush Mudgal [00:50:37]: Yeah, that's totally true.

Demetrios [00:50:39]: So going back to the idea we were talking about before on the complexity side and staying simple and then getting complex, I'm thinking like, it feels like it's not possible to do this by just going out and using one of these LLM provider APIs because of the latency requirements that you have.

Aayush Mudgal [00:51:01]: Yeah, I think it's not possible. For that purpose, Pinterest definitely uses, let's say, PyTorch for our training, but our serving systems are built in-house to do all these kinds of optimizations that you might not get off the shelf. So our serving system definitely is much more complex in that sense, to handle those things. And it takes a lot of time to build those systems. I would say the time to build might be reduced right now because of all the advancements that are there. But Pinterest moved to, let's say, deep learning-based systems in 2020, and then we spent a lot of time making our systems' monitoring better. Around that time, sequences started coming in, somewhere in 2022 to 2023. So even when we were making these sequences, when we added the first sequence, it was not a transformer-based sequence.

Aayush Mudgal [00:51:53]: It was very simple, like attention-based models, no transformer in there. And then we started slowly, slowly making sure our systems could catch up with the kind of modeling techniques we want to do. That's where, right now, we are at a stage where we can invest in more complexity than what we were doing last year.

Demetrios [00:52:13]: Incredible, dude. Well, this has been fascinating, getting to pick your brain about all this. I really enjoy how you think of things and how open you are to all the learnings you've had over the years. It's super cool to see. And thanks for coming on here.

Aayush Mudgal [00:52:27]: Yeah, thanks for inviting me, and also to the others for listening so far.
