MLOps Community

The AI Dream Team: Strategies for ML Recruitment and Growth

Posted Oct 08, 2024
# Recruitment
# Growth
# Picnic
Speakers
Jelmer Borst
Analytics & Machine Learning Domain Lead @ Picnic

Jelmer leads the analytics & machine learning teams at Picnic, an app-only online groceries company based in The Netherlands. Whilst his background is in aerospace engineering, he was looking for something faster-paced and found that at Picnic. He loves the intersection of solving business challenges using technology & data. In his free time he loves to cook and tinker with the latest AI developments.

Daniela Solis
Machine Learning Product Owner @ Picnic Technologies

As a Machine Learning Lead at Picnic, I am responsible for ensuring the success of end-to-end Machine Learning systems. My work involves bringing models into production across various domains, including Personalization, Fraud Detection, and Natural Language Processing.

Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.

SUMMARY

Like many companies, Picnic started out with a small, central data science team. As this team grows larger and focusses on more complex models, it raises questions about skill sets and the organisational setup.

Use an ML platform, or build our own? A central team vs. embedded? Hire data scientists vs. ML engineers vs. MLOps engineers? How to foster a team culture of end-to-end ownership? How to balance short-term & long-term impact?

TRANSCRIPT

Daniela Solis [00:00:00]: So I'm Daniela Solis. I'm the product owner of a machine learning team here at Picnic. I like flat whites: strong coffee, but always with milk.

Jelmer Borst [00:00:10]: So my name is Jelmer Borst. I'm a Dutch guy. I am the domain lead for everything analytics and machine learning here at Picnic, from data engineering to real-time data to machine learning. I like my espressos black: double espressos. I think I'm at eight espressos today, but I can keep going.

Demetrios [00:00:37]: Folks, we are back for another MLOps Community podcast. I am your host, Demetrios, and today I'm talking to Daniela and Jelmer. I am so glad that I procrastinated on recording this intro until I was in Greece: first, because I get to tell a joke about some of the other people I see here, and second, because I got to listen to a podcast I was not a part of, about a global feature store at Delivery Hero. My man Stefan recorded that, and if you have not listened to it, it is incredible. I am so thankful that Stefan could record something like that. You will hear that Delivery Hero went through different iterations. Do they centralize or decentralize their teams? What does the structure look like? Do they create a centralized platform that every team and use case uses, or do they decentralize? One of the TL;DRs is that they decentralized after trying to be centralized for a while. This podcast with Daniela and Jelmer from Picnic talks all about the opposite approach: they were decentralized, then they centralized, and they got a lot of benefits from that.

Demetrios [00:02:10]: So I appreciate both of these perspectives. I also want to draw your attention to the fact that Picnic right now has 18 data scientists using the centralized platform. Delivery Hero has a lot more than that, and a lot more use cases. So it brings me back to this: you really have to work with what is best for you and your use case, and recognize what the trade-offs are. We did a great job in this podcast of talking through how they ultimately arrived at the decision to centralize their ML platform. And now, time for my observation from Greece. There are some people who tan wonderfully, and what surprises me is that some people who are very pale tan incredibly well. And then there are the British, and for whatever reason, they did not get that gene.

Demetrios [00:03:27]: I'm just gonna go out and say it. Because I have seen so many red people here in Greece, I feel bad for them sometimes. It's like they look like a lobster. Literally look like a lobster. And so, uh, sunscreen, huh? That's what we gotta preach, the good old sunscreen. And as always, if you like this episode, please share it with one friend.

Demetrios [00:03:57]: Let's jump right into it. Let's start with a use case, or use cases. What was the snapshot twelve months ago, versus where you wanted to be? How has that evolution happened?

Jelmer Borst [00:04:13]: So let's first talk about where we're coming from at Picnic. We're an online supermarket, and what we do is deliver groceries to people to fill up their fridge for a low price. Margins are super low, so one of the main things is to get really accurate predictions of how many items we need to buy. Some things we can do very accurately: customers order bread, we bake it overnight, and we ship it to you in the morning. That is quite easy. But for quite a few other items, we need to predict a little bit ahead.

Jelmer Borst [00:04:55]: These were the first main use cases. As a company, you start out with simple forecasting-type problems, which is kind of a logical place to start. But as the company grows, many more use cases start appearing: around deliveries, how many trips we need to drive, how many seconds we stop in front of your door, how long it takes us to park. That is all still very focused on operations, but of course we also want to improve things for our customers. So you start to personalize more and more: you want better search results, you want to recommend items to people, you want to make sure they're not forgetting anything. We are also getting more and more into offering meals and recipes to our customers, so we want to recommend recipes so that you don't eat the same things every single week.

Jelmer Borst [00:05:52]: Research shows that people generally cook only about seven unique recipes over a year. People are quite boring in what they cook, but we can help with that. You want to help them explore while still recommending relevant things. If you eat Italian very often, maybe let's not recommend Thai food, because that might be too far out there, but at the same time you still want to help them explore new areas. With all those different use cases popping up, it becomes quite tricky to figure out: how do we make sure we run this at large scale in production, while also making it easy to build and iterate, and maintaining a good culture of learning from each other and of applying engineering best practices? And then, from an organizational and leadership point of view, who should own that? What we see around us is a big push to fit everything into multidisciplinary teams. On one hand that seems good, because you have one team focusing together on a single problem. At the same time, if you have a junior data scientist in a team with no other data scientists around, how is this person going to learn? How do they connect with their peers? And how do you make sure you're actually building the best models out there and getting challenged by others? Hence, we went on a bit of a journey to figure out how best to organize.
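The trade-off Jelmer describes, recommending relevant recipes while nudging customers to explore without going "too far out there", can be sketched as a small re-ranking step. This is a minimal illustration under stated assumptions, not Picnic's recommender: the `Recipe` type, the relevance scores, the hand-written cuisine-similarity table, and the `novelty_weight` and `min_similarity` parameters are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recipe:
    name: str
    cuisine: str
    relevance: float  # hypothetical model score in [0, 1]


# Hypothetical, hand-written cuisine similarities; a real system would learn these.
CUISINE_SIMILARITY = {
    ("italian", "french"): 0.7,
    ("italian", "thai"): 0.2,
}


def similarity(a: str, b: str) -> float:
    """Symmetric lookup; identical cuisines are fully similar, unknown pairs are not."""
    if a == b:
        return 1.0
    return CUISINE_SIMILARITY.get((a, b), CUISINE_SIMILARITY.get((b, a), 0.0))


def rerank(
    candidates: list[Recipe],
    cooked: set[str],
    usual_cuisine: str,
    novelty_weight: float = 0.3,
    min_similarity: float = 0.5,
    k: int = 3,
) -> list[str]:
    """Boost recipes the customer has not cooked yet, but skip cuisines too far from their habits."""
    scored = []
    for recipe in candidates:
        if similarity(recipe.cuisine, usual_cuisine) < min_similarity:
            continue  # e.g. Thai for someone who mostly cooks Italian
        novelty = 0.0 if recipe.name in cooked else 1.0
        scored.append((recipe.relevance + novelty_weight * novelty, recipe.name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]
```

With `novelty_weight` at 0.3, a recipe the customer has not cooked before can overtake a slightly more relevant repeat, while the similarity threshold keeps far-off cuisines out of the list entirely. These two knobs are one simple way to trade exploration against relevance.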

Jelmer Borst [00:07:45]: In the end, we still went for quite a centralized approach, which is maybe not so common, but there are definitely quite a few companies out there doing it. DoorDash and Netflix come to mind. Both have spoken quite clearly about how this helps them, and how it helps set a very high bar for your people, because the people doing the recruitment and running these teams have a lot of data science experience themselves.

Demetrios [00:08:23]: So if I'm understanding it correctly, you had one narrow use case, you proved value with it very quickly, and then it started to balloon. You recognized that this forecasting model is really cool, and that you can also try to optimize routes, recommend things to folks, and find other ways to plug ML into your systems. But instead of opting for each business unit having its own ML team, you asked: how do we centralize this so that we get the most out of it, and make sure the learnings from across the entire business are incorporated back into the business as quickly as possible?

Jelmer Borst [00:09:17]: No, 100%. These first use cases started very quickly after we launched in 2015, so that's almost ten years ago. We started using ML very early on. Of course, like many organizations, you start with heuristics and use ML more as an additional insight: does it actually beat the heuristics? But more and more of these models became operationally critical, outperforming the heuristics and what humans were doing. Given that you see the value, you want to do this almost everywhere. So we get way more requests to build all kinds of ML models, and we have way more models in production than we have people building them.

Jelmer Borst [00:10:04]: And then how do you, first of all, maintain them all, and also keep improving them? Because for each of those models, we know we can still do so much better.

Daniela Solis [00:10:13]: I think it also happened because when Picnic started, we went against what most companies were doing. This was the era when everyone would set up a data lake and figure it out later. Picnic went against that: we started building a data warehouse, and our data was so clean that when I first joined as a machine learning engineer, it was one of the reasons I decided I really wanted to work at this company. When they asked me whether I had any questions, I was really asking: is it really true that I'll have clean data? To be honest, I had been working in consultancy before, and I had seen a whole lot of unclean data. How clean can your data be, right? I really think that changed how we grew. There's this saying that data scientists spend 80% of their time cleaning data and only 20% building the model. In our case, that was not true. The moment we wanted to build something, we just wrote some SQL queries and were ready to go with experimentation, modeling, and building new models.

Daniela Solis [00:11:21]: We were a team of maybe five data scientists, and we grew to ten. But look at the number of models we were building: with ten data scientists, we were already at 20 models. We scaled up quite quickly and were able to do a lot with few people. But of course, you see the value of it and you want to grow even further. That's when we started going down this rabbit hole of how to scale while maintaining high quality and really finding the best possible way of building machine learning models.

Demetrios [00:11:52]: I imagine there's a lot of complexity that comes with scaling the number of models, not just on the organizational side or the platform side. I've heard about companies that have deployed so many models that they don't know which ones are actually affecting anything. There are zombie models out there, and people are afraid to take them off the platform because they don't know whether they're really affecting ROI. Nobody wants to be the one who takes a model off and then suddenly sees revenue take a hit. And I imagine this is a question you all think about quite a bit: how do we tier the models, so we know which ones are in the experimentation phase, where we don't yet have a clear sense of the ROI, and which ones are high-priority models that we want to make sure are always performing at their best?

Jelmer Borst [00:13:04]: That is, on one hand, about performance, and on the other hand about how the models are actually used. I think it's really important as an engineer to have that end-to-end view. Your boundary doesn't stop when the model is live and running predictions. The boundary is: are you delivering the business value you're aiming for? That's why you have the model in the first place. One of the things that relates to that is: what if the model is unable to predict? What if it's down, or giving you wrong predictions? What additional safeguards can you have in place? What potential fallbacks can you use? Say it has issues operationally: what other models would you then use? Is there a heuristic you can fall back to? Or what will you show your user otherwise? Thinking about these questions up front, before you even deploy the first model, helps a lot in defining whether you can take it offline later, and what phase this model is in.
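The fallback questions Jelmer raises can be made concrete with a thin wrapper around the model call. This is a minimal sketch, not Picnic's code: it assumes a model function and a simple heuristic (here a hypothetical `last_weeks_average` over four weeks of sales), and the sanity bounds are illustrative.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def last_weeks_average(features: dict) -> float:
    """Hypothetical heuristic: average sales over the last four weeks."""
    history = features["last_4_weeks"]
    return sum(history) / len(history)


def predict_with_fallback(
    model_predict: Callable[[dict], Optional[float]],
    heuristic: Callable[[dict], float],
    features: dict,
    lower: float = 0.0,
    upper: float = float("inf"),
) -> tuple[float, str]:
    """Return (prediction, source), falling back to a heuristic if the model fails."""
    try:
        prediction = model_predict(features)
    except Exception:  # model service down, timeout, bad input, ...
        logger.warning("model call failed, using heuristic fallback", exc_info=True)
        return heuristic(features), "heuristic"
    # Missing or implausible output (e.g. negative demand) also triggers the fallback.
    # NaN fails every comparison, so the range check catches it too.
    if prediction is None or not lower <= prediction <= upper:
        logger.warning("model output %r failed sanity check, using heuristic", prediction)
        return heuristic(features), "heuristic"
    return prediction, "model"
```

Returning the source alongside the value makes it easy to count how often the fallback fires, which is itself a useful monitoring signal.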

Demetrios [00:14:07]: Is it operationally critical or is it nothing?

Jelmer Borst [00:14:10]: That's why having a connection with the other product teams and with the business teams is incredibly important. You need to understand how your model is being used; otherwise you get into a phase where you're just spinning models up and producing results, but nobody really knows what they do or how. It's spitting out some results, but does it actually work or not? It's super easy to fall into the trap of just spinning up new experiments and new approaches to see what happens. We try to find the right balance between that and being careful in how we scale things up.

Jelmer Borst [00:14:55]: In the beginning, you might very quickly say: this seems to work, let's scale it out to everyone. Instead, we take a more gradual approach: does it really work on this small segment of users? And not only on average: for which customer segments is it really working, and for which is it not? Maybe we should only roll it out for a particular segment and scale it up there, while we keep developing and iterating for the others. So we sometimes take a more careful approach, to really make sure the model is doing what it's supposed to be doing.
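Checking whether a model works per customer segment, rather than only on average, can be sketched as follows. The row format, the segment names, the example data, and the choice of mean absolute error are assumptions for illustration, not Picnic's evaluation setup.

```python
from collections import defaultdict


def error_by_segment(rows):
    """Mean absolute error per customer segment, plus the overall mean.

    rows: iterable of (segment, prediction, actual) tuples.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for segment, prediction, actual in rows:
        totals[segment] += abs(prediction - actual)
        counts[segment] += 1
    per_segment = {s: totals[s] / counts[s] for s in totals}
    overall = sum(totals.values()) / sum(counts.values())
    return per_segment, overall


def segments_to_roll_out(new_rows, baseline_rows):
    """Segments where the new model beats the baseline; the others keep iterating."""
    new, _ = error_by_segment(new_rows)
    baseline, _ = error_by_segment(baseline_rows)
    return sorted(s for s in new if s in baseline and new[s] < baseline[s])


# Hypothetical evaluation data: (segment, prediction, actual).
new_rows = [("families", 2, 3), ("families", 4, 4), ("singles", 5, 1)]
baseline_rows = [("families", 1, 3), ("families", 6, 4), ("singles", 2, 1)]
```

In this toy data both models have the same overall error (5/3 units), yet the new model wins for "families" and loses for "singles", so `segments_to_roll_out(new_rows, baseline_rows)` returns `["families"]`. An average-only comparison would have hidden exactly the case where a segment-by-segment rollout makes sense.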

Daniela Solis [00:15:36]: It's also about having this end-to-end mindset, right? We call ourselves full stack. We think about a solution all the way through to maintaining it, and about the impact it will bring, from the moment we start designing and running experiments.

Demetrios [00:15:52]: Daniela, you mentioned something I don't think I've ever heard anyone say before: clean data. You were almost in awe of the clean data and how much you could do and create with it. I would love to know what you did to make sure that data was so clean. What does the rest of the platform look like? That will also help set the scene for later, when we talk about the organizational piece and how different teams can pull from the platform.

Jelmer Borst [00:16:28]: I feel extremely lucky in that regard. Well, luck would sound as if it was unintentional. What has helped us a lot is that from the very early days we started with a very centralized approach, at a time when data lakes were all the hype. The challenge is this: we have, I think, around 150 analysts and data scientists at Picnic who use this data on a daily basis. If you shift all the cognitive load and effort of using and cleaning this data to everyone, there's a lot of duplicate work. So we started with a more centralized approach, so that you have a view of what's happening with your customers: from how the shopper journey in our app went to what happened during the delivery of the groceries, relating all those points to each other. We are in the supermarket business, which is known for super low margins and is also quite operationally heavy. The only way to make this scale and make it work is to get efficiency out of every single small piece, and to do that, you need to collect data.

Jelmer Borst [00:17:49]: You cannot rely purely on gut feel for what to optimize. You need data on what the processes are, how efficient certain things are, and how often a certain edge case actually happens, and you need to relate them to each other to really optimize across the entire supply chain. We have a centralized data engineering team that pulls data from, I think, around 500 sources throughout the day, cleans it, and exposes it. That was very painful in the beginning, because it's a lot of work and investment to get to that state. And once you have it, maintaining and updating it is quite tricky: every time your data model changes, it's super painful, because you need to migrate from one to the other. So it requires quite a forward-looking view: how could this business evolve, and will these data models still support those directions, so that we don't need to revamp and restructure everything every single time?

Jelmer Borst [00:18:56]: In terms of hiring, we looked very much for people who not only have very strong hard skills in data engineering, but who also, most of them, come from more of a consultancy background. They have a lot of experience working with different types of companies and segments, and with connecting and talking to many people across the business to understand not only the current use cases but also what the future use cases could be. Initially, one person built v1 of this on a local Postgres instance; later we moved to Redshift, and now we run on Snowflake. That has worked quite well for us and paid off. It is a forever battle, though, because it is a slower approach, even though it scales better, and people might have a new use case they want to launch and get done today. That's the conversation you always end up having: you might sometimes slow things down to go faster in the future, and that's always tricky.

Jelmer Borst [00:20:05]: But in the end, it has worked quite well for many of our analysts. I've been in many conversations where, while discussing something with somebody, I simply wrote the SQL SELECT statement to get the data and answered the question straight away. That's so much better than "let's discuss this, I need a week to analyze this data and then I'll come back to you." And what we maybe didn't foresee at the time, but absolutely realized in the last couple of years, is that from an ML point of view, this is amazing. You don't need to spend all that time figuring out how the data is structured, because the data engineering team has already prepared it for you. Of course, you still need to do your feature exploration, extraction, and engineering, and experiment with all kinds of different models to see what works well and what doesn't. But that is already part of what an ML engineer does anyway.

Demetrios [00:21:07]: But are you building directly on Snowflake? Are all of these models done in Snowflake, or do you have the data platform and then the ML capabilities and ML platform built on top of that?

Jelmer Borst [00:21:20]: So basically, yes. There are no Spark pipelines or Airflow or whatever. You write a SQL query to get your data, and then you train your model. To do your predictions, you call our data warehouse to get your data for inference, and then you run it. Now, of course, if you run this in production for recommendations to our customers, which you need to do on demand, then it's a combination of historic data you can get from the warehouse and more real-time, on-demand data you get from other sources.
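The split Jelmer describes, batch features fetched from the warehouse with a plain SQL query and combined with on-demand data at request time, might look roughly like the sketch below. SQLite stands in for the warehouse (Snowflake in Picnic's case), and the table, column, and feature names are hypothetical.

```python
import sqlite3


def build_demo_warehouse() -> sqlite3.Connection:
    """In-memory stand-in for the data warehouse with batch-computed features."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_features ("
        "customer_id INTEGER PRIMARY KEY, orders_last_90d INTEGER, favourite_category TEXT)"
    )
    conn.executemany(
        "INSERT INTO customer_features VALUES (?, ?, ?)",
        [(1, 12, "pasta"), (2, 3, "fruit")],
    )
    return conn


def historic_features(conn: sqlite3.Connection, customer_id: int) -> dict:
    """Fetch batch features with a plain SQL query; use defaults for unknown customers."""
    row = conn.execute(
        "SELECT orders_last_90d, favourite_category FROM customer_features WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    if row is None:
        return {"orders_last_90d": 0, "favourite_category": None}
    return {"orders_last_90d": row[0], "favourite_category": row[1]}


def inference_features(conn: sqlite3.Connection, customer_id: int, basket: list[dict]) -> dict:
    """Merge warehouse features with on-demand signals from the current session."""
    features = historic_features(conn, customer_id)
    categories = {item["category"] for item in basket}
    features["basket_size"] = len(basket)  # real-time signal, not in the warehouse
    features["basket_has_favourite"] = features["favourite_category"] in categories
    return features
```

Keeping the batch lookup and the request-time enrichment in separate functions mirrors the split described above: the historic part can come straight from the warehouse, and only the on-demand part has to be computed at serving time.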

Daniela Solis [00:21:55]: Of course, we now also have a feature store and other components. But I do think machine learning here has evolved quite differently from other companies, because our infrastructure has been quite lightweight and we use the same infrastructure that other tech teams use. We have added components as we needed them, so our ML platform has grown with us and with the complexity we have. We have a mindset of building on top of solutions that are already out there, something open source or something already used in the rest of our tech organization. Having really clean, good tooling has allowed us to scale up quite easily, without a lot of overhead in even getting started.

Demetrios [00:22:49]: And so you added the component of a feature store. I imagine you have to have some kind of model registry, especially as your models scale. What other components did you add as time went on?

Daniela Solis [00:23:01]: Yeah, for the model registry, for example, we have something really lightweight that we built on top of S3. We have the feature store, and the jobs and services of course run on Kubernetes and the rest of the tech infrastructure we have here at Picnic.

Jelmer Borst [00:23:20]: We use MLflow for experiment tracking as well. If we look at back-end engineering here, Picnic is primarily Java and Python. We already have a platform team that provides a lot of standard tooling out of the box around CI/CD, linting, formatting, all of that. So your Python formatting, your SQL formatting, et cetera, is already taken care of by default. And given that all of it runs on top of Kubernetes as just a Python service or a Python cron job, you can use a lot of the off-the-shelf components that are already there. That still means there are quite a few ML-specific tools you might be missing. Feature stores, for example: already five years ago we thought, hey, we might actually need something like this. At the same time, you want to figure out what the right solution out there is.

Jelmer Borst [00:24:22]: For a long time, every time we evaluated, there still wasn't an amazing solution out there, so we kept rolling a bit of our own. And yes, a centralized, easy-to-use, off-the-shelf component is nice, but rolling your own is fine as long as only one or two models need it. At some point you get to where many models need it, and then you start investing in it. For these major components we go a bit slower, and we let everyone roll a bit of their own. Yes, there's going to be duplication, but generalizing too early is also a risk: you generalize early on and then figure out that certain use cases don't even fit the bill. So we'd rather see a bit of duplication happening in a few places. Then you see all the different patterns, options, and struggles in the various places, and from there you can figure out: this is what we actually need, and this is how we can really solve it. And then, having a few people dedicated to the ML platform to build, maintain, and roll it out is quite key.

Demetrios [00:25:35]: I've heard that explained as the pull method, as opposed to the push method. Instead of pushing a certain technology onto the teams, you're getting pulled in a direction. Ideally, at a certain point you reach a critical mass that tips over, and you realize that enough teams or folks are using this, trying to create their own, or hacking something together. That shows there's demand, and you have that pull for it.

Jelmer Borst [00:26:11]: 100%. And I think one of the key things is to have that discussion. For example, we have what we call an ML weekly, where all the ML engineers come together to discuss pain points, issues, and challenges, or maybe something they've learned in a different setting. If more and more people start complaining that certain things don't scale, don't work well, or are a struggle, then at some point you know there's value in solving it. That helps you a lot in figuring out which direction to go. It also makes adoption very easy afterwards, because this was everyone's pain point. You want to meet everyone where they are, with the pain points they have, instead of prematurely optimizing and then coming along with "hey, here's a feature store," and they're like, "why do I need this?" So in that case, wait a little bit longer.

Demetrios [00:27:02]: And they feel the need, they're asking for it. So when it comes to monitoring, you mentioned before that you want to monitor how it's doing on the business side of things. You also want to monitor the actual model, and I imagine you have the DevOps folks that are trying to monitor the system. There's a lot of tasks and functions that are involved in monitoring an ML model. Is there one person that owns all of this different monitoring? How did you set up the team so that you can monitor these things and get that information out there? Is it a little bit more dispersed? What does it look like, and how did you arrive at that decision?

Daniela Solis [00:27:42]: We started out calling ourselves full-stack data scientists, but now we've renamed ourselves machine learning engineers. Where that comes from is that we own the entire solution. That means we also deploy it, and we also monitor it once it's in production. It is tricky, as you said, because you need to make sure that the model is performing well, but also that you're actually delivering the business value you want to bring. What has really helped there is recognizing that having a central organization does not mean you don't share goals with the business. You work really closely with your business teams, and what we've seen really helps is having very short feedback loops. Every time you deploy a new version, there's an analyst on the business side who knows a lot about the domain you're working in, who is working towards the same goal, and who is trying to improve things and gather new insights to come up with new features or understand what's going on. That's one side, and I think it's key to making sure you're building the right things and keeping track of them. On the tooling side, we also built a tool called model performance monitoring that uses dbt to run tests on top of our predictions, to keep track of the performance of our models and make sure we're not drifting.

Daniela Solis [00:29:12]: So again, it's a very lightweight way of building things: we just build what we see is needed. We saw that we needed to keep track of the performance of our models in a more automated way. But I do think that to do it properly, you need both: the tooling in place with the right tests on your data, and working together with the business towards the same goal.

Demetrios [00:29:43]: What do those tests look like?

Daniela Solis [00:29:48]: We test against actuals when we have them, but we also make sure our predictions are not too far off from previous predictions, and that they're actually there, to make sure the model actually ran.

Demetrios [00:30:01]: Yeah, that's the first step. Is it actually predicting anything?

Daniela Solis [00:30:04]: It sounds quite easy, but it has happened in the past that the predictions weren't there and we didn't realize it.

Demetrios [00:30:11]: Yeah.

Daniela Solis [00:30:12]: And what is really nice is that because we collaborate so closely with business analysts, we can also automate the knowledge they bring and the things they like to look at, writing tests together with them to decide what we need to monitor.
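The three checks Daniela lists (did the model produce predictions, are they close to the previous run, and how do they compare with actuals once those arrive) can be sketched in plain Python. Picnic runs these as dbt tests; the function below is not that tool, and its dict-based inputs, the mean-absolute-error comparison, and the thresholds are assumptions for illustration.

```python
def check_predictions(current, previous, actuals=None, max_relative_change=0.5, max_mae=10.0):
    """Return a list of failed checks; an empty list means all checks passed.

    current, previous, actuals: dicts mapping an item id to a quantity.
    The thresholds are hypothetical and would be tuned per model.
    """
    # 1. Did the model run at all? Missing predictions are the most basic failure.
    if not current:
        return ["no predictions found for this run"]
    failures = []
    # 2. Are predictions too far off from the previous run?
    #    Skipped when there is no previous value or it is zero.
    for item, prediction in current.items():
        prev = previous.get(item)
        if prev and abs(prediction - prev) / prev > max_relative_change:
            failures.append(f"{item}: {prediction} vs previous {prev}")
    # 3. Once actuals arrive, compare against them (mean absolute error on shared items).
    if actuals:
        shared = [item for item in current if item in actuals]
        if shared:
            mae = sum(abs(current[i] - actuals[i]) for i in shared) / len(shared)
            if mae > max_mae:
                failures.append(f"MAE {mae:.2f} above threshold {max_mae}")
    return failures
```

Returning a list of failures rather than raising on the first one means a single run reports everything that is wrong, which suits a scheduled monitoring job.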

Jelmer Borst [00:30:32]: For example, if you have a model that predicts how many items we need to buy, on one end we're monitoring things like WAPE to figure out how the model is performing. But in the end, that doesn't really matter. What matters in the end is: do we have products available for our customers, and how much waste do we have from not being able to sell them? Feeding that back into model performance and evaluation as well helps bring everyone onto the same page about what we want to optimize. We all want to find a good balance between availability and waste, not so much optimize the underlying model metrics. Of course, we monitor them. And if we experiment and look at past data, that already helps us see directly whether another model version will perform better, yes or no. But the final proof of the pudding is what it's actually doing in production on the final business metrics. Getting everyone aligned, and getting your tests and your monitoring on those metrics as well, helps a lot in iterations. It helps a lot in optimizing for business value.
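Jelmer contrasts a forecast-accuracy metric with the business outcomes that actually matter: availability and waste. The sketch below computes both side by side. It assumes the accuracy metric is WAPE (weighted absolute percentage error, a common choice in demand forecasting), and it uses simple unit-based definitions of availability and waste that are hypothetical illustrations, not Picnic's definitions.

```python
def wape(forecast: list[float], actual: list[float]) -> float:
    """Weighted absolute percentage error: sum(|forecast - actual|) / sum(|actual|)."""
    if len(forecast) != len(actual):
        raise ValueError("forecast and actual must have the same length")
    denominator = sum(abs(a) for a in actual)
    if denominator == 0:
        raise ValueError("WAPE is undefined when all actuals are zero")
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / denominator


def availability_rate(units_requested: int, units_fulfilled: int) -> float:
    """Hypothetical definition: share of requested units that customers actually received."""
    return units_fulfilled / units_requested if units_requested else 1.0


def waste_rate(units_purchased: int, units_discarded: int) -> float:
    """Hypothetical definition: share of purchased units that were thrown away."""
    return units_discarded / units_purchased if units_purchased else 0.0


if __name__ == "__main__":
    # Toy numbers for two items, purely illustrative.
    print(f"WAPE:         {wape([100, 50], [90, 60]):.3f}")
    print(f"availability: {availability_rate(150, 147):.3f}")
    print(f"waste:        {waste_rate(160, 4):.3f}")
```

The point of reporting the three numbers together is the one Jelmer makes: two model versions can have similar WAPE yet very different availability and waste, and it is the latter pair that the team agrees to optimize.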

Demetrios [00:31:56]: So can we just geek out on this idea of, let's say, apples? How many apples do we need to buy for the next week or two weeks, right? You have a model that's predicting something, and at a certain point I imagine you're going to have apples that go bad. First off, how do you know those apples have gone bad? Is there some computer vision involved, or is it just the workers in the factory, or wherever it may be, throwing away apples and weighing them on a scale? That's the first interesting piece I'm thinking about. And second, that seems like a long time. I don't know; I buy apples and it takes a week or two for them to go bad. I guess if they're organic, it's less time. But that's a long feedback loop, right? You can't just predict it, buy them, and then know the next day whether you got it right or wrong.

Demetrios [00:32:55]: So how does that all look in that one little use case?

Jelmer Borst [00:33:00]: So, no, we don't use vision, although I think we will go in the direction of actually using it. For example, about a year or two ago we launched an automated warehouse where a lot of the work uses robotics. And what is amazing is that you can of course put a camera on top of a conveyor belt and actually monitor the quality of your apples, or the quality of what you ship to your customers. That's something we can definitely do and are thinking about. But overall, long story short, usually you are inspecting the stock, or you're trying to pick something and you figure out that it's actually not good, and it gets thrown away, unfortunately. But we do actively measure how much we're throwing away, to really minimize that. And fortunately, if we compare this to other supermarkets, it's roughly, I think, a factor of ten lower. But anything that goes to waste is still waste, so we should still optimize as much as possible to reduce it further. But then, to your point, one of the very early feedback loops we can look at is the availability of items. In many cases, when you buy an item, we haven't even bought it yet ourselves. So even if you're buying something today to be delivered tomorrow, we haven't purchased that item yet.

Jelmer Borst [00:34:19]: So what we do is basically make predictions continuously to figure out how likely it is that we will have this item in stock, given supplier availability, our customer demand, seasonality trends, and maybe other items being unavailable. For example, if one pasta brand is unavailable, people start switching to another pasta brand. So we need to figure out that people will probably start buying more of this one, not because of historic demand, but just because the other one is unavailable, or maybe because the supplier is unable to deliver those items to us. We try to take all of that into account. Then, in the end, we measure it in our app: when an item is unavailable, we still show it to our customers, but the moment they actually try to buy it, we show "Sorry, we don't have this available." That gives us this feedback loop of: hey, this is how many customers would have bought this item but weren't able to. That's also definitely required for your evaluation, because if you look at actual demand but you didn't have the item available, nobody bought it. So it taints your historic data a bit, and you need to correct for that.
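The "tainted historic data" Jelmer mentions is usually called censored demand: on days an item is out of stock, sales understate what customers wanted. A minimal sketch of the correction he describes, assuming a hypothetical per-day record that stores units sold plus the blocked purchase attempts logged by the app; the field and function names are illustrative and not taken from Picnic's systems.

```python
from dataclasses import dataclass


@dataclass
class DailyItemRecord:
    day: str
    units_sold: int
    # Hypothetical field: purchase attempts logged while the app showed the item as unavailable.
    blocked_attempts: int


def unconstrained_demand(records: list[DailyItemRecord]) -> dict[str, int]:
    """Estimate per-day demand as sales plus blocked purchase attempts.

    On out-of-stock days, raw sales are a lower bound on demand, so training a
    forecaster on sales alone would bias it downwards.
    """
    return {r.day: r.units_sold + r.blocked_attempts for r in records}


def censored_days(records: list[DailyItemRecord]) -> list[str]:
    """Days on which availability limited sales, useful to flag or down-weight in training."""
    return [r.day for r in records if r.blocked_attempts > 0]


if __name__ == "__main__":
    history = [
        DailyItemRecord("mon", units_sold=40, blocked_attempts=0),
        DailyItemRecord("tue", units_sold=25, blocked_attempts=15),  # sold out during the day
        DailyItemRecord("wed", units_sold=42, blocked_attempts=0),
    ]
    naive_mean = sum(r.units_sold for r in history) / len(history)
    corrected = unconstrained_demand(history)
    corrected_mean = sum(corrected.values()) / len(corrected)
    print(f"naive mean: {naive_mean:.1f}, corrected mean: {corrected_mean:.1f}")
    print("censored days:", censored_days(history))
```

This simple correction only counts customers who actually tried to buy the item; shoppers who silently switched to another brand, as in the pasta example, are not captured and would need a separate substitution model.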

Daniela Solis [00:35:25]: And I do think that, in that sense, availability is a bit more important for us than for supermarkets with physical stores. If an item is missing but you're already in the store, you're still going to buy your groceries. But if we don't have certain items and you'd rather just go to the store because you need them, then why would you place an order with us? So for us, that puts way more pressure on having a proper forecast: we really want to minimize waste as much as possible, but we also cannot afford poor availability of our items.

Demetrios [00:36:01]: Yeah, it's a real just-in-time situation, and it's cool to think about all the different ways you want to make sure you're tracking the signals folks are giving you through their interactions in the app. So let's go back to the point where you joined, Daniela, and it was, I think you said, ten machine learning engineers, and then you realized, wow, we need to scale up, because we've got 20 models in production, and pretty soon it's going to be way more than 20. And there you hit what we could call a speed bump: how do we even go about hiring? Who do we need to hire? What skill sets are we looking for? What does that look like? When you did some research, you couldn't quite find any literature out there, so you almost had to wing it, and you've learned in the process. What have been some of the key learnings? As you scaled up, what did the scale-up look like, and what did you learn along the way?

Daniela Solis [00:37:00]: Yeah, when we started scaling up, we had a small team, but we were really delivering value fast. Compared to other companies, we were still keeping up with state-of-the-art models wherever needed, not just because of the hype. We had a lot of knowledge sharing, and I do think that helped us streamline our development. You know how in machine learning there's a lot of experimentation, and you can really go down a rabbit hole. By sharing what we learned across use cases, you can learn from each other and iterate faster. So we had a lot of benefits there, and we didn't want to lose them.

Daniela Solis [00:37:41]: You also want to stay close to the business, because in the end that's the main goal, right? Your models should deliver the value the business wants. But at the same time, how do you keep all the advantages of a centralized model while also scaling up? How do you make sure we keep having the best practices and keep having amazing people? I like to call our team real unicorns, because we do find the best possible people, who are really good at solving very challenging problems and who really push the limits. So how do you keep attracting talent? How do you keep building the best machine learning solutions while scaling up and adapting to the new challenges that come with the business? Then you look at what's out there: purely central teams, hybrids, hub and spoke, purely decentralized. You read all these blog posts about what companies are doing, but everything is always the same, right? Three pros, three cons, but no real insight into the actual struggles, or how it's organized from the inside to make sure that whatever structure you choose really works. And I think that was quite a challenge.

Demetrios [00:39:07]: It was all too high level, basically.

Daniela Solis [00:39:09]: Yeah, exactly.

Jelmer Borst [00:39:10]: And I think, especially in many large companies, you also see this combined with a matrix management style. So maybe you're in a team, but you're the only data scientist or ML engineer there, and you have an engineering manager who oversees many of these different teams. The other week I was talking to somebody who has around 30 reports, because she was the engineering manager for many different teams. But then how are you going to support this person when you're not in the same team? You have no idea what this person is actually working on. Of course you can help by pointing to some conferences or blog posts, or by trying to spar and think along, etcetera. Or if there are personal issues, of course you can help this person. But what becomes super hard is really helping people grow their ML skill set. And I think it's still a somewhat ill-defined skill set. You need data engineering skills, you need analytics skills, you need ML skills. But even those are quite specific to the various areas you're working on: vision experience is completely unrelated to recommendations experience. And then you need a lot of software engineering, and a lot of knowledge of how to use certain tools. And if you join or work here, you also expect your lead to help you grow, right? I think that's what anyone should be able to expect. So we need a model that allows for that. At the same time, the old model with one single team kind of worked, but at some point it simply doesn't fit in a single team anymore, right? Ten people was already pushing it, and we knew we had to grow a lot in terms of people. How do you still make that work?

Jelmer Borst [00:41:09]: And as you open this discussion, there are a lot of opinions and, I think, also partly pressure from other teams to completely decentralize and embed. Why? Because conceptually it feels better: if there's just one team that thinks about search, then there's one team and one person I can talk to who will do everything search-related. Whereas if you put, let's say, all the ML models we use in search in a different team, now I need to talk to two different teams, and that becomes harder in terms of communication and alignment. But the moment you are able to find a good collaboration and working model for that, I do feel the end result is going to be much better by having a lot of people with the same skill set and the same knowledge together in a single team.

Jelmer Borst [00:42:03]: Now, hiring-wise, this is quite tricky, as you already mentioned. How do you do that?

Demetrios [00:42:09]: Wait, I didn't get it. Did you decide to go with the embedded model or not? Because you went from ten to... how many people are you now?

Jelmer Borst [00:42:22]: So we are, I think, soon to be 18.

Demetrios [00:42:26]: And what was the decision? How did you break up the team? By use cases? By...?

Jelmer Borst [00:42:33]: So we started with one or two use cases split out, where the people were actually embedded in a team, and then we reviewed: is it going well? So we basically had a hybrid setup, with some parts centralized and some parts embedded, where we said, hey, this is a critical, major model; we have two or three people working on it, collaborating on this single use case, embedded in the larger product team around it. And I think that, whilst there were definitely benefits, it didn't end up working for us. So after quite some discussion, we ended up going back to a more centralized way of organizing, but still split into multiple teams. There you define a domain or area that a team is responsible for, but it's still a team of ML engineers who are together responsible for multiple models across an entire domain. As we grow, we'll probably start cutting that up into smaller pieces, and we might actually end up closer to the product teams again. But for now, keeping this relatively centralized seemed the better option.

Demetrios [00:43:58]: Yeah, well, it does make sense, going back to what you said earlier. If you have a junior data scientist embedded in a team where there are no other data scientists, they have two types of managers: the project manager, and the data science manager who doesn't really know that specific team and use case but is trying to help them grow. There are a lot of places where that can fall apart. So I fully buy into the idea of taking all the folks who want to geek out about the ML use cases and getting them together, even putting them at the same desk so they can just ask each other questions. Things can move faster, and when you're stuck, you can hopefully get unstuck really fast because somebody is right there.

Jelmer Borst [00:44:54]: No, 100%. And when we were reviewing this embedded setup again after a while, we started looking at how we had progressed over time. And actually, one of the scary parts was: holy crap, we actually did not progress. The models started performing worse and worse, and we weren't able to keep up and improve them as we should have, compared to what we had done in the past with fewer people. So whilst it conceptually sounds like a great idea, the actual business impact was really lacking. And having people closer to each other, where they really keep each other accountable, because you see people around you who are maybe more knowledgeable about a certain topic, etcetera, also makes you strive to learn about that, become better, and grow.

Daniela Solis [00:45:52]: And also the way you design your solutions, right? If you're really focused on one business area, you only focus on improving that specific thing, and you tailor your roadmap and your approach to the actual pain point you have. Whereas if you have a more holistic view of where the field of ML is going, and of how certain problems behave similarly even though intuitively they don't look alike, you see that in the end we're still struggling with the same types of things. Then you can approach problems in a more future-proof way and push innovation forward, whereas otherwise you would just be stuck on your specific problem.

Demetrios [00:46:36]: Yeah, knowledge sharing comes much more easily than when you're more isolated. I could see that too. So in the end you said, all right, let's get the team together, and we've got different areas the team is working on. So the 18 folks aren't all thinking about all the different use cases. You've got maybe recommender systems, or how did you break it up? By use cases, or by something else?

Jelmer Borst [00:47:08]: So for now we didn't want to break up into too many small teams. We have organized one team very much on our consumer side, where everything with personalization, recommendations, etcetera happens, and another team very much focused on the operational and supply chain use cases. Even that is still a very broad domain, and I do expect that at some point we'll split it up in a more fine-grained way. But we didn't want to prematurely cut it up into too many smaller pieces, because I think it's good to keep a balance between execution and leadership. If you get too many fine-grained teams, it creates a lot of overhead: more alignment, more discussion, and whatnot. Whereas with a single team, it becomes a lot easier to execute.

Daniela Solis [00:48:06]: And the platform team on the other side.

Demetrios [00:48:07]: I was going to say, do you also have a platform team? So there's the platform team, and then there are the use case teams.

Daniela Solis [00:48:14]: Two big domains, and the platform team. That makes sense.

Demetrios [00:48:19]: Okay. When you were searching for how to organize the teams, what are some things you wish other blogs had talked about?

Jelmer Borst [00:48:34]: I think, first of all, it starts with validation around this. It seemed that everyone was saying: well, if you're a small company, it makes sense to have a central data science practice, but as you grow, surely that doesn't work anymore, and you need to decentralize and spread everyone across the entire company. That's what you see in many places, with, I think, two notable exceptions. There's a lovely podcast with the CTO of Netflix, I think on the Linux podcast, and there's one with the data science lead of DoorDash. Both actually go quite against that model. I think that helped us validate this quite a bit, even though we were already in the middle of the process, so having it earlier would have helped tremendously. The second part is extremely difficult to articulate: if you're performing well with a very small team that does lots of things, and the rest of the company doesn't know any better, it becomes super hard to articulate what you will lose the moment you change that. Some of it you maybe need to accept by just doing it and seeing whether it works or not.

Jelmer Borst [00:49:55]: But a couple of things that would have been helpful to see from other models are the downsides of embedding. There are many places where people articulate why embedding might work better, but not necessarily the challenges of an embedded model. "Embedding" always reminds me of GenAI as well, so it's a bit like the term "features": always a somewhat ambiguous word. But that aside.

Demetrios [00:50:33]: Are you thinking vector database as soon as you say embedding?

Jelmer Borst [00:50:35]: Yeah, exactly. But articulating the downsides and trade-offs matters. It's easier to scale that way, but the question is: are you accepting the downside in performance, in knowledge sharing, and primarily in final business performance? Articulating that is quite hard. Having more clear cases and proof of how that leads to lower performance in companies would have been extremely helpful. But most companies don't want to share that, right? In their blogs, most companies share their successes and not their failures.

Demetrios [00:51:15]: Yeah, yeah. It's a rare day when you see a company writing on their engineering blog about how they brought down the whole service for 24 hours. Yeah, that's true. We need to see more of that type of engineering blog, because I would love to.

Daniela Solis [00:51:35]: It's where you learn the most and I think we're only sharing our successes.

Demetrios [00:51:39]: So it makes a lot of sense. One thing I wanted to ask, about the organizational side... well, is there anything else you wish you had known while you were reading those blogs?

Daniela Solis [00:51:54]: Yeah, I think you always look at the pros and the cons, right? But looking back in hindsight, what I would like to know is: okay, you want the best of both worlds, so what kind of structure allows you to really get that? For example, we're now moving towards a centralized model. If you were embedded in the business teams, you would have more alignment with that team and strive for the same goals, but that kind of thing you can still achieve in a centralized model. If you decentralize, though, how do you do the knowledge sharing? How do you really push for innovation? That's a bit harder. Looking at it at a high level, you lose this and you win that, and it's hard to really know which structure is best for your company. I really think that finding ways to get the best of both worlds, by looking at how your organization and your company are set up, can help a lot, because I don't think there's a single model that works for every company.

Daniela Solis [00:53:05]: I do think you need to figure out how to apply a certain model to your own situation, and make sure you lose as little as possible by creating these connections or building solutions around it that can really help.

Demetrios [00:53:18]: And how key that sort of innovation, or pushing it, is to your organization, I guess.

Jelmer Borst [00:53:22]: Right. In some cases, just decent performance might actually be good enough, right? Maybe you have a business model with huge margins and you're making bank anyway; then a reasonably good forecast or recommendation or whatever is already good enough. Whereas in our case, we really need to make this whole thing work in terms of margins, and in that respect supermarkets are maybe one of the worst businesses you can get into. So we really have to push the boundaries on all these things, because otherwise it would just not work, right?

Jelmer Borst [00:54:02]: So then I think you want to put a higher emphasis on innovation, or on going for more state-of-the-art models, etcetera, rather than applying a multitude of simpler regressions, which for quite a few other companies might already be good enough.

Demetrios [00:54:25]: You're playing the game on hard mode right now.

Jelmer Borst [00:54:31]: Yes. And to your other point about hard mode: especially a year ago, when we tried to go to this embedded model, it really didn't work. People were quite unhappy, and multiple people actually left the company as well, which only stretches you even more. That made playing on hard mode even harder, whilst at the same time you're trying to recruit people, and you're trying to recruit these unicorns who are super friendly, energetic, and fun to work with, but also extremely knowledgeable, with the right software engineering skills and the ability to think along on the business side as well. Hey, if shit hits the fan, you just fix it, right? You have that full end-to-end ownership. Finding these people is not easy, but I really value that, in the end, waiting longer in hiring but hiring the right people pays off so much. It's very painful in the short term, but in the long term it pays off very well.

Demetrios [00:55:38]: Yeah. What a strong signal that the embedded model wasn't working. Folks are leaving, they're telling you they don't enjoy it, your model performance is going down, and it feels like you were doing better with fewer people. So many things were pointing to that not being the way to organize it, at least at this point in time. Going from ten to 18, maybe later you will revisit this and recognize that when you're at 200 data scientists, it's good to have one or two embedded in each team. But from ten to 18, it's not the moment.

Demetrios [00:56:12]: And I think that's an awesome learning you can share with the rest of us, because I imagine a lot of folks have been in your shoes and thought: okay, is it me or is it this company? The way we architected this thing isn't really working. I wonder how many people listening have left their jobs for that very same reason.

Jelmer Borst [00:56:39]: I would not be surprised, right? Among the people I talk to, there are so many in other companies who say: yeah, we've organized it like this, but it just doesn't work. I really don't like it, but, well, some people decided to organize it like this. And I think that's such a bad sign. If everyone in your team is telling you this is not a good idea, then there's probably some truth to it, right? But of course, sometimes you need to change things, as you scale, right?

Jelmer Borst [00:57:11]: We started ten years ago with only a handful of people, and Picnic as a whole is now 1,300 people. It has scaled, right? At some point you need to reorganize in order to make it work with more people and as your business evolves. But it needs to happen in a way that lets us keep working effectively.

Demetrios [00:57:31]: Awesome. I think that's it. That's all I got.

Jelmer Borst [00:57:48]: Close.
