Reinvent Yourself and Be Curious
Stefano Bosisio is an MLOps Engineer, with a versatile background that ranges from biomedical engineering to computational chemistry and data science. Stefano got an MSc in biomedical engineering from the Polytechnic of Milan, focusing on cellular biology, genetics, and molecular simulations. Then, he landed in Scotland, in Edinburgh, to earn a PhD in chemistry from the University of Edinburgh, where he developed robust physical theories and simulation methods, to understand and unlock the drug discovery problem. After completing his PhD, Stefano transitioned into Data Science, where he began his career as a data scientist. His interest in machine learning engineering grew, leading him to specialize in building ML platforms that drive business success. Stefano's expertise bridges the gap between complex scientific research and practical machine learning applications, making him a key figure in the MLOps field. Bonus points beyond data: Stefano, as a proper Italian, loves cooking and (mainly) baking, playing the piano, crocheting and running half-marathons.
At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.
This talk goes through Stefano's experience, as a source of inspiration for whoever wants to jump into a career in the MLOps sector. Moreover, Stefano will also introduce his MLOps course on the MLOps Community platform.
Stefano Bosisio [00:00:00]: Stefano Bosisio, working for Synthesia and I'm Italian, so a strong homemade coffee with my own machine.
Demetrios [00:00:12]: We are back for another MLOps Community podcast. I am your host as always, Demetrios. Today, talking with Stefano, we got into some of his biggest failures as he has learned how to be an ML engineer and learned how to create ML platforms. He went from the world of PhDs in chemistry to building ML platforms at financial companies, aka banks, to starting from scratch and having to build out what he called a Ferrari of an ML platform, only to realize that it got no internal adoption. Let's get into it with Stefano, and hopefully you do not make the same mistakes as he did because you learned from what he talked about today. I will also call out that he recently released a course on the MLOps Community learning platform. I highly encourage anyone who is looking to learn how to ask the right questions when building out an ML platform to go check out the course. Enroll.
Demetrios [00:01:26]: Stefano is a class act, and in this course, he specifically goes through building an ML platform with open source tools and then with a tool like Vertex AI. Let's get into it, dude. Well, talk to me about moving from academia to the industry. I know you have a strong background in chemistry, and you've done a whole wide array of different things since you moved out of academia. But what has that been like?
Stefano Bosisio [00:02:04]: Everything started more than ten years ago, when I was a very little Italian engineer with a degree in biomedical engineering. I was so deeply in love at the time. At the time, AI, as everybody says, wasn't a thing, but it was a thing, though. Like, if we look at the conversation around GANs and so on, it was a thing since 1991 and even before, but no one really applied it very well. Like, TensorFlow was the first package, I think, that was released. But yeah, to make it short, the passion for research was totally firing me up. I was on fire. I wanted to do research.
Stefano Bosisio [00:02:47]: I wanted to discover things. I was in love with genetics and so on, and physics, and so I said, wonderful, I want to do a PhD. And that's how everything started, unfortunately. Unfortunately, maybe. So that's how I landed in Edinburgh, Scotland, for a PhD in computational chemistry. It was 2014, if I'm not wrong. Amazing. So it's something I always recommend to everybody.
Stefano Bosisio [00:03:15]: So if you feel you have the passion for research, you want to discover something new, and you also want to continue to study, give yourself a chance and try to maybe apply for a PhD. Who cares if you're not earning money? I still remember I was getting, yeah, I was getting something like, I can tell you the numbers without problems, 1,000 pounds a month. It was a very, very stressful situation, because you've got all the expenses. The British pound was very strong compared to the euro, so coming from Italy didn't help. But apart from that, it was an amazing experience, really. So I will always be thankful to my supervisor, because when you're trying to do research, especially computational research, at some point you start coding, and my level of coding was more or less zero.
Stefano Bosisio [00:04:16]: And then you start to get more around Python, and C at the time as well, and say, hmm, cool, I can do serious stuff from here. And, well, the level of detail you can get into, you can get very deep inside, like multiprocessing or crazy stuff, and statistics as well, because that was the theme of the PhD. You're going to acquire a lot, a lot of skills. The only problem is maybe that you're too attached to academia, so you're too attached to a way of thinking, so you think everyone knows what you're talking about. You really don't have a lot of soft skills. You don't know exactly how to interact with people. This can be a problem with many supervisors as well. Not mine, thankfully.
Stefano Bosisio [00:05:09]: He was a great, he's a great guy. But of course, that's part of the game. And from there, what happened? Well, yeah, the PhD was finished. So that rang a bell for me, saying it's time to find a job. A job when, what, and who? Well, it was 2018, if I'm not wrong. Since 2016, there had been a huge hype for data scientists. So since I did a lot of coding, a lot of physics, a lot of maths and statistics, I said, what the hell? I can do this thing.
Stefano Bosisio [00:05:47]: I can give it a try and then jump on a new adventure. So I'd say that was the very first contrast in my life, passing from academia to a real company. The first jump I made was working as a data scientist for a fintech company. So it was a bank. We were selling products like mortgages, loans, credit cards, whatever. Whatever a bank can sell, we were selling. That was a good time to join the company because they were trying to build up this new data science team. So what's data science? At that time, the question was, what is data science? What can we do? What can a data scientist do? Demetrios.
Stefano Bosisio [00:06:30]: It was an amazing opportunity. Good memories, but it was very hard to understand what I had to do, passing from a PhD to a company. I remember many times these people were just talking about acronyms or weird financial stuff. I'd say, what? And I was trying to say, guys, this is simple stuff, like introducing GitHub. This is GitHub. This is how it works. See, can you see? We can keep track of the code and so on.
Stefano Bosisio [00:07:07]: But it was very hard to deliver this message to people, as well as the way we were building models, because, you know, I was used to running my stuff on an Ubuntu machine freely, doing whatever I wanted, with as many GPUs as I wanted. Right. While here we were very constrained. On a Windows laptop, you can't install anything, because of course all the packages must pass through some security checks and so on. So it was really a heart attack for me, but it was a good opportunity, a very good opportunity. Thinking now, back in time, that gave me a lot of initial skills that many PhD students can't acquire. Unfortunately, you know, the PhD environment is very protected. Although I'm a strong supporter of PhDs, sometimes I say yes, but please make sure you can also get some soft skills courses.
Stefano Bosisio [00:08:10]: Or do think about these soft skills, because they're not as useless as you might think initially. They're super useful, and once you learn them, you'll open up a lot of doors. So that's more or less how everything started and how I changed my fire for research to a fire for data. Yeah. There were, of course, many family thoughts, and, I would say, many, many reasons why I didn't continue with academia. Not only the fact that you always have to look for grants and so on, but it was mainly because my wife and I loved Edinburgh and wanted to stay here a bit more. So I said, okay, I can have a stop, a break from academia and jump into this data science thing. But it was super amazing, really.
Demetrios [00:09:09]: Well, you have since evolved, basically taken the evolution from academia to the industry, working with fintech, and then you started to recognize that just data science isn't enough, I imagine. And if you want people to have value from the work that you're doing as a data scientist, you need to learn a little bit more of the platform side of things. Can you talk about standing up different platforms, and when that realization hit you?
Stefano Bosisio [00:09:44]: Yeah. So totally right. And also here, there's a nice story. So, okay, that was in this fintech. We were developing models, and now that I think about it, I start laughing, because we were developing models in a Python notebook, of course. Yeah. You know, there's this amazing theme of pushing notebooks to production. Nothing against it.
Stefano Bosisio [00:10:12]: So, right, deploy whatever you want. I'm totally fine if you manage to deploy. But at some point, of course, we were struggling to go forward, and at that time, I think there wasn't a real understanding yet of MLOps practices. So what happened is that we had a DevOps team, and then there was us, call it data ops. We were developing models, and they were trying to do something with these models. But something wasn't clear. Like, I remember at the time we had some weird EC2 instances. They weren't even EC2 instances on AWS, it was a kind of mixture, and these guys were taking the binary file, the model, and trying to make it run.
Stefano Bosisio [00:11:01]: But they couldn't even run it in real time. It was a very convoluted system to get to something and get a little bit of value out of what we were doing. So, as you can imagine, the value itself was very hard to deliver. And there's also something I'm telling everybody: if you can't exploit, as much as you can, the business value out of your data or out of your work, then unfortunately there are a lot of things to do, because we do need the business value. So my manager at that time, another good guy, took me into a meeting and said, Stef, there we go. This is AWS. Why don't you develop an ML platform? I said, I don't know anything, but I can try to see what you're talking about. So in those, I think, three months, I had a huge dive into AWS, trying to understand what was going on and trying again to reinvent something for my career.
Stefano Bosisio [00:12:09]: So initially, I said, okay, I'll be a data scientist. I'll develop a nice model. And then I was starting to look at ML platforms and said, hmm, these ML engineers know something that I don't know. And so initially, I think there also weren't really many resources, because it was still 2019. I can't remember when the MLOps Community was founded.
Demetrios [00:12:35]: 2020. We weren't around yet, I think.
Stefano Bosisio [00:12:39]: So I said, where do I find all these resources? There were good books about AWS and how to maybe make things communicate. So the first thing was a lot of philosophical discussion about what we really need to do, what we mean by deployment, how we can decompose all the problems we had. So it was really nice, because it was reattaching myself to my engineering background. So, given a problem, try to split it into subproblems and try to reconnect everything in this weird jigsaw. And so, yeah, from there I started thinking, okay, we need something for data processing. And then I said, okay, we need something for model training. Okay, how can we really train the model, and what can we do with the binary files that we have, or the models themselves, how can we really spin them up? So it was an amazing journey, because it allowed me to really start from something simple. So develop your ML platform with Lambda functions and Step Functions, and then try to make up something more robust. So it was the time for Glue for data processing, using a proper database, creating a data structure, making people start to communicate better, and making them understand what we need to do with data, what we need to do with the model, why we need all this weird stuff.
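Editor's sketch: the decomposition Stefano describes (data processing, then model training, then spinning the model up, wired together with Lambda functions and Step Functions) could look roughly like the state machine definition below. This is a minimal illustration, not the platform he built: the step names, the ARN template, the account id, and the region are all hypothetical placeholders. Only the overall shape (`StartAt`, `States`, `Task`, `Next`, `End`) follows the Amazon States Language.

```python
import json
from typing import Dict, List

# Hypothetical ARN template: account id, region, and function names are placeholders.
LAMBDA_ARN = "arn:aws:lambda:eu-west-1:123456789012:function:{name}"


def build_state_machine(steps: List[str]) -> Dict:
    """Chain one Lambda-backed Task state per step, in the order given."""
    if not steps:
        raise ValueError("at least one step is required")
    states = {}
    for index, name in enumerate(steps):
        state = {"Type": "Task", "Resource": LAMBDA_ARN.format(name=name)}
        if index + 1 < len(steps):
            # Hand over to the next sub-problem in the chain.
            state["Next"] = steps[index + 1]
        else:
            # The last step terminates the execution.
            state["End"] = True
        states[name] = state
    return {
        "Comment": "Illustrative ML workflow split into sub-problems",
        "StartAt": steps[0],
        "States": states,
    }


definition = build_state_machine(["ProcessData", "TrainModel", "DeployModel"])
print(json.dumps(definition, indent=2))
```

Each sub-problem stays a small, separately testable function, and the state machine is only the glue that reconnects them, which is the "jigsaw" idea from the conversation.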
Stefano Bosisio [00:14:04]: I remember it was very hard, very hard. I think nowadays it is super hard as well to talk to senior managers, very senior managers, about these things. That was a lesson. Of course, they don't need to know all these nitty-gritty details about what an ML platform is. They just want the business value. And what I was trying to do was to give them the business value, but saying, watch, this is amazing. But apart from that,
Demetrios [00:14:30]: Show them a demo for an hour. Give them an hour demo and show them how amazing it is.
Stefano Bosisio [00:14:35]: It's what I tried to do, but it was very hard, really. But apart from that, it was an amazing opportunity to try to understand how an ML platform could be developed, and to think in really naive terms about what MLOps could be. So trying to get more into the MLOps perspective: what we are trying to do, what we're trying to achieve, and especially, I think, with very limited resources, because I was still working with my limited Windows laptop. So developing code was the worst experience ever. And even pushing code to AWS was a real nightmare. We also needed IT to be in the middle to say, okay, this data must be pushed over there. Okay, but it definitely moved a lot of conversations, also because, I think, we were one of the first banks trying to develop an ML platform for real. So really thinking of an entire ML journey, really thinking about how ML models should be created and constructed.
Stefano Bosisio [00:15:43]: So it was an amazing thing. And really, in this time my passion for MLOps started to fire up. What the hell, I said. Well, it wasn't MLOps, it was engineering, ML engineering in total. I said, oh, I could develop models, but I could also push them to production, I think, something similar. So that was maybe the kick I had. I don't know, maybe it's fate, whatever, that pushed me in this direction, to navigate a bit more into the MLOps sector. And at some point I was also fed up with all the financial things. Good financials. But I'm not getting a bonus as high as the senior managers. So I'm leaving.
Stefano Bosisio [00:16:30]: And so I started to deeply focus really on MLOps. And that's also where I got to know the MLOps Community, because I started working as an ML engineer for another company and, browsing the internet, spotted the MLOps Community. What are these guys doing? And that opened up an entire series of great things, dude.
Demetrios [00:16:53]: Well, channeling my inner Bezos, because you're talking about AWS and you're talking about things that were hard back then and are still hard, a la talking to senior management. I would love to know what you feel, over the years, hasn't changed when working with MLOps.
Stefano Bosisio [00:17:13]: Hasn't changed?
Demetrios [00:17:14]: Yeah, that hasn't changed. A lot has changed. Right. But what hasn't changed?
Stefano Bosisio [00:17:19]: Stakeholders. That's the first thing that popped into my mind. Okay, stakeholders. Okay, two things, essentially: stakeholders and selling products. What I mean is this. Well, stakeholders is the major thing, I think. And this is something, again, that is very hard to grasp, especially, I think, if you're changing career, if you're moving from academia to companies and so on: the way you're communicating. So, like, every time, as engineers it's very easy for us to create a new product, right? Do we want an experiment tracker? Boom.
Stefano Bosisio [00:17:54]: There we go. Do you want a super cluster with 30 nodes? A hundred nodes? A thousand nodes? There we go. Do you want autoscaling? There we go. But what doesn't change is the way we are communicating. We're doing a magnificent job, right?
Stefano Bosisio [00:18:10]: But it's very hard to communicate and express the business value of it to more senior managers, of course, or even to our stakeholders. Like, many times it happened to me. I was working for this company where we were really creating the very first ML platform for the entire company. And gosh, one day I had the realization, the epiphany, that we had done an amazing job. We had an automatic way to train models. All the models were registered, all the experiments could be kept track of. Everything was very neat and polished.
Demetrios [00:18:50]: Monitoring it all.
Stefano Bosisio [00:18:52]: Yes, even monitoring, which was like the holy grail at the time. We know if your data are a bit shady; that's why we have monitoring in place. We even had data pipelines for validating data. Nice. But people were not using it. I had like two weeks of depression, or even a month. And I said, why are people not using this thing? Because it's amazing. I said, people should jump in
Stefano Bosisio [00:19:16]: and say, oh, Stefano, this is an amazing product.
Demetrios [00:19:20]: You were expecting an award.
Stefano Bosisio [00:19:21]: I.
Demetrios [00:19:21]: You're like, what the hell? Or at least go and give a talk.
Stefano Bosisio [00:19:26]: Yeah, yeah, yeah. I was expecting fireworks or something like that. But nothing. We were really bouncing against a wall. And that's maybe naive, but I had this realization: I didn't communicate properly with my data science team. Which means it's not only having meetings where you're explaining what you're doing, it's not having stand-ups where you're showing the Jira board and so on, it's giving demos and tutorials, taking time with them. And it's a huge effort. It's a huge amount of time, but you must dedicate this to these people, otherwise you're not going anywhere.
Stefano Bosisio [00:20:05]: And like, I remember I started to record myself giving tutorials to, for example, show how to ship a new pipeline for training, or how these things are automatic and how they can improve our way of working, right? But at the same time, they were also deploying instantaneously if the model was behaving well and so on. So writing documentation as well was another thing. Like, I spent hours on all our packages writing documentation. I said, okay, if you want to use this, you need this, this and this. If it doesn't work, of course we are here to help. So being more like a 24/7 support team, making sure everyone knows that we are here to help, so that people could say, oh, okay, so I want to create a new model.
Stefano Bosisio [00:20:54]: I can go to Stefano's team and ask what the best practices are. Because, yeah, we were basically hidden from everybody, because no one knew what we were doing, why we were doing some stuff, what value we were giving to them. So I think communication is something that I had to spend a lot of time on, as well.
Demetrios [00:21:20]: Wait, before you get into the next point, you basically had every founder's worst nightmare, which was launching to crickets, right? And you should have told me because I would have given you an award for sure, of best ML platform that's never been used or something along those lines. We would have in the mlops community, yearly awards. We could have totally given you that. But the, the thing that I am fascinated by is how you attribute the failure to your own lack of evangelizing for the product that you were building. And so it was almost like you did not build enough in public, like internally in the company. You weren't showing people like you said, you weren't doing demos, you weren't properly documenting or having potentially like pair programming sessions or just like learning sessions with the data scientists. Were the data scientists your only stakeholders, or were there other folks from different teams too?
Stefano Bosisio [00:22:24]: Yeah, also product managers or the user experience team, data analysts, a lot of data analysts as well, because we were actually still giving tools to data analysts. So that was amazing. Like the ability to query data in real time through simple SQL queries. What the hell, Demetrios? We did an amazing job, but it wasn't received correctly. But you're right.
Demetrios [00:22:48]: You know what, what a platform. What a platform. Oh my God. All right, well, I'm patting you on the back a few years too late.
Stefano Bosisio [00:22:56]: Like, I think, looking back, it was an amazing job that we did, and it was really a good case study to start to learn how to do MLOps for real and write this stuff on the MLOps Community Slack channels. Yeah, yeah.
Demetrios [00:23:14]: You got a lot of questions answered in there. That's funny.
Stefano Bosisio [00:23:18]: But yeah, as you said, I think you do need to really put in an effort to show yourself to the business, to show what you're really doing so that people can at least have an idea. And that's not easy. Also, some people will hate this, unfortunately. But gosh, please try to put yourself into someone else's shoes. Try to speak up a little bit more, because it's very important. Unfortunately, many people I've worked with are very shy, introverted and so on. So they don't like it; they love coding.
Stefano Bosisio [00:23:57]: They're amazing coders. But I'm not the best coder ever, actually. I'm very, I'm very bad. But these people are amazing. Like I say, okay, try to push a bit more on these things. Also. One thing that's very important sometimes is the emotional intelligence. I think you do need to understand what are the feelings of other people while you're talking.
Stefano Bosisio [00:24:21]: So be empathetic and so on. That's even another level of communication. But it's very important so that you can match up, find the right communication skills. And so, for example, I know that with Tom I have to talk in this way, with Frederick I have to talk in this other way, and so on. So being like, I would say, a Swiss Army knife, where you've got all the tools that are hitting properly where you want, because otherwise, unfortunately, if no one is using your platform, it might be this shiny beast, but no one will use it. Your company is actually spending money because the infrastructure is up. So the money is flowing anyway, and at some point the CEO will say, hey, what is this money for? So that's a huge effort.
Demetrios [00:25:05]: Yeah. It's like you have a Ferrari and you're taking it to the supermarket.
Stefano Bosisio [00:25:09]: That's what I'm saying. Indeed. I remember I used to organize for this company the so-called ML clubs. So I was inviting guests from outside the company to show their data science cases and so on. And one day we tried to make something internal, like inviting the whole company and showing what the data science department is doing. So data scientists do this, data analysts do that, MLOps is doing this. And I think the best definition I had for MLOps was: we're trying to build up a Ferrari.
Stefano Bosisio [00:25:41]: Data scientists are the drivers and we are the mechanics, so we're trying to make the Ferrari even better so they can go as quick as possible and win all the races. But that's the thing. And if you want bonus points, another thing that hasn't changed is people selling you ad hoc bespoke solutions, which are super expensive, but they don't work. But we can close the panel.
Demetrios [00:26:08]: That's a hot take.
Stefano Bosisio [00:26:09]: Wait, wait, wait.
Demetrios [00:26:09]: You can't just stop talking after that. Tell me more.
Stefano Bosisio [00:26:12]: So, I remember many times we had chats with either external companies or cloud providers. I will not say which ones. I mean, at that time, MLOps was still quite a thing. So that's why. But they were trying to say, you know, you could use this. They were just doing an infrastructure diagram, so nothing really complicated, something that you could do yourself. But of course, they were creating this for you and they were asking, like, more than 100,000 pounds a year just for it, plus access to all your data.
Stefano Bosisio [00:26:49]: I said, okay, wait a sec, wait a second. This didn't change. Still nowadays, I guess, I can see a lot of solutions which are, okay, an amazing effort. But, I mean, this can be replicated by a good engineering team. If you don't have money for an engineering team, then it's okay to go for these solutions. Otherwise, just develop internally. That's my two cents.
Demetrios [00:27:15]: So this is fascinating to think about, because there's always that ROI trade-off that you're going to be looking at. And is it best for your strong engineering team to go and build this themselves, or to just buy it, and then it unblocks them to go and build other cool stuff? Right. And so that's the fun thing that the leadership, I'm sure, loved to hear from you about and say, like, wait, so what's this and why would we need it? Why do you think it's important, or whatever?
Stefano Bosisio [00:27:52]: Yeah, well, you hit a good point, though. I think it depends. If you want to deliver something immediately, many startups at the beginning of the journey might decide to go for external products just because, you know, they're already done, they just cost some money, and you can't afford an engineering team. But if you have your team, I think you should give them a chance, try to understand their potential, what they can build, what they can do. It's also very satisfying from the human point of view. Right.
Demetrios [00:28:38]: And sometimes it's just egregious. Like, if you look at SageMaker and the prices on SageMaker versus if you were to do it yourself, the endpoints are like eight x more expensive, right?
Stefano Bosisio [00:28:51]: Yeah, exactly, exactly. Yeah. Hundred percent. Good points.
Demetrios [00:28:55]: So there are things where you can, and that's why I'm a huge fan of FinOps and figuring out: are there ways that we're doing things now that, with just a little bit of tweaking, we can ideally save a lot of money? So is there low-hanging fruit from when we were trying to go fast, where we can now go and almost refactor our vendors and who we are supporting, and make sure that we're just getting the best and the most out of every tool that we're using? Otherwise we can cut it.
Stefano Bosisio [00:29:41]: Hundred percent. 100% agree. Yeah, absolutely.
Demetrios [00:29:43]: So then, all right, you built a few platforms, you learned the value of that internal evangelism. You also had the experience of learning what exactly MLOps was. Do you feel like, and this is a question that I am constantly thinking about, because you've had many different use cases: are there pieces that you can generalize and say, in general you're almost always going to need these parts of the pipeline or of the platform? If you are doing something with computer vision, it might be totally different. And that's kind of why I wonder if you can generalize on this. Like, you're still dealing with data. Right. But a computer vision problem is way different
Demetrios [00:30:44]: than a fintech fraud detection problem.
Stefano Bosisio [00:30:47]: Yeah, yeah, you're right. Hundred million dollar question. But this is something I'm bouncing on right now in my current job, because we are dealing with a lot of computer vision problems. So now our data are videos and nothing from the tabular world. And this is a drastic change. So I would say that, on paper, you need to carefully subdivide all the problems. So the starting point is always the data, right? So how can we make sure we have a data platform? How can we make sure that data engineers know what we want and people can query their data easily? What does querying a video mean? As you know, it might be querying a path to a bucket where you have a video.
Stefano Bosisio [00:31:49]: It might be querying metadata of the video. So if you see it in this way, it looks like the tabular problem and the computer vision problem might be similar. It's a query, you need metadata, you need to know where your videos are. Okay, this can maybe be generalized. Overall, data pipelines are always there. Also for computer vision problems, thinking of the general case, they could validate data. For example, you might have a video whose length is zero seconds. You say, what's a video of zero seconds? Or you can extract audio information, right? And from there you can make up some validation tools.
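Editor's sketch: the validation idea above (treat a video as a queryable metadata record, then flag things like a zero-second duration or missing audio) can be illustrated with a few lines of plain Python. The `VideoRecord` fields, the bucket-style path, and the specific checks are assumptions made for illustration only; they are not taken from Stefano's pipelines.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VideoRecord:
    """Metadata one might query instead of the video bytes (hypothetical schema)."""
    path: str                 # e.g. a path to an object in a bucket
    duration_seconds: float
    has_audio: bool


def validate(record: VideoRecord) -> List[str]:
    """Return a list of human-readable problems; an empty list means the record passed."""
    problems = []
    if not record.path:
        problems.append("missing path")
    if record.duration_seconds <= 0:
        problems.append("zero-length video")
    if not record.has_audio:
        problems.append("missing audio track")
    return problems


records = [
    VideoRecord("videos/intro.mp4", 12.5, True),
    VideoRecord("videos/broken.mp4", 0.0, True),
    VideoRecord("videos/silent.mp4", 8.0, False),
]
for record in records:
    print(record.path, validate(record) or "ok")
```

Seen this way, the check runs over rows of metadata, which is why the video case ends up looking much like the tabular one.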
Stefano Bosisio [00:32:33]: What, for example, is not general is, of course, the way you're treating the videos. So you must be critical enough to understand what business case you're investigating. Do you need YOLO because you want to detect something, or do you want, I don't know, a general description of videos and so on? But here we are going to a deeper level. So I think from the general point of view, maybe on the data side, we could say, okay, there's a database. The database may work differently, but overall, what are we going to use? As you know, Athena, Postgres, BigQuery, Bigtable. Yes. These tools are always on the standard side of data ops.
Stefano Bosisio [00:33:17]: And then you move forward and you start thinking, okay, the next problem is how I can get people training models. So I think, personally, for training models we're converging to more best practices and standard ways. So, like, okay, I know that you're not, but I am: I'm a big fan of Kubeflow. Love for Kubeflow, Demetrios.
Demetrios [00:33:40]: I got nothing against Kubeflow.
Stefano Bosisio [00:33:42]: All the power of AI lost Kubeflow.
Demetrios [00:33:45]: Look at that.
Stefano Bosisio [00:33:47]: This is AI for mechanical. So, like, I think overall what you need is to create a training pipeline, right? A pipeline that is able to retrieve your data and then crunch this data and generate a model, whatever. Now let's not take LLMs into account, just for a second. If we need to create a training pipeline, this problem can be easily generalized. Yeah, in my opinion, it can be easily generalized, because what's the difference between creating a random forest and creating a model that maybe is running a YOLO algorithm or an object detection algorithm? At the end of the day, the pipeline will retrieve the data that you need, it will process the data that you need, it will train your model, right? So, more or less. And this is why I'm a strong supporter of Kubeflow: because you're also reducing the learning barrier for your stakeholders, and Kubeflow is very easy to use because you just need to add components, add a pipeline. The engineers will create an SDK for you, so you can just ship whatever you want. Okay, we're a step forward, so we are reaching a great level. Then, LLMs are a different point.
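Editor's sketch: the claim that a training pipeline generalizes (retrieve the data, process it, train a model, whatever the model is) can be shown with a tiny model-agnostic skeleton. This is plain Python, not the Kubeflow Pipelines SDK; `TrainingPipeline`, the toy rows, and the mean-predicting "model" are hypothetical stand-ins chosen so the example runs without any dependencies.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Any, Callable, List, Optional, Tuple

Row = Tuple[float, Optional[float]]  # (feature, target); the target may be missing


@dataclass
class TrainingPipeline:
    """Three pluggable steps; the pipeline itself knows nothing about the model type."""
    retrieve: Callable[[], List[Any]]
    process: Callable[[List[Any]], List[Any]]
    train: Callable[[List[Any]], Any]

    def run(self) -> Any:
        raw = self.retrieve()      # 1. retrieve the data
        clean = self.process(raw)  # 2. process the data
        return self.train(clean)   # 3. train and return the model


def retrieve_rows() -> List[Row]:
    # Stand-in for a database query or a bucket listing.
    return [(1.0, 10.0), (2.0, None), (3.0, 14.0)]


def drop_missing(rows: List[Row]) -> List[Row]:
    # Keep only rows that have a target value.
    return [row for row in rows if row[1] is not None]


def train_mean_model(rows: List[Row]) -> Callable[[float], float]:
    # Toy "model": always predicts the mean target seen during training.
    average = mean(target for _, target in rows)
    return lambda _feature: average


pipeline = TrainingPipeline(retrieve_rows, drop_missing, train_mean_model)
model = pipeline.run()
print(model(5.0))
```

Swapping `train_mean_model` for a random forest or an object detection trainer changes one argument and leaves `run` untouched, which is the generalization being argued for in the conversation.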
Stefano Bosisio [00:35:12]: LLMs, of course, are huge models. So the way we are training these models is also slightly different. You need different infrastructure. I don't think Kubeflow may support any LLMs. I never heard of it, but yeah.
Demetrios [00:35:27]: I haven't heard that either in principle.
Stefano Bosisio [00:35:31]: But better not to. There are better methods to train them, let's say. So, okay, if you go down that road. But Demetrios, a question for you: how many companies are using and training LLMs?
Demetrios [00:35:43]: Yeah, very few.
Stefano Bosisio [00:35:44]: Okay. So, like, 90% of the companies nowadays will still enjoy using, and sorry if I'm saying it, logistic regression, random forest, super deep forest algorithms, right? So guys, think carefully about the cost you're injecting into your company, because it's meaningless, like, spending a million dollars a month for training LLMs from scratch when you can do a 95% accurate and amazing job with a random forest, whatever forest you want or whatever aggregates you want, right? The last step is, of course, the deployment. So on deployments, I think we are not converging yet, in my personal opinion. There are tons of ways to deploy models, and this really depends on the company. So there might be companies who would like to use Vertex AI endpoints, okay? Pay the money and you can deploy them. There are companies which say, no, no, no, we're going to hit an EC2 instance, we're going to have our own Kubernetes infrastructure, so we've got all the endpoints we want.
Stefano Bosisio [00:36:56]: Okay, computer vision, that's another terrible problem. You need machines which are, of course, able to do the video rendering as well. So here, I think the navigation is a bit complicated, but I hope we are converging to a specific direction.
Demetrios [00:37:17]: I think, you know, Andy came on the podcast a few weeks ago, and he was talking about how for him, the change to LLMs versus traditional machine learning, and also deep learning or computer vision and all that fun stuff, doesn't really change much about how he thinks about the platform. What it does change is how he builds or the tools that he needs to add to the platform. Right. And so it's what he was saying. And I really like this idea. I like most of Andy's ideas, but I thought his idea around creating an ecosystem, allowing your platform to be looked at as an ecosystem where if someone wants to use an LLM, they can. If someone wants to use that traditional ML, they can. If someone has a computer vision problem, they can.
Demetrios [00:38:13]: And obviously that is like maturity level 500 versus your maturity level, like, one, where you're just trying to get a few models out and make sure that they are reliable and consistent. But for me, what became very clear is that LLMs are very much a part of the conversation and what people want when we talk about doing ML and AI these days. And so how can you set up the platform now to leverage the gigantic transformer models that are out there, whether it's an open source Llama that you grab or you're hitting GPT-3 or GPT-4?
Stefano Bosisio [00:39:02]: Amazing question again. So I think I partially agree with what Andy said, because this is the dream platform. So like, it's something we're always trying to create, right? For any company, like, you've got this metaverse where you can ping the service you want, and this will do all the rest. When it comes to LLMs, now the discussion is like, as you said, we could actually be using, for example, Hugging Face APIs, or calling public APIs to use these LLMs. Nothing against that, it's totally fine. And it's not so complicated to have them deployed, because all you will need is just a simple microservice that spins up for your request. So your engineering team could even create a simple package where you're just importing all the models you need. And these models can be freely pinged from the school, from the Internet.
Stefano Bosisio [00:40:02]: The only caveat that I sometimes think of with these solutions is: what if there's downtime on the external services, so you're blocking your business? Okay, the, what's the name of the.
Demetrios [00:40:17]: Guaranteed service, SLAs or SLOs or proxies that you can.
Stefano Bosisio [00:40:22]: SLA? Yes, SLAs, like 90% on usage all the year. Okay. But you need to consider it. You don't want to block the business. Then what's the cost for pinging these models, and what's the number of customers that you're dealing with? Because now, like for Synthesia, where I'm working now, using external models would be pretty hard for us, because we are serving thousands, thousands, thousands of people and we do need GPUs. Right. So what could be the cost that we should support? An external provider will charge you a lot. Say, aha, you need GPUs.
Stefano Bosisio [00:41:02]: Yeah, no worries. Here's the bill. Of course you need to start thinking of something that can also be money saving. So what if you can use spot instances? What if you can maybe share the same process across multiple GPUs? Sorry, different processes across the same GPUs, and so on. So it really depends on your business. That's why I say you do need to put things into the MLOps perspective and start to really understand what your business value is, what your business wants to do. Because the step from spending zero to spending a million dollars is very narrow nowadays, especially with these LLMs, because everyone was using that. Yeah.
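The trade-offs mentioned above (how much downtime an SLA really allows, what a GPU fleet costs per month, and how many GPUs you save by packing several processes onto one) come down to simple arithmetic. The sketch below is only illustrative: the function names are hypothetical, and the prices in the usage lines are made-up placeholders, not real provider quotes.

```python
import math

HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(sla_fraction: float) -> float:
    """Hours per year a service may be down while still meeting its SLA."""
    return (1.0 - sla_fraction) * HOURS_PER_YEAR


def monthly_gpu_cost(num_gpus: int, hourly_price: float,
                     hours_per_month: float = 730.0) -> float:
    """Cost of keeping num_gpus running all month at a given price per GPU-hour."""
    return num_gpus * hourly_price * hours_per_month


def gpus_needed(num_processes: int, processes_per_gpu: int) -> int:
    """GPUs required when several processes can share one GPU."""
    if processes_per_gpu <= 0:
        raise ValueError("processes_per_gpu must be positive")
    return math.ceil(num_processes / processes_per_gpu)


# A 90% SLA still allows roughly 876 hours (about 36 days) of downtime a year.
downtime = allowed_downtime_hours(0.90)

# Placeholder prices: 3.0 per GPU-hour on demand vs 1.0 per GPU-hour on spot.
on_demand = monthly_gpu_cost(num_gpus=8, hourly_price=3.0)
spot = monthly_gpu_cost(num_gpus=8, hourly_price=1.0)

# Ten inference processes, four per GPU, fit on three GPUs instead of ten.
shared = gpus_needed(num_processes=10, processes_per_gpu=4)
```

Plugging in your own traffic and price figures is what turns the general advice into a number your business can decide on.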
Stefano Bosisio [00:41:48]: So, but overall, what to say? Yeah, I actually love Andy's idea. It's definitely amazing. And it's also easy to realize, it's not something impossible. I think nowadays we can do it, if not in a week then in a month, with just some Terraform in the middle, and you also have everything as code. Overall, it really depends on the business. So that's why it's very hard to give you a neat answer on this.
Demetrios [00:42:18]: It's funny, I've been trying to craft the perfect tweet around this idea of, you want to know how to be really valuable at your job as a data engineer, ML engineer, data scientist. Get really good at understanding auto scaling and make sure that you have that on point and you are going to be very loved in your job.
Stefano Bosisio [00:42:48]: Auto scaling. And also let's go for some multiprocessing, which is always useful. But auto scaling doesn't come for free, which means, it's true, you need to understand it, but try to understand it as deeply as you can, because I think there's always a surprise when you're using auto scaling mechanisms. We spotted a lot of times, when we were using our Kubernetes infrastructure, that auto scaling was either going to skyrocket levels or wasn't working at all. Why? Because all the settings were all right. We set up everything correctly. We said, don't use these. And it was using them.
Stefano Bosisio [00:43:31]: So auto scaling is a great mechanism, but as you said, understand deeply how to use it. Exactly. Because it can also skyrocket your bill as well. Just to go back to the bill discussion. So think only of spinning up a hundred and a hundred instances of the same type. So. Yeah, but you're right, you're right. This is something that nowadays is getting more and more important.
Demetrios [00:43:59]: Well, that's why I was saying that you're super valuable if you can be the expert in the auto scaling because it's so easy to lose money so quickly.
Stefano Bosisio [00:44:09]: Yeah.
Demetrios [00:44:09]: If you don't configure everything properly or you don't set up the alerts or the limits or whatever it may be, then I get the feeling that everyone's gone through that story that you just told where you're like, huh, I wonder why it's so expensive all of a sudden.
Stefano Bosisio [00:44:26]: Yeah, yeah, exactly. So bear it in mind. But very good, very good advice, really.
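The limits and alerts mentioned above can be thought of as a small guard that sits in front of whatever the autoscaler asks for. The following is a sketch only, assuming a hypothetical ScalingLimits config and helper names invented for this example; real autoscalers (Kubernetes, Dataflow and so on) expose their own settings, which this does not model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingLimits:
    """Hypothetical guard rails for an autoscaled node pool."""
    max_nodes: int                # hard cap on node count
    hourly_price_per_node: float  # assumed price, used for the budget cap
    max_hourly_budget: float      # most we are willing to spend per hour


def clamp_node_request(requested: int, limits: ScalingLimits) -> int:
    """Never grant more nodes than the hard cap or the hourly budget allows."""
    budget_cap = int(limits.max_hourly_budget // limits.hourly_price_per_node)
    return max(0, min(requested, limits.max_nodes, budget_cap))


def should_alert(current_nodes: int, limits: ScalingLimits,
                 threshold: float = 0.8) -> bool:
    """Raise an alert once the pool reaches a fraction of its hard cap."""
    return current_nodes >= threshold * limits.max_nodes


# Placeholder numbers: cap of 100 nodes, 2.0 per node-hour, 150.0 per hour budget.
limits = ScalingLimits(max_nodes=100, hourly_price_per_node=2.0,
                       max_hourly_budget=150.0)

# A runaway request for 1000 nodes is cut down to what the budget allows.
granted = clamp_node_request(1000, limits)
```

The design choice here is to enforce two independent ceilings, a node count and a spend rate, so that a misread maximum in one place does not translate directly into a surprise bill.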
Demetrios [00:44:33]: Yeah, that's what I've heard. Some people put it that when you have those mistakes, like the auto scaling mistake, and it is a little bit expensive, all you did there is you paid a dumb tax or a stupid tax, and that's the tax you have to pay for being stupid.
Stefano Bosisio [00:44:52]: Yeah, well, that's a, that's a price, isn't it? But let's pass this message: we're not perfect. So, yeah, to everybody, especially to managers, if engineers make mistakes, it's okay, we are human. So there's no way. I made tons of mistakes. Actually, when I was delivering the shiny, amazing ML platform, for a week our main model wasn't running correctly. Why? Because I forgot to add a flag to the model. A simple flag, like just a dash-dash-something text, to the model.
Stefano Bosisio [00:45:30]: I say, oh, dear. So mistakes are there, even in production. I got some money, but what can we do?
Demetrios [00:45:42]: What are some other ways that you or some people that you've worked with have had to pay that tax?
Stefano Bosisio [00:45:49]: Many people, all the people I know, they're working with this update, there are always, always huge problems in production. Like, there are people that left even some simple lambda functions running in the background for standard systems. And gosh, unfortunately, they were spinning up God only knows what, because the setup was wrong, and it was like $50,000 in a weekend.
Demetrios [00:46:21]: Oops. Well, yeah, you log out on Friday. My job's done here.
Stefano Bosisio [00:46:28]: Come on. But like, gosh, I have some stories, but I can tell you. But there are many stories of people, like, even with the simplest auto scaling you can think of, like Dataflow itself. That is also a product that I like, because it allows you to create data pipelines easily with Apache Beam. It has an auto scaling mechanism. Right. But as you said, you have to understand deeply how it works, because you just mess up a number, put like 100 nodes as a maximum or whatever, and the autoscaler doesn't recognize this number, and, it has happened, it's going to scale up to a thousand nodes.
Stefano Bosisio [00:47:10]: And of course you say, why is there a spike in the cost here? Oh, dear. And it's a stupid problem again. So these stories are. I think they're always easy. I actually remember, okay, this is. I don't know if I should say this, but I will.
Stefano Bosisio [00:47:25]: When I started my career as an MLOps engineer, the very first thing I did was also to transition from AWS to GCP. And I didn't know exactly how this company was working with GCP. I said, okay, these are my keys in JSON. Okay. I don't know, maybe it was, uh. It was during the lockdown, so I wasn't totally in my right shape. I just do.
Demetrios [00:47:48]: You're watching videos on WhatsApp while you were deploying.
Stefano Bosisio [00:47:52]: I just do a GitHub out everything. Shit. I even pushed my GCP keys publicly on GitHub, so anyone could actually have had access to our GCP environment. I said, whoops. Thank God. Here is a matter of. Now, what's MLOps lacking? Security. Right. Immediately spotted.
Stefano Bosisio [00:48:14]: Say, oh, here there's a huge, huge setup of plague. Wait, so. Well, I'm the first one to say we're human. Things happen. Yeah.
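The kind of slip described above, committing a GCP service account key, is something a simple check before pushing can often catch. The sketch below assumes the usual shape of a service account key file, a JSON object with "type" set to "service_account" and a "private_key" field; the function name is hypothetical, and a dedicated secret scanner would be far more thorough than this.

```python
import json


def looks_like_gcp_service_account_key(text: str) -> bool:
    """Heuristic: does this text parse as a GCP service account key file?"""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False  # not JSON at all, so not a key file
    if not isinstance(data, dict):
        return False  # key files are JSON objects, not lists or scalars
    return data.get("type") == "service_account" and "private_key" in data


# Fake, non-functional example content for illustration only.
fake_key = json.dumps({
    "type": "service_account",
    "project_id": "example-project",
    "private_key": "-----BEGIN PRIVATE KEY-----\nFAKE\n-----END PRIVATE KEY-----\n",
})

is_key = looks_like_gcp_service_account_key(fake_key)
is_plain_config = looks_like_gcp_service_account_key('{"name": "settings"}')
```

Run over the files staged for a commit, a check like this would flag the key before it ever reaches a public repository.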
Demetrios [00:48:26]: Yeah, that's fun, man. Well, another thing that I'm really excited about is the course that you just put together for the MLOps Community learning platform. And I think it's pretty special because you basically walk through the process of standing up a platform, and it's soup to nuts. It's very thorough. And you're using one of my favorite platforms. You show people how to use Kubeflow, right?
Stefano Bosisio [00:48:54]: Yeah, exactly. And even how to install Kubeflow locally, which is. Nice. So don't.
Demetrios [00:49:03]: You said it. I'm glad you said it and I didn't.
Stefano Bosisio [00:49:04]: Because I've even said this in the course: don't try this at all unless you're following my course. So, yes. Yeah. So the idea of the course is just to be an interface for students or people who are interested in MLOps and want to give it a try and start to really learn MLOps. And I think this refers to the discussion we were having about how we can standardize MLOps, and whether we can find common patterns for most of the tasks that we have to do. So it covers, like, data processing, model training, and model experimenting or tracking. Not model deployment, because, as I said, there are multiple solutions. It wants to be a very first, gentle course for everybody.
Stefano Bosisio [00:49:49]: The idea is just to guide students and people through what the industry wants and what I've learned so far, which might be a very simple course to follow to have the right transition into the MLOps perspective. So I'm spending a lot of minutes of the course explaining what the MLOps perspective is, so how we should subdivide each problem into the MLOps perspective: data processing, what it means and why; what model training is from the data science point of view and from the MLOps point of view; why we need to track experiments, and so on. So it's like me guiding with my hands all throughout the course and saying, hey, be careful here, and giving a lot of comments, but also a lot of freedom. Like, I don't want people to go crazy with quizzes or tests and say, oh, I don't have time. I want people to enjoy and practice on their own. We are using the GCP platform, which also comes for free if you want to start with it initially. That's a good case for you to learn and get your hands dirty, which is my preferred approach, and start really to make up.
Stefano Bosisio [00:51:05]: I also give bonus pipelines, like a pipeline that scrapes data from Google or I, and gives you a cloud of words. So there are many little things, but it really wants to be what is missing, like a gentle introduction to this MLOps world.
Demetrios [00:51:23]: Well, the cool thing about the ideas that you proposed there, where you're saying there's the tech part, which is great, but inevitably that's going to change, that's going to constantly be changing as time moves on. There's going to be better ways to do it. But the mindset and the way that you approach these problems and understanding what the motives are behind the data scientists, the data engineers, the ML platform engineers, what each job and responsibility and how these folks think about it is going to be invaluable as you progress in your career.
Stefano Bosisio [00:52:06]: Exactly. So what, sorry, what I also want to give to people is a conceptual or critical way of thinking. So the technology stack is that one. We can maybe make it sound or whatever, but the cool thing is to be critical about the tools that you're going to use, and this refers also to all our conversation. So what I'm showing as well is, with Apache Beam, why we should use it, why we should not use it, what the pain points of using it are, pros and cons and so on, for all the packages, for all the products that we have. A good engineer has maybe a structured way of thinking: okay, my business has this problem. I can see on the Internet that we have these tools to solve it. Why should I use tool A rather than tool B, and so on? So then this refers also to all the soft skills I was speaking about before. And it's all part of a big soup that has a huge value.
Demetrios [00:53:16]: Well, and just thinking about the questions that you should be asking: how should I examine this issue that's in front of me? It's funny you mention that, because I'm literally reading this book right now called Bulletproof Problem Solving. It's by a few ex-McKinsey consultants, bigwigs, but it talks about breaking down very complex questions or problems into decision trees and understanding them and being able to tell where you need to do research and what kind of outcomes you're looking for and defining the problems. I'm really loving it. I've heard that some people are like, ah, but it takes the creativity out of things. I don't know, so far I think that might just be the way that you approach it. And creative people are going to be creative in general. But the idea is sound, like, how do I approach a problem? How do I know which questions to ask? And when I'm looking at an ML platform and I'm thinking about different tools that I want to add in here, what kind of questions should I be looking at? I think that's one of the reasons that so many people like Joe Reis's book, because in each section of the data engineering lifecycle that he maps out, right.
Demetrios [00:54:42]: He has what you should be thinking about. What questions should you be asking in this life cycle piece? Right. And so if it's ingestion or if it's transformation, and by how you answer that, it gives you a few constraints and then it helps you make your decisions a lot easier.
Stefano Bosisio [00:55:03]: Yeah.
Demetrios [00:55:03]: Because if you're just going off of, oh, well, it's new, it's got a lot of attention around it. I've heard, I've heard good things from people about XYZ tool that's not going to be the most adequate way of doing it. And so, yeah, man, I appreciate you creating the course. Hopefully a bunch of people out there go and they check it out and they learn a ton from you. This has been really fun.
Stefano Bosisio [00:55:28]: It was really nice. I hope people will like it. And also they can write a comment, say, Stefano, you've been too serious, please add some fun to the course. Maybe at some points I was very serious, like when it comes to explaining Kubeflow, how to install it, you need to brace yourself. But to be just.
Demetrios [00:55:50]: So if this is the 101 course, what's going to be the 202 course? What's next?
Stefano Bosisio [00:55:54]: God only knows. But definitely something about. Okay, everyone is trying to do something about LLMs, so it might be training LLMs, dealing with LLMs. We can explore that path, maybe. Something either on LLMs or something very, very, very technical, like infrastructure as code.
Demetrios [00:56:12]: Nice.
Stefano Bosisio [00:56:12]: Because Terraform, for sure, guys. Terraform. We do need it. We do need it. Having our ML platform being screwed up. Yeah, Demetrios. We can talk about it.
Demetrios [00:56:27]: I love it. And if anybody has any other ideas on courses, just hit us up in the comments. Because we're building out the MLOps Community learning platform, and I think it is a great excuse to get some of the incredible people like yourself in the community to create courses for the platform and disseminate the knowledge amongst the other community members.
Stefano Bosisio [00:56:52]: Too kind. It's a way to give back to the community, because there's a mutual benefit in learning also from the Slack channel.
Demetrios [00:57:01]: Boom, there we go.