Reinvent Yourself and Be Curious
Stefano Bosisio is an MLOps Engineer with a versatile background that ranges from biomedical engineering to computational chemistry and data science. Stefano earned an MSc in biomedical engineering from the Polytechnic of Milan, focusing on cellular biology, genetics, and molecular simulations. He then moved to Edinburgh, Scotland, to earn a PhD in chemistry from the University of Edinburgh, where he developed robust physical theories and simulation methods to understand and tackle the drug discovery problem. After completing his PhD, Stefano moved into data science, starting his industry career as a data scientist. His interest in machine learning engineering grew, leading him to specialize in building ML platforms that drive business success. Stefano's expertise bridges the gap between complex scientific research and practical machine learning applications, making him a key figure in the MLOps field. Bonus points beyond data: Stefano, as a proper Italian, loves cooking and (mainly) baking, playing the piano, crocheting, and running half-marathons.
At the moment, Demetrios is immersing himself in machine learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building LEGO houses with his daughter.
This talk walks through Stefano's experience and aims to inspire anyone who wants to start a career in MLOps. Stefano also introduces his MLOps course on the MLOps Community learning platform.
Stefano Bosisio MLOps Podcast

Stefano: [00:00:00] Stefano Bosisio, working for Synthesia, and I'm Italian. So: a strong homemade coffee, with my own machine.

Demetrios: We are back for another MLOps Community Podcast. I am your host, as always, Demetrios. Talking with Stefano today, we got into some of his biggest failures as he learned how to be an ML engineer and how to create ML platforms. He went from the world of PhDs in chemistry to building ML platforms at financial companies, aka banks, to starting from scratch and having to build out what he called a Ferrari of an ML platform, only to realize that it got no internal adoption. Let's get into it with Stefano, and hopefully you will not make the same [00:01:00] mistakes he did, because you learned from what he talked about today. I will also call out that he recently released a course on the MLOps Community learning platform. I highly encourage anyone who is looking to learn how to ask the right questions when building out an ML platform to go check out the course and enroll. Stefano is a class act. In this course, he specifically goes through the 101s of building an ML platform with open-source tools, and then with a tool like Vertex AI. Let's get into it.

Dude, talk to me about moving from academia to industry. I know you have a strong background in chemistry, and you've done a whole wide array of different things since you [00:02:00] moved out of academia into industry. What has that been like?

Stefano: Everything starts more than 10 years ago, when I was a very young Italian engineer with a degree in biomedical engineering. I was so deeply in love at the time. At the time, AI, as everybody says, wasn't a thing. But it was a thing, though: if we look at the conversation around GANs and so on, it has been a thing since 1991 and even before. But, no, I really applied it very well. TensorFlow was, I think, the first package that was released.
And, yeah, to make it short, the passion for research was totally firing me up. I was on fire. I wanted to do research. I wanted to discover things. I was in love with genetics and so on, and physics, so I said, wonderful, I want to do a PhD. And that's how everything started, unfortunately, maybe. [00:03:00] That's how I landed in Edinburgh, Scotland, for a PhD in computational chemistry. It was 2014, if I'm not wrong. Amazing. So this is something I always recommend to everybody: if you feel you have the passion for research, you want to discover something new, and you also want to continue to study, give yourself a chance and maybe apply for a PhD. Who cares if you're not earning money? I still remember, I was getting...

Demetrios: That can come later.

Stefano: Yeah. I was getting, and I can tell you the numbers without problems, about a thousand pounds a month. The situation was very, very stressful, because you've got all the expenses. The British pound was very strong compared to the euro, so coming from Italy didn't help.

Demetrios: Yeah.

Stefano: But apart from that, it was an amazing experience, really. I'll always be thankful to my [00:04:00] supervisor, because when you're doing research, at some point you start coding, especially in computational research. And my level of coding was more or less zero. Then you get more into Python, and C and C++ at that time as well, and you say, mmm, cool, I can do serious stuff from here. And, well, as I said, you can get very deep into the details, like multiprocessing or other crazy stuff, and statistics as well, because that was the theme of my PhD in the end. You're going to acquire a lot of skills. The only problem, maybe, is that you're too attached to academia.
You're too attached to a way of thinking, so you think everyone knows what you're talking about. [00:05:00] You really don't have a lot of soft skills; you don't know exactly how to interact with people. This might be a problem with many supervisors as well. Not mine, thankfully; he's a great guy. But of course, that's part of the game. And from there, what happened? Well, the PhD finished, and that rang a bell: it's time to find a job. A job where, and what, and with whom? It was 2018, if I'm not wrong. Since 2016 there had been a huge hype around data scientists, and since I did a lot of coding, a lot of physics, a lot of maths and statistics, I said, what the hell, I can do this thing. I can give it a try and jump in. I'd say that was the very first big contrast in my life, passing from academia to a real company. The first jump I made [00:06:00] was working as a data scientist for, let's say, a fintech company. It was a bank; we were selling products like mortgages, loans, credit cards, whatever. Whatever a bank can sell, we were selling. That was a good time to join the company, because they were trying to build up a new data science team. So what was data science at that time? The question was: what is data science? What can we do, what can a data scientist do? Yeah, Demetrios, it was an amazing opportunity. Good memories, but it was very hard to understand what I had to do. Passing from a PhD to a company, I remember many times these people were just talking in acronyms or about weird financial stuff, and I'd say, what? What? Who? Where? And I was trying to say, guys, this is simple stuff, like introducing GitHub. [00:07:00] This is GitHub, this is how it works. Can you see? We can keep track of the code and so on.
It was very hard to deliver this message to people, as well as building models, and the way we were building models, because, you know, I was used to running my stuff on an Ubuntu machine, freely, doing whatever I wanted, with as many GPUs as I wanted, right? While here we were very constrained on Windows laptops; you can't install anything, because of course all the packages must pass through security checks and so on. So it was really a heart attack for me. But it was a good opportunity, a very good opportunity, looking back now. It gave me a lot of initial skills that many PhD students can't acquire, unfortunately. You know, the [00:08:00] PhD environment is very protected. Although I'm a strong supporter of PhDs, sometimes I say: yes, but please make sure you also take some soft skills courses, or at least think about these soft skills, because they're not as useless as you might initially think. They're super useful, and once you learn them, you'll open up a lot of doors. So that's more or less how everything started, and how I changed my fire for research into a fire for data. There were, of course, many family considerations too, many reasons why I didn't continue with academia. Not only the fact that you always have to look for grants and so on; it was mainly because my wife and I loved Edinburgh and wanted to stay here a bit longer. So I said, okay, I can [00:09:00] take a break from academia and jump into this data science thing. And it was super amazing, really.

Demetrios: You have since evolved, basically taking the evolution from academia to industry, working in fintech. And then you started to recognize that data science alone isn't enough, I imagine.
If you want people to get value from the work you're doing as a data scientist, you need to learn a bit more about the platform side of things. Can you talk about what that means, standing up different platforms, and when that realization hit you?

Stefano: Yes. Totally, you're a hundred percent right. And here, too, there's a nice story. So I was at this fintech, and gosh, we were developing models. Now that I think about it, I start [00:10:00] laughing, because we were developing models in Python notebooks, of course. And, you know, there's this amazing theme of pushing notebooks to production. Nothing against it, all right? Deploy whatever you want; I'm totally fine with it if you manage to deploy. But at some point, of course, we were struggling to move forward. At that time, I think, there wasn't a real understanding of MLOps practices yet. What happened is that we had a DevOps team, and then there was us, call it DataOps. We were developing models and trying to do something with these models, but it wasn't clear what. I remember at that time we had some weird EC2 instances on AWS, kind of a mixture, and these guys were taking the binary file, the model, and trying to [00:11:00] make it run, but it couldn't even run in real time. It was a very convoluted system just to get a little bit of value out of what we were doing. So, as you can imagine, the value itself was very hard to deliver. And this is something I tell everybody: if you can't extract as much business value as you can out of your data or out of your work, then, unfortunately, there's a lot to do, because we do need the business value. So my manager at the time, another good guy, took me into a meeting and said, Stef, there we go, this is AWS, why don't you develop an ML platform?
I said, I don't know anything, but I can try to see what you're talking about. So in those, I think, three months, I had a huge, deep immersion [00:12:00] into AWS, trying to understand what was going on and trying to reinvent something for my career. Initially I said, okay, I'll be a data scientist, I'll develop nice models. And then I started to look at ML platforms and said, hmm, these ML engineers know something that I don't know. Initially, I think, there weren't many resources, because it was still 2019. I can't remember when the MLOps Community was founded.

Demetrios: 2020. Oh, right. We weren't around yet.

Stefano: So I said, where do I find all these resources? There were good books about AWS and about how to make things communicate. And so the first thing was a lot of philosophical discussion about what we really needed to do, what we meant by deployment, and how we could decompose all the problems we had. [00:13:00] It was really nice, because it was reattaching me to my engineering background: given a problem, split it into subproblems and then reconnect everything, like a weird jigsaw. From there, I started thinking, okay, we need something for data processing. Then I said, okay, we need something for model training. How can we really train the model, and what can we do with the binary files we have, the models themselves, how can we really spin them up? It was an amazing journey, because it allowed me to really start simple: develop your ML platform with Lambda functions and Step Functions.
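To make the "Lambda functions and Step Functions" starting point concrete, here is a minimal sketch (not Stefano's actual setup) of an Amazon States Language definition that chains three Lambda tasks: data processing, training, and deployment. The state names, region, account ID, and function names in the ARNs are hypothetical placeholders, and a production definition would typically add retries, error handling, and an IAM role.

```python
import json

# Hypothetical ARN template: region, account ID and function names are placeholders.
LAMBDA_ARN = "arn:aws:lambda:eu-west-2:123456789012:function:{}"

# The decomposition described above: one Lambda function per subproblem.
STEPS = ["process-data", "train-model", "deploy-model"]


def build_ml_state_machine(steps=STEPS) -> dict:
    """Build an Amazon States Language definition that runs the steps in order."""
    states = {}
    for i, name in enumerate(steps):
        state = {"Type": "Task", "Resource": LAMBDA_ARN.format(name)}
        if i + 1 < len(steps):
            state["Next"] = steps[i + 1]  # hand off to the following step
        else:
            state["End"] = True  # the last step terminates the execution
        states[name] = state
    return {
        "Comment": "Sketch: data processing -> training -> deployment",
        "StartAt": steps[0],
        "States": states,
    }


if __name__ == "__main__":
    print(json.dumps(build_ml_state_machine(), indent=2))
```

The printed JSON could then be supplied as the `--definition` of `aws stepfunctions create-state-machine`, together with `--name` and `--role-arn`. Keeping each subproblem in its own function mirrors the "split into subproblems, then reconnect" approach, and any single step can later be swapped for something more robust.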
And then try to make something more robust. Then it was time for Glue for data processing, using a proper database, creating a data structure, letting people start to communicate better with each other, and making them understand [00:14:00] what we needed to do with data, what we needed to do with the model, and why we needed all this weird stuff. I remember it was very, very hard, and I think nowadays it's still super hard, to talk to senior managers, very senior managers, about these things. That was a lesson, of course: they don't need to know all the nitty-gritty details of what an ML platform is; they just want the business value. And I was trying to give them the business value, but I'd say, watch this, this is amazing. But apart from that...

Demetrios: Oh, the demo? For an hour? Give them an hour-long demo and show them how amazing it is.

Stefano: That's what I tried to do, but it was very hard, really. But apart from that, it was an amazing opportunity to understand how an ML platform could be developed and to think about what a naive version of MLOps could be. So getting more into the MLOps perspective: what we're trying to do, what we're [00:15:00] trying to achieve, especially with very limited resources, because I was still working on my limited Windows laptop. Developing code was the worst experience ever, and even pushing code to AWS was a real nightmare. We also needed IT in the middle to say, okay, this data must be pushed over there. But it definitely moved the conversation a lot. I think we were one of the first banks trying to develop an ML platform for real, really thinking about an entire ML journey and how ML models should be created and constructed.
So it was an amazing thing, and this time my passion for MLOps really started to fire up. What the hell? I said, whoa. And it wasn't MLOps yet, it was engineering, ML engineering overall. I said, ooh, I can develop models, but I can also push them [00:16:00] to production, or something similar. So that was the kick I had, maybe. I don't know, maybe it was fate, whatever, that pushed me in this direction and toward navigating a bit more into the MLOps sector. And at some point I was fed up with all the financial stuff. Okay, good financials, but I'm not getting a bonus as high as the senior manager's, so I'm leaving. So I started to focus really deeply on MLOps. That's also where I got to know the MLOps Community, because I started working as an ML engineer for another company, and while browsing the internet I spotted the MLOps Community: what are these guys doing? And that opened up an entire series of great things.

Demetrios: Dude, channeling my inner Bezos, because you're talking about AWS, and you're talking about things that [00:17:00] were hard back then and are still hard, à la talking to senior management: I would love to know what you feel, over the years, hasn't changed when working in MLOps.

Stefano: Hasn't changed.

Demetrios: Yeah, what hasn't changed. A lot has changed, right?

Stefano: But what hasn't? Stakeholders. That's the first thing that popped into my mind. Okay, two things haven't changed: stakeholders and selling products. Stakeholders are the major thing, I think, and this is something that's very hard to grasp, especially if you're changing career, if you're moving from academia to companies and so on: the way you communicate. As engineers, it's very easy for us to create a new product, right?
Do you want an experiment tracker? There we go. Do you want a super cluster with 30 [00:18:00] nodes, 100 nodes, 1,000 nodes? There we go. Do you want autoscaling? There we go. But what doesn't change is the way we communicate. We're doing a magnificent job, right? But it's very hard to communicate and express its business value to more senior managers, of course, or even to our stakeholders. It happened to me many times. I was working for this company where we were really creating the very first ML platform for the entire company. And gosh, one day came the realization, the epiphany, that we'd done an amazing job. We had an automatic way to train models. All the models were registered. We could keep track of all the experiments. Everything was very neat and polished.

Demetrios: Monitoring it all.

Stefano: Yeah, even monitoring, which was the holy grail at the time. You know, if your data are a bit shady, [00:19:00] that's why we have monitoring in place. We even had data pipelines for validating data. Nice. And gosh, people were not using it. I had like two weeks of depression, or even a month. I said, why are people not using this thing? It's amazing. I thought people would jump in, oh, oh, oh, because they'd found out this is an amazing product.

Demetrios: You were expecting an award?

Stefano: What the hell? Or at least the guys giving me a pat on the back.

Demetrios: Yeah, yeah.

Stefano: Yeah. I was expecting fireworks or something like that. But nothing. We were really bouncing against a wall. And maybe I'm naive, but the realization I had was that I didn't communicate properly with my data science team. Which means it's not only having meetings where you explain what you're doing, and it's not having stand-ups where you show the Jira board and so on.
It's giving demos and tutorials, taking time with them. It's a huge effort and a huge amount of time, but you [00:20:00] must dedicate it to these people. Otherwise you're not going anywhere. I remember I started recording myself giving tutorials, for example, showing how to ship a new training pipeline, or how the trainings are automatic and how they can improve our way of working, right? At the same time, they were also deploying instantly if the model was behaving well, and so on. Writing documentation was another thing. I spent hours writing documentation for all our packages: okay, if you want to use this, you need this, this, and this. If it doesn't work, of course, we are here to help. So, being more like a 24/7 support team, making sure everyone knows we are here to help, so that people could say, oh, okay, I want to create a new model, I can go to Stefano's team and ask what the best practices are, [00:21:00] because, yeah, we were basically hidden from everybody: no one knew what we were doing, why we were doing some stuff, or what value we were giving them. So I think communication is something I had to spend a lot of time on, as well as...

Demetrios: Well, wait, before you get into the next point. You basically had every founder's worst nightmare, which is launching to crickets, right? You should have told me, because I would have given you an award, for sure, for best ML platform that's never been used or something along those lines. We would have had it in the MLOps Community yearly awards. We could have totally given you that. But the thing I am fascinated by is how you attribute the failure to your own lack of evangelizing for the product you were building.
And so it was almost like [00:22:00] you did not build enough in public internally at the company: you weren't showing people, like you said, you weren't doing demos, you weren't properly documenting or having pair programming sessions or just learning sessions with the data scientists. Were the data scientists your only stakeholders, or were there other folks from different teams too?

Stefano: Yeah, also product managers, the user experience team, and data analysts, a lot of data analysts as well, because we were still giving tools to data analysts. That was amazing: the ability to query real-time data through simple SQL queries. What the hell, Demetrios, we did an amazing job, but it wasn't received correctly, you know?

Demetrios: What a platform, what a platform. Oh my God. All right. Well, I'm patting you on the back a few years too late.

Stefano: Looking back, I think it was an amazing job that we [00:23:00] did. And it was really a good case study to start learning how to do MLOps for real, and writing about this stuff in the MLOps Community Slack channels.

Demetrios: Yeah. We had this. You got a lot of questions answered in there. That's funny. That's awesome.

Stefano: But yeah, as you said, you do need to put in real effort to show yourself to the business, to show what you're really doing, so that people can at least have an idea. And that's not easy. Also, some people will hate this, unfortunately. But gosh, please try to put yourself in someone else's shoes. Try to speak up a little bit more, because it's very important. Unfortunately, many people I've worked with are very shy, introverted, and so on. They love coding; they're amazing coders. I'm [00:24:00] not the best coder ever, actually, I'm very bad. But these people are amazing.
I say, okay, okay, okay, but wait, wait, wait, try to push a bit more on this thing. Another thing that is sometimes very important is emotional intelligence. You do need to understand what other people are feeling while you're talking, so be empathetic and so on. That's another level of communication, but it's very important, so that you can find the right way to communicate. For example, I know that with Tom I have to talk one way, and with Frederic I have to talk another way, and so on. So be, how to say, a Swiss Army knife, where you've got all the tools to hit exactly where you want. Because otherwise, unfortunately, if no one is using your platform, it might be a shiny beast, but, you know, no one will use it. Your company is actually spending money, because the infrastructure is up. So the money keeps flowing anyway, and at some point the CEO will say, hey, [00:25:00] what is all this money for? So that's a huge effort.

Demetrios: Yeah. It's like you have a Ferrari and you're taking it to the supermarket.

Stefano: That's what I'm saying. I remember that for this company I used to organize so-called ML clubs. I was inviting guests from outside the company to show their data science cases and so on. And one day we tried to make something internal: inviting the whole company and showing what the data science department was doing. Data scientists do this, data analysts do that, MLOps does this. And I think the best definition I had for MLOps was: we're trying to build a Ferrari. Data scientists are the drivers, and we are the mechanics, so we're trying to make the Ferrari even better so they can go as quickly as possible and win all the races. That's the thing.
And if you want bonus points, another thing that hasn't changed is people [00:26:00] selling you ad hoc, bespoke solutions, which are super expensive, but they don't work. But we can close it there. That's a...

Demetrios: Hot take. Wait, wait, wait. You can't just stop talking after that. Tell me more.

Stefano: I remember many times we had chats with either external companies or cloud providers. I will not say which ones. I mean, at that time MLOps still wasn't quite a thing, so that's why. They were just showing an infrastructure diagram, right? Nothing really complicated, something you could do yourself, but of course they were creating it for you, and they were asking for more than 100,000 a year just for that, plus access to all your data. And, okay, wait a second. This didn't change. Nowadays I can still see a lot of solutions which are, [00:27:00] okay, an amazing effort, but this can be replicated by a good engineering team. If you don't have money for an engineering team, then it's okay to go for these solutions. Otherwise, just develop internally. That's my two cents.

Demetrios: This is fascinating to think about, because there's always that ROI trade-off you're going to be looking at. Is it best for your strong engineering team to go and build this themselves, or to just buy it so it unblocks them to go and build other cool stuff? And that's the fun thing that leadership, I'm sure, loves to hear from you about, saying, wait, so what's this, and why would we need it? Why do you think it's important?

Stefano: Yeah. Well, you hit a good point, though. I think it depends. If you want to deliver something immediately, you do need [00:28:00] to... okay, many startups at the beginning of the race, of the journey,
might decide to go for external products just because they're already done. They just cost some money, and you can't afford an engineering team. But if you have your team, I think, do give them a chance. Try to understand their potential, what they can build, what they can do. It's also very satisfying from the human point of view, right?

Demetrios: And sometimes it's just egregious. If you look at SageMaker and the prices on SageMaker versus doing it yourself, the endpoints are like 8x more expensive, right?

Stefano: Yeah, exactly. Exactly. Another good point.

Demetrios: So there are things where you can... and that's why I'm a huge [00:29:00] fan of FinOps: figuring out whether there are ways we're doing things now that, with just a little bit of tweaking, could ideally save us a lot of money. Is there low-hanging fruit from when we were trying to go fast? We can now go and almost refactor our vendors and who we are supporting, and make sure we're getting the best and the most out of every tool we're using. Otherwise, we can cut it.

Stefano: A hundred percent. I absolutely agree.

Demetrios: So then, all right, you built a few platforms. You learned the value of internal evangelism. You also had the experience of learning what exactly MLOps [00:30:00] was. And this is a question I'm constantly thinking about, because you've had many different use cases: are there pieces you can generalize, where you say, in general, I'm almost always going to need these parts of the pipeline or of the platform? If you are doing something with computer vision, it might be totally different, and that's kind of why I wonder whether you can generalize this. You're still dealing with data, right?
But a computer vision problem is way different from a fintech fraud detection problem.

Stefano: Yeah, yeah, you're right. The hundred-million-dollar question. This is something I'm bouncing around right now in [00:31:00] my current job, because we're dealing with a lot of computer vision problems. Our data are now videos, not tabular data, and this is a drastic change. I would say that, on paper, you need to carefully subdivide all your problems. The starting point is always the data, right? How can we make sure we have a data platform? How can we make sure that data engineers know what we want and that people can query their data easily? What does querying a video mean? As you know, it might mean querying a path to a bucket where you have a video, or querying the video's metadata. Seen this way, the tabular problem and the computer vision problem might be similar. It's a query: you need [00:32:00] metadata, you need to know where your videos are. Okay, this can maybe be generalized. So, overall, data pipelines are always there. For computer vision problems too, thinking about the general case, they could validate data. For example, you might have a video whose length is zero seconds, you see. What's this video? Oh, zero seconds. Or you can extract audio information, right? And from there you can build some validation tools. What is not general, of course, is the way you treat the video. You must be critical enough to understand the business case you're investigating. Do you need a YOLO detector because you want to detect something, or do you want, I don't know, a general description of videos, and so on? But here we're going to a deeper level.
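A minimal sketch of the zero-length-video check described above, assuming each video is represented by a metadata row (a bucket URI, a duration, and an audio flag) so that it can be queried and validated like tabular data. The `VideoRecord` fields, the example bucket URIs, and the specific rules are hypothetical; extracting real durations and audio information would need a media tool that is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoRecord:
    """Hypothetical metadata row for one video stored in object storage."""
    uri: str                 # where the video lives, e.g. a bucket path
    duration_seconds: float  # as reported by whatever tool extracted the metadata
    has_audio: bool          # whether an audio track was found


def validate_video(record: VideoRecord, require_audio: bool = True) -> list:
    """Return human-readable problems; an empty list means the record passed."""
    problems = []
    if record.duration_seconds <= 0:
        problems.append("duration is zero or negative")
    if require_audio and not record.has_audio:
        problems.append("no audio track")
    if not record.uri.startswith(("gs://", "s3://")):
        problems.append("URI is not an object-storage path")
    return problems


def split_valid(records: list) -> tuple:
    """Partition records into (valid, rejected), like a tabular validation step."""
    valid, rejected = [], []
    for record in records:
        issues = validate_video(record)
        if issues:
            rejected.append((record, issues))
        else:
            valid.append(record)
    return valid, rejected


records = [
    VideoRecord("gs://example-bucket/clips/001.mp4", 12.5, True),
    VideoRecord("gs://example-bucket/clips/002.mp4", 0.0, True),
]
valid, rejected = split_valid(records)
print(len(valid), len(rejected))  # 1 1
```

Because the checks only touch metadata, the same validation step could sit in a tabular pipeline unchanged; only the code that produces `duration_seconds` and `has_audio` is video-specific.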
So from a general point of view, on the data side, we could say, okay, there's a [00:33:00] database; the database may work differently, but overall, what are we going to use? I don't know: Athena, Postgres, BigQuery, Bigtable. Yes, these tools are always on the standard side of DataOps. Then you move forward and start to think, okay, the next problem is how I can get people training a model. Personally, I think that for training models we're converging toward best practices and standard ways. So, okay, I know that you're not, but I am: I'm a big fan of Kubeflow.

Demetrios: Oh, love for Kubeflow. I've got nothing against Kubeflow.

Stefano: Oh, the power of AI. Love for Kubeflow. Look at that. This is AI for Mac. So, overall, I think what you need is to create a training pipeline, right? A pipeline that is able to retrieve your data [00:34:00] and then crunch this data and generate a model, whatever. Now, let's not take LLMs into account for just a second. If we need to create a training pipeline, in my opinion this problem can be easily generalized. Because what's the difference between creating a random forest and creating a model that maybe runs a YOLO algorithm or an object detection algorithm? At the end of the day, the pipeline will retrieve the data you need, process the data you need, and train your model, right? More or less. And this is why I'm a strong supporter of Kubeflow: it also reduces the learning barrier for your stakeholders, and Kubeflow is very easy to use, because you just need to add a component, add a pipeline. The engineers will create an SDK for you, [00:35:00] so you can just ship whatever you want. Okay, we are a step forward; we are reaching a great level. Then there are different points for LLMs.
LLMs, of course, are huge models, so the way we train these models is slightly different. You need different infrastructure. I don't think Kubeflow supports LLMs; I've never heard of it, but... Demetrios: Yeah, I haven't heard of that either. Stefano: In principle maybe, but better not to. There are better methods to train them. So, okay, if you go down that road... But, Demetrios, a question for you: how many companies are using and training LLMs? Demetrios: Yeah, very few. Stefano: Okay, so 90 percent of companies nowadays will still be happy using, sorry if I say it, just a regression or a random forest. Forest algorithms, right? So, guys, think carefully about the costs you're injecting into your [00:36:00] company. Because it's meaningless to spend a million dollars a month training an LLM from scratch when you can do a 95 percent accurate, amazing job with a random forest, or whatever forest or algorithm you want, right? The last step is, of course, deployment. For deployments, in my personal opinion, we are not converging yet. There are tons of ways to deploy models, and it really depends on the company. There might be companies that would like to use Vertex AI endpoints: okay, pay the money and we can deploy them. There are companies that say, no, no, no, we're going to use EC2, or we're going to have our own Kubernetes infrastructure, so we can deploy the endpoints we want. And computer vision, okay, that's another terrible [00:37:00] problem: you need machines that are, of course, also able to do the video rendering. So here, I think, navigating is a bit complicated. I hope we converge on a specific direction.
Demetrios: I think, you know, Andy came on the podcast a few weeks ago and he was talking about how, for him, the shift to LLMs, versus traditional machine learning and also deep learning or computer vision and all that fun stuff, doesn't really change much about how he thinks about the platform. What it does change is how he builds it, or the tools he needs to add to the platform. Right? That's what he was saying, and I really liked this idea. I like most of Andy's ideas; I tend to steal them and repurpose them until people think I said them, so, sorry, Andy. But I liked his idea of creating an ecosystem, [00:38:00] allowing your platform to be looked at as an ecosystem where, if someone wants to use an LLM, they can; if someone wants to use traditional ML, they can; if someone has a computer vision problem, they can. And obviously that is like maturity level 500, versus maturity level one, where you're just trying to get a few models out and make sure they are reliable and consistent. For me, what became very clear is that LLMs are very much a part of the conversation, and of what people want, when we talk about AI these days. So how can you set up the platform now to leverage the gigantic transformer models that are out there, whether it's an open-source Llama that you grab or you're hitting GPT-3 [00:39:00] or GPT-4? Stefano: Ah, amazing question again. I think I partially agree with what Andy said, because this is the dream platform. It's something we're always trying to create, right? For any company, you've got this metaverse where you can ping the service you want and it will do all the rest. When it comes to LLMs now, the discussion is, as you said, that we could actually use Hugging Face APIs, for example, or call public APIs to use these LLMs.
I have nothing against that. It's totally fine, and it's not so complicated to deploy, because all you need is a simple microservice that spins up for your request. Your engineering team could even create a simple package where you just import all the models you need, and these models can be freely pinged [00:40:00] from, let's call it, the internet. The only caveat I sometimes think about with these solutions is: what if there's downtime on the external service? Then you're blocking your business. Okay, what's the name of the guarantee? Demetrios: SLAs or SLOs, or proxies that you can... Stefano: SLA. The SLA is like 90 percent uptime over the year. Okay, but you need to consider it; you don't want to block the business. And what's the cost of pinging these models? And how many customers are you dealing with? At Synthesia, where I'm working now, using external models would be pretty hard for us, because we are serving thousands and thousands of people, and we do need GPUs, right? So what cost would we have to bear? An external [00:41:00] provider would charge you a lot. They'd say, oh, you need GPUs? Yeah, no worries. Of course, you need to start thinking about something that can also save money. So what if you can use spot instances? What if you can share the same process across multiple GPUs? Sorry, run different processes on the same GPUs, and so on. So it really depends on your business. That's why I say you do need to put things into the MLOps perspective and really understand your business value and what your business wants to do, because the step from spending zero to spending a million dollars is very short nowadays, especially with LLMs, because everyone wants to use LLMs. But overall, I have to say, yeah, I'm actually in love with this idea.
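To see why the SLA percentage matters when an external model API sits on your critical path, it helps to convert availability into hours of allowed downtime. A minimal sketch, assuming a 365-day year and treating the SLA as a plain uptime percentage; real SLAs define measurement windows and exclusions in their own terms.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; ignores leap years (assumption)


def allowed_downtime_hours(availability_percent: float) -> float:
    """Hours per year a service may be down while still meeting its availability target."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    return HOURS_PER_YEAR * (100.0 - availability_percent) / 100.0


if __name__ == "__main__":
    for sla in (90.0, 99.0, 99.9):
        print(f"{sla}% availability -> {allowed_downtime_hours(sla):.1f} h/year of downtime")
```

At 90 percent availability the provider could be down for 876 hours a year, roughly five weeks during which a business that depends on it is blocked; at 99.9 percent the budget shrinks to under 9 hours.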
It's definitely amazing, and it's also easy to realize; it's not something impossible. I think [00:42:00] nowadays we can do it, not in a week, but in a month, with just some Terraform in the middle. And you also have everything as code. All in all, it really depends on the business, so that's why it's very hard to give you a neat answer on this. Demetrios: It's funny. I've been trying to craft the perfect tweet around this idea: you want to know how to be really valuable at your job as a data engineer, ML engineer, or data scientist? Get really good at understanding autoscaling, make sure you have that on point, and you are going to be very loved in your job. Stefano: Autoscaling, and also let's go for some multiprocessing, which is always useful. But autoscaling doesn't come for free, that's true. You need to understand it [00:43:00] as deeply as you can, because I think there's always a surprise when you're using autoscaling mechanisms. We saw it a lot of times with our Kubernetes infrastructure: autoscaling was either skyrocketing or not working at all. Why? All the settings looked right, right? We'd set up everything correctly. We said, don't use this, and it was using this, and you go, oh, shit. So autoscaling is a great mechanism, but, as you said, study deeply how to use it exactly, because it can also skyrocket your bill. Just to go back to the billing discussion: imagine it spinning up a hundred instances at the same time. So yeah, you're right, you're right. This is something that is getting more and more important nowadays. [00:44:00] Demetrios: Well, that's why I was saying that you're super valuable if you can be the expert in autoscaling, because it's so easy to lose money so quickly. Stefano: Yeah.
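The skyrocketing-bill failure mode shows up clearly in the scaling rule documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas equal the current replicas times the ratio of the current metric to its target, rounded up, then clamped to the configured minimum and maximum. The sketch below is a simplified version of that rule that omits the tolerance band and stabilization windows the real controller applies; the function and parameter names are illustrative, not a Kubernetes API.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int,
                     max_replicas: int) -> int:
    """Simplified HPA-style rule: ceil(current * current_metric / target), clamped.

    Omits the tolerance band and stabilization windows of the real autoscaler.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # The clamp is what protects the bill: without a sane max_replicas, a metric
    # spike (or a target configured far too low) scales cost linearly.
    return max(min_replicas, min(raw, max_replicas))


if __name__ == "__main__":
    print(desired_replicas(4, 90.0, 60.0, min_replicas=1, max_replicas=10))
    print(desired_replicas(10, 1000.0, 50.0, min_replicas=1, max_replicas=20))
```

With `max_replicas=20`, a spike that would ask for 200 replicas is capped at 20; remove the cap, or set `target_metric` a hundred times too low, and the same formula asks for a hundred times more machines.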
Demetrios: If you don't configure everything properly, or you don't set up the alerts or the limits or whatever it may be... I get the feeling that everyone's gone through that story you just told, where you're like, huh, I wonder why it's so expensive all of a sudden. Stefano: Yeah, yeah, exactly. Bear that in mind; it's very, very good advice, really. Demetrios: Yeah, that's how I've heard some people put it: when you make those mistakes, like the autoscaling mistake, and it's a little bit expensive, all you did there is pay a dumb tax, or a stupid tax, and that's the tax you have to pay for being stupid. Stefano: Yeah, well, that's a decent price, isn't it? Unfortunately... but let's pass this message on. [00:45:00] We're not perfect. So, to everybody, especially managers and engineers: it's okay, we're human, there's no way around it. I made tons of mistakes. Actually, when I was delivering this shiny, amazing ML platform, our main model wasn't running correctly for a week. Why? Because I forgot to add a flag to the model. A simple flag, just a dash-dash-something to the model, and I said, oh dear. So mistakes are there, even in production. Yeah, it costs some money, but what can we do? Demetrios: What are some other ways that you, or people you've worked with, have had to pay that tax? Stefano: Many people, all the people working in this field, are doing this. And there are always huge problems in production, [00:46:00] like people who left even simple Lambda functions running in the background for standard systems, right? And, gosh, unfortunately they were spinning up God only knows what, because the setup was wrong, and spent like 50,000 in a weekend. Oops. Well, yeah, you log out on Friday. Yeah, it's awkward too.
Well, come on. Gosh, I have some stories I can't tell you, but there are many stories of people with even the simplest autoscaling you can think of, like Dataflow itself, which is also a product I like because it allows you to create data pipelines easily with Apache Beam. It has an autoscaling mechanism, right? But, as you said, study deeply how it works, because you just mess up one number, put [00:47:00] like 100 nodes as the maximum or whatever, the autoscaler doesn't recognize that number, and it decides to scale up to 1,000 nodes. And of course you say, why is there a spike in the cost here? Oh dear! And maybe it's a stupid dash-dash problem again. So, I think these stories are always easy to come by. I actually remember... okay, I don't know if I should say this, but I will. When I started my career as an MLOps engineer, the very first thing I did was also transition from AWS to GCP. And I didn't know exactly how this company was working with GCP. I said, okay, these are my keys in JSON, okay? I don't know, maybe it was during the lockdown, so I wasn't totally in the right shape. I was just on WhatsApp. You were watching videos on WhatsApp while you were deploying? I just did a git add of everything. Shit. I even pushed my GCP keys publicly to GitHub, so [00:48:00] anyone could actually have accessed our GCP environment. I was like, whoops. Thank God this is where the security side of things kicks in, right? They spotted it immediately and said, oh, oh, oh, wait, there's a huge, huge red flag here. And so, well, I'm the first one to say we're human; things happen. Demetrios: Yeah, yeah, that's fun, man. Well, another thing that I'm really excited about is the course you just put together for the MLOps community learning platform. And I think it's pretty special, because you basically walk through the process of standing up a platform. And it's soup to nuts.
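One cheap guard against the leaked-key story is a check that refuses to commit files that look like GCP service-account keys, which are JSON documents with `"type": "service_account"` and a `private_key` field. This is a heuristic sketch, assuming it runs as a pre-commit hook that receives the staged file paths as arguments; the function names are hypothetical, and dedicated secret scanners, plus keeping key files out of the repository with `.gitignore`, are more thorough.

```python
import json
import sys
from pathlib import Path


def looks_like_gcp_key(text: str) -> bool:
    """Heuristic: a JSON object with type == "service_account" and a private_key field."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    return (isinstance(data, dict)
            and data.get("type") == "service_account"
            and "private_key" in data)


def find_key_files(paths):
    """Return the paths (as strings) of .json files whose contents look like keys."""
    flagged = []
    for path in map(Path, paths):
        if path.suffix == ".json" and path.is_file():
            text = path.read_text(encoding="utf-8", errors="ignore")
            if looks_like_gcp_key(text):
                flagged.append(str(path))
    return flagged


if __name__ == "__main__":
    # Assumption: the hook runner passes staged file paths as command-line arguments.
    flagged = find_key_files(sys.argv[1:])
    for p in flagged:
        print(f"refusing to commit likely service-account key: {p}")
    sys.exit(1 if flagged else 0)
```

A non-zero exit code is what makes a Git pre-commit hook abort the commit, so the `git add` of everything would stop before the keys ever reach GitHub.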
It's very thorough. And you're using one of my favorite platforms. You show people how to use Kubeflow, right? Stefano: Yeah, exactly. And also how to install Kubeflow locally, which is a [00:49:00] nightmare. So don't try this at home. Demetrios: You said it. I'm glad you said it and I didn't, because I'm... Stefano: I didn't even say this in the course, but don't try this at home unless you're following my course. So yes, that's pretty much it. The idea of the course is to be an interface for students, or people who are interested in MLOps, who want to give it a try and really start learning MLOps. I think this relates to the discussion we were having about how we can standardize MLOps, and whether we can find common patterns for most of the tasks we have to do. So it covers data processing, model training, and model experimenting, or tracking. Not model deployment, because, as I said, there are multiple solutions. It's meant to be a gentle first course for everybody. The idea is to guide students and people through what the industry wants [00:50:00] and what I've learned so far, in a course that's simple to follow, so they can make the right transition into the MLOps perspective. I spend a lot of minutes in the course explaining what the MLOps perspective is: how we should subdivide each problem from the MLOps perspective. So, data processing, what it means and why; what model training is from the data science point of view and from the MLOps point of view; why we need to track experiments; and so on. It's like me guiding you by the hand throughout the course, saying, hey, be careful here, giving a lot of caveats, but also a lot of freedom. I don't want people to go crazy with quizzes or tests and say, oh, I don't have time.
I want people to enjoy it and practice on their own. We're using the GCP platform, which also comes free if you want to start with it initially. And that's a good [00:51:00] way for you to learn and get your hands dirty, which is my preferred approach, and really start to build things. I also give bonus pipelines, like a pipeline to scrape data from Google, or one that gives you word clouds. So there are many little things, but it's meant to be what's really missing: a gentle introduction to this MLOps world. Demetrios: Well, the cool thing about the ideas you propose there is this: there's the tech part, which is great, but inevitably it's going to change; it's going to be constantly changing as time moves on, and there are going to be better ways to do it. But the mindset and the way you approach these problems, understanding the motives behind the data scientists, the data engineers, the ML platform engineers, each job and responsibility, [00:52:00] and how these folks think about it, is going to be invaluable as you progress in your career. Stefano: Exactly. What I also want to give people is a conceptual, or critical, way of thinking, right? The technology stack is one thing, and we can maybe change it or whatever, but the cool thing is to be critical about the tools you're going to use. And this also relates to our whole conversation. So I'm also showing, for example, Apache Beam: why we should use it, why we should not use it, what the pain points of using it are, pros and cons, and so on. For all the packages, for all the products we have, a good engineer has a structured way of thinking: okay, my business has this problem; I can see on the internet that there are these tools to solve it; why should I [00:53:00] use tool A rather than tool B? And so on.
So this also relates to all the soft skills I was speaking about before. It's all part of a big soup that has huge value. Demetrios: Well, and just thinking about the questions you should be asking: how should I examine this issue that's in front of me? It's funny you mention that, because I'm literally reading a book right now called Bulletproof Problem Solving. It's by a few ex-McKinsey consultants, bigwigs. It talks about breaking down very complex questions or problems into decision trees, understanding them, being able to tell where you need to do research and what kind of outcomes you're looking for, and defining the problems. I'm really loving it. I've heard some people say, ah, but it takes the [00:54:00] creativity out of things. I don't know. So far, I think that might just come down to the way you approach it, and creative people are going to be creative in general. But the idea is sound: how do I approach a problem? How do I know which questions to ask? And when I'm looking at an ML platform and thinking about different tools I want to add, what kind of questions should I be looking at? I think that's one of the reasons so many people like Joe Reis's book: in each section of the data engineering life cycle that he maps out, right, he lays out what you should be thinking about and what questions you should be asking in that piece of the life cycle. So whether it's ingestion or transformation, how you answer those questions gives you a few constraints, [00:55:00] and that makes your decisions a lot easier. Because if you're just going off of, oh well, it's new, it's got a lot of attention around it, I've heard good things from people about XYZ tool, that's not going to be the most adequate way of doing it. And so, yeah, man, I appreciate you creating the course.
Hopefully a bunch of people out there go and check it out and learn a ton from you. This has been really fun. Stefano: It was really nice. I hope people will like it. And they can also write a comment saying, Stefano, you're being too serious, please add some fun to the course. There might be some points where I was very serious, but when it comes to explaining Kubeflow and how to install it, you need to keep bracing yourself. It's a bit risky. Demetrios: So if this is the 101 course, what's going to be the 202 course? What's the next edition? Stefano: Gosh, God only knows, but definitely something about... okay, everyone is trying to do something about [00:56:00] LLMs. So it might be training LLMs, dealing with LLMs; we can explore that path. It might be either LLMs or something very, very technical, like Infrastructure as Code. Demetrios: Nice. I like it. Stefano: Terraform. Please, guys, Terraform. We do need it, we do need it. It's your best friend. We don't want our ML platform getting screwed up. Yeah. Go to the materials page; we can talk about it. Demetrios: I love it. And if anybody has any other ideas for courses, just hit us up in the comments, because we're building out the MLOps community learning platform. And I think it is a great excuse to get incredible people like yourself in the community to create courses for the platform and disseminate the knowledge among the other community members. Stefano: You're too kind. Well, it's a way to give back to the community, because there are mutual benefits: I also learn from the [00:57:00] Slack channel. Demetrios: Boom. There we go.