MLOps Community

From Notebook to Kubernetes: Scaling GenAI Pipelines with ZenML // Alex Strick van Linschoten // DE4AI

Posted Sep 17, 2024 | Views 325
Alex Strick van Linschoten
ML Engineer @ ZenML

Alex Strick van Linschoten is an ML engineer at ZenML. He has three cats and lives in Delft (NL).

SUMMARY

This lightning talk demonstrates how ZenML, an open-source MLOps framework, enables seamless transition from local development to cloud-scale deployment of generative AI pipelines. We'll showcase a workflow that begins in a Jupyter notebook, with data processing steps run locally, then scales up by offloading intensive training to Kubernetes. The presentation will highlight ZenML's Kubernetes integration and caching features, illustrating how they streamline the development-to-production pipeline for generative AI projects.

TRANSCRIPT

Demetrios [00:00:07]: Alex is coming at us from ZenML. Where you at, Alex? There he is. Hey.

Alex Strick van Linschoten [00:00:13]: Hey. How's it going?

Demetrios [00:00:14]: I'm great, man. How you doing?

Alex Strick van Linschoten [00:00:17]: I'm good, I'm good.

Demetrios [00:00:18]: Well, I know you got a presentation for us. I also am here just to keep us on time, keep everything going smoothly. So I'm going to share your screen, bring it to the stage right now and then we'll get rocking and rolling. Actually, no, you don't even have a present. You took the bold route, didn't you?

Alex Strick van Linschoten [00:00:39]: Just going full live demo today.

Demetrios [00:00:42]: All right, everybody pray to the demo gods this works. I'll leave you to it, Alex. Thanks, man.

Alex Strick van Linschoten [00:00:48]: Thanks, Demetrios. Hey, everyone, thanks for coming. There have been a great bunch of talks so far, and a bunch on this theme of getting from experimentation, working as a data scientist and so on, to getting things into production. So that's what I wanted to talk about: ZenML. ZenML is an open-source framework and tool that allows people to work within teams and get their machine learning workflows into production. We integrate with a whole bunch of different tools, 50 or more out of the box, fully customizable, and you can run your workflows on all the major clouds.

Alex Strick van Linschoten [00:01:27]: All of this is taken as a given, and orchestration is at the heart of what ZenML is doing. We allow you to make your pipelines reproducible across your organization and within your team. This lets you iterate quickly and track everything you're doing, so you can rewind and go back to a previous run, or whatever it was you were working on two weeks ago that your boss suddenly needs to see, while minimizing the cost of switching stacks and so on. But what I wanted to talk about today is a recently improved feature: doing work within notebooks. So I'll start with a few high-level ZenML concepts, and then we'll get into an actual GenAI use case. With ZenML you have stacks, and a stack is basically a collection of the tools and components that you have. You start out with the local one, obviously, and this just runs on your local machine, wherever that is.

Alex Strick van Linschoten [00:02:25]: You can have a bunch of steps, and steps make up your pipeline. You compose those steps, which are basically just Python functions, as you can see on the screen. Running them is pretty easy, and you put them together in a pipeline. But let's say you have something a little more involved and you actually want to run it on a GPU, or on a really powerful machine with a lot of storage or memory. You can just switch stacks. We have a CLI command for this, and you can also do it from within the notebook, as I do here. I have a Kubernetes stack set up with a powerful GPU machine, and we can run the same pipeline on that Kubernetes stack. And that all happens really quickly. I ran it just ahead of this presentation, but I promise you it happens in, well, look, we can see how long it took.
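To make that concrete, the steps and pipeline on screen look roughly like the sketch below. The step names, paths, and stack names are placeholders rather than the exact demo code.

```python
# Minimal sketch of a ZenML pipeline; step names, paths, and stack names are assumptions.
from pathlib import Path

from zenml import pipeline, step


@step
def load_images(data_dir: str) -> list:
    """Collect image file paths from a local folder."""
    return [str(p) for p in Path(data_dir).glob("*.jpg")]


@step
def count_images(paths: list) -> int:
    """Trivial downstream step so the pipeline has more than one node."""
    return len(paths)


@pipeline
def demo_pipeline(data_dir: str = "data/cats"):
    paths = load_images(data_dir)
    count_images(paths)


if __name__ == "__main__":
    # Runs on whatever stack is currently active; switching with e.g.
    # `zenml stack set <your-kubernetes-stack>` sends the exact same
    # pipeline to a GPU machine on Kubernetes.
    demo_pipeline()
```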

Alex Strick van Linschoten [00:03:16]: It took 1 minute, 23 seconds, so pretty fast to run. But what I want to talk about today is a little GenAI example, just to show how you can experiment and iterate in a notebook and do something that could then get you going into production. For all machine learning, we have to do a little bit of annotation, so we're going to do some live annotation within our notebook. I have two cats. One of them is called Blupus. This is Blupus, and this is Aria. And we're just going to annotate some data, because I actually only want the pictures of Blupus.

Alex Strick van Linschoten [00:03:44]: So we need to separate out our photos. This is an admittedly simplistic kind of data annotation, but it proves the point. ZenML integrates with a whole bunch of different annotation platforms and tools, and this is just one that happens to work within a notebook. And there we go. We've done a bunch of images, we can save the labels, and these will show up in our little annotations folder here. And then I actually want to separate out the photos.
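The separation he describes is essentially reading that labels file and copying the matching images into their own folder; a rough sketch, where the JSON layout, label value, and folder names are assumptions:

```python
# Hypothetical sketch of splitting out one cat's photos from a saved labels file.
# The annotations format (filename -> label) and all paths are assumptions.
import json
import shutil
from pathlib import Path

annotations = json.loads(Path("annotations/labels.json").read_text())

target_dir = Path("data/blupus")
target_dir.mkdir(parents=True, exist_ok=True)

for filename, label in annotations.items():
    if label == "blupus":
        shutil.copy(Path("data/raw") / filename, target_dir / filename)
```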

Alex Strick van Linschoten [00:04:16]: So if we look at these annotations, it's really just a JSON file with a bunch of strings pairing images and labels, but you can obviously have more complicated annotation workflows. Then we'll move those particular Blupus images into their own folder. And then what I actually want to do is augment these a little bit, because for my DreamBooth-style Flux model fine-tuning I don't have quite enough images. So I'm just going to do a few things here: add some random brightness, some noise, sharpen a little, and so on, just to augment our data. And then we're going to use a ZenML step to copy these images into the cloud. ZenML, I can show you on the dashboard in a second.
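The augmentation could look something like this with Pillow and NumPy; the specific transforms and parameters used in the demo may well differ.

```python
# Rough sketch of simple image augmentation for a small DreamBooth-style dataset.
# Parameters and file layout are assumptions, not the demo's exact values.
import random
from pathlib import Path

import numpy as np
from PIL import Image, ImageEnhance, ImageFilter

src_dir = Path("data/blupus")
out_dir = Path("data/blupus_augmented")
out_dir.mkdir(parents=True, exist_ok=True)

for path in src_dir.glob("*.jpg"):
    img = Image.open(path).convert("RGB")

    # Random brightness jitter
    img = ImageEnhance.Brightness(img).enhance(random.uniform(0.8, 1.2))

    # Light Gaussian noise
    arr = np.asarray(img).astype(np.float32)
    arr += np.random.normal(0, 8, arr.shape)
    img = Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))

    # Slight sharpening
    img = img.filter(ImageFilter.SHARPEN)

    img.save(out_dir / f"aug_{path.name}")
```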

Alex Strick van Linschoten [00:05:00]: It's going to start the pipeline. Yeah, so the copy-images step ran here, and all of our images are versioned and so on, so we can use them in our actual pipeline. We did all of that locally. Now I want to do something in the cloud, because even though I have a powerful GPU machine sitting next to me, it's not quite powerful enough to do the actual fine-tuning of Flux. The Flux model is pretty big, this Flux dev monster. We do have some code here, and it looks a bit long and gnarly, but it's just stuff we copied from the diffusers library. They have a bunch of really neat examples.

Alex Strick van Linschoten [00:05:43]: I definitely would encourage you to check them out. And this stuff could live in your notebook. I actually pulled it out into a separate train.py file for convenience, and you can see here, I just import it, but you can shove it in your notebook as well. That's all fine; it's more for convenience that I've pulled it out. Then we're going to actually run this pipeline. It's running in Kubernetes right now.
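Wrapping code that lives in train.py inside a ZenML step might look like the sketch below; `train_dreambooth_flux` and its arguments are hypothetical stand-ins for the diffusers-based script, not the actual demo code.

```python
# Sketch of wrapping externally defined training code in a ZenML step.
# `train_dreambooth_flux` and its arguments are hypothetical placeholders.
from zenml import pipeline, step

from train import train_dreambooth_flux  # lives next to the notebook


@step
def train_model(instance_data_dir: str, output_dir: str) -> str:
    """Run the heavy fine-tuning; executed remotely when a Kubernetes stack is active."""
    train_dreambooth_flux(
        instance_data_dir=instance_data_dir,
        output_dir=output_dir,
    )
    return output_dir


@pipeline
def dreambooth_pipeline():
    train_model(
        instance_data_dir="data/blupus_augmented",
        output_dir="flux-blupus",
    )


if __name__ == "__main__":
    dreambooth_pipeline()
```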

Alex Strick van Linschoten [00:06:08]: You can see here, if we go to our runs, it's downloading the code and so on from ZenML. ZenML is versioning all of the code under the hood, as well as all of the artifacts and things that you're working on. Here our train-model run showed up, and you can see the stack it's running on. You have the specific code that was used for this run, all versioned and tracked. You can view the logs, you can see any configuration, you can see the pip packages that were used, as well as any outputs and things.

Alex Strick van Linschoten [00:06:46]: This pipeline doesn't actually output anything, and in fact it's completed its run. You see, it's a bit smart: it's using a cached version of the model. If I were to change any of this code, it would realize it needs to invalidate that cache and actually run it properly. But yeah, I realized we didn't want to stand here for half an hour while this trains, even though we have a fast machine. I actually set up DreamBooth training with Stable Diffusion as well, to have something to compare it to. I already ran that earlier, so we don't need to run it, but it's sitting here in the DreamBooth pipeline.
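The caching behaviour he describes is controlled per step or per pipeline: if the code and inputs are unchanged, ZenML reuses the previous result, otherwise it re-executes. A small sketch (the step bodies are placeholders):

```python
# Sketch of ZenML's caching controls; the step bodies are placeholders.
from zenml import pipeline, step


@step(enable_cache=True)
def prepare_data() -> list:
    # Reused from the previous run as long as code and inputs are unchanged.
    return ["img_001.jpg", "img_002.jpg"]


@step(enable_cache=False)
def train_model(paths: list) -> None:
    # Always re-executed, e.g. because training is not deterministic.
    ...


@pipeline(enable_cache=True)
def training_pipeline():
    train_model(prepare_data())
```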

Alex Strick van Linschoten [00:07:18]: The SD one is a similar kind of thing. Once these are done, we had it automatically push the models to the Hugging Face Hub. This is convenient; a lot of people use it as a de facto model registry. And both of these models you can run inference on right there within Hugging Face. But what I want to do is get a sense of how well these models did on their own. So I'm going to set this working again on our Kubernetes instance, and it's going to do a bunch of inference to see how well my model has trained with these few images of my cat. And we've got important extra images of Blupus sitting on a laptop keyboard and Blupus in a data center.
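Pushing the finished fine-tune to the Hugging Face Hub can be done with the huggingface_hub client; the repo id and local folder below are assumptions about how such a push step might look, not the demo's actual code.

```python
# Sketch of pushing a trained model folder to the Hugging Face Hub, used here
# as a lightweight model registry. Repo id and local path are assumptions.
from huggingface_hub import HfApi

api = HfApi()  # expects a token via `huggingface-cli login` or HF_TOKEN
api.create_repo("your-username/flux-blupus", exist_ok=True)
api.upload_folder(
    folder_path="flux-blupus",               # local training output directory
    repo_id="your-username/flux-blupus",
)
```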

Alex Strick van Linschoten [00:08:02]: These are photos that I'll probably never be able to take, so it's important that we capture them. It's important that we capture the prompts as well as the images that come out. So this is all running again on my Kubernetes instance, and we can check it back out in the ZenML dashboard. Here we have the inference pipeline. Again, we ran this before, so it's just going to use the cached version, and we can take a look at the output, actually.
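The inference step amounts to loading the fine-tuned pipeline and generating an image per prompt; the sketch below assumes the full fine-tuned pipeline was pushed to the Hub, with the repo id, prompts, and generation settings as placeholders.

```python
# Sketch of batch inference against the fine-tuned model; repo id, prompts,
# and generation settings are assumptions, not the demo's exact values.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "your-username/flux-blupus",
    torch_dtype=torch.bfloat16,
).to("cuda")

prompts = [
    "a photo of blupus cat sitting on a laptop keyboard",
    "a photo of blupus cat in a data center",
]

for i, prompt in enumerate(prompts):
    image = pipe(prompt, num_inference_steps=30).images[0]
    image.save(f"inference_{i}.png")
```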

Alex Strick van Linschoten [00:08:33]: So this visualization gathers a bunch of these images together, all from within the ZenML dashboard. You can see here Blupus in the data center, Blupus, I think, in the server room, Blupus wearing some VR goggles. Yeah, all important images. Nice work, Flux. And again, if we wanted to run these locally, we can just run them here within our pipeline.

Alex Strick van Linschoten [00:09:04]: Sorry, from the notebook, from within our local system. And this is the Stable Diffusion model. As you can see, it's not quite as good as Flux; Flux is the latest thing. It's not quite capturing what he looks like, but it's done an OK job. And of course we can do the same with Flux. And, yeah, I just realized that Flux is actually going to crash, because we're going to run out of memory.

Alex Strick van Linschoten [00:09:33]: So, yeah, just run Flux with all of these hyperparameters and so on. If we wanted to capture all of this, as well as the prompts, we could again run it as a ZenML pipeline, farm it off to Kubernetes and have it run there. I think we hit a CUDA out-of-memory error. Yeah, there we go. This is going to run again in a second, and we get the Flux model, which did it properly. And you can see the convenience of switching between something that lives in a notebook, the classic thing data scientists are happy and comfortable working with, with that experimental feel to it, while actually getting the benefits of being in production. This is, again, Blupus in the data center.

Alex Strick van Linschoten [00:10:28]: Yeah, with the dashboard and all of the things that ZenML is supporting and doing under the hood, you're getting the benefits of a production system. And this is really pleasant for data scientists as well as the ML engineers or ML platform team who want to support you. And as I hope I showed, you're basically just writing normal Python. If you wanted to switch from running this on Kubernetes to running it on Vertex, you wouldn't need to change any of your code; you would just need to switch stacks, as sketched below. So, yeah, that's what I wanted to talk about today. I hope you enjoyed a little quick fine-tuning of everyone's favorite model these days. Please do check out ZenML.
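For reference, switching where the same pipeline runs without touching its code might look like this; the stack names are made up, and this assumes the ZenML client exposes `activate_stack` alongside the `zenml stack set` CLI command.

```python
# Sketch of retargeting the same pipeline by switching the active stack.
# Stack names are assumptions; they'd be whatever you registered, e.g. with
# `zenml stack register ...` / `zenml stack set ...` on the CLI.
from zenml.client import Client

client = Client()

client.activate_stack("k8s_gpu_stack")   # subsequent runs go to Kubernetes
# demo_pipeline()

client.activate_stack("vertex_stack")    # same code, now orchestrated on Vertex AI
# demo_pipeline()
```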

Alex Strick van Linschoten [00:11:14]: We're an open-source library, so check us out on GitHub. Go to zenml.io if you want a high-level overview of what we're doing. And we'd love to see you in Slack, training your own Flux models and so on.

Demetrios [00:11:27]: Excellent. So I think the key takeaway here, because some people were asking in the chat, "All right, cool, notebooks are fun, but not production, right?" But I think the thing you're saying is you can swap really easily from notebooks to something more production-grade.

Alex Strick van Linschoten [00:11:53]: Yeah, and the thing is, it's not that we encourage people to live only in a notebook. Once you've designed your code within a notebook, it's pretty easy to pull it out into normal Python modules. It's just about letting people work where they're comfortable, and then you can get to production easily.

Demetrios [00:12:16]: Excellent. Well, if anyone has more questions for you, Alex, I encourage them to reach out and talk with you in either the MLOps Community Slack or the ZenML Slack. That's really cool. I'm super happy for you all and I look forward to more fun stuff, but for now, we're going to keep it moving.
