Vision Pipelines in Production: Serving & Optimisations

Posted Feb 22, 2024 | Views 625

# Vision models

# upscaling pipelines

# Finetuning

speakers

Biswaroop Bhattacharjee

Senior ML Engineer @ Prem AI

Biswaroop Bhattacharjee is a Senior ML Engineer at Prem AI, hacking with LLMs, SLMs, Vision models and MLOps in general. Biswaroop has also worked on ML platforms and distributed systems, with stints at startups in Conversational VoiceAI @ Skit.ai, Chatbots from the pre- and current LLM era @ circlelabs.xyz and a bit of Fashion Hyperforecasting @ Stylumia.ai.

Demetrios Brinkmann

Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.

SUMMARY

The discussion will center on transitioning from solution development to production, particularly focusing on vision models. Topics explored include fine-tuning LoRAs, upscaling pipelines, constraints-based generations, and step-by-step enhancements to achieve optimal performance and quality for a production-ready service.

TRANSCRIPT

AI in Production

Slides: https://docs.google.com/presentation/d/1vV1udNuEQCtaPXAu2g0d0jIVBVNP0KO2/edit?usp=drive_link&ouid=112799246631496397138&rtpof=true&sd=true

Demetrios [00:00:00]: To go now to my man Biswaroop. Where you at, dude? There he is. How you doing, bro?

Biswaroop Bhattacharjee [00:00:07]: Yep, doing good. How are you doing?

Demetrios [00:00:09]: I'm great. So you've got a little talk for us. I'm gonna share your screen. It's already up and ready and we're gonna kick it off. I'll be back in ten minutes. Sound good?

Biswaroop Bhattacharjee [00:00:22]: Yep.

Demetrios [00:00:23]: See you soon.

Biswaroop Bhattacharjee [00:00:27]: Hey, everyone. I am Biswaroop Bhattacharjee. So I'm working as a senior ML engineer at Prem. And first of all, thanks a lot for organizing this event, Demetrios and the whole team, actually. And there have been some amazing talks lined up, which I was listening to just a while back. Even the talk just before was very relevant, on deploying LLMs in production in general. Now I will kind of do a similar take, but on the vision side, I would say. But it's not fully similar because it's a little bit different, I guess.

Biswaroop Bhattacharjee [00:01:07]: But yeah, it was quite relevant. Okay, I guess we can start. So I'll be just talking about vision pipelines in production and how you basically serve and optimize them in general. So let's start. Okay, so this is just a quick rundown of the things I'll discuss. So we'll first look at the problem we are trying to solve and then we will look at the solutioning bit a little bit. So how basically you are going to solve the problem, though it's only one of the ways to solve the problem. And next we will cover some tricks and low hanging fruits which anyone can actually target for their vision related pipelines, I think, for increasing their performance and serving.

Biswaroop Bhattacharjee [00:01:56]: And then we'll take things a bit further on the performance side, we'll discuss that. Okay, so here you can see there is an image on the left side. So this is like an objective or the problem statement in our case, you can imagine. So something like, say, we have to do realistic and ethereal planet image generation. So now we have to generate, say, millions of images. But they should follow a few criteria. And you can see here, there are a few criteria I've already listed. So there should be a small space at both the sides and there should be a small height space also.

Biswaroop Bhattacharjee [00:02:38]: And planets should be only coming under this region. And also there should be a few controllable parameters. Something like what the color of the planet should be, as the planet in here is kind of bluish and whitish. But yeah, there can be lots of other physical attributes, like how is the sky? Does it have any nebula? Or are there any rings around the planet? Or what is the atmosphere like? And there's a last thing which is kind of important here, that all of these images should follow a particular aspect ratio, which is 21:9, or the ultra wide scale. So yeah, these are the kind of constraints which we will try to tackle, I guess. Okay, so I will just say that fine tuning can be one of the solutions. But why fine tuning? Why not just do any other approaches? Something like just basic generation with, basically, more configurability. But it doesn't quite work for our use case.

Biswaroop Bhattacharjee [00:03:40]: So you can see how the image is generated at the top left. So it's like a plain simple generation and it's not very consistent with our requirements and it doesn't give us a really high level of control. But now since we want a good amount of control, then obviously I think, what's better than fine tuning, if you can actually collect a small data set? Let's kind of, I guess, discuss how you would actually try to solve the fine tuning task here. So something like you can come up with a prompt creation strategy. So you can see that it's a large text, but it's an ordered set of text. So it's not like any random text. So right now it starts with a color. So it's kind of for this image which you can see here.

Biswaroop Bhattacharjee [00:04:31]: So it's a deep blue planet and there can be things like frosted cloudy zones, so the atmosphere comes in the mid part of the prompt. Then there can be things like no moons or how many moons. Also we can write something about the sky, something like distant, starry, dark sky and all. You come up with a prompt creation strategy and you kind of curate a small high quality data set. So you can use anything for that, like Photoshop, Midjourney, et cetera. Just come up with a small number of images, and they should cover a good variance of the attributes you want to fill in with your prompts. Yeah, so here at the bottom left you can see a small snippet of the images. And after that you can maybe choose a model based on how GPU poor or GPU rich you are.
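The ordered prompt strategy described above (color first, then atmosphere, then moons, then sky) can be sketched roughly as follows. This is only an illustration: the `PlanetSpec` class, its field names, and the comma separator are assumptions made here, not the format actually used in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanetSpec:
    # Hypothetical attribute names; the talk only fixes the idea of a stable ordering.
    color: str
    atmosphere: str
    moons: str
    sky: str


def build_prompt(spec: PlanetSpec) -> str:
    """Join the attributes in one fixed order so every caption has the same layout."""
    parts = [f"{spec.color} planet", spec.atmosphere, spec.moons, spec.sky]
    # Skip attributes that were left empty instead of emitting dangling commas.
    return ", ".join(p.strip() for p in parts if p.strip())


example = PlanetSpec(
    color="deep blue",
    atmosphere="frosted cloudy zones",
    moons="no moons",
    sky="dark sky",
)
print(build_prompt(example))
```

Keeping the same attribute order for both the training captions and the prompts sent at inference time is what would make each slot behave like a controllable parameter.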

Biswaroop Bhattacharjee [00:05:27]: So it totally depends. And yeah, I guess let's quickly move to the results bit. But I mean, fine tuning while it happens, it's kind of boring. Let's assume it went well and now we have the results. Now the key things come up at the time of productionizing things, right? So you can see the results in here. I'm showing that before fine tuning was like this, but after fine tuning it was quite aligned with the things we wanted. So these are just like few example images, how it looks like. So it's kind of the hemispherical shape which we required.

Biswaroop Bhattacharjee [00:06:03]: And one thing we actually talked about was the aspect ratio it should follow, the 21:9 aspect ratio. But here, the model I've actually used, or anyone can use, is Stable Diffusion XL. Now there are various other vision models, so feel free to go with your own model, whatever you want. But actually Stable Diffusion XL was trained on only specific aspect ratios of images. So they're kind of listed in here, and this is from the Stable Diffusion XL paper, basically. So if we talk about the aspect ratio, in our case it was 21:9, which comes out to be about 2.3, if we see. And the closest aspect ratio which was used for training Stable Diffusion XL was 2.4.

Biswaroop Bhattacharjee [00:06:57]: And it had, like, dimensions of 1536 and 640. So we will basically use this dimension, which is 1536 by 640, for generating an image so that we get the highest quality possible from the model itself. Now, just simply tuning a model and generating a final image, I feel, is not enough. We can do much more, we can improve it a lot from there. So here comes things like custom upscaling pipelines. So why custom upscaling pipelines? It's basically to fill in all the special nitty gritty details and give it a lot more quality in general, I would say. Okay, let's look at them, I guess. Cool.
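The aspect ratio reasoning above is simple arithmetic and can be checked with a short sketch. The size list below is an illustrative subset of commonly cited Stable Diffusion XL resolutions, written here as (long side, short side); it is an assumption for this example, and the full table is in the paper the speaker refers to.

```python
# Illustrative subset only (assumption); see the SDXL paper for the full list.
SDXL_SIZES = [(1024, 1024), (1152, 896), (1216, 832), (1344, 768), (1536, 640)]


def closest_size(target_ratio, sizes=SDXL_SIZES):
    """Return the (long, short) size whose ratio is nearest to target_ratio."""
    return min(sizes, key=lambda s: abs(s[0] / s[1] - target_ratio))


target = 21 / 9                      # ultra wide target, roughly 2.33
best = closest_size(target)
print(round(target, 2), best, best[0] / best[1])  # 2.33 (1536, 640) 2.4
```

Generating at a size the model was trained on and cropping or resizing to the exact 21:9 frame afterwards is one way to read the approach described in the talk.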

Biswaroop Bhattacharjee [00:07:42]: So here is what a small upscaling pipeline looks like. So this is a very creative area, I feel like. So go full nuts on it. Whenever you try to implement your own upscaling pipeline, there are a lot of ways and there's no correct way. I feel currently everyone's still figuring it out. And yeah, currently how it looks is you just take an upscaler model, you upscale it 2x, then resize it back to whatever you wanted the actual size to be. Then you pass it through an image to image pipeline. This image to image pipeline is basically to give it all the details your image was missing.

Biswaroop Bhattacharjee [00:08:18]: Now then you again upscale it and then resize it back to your requirement. So this is a kind of pattern you can see happening in here. And in general it works quite well. I'll show you the results on the next slide. And there is actually also a good public community where people have been sharing lots of workflows or pipelines in general. So feel free to check out Comfy Workflows in general. They have been doing quite well in this area. So yeah, that's a shout out.
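The pattern described above (upscale, resize back, image to image, upscale again, resize back) can be sketched as a function that takes each stage as a pluggable callable. Everything here is a stand-in: the `refine` name and the fake stages are hypothetical, and a real pipeline would pass in an actual upscaler model and an image to image model instead.

```python
from typing import Callable, List, Tuple

Size = Tuple[int, int]
FakeImage = Tuple[int, int, List[str]]  # (width, height, steps applied so far)


def refine(image, target_size: Size, upscale: Callable, resize: Callable, img2img: Callable):
    """Upscale -> resize -> img2img -> upscale -> resize, in that order."""
    image = upscale(image)              # add pixels with an upscaler model
    image = resize(image, target_size)  # bring it back to the size we actually want
    image = img2img(image)              # let a diffusion pass fill in missing detail
    image = upscale(image)              # second upscale on the detailed image
    return resize(image, target_size)   # final resize to the required output size


# Fake stages that only track size and history, so the sketch runs without any model.
def fake_upscale(img: FakeImage) -> FakeImage:
    w, h, steps = img
    return (w * 2, h * 2, steps + ["upscale"])


def fake_resize(img: FakeImage, size: Size) -> FakeImage:
    return (size[0], size[1], img[2] + ["resize"])


def fake_img2img(img: FakeImage) -> FakeImage:
    w, h, steps = img
    return (w, h, steps + ["img2img"])


result = refine((1536, 640, []), (1536, 640), fake_upscale, fake_resize, fake_img2img)
print(result)
```

Passing the stages in as callables keeps the ordering in one place, which makes it easy to experiment with different upscalers or extra passes, in the spirit of "there's no correct way".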

Biswaroop Bhattacharjee [00:08:47]: And yeah, here are some results. How a normal upscaling can make a difference in your images. You can see how much of a smudged effect this gives. But if you check this image, this is way more defined, with lots more magma craters and all. And similarly for the image below. You can see how different the nebula is. This is a bit smudged, but this is very distinct. Okay, so we talked about the quality of the model in general.

Biswaroop Bhattacharjee [00:09:20]: Now what about the performance? So how basically you can target the performance, mostly the latency and throughput, of your pipeline in general. Okay, so if you are fine tuning a model, mostly you are not doing, I guess, full fine tuning, you might be using some kind of parameter efficient techniques. I mean, obviously you can go with full fine tuning also, but it depends on you. So if you are using any parameter efficient techniques, then we have LoRAs, right? So they are quite popular right now, and I'd kind of suggest to always fuse LoRAs. So if you have LoRAs in general, it helps quite a bit to fuse them with the model. So it helps you with the improvement in the latency. Then always try to use torch.compile whenever it's possible because compiled CUDA graphs are amazing here. And try to use quantized models as much as you can.
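Why fusing a LoRA helps latency can be shown with a tiny numerical sketch. A LoRA adds a low-rank update `B @ A` next to a base weight `W`; fusing folds that update into `W` once, so inference needs one matrix multiply per layer instead of three. The matrices, the `alpha` scale, and the helper names below are made-up values for illustration, written in plain Python rather than any library's API. In Hugging Face diffusers the corresponding pipeline call is `fuse_lora()`.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, s):
    return [[x * s for x in row] for row in a]


W = [[1.0, 2.0], [3.0, 4.0]]   # base weight (2x2), toy values
B = [[1.0], [0.5]]             # LoRA "up" matrix (2x1), rank 1
A = [[0.2, -0.4]]              # LoRA "down" matrix (1x2)
alpha = 0.5                    # LoRA scale (assumed value)
x = [[1.0], [2.0]]             # input column vector

# Unfused: base path plus LoRA side path, three matmuls at inference time.
unfused = add(matmul(W, x), scale(matmul(B, matmul(A, x)), alpha))

# Fused: fold the update into the weight once, then a single matmul per call.
W_fused = add(W, scale(matmul(B, A), alpha))
fused = matmul(W_fused, x)

print(unfused, fused)
```

Both paths give the same output up to floating point rounding, which is why fusing is a safe optimization when the LoRA does not need to be swapped at request time.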

Biswaroop Bhattacharjee [00:10:21]: So I know it's not always possible to use quantized variants, but yeah, that gives us a lot of boost, if you see the graph on the right side. And also another shout out to a very cool repository called stable-fast. They have actually been handling a lot of the things which I just mentioned internally. So all you have to do is just use it and be done with it, I guess. But there are a few caveats which you can check, like it works best with Hugging Face's diffusers library, if you're using it. But yeah, you can actually make it work with others also. And you can obviously go with serving frameworks like Ray Serve, something I've been using, and I find it quite useful. So it helps with a lot of things, specifically improving your throughput in general.

Biswaroop Bhattacharjee [00:11:09]: So it helps you with direct out of the box micro batching support. And you can also do things like model multiplexing and all those sorts of things, and lots of other things it supports. And before concluding, we at Prem actually started a grant program. So feel free to apply for it wherever you're working. So there are a lot of things we are providing. So like free fine tuning jobs, free ML model deployments, free ML ad hoc support or any kind of thing you would like. And you can read more about it obviously in this link. And key takeaways. So finally, circling back, it doesn't matter if your data set is small, focus on the quality of the data set.

Biswaroop Bhattacharjee [00:11:58]: When I say quality, it should be both image and prompt, in the case of vision, I guess. Then obviously target the low hanging fruits, something like what I talked about, and optimize the overall pipeline performance, and parallelize and batch things as much as possible. Since you are deploying a whole pipeline and not just a single model, there are lots of opportunities to parallelize and batch, I feel like. So we should actually be targeting them to improve the throughput and latency in general. And yeah, go nuts with custom upscaling pipelines because they can really make the difference which will make your generation stand out, basically, is what I feel. Thank you. Yeah, that's all I had, I guess.
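The micro batching idea from the talk (collect requests for a short window, then run the model once on the whole batch) can be sketched with plain `asyncio`. This is a toy illustration of the concept, not how any serving framework implements it; the `MicroBatcher` class, its parameters, and the `fake_model` function are all hypothetical, and error handling is left out.

```python
import asyncio


class MicroBatcher:
    """Groups concurrent requests into batches of up to max_batch_size."""

    def __init__(self, run_batch, max_batch_size=4, max_wait_s=0.05):
        self.run_batch = run_batch          # callable: list of inputs -> list of outputs
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.queue = asyncio.Queue()
        self.worker = None

    async def submit(self, item):
        if self.worker is None:             # start the background loop lazily
            self.worker = asyncio.create_task(self._loop())
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut                    # resolves once the item's batch has run

    async def _loop(self):
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]        # wait for the first request
            deadline = loop.time() + self.max_wait_s
            while len(batch) < self.max_batch_size:
                timeout = deadline - loop.time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            outputs = self.run_batch([item for item, _ in batch])
            for (_, fut), out in zip(batch, outputs):
                fut.set_result(out)


batch_sizes = []


def fake_model(prompts):
    batch_sizes.append(len(prompts))        # record how many requests shared a call
    return [p.upper() for p in prompts]


async def main():
    batcher = MicroBatcher(fake_model)      # created inside the running event loop
    results = await asyncio.gather(*(batcher.submit(p) for p in ["a", "b", "c"]))
    batcher.worker.cancel()
    return results


results = asyncio.run(main())
print(results, batch_sizes)
```

The trade-off is between `max_wait_s`, which adds a little latency to each request, and the batch size, which improves GPU utilization and throughput; a serving framework with built-in batching exposes similar knobs.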

Demetrios [00:12:44]: Biswaroop, my man. All right dude, excellent. Thanks so much for that. I just threw the link for the Prem grant program into the chat, so if anybody wants to do that and get involved, please click on that, apply, all that fun stuff. It is really cool to see and I love following your journey and seeing what Prem is up to. It seems like you've been doing some amazing work and I appreciate you sharing it with us, bro.
