MLOps Community

Accelerate ML Production with Agents

Posted Mar 06, 2024 | Views 995
# ML Production
# LLMs
# RemyxAI
SPEAKERS
Salma Mayorquin
Co-Founder @ Remyx AI

Salma is a co-founder of Remyx AI, leading the development of agent-guided MLOps. Previously, she worked at Databricks where she helped customers architect their ML infrastructure. She also runs a research blog, smellslike.ml, where she shares and open sources experiments in applied ML.

Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.

SUMMARY

Large language models (LLMs) can unlock great productivity in software engineering, but it's important to acknowledge their limitations, particularly in generating robust code. This talk, "Accelerate ML Production with Agents," discusses applying the abstraction of LLMs with tools to tackle complex challenges. Agents have the potential to streamline the orchestration of ML workflows and simplify customization and deployment processes.

TRANSCRIPT

Accelerate ML Production with Agents

AI in Production

Slides: https://docs.google.com/presentation/d/1hunjT9jBavE6ijM8iZeSEPL_R-g49xbd/edit?usp=drive_link

Demetrios [00:00:05]: And our next one up is Salma. Where you at, Salma? Hello.

Salma Mayorquin [00:00:13]: Hi. Hi. Pleasure to be here.

Demetrios [00:00:15]: I am so excited to chat with you. And we are like 20 minutes behind schedule, as you know. So I'm just gonna hand it over and we'll keep it rocking. Feel free to jump in, share your screen, all that fun stuff, and then we'll get moving.

Salma Mayorquin [00:00:36]: All right, well, we'll keep it rocking. I like the attitude.

Demetrios [00:00:40]: That's what we're doing here. That's it. I'll see you in ten minutes.

Salma Mayorquin [00:00:46]: Fantastic. All right. Well, hi, everyone. My name is Salma Mayorquin. I am one of the co-founders of Remyx AI, and today I wanted to talk to you about accelerating ML production with agents. Hopefully this is going to be an exciting and interesting talk for y'all. I kind of wanted to start off with a shameless plug, but also a way for us to illustrate what we mean by agents and what they might look like in action.

Salma Mayorquin [00:01:11]: At Remyx AI, we're actually using an agent to help streamline the development of machine learning applications by integrating workflows all the way from data synthesis to data curation, training, fine-tuning, and deployment. And I think the best way to illustrate that is through a quick little demo, just to show you what we mean. At Remyx, we have a little chatbot that you can go ahead and talk with. You can explain what kind of project you're working on, and through that vague conversation you have with the Remyx agent, it'll understand how to pick out and prepare a recipe for you in terms of how to prepare a data set, how to fine-tune the appropriate architecture, and then how to deploy it. This example shows me trying to make a dog detector so that I can understand when my dog is in her tent, and then deploy that on a microcontroller. So it's selecting a really small architecture so that I can go ahead and flash the binary onto my Arduino, for example, and then turn her heater on and off whenever she's actually in there. So it's a cute little example of how, with a couple of clicks and just a conversation, we can actually make a model from scratch, making it really easy for a lot of folks, whether you're an expert or just a novice.

Salma Mayorquin [00:02:23]: And if you take anything from this conversation, hopefully you take a lot. But what we want to impart on you is that we think, and through our experiments in the last year we've found, that generative AI has great potential to reduce the cost and complexity of ML deployments. I'm sure all of you have already had your war stories or your battle stories about what it takes to actually bring one of these machine learning applications into production. And we think there's a couple of areas where this could be really helpful. We found that augmenting your data set, by finding open source image or text data sets out there plus synthetic data sets, helps create that starting point. It could also help with orchestration, so composing workflows using LLMs with tools; we'll get into that in a minute. And then we also think it could help with optimizing deployments.

Salma Mayorquin [00:03:20]: Specifically, we think it could really be useful in tuning models of all types for your application. All right, so that further ado, let's get into it. So, designing data sets, we think that generative AI could be really useful in this particular area. You can use it to help you synthesize data with generators. So you could use large llms like GPT four to create a data set for you. From very few examples, we have a link here of a chat that could help you kind of give you an idea of how you might be able to do that. You could use that synthetic data to then also augment your own data sets or even open source data sets out there. So with very few resources, you could make a really good data set to provide high quality data for your tuning.

Salma Mayorquin [00:04:09]: And we also foresee that there's going to be a lot of developments in auto labeling. So no longer will you have to spend a lot of money or time having to label your data. You could rely on these methods to make really small amounts of samples really impactful. All right, this is one of my favorite topics. I think this is an interesting intersection of using LLMs. I think we all know, and we've heard from other speakers, that LLMs have great potential, but they also have their downsides. They're prone to hallucination. But we think that there's a way to pair this really awesome, flexible technology with tools that we already have out there, that we rely upon on a daily basis.
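Before getting further into tools, here is a minimal sketch of the auto-labeling idea mentioned above, again assuming the `openai` Python client; the label set and caption are made up for illustration:

```python
# Use an LLM as a zero-shot labeler: assign one of a fixed set of labels to
# each unlabeled sample. Labels, caption, and model name are placeholders.
from openai import OpenAI

client = OpenAI()
LABELS = ["dog_in_tent", "dog_outside_tent", "no_dog"]

def auto_label(caption: str) -> str:
    """Label a single image caption with one of LABELS."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": (
                f"Classify this image caption into exactly one of {LABELS}. "
                f"Reply with the label only.\n\nCaption: {caption}"
            ),
        }],
    )
    return response.choices[0].message.content.strip()

print(auto_label("A small dog curled up inside a fabric pet tent."))
```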

Salma Mayorquin [00:04:56]: So we think instead of using llms necessarily to generate code from scratch every time, you could pair llms with tools. So we could use llms to map intent, so vague intent, like we did in that demo earlier, to match that to function calls. And you can chain together multiple function calls. So, for example, to be able to orchestrate a workflow like we did before. And what we mean by tools too, is that they are a narrowly defined function. That function can encapsulate an API call, or it could be a job or a dag, it could be anything you'd like. And how we do this behind the scenes at remix, we have the remix agent, which chats with the user. And through that conversation, it fills out this function call here called create job.

Salma Mayorquin [00:05:42]: And through that conversation it selects. Okay, this user wants to do an image classification versus detection, versus segmentation. All the other kinds of model tasks. We then identify what kind of data we need to create and curate, and then even what kind of architecture we need, what kind of resources we need to put together versus small models, versus large models. And then that can translate to a dag. Then we then orchestrate. I know I'm a little bit out of time, but I'll keep it really short. Other things that we think could be useful too is that we could tune these models to then be tailored to your use case.

Salma Mayorquin [00:06:21]: I think we think this is really important to actually make llms much more usable, much more consistent through specialization, essentially reducing the data that the LLM is touching upon in the background. And we think that llms could help both in training or tuning or guiding that process, as well as evaluating. On the right hand side, you see one of our evaluations on our platform that actually uses the FLAs methodology, which will also be linked in this deck, essentially using llms to then evaluate the performance of other llms. And a couple of tips that I want to impart on you if you're going to go ahead and try to fine tune your own. We love Laura, so we love low rank adapters. We think they work really great and they actually really work really great with less than 1000 samples. There's a couple of archive papers out there right now that talk about how you don't really need actually that much data. And a couple of things that we found also to be really useful is to train all the target modules, depending on your architecture, and also increase that rank and the number of ebox you train for.

Salma Mayorquin [00:07:28]: All right, thank you very much. I would love to connect with all of you. Go ahead and find us here on LinkedIn and also on our site.

Demetrios [00:07:36]: Holy smokes. That was so cool. Thank you so much, Salma. Wow. I think the chat blew up. They were in absolute amazement with what Remyx is doing.

Salma Mayorquin [00:07:52]: Wow, that's exciting to hear. Hopefully. It's a ton of stuff in there. I'm happy to share all of the links and support everybody's experimentation right now. It's super exciting. Excellent.

