
Navigating the Emerging LLMOps Stack

Posted Mar 06, 2024 | Views 284
# LLMs
# LLMOps Stack
# DoorDash
SPEAKERS
Hien Luu
Head of ML Platform @ DoorDash

Passionate AI/ML engineering leader, building scalable AI/ML/LLM infrastructure to power real-world applications. Author of the Beginning Apache Spark 3 book. Speaker at MLOps World, QCon (SF, NY, London), GHC 2022, Data+AI Summit, XAI 21 Summit, YOW Data!, appy().

Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building Lego houses with his daughter.

SUMMARY

In this session, we delve into the intricacies of the emerging LLMOps Stack, exploring the tools and best practices that empower organizations to harness the full potential of LLMs.

TRANSCRIPT

Navigating the Emerging LLMOps Stack

AI in Production


Demetrios [00:00:05]: All right, man. It is all you, dude. It is all you.

Hien Luu [00:00:12]: Thank you. Nice.

Demetrios [00:00:13]: Are you going to share a screen or anything?

Hien Luu [00:00:14]: Yeah, I got to share the screen. I got to click the share.

Demetrios [00:00:17]: And just for everybody that is watching, Hien has been here for the last 30 minutes and he didn't think to share his screen until now.

Hien Luu [00:00:28]: It shows up now. There you go.

Demetrios [00:00:31]: There we go.

Hien Luu [00:00:32]: Are you ready to go? The slides are up.

Demetrios [00:00:33]: They're up, man. I'm going to jump off and I'll be back in ten minutes.

Hien Luu [00:00:37]: All right, thank you. Thanks everyone for attending this session. My name is Hien Luu and I lead the ML platform team at DoorDash. In this session, I want to share my learnings from exploring the LLMOps stack, in the context of trying to figure out a strategy for us to build our own. I would like to get started with this quote that I saw last year from Cassie. Hopefully many of you know who that is. When I saw this quote, I told myself I had to use it in my next presentation. So here we are.

Hien Luu [00:01:17]: So it says: implementing Gen AI at scale in the enterprise is like teenage sex. Everyone talks about it, nobody really knows how to do it. Everyone thinks everyone else is doing it, so everyone claims that they are doing it. This statement was quite true in 2023, in my opinion. However, I think it's going to become less and less true as we move forward, because I believe that, as the ML community, we'll figure out how to do this at scale in the coming years, very similar to MLOps. Before we start talking about the stack, I think it's good to take a step back and really think about the unique challenges of building and deploying these LLM applications. This is a great blog to go through.

Hien Luu [00:02:12]: If you haven't, it really highlights the unique challenges along these five dimensions. I won't go through them all, but the one that's particularly interesting in recent times is inference and serving. Somewhere between half a dozen and two dozen startups have spun up recently to help solve the inference side of hosting models, particularly one you might have heard of, TitanML. Hopefully Miriam is listening. In addition to that, there are other dimensions to think about as well. If your company supports high-scale QPS use cases, cost and latency are very top of mind. Besides that, we know there are proprietary LLMs and open-source LLMs.

Hien Luu [00:03:10]: And as you think about building your stack, you need to figure out how to support both of these worlds, because the needs are slightly different, and the infrastructure that's needed is going to be more challenging for one than for the other. What's really interesting is that some use cases might want to use one of them as a backup for the other for reliability or performance reasons, and in some cases both could be used for certain use cases. So in this LLMOps stack, we've got to figure out how to best support both of these types of LLMs. Certainly the open-source LLM space is quite fascinating these days, given the accessibility of those models. I think for some enterprise use cases, it's more likely they will adopt these open-source LLMs, fine-tuning them with their own domain-specific data. So from the stack perspective, what does it take to do that effectively, efficiently, and quickly? Earlier, the presentation from Databricks talked about these application archetypes.
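To make the backup idea above concrete, here is a minimal sketch of gateway-style fallback routing from a proprietary LLM API to a self-hosted open-source model. Both client functions are hypothetical stand-ins rather than any specific vendor SDK, and a real gateway would add timeouts, circuit breaking, and monitoring.

```python
# Minimal sketch: fall back from a proprietary LLM endpoint to a
# self-hosted open-source model when the primary is unavailable.
import time

class LLMUnavailable(Exception):
    """Raised when a backend fails or times out."""

def call_proprietary(prompt: str) -> str:
    """Hypothetical stand-in for a vendor LLM API call."""
    raise LLMUnavailable("vendor endpoint timed out")  # simulate an outage

def call_open_source(prompt: str) -> str:
    """Hypothetical stand-in for a self-hosted open-source model."""
    return f"[self-hosted model] response to: {prompt!r}"

def generate(prompt: str, retries: int = 2, backoff_s: float = 0.1) -> str:
    """Try the primary backend with retries, then route to the backup."""
    for attempt in range(retries):
        try:
            return call_proprietary(prompt)
        except LLMUnavailable:
            time.sleep(backoff_s * (2 ** attempt))  # exponential backoff
    # Primary exhausted: serve from the open-source backup instead of failing.
    return call_open_source(prompt)

if __name__ == "__main__":
    print(generate("Summarize today's order issues."))
```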

Hien Luu [00:04:27]: So those archetypes: prompt engineering, RAG, and fine-tuning. The one thing to think about as you put together your LLMOps stack is which components and pieces you want to help productionize. For example, prompt engineering is a foundational requirement for any GenAI app, so how do we handle prompt management, versioning, release testing, and evaluation? Guardrails, which was talked about earlier, can potentially be used in some of those areas. For RAG, there's the vector database, the embedding models, and pipelines for constantly updating those embeddings. And then fine-tuning: how do we make that process easy and cost-efficient? So, automation around templates, around training and evaluating. This is a very well-known blog; if you haven't read it, I highly recommend going through it. It does a great job of laying out the reference architecture in a very thoughtful, thorough way.
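On the prompt management and versioning point above: it can be as simple as pinning each release to an explicit template version. The in-memory registry below is a minimal sketch of that idea; the names are illustrative, not a specific tool, and a real stack would back this with a database or config repo plus evaluation hooks.

```python
# Minimal sketch of prompt versioning: store templates under explicit
# versions so releases and evaluations can pin a known prompt.
from string import Template

class PromptRegistry:
    """Illustrative in-memory prompt store; not a specific product."""

    def __init__(self) -> None:
        self._prompts: dict[str, dict[int, Template]] = {}

    def register(self, name: str, text: str) -> int:
        """Add a new version of a prompt and return its version number."""
        versions = self._prompts.setdefault(name, {})
        version = max(versions, default=0) + 1
        versions[version] = Template(text)
        return version

    def render(self, name: str, version: int, **variables: str) -> str:
        """Render a pinned prompt version with its variables."""
        return self._prompts[name][version].substitute(**variables)

registry = PromptRegistry()
v1 = registry.register("support_summary", "Summarize this ticket: $ticket")
v2 = registry.register(
    "support_summary", "Summarize this ticket in one sentence: $ticket"
)
# Production stays pinned to v1 while v2 is being evaluated.
print(registry.render("support_summary", v1, ticket="Order arrived cold."))
```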

Hien Luu [00:05:47]: So that blog includes many commonly needed components. The one that's quite interesting to call out is at the bottom right, the LLM cache component. That's quite interesting to leverage for high-QPS use cases, particularly ones making calls to vendor LLMs, to help with cost reduction as well as reducing latency. But it's a very interesting cache, not your typical key-value cache, so you need to figure that out. That screenshot is quite busy, so if you step back, look at it, and think in terms of a higher-level abstraction, you can identify these pillars. This helps with identifying the various pieces you've got to think about when putting together your stack, and maybe with thinking about the kinds of teams you want to put together to build it out.
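Returning to the LLM cache mentioned above: since it has to match semantically similar prompts rather than exact strings, lookups compare embeddings instead of keys. The sketch below illustrates the idea with a toy embedding; a real cache would call an actual embedding model and use an approximate-nearest-neighbor index (for example, a vector database) rather than a linear scan.

```python
# Minimal sketch of a semantic LLM cache: return a stored response when a
# new prompt's embedding is close enough to a cached one.
import math

def embed(text: str) -> list[float]:
    """Toy letter-frequency embedding; placeholder for a real model."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold  # similarity cutoff; tuned per use case
        self.entries: list[tuple[list[float], str]] = []

    def get(self, prompt: str) -> str | None:
        query = embed(prompt)
        best = max(self.entries, key=lambda e: cosine(query, e[0]), default=None)
        if best and cosine(query, best[0]) >= self.threshold:
            return best[1]  # cache hit: skip the vendor LLM call entirely
        return None

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((embed(prompt), response))

cache = SemanticCache()
cache.put("What are your delivery hours?", "We deliver 8am to 10pm daily.")
print(cache.get("what are the delivery hours"))  # near-duplicate -> cache hit
```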

Hien Luu [00:07:14]: It's good to take a peek at some of the well-known companies, at least for those of us in the valley, to see what they're doing and learn from them. This is the one from LinkedIn that was shared at the Ray Summit in 2023. They have essentially a list of the components or areas of their stack. The one at the top, which is now becoming obvious, is the gateway: if your stack is going to support proprietary and open-source LLMs, it makes a lot of sense. The playground promotes innovation and experimentation for them. Trust and responsible AI definitely makes a lot of sense given the brand and the company.

Hien Luu [00:08:10]: I think for LinkedIn, very likely most of the use cases will use internally hosted models due to privacy and PII. So if your company is in a similar space, that's something to think about. And then the next one is Uber. This came from a presentation given at the @Scale conference in July 2023. Uber, as we know, is one of the companies out in front of the pack in terms of their platform, due to their scale and diverse set of use cases. As you can see in this diagram, their GenAI platform is an extension on top of their MLOps platform. It includes components, most of which are probably familiar to many of you, in the areas of training and tuning, inference, observability, and such. So for me, this is not the stack, but essentially a list of components that helps me understand the big pieces in a typical LLMOps stack. It doesn't mean that we're going to need to build every single piece here; it depends on the use cases that the stack needs to support.

Hien Luu [00:09:45]: And then from there you can figure out which pieces need to be built first, second, and third. So essentially, let the use cases drive which boxes you need to build out. But I think it's good to have a high-level sense of the big pieces within your stack and then figure out how deep to go on each one. And I think the key thing is to step back and think about it not just from the technology perspective; it's all about velocity, friction, and cost. How does the stack enable high velocity for teams building GenAI applications within the company, with minimum friction? That means you've got to focus a lot on automation and tooling. At the same time, as we know, these large language models need GPUs for fine-tuning and inference, so be very mindful about cost, managing those costs as part of the stack itself. So this is the last slide.

Hien Luu [00:10:55]: Thank you for attending this session. These two images were generated by ImageFX from Google. I typed in essentially the title of this conference, and that's what it spit out, so it's pretty cool. That's it on my end.

Demetrios [00:11:16]: Dude, those photos are awesome. Wow.

Hien Luu [00:11:21]: Yeah.

Demetrios [00:11:24]: On the right. Looks like a big broccoli. Looks like a piece of broccoli or cauliflower. That's true.

Hien Luu [00:11:34]: How long did I take? I forgot.

Demetrios [00:11:36]: Way too long. Get out of here, man. That was awesome, though. I really appreciate it. It's always a pleasure getting to hang out with you and getting to chat with you, dude. I feel like we don't do it enough, and so I'm looking forward to hopefully crossing paths with you again when I'm in good old San Francisco or when you come to Europe.

Hien Luu [00:11:59]: I will be in London in April.

Demetrios [00:12:01]: So there we go.

Hien Luu [00:12:03]: You happen to be in London that time. Let me know.

Demetrios [00:12:06]: There we go, man. I'll come just to see you. Don't you worry about that. And I will show you that I am now a large. All right. And by the way, for everyone that likes t shirts, did you see? We made a t shirt. And it is my prompt for my special sauce.

Hien Luu [00:12:24]: Nice.

Demetrios [00:12:26]: Wow. Don't hide your enthusiasm there. Get out of here, man. What is this? You're kicked off of this stage. I can't believe that. Anyway, though, it was great having you, and I'll see you later.

Hien Luu [00:12:42]: Thank you. See ya. Watch.
