MLOps LLM Stack Hackathon Winner: Exploring the MLOps Community Trends

Posted Jul 21, 2023 | Views 663

# LLM in Production

# LLM Stack

# Virta

speakers

Travis Cline

Engineering Manager, Platform @ Virta

Travis (tmc) is a long time open source contributor, the maintainer of LangChainGo, and an Engineering Manager at Virta Health.

Demetrios Brinkmann

Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.

SUMMARY

A quick run-through of our recent project to visualize and explore the MLOps community trends by building interactive tools to see Slack message content in new lights.

TRANSCRIPT

Next up, we're gonna keep cruising, and we have a very special guest. We are talking with Travis, who won the MLOps Community Hackathon we had two weeks ago in San Francisco. It was all about your LLM stack, and your team won. Can you break that down for us? What was that whole process?

Yeah, absolutely. Just making sure you can hear me, see the slides, everything.

Yep, I can hear you. I can't see the slides yet, but...

Oh, really?

Tell us about the hackathon and then I'll throw the slides on.

Okay. So, there were two tracks. This was super fun, in person, just a one-day, 12-hour hackathon. Honestly, a format I prefer over doing an overnight 24-hour thing. Huge shout-out to everyone that helps run it, Rahul P. especially. I was just at ML Hops last night with him. Great guy. He provided some groundwork for all the teams. So what we did is we had a multi-pronged effort. Are there any slides yet?

Oh, you're ready to jump into it?

No, no, I'll go.

All right, here we go. Slides coming.

Okay. Oh, I see myself now.

Yeah, then we're getting the infinite lost in it. All right, awesome. And dude, I will mention to everyone too, because you may not mention it, but you're re-implementing LangChain in Go, right?

I am. Yeah.

We've been a little quiet about it so far, but we're gonna be promoting it a lot, and I'm excited about getting more maintainers. We're almost even just leaving, you know, it's not a full-time thing, just a side thing for me, just open source contribution, and I don't really want to be reviewing hundreds of PRs a day.

I don't know if we would get that successful, but yeah, it's going really well. It was already fun to see: as soon as Harrison and crew dropped TypeScript support, I was like, maybe there's a multi-language option here. So I've been in touch with him, and I'm excited about how that project's going to evolve.

Super cool. All right, man, I'll get off the stage. I'll let you give the 10-minute lightning talk, and then I'll be back in 10.

Sounds great. Thanks, Demetrios. Okay, so this is the presentation. I'm gonna start out talking about the team involved, because I was just one person. I was kind of helping coordinate things, and I definitely contributed technically, but I really wanna highlight the entire team here.

So, I was in some of the networking guts and had some existing open source Slack work that helped us out here. But Jang Hong was a huge contributor in the Python layer. I'll also show you the architecture that we came up with shortly. And then Forrest was very ambitious, and I'm gonna say he pulled it off, in that he opted to do a from-scratch point cloud implementation.

And I'm going to hopefully give you a brief demo of that. And then Brenda was super helpful throughout the project and then developed a really fantastic Medium post writing it up, which I'll link to at the end of these slides.

So what did we do? High level, and again, huge shout-out to Rahul, we were provided 10,000 embeddings of Slack conversations from the MLOps Community Slack, and we thought, how do we make this useful? So we tackled it from two directions. One was developing a retrieval-based Slack bot that tries to answer questions with Redis vector-based similarity context provided.

And we also built this embeddings explorer that has some integration into Slack. I think I'm only sharing my Chrome screen, so there'll be a slight gap in the demo there, but you'll have to trust me. It works, I promise. So, breaking these down a little bit: we have this Go component, which uses LangChainGo, which I'll hype up a bit later on.

Redis was a sponsor, and it's a really fantastic tool. I use it in almost all my projects. We used Redis Pub/Sub as an RPC layer between our Python backend, which was making the OpenAI calls, and the Go server, which was handling all the Slack communication: watching for new content and then ultimately responding to questions that it received.
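To make the "Pub/Sub as an RPC layer" idea concrete, here is a minimal sketch of the request/reply pattern. It is an illustration under stated assumptions, not the team's code: the `Broker` class is a hypothetical in-process stand-in for Redis Pub/Sub so the example runs without a server, the channel names `questions` and `answers` and the JSON envelope with an `id` field are made up, and both roles are written in Python here even though the Slack side in the talk was a Go server.

```python
import json
import uuid
from collections import defaultdict


class Broker:
    """Hypothetical in-process stand-in for Redis Pub/Sub (illustration only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, handler):
        self._subscribers[channel].append(handler)

    def publish(self, channel, message):
        # Deliver to current subscribers and report how many received it.
        handlers = list(self._subscribers[channel])
        for handler in handlers:
            handler(message)
        return len(handlers)


def start_backend(broker, answer_fn):
    """Backend role: listen for questions and publish answers."""

    def on_question(raw):
        request = json.loads(raw)
        reply = {"id": request["id"], "answer": answer_fn(request["question"])}
        broker.publish("answers", json.dumps(reply))

    broker.subscribe("questions", on_question)


class SlackSideClient:
    """Slack-facing role: send a question, match the reply by correlation id."""

    def __init__(self, broker):
        self._broker = broker
        self._replies = {}
        broker.subscribe("answers", self._on_answer)

    def _on_answer(self, raw):
        reply = json.loads(raw)
        self._replies[reply["id"]] = reply["answer"]

    def ask(self, question):
        request_id = str(uuid.uuid4())
        payload = json.dumps({"id": request_id, "question": question})
        self._broker.publish("questions", payload)
        # Delivery is synchronous in this stand-in; a real client would
        # wait for the reply with a timeout instead.
        return self._replies.pop(request_id, None)


broker = Broker()
start_backend(broker, answer_fn=lambda q: f"echo: {q}")  # placeholder for the LLM call
client = SlackSideClient(broker)
print(client.ask("What is a feature store?"))
```

The correlation id is what turns fire-and-forget publish/subscribe into something RPC-like: the caller can tell which reply belongs to which question even when several are in flight.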

The Python layer, a lot of you are gonna be familiar with, and it's not super novel: retrieval-augmented LLM lookups. So basically, given an incoming question, grab the couple of nearest neighbors by vector cosine distance and then supply those to GPT-4 as context. Here are a couple examples of me pretending to be a robot.

I used my personal credentials. I dunno if it would've been appropriate to get an official Slack bot for a hackathon, so I just used my credentials. So that's me speaking as GPT-4 and showing that we have some of that context provided. Obviously, if we were to take this further, that should be a link over to the relevant Slack conversation.
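The retrieval step described for the Slack bot (find the nearest stored threads by cosine similarity, then hand them to the model as context) can be sketched roughly as follows. This is a simplified illustration, not the project's implementation: it swaps the Redis vector search for a brute-force scan in plain Python, the three-dimensional vectors and thread texts are toy data invented for the example, and the function names and prompt wording are hypothetical. The actual model call is left out.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_threads(query_vec, threads, k=2):
    """Return the k threads whose embeddings are most similar to the query."""
    ranked = sorted(
        threads,
        key=lambda t: cosine_similarity(query_vec, t["embedding"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, neighbors):
    """Assemble retrieved threads and the question into one prompt string."""
    context = "\n".join(f"- {t['text']}" for t in neighbors)
    return (
        "Answer using the Slack context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Toy data: real embeddings would have many more dimensions.
threads = [
    {"text": "We run our feature store on Redis.", "embedding": [0.9, 0.1, 0.0]},
    {"text": "Anyone going to the meetup on Thursday?", "embedding": [0.0, 0.2, 0.9]},
    {"text": "Feast works well for online features.", "embedding": [0.8, 0.3, 0.1]},
]
query_vec = [1.0, 0.2, 0.0]  # stand-in for the embedded question

top = nearest_threads(query_vec, threads, k=2)
prompt = build_prompt("Which feature store do people use?", top)
print(prompt)
```

Keeping a thread identifier alongside each stored embedding would make it straightforward to add the link back to the original Slack conversation mentioned above.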

And then here's the topic embeddings explorer. So this is, again, huge credit to Forrest, an interactive browser visualizer that allows exploring all of the high-dimensional embedding space compressed down into three dimensions. This is using a technique called UMAP, and huge shout-out to the folks at Arize, Aparna and Xander in particular.

I didn't even know this was a thing until probably three or four weeks ago. The hackathon was two weeks ago now, and it's really incredible to see. I'm gonna try to pop it out. Can you see the point cloud?

Yes.

So this is a visualization of all the Slack threads. I upgraded to a newer macOS and my video performance is struggling a little bit, but I promise it should run smoothly for you.

And so this shows Slack conversations here, and we have a little ability to jump right to the thread in question. So this jumped over to my Slack window, but I'm only sharing Chrome, so you'll have to trust me that it popped over to Slack. That lets you go from the 3D visualization and explorer right to the thread in question, which is a nice way to tie that all together.

So the hackathon was about an LLM stack. A fun thing that we used: I've developed a hackathon template toolkit that's Tilt-based, and we used this to really hit the ground running. It was a 12-hour hackathon. We had to move fast, and here are all the components that were visualized, plus a Postgres database, running in a Tilt-based development workflow.

This allowed us all to start writing the things we were best at as soon as possible. Oh, really, really pun it there. And I wanna talk now about where we wanna take this. So we won the hackathon. There was a small cash prize. And again, shout-out to Rahul, we were handed these 10,000 embeddings.

We want to continue generating embeddings over new Slack thread content. And presuming we have permission from the owners and moderators of the space, we wanna provide that set of embeddings to the community, so people don't have to repay OpenAI for all those ada-002 numbers. Another place we wanted to take it, and haven't yet, is to get the ingestion pipeline real-time all the way to the browser.

So a new thread shows up as a new point in the point cloud in as little time as we can do it. We should be able to do it with very little overhead; the OpenAI call should be the slowest part. Another thing we're thinking about is showing the evolution of trends in that point cloud over time.

So I dunno if it's a slider or an animation, but if we render that 3D point cloud, we can show conversations emerging, themes emerging, technologies emerging that are being talked about in the MLOps Community Slack, and really see how things are evolving over time. And maybe even have some sense about where new ideas have legs.
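One way the slider or animation idea could be prepared on the data side is to bucket the projected points by month and build cumulative frames, so each slider position shows everything posted up to that month. The sketch below is only an assumption about how that might look: the point fields (`xyz`, `posted`), the monthly granularity, and the sample values are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def cumulative_frames(points):
    """Group points by posting month and return (month, visible_points) frames.

    Each frame contains every point posted in that month or earlier, which is
    what a time slider over the point cloud would render at that position.
    """
    by_month = defaultdict(list)
    for point in points:
        by_month[point["posted"].strftime("%Y-%m")].append(point)

    frames = []
    visible = []
    for month in sorted(by_month):
        visible = visible + by_month[month]
        frames.append((month, list(visible)))
    return frames


# Hypothetical projected points: 3D coordinates plus the thread's post date.
points = [
    {"xyz": (0.1, 0.4, 0.2), "posted": datetime(2023, 1, 15)},
    {"xyz": (0.7, 0.3, 0.9), "posted": datetime(2023, 2, 10)},
    {"xyz": (0.2, 0.5, 0.1), "posted": datetime(2023, 1, 20)},
]

frames = cumulative_frames(points)
for month, visible in frames:
    print(month, len(visible))
```

Comparing where new points land from one frame to the next is one simple signal for which regions of the embedding space, and so which topics, are growing.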

I wanna do a deeper integration with LangChainGo. This was actually originally using LangChainGo, but we wanted to play to the team's strengths, and so we had the Python layer that was conducting the OpenAI calls. And with that, that is a brief overview of what we put together for the MLOps LLM Stack Hack.

That's a wonderful link. So cool. Brenda has a really fantastic write-up there. And then watch this space, this tmc MLOps community. I'm putting everything there, so it's really easy to run all of this yourself. I would really love additional contributors and would love to get those embeddings available for everyone in the community.

Yes, I love that visual, man. That is so cool. And yeah, it's really cool that you've been able to actually take advantage of the data. We have so much data in the community and I've always dreamt of doing things with it, so I love the fact that we were able to make that happen. It's really awesome.

Yeah. And we'll probably be doing more, and we'll probably do some virtually, so if there are people that did not get to attend the San Francisco hackathon, we'll probably do it in the virtual space so that wherever you are, you can do some cool stuff with it. Anyway, man, I will hopefully see you.

I mean, you just heard, I'll be in San Francisco in two weeks and we're gonna do this LLM Avalanche party, so maybe you can present that there or you can do something about it. That would be really cool too. Yeah, I'll talk to you later. We'll be in touch.

Sounds great. Thank you.
