1984 All Over Again? An Open Ecosystem to Fight Closed Models
Filippo has a successful track record of scaling enterprise companies. He was CTO at Grand, Sicuro, and Gavagai, leading AI-powered chatbots, machine learning models, and software architecture design. He holds a master's in machine learning and software engineering from Politecnico di Milano and KTH. His academic contributions include the development of an open-source deep reinforcement learning library.
At the moment, Demetrios is immersing himself in machine learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building Lego houses with his daughter.
Tech giants worldwide, like Samsung and Amazon, are fuelling a growing trend of restricting the use of OpenAI's products over concerns about privacy and the sharing of sensitive information. Amid this landscape, open-source Large Language Models (LLMs) are experiencing a surge of innovation. These self-hosted models offer a privacy-preserving alternative to OpenAI. Still, their integration into production applications comes with challenges, including model interoperability and the complexity of keeping the convenience of a single API without any third-party involvement. In his talk, Filippo dives into these issues, sharing insights from his own expertise and experience. He discusses the solutions he has crafted to overcome these obstacles, aiming to unlock the full potential of open-source LLMs for a more secure and efficient AI application environment.
Introduction
Let's see. I see your screen. You're good, I see full screen. Yeah, game on, baby. There he is. All right, cool, man. Hello, my friend. How are you, amigo? Ciao. Doing well. It's good to see you here, and I'm excited for you to present. How long did you think I was gonna hang on there, man?
Sorry, you weren't even watching, were you? Yeah. Zero. No, because you were in pure panic with Chrome. That's it. So people don't know this, but behind the scenes, sometimes the streaming software that we're using will crash a speaker's computer as soon as they share their screen. And so Filippo got a front-row seat to the computer crashing.
And that's why we threw this in. We improvised, man. We improvised. So, all right, you were super cool about that, so it's perfect. Yeah, we can go with it. All right, I'm gonna let you get started and let's cruise. Okay, I'll be back in 20, 25 minutes, man. Perfect, thank you very much. So, hello, everyone.
I'm Filippo, and I work as CTO at Grand, and I am also a co-founder at Prem. This is the outline of my presentation today. First, I will speak about the personas in the current AI ecosystem, and then my business and personal problems, I would say.
Then the obvious solution, given the title of the deck, or rather the theoretically obvious solution, which is open source. And then the challenges you can face when trying to use these open-source LLMs that a lot of people are talking about.
And as a final point, I will give a practical solution to these problems. So, starting from the ecosystem and the different groups in it: the first group is obviously the data scientists and researchers. They are the ones that actually know about math and statistics, and without them we wouldn't be speaking about LLMs in production.
The second group is composed of MLOps engineers. They are the ones that take the model and bring it to production for real. They are, obviously, the serious guys.
And the last group is the web developers, who are having a lot of fun building new AI applications using LangChain and LlamaIndex. Luckily they are here to help us out, because otherwise all the web applications would be based on Streamlit or Gradio.
And all of this consolidates in a single spot. Why? Because I started my career as a data scientist, fine-tuning and training deep learning models. Then, because we were a small team, I had to learn how to actually bring them to production.
So I entered the rabbit hole of MLOps engineering and all the tools and frameworks related to it. And finally, it's been more than a year now that I've worked as a CTO, and in that role you touch, and get the full picture of, the entire tech stack.
Obviously, I hit some issues I was not expecting to face, for example related to UI and UX. They are super important, and the front-end environment is harder than expected, I would say. So I've been facing a lot of issues during my career, solving one problem after the other, basically.
But still today I have big issues, and the main one is that I cannot use OpenAI. It's not because I am Italian. The main reason is that I'm dealing with very sensitive data and on-premise infrastructures, so we cannot use third-party providers.
That's a big limitation, considering I would actually love to use ChatGPT and GPT-4. For this reason, and just a few months after the first LLaMA release we all know what happened: the madness started in the open-source community, and all these incredible models came out, from LLaMA to Alpaca, et cetera.
Because of the huge hype on Twitter, I started to dig into these models to see if they could actually work in a business use case, a production use case. But it's not as nice as expected, basically, as always.
Things seem super cool and nice, but in the end you face a lot of challenges, and I will show you some of the problems I faced. The first one, obviously, is quality. It's a joke, of course, but compare the number of models beating GPT-4 on Twitter versus the number of models beating GPT-4 in reality.
What I can say is that datasets and benchmarks are one thing, and I think the community is improving day by day there, but empirical evaluation is another. Comparing the current open-source models with GPT-4 is a bit too drastic, I would say, at least right now. But progress is moving amazingly fast.
For each problem I will try to give some tips and tricks, and for this specific one, the first point I want to make is that in a single, specific business use case, you don't need AGI.
You don't need the capabilities and the reasoning of GPT-4, so it's not a big deal even if these models are not as good as GPT-4. Another error that I have made a lot while evaluating these models is about prompt engineering, but not in the sense of crafting a good prompt to get the correct result.
It's more the fact that each of these open-source models has been trained on a different dataset. The team behind each model has done different data processing and data cleaning, and, obviously, there are different prompt templates for each model, as in the sketch below.
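To make that concrete, here is a minimal sketch of wrapping a raw prompt in a model-specific template. The two template strings are illustrative of the Vicuna- and Alpaca-style formats; always verify the exact format against each model's card and training code.

```python
# Minimal sketch: each open-source model expects the prompt template it was
# trained with. These template strings are illustrative; verify the exact
# format against the model card before relying on them.

PROMPT_TEMPLATES = {
    # Vicuna-style conversation format
    "vicuna": "USER: {prompt}\nASSISTANT:",
    # Alpaca-style instruction format
    "alpaca": (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n{prompt}\n\n### Response:\n"
    ),
}

def format_prompt(model_family: str, prompt: str) -> str:
    """Wrap a raw user prompt in the template the model was trained on."""
    return PROMPT_TEMPLATES[model_family].format(prompt=prompt)

print(format_prompt("vicuna", "Explain nuclear fission versus fusion."))
```

Sending the same raw prompt to two models trained on different templates is one of the easiest ways to conclude, wrongly, that one of them is bad.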
So when you do an evaluation, or test these models for the first time, be careful and read the documentation thoroughly to see how each model was trained. Another tip is that for a lot of question-answering use cases you can actually use the 7-billion-parameter Q4 models coming out of the community, but they have a big limitation in terms of context size.
Given that limitation, you need to be careful about choosing the right size for your chunks. With such a small prompt budget, you need to be very careful here. My related suggestion is to just use sentence transformers for the embeddings, which are fast and good enough for these use cases, and which handle a small context well.
The combination of the two, chunking tuned to your data plus sentence-transformer embeddings, gives you a solution that actually works fine, I would say. Then, if you really need it, go for fine-tuning, but I would stay away from that as much as possible.
It's not that it's complicated; it's that it's cost-intensive, and you need a lot of experiments to do it right. If your use case works with just embeddings, keep going with embeddings. Don't overcomplicate it: once you enter the fine-tuning planet, it gets difficult.
You will have to maintain a lot of stuff, and you will hit a lot of drawbacks. So keep it simple and use embeddings as much as you can.
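As a minimal sketch of that embeddings-first pattern, assuming the sentence-transformers package and the all-MiniLM-L6-v2 checkpoint (both stand-ins for whatever embedding model you prefer), retrieval before prompting looks roughly like this:

```python
# Minimal sketch of embeddings-based retrieval for question answering.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Keep chunks small enough that a handful of them, plus the question,
# still fit in a 7B Q4 model's limited context window.
chunks = [
    "Nuclear fission splits a heavy nucleus into lighter ones.",
    "Nuclear fusion combines light nuclei into a heavier one.",
    "Stack Overflow is a question-and-answer site for programmers.",
]
chunk_embeddings = model.encode(chunks, convert_to_tensor=True)

question = "What is the difference between fission and fusion?"
question_embedding = model.encode(question, convert_to_tensor=True)

# Retrieve the top two chunks by cosine similarity; only these go
# into the LLM prompt, keeping it inside the small context window.
hits = util.semantic_search(question_embedding, chunk_embeddings, top_k=2)[0]
for hit in hits:
    print(chunks[hit["corpus_id"]], round(hit["score"], 3))
```

Only the top-scoring chunks get pasted into the prompt, which is how you live within the small context window without any fine-tuning.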
Then another problem I faced is hardware constraints. I have seen the message on this slide too many times lately. The main reason is that on an on-premise infrastructure you don't have the latest super cool GPU with 80 gigabytes of memory, so you need to adapt. For a proof of concept, you cannot start by distributing the model across multiple GPUs.
You need to start simple, and for this reason my main suggestion is to start from the smallest model possible to test your use case in an open-source manner. Also, a bigger model doesn't mean better quality.
On the left of the slide you have an example where Dolly, at 12 billion parameters, fails on something that isn't even a question, just a hello-style exclamation, and answers with the welcome message from Stack Overflow. On the other side, Vicuna 7B Q4 explains very well the difference between nuclear fission and fusion.
So it all depends. Of course I cherry-picked these screenshots: I went looking for the specific failure from Dolly and the specific good answer from Vicuna for the sake of the presentation.
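Either way, acting on the start-small advice is cheap. Here is a rough sketch, assuming llama-cpp-python and a locally downloaded quantized checkpoint; the file path and model name are placeholders, not recommendations:

```python
# Minimal sketch: test the use case on the smallest quantized model first,
# e.g. a 7B Q4 checkpoint served on CPU or a modest GPU via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/vicuna-7b-q4.gguf",  # placeholder local file
    n_ctx=2048,  # small context window, fits modest on-premise hardware
)

out = llm(
    "Explain the difference between nuclear fission and fusion.",
    max_tokens=256,
)
print(out["choices"][0]["text"])
```

If a model this size answers your evaluation set well enough, you never need to provision the hardware for the bigger one.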
But the message is general: start very small, see if the small model can fit your needs, and then go bigger if you hit limitations with the first one. Then the next problem: developer experience. Let's create a scenario in which you are checking the latest updates on Twitter, and you see all these super cool new models coming out, and they all seem to beat GPT-4, or at least to have good quality.
The day after, you speak internally with your team and say, okay, let's do it. Let's try them out and see if we can actually use them in our use case. And then, of course, the first thing your boss or stakeholders say is: okay, give me an environment where I can test it.
So: create a PoC, create a microservice, do whatever you need so that I can test it and see if the quality is actually as good as you think. And then this is what happens. I think this pain is ever-present in the AI industry, because we are very used to bad developer experience.
These are all the steps. The two screenshots I put behind the meme are, on one side, all the instructions needed to run Open Assistant 30B, and on the other, the installation process to run a simple 7B Q4 model.
We are very used to bad developer experience, but it's not nice, and in other tech industries these things are not happening. We have low standards, I would say. In terms of tips and tricks, there's not much to say here.
Just keep calm, and maybe the next day it'll work. Maybe not, who knows? But independently of the kind of data you have to take care of and the privacy concerns you have, going for open source, and taking the first step towards it,
will for sure bring benefits in the long run. And it's also about inference cost: even if you don't care about privacy, that saving is part of the reality of going for an open-source solution. Going back to the initial slide: these three groups are part of a complex ecosystem.
They work alone, and they work well together from a broad perspective, but they are also super different. They have different mindsets and different attitudes, and creating good communication across these three groups is very complicated.
What we're trying to build at Prem is a super simple, open, developer-friendly ecosystem to handle that. For data scientists, we will soon release fine-tuning capabilities, similar to what OpenAI offers. For MLOps engineers, you will be able to spin up a production-ready container in a few commands and clicks.
And for web developers: just run Prem with a few commands, or locally with the desktop app, and you can keep your existing LangChain integration and simply switch the base URL to your server IP or another instance. I have a small video prepared for that,
which is the same one you will find in the GitHub repository. We have both the desktop app and the server installation, which is an installer script that installs all the dependencies and runs a docker-compose. With just a few clicks you can run a service and then integrate LangChain accordingly.
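The switch is roughly the following, assuming the self-hosted service exposes an OpenAI-compatible endpoint and you are on the classic LangChain API; the localhost URL and model name are placeholders for your own server IP and deployed model:

```python
# Minimal sketch: point an existing LangChain integration at a self-hosted,
# OpenAI-compatible endpoint instead of api.openai.com.
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

chat = ChatOpenAI(
    openai_api_base="http://localhost:8000/v1",  # your server IP instead
    openai_api_key="not-needed-for-local",       # placeholder, no real key
    model_name="vicuna-7b-q4",                   # hypothetical model id
)

print(chat([HumanMessage(content="Hello from my own hardware!")]).content)
```

The rest of the application code does not change, which is the whole point of keeping a single API in front of the models.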
We expose only a few services right now. We are still in a beta/alpha phase, and we are looking for help, of course, but soon we will expose more, and we want to go multi-model, in the sense of supporting not only LLMs but all kinds of models in general.
You can try it out on GitHub, where you'll find all the instructions to install it or run it on your server infrastructure.
Okay. And starting right now, we are launching the Prem challenge, with prizes of $10,000-plus. It will last for two weeks. You can check it out on GitHub at this link, which I think we will share with you later,
and we will publish a blog post after this presentation. It's about composability: try to build on top of Prem and see how things work using only open-source LLMs. We want to see developers do what they did a few months ago with the challenges around GPT and OpenAI models in general.
Join us: you can follow us on Twitter, we are on GitHub, and you can join our Discord for any problem or technical issue; we are there to help you out. You can also check out our demo instance, at app.prem.ninja, if you don't want to install the app on your own infrastructure.
And thanks a lot. I think I took less time than expected, maybe, I don't know. Nicely done. That makes my job a lot easier, then. I mean, I'm saving you time, you know. Nice, yeah, exactly. Wait, I just want to clarify some things, because you're giving out some money here, and you're not talking about small dollars.
So what's the challenge, and how much money can I win? I mean, we need to speak with the CEO, because I'm not in charge of the accounting part, but I believe it's around $10,000, split across the best projects, I would say. There is more information about the challenge in the README of that repository,
with some examples of how to actually get started with Prem, so that you can already see how it works and start building out of the box. Oh, that's awesome. So basically, if we build an app with Prem... I'm looking at the challenge right now on GitHub. I could probably even just share this with you all.
Yeah, actually. So: the Prem challenge, done specifically for the LLMs in Production Conference. I like that. What: a web app using one or many of the Prem AI services. And when is it happening? Starting today until June 30th. It's a virtual thing. You can be on a team or you can go solo.
Oh my God. Up to $10K will be awarded to the final selected project. So basically, if you can create a cool app with some open-source models and a vector database from Prem, then you're going to get some cash thrown your way. I like the way this is shaping up, man. This is really cool.
Okay, I threw the link for this in the chat in case anybody wants it. All right, does it say how can you submit it? Yes, there's a submission process: a Google Form link, basic steps. Just send us your repository. Simple. All right, nice, dude.
Well, I'm glad you guys did that. You're doing some incredible stuff at Prem, I really love it, and if anybody wants to know more about what you're doing at Prem, I will direct them towards this good old tab on the left-hand side that says Solutions.
You can go there and, where is it? Prem, Prem, Prem. Enter the virtual booth, and you get hit with some cool stuff from Prem. I love the project. I know you guys just launched a few days ago, so it's cool. Everybody out there listening, give them some GitHub love, because it's fully open source and they're trying to do some absolutely awesome stuff with it, man.
So thank you, Filippo. Thank you very much. I think you may have the best slide I have ever seen: "I cannot use OpenAI, not because I'm Italian." I've got to give it up for you on that one. That was brilliant. All right, man. Yeah.