MLOps Community

Build and Customize LLMs in Less than 10 Lines of YAML

Posted Jun 20, 2023 | Views 448
# LLM in Production
# YAML
# Predibase.com
# Redis.io
# Gantry.io
# Humanloop.com
# Anyscale.com
# Zilliz.com
# Arize.com
# Nvidia.com
# TrueFoundry.com
# Premai.io
# Continual.ai
# Argilla.io
# Genesiscloud.com
# Rungalileo.io
speakers
Travis Addair
CTO @ Predibase Inc.

Travis Addair is co-founder and CTO of Predibase, a low-code platform for predictive and generative AI. Within the Linux Foundation, he serves as lead maintainer for the Horovod distributed deep learning framework and is a co-maintainer of the Ludwig declarative deep learning framework. In the past, he led Uber’s deep learning training team as part of the Michelangelo machine learning platform.

Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.

SUMMARY

Generalized models solve general problems. The real value comes from training a large language model (LLM) on your own data and finetuning it to deliver on your specific ML task. Now you can build your own custom LLM, trained on your data and fine-tuned for your generative or predictive task in ten lines of code with Predibase and Ludwig, the low-code deep learning framework developed and open-sourced by Uber, now maintained as part of the Linux Foundation. Using Ludwig’s declarative approach to model customization, you can take a pre-trained large language model like LLaMA and tune it to output data specific to your organization, with outputs conforming to an exact schema. This makes building LLMs fast, easy, and economical.

In this session, Travis Addair, CTO of Predibase and co-maintainer of open-source Ludwig, shares how LLMs can be tailored to solve specific tasks from classification to content generation, and how you can get started building a custom LLM in just a few lines of code.

TRANSCRIPT

Introduction

A lot of pressure from some marketing teams, not Anton's and not Travis's. Where are you at, Travis? Hey. Hey. How's it going guys? I think we, man. How you doing? Good, good, good, dude. So let's talk for a minute before we jump into your talk. We had an awesome panel like a week ago. It was me, you and a few others at the snorkel event.

That was great. And now, that was like the warmup, that was the sound check or what? Now you're gonna keep it cruising. Yeah, yeah. So, I mean, we definitely had some hints in that panel talk at some of the things that we're thinking about at Predibase and some of the things that we're building in the open source with Ludwig.

And so yeah, I'm really excited to now get to share some of that, with some slides and a little bit of a demo at the end as well. Excellent, dude. So if anyone wants to check out Predibase, you have a whole lot of information on the, where is it? Right here: if you go to the solutions tab on the left of your screen where you're watching this, click the solutions tab and you will see Predibase is on there.

You can enter the virtual booth. But if you're just looking to watch this conversation, Travis, I'm gonna hand it over to you, so I'll let you share your screen. While you're doing that, I'm gonna mention to everyone that we actually had your good old co-founder on the MLOps Community Podcast.

Probably like, actually now, a year ago, I think. Time flies when you're having fun. But anyone who wants to hear more from Predibase, and how Piero, Travis's co-founder and the CEO of Predibase, and the team think about machine learning, MLOps, and open source, can check out the MLOps Community Podcast.

Also, we got into the idea of how you want things to be flexible. You want it to run out of the box and you want it to be simple, but you want people to be able to turn the knobs if they know what they're doing. And that resonated with me, obviously: one year later, I still remember that theory and that idea, and so I love the ethos there.

I'll let people scan this QR code if you want to join us on the MLOps Community Podcast; we drop 'em once a week. But now it is your turn, Travis. I see that you shared your screen. I'm gonna get this off of the screen and I am going to leave it all to you, man. I'll be back in about, I think it's, uh, what are we at, 10 minutes?

No, we've got 30. Yep. All right, awesome. Thank you, Demetrios. And yeah, thanks everyone for coming out today. So I'm Travis from Predibase. I'm the CTO and co-founder. We're a company that came out of Uber's AI group, working on the machine learning platform known as Michelangelo, and also the applied research teams.

We worked on a number of open source projects there, including Horovod, a distributed training project, and Ludwig, which is a declarative low-code framework for building deep learning models, including large language models. And so today I'm gonna talk about what we've done in the open source with Ludwig for large language models, as well as what we're doing in Predibase, which is all about building and customizing these LLMs using our declarative interfaces.

And so the promise there is that you can get to something that is customized for your particular task with LLMs in less than 10 lines of configuration using a YAML-like language. So in this talk, we're gonna be starting with a little bit of framing on MLOps versus LLMOps.

You know, I think if this talk were held a year ago, we'd be framing it as MLOps in some way. But now everything is very much oriented towards large language models and what these things can do for us. We're also gonna talk about what it actually means in practice to build and host a large language model.

And then I'll go into the different options that you have in terms of choosing the right type of customization for your task, whether that's zero-shot learning and prompting-based strategies, indexing, which was what the previous talk was covering, or even fine-tuning, so actually training models yourself using a base large language model.

And then I'll also give you a live demo, knock on wood that everything works, covering some of the features in Predibase that we talk about in the slides. So first I wanna go back to the origins of the type of problem that we're trying to solve, which has traditionally been based on using machine learning.

And so with machine learning, we now have this whole domain of MLOps, which is all about productionizing these systems: trying to get to production and keep things live and fresh in production, where the artifacts that we're putting in production are these systems that enable us to derive insights from our data

for performing useful tasks for a business. So like deriving insights about which customers are likely to churn and being able to then take an action to prevent them from churning, like giving them a promotion or something like that. Or recommending products to people to buy when they're browsing our apps, things like that.

And so traditionally the MLOps journey kind of looks something like this. You have a user that wants to transform some data from a very large dataset into a snapshot. They do some exploratory data analysis on that. They engineer some features from that data. Then they create a model on top of the training data, evaluate it on top of the test data, and iterate on this until they're generally happy with the performance and they deploy it.

And then the model is available for real-time and batch prediction. And so the important thing that I want to call out here is that the value comes at the very end, right? The value is once you have something live in production that you can actually start generating predictions from. And so all of this part, which is typically a days, weeks, months, maybe even years long process, is all just cost that you're sinking into it until you can finally actually use something for solving a problem.

And so when I think about what LLMs really do to change the game, it's that they basically take this entire upfront cost and kind of move it away in some ways. And you see this in practice with systems like ChatGPT, which I'm sure everyone here has used: you have a system that performs a lot of these machine learning type tasks that is just out there to use, and you've never had to do any work to get it to produce useful output, right?

Because companies like OpenAI and other companies that build LLMs, like Google and Facebook or Meta, et cetera, have already done all that heavy lifting for you. Right. And so when we think about the new LLM lifecycle, it really starts with building these foundational large language models, which has a self-supervised training component over unstructured data, like the whole internet.

It has an instruction fine-tuning step over a labeled dataset with curated examples that you want the model to emulate. And then there's also a reinforcement learning from human feedback step, where you try to get the model to refine its output to be tailored towards human preferences.

So like not saying things that are offensive or illegal, and making sure that generally the responses match what the company hosting the model wants the outputs to look like. And then once this is done, you have the model living in an external registry, something like the Hugging Face Hub, right?

And then you can deploy it as this large language model box here. And then all these sorts of tasks open up for you. You can do prompting on top of that model. You might wanna do indexing, where you have an embedding model that looks up relevant examples from an index and uses them to enhance the prompt, for what we would call retrieval-augmented language modeling, or in-context learning.

And then you also have a fine-tuning step that you can do, where you take your labeled data and you refine the model. And then the output becomes another model that now lives in an internal registry that you can similarly host. And then this becomes a flywheel, right, of continuous improvement in theory.

And the other nice attribute of this is that you don't have to start anymore with the training. You don't have to start at fine-tuning. You can start by prompting, you can do some indexing, and then as you go, you can build up a labeled dataset by providing feedback or providing annotations on the results from the model.

And slowly work your way up to a data set without having to pay that upfront cost. But one thing that I want to call out that's very important here is that this whole part on the bottom left here is really the kind of general intelligence part. It's like you're building this foundation model that knows how to do all sorts of things, like, which might be your task, but might also be things that you don't care about, like generating poetry or whatever.

And in most cases you don't need this general intelligence, even though this is the part that's most expensive. What you really need is this solution-oriented flow here, which is all about getting the model to do something specific for your organization, right. And so when we then think about what the LLMOps lifecycle looks like in practice, it's really something more like this, which is just that previous diagram with all of the pre-training work cut out.

And really the journey is all about choosing which of these refinement steps you care about in order to get good results. Now, certainly some people actually are interested in building a large language model from scratch today. That typically is a very costly endeavor that can cost tens of thousands, hundreds of thousands, or millions of dollars.

But the nice thing is that, because these systems are general purpose, most of us don't need to do that. Most of us can take open source models, like the Falcon model that came out, and fine-tune them to our specific tasks without needing to spend all that upfront cost. But in practice, building and hosting these open source LLMs is still quite a bit of a challenge, even after the model's been initially trained.

So on the surface you think about hosting the LLM for low-latency zero-shot inference, but as you start to go deeper, you can think about building the information retrieval layer for doing few-shot inference, and doing chunking of documents and document retrieval. You can think about doing distributed training to do the fine-tuning over larger datasets of labeled data, and then you can go all the way down to building the models from scratch with the pre-training and the RLHF and all that, if your use case really requires it.

So there's lots of complexity to this problem that sits beneath the surface. And so what we're trying to do with Ludwig, and Predibase by extension, is make this whole process a lot easier to work with through declarative interfaces. So declarative, if you think of languages like SQL, is all about saying what you want: describing your problem, describing your desired state, and then having a compiler system and an execution engine underneath that translates that into some actionable model training process or model inference process to solve the task, right?

So, for example, in Ludwig today, coming out in our new version, 0.8, you can specify a zero-shot configuration using a model like the LLaMA 7 billion model. You can describe what kinds of inputs you want, what kinds of outputs you want, what your task description is, and you can start getting value without having to do any training.

You basically can just throw it at a dataset and start getting predictions about, like, what was the intent from the user based on the input text. From there you can expand to indexing and few-shot with just a few additional lines of configuration. So you can say, I wanna do semantic retrieval.

I wanna take the top three most relevant examples and insert them into the prompt. And then you can also transition over to full fine-tuning by just adding an additional training section where you say, I wanna do fine-tuning, and all of your normal supervised machine learning parameters then become available, like what optimizer I want to use, what learning rate, et cetera.
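The progression described here, from zero-shot to few-shot to fine-tuning, can be sketched as three configurations that differ by only a section each. This is a minimal illustration written as Python dictionaries rather than YAML; the key names are assumptions modeled on Ludwig's LLM config schema, and the model identifier and column names are hypothetical, so check them against the documentation for your Ludwig version.

```python
import copy

# NOTE: key names are assumptions modeled on Ludwig's LLM config schema;
# the model identifier and feature names are hypothetical.
zero_shot = {
    "model_type": "llm",
    "base_model": "my-org/llama-7b",
    "input_features": [{"name": "utterance", "type": "text"}],
    "output_features": [{"name": "intent", "type": "category"}],
    "prompt": {"task": "Classify the intent of the user's message."},
}

# Few-shot: the same config plus a retrieval section inside the prompt.
few_shot = copy.deepcopy(zero_shot)
few_shot["prompt"]["retrieval"] = {"type": "semantic", "k": 3}

# Fine-tuning: the same config plus a trainer section.
fine_tune = copy.deepcopy(zero_shot)
fine_tune["trainer"] = {"type": "finetune", "learning_rate": 1e-4}


def changed_sections(base, other):
    """Return the top-level config sections that were added or modified."""
    return sorted(key for key in other if base.get(key) != other[key])


print(changed_sections(zero_shot, few_shot))   # ['prompt']
print(changed_sections(zero_shot, fine_tune))  # ['trainer']
```

The point of the sketch is only that each strategy is a small edit to the previous one, which is the property the declarative interface is meant to provide.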

And so normally this would require quite a lot of infrastructure to transition between these different points. But with Predibase, we wanna make it as simple as, okay, I'm gonna change a little bit of my configuration, and now I have a very different type of task configuration and model configuration.

And so this is all Ludwig, which provides this low-code interface, no reinventing the wheel, while also providing quite a lot of depth to what you can do. And then what Predibase layers on top is an enterprise platform that manages infrastructure, data, deployment, things like that. And so we have data connectors that can connect to Snowflake, Databricks, S3, all sorts of different data lakes and data warehouses.

We have a model repository that tracks all of your experiments and all the trained weights from fine-tuning and other processes. And then we also have an interface for querying these models that allows you to run interactive analyses, like doing a zero-shot prompt or a batch prompt over a subset of your data, or doing index-based querying, in a UI.

And then we also have the whole managed infrastructure layer that sits on top of all this and provides the serverless training and prediction capabilities of the platform. And so what this looks like in practice: we have multiple different entry points. There's a UI, there's a Python SDK, and there's also a command line tool.

And so from the command line tool, if you want to deploy a model like, let's say, Falcon 40 billion, it's very simple. You just tell the Predibase command line tool to deploy that model. And then you can start prompting: what are some popular machine learning frameworks? From there, batch prediction is similarly simple: you take your prompts and apply them over a dataset.

So, for example, given this input, provide a rating, and then you have a bunch of unscored examples which live as a dataset, like a CSV file or something like that, and you want to do batch prediction over them. That's as simple as running a query over a dataset. Then if you want to enhance those prompts with an index of some sort, that's similarly as simple as just providing an index that exists in Predibase to your command, so that instead of using only the context of the input and the prompt, you also now have the ability to augment that prompt with other relevant examples to help it improve its responses.

And just to make this a little bit more concrete, for example, you might have a sample input, let's say we're trying to predict ratings for hotel reviews, something like that. The hotel was nice and quiet, blah, blah, blah: that could be the input. The task is, given the sample input, provide a rating. And then what Predibase is gonna do under the hood here

is go to our index and fetch the most relevant examples to help you complete the task. And then from there we'll feed that to an LLM and make the prediction. And so, if you saw the last talk, I'm sure you're very familiar by now with this kind of retrieval-based approach.
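The retrieval step described above, fetching the most relevant labeled examples and inserting them into the prompt, can be sketched in a few lines. This is a toy illustration, not how Predibase implements it: the bag-of-words `embed` function stands in for a real embedding model, and the index contents, function names, and prompt layout are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical labeled examples that would live in the index.
INDEX = [
    {"review": "The hotel was nice and quiet, great staff", "rating": 5},
    {"review": "Terrible wifi and a noisy room", "rating": 2},
    {"review": "Quiet room, nice view, slow check-in", "rating": 4},
    {"review": "Cold breakfast, closed pool", "rating": 2},
]


def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(index, query, k=3):
    """Return the k indexed examples most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda ex: cosine(embed(ex["review"]), q), reverse=True)
    return ranked[:k]


def build_prompt(task, examples, sample):
    """Assemble a few-shot prompt: task, retrieved examples, then the new input."""
    lines = [task, ""]
    for ex in examples:
        lines += [f"Review: {ex['review']}", f"Rating: {ex['rating']}", ""]
    lines += [f"Review: {sample}", "Rating:"]
    return "\n".join(lines)


sample = "The hotel was nice and quiet"
examples = retrieve(INDEX, sample, k=2)
print(build_prompt("Given the review, provide a rating from 1 to 5.", examples, sample))
```

The resulting string is what would be sent to the LLM; swapping `embed` for a real sentence-embedding model changes the ranking quality but not the overall flow.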

But one thing that I think is very interesting about what we're doing at Predibase is that we're applying this specifically to doing task-based workflows like supervised ML, classification, or predictive tasks, and providing an interface that's very similar to what you would do if you wanna do just traditional supervised machine learning as well.

And so you're able to swap between those different strategies to trade off between cost and time and compute resources and performance. And when you're ready to formalize this, we represent these sorts of configurations as models in Predibase. So you can create a zero-shot model definition from your configuration, and then similarly you can create a few-shot model definition as well, with just a very simple command line interface.

From there, we also support fine-tuning in the platform natively. And so, a very quick primer if you haven't done fine-tuning before: it's essentially taking a pre-trained model and providing some examples of inputs and outputs that you want. And then from there, the model weights will be adjusted to be better at outputting the type of thing that you're interested in.

And translating this into a configuration in Ludwig and Predibase is just as simple as giving a specific trainer type and then specifying some optional hyperparameters, like how many epochs you wanna use, what your optimizer is, et cetera. And so the promise of fine-tuning is really about being able to

achieve better performance on your tasks, and also lower latency in some cases. If you got good performance with Falcon 40 billion without fine-tuning, using zero-shot or few-shot prompting, that might be good enough. But ideally, you might want to increase the throughput or decrease the latency by using a smaller model.

And oftentimes fine-tuning a smaller model will give you as good, if not better, performance at a much lower latency cost. And there are different ways to fine-tune. I think most people, when they talk about fine-tuning LLMs, are thinking about fine-tuning the full text-to-text model.

So you have an encoder that outputs some embeddings, and then you have a language model head decoder that sits on top that generates sequence output. Certainly for some cases, like generative cases, that's very useful. But oftentimes what you're trying to do is something like classification, right?

Where you have a fixed number of categories that you want to classify into. And in those cases you might actually be better off doing something like chopping off the language model head that generates text and instead just taking the hidden state of the model, and then attaching a classification head on top.

In which case you have a few different options. You might want to fine-tune the whole thing together, or you might want to fine-tune just the classification head and leave the original encoder frozen. And so thinking about how to do these different things with PyTorch, even with frameworks that make this very

easy that sit on top of PyTorch, can often be a little bit of a problem, right? But the nice thing, again, with Ludwig is that we make it very easy to transition between these different things through the declarative interface. So if you want to do this sort of full text generation fine-tuning, it's as simple as a config that looks like this.

Whereas if you want to train a model that just uses the large language model's encoder, you just specify that as an encoder in your input feature, and then have an output feature that's specified as category, which attaches the category classification head. And then similarly, if you want to freeze

the LLM parameters, which will speed up training, you just have to specify trainable equals false, which will freeze the weights and therefore give you better throughput. And one thing I want to call out is about setting trainable equals false.

You might think, why would you want to do that? Wouldn't it always be better to adjust the weights? And from a strictly performance-oriented view, most of the time training the weights is gonna give you better model performance. But I wanna point out that it is much, much faster and cheaper to not train the weights of the model if you don't have to.
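A rough sketch of the encoder-plus-classification-head setup and the trainable switch, again as a Python dictionary rather than YAML. The section and key names are assumptions modeled on Ludwig's config schema, the model identifier and column names are hypothetical, and `freeze_encoders` is an illustrative helper, not part of any library.

```python
import copy

# NOTE: key names are assumptions modeled on Ludwig's config schema;
# "my-org/base-llm", "review" and "rating" are hypothetical.
classifier = {
    "input_features": [
        {
            "name": "review",
            "type": "text",
            "encoder": {
                "type": "auto_transformer",
                "pretrained_model_name_or_path": "my-org/base-llm",
                "trainable": True,  # fine-tune encoder and head together
            },
        }
    ],
    # A category output attaches a classification head instead of a text decoder.
    "output_features": [{"name": "rating", "type": "category"}],
}


def freeze_encoders(config):
    """Return a copy of the config with every input encoder frozen."""
    frozen = copy.deepcopy(config)
    for feature in frozen["input_features"]:
        feature.setdefault("encoder", {})["trainable"] = False
    return frozen


frozen_classifier = freeze_encoders(classifier)
print(frozen_classifier["input_features"][0]["encoder"]["trainable"])  # False
```

Starting from the frozen variant and only unfreezing if accuracy falls short matches the recommendation in the talk to begin experiments without training the LLM weights.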

And so I always recommend starting an experiment with not training the weights of the LLM. And just to give you a comparison point, here's a training duration for one test. This was on, I think, an IMDB reviews dataset. It took about 60 minutes to train with full fine-tuning, and then going all the way to some of the optimizations that we provide in Ludwig and Predibase,

which are automatic mixed precision and frozen weights plus cached encoder embeddings, we got that all the way down to on the order of less than a minute to do the fine-tuning, which is pretty crazy, because at that point you're just doing one forward pass to generate the embeddings, and then you're just fine-tuning a bunch of fully connected layers on top.
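Why frozen weights plus cached embeddings is so much cheaper can be shown with a toy example: the encoder runs once per example, and every training epoch after that only touches the small head. The sketch below uses a hand-written keyword counter as a stand-in for a pretrained encoder and a logistic-regression head in place of fully connected layers; all names and data are hypothetical.

```python
import math

POSITIVE = {"great", "nice", "clean"}
NEGATIVE = {"dirty", "noisy", "rude"}
ENCODER_CALLS = 0


def frozen_encoder(text):
    """Stand-in for a frozen pretrained encoder: text -> fixed feature vector."""
    global ENCODER_CALLS
    ENCODER_CALLS += 1
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return [float(pos), float(neg), 1.0]  # last entry acts as a bias term


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(features, labels, epochs=200, lr=0.5):
    """Train only the classification head (logistic regression) with SGD."""
    w = [0.0] * len(features[0])
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            w = [wi + lr * (y - p) * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x))) >= 0.5)


DATA = [
    ("great clean room", 1),
    ("nice staff", 1),
    ("dirty and noisy", 0),
    ("rude staff", 0),
]

# One forward pass per example; the results are cached and reused every epoch.
cached = [frozen_encoder(text) for text, _ in DATA]
labels = [label for _, label in DATA]
head = train_head(cached, labels)

print(ENCODER_CALLS)                        # 4 encoder calls for 200 epochs
print([predict(head, x) for x in cached])   # [1, 1, 0, 0]
```

Without the cache, the encoder would run on every example in every epoch (800 calls here), which is where most of the cost sits when the encoder is a large language model.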

In addition to that, we have other state-of-the-art techniques available in Ludwig with single parameters, like parameter-efficient fine-tuning methods such as LoRA, which is essentially a way to augment the model architecture with certain trainable parameters. There are frameworks that make this easier, but it's still a non-trivial thing in most cases.

But with Ludwig, again, it's just one parameter. And we also provide distributed training with model parallelism and data parallelism out of the box. So, for example, using DeepSpeed is as simple as just specifying a backend configuration in Ludwig, or if you're using Predibase, there's no need to specify this at all, because we do all the right-sizing of compute for you based on the description of the model that you provide.
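To give a sense of why LoRA counts as parameter-efficient: it keeps a weight matrix W of shape d_out x d_in frozen and learns a low-rank update B·A, where B is d_out x r and A is r x d_in. The arithmetic below compares trainable parameter counts for a single matrix; the 4096 x 4096 size and rank 8 are illustrative values, not figures from the talk.

```python
def full_params(d_out, d_in):
    """Trainable parameters when the whole weight matrix is fine-tuned."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Trainable parameters for a LoRA update B (d_out x r) times A (r x d_in)."""
    return rank * (d_out + d_in)


d_out, d_in, rank = 4096, 4096, 8  # illustrative sizes only
full = full_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)

print(full)         # 16777216
print(lora)         # 65536
print(lora / full)  # 0.00390625, i.e. about 0.4% of the full matrix
```

Because only the small A and B matrices receive gradients, far fewer weights move during training, which is also why the technique helps limit the loss of background knowledge discussed in the Q&A.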

And so we'll do the selection of DeepSpeed and the selection of compute resources for you. And with that, I want to get into a quick demo that shows you how some of these features work in practice. So let me switch over to our Predibase environment for a sec. And so this is where you land in Predibase. You have the LLM query editor, and here I've selected the model Vicuna 13 billion.

It's not important which model; you have lots of different ones available. This one's just for demo purposes here. Here you see that I previously queried the model, what is machine learning? And it gave me a response. Now again, going back to the hotel reviews dataset, I might wanna ask a question like, what are some popular hotel chains?

And then from there I'll go ahead and query the model.

And, okay, so it generated this response, which was that some include Marriott, Hilton, Hyatt, et cetera. So it looks pretty good. Now what if I want to ask something a little bit more specific, something about a dataset in particular? So for example, what are some hotels with poor wifi?

Now the previous response was something that the model already knew because it had seen it in its training data when it was originally trained, right? But in this case, we're asking about something that it doesn't know. And so its response is, you know, as an AI language model, I don't know anything about what hotels have poor wifi.

And then, here are some things that might help you with your answer, but sorry, I can't help you. So at this point, you might think, well, now I gotta fine-tune this model. But actually that's not the best place to start in most cases. What you probably wanna do is instead just augment your model with some domain-specific data without having to do any fine-tuning.

So for that, we provide this parameter here called dataset to index. And once I specify that, I can go ahead and send this query off to the large language model. And then the first thing it's going to do is index the data if it hasn't been indexed before, and then go ahead and do the in-context learning and retrieval process on the fly.

And so here now you see that it says, based on the given context, the AmericInn had poor wifi, Best Western's Carmel Townhouse Lodge, et cetera. And if you look at the raw prompt that was submitted to the model, this one's not formatted particularly great, but you can see that in there was all the attribution needed to give you a sense for why the model ultimately came to the conclusion that it did.

Now, I think this is a very good way to get to know your data and do some qualitative queries of your data. But oftentimes what you'll want to do is transition to something a little bit more quantitative as well, like prediction. And so what I can do is show you an example of trying to predict specific hotel review scores.

So for example, you could say, given a review, which in this case I know is in my data as a column called reviews full, predict a rating on a scale from one to five. And we'll go ahead and send that query off and see what happens. And just to give you a little bit of a sense for what the data looks like under the hood.

What we can see, since I uploaded this dataset here, is that we have a dataset of review ratings on a scale from one to five, with about half of them being five, and then a bunch of review texts that go along with them, with the mean sequence length being about 112.

And then all the content of the reviews is sitting here in various forms, right? And so if you look at the responses, we see that it predicts like, oh, I think it'd be like a four out of five. I think it might be also a four out of five. Two out of five, et cetera.

But you'll notice that the responses are not really in a format that we're particularly happy with. The model's very verbose. It's adding a lot of filler words that we don't really care about. So one thing you might wanna do is iterate on the prompt to try to get to something that more accurately reflects the format that you're interested in.

And so here I just have a prompt that I spent a little bit of time iterating on myself, and here I've managed to get the model to output something that looks a lot closer to what I would ultimately want, which is a particular format. And so now instead of the model spewing a bunch of verbose stuff, it just gets straight to the point.

The review is four, the review is three, et cetera. And now at this point you can see that if you compare it to the ground truth, it's in most cases not too far off from the actual review score. So at this point, what I might wanna do is formalize this as a task definition in Predibase. And so that's where you get into the model repos that exist in Predibase.

So here we have a model repo for this particular task that shows a zero-shot config, a few-shot config with random retrieval, and a few-shot config with semantic retrieval. And in the interest of time, since we're running a little bit low on time, I won't get too much in the weeds here. You can see how I took my same prompt and specified it as the prompt for this model, and then told it that I want my output to be a category that has this vocabulary.

And additionally, I can provide some validators that validate that the response comes back in a particular format. And if I want to then go to doing few-shot with random retrieval, it's as simple as saying I wanna do retrieval that is random with k equals three.

And if I then want to do semantic retrieval, it's as simple as saying, I wanna do semantic retrieval. And the important point here that I wanna emphasize is that you can also see the progression in performance as you go from models that are doing just basic zero-shot, with 37% accuracy, all the way up to the semantic retrieval, which gets you to 66% accuracy.
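The combination of a fixed output vocabulary, a format validator, and an accuracy comparison can be sketched as follows. This is an illustration of the idea only: the response pattern, the vocabulary, and the function names are hypothetical, and the sample responses are made up rather than taken from the demo.

```python
import re

VOCAB = {"1", "2", "3", "4", "5"}  # allowed category values
# Hypothetical response format enforced by the prompt: "The review is <n>"
PATTERN = re.compile(r"^The review is (\d+)\.?$")


def parse_rating(response):
    """Return the rating if the response matches the format and vocabulary, else None."""
    match = PATTERN.match(response.strip())
    if match and match.group(1) in VOCAB:
        return int(match.group(1))
    return None


def accuracy(responses, truth):
    """Fraction of responses whose parsed rating equals the ground truth."""
    correct = sum(parse_rating(r) == t for r, t in zip(responses, truth))
    return correct / len(truth)


responses = ["The review is 4", "The review is 3", "I think it'd be like a four out of five."]
truth = [4, 2, 4]

print([parse_rating(r) for r in responses])  # [4, 3, None]
print(accuracy(responses, truth))            # 0.3333333333333333
```

Counting an unparseable response as wrong, as this sketch does, is one reason tightening the prompt format can raise measured accuracy even before any retrieval or fine-tuning is added.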

And then if I wanna go further, we can start to look into fine-tuning techniques, where I can say, create a new model from latest. And it drops me into this UI, which is our model builder UI. So I select a large language model as the type of model I want to build. And then from there, there are all these parameters.

I can select the model name, uh, the prompt template that I want to use, retrieval strategies, um, and then my various, uh, adaptation methods like LoRA or AdaLoRA, and then all of the different parameters, which come with reasonable defaults but that I can configure beneath that. And then all of this is also fully modifiable in, uh, YAML, uh, through the config editor, which exists here.

So, um, yeah, so all the options available, um, from zero-shot, few-shot, fine-tuning, all in one platform, and then the ability to deploy this model back into production, so you can start querying again, um, from the, you know, original editor. That completes the loop so that you have it all integrated in one place.
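As a rough sketch, the options listed above (model, prompt template, retrieval strategy, adapter) could be gathered into one declarative config. The key names below are modeled on Ludwig's LLM config schema as an assumption, and the feature names, template, and placeholder model ID are hypothetical; check the Ludwig or Predibase documentation before relying on any of them.

```python
import json

# Assumption: key names follow Ludwig's declarative LLM config; values are illustrative.
config = {
    "model_type": "llm",
    "base_model": "<hugging-face-model-id>",  # placeholder, not a real model name
    "input_features": [{"name": "review", "type": "text"}],     # hypothetical column
    "output_features": [{"name": "score", "type": "category"}],  # hypothetical column
    "prompt": {
        "template": "Rate this review from 1 to 5: {review}",  # hypothetical template
        "retrieval": {"type": "semantic", "k": 3},
    },
    "adapter": {"type": "lora"},
    "trainer": {"type": "finetune"},
}

# Print the config; the same structure maps one-to-one onto YAML.
print(json.dumps(config, indent=2))
```

The point of the declarative style is that moving from few-shot to fine-tuning changes a few keys rather than the surrounding code.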

And I'll go ahead and stop there and open it up to any questions, uh, from folks. So thank you. Right on. Very cool. Thank you for showing us this. And your prayers were answered because the demo gods were very nice to you this time around. No. Yep. No, uh, no crazy errors from what I saw, so it worked out well.

That is awesome. So it takes like 20 seconds for the, what we're saying right now to be seen by the people in the, uh, in the chat. So I'm gonna go ahead and ask a question that came through and, uh, Michael actually already answered it in the chat, but it might be good to talk a little bit more about, like, fine-tuning different models, and does it matter if there's a different model architecture or size or any of that?

Like what do you need to keep in mind when it comes to the fine tuning piece? Yeah, that's a good question. Um, so I'd say that with fine tuning, the most important part is definitely the data. Um, I would honestly say that most of the parameters, um, you don't really need to play around with them too much.

In most cases, like with fine-tuning, your goal is really to avoid kind of screwing up all the background knowledge that the model already has, so that catastrophic forgetting problem. And so when you use, like, these parameter-efficient techniques like LoRA, because you're only adjusting a minimal set of weights, it really helps with that, uh, part of the problem.
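To make "a minimal set of weights" concrete: LoRA freezes a weight matrix W of shape d_out x d_in and trains two small matrices, B (d_out x r) and A (r x d_in), whose product is added to W. The arithmetic below compares trainable parameter counts for one layer; the hidden size and rank are hypothetical values chosen for illustration.

```python
def full_params(d_out, d_in):
    """Trainable parameters when the whole weight matrix is fine-tuned."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Trainable parameters with LoRA: B is d_out x rank, A is rank x d_in."""
    return rank * (d_out + d_in)


d = 4096  # hypothetical hidden size
r = 8     # hypothetical LoRA rank

full = full_params(d, d)
lora = lora_params(d, d, r)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
```

Because the original matrix stays frozen, the update is confined to a low-rank correction, which is why these methods tend to disturb less of the pre-trained knowledge.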

I would say that the much more important thing is to make sure your data's in the right form. And then after that, I would say it's important to make sure that you use the right type of fine-tuning technique for the problem as well. So again, I think a lot of people often start with, like, "I have a text classification problem, let's fine-tune a large language model to output text that looks more like the type of thing I want to predict."

Um, in most cases, you'd much rather just get rid of the large language model output and then add that classification head on top, uh, rather than try to stick to a text-to-text model, right? So choosing the right technique for the task is also the, the most important thing to consider. But then from there, in terms of, like, specific model architectures and things like that, I would say that the important thing is to try to start with something that's small enough to get the job done.
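A classification head in this sense can be sketched as a linear layer plus softmax applied to the model's pooled hidden states, in place of generating text. The example uses plain Python with toy numbers; mean pooling is one common choice among several (last-token pooling is another), and all shapes and weights here are made up.

```python
import math


def mean_pool(hidden_states):
    """Average per-token hidden vectors into a single sequence vector."""
    n = len(hidden_states)
    return [sum(col) / n for col in zip(*hidden_states)]


def linear(x, weights, bias):
    """One output logit per class: weights has one row per class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(hidden_states, weights, bias):
    """Pooled hidden state -> linear head -> class probabilities."""
    return softmax(linear(mean_pool(hidden_states), weights, bias))


# Toy inputs: 3 tokens, hidden size 2, 3 classes (all values made up).
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]

probs = classify(hidden, W, b)
print(probs)
```

The head always returns a distribution over the known classes, so there is no free text to parse or validate.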

Because, you know, in terms of, like, the iteration velocity, um, trying to start with, like, fine-tuning a 175 billion parameter model or something for a classification task, it's gonna cost you a lot and probably, you know, won't do any better than a smaller model if the task complexity is, you know, bounded enough.

Right. Yeah, be pragmatic. That is a great call. And also you don't really know what it is until it comes out. So it's kind of like a crapshoot. Uh, if you do it like what you were talking about with these gigantic models, then it may cost you a ton of money and it might not even be that good.

Yeah, absolutely. So be aware of that. So I think oftentimes a good strategy is to, you know, a lot of these models come in different, um, sizes, right? So like Falcon 40 billion, Falcon 7 billion. I think a good strategy in a lot of cases is, um, like, use Falcon 40 billion for doing prompt engineering. Find, like, a prompt that gives you pretty good results.

Then try fine-tuning Falcon 7 billion, uh, for your particular task, using that prompt as the input prompt. Um, and you'll find that, in some ways, you're kind of distilling the model at that point, um, tuning, like, this smaller model to predict similar to how the original model predicted. But you'll find that actually it probably does quite well at the task at that point compared to the 40 billion one.
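One reading of this strategy, sketched under assumptions: settle on a prompt with the larger model, run that prompt over your inputs, and keep the prompt and output pairs as fine-tuning data for the smaller model. The `fake_teacher` function is a hypothetical stand-in for a call to the large model, and the template and inputs are made up.

```python
def build_distillation_set(prompt_template, inputs, teacher):
    """Pair each formatted prompt with the teacher model's output."""
    rows = []
    for text in inputs:
        prompt = prompt_template.format(text=text)
        rows.append({"prompt": prompt, "target": teacher(prompt)})
    return rows


def fake_teacher(prompt):
    """Hypothetical stand-in for querying the larger model."""
    return "5" if "great" in prompt else "1"


# Hypothetical prompt found during prompt engineering with the larger model.
TEMPLATE = "Rate this review from 1 to 5.\nReview: {text}\nScore:"

rows = build_distillation_set(TEMPLATE, ["great food", "awful service"], fake_teacher)
for row in rows:
    print(row)
```

If ground-truth labels are available, they could replace the teacher outputs as targets while keeping the same prompt, which is closer to ordinary supervised fine-tuning than to distillation.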

Excellent. All right, last one for you before we keep it cruising. Does Predibase work with self-hosted open source LLMs? If the data is sensitive, does the data have to go to the Predibase server? Great question. Yeah, so we have two different ways of deploying Predibase. We have, uh, a SaaS version, which runs in our cloud.

So you don't have to manage any infrastructure yourself, but for the particular type of use case that that person, um, asked about, we also have a VPC, so virtual private cloud, version where, uh, all of the compute runs in your environment. And so no data ever leaves your environment. So if you have sensitive data in AWS, for example, uh, you can actually run the part of Predibase that processes the model for training in your data plane, in your AWS environment, and none of that data ever crosses the boundary into our environment.

So good. Yeah. Airtight. Make it that airtight gap, and then, uh, get all those SOC 2s and the SOC 3s and all that fun stuff complied with, and you're golden. So, dude, Travis, this has been awesome, man.

I appreciate you coming on here and I appreciate the demo. If anyone wants to continue the conversation with Travis, of course you've got the chat, but you also are in Slack, I believe. So feel free to tag Travis in the community conference channel, and we're gonna keep it moving right now. Also, I mentioned it before, but I'll mention it again.

Hit up the tab on the left called solutions and you can find out more about Predibase and I saw. Thanks Travis, man, I'll talk to you soon. Bye.

