MLOps Community

Ax: A New Way to Build Complex Workflows with LLMs

Posted Sep 11, 2024 | Views 1.5K
# LLMClient
# DSP paper
# AX
SPEAKERS
Vikram Rangnekar
Software Engineer @ Stealth

Vikram builds open-source software. He is currently working on making it easy to build with LLMs. He created LLMClient, a TypeScript library that abstracts over the complexity of LLMs, based on the research in the Stanford DSP paper. He has worked extensively with LLMs over the last few years to build complex workflows, and previously worked as a senior software engineer at LinkedIn on ad serving.

Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.

SUMMARY

LLMClient is a new way to build complex workflows with LLMs. It's a TypeScript library based on research done in the Stanford DSP paper. Concepts such as prompt signatures, prompt tuning, and composable prompts help you build RAG- and agent-powered ideas that have until now been hard to build and maintain. LLMClient is designed for production usage.

TRANSCRIPT

Vikram Rangnekar [00:00:00]: My name is Vikram Rangnekar. I'm not with any specific company right now. I'm just consulting, building my framework, and trying to figure out where this whole LLM thing is going and where I fit in. My coffee? I just like any good light roast, you know, love all the stuff that comes out of SF and the Pacific Northwest. And if you ask me what's my favorite coffee, I'd say Cat & Cloud in Santa Cruz. Verve or Cat & Cloud? Both. You cannot go wrong with either of them.

Demetrios [00:00:41]: What up, MLOps community? We are back for another podcast. I am your host, Demetrios. Today it's all about DSPy. Vikram broke down so many concepts that I thought I understood, but I didn't. I'm not gonna lie, I didn't. And you get to hear me working it out in real-time. Hopefully it helps you understand. Let's get right into it.

Demetrios [00:01:04]: All right, real quick, I want to tell you about our virtual conference that's coming up on September 12. This time we are going against the grain, and we are doing it all about data engineering for ML and AI. You're not going to hear RAG talks, but you are going to hear very valuable talks. We've got some incredible guests and speakers lined up. You know how we do it for these virtual conferences. It's going to be a blast. Check it out right now.

Demetrios [00:01:33]: You can go to home.mlops.community and register. Let's get back into the show. Let's start by telling me what you've been working on, because I want to start there and then go into the nuances of it and all the different challenges you've encountered.

Vikram Rangnekar [00:01:50]: Well, I started building with LLMs a while back, with a bunch of startup ideas, the first one being this thing called 42 Papers. Back before all the arXiv excitement, I predicted arXiv was going to take off, and I wanted to build a way to discover what everyone's reading right now on arXiv and stuff related to you. I don't want to say trending, exactly, but also related. Trending.

Demetrios [00:02:23]: Yeah.

Vikram Rangnekar [00:02:23]: And so I was exploring ways. And I didn't just want to throw the papers out there. I wanted to extract the key points and stuff like that and just show that. Like, forget the whole paper, no one's reading the math anyway. So I was trying to figure that out. And we had this whole idea. And this was early, like 2017-ish, maybe, and we called it 42 Papers, you know, named after the whole Douglas Adams 42 thing. Yeah, but.

Vikram Rangnekar [00:02:56]: And we were stuck. We had this idea: let's get all these people, sort of like the Amazon Mechanical Turk service, where all these PhDs or whoever can help us do this, and we give them points or money or something. Karma. And we had this whole thing planned, and I was like, you know what, I don't know how fast this is going to scale. It's going to be hard to get people to read papers and extract value and stuff like that. And so I stumbled onto the early models coming out of Allen AI. These were like BERT models and stuff like that.

Vikram Rangnekar [00:03:26]: And they had trained them on arXiv. These are small models by today's standards. I don't even remember, probably 500 million parameters.

Demetrios [00:03:34]: Yeah, maybe.

Vikram Rangnekar [00:03:35]: And the thing could summarize. Yeah, this thing could summarize the paper. The first time I actually saw it, I was blown away, honestly. You know, till then, I was very familiar with the ML world, you know, from being at LinkedIn. They used ML everywhere.

Demetrios [00:03:49]: Yeah, to the best. But they definitely use it.

Vikram Rangnekar [00:03:52]: Yeah. I mean, they use it, right. It was ad relevance, fraud, and all kinds of stuff, right. Everywhere. But these are not GPU models. These are models that ran on Hadoop or whatever. These are more statistics-leaning, sort of.

Demetrios [00:04:06]: The traditional ML world.

Vikram Rangnekar [00:04:08]: Traditional ML. And this thing was like magic, because with that I could kind of, you know, roughly wrap my head around what's going on. But this thing could summarize a blob of text into points. I was like, how does that. What just happened? And that was really mind-blowing. And then I kind of went down that rabbit hole, and I realized I needed a framework to build with these things. And so I started the early work of building a framework. And.

Vikram Rangnekar [00:04:44]: And there were a lot of learnings along the way, and it evolved along the way. And that's where I am today.

Demetrios [00:04:51]: In terms of Ax, incredible. So starting with the fine-tuned BERT model for 42 Papers and then going from Mechanical Turk, I guess.

Vikram Rangnekar [00:05:05]: Which we dropped real quick when we realized the model could just do that. Right.

Demetrios [00:05:09]: I mean, yeah, it's incredible to see. And now it feels like you can still probably get what you're looking for with that fine-tuned BERT model. Or maybe like a fine-tuned small.

Vikram Rangnekar [00:05:26]: Model, I think a fine-tuned small model. There were a lot of issues with that BERT model. There were a lot of literal heuristics, like, oh, if your word is broken here, then let me just join it together. Obviously they could do it, but a small model, I think, would not cost too much more or be slower. And I'm not even sure you'd need a fine-tuned model at this point. I don't know. I mean, some of these really small models coming out today are super powerful. You know, this Nisten guy, he had this merged model he launched yesterday. He did this ternary one-bit quantizing on it.

Vikram Rangnekar [00:06:10]: It came down to, like, 65 MB or something. And this model was apparently super fast and really good. And so, you know.

Demetrios [00:06:22]: Yeah, exactly. You can get so much further with some of the stuff that's out there these days. And it is funny to think about how, out of necessity, you had to create something to help you along the way from that BERT model. But along the way things just blew up in the space, and so you were able to level up really quickly. And we mentioned too, before we hit record, that you're very much focused on the JavaScript/TypeScript world because it is so fast and you can iterate and do things really quickly. So talk to me first, maybe give us a bit of background on what the actual tool is that you built, and then why this love for JavaScript and TypeScript. Sure.

Vikram Rangnekar [00:07:17]: So the tool I built, it's basically a framework. It's called Ax, and it doesn't stand for anything really, just a cool-sounding name. You could call it an agent framework or whatever. The idea was, when I started, I initially needed a way to abstract out LLMs. So that was my first thing. All these client libraries have their own set of bugs and all of this, and these are just API calls. Let me abstract this out. And that was just a simple goal, and I think I achieved that pretty quickly, essentially. And then I realized, on top of that, I need a way to improve prompting.

Vikram Rangnekar [00:07:59]: And then I realized these blobs of text that I have to maintain, like: point 1, do this; point 2, don't do this; point 3, do that. I hated that, honestly. It caused me to not reach for an LLM more often in my code, because there was this whole dance I had to do, and then there were loops around it to make sure it didn't break. And, you know, there was a whole bunch of stuff.

Demetrios [00:08:22]: And I find myself, too, just on that fact, really never being able to quit, because I always think the next prompt is going to be better. I can never be like, yeah, that's good enough. You know, it's because prompting is.

Vikram Rangnekar [00:08:39]: You don't know.

Demetrios [00:08:40]: Yeah, yeah.

Vikram Rangnekar [00:08:41]: And you don't know what word in there is messing things up, and, you know, that just feels icky to me, and I didn't like it. I know instruction following is obviously the way these things have been trained, you know, at the final stage of training. But I was trying to look for a better way. And I tried a bunch of stuff. There was this templating thing and blah, blah, blah, and it just didn't quite fit. And then I stumbled onto the DSPy papers, or the Demonstrate-Search-Predict papers, from Omar and Stanford.

Vikram Rangnekar [00:09:11]: So the key, I know you've covered this before, but what's really interesting about that paper is, there's this trend where people just take an idea and then chop it up into ten papers. And all these papers are really light, because now they have ten published papers. But Omar took all his best ideas and put them into this one paper. And people don't notice that. Like, if you're like, oh, okay, it's these examples, you can tune these models, but there's a lot more to it. If you start the paper, the first part starts with signatures, right? That signatures concept really made sense to me, because they're basically functions. They're saying, here's a bunch of inputs, a bunch of outputs. And then, obviously, with Ax, we took that even further.

Vikram Rangnekar [00:09:53]: We added the concept of types on those things. So essentially we say, I have two string inputs and I need a boolean and a number outputted. Then we'll enforce that automatically. And that's basically what you need in your code. You want something with inputs and something with outputs. You don't want blobs of text that you're trying to substring through. That was a real clear abstraction to me.
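
To make that concrete, here is a minimal, self-contained TypeScript sketch of the idea: two string inputs, a boolean and a number out, with the declared types enforced when the output is parsed. The names and the stubbed `callLLM` function are illustrative assumptions, not the library's actual API.

```typescript
// Illustrative sketch only, not the actual Ax API: a signature is typed
// inputs and typed outputs, like a function contract, with the prompt
// and output parsing hidden behind it.

type ReviewInputs = { productTitle: string; reviewText: string };
type ReviewOutputs = { isPositive: boolean; confidence: number };

// Stand-in for a real model call (OpenAI, Gemini, etc.).
async function callLLM(_prompt: string): Promise<string> {
  return "isPositive: false\nconfidence: 0.92"; // canned response for the sketch
}

async function classifyReview(inputs: ReviewInputs): Promise<ReviewOutputs> {
  const prompt =
    `Product Title: ${inputs.productTitle}\n` +
    `Review Text: ${inputs.reviewText}\n` +
    `Respond with fields: isPositive (boolean), confidence (number)`;
  const raw = await callLLM(prompt);

  // Parse key-value output and enforce the declared types
  // instead of trusting a free-form blob of text.
  const fields: Record<string, string> = {};
  for (const line of raw.split("\n")) {
    const [key, ...rest] = line.split(":");
    fields[key.trim()] = rest.join(":").trim();
  }
  const out: ReviewOutputs = {
    isPositive: fields["isPositive"] === "true",
    confidence: Number(fields["confidence"]),
  };
  if (Number.isNaN(out.confidence)) throw new Error("confidence was not a number");
  return out;
}
```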

Vikram Rangnekar [00:10:20]: And then I was like, there's so much more that could be built on the abstraction, because these inputs and outputs, when put together, what is this? This is an API, right? Once you have this kind of structure on a prompt, it's very easy to say, oh, this is not a prompt anymore, it's an API. And another realization that came is you don't want to do too much in a prompt. There are talks by the OpenAI guys and stuff. I can't tell you the exact technical reasons behind this, but at a high level, from a builder's perspective, you don't want a prompt to do too much. There are all these tasks that these things are trained on, and when you ask it to do something, the LLM is pulling on those tasks. And there is some sort of limitation because of the size of the LLMs and your context size, honestly, I don't know where that limitation comes from. But if you say, oh, can you classify this? Can you summarize this? Can you also call these functions? Can you also do this? It's going to mess up more. So you want to focus it down: okay, let's summarize here, let's take that output, let's put it into another one that then does function calling based on it, augments it with more data.

Vikram Rangnekar [00:11:36]: Then let's take that result, put it in a third prompt. I mean, yes, we do have longer contexts now, but that does not mean you can have a longer multi-step whatever, because these things are not trained for that kind of thing. Even if you notice, Google talks about needle-in-a-haystack tests, which is fine, which is basically an extraction step. It's saying, we put in a lot of text and we can find the right piece. They don't say you can also then do function calling and then do this and then do that. So there's this whole multi-step task thing. I think they refer to it as long-horizon tasks, and most of these models are not trained on that yet. So even though you have the long context, you still want to get your work done as quickly as possible.

Vikram Rangnekar [00:12:25]: Yeah, you can put a lot of text in there and then say, oh, you know what, find the things you need from here, but you are limited to that. And then you want to take those texts and do something else anyway. So the point is that it defined an API, and that API made it really easy to then plug all these workflows together. So I have an extraction step, a classification step, a function-calling augmentation step. It's basically sort of like code. And that signature piece, honestly, that blew my mind first. I mean, it's not anything out of the blue, it's functions.
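
A rough sketch of what that composition looks like in code, with each step standing in for its own small, signature-backed prompt; the functions here are stubs for illustration, not a real API.

```typescript
// Each step is its own small, typed prompt program (stubbed here);
// the steps are composed in ordinary code instead of one giant prompt.

const summarize = async (ticketText: string) =>
  `Summary of: ${ticketText.slice(0, 40)}...`;

const classify = async (summary: string) =>
  summary.toLowerCase().includes("refund") ? "billing" : "general";

const lookupContext = async (category: string) =>
  `Knowledge-base articles for category "${category}"`;

async function handleTicket(ticketText: string) {
  const summary = await summarize(ticketText);   // step 1: summarize
  const category = await classify(summary);      // step 2: classify
  const context = await lookupContext(category); // step 3: augment with data
  return { summary, category, context };
}

handleTicket("Customer asks about a refund for a duplicate charge.").then(console.log);
```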

Vikram Rangnekar [00:13:02]: But to apply that idea to LLMs is a paper unto itself, sort of. Right, yeah.

Demetrios [00:13:10]: Gives you so much more stability.

Vikram Rangnekar [00:13:12]: Yes, exactly. So with Ax, we've taken that even further, because with these functions, we focused on the concept of agents. These are basically, sort of, prompts, but they have the capability to do multi-turn reasoning, function calling, I mean, reasoning, you know, as much as these things reason, and try to get a task done. But when they call these functions, these functions aren't necessarily just functions, like some remote API; they can call other agents, because there is this input-output structure. You can easily plug agents into agents as functions. Now you can have a tree of agents, and you could have something like, hey, you asked this agent about a certain stock, and then it calls an agent to do analysis on something, and then something else and something else. And then these four agents return their results, and then this main thing decides, okay, here's your answer, sort of thing.

Vikram Rangnekar [00:14:17]: Right?

Demetrios [00:14:17]: Yeah.

Vikram Rangnekar [00:14:17]: So it makes it really easy to plug agents together and stuff like that. The other thing with the paper that really blew me away was the concept of examples. So essentially, we just talked about it: you put all these instructions in there. And instructions are really great with large models, right? The bigger the model. And the way I understand it from the paper is you want to use these big models to generate these examples from these instructions. And what are examples? Examples are: if this is my input, then this should be my output.

Vikram Rangnekar [00:14:53]: If this is my input, these should be my outputs. And these should be rich. Try to cover as much as possible, and then you plug these into your prompt. And now this allows the LLM to infer patterns that you and I might not even see. So it's sort of like if I told you to write something sort of like Shakespeare, but not. I mean, how do I really communicate that? I could just use his name, and that would infer patterns. But if I give it an example of what I want, it's much more powerful, right? There are patterns in there that I couldn't even use English words to communicate to you. Yeah, right. So these patterns are what these LLMs are able to leverage, distributions, whatever they want to call them.

Vikram Rangnekar [00:15:37]: I don't know the exact terminology. This just took all my workflows to the next level. They were a lot more solid. And how many examples can you add in there? There's a new Google paper that said add 1,000. Obviously I'm not going to do that, but input tokens are cheaper than output tokens. So use those input tokens to get more solid output tokens, instead of error correcting and all of that and wasting your output tokens. Right.
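
A small sketch of what that looks like in practice: examples are just input/output pairs rendered into the prompt so the model can infer the pattern. The task and field names below are made up for illustration, not tied to any specific API.

```typescript
// Few-shot examples are input/output pairs placed into the prompt;
// cheap input tokens buy more reliable output tokens.

type Example = { input: string; output: string };

const examples: Example[] = [
  { input: "The battery lasts forever and it charges fast.", output: "positive" },
  { input: "Stopped working after a week, support never replied.", output: "negative" },
];

function buildPrompt(shots: Example[], newInput: string): string {
  const rendered = shots
    .map((ex) => `Review: ${ex.input}\nSentiment: ${ex.output}`)
    .join("\n\n");
  return `${rendered}\n\nReview: ${newInput}\nSentiment:`;
}

console.log(buildPrompt(examples, "Great sound, but the app keeps crashing."));
```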

Demetrios [00:16:08]: Yeah, that's such a good point.

Vikram Rangnekar [00:16:11]: Yeah. So that's sort of like the second.

Demetrios [00:16:13]: Piece here in DSPy. That's where, when they do the teacher-student type of stuff, that is also.

Vikram Rangnekar [00:16:24]: In there, that is actually like a third step. So once you set the examples, then you do the teacher-student bootstrapping. So in DSPy, examples are very simplistic. You don't have too much information in them. You just have the input and the quick output, and that's only for the whole tree. So if you have a tree of functions or prompts, whatever you want to call them, DSPy programs, that does a whole workflow, the examples, you just set at the top, saying these are the inputs, and at the bottom, these are the outputs I'm expecting. And then when the bootstrap iterates, it captures all the inputs and outputs of the whole tree.

Vikram Rangnekar [00:17:05]: Do you get it? So essentially you have this database of all those traces whenever you have a good output. Every time you get a good output, you say: pause, save that. So essentially now you have a database of all these good inputs and outputs, and when you are putting your code into production, you apply them on those things. So this one gets its set of ten great examples, this one gets its set of ten great examples, and so on. And then the whole tree works so much better.
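
Here is a much-simplified sketch of that bootstrapping idea, with stubbed steps: run the whole chain, and whenever the final output is judged good, save every intermediate input/output pair as demos for its own step. The names are illustrative, not DSPy's or Ax's actual API.

```typescript
// "Pause, save that": when the end-to-end output is good, keep the
// intermediate traces so each step in the tree gets its own examples.

type Trace = { step: string; input: string; output: string };
const demosByStep = new Map<string, Trace[]>();

async function runStep(
  step: string,
  input: string,
  fn: (s: string) => Promise<string>,
  traces: Trace[]
): Promise<string> {
  const output = await fn(input);
  traces.push({ step, input, output }); // record the trace for this step
  return output;
}

async function runChain(question: string, isGood: (answer: string) => boolean) {
  const traces: Trace[] = [];
  const summary = await runStep("summarize", question, async (s) => `summary(${s})`, traces);
  const answer = await runStep("answer", summary, async (s) => `answer(${s})`, traces);

  if (isGood(answer)) {
    // Keep the traces as demos for every step in the tree.
    for (const t of traces) {
      demosByStep.set(t.step, [...(demosByStep.get(t.step) ?? []), t]);
    }
  }
  return answer;
}
```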

Demetrios [00:17:34]: Yeah. And that's funny because I had heard others talk about how, and they weren't using DSPY, but they talked about how important it was for them and their workflows to have the high quality examples. And they were saying, I spend so much time just going around the web and collecting examples that I want and I have a database of examples. So, and this was for marketing use cases. Right. So if it's an intro to a blog, they go around and anytime they see a great intro for a blog, they'll collect that and throw it in their database. And now with these different tones. And then they actually, they try to label it so that they can say this is this type of tone and this is this type of formality.

Demetrios [00:18:26]: And then they can give it in those prompts. They can almost reference like this database and say, here's what you're doing. It sounds like with the way that DSPY does it, you wouldn't even need that. Or would you still want to have something like that? Or are they two separate things?

Vikram Rangnekar [00:18:50]: So those are, it really depends. DSPy is about having exact examples for your inputs and your outputs. So say you're expecting three inputs, like a question, a context, and another thing, and then you want three outputs, like a classification or something, a category, a summary, and something else. Then you want the exact inputs and outputs, the whole chain, as an example. Right. But if you're getting it to do something and you want to make it better and you want to add these examples in there as just a thing to improve the prompt, that's fine. Right.

Vikram Rangnekar [00:19:36]: Because your first input could just be a task. Essentially, you could give it a marketing task. You could build an agent to do marketing tweets or whatever. The first input could be the task. The second input could be the examples that you're inserting yourself, like a context. And then the output would be the tweet. Yeah. Right. So essentially, yes, you could provide your own examples, but at a high level, they're not different.

Vikram Rangnekar [00:20:17]: They're basically putting something in there based on what you want as output. So you could use those as example outputs and build your own sort of tree. And they're just high-quality things that you have handpicked, you know, in the end. Because, yes, even though the bootstrap thing captures all the traces in between, the initial input and the final output is something you still pick, right? You say, okay, even though you're going through ten steps, when this is the input, I'd like an output like this. And then maybe there are 15 steps in the middle that you don't have to go and set values on. Right. The values are flowing: the first LLM activates and generates an output that goes into the second, then that activates and goes into the third.

Vikram Rangnekar [00:21:07]: Right. I mean, sort of like function calling the next function calling the next function. And now you don't have to sit and capture all the variables within all those, like the state within the whole chain of functions. Right.

Demetrios [00:21:21]: Yes.

Vikram Rangnekar [00:21:23]: That's really the key to DSPy. It's basically saying, yes, sure, set high-quality examples. Find them from the web or whatever. That's cool. And then we'll help you capture all of the traces in between, so that when you put this into production, there are good examples set for everything. Yeah.

Vikram Rangnekar [00:21:41]: Because your tweet example might be good for the final output, but it might not be good for the one classification or extraction step you have in the middle. Right. That might need its own sort of example. So that essentially sort of blew my mind, because it's really simple. Again, none of this is out there, but it really solved problems that I was having, and I could see others kind of struggling with that as well. Yeah. It also allowed me to take my work to smaller models and still kind of have that same performance. Something I'm exploring now is there are more and more tuning APIs that are popping up.

Vikram Rangnekar [00:22:22]: Every service provider has their inference API, but now they have a tuning API. In fact, Gemini has made it almost free to tune small models. If you take this information that DSPy captures, you can essentially just push it to the tuning API and have a tuned model. Then you might not even have to set those on the. So I think there's a lot of value in taking this ahead.
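
As a rough illustration of that last point, the captured traces are already input/output pairs, so they can be flattened into a fine-tuning dataset. The chat-style JSONL format below is an assumption about what a provider's tuning API might accept, not any specific vendor's schema.

```typescript
// Flatten captured demos into JSONL fine-tuning records.
// The "messages" shape is an assumed chat-style format; check the
// target tuning API's actual schema before uploading.

type Demo = { input: string; output: string };

function toFineTuningJsonl(demos: Demo[]): string {
  return demos
    .map((demo) =>
      JSON.stringify({
        messages: [
          { role: "user", content: demo.input },
          { role: "assistant", content: demo.output },
        ],
      })
    )
    .join("\n");
}

const demos: Demo[] = [
  { input: "Summarize: battery died after two days.", output: "Short battery life complaint." },
];
console.log(toFineTuningJsonl(demos)); // this text would become the tuning file
```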

Demetrios [00:22:51]: Oh, fascinating. Yeah. And retracing our steps and going back to something that you said originally that I 100% agree with is the idea of making sure that the tasks themselves are as small as possible. But then where you can really let yourself go is the context that you're giving it on the task. You're able to say, and that's why I think the context windows are great. Having larger context windows are really useful, but the task itself that you're trying to get done, you're not trying to get three or four tasks done at the same time in the same prompt. You're just trying to say, do this one thing. Do it really well.

Demetrios [00:23:33]: Here is all the context you need on how to do it, and then it outputs that, and then we can go along to the next thing.

Vikram Rangnekar [00:23:42]: And, you know, there's actually a really interesting thing here. People often do not realize they're adding another task. That's another thing I've seen often. It's like: let's do this thing, and then I want JSON output. Guess what? You've added another task. You don't realize that. Oh, you know what, I use JSON everywhere. Yeah, but, dude, it's another task, and it's quite a heavy task. So that was another breakthrough that the paper had for me. I've struggled with outputs and tried to figure my way around, and even early on, I realized that JSON and stuff was just a nice-to-have. I think people, without overthinking it, were like, oh, I use JSON everywhere, so this should just pop out.

Vikram Rangnekar [00:24:29]: JSON. Oh, JSON doesn't work well. Let me try YAML. I was like, no. What about key-values? Do you really need that nested structure out of this thing? Mostly not. So if you look at the paper, they use key-values, and that's really great, because a lot of the models, when they're trained, are also trained on the key-value format. There's always a title of what the thing is, and then the thing.

Vikram Rangnekar [00:24:57]: And there are advantages to key-values. Firstly, there are no extra tokens that this thing has to figure out. You're wasting tokens, sure, that's maybe one thing, but it has to generate those tokens, and then it's going to do a worse job on something else, because it's making room to do that JSON. Right. And then if you're like, yeah, but this so-and-so API does this really fine, this model provider, if you notice, often those things are slower, because they're doing something else. It's sort of like having a second model, or trying to force it somehow and error correcting in the background, and they're hiding that from you.

Vikram Rangnekar [00:25:39]: And then what about streaming? Often these things completely break for streaming. For some model providers, JSON streaming with function calling is just broken, because no one's using that or something. But the advantage of going just key-value for inputs and outputs, or outputs specifically, is you can now process things as they stream in. So once you get the key and then you start getting an output, and you're expecting the output to be a set of points, like, you know, 1. blah, 2. blah, 3. blah, you can literally validate it as it flows in. And if it's not doing the 1. blah, 2. blah, whatever, then you can early stop and say, you know, hold on, this is not what I want. I specifically want it to follow this format. That's the concept of assertions that the DSPy paper has.

Vikram Rangnekar [00:26:39]: And so in Ax we took that further, and we built something called streaming assertions. Streaming assertions allow you to do this early stopping on key-values. Say there's an output which is a summary. Sure, I could look at the whole summary and then decide, oh, this is wrong, for whatever reason. Or, while the summary is streaming out, I can actually look at it and say, oh, you know what, I wanted this in points, and this is not in points: stop. Now this obviously is good because it's an early stop, so your latency and all that is better, and you're going to save tokens. So that's something that's possible that's not possible if you go any other route, and it's built into the framework, so you don't have to sit and try to write this complex thing. And because the framework is vertically integrated, there's no dependency.

Vikram Rangnekar [00:27:35]: So Ax has zero dependencies in the package. All the LLM APIs are built from scratch. The beauty of that is we can actually vertically build the most solid thing we want. If we want streaming assertions or whatever, we can just build it, because we don't have to go past the boundary of someone else's.
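
A minimal sketch of the streaming-assertion idea, independent of any framework: validate each line of a streamed field as it arrives and stop early when the expected format breaks, rather than waiting for (and paying for) the full output.

```typescript
// Validate a streamed field line by line and fail fast when the
// "1. ... 2. ..." numbered-points format is violated.

async function* fakeStream(): AsyncGenerator<string> {
  yield "1. First point\n";
  yield "2. Second point\n";
  yield "Actually, let me explain differently...\n"; // breaks the format
}

async function assertNumberedPoints(stream: AsyncGenerator<string>) {
  let buffer = "";
  for await (const chunk of stream) {
    buffer += chunk;
    let newlineIndex: number;
    while ((newlineIndex = buffer.indexOf("\n")) !== -1) {
      const line = buffer.slice(0, newlineIndex).trim();
      buffer = buffer.slice(newlineIndex + 1);
      if (line && !/^\d+\.\s/.test(line)) {
        // Early stop: the model drifted off the expected format.
        throw new Error(`Assertion failed on streamed line: "${line}"`);
      }
      console.log("accepted:", line);
    }
  }
}

assertNumberedPoints(fakeStream()).catch((e) => console.error(e.message));
```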

Demetrios [00:27:58]: I didn't quite understand the idea of these assertions and building out the API, basically being able to validate the type.

Vikram Rangnekar [00:28:10]: Yes.

Demetrios [00:28:10]: And how they said, all right, well, one abstraction is what they were doing with DSPy, but another abstraction is saying this can now be an API. And so when we give whatever output to someone, it comes with the inherent guarantee that it's going to be in the type that you specify. Is that what I understand?

Vikram Rangnekar [00:28:33]: Yes. So there are two types of assertions. One is you assert a type. Essentially, while you're creating the function signature, you just say this output type is string, or string array, or number, or boolean, or whatever. And the second type of assertion is a custom function, where you want to actually do something with the output. You might actually call another LLM if you really want. You'd be like, hey, is this good enough? If not, throw an error, essentially. And then the first one can correct itself.

Vikram Rangnekar [00:29:06]: So it's a function called on this output value. And this can be done either when the whole value is available, you can wait till the end of the value. You might have multiple key-values that are outputted, in which case you might want to look at the whole value. Or you can do it as the value is streaming out, because it's not JSON. If you can somehow validate it, maybe you're looking for some prefix or something, so you can chop up every line that's coming, wait for a line to come in, then look at it with a regex and say, sorry, you're messing up, sort of thing. And this early failure is great, because, firstly, the assertions concept makes it easy to code, because now you have all of this captured within your DSPy program.

Vikram Rangnekar [00:29:52]: And when you initially put in the examples and you run the bootstrap, all of this gets engaged, right? The error correction, all that. So all the traces in between are high quality, because the assertions have ensured it. So in production less of this stuff happens, right, because you've already tuned it down to the correct thing you need. In terms of code, you're not saying, please add a dash in front; you have an assertion that explains that. So you've got high-quality examples, you just save those in there, and the LLM knows, I know what this guy needs.
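
A simplified sketch of how a custom assertion plus self-correction might look, with hypothetical names; the real framework wires this into the signature, but the shape of the retry loop is the same.

```typescript
// A custom assertion is just a function over the output; if it throws,
// the step is retried with the error fed back so the model can correct itself.

type Assertion = (output: string) => void;

const endsWithQuestion: Assertion = (output) => {
  if (!output.trim().endsWith("?")) {
    throw new Error("Output must be phrased as a question.");
  }
};

async function generateWithAssertions(
  callModel: (prompt: string) => Promise<string>,
  prompt: string,
  assertions: Assertion[],
  maxRetries = 2
): Promise<string> {
  let feedback = "";
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const output = await callModel(prompt + feedback);
    try {
      assertions.forEach((assert) => assert(output));
      return output; // all assertions passed
    } catch (err) {
      // Feed the assertion error back so the next attempt can self-correct.
      feedback = `\nPrevious output was rejected: ${(err as Error).message}`;
    }
  }
  throw new Error("Assertions still failing after retries.");
}
```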

Demetrios [00:30:24]: Okay. And one thing that you mentioned before we hopped on too, is that you feel like there's not a lot of people really thinking about building with LLMs. Or sometimes the way we go about building can be off. And it makes me think about some of the stuff that I tend to talk a lot of shit on, how the things that we are building aren't really that valuable. And so I am looking around and saying like, this is the future we were promised. This isn't what I was hoping for. So what's your take on that?

Vikram Rangnekar [00:31:06]: For one caveat, I would say that we are still early, so we shouldn't forget: this came out production-ready, sort of, beginning of this year, mid last year maybe. So we're super early. I think people just haven't built their LLM muscle yet. A lot of people don't reach for an LLM naturally yet. You know that joke about "just Google it, don't ask me"? Now it's "just LLM it." And I think even in code, people haven't wrapped their heads around what's possible. I've come across people who do not believe an LLM can write stuff.

Vikram Rangnekar [00:31:48]: They're like, yeah, but it's not as good as blah. I was like, yeah, that's because you just prompted it wrong. He's like, oh, I tried it. I was like, did you? Okay, hold on. Did you try that with the free ChatGPT 3.5? And he's like, yeah. I was like, dude, you're a whatever engineer at this company, pay the $20, get GPT-4. And then, you know, prompt it better, give it some examples, right? So I think people haven't internalized it yet.

Vikram Rangnekar [00:32:17]: So I experimented with tooling startups. I was experimenting in the observability sort of space, you know, maybe there's something I can build there. I experimented with another tooling idea where we'd make it easy to build these LLM workflows. And, you know, it's so early, and obviously these spaces are super crowded, so I'm kind of holding off to figure out what really is the thing here. But also customers, right? I mean, they're.

Vikram Rangnekar [00:32:47]: Most are just not. They don't have the people to build this, essentially. I think that's really the job I feel Ax can help with: make it really, really accessible. Right? I think the model guys are doing a phenomenal job. They're accelerating much faster than anyone else, and a lot of the people building on the framework end of things, I almost feel like they're not building with real customers, because I see a lot of overcomplication of stuff, and it's not production-ready. There's no telemetry.

Vikram Rangnekar [00:33:20]: Ax has OpenTelemetry built in, so there's already tracing happening. So you plug in a tracer; OpenTelemetry is an open standard. Yeah, you can plug it in. A lot of these frameworks don't have that, for example, just built in. And, you know, there's a lot of production stuff that you need. There are just lots of dependencies, and then every little thing is an npm package for this and that, and whatever, you know, Python things.

Demetrios [00:33:45]: Break all the time.

Vikram Rangnekar [00:33:46]: Yeah. And people, you know, obviously don't want to use stuff that has all that. And then you have LangChain, which has created a whole world on top of their own terminologies and blah, blah. That's why I like DSPy, because it's sort of standardizing some of that. And there are a lot more people saying, okay, you know what, if all of us can have this shared understanding, this open thing, it's an open paper, it's going to help. And then the framework on top of that helps with all of this other stuff, because it actually implements it and takes it steps ahead. But yes, I will tend to agree with you.

Vikram Rangnekar [00:34:19]: People need to build more useful stuff.

Demetrios [00:34:22]: And you're thinking about it not even at the level of what's the actual output there. You're just thinking about how difficult it is to get to that output because of how we've architected this ecosystem. The ecosystem that we live in is just convoluted.

Vikram Rangnekar [00:34:40]: And, I mean, LLMs are also raw, right? They're like raw power, and you have to help someone. You cannot just throw a guy into a Ferrari; you sort of have to give him some way to control it. And one good example is, what do you call those things, word limits, like generation output limits, right? Sometimes you might have a small output limit and you hit it, and then a person doesn't really know why his workflow is not working, because somehow the framework's not exposing the fact that this has stopped for this reason. And I've seen people struggle with things like that, because a lot of these frameworks are immature in that sense. Or, for example, why not just continue? Why not have some capability in there saying that if you do hit the output count of 200 tokens, then just run it again, and then again, until you get the exact structure that you need out. Or this whole thing about JSON everywhere.

Vikram Rangnekar [00:35:50]: And, you know, people struggle and cry about it not generating JSON. Like, do you just have a key and a value? Have another key and a value. Do you really need this value to have another key and value? Or just leveraging multimodal. So in Ax, when you have the signature, these are inputs and outputs, right? The inputs don't always have to be text, because these models now support multimodal; they could be images. So Ax has that built in. And how do you make that work with DSPy? That's not in the paper or anything. So there's a lot of stuff where Ax is taking the paper and then building stuff on top. And yes, those things are still examples.

Vikram Rangnekar [00:36:35]: So people forget those images are base64-encoded or whatever; they are still part of the prompt, so they can still be captured as examples. And there are patterns in those images. So you can have images as examples with text. So if you have something that classifies cat faces, you know, is this cat happy or sad or whatever, and then you want examples, and you run through this, give it a few examples, and get all the traces right, now your examples contain stuff about the cat: maybe some text blob and then the image of the cat's face or something. Almost no framework's doing stuff like that, right?
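
A small sketch of that idea: an image is just another field in the signature's inputs, carried as base64, so it can be stored in examples alongside text. The field names and the placeholder image data are made up for illustration.

```typescript
// A multimodal example mixes text and image fields; the same few-shot
// and bootstrapping machinery applies because it is still part of the prompt.

type CatMoodInputs = {
  description: string;  // text field
  photoBase64: string;  // image field, base64-encoded
};
type CatMoodOutputs = { mood: "happy" | "sad" };

type CatMoodExample = { inputs: CatMoodInputs; outputs: CatMoodOutputs };

const multimodalExamples: CatMoodExample[] = [
  {
    inputs: {
      description: "Cat lounging in a sunbeam",
      photoBase64: "<base64 image data>", // placeholder, not real data
    },
    outputs: { mood: "happy" },
  },
];

console.log(multimodalExamples.length, "multimodal example(s) captured");
```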

Demetrios [00:37:16]: I mean, yeah. Trying to just think about it as how can we give it a richer prompt so that there is better understanding.

Vikram Rangnekar [00:37:24]: And make that work with the whole DSPy examples, tuning, and all of that.

Demetrios [00:37:28]: Yeah, basically. One thing that I know took me a while to wrap my head around, and now I can't even remember how it works, but I would love to know if you messed around with it, was the idea of compilers in DSPy. Or, I don't think they're called compilers, they're called something else, but Omar was telling me to think of them as compilers. And that's where Omar and team have had to spend the majority of their work, because it is something that most people just have a really hard time wrapping their heads around. But it is very powerful.

Vikram Rangnekar [00:38:09]: So I think what you're talking about is what they initially called teleprompters. And then they realized people were getting confused, and I think they call them compilers now. I'm not 100% sure. It's basically the whole bootstrapping stuff. Basically the thing that takes the basic set of inputs and examples, where you have a model and you have a teacher model. A teacher model is just basically a big model, and you're not really teaching a small model, you're just doing it in context. You don't really need the second one, but sure.

Vikram Rangnekar [00:38:43]: And then you set some examples on it in terms of your top-level input and your final output. Then the compiler basically iterates on it with some strategy, where it runs that once, and then maybe it changes the temperature a little bit, or changes something else a bit, and runs it again, sort of adding some entropy to the whole thing, and keeps running it. And then you evaluate the output. You can do it manually, or you have some sort of function or another LLM. And then you find the traces. In the DSPy paper, they call them demos, demonstrations. That's basically the inputs and outputs of the whole chain that works. That is a compiler.

Vikram Rangnekar [00:39:29]: But they have other compilers too. They have a whole range of them, and I haven't really dug into this new research coming out about MIPRO, which is actually kind of optimizing that. So, I know I've talked a lot about examples, but there are also instructions. The prompt signature does get compiled into a prompt, and there is some level of text, the high-level instructions that you do give it, like, okay, here I need you to classify the faces of these cats or whatever. And MIPRO, at least from the cursory glance I took at the paper, is a way of iteratively trying to come up with better text that improves the output of the LLM. Instead of saying, classify the faces of these cats, they'll say, you know, dwell on the features of this cat and come up with whether it's happy, I don't know, whatever, right? Just changing the language there gives you better outputs.

Vikram Rangnekar [00:40:31]: So there is this research happening there. Solid stuff. And I definitely am going to incorporate all of that, but I've also just focused on, you know, people aren't even using this right now. Let's get people to use this. Even just setting examples, you're already, like, 80x, 100x from where you were.
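
A much-simplified sketch of that instruction-optimization idea: try several candidate phrasings, score each on a small dev set, and keep the best one. The actual MIPRO optimizer proposes and searches instructions far more cleverly; this only shows the shape of the loop, and all names here are hypothetical.

```typescript
// Evaluate candidate instructions against a small dev set and keep the
// wording that produces the best outputs.

type EvalCase = { input: string; expected: string };

async function pickBestInstruction(
  candidates: string[],
  devSet: EvalCase[],
  runPrompt: (instruction: string, input: string) => Promise<string>
): Promise<string> {
  let best = { instruction: candidates[0], score: -1 };
  for (const instruction of candidates) {
    let correct = 0;
    for (const example of devSet) {
      const output = await runPrompt(instruction, example.input);
      if (output.trim() === example.expected) correct++;
    }
    const score = correct / devSet.length;
    if (score > best.score) best = { instruction, score };
  }
  return best.instruction;
}

// Candidate phrasings for the same task; the search picks whichever scores best.
const candidateInstructions = [
  "Classify the face of this cat as happy or sad.",
  "Dwell on the features of this cat and decide whether it is happy or sad.",
];
```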

Demetrios [00:40:55]: Yeah. Your capabilities go up so much. And I had a friend talk about all the different ways that you can think about using LLMs because a lot of times you, you get stuck in your capability or what you think it should do and how it should do it. And he just has been compiling different Personas that he tries to give the LLMs. Like, one is a critic, one is the, what was he calling it? The adapter. So if there, and there's all these different Personas, and then when you look at that, you can get inspired and you can think, okay, well, maybe I could plug it into this little piece of my workflow or whatever.

Vikram Rangnekar [00:41:46]: In Ax, those can be agents, essentially. You could have these personas, which are basically agents. You can even give them some tools. You can say, this has this persona, and now it also is able to use these tools to do this thing. It's like, I don't know, the searcher or something, and then the researcher, and then you connect it to an arXiv API or something. And then you improve that little guy over time, giving him a little more capabilities, improving him sort of independently. And then you can plug him with Ax into any workflow you want. Like, you're doing something with stocks or some paper research, and you can say, oh, here I have this scientific researcher, let me plug him in.

Vikram Rangnekar [00:42:31]: And now suddenly everything improves because now it has that extra capability, right?

Demetrios [00:42:35]: Yeah.

Vikram Rangnekar [00:42:36]: That little agent comes with his own context, his own examples, his own functions. He doesn't pollute the main flow essentially.
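
A rough sketch of that pattern: an agent is a persona with its own tools and context, exposed to the rest of the workflow as a plain async function so it can be plugged in anywhere. All names and the stubbed tool are illustrative, not the library's API.

```typescript
// An agent bundles a persona, its own tools, and its own context,
// and looks like an ordinary function to the parent workflow.

type Tool = { name: string; run: (query: string) => Promise<string> };

interface Agent {
  name: string;
  persona: string;
  tools: Tool[];
  run: (task: string) => Promise<string>;
}

const arxivSearch: Tool = {
  name: "arxiv_search",
  run: async (query) => `Top arXiv results for "${query}"`, // stubbed tool
};

const researcher: Agent = {
  name: "researcher",
  persona: "A careful scientific researcher who cites sources.",
  tools: [arxivSearch],
  run: async (task) => {
    const evidence = await arxivSearch.run(task);
    return `(${researcher.persona}) Findings on "${task}": ${evidence}`;
  },
};

// The parent workflow treats the agent as just another function call.
async function analyzeStock(ticker: string) {
  const research = await researcher.run(`recent papers relevant to ${ticker}`);
  return `Analysis for ${ticker} using: ${research}`;
}

analyzeStock("ACME").then(console.log);
```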

Demetrios [00:42:45]: And when I'm building out that little agent, I am, as you said, I'm tuning it so that I get the best performance for that specific small task that I'm going for. And then I can plug him in to the greater Lego masterpiece.

Vikram Rangnekar [00:42:59]: According to DSPy, he would get tuned into your workflow, so he would not come with his own examples or state. So essentially, when you plug him into your workflow and then you run the DSPy compiler, or the bootstrap, or whatever you want to call it, on it, then, as the text flows from the top to the bottom, his examples would get preserved, the ones that result in good outputs for you in your workflow. Does that make sense?

Demetrios [00:43:31]: Sort of, yeah. I'm just thinking about, so I guess maybe I'm not thinking about agent examples per se, but I'm thinking about one use case that we had on here, like, a year and a half ago, when Philip from Honeycomb came on. He said they have certain things in their product, and the product, Honeycomb, can be fairly difficult sometimes, to figure out what and how to do what you need to do. And so for that, they said, oh, well, just prompt it. Figure out, and then ask what you're trying to do, or tell us what you're trying to do, and then we will try and do it for you.

Vikram Rangnekar [00:44:13]: Sure.

Demetrios [00:44:14]: And on some level, I can imagine that's a lot of agents going, especially if it's the doing it for you, if it's just, hey, let me pull up a document page that'll show you how to do this. That's one thing. But if it's let's go, and now let's try and do what you're trying to do. It's a whole nother thing.

Vikram Rangnekar [00:44:32]: And so in Ax, there's some support for that. The way to do that, again, I'm coming back to the start, where I just said you've got to chop the problem up. And now there are different ways to chop the problem up. So initially you get a task that's asked of you, and that's broad: hey, I need you to do my taxes, or, hey, I need you to, I don't know, classify my cat's face, right? They're two pretty different things. And now, if you put in a lot of agents, you can get an LLM to make this decision, right? That's one way to do it. You can give, like, 20 basic agents or functions, whatever you want to call them, into the main thing, and say, your job is only to send the control to one of these guys, basically handoff, control flow, and then that agent sort of takes off from there. Or there's this concept called a semantic router, or routing.

Vikram Rangnekar [00:45:26]: So there is this whole world around routing LLMs. There are startups that are funded here. And Ax comes with something called a semantic router, where basically it uses embeddings to classify, sort of, find the closest match to send your task down. So if you send it a blob saying, I want you to classify cat faces for me, then there is a router setup which says, oh, anything to do with cat faces, I want you to call this function, or anything to do with taxes, call this function.

Demetrios [00:46:01]: But that assumes you've seen the question before, right?

Vikram Rangnekar [00:46:06]: So, no, you're working with a finite set, right? You have functions, and there are only a finite number of things you can do. If you ask the Honeycomb thing to go, I don't know, find your grocery list or hotel bookings, it might not be able to do that, right?

Demetrios [00:46:23]: Yeah.

Vikram Rangnekar [00:46:23]: So there is going to be a finite list of things. So once the question comes in, you embed it, and then you have this router which already has embeddings in it for each one of its routes. You do a dot product, find the closest similarity, and you send it down that route and say, okay, let's activate that pipeline. The advantage here is it's dirt cheap with embeddings. It costs, literally, it's almost free, and there's no overhead of an LLM being engaged until it hits that path. Right. So now you can have far more things possible. You can have hundreds or thousands of paths that your initial input could potentially go down, and all these.
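
A minimal sketch of a semantic router: embed the incoming task once, compare it against precomputed route embeddings with a dot product, and hand control to the closest route. The toy vectors below stand in for real embedding-model output, and the route names are hypothetical.

```typescript
// Route a task to the closest matching pipeline using embeddings,
// with no LLM call until the chosen path is activated.

type Route = {
  name: string;
  embedding: number[];
  handler: (task: string) => Promise<string>;
};

const dot = (a: number[], b: number[]) =>
  a.reduce((sum, value, i) => sum + value * b[i], 0);

// In reality these vectors come from an embedding model; here they are toy values.
const routes: Route[] = [
  { name: "cat-faces", embedding: [0.9, 0.1], handler: async (t) => `Classifying cat faces: ${t}` },
  { name: "taxes",     embedding: [0.1, 0.9], handler: async (t) => `Tax workflow engaged: ${t}` },
];

async function route(task: string, taskEmbedding: number[]) {
  const best = routes.reduce((a, b) =>
    dot(taskEmbedding, a.embedding) >= dot(taskEmbedding, b.embedding) ? a : b
  );
  return best.handler(task);
}

route("Is this cat happy or sad?", [0.85, 0.2]).then(console.log);
```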

Demetrios [00:47:10]: Workflows have been created and then tuned through these methods that we're talking about.

Vikram Rangnekar [00:47:18]: Yes, exactly.

Demetrios [00:47:20]: So it's really not like the workflows.

Vikram Rangnekar [00:47:23]: Are just like code. It's just code. It's like functions, right? It's like a function which has more functions in it. That's the only way to think about it. And because the signatures are so tiny, you want to call and use these. It's almost one line to engage a DSPy program in your code, and it looks like a function call. You have some inputs that you pass in. You call this forward function, because DSPy is modeled on PyTorch, and PyTorch has the forward function.

Vikram Rangnekar [00:47:48]: Yeah, I kept that. I'm still conflicted. I think you should call it run or something, because none of the prompt people care about PyTorch. It's too low-level a thing. But anyway, it's so easy to invoke, so people want to use it. It's like a function call. You just give some inputs, you wait for the output inside it.

Vikram Rangnekar [00:48:09]: They might do a whole bunch of stuff. Engage your examples, do multiple function calling, do assertions, error correction, whatever, whatever, and boom. And finally you get your three things that you want out. And then you call the second function, you pass those as inputs to that get it out. You might even call three in parallel, like use whatever async thing and say, oh, okay, I'm going to send it down three more paths, and then I'll wait for those to complete, and then I'll get the answer. And then it's essentially a graph or whatever. It's like a small program, but it's all LLM based.
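
A tiny sketch of that calling pattern, with stubbed steps: each program is invoked like a function, outputs feed the next call, and independent branches fan out in parallel with Promise.all before being joined.

```typescript
// Invoke prompt programs like functions, chain their outputs,
// and run independent branches in parallel.

const extractEntities = async (_text: string) => ["ACME", "Q3 earnings"];
const fetchNews = async (entity: string) => `news about ${entity}`;
const summarizeAll = async (pieces: string[]) => `Summary of: ${pieces.join("; ")}`;

async function researchReport(text: string) {
  const entities = await extractEntities(text);             // step 1
  const news = await Promise.all(entities.map(fetchNews));  // fan out in parallel
  return summarizeAll(news);                                // join the branches
}

researchReport("ACME beat expectations in its Q3 earnings call.").then(console.log);
```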

Demetrios [00:48:43]: And all of these different workflows, you're shepherding them along, or you're tuning them to make sure that they are giving you the best output. So it's not like these workflows are just created by the LLM, and then you stand around and say, is it going to give me the right thing?

Vikram Rangnekar [00:49:04]: No. You're setting examples on each one of these sub tasks, the best examples you can have, and you keep improving them over time. And that makes each one of these subtasks solid, because each subtask is like, it's its own clean context, right? Has to have its own set of examples. It cannot have the examples of the top level task or the task besides it, or the task below it. It has its own unique inputs, its own unique outputs. It could be a subtask in your subtask somewhere else, but it sits independently. It says, my job is to take this text that's coming in and classify it into these four things. Maybe.

Vikram Rangnekar [00:49:41]: Or my job is to extract out the right, I don't know, tax sections based on this thing. So now it needs its right examples for.

Demetrios [00:49:53]: So cool, man. Well, thank you a ton for doing this, because you're opening up my eyes to ways that agents are being worked with that I, obviously, I've talked to Omar before, but I love seeing you take that inspiration and then build on top of it and say, here's, as you mentioned, here's the open paper. Here's what we can standardize against. And now what can we build with it?

Vikram Rangnekar [00:50:23]: Exactly. Yeah, you're right on. Let's start building. That's basically it. You know, I'm there to help. Let's build all these things. Let's fill in that valuation.

Demetrios [00:50:33]: Yeah.

Vikram Rangnekar [00:50:34]: Let's prove Goldman wrong.

Demetrios [00:50:37]: I love it.

