MLOps Community

Making Your Company LLM-native

Posted Oct 04, 2024 | Views 291
# LLM-native
# RAG
# Pampa Labs
SPEAKERS
Francisco Ingham
Founder @ Pampa Labs

Currently working at Pampa Labs, where we help companies become AI-native and build AI-native products. Our expertise lies on the LLM-science side: how to build a successful data flywheel that leverages user interactions to continuously improve the product. We also spearhead pampa-friends, the first Spanish-speaking community of AI Engineers.

Previously, he worked in management consulting, was a TA for fast.ai in SF, and led the cross-AI + dev tools team at Mercado Libre.

Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.

SUMMARY

Being LLM-native is becoming one of the key differentiators among companies across vastly different verticals. Everyone wants to use LLMs, and everyone wants to stay on top of the current tech, but what does it really mean to be LLM-native?

LLM-native involves two ends of a spectrum. On the one hand, we have the product or service that the company offers, which surely offers many automation opportunities. LLMs can be applied strategically to scale at a lower cost and offer a better experience for users.

But being LLM-native involves more than the company's customers; it extends to every stakeholder in the company's operations. How can employees integrate LLMs into their daily workflows? How can we as developers leverage the advancements in the field not only as builders but as adopters?

We will tackle these and other key questions for anyone looking to capitalize on the LLM wave, prioritizing real results over the hype.

TRANSCRIPT

Francisco Ingham [00:00:00]: My name is Francisco Ingham. I'm the founder of Pampa Labs, and I like my coffee with milk and with sugar and warm. Not too hot, not too cold.

Demetrios [00:00:13]: Welcome back to another MLOps Community podcast. I am your host, Demetrios. Today we talked about what it means to be LLM-native. We got into it with Fran, all about this very topic, and I feel like there were a few tangents we went on, but we kept coming back to why it is so important. Let's get into this conversation with Francisco, and as always, if you enjoy this podcast, just share it with one friend. That's all I ask. For those watching on camera, I am looking pasty white because I put on zinc sunscreen now that I am under the sun, and that means it doesn't come off that easy, as I just found out when I tried to get ready for the podcast. I'm joined here by my man Francisco, who is a self-proclaimed literature geek.

Demetrios [00:01:11]: And since you're living in Argentina, as they like to call it, I want to ask you, have you been reading much Gabriel García Márquez?

Francisco Ingham [00:01:25]: I would say I've been reading more of Gabriel Margas. Yeah, he's kind of the representative of Argentinian literature, but I also really enjoy good international writers: English, Russian, German.

Demetrios [00:01:44]: Oh, yeah.

Francisco Ingham [00:01:45]: But, yeah.

Demetrios [00:01:46]: Wait, so what got you into that?

Francisco Ingham [00:01:49]: It's one of those things, you know, kind of like a soul match. It just resonates with you. Some authors just speak at a level that really talks to your soul. And it's funny, because I very much enjoy language in all its aspects, but the way you use language to write poetry and literature is very different from the way you process it in a chatbot. Right? So they're two very different worlds.

Francisco Ingham [00:02:16]: And I try to keep them very separate because one is like, you know, sacred, and the other one is like, more utilitarian.

Demetrios [00:02:22]: Yeah, you don't want those chatbots to imitate all of this beautiful poetry that's been written for centuries.

Francisco Ingham [00:02:30]: Yeah, no, I can't handle it when they show a poem written by an LLM on Twitter. It doesn't get even close.

Demetrios [00:02:38]: It doesn't work. It does not work like that. So, you've been talking a lot about becoming an LLM-native company. I wanted to ask you, first and foremost, what does that even mean?

Francisco Ingham [00:02:53]: Yeah, that's a great question. It's a term I've been thinking a lot about, and I think we as a community need to come to terms with what it really means for all of us, because there can be more than one definition. My definition is a company that understands how to leverage LLMs anywhere they fit. Some people use LLM-native to talk about products: products that have LLMs integrated in a seamless way. And other people talk about LLM-native people: people who can leverage LLMs to work better and have their day-to-day enhanced by using LLMs in the right way.

Francisco Ingham [00:03:44]: I think companies should be doing both. So it really encompasses both sides of the spectrum for me.

Demetrios [00:03:52]: Yeah, dude, it's funny you mention that, like try and plug in LLMs wherever they fit, because I've been thinking a lot about how certain jobs have different pieces of output they need to produce. Maybe it's the marketing department; they need to be producing content, and some of that content falls under case studies, some under blogs, others under podcasts. And I'm not saying you just go to ChatGPT and ask it to write you a blog about XYZ topic, because that's not going to get you very far. But if you know what output you need, then you have a process for how you create that output. Are there places along the production line where you can plug in an LLM? Maybe it's the research for the podcast, maybe it's uploading, maybe it's even listening to the podcast transcription and suggesting good titles. All of those are ways you can plug in an LLM.

Demetrios [00:05:08]: But if you have the LLM do the podcast for you, well, some people are trying it, and cool, but it's not my favorite way to listen to podcasts, right? So I really appreciate this idea you're talking about: find where you can plug it in, and then use it in that place.

Francisco Ingham [00:05:27]: Absolutely. That's very important for me. I think you can fall off to either side: you can over-implement LLMs, or you can under-implement LLMs. I think there is a sweet spot, and it takes some experimentation. Sometimes you need to force it just to see if it actually works. And then you say, no, it's more work using LLMs than doing it myself. That's when you know you need to back off.

Francisco Ingham [00:05:58]: But I think, broadly speaking, any kind of creative work and decision making has a very large delta in quality if a human being does it, and I would venture to say that in most cases an LLM cannot do it well at all. So you can have generated content, but it will always be low quality. The reason we listen to podcasts, the good podcasts, and read books and everything is because we want to find those unique insights that are not obvious and are not just a statistical representation of a training set. We want unique perspectives on things, and the way we choose the podcasts we want to listen to is by vetting the people we think have these unique perspectives. So I think that must stay, and we must enhance it, so that people can do more podcasting, more decision making, and more creative work, and less operative stuff. Broadly speaking, anything operational is where I think we should all have this callback inside our minds that says, hey, this is a place where LLMs could at least be applied. Let's see if this is a place where, if I apply LLMs, it actually saves me time rather than taking more time.

Demetrios [00:07:19]: Sure. Have you seen surprising places in your own life where you recognized, oh, I could probably figure out how to use an LLM for this?

Francisco Ingham [00:07:33]: What we do at Pampa with the team is purposefully force LLMs into some places and see if it sticks. Right now we're building an office agent that can help us with the random stuff we do every day. Just to give you an example, we order food every day, and we need to ask everyone what they want. So we're building this WhatsApp agent that everyone can write to, and then the person who's going to order just asks it. It gives the list of what needs to be ordered, and then we send that to a restaurant. We're also evaluating having the agent write to the restaurant directly. Or, for example, when we have expenses where somebody spends something for the whole team and we need to do that Splitwise thing at the end of the month, same thing: we have this agent, we write all the expenses to it, and then we resolve it at the end of the month with another chat. These are the kinds of places where we are actively thinking, could this be done with an LLM? And then when we try it out, we see if it's actually more work or less work for us.

Francisco Ingham [00:08:38]: But sometimes it's actually very useful.

Demetrios [00:08:41]: And when you say agent, and also in these use cases, couldn't that be done with just pure automation and heuristics? So explain to me why you need to plug an LLM in there.

Francisco Ingham [00:08:56]: For many use cases, you can do automation if the input is structured enough. If you make a UI where you select how much money you spent and who was involved, which is what the apps do, Splitwise will do that, then definitely you can do automation. You don't need an interpretive layer there. The only advantage is being able to just send a voice note, an unstructured message, and not having to spend time typing and selecting the right things. In some use cases that might be a small delta compared to the work it takes to actually build and maintain it. But in other use cases, for example ordering from a restaurant, it might be harder, because sometimes the menu changes, so you need to keep track of the menus of all the restaurants around to be able to select.

Francisco Ingham [00:09:54]: Actually, maybe you could write it down. So yeah, I think it's mainly the convenience of being able to write unstructured information from the platform you're already using for everything. The cost of interacting is very low, because you don't have three apps for three different things. In our case we are tech-savvy and we like technology, but some people don't want to learn how to use each app again and again, with all these massive sets of features. So it kind of streamlines all your interactions with utilities into natural language input. You can do it with automation as well.

Demetrios [00:10:38]: But it's interesting to think about, too; you mentioned the delta there, and how you're able to quickly spin something up that uses LLMs, versus doing it with automation, which might even take you longer to figure out. Can you talk to me about what your development process looks like now when you throw something at an LLM? How does that whole development go, and how do you ultimately make the choice: is this worth it or not?

Francisco Ingham [00:11:13]: Yeah, so the development process, there are many levels at which we can discuss this. A big part of it is having an agent that has tools, and those tools return structured output; the structured output is processed in the backend, and that does things for you. So basically read and write operations in these use cases I've been discussing, streamlining all of that into one agent with all these tools instead of different apps. That's the basic architecture of what an agent like this looks like. Then, in terms of how we develop, we also try to use LLMs to develop, like many developers are doing, and try to take that to the limit: being Cursor power users and trying to spend less time writing code and more time thinking about what the right abstractions are and what we actually want to build. What I described is an example of a proof of concept or MVP that we did for ourselves, but once you're doing real development that needs assurances in production, where many people are going to be using it and you need quality, then you have this whole data flywheel and evaluation, which you've probably discussed a lot of times. We try to be progressively more systematic in the way we do our evals.
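To make that architecture concrete, here is a minimal sketch in the spirit of what Francisco describes: one model call with a tool whose structured output is processed by a plain backend function. It assumes the OpenAI Python SDK; the tool name (add_expense), its schema, and the model choice are hypothetical, invented for illustration, not Pampa Labs' actual implementation.

```python
# Minimal sketch of an "office agent": one LLM with a tool whose structured
# output is handled by an ordinary backend function. Assumes the OpenAI
# Python SDK; tool name and schema are hypothetical.
import json
from openai import OpenAI

client = OpenAI()

TOOLS = [{
    "type": "function",
    "function": {
        "name": "add_expense",
        "description": "Record a shared expense to split at month end.",
        "parameters": {
            "type": "object",
            "properties": {
                "payer": {"type": "string"},
                "amount": {"type": "number"},
                "description": {"type": "string"},
            },
            "required": ["payer", "amount"],
        },
    },
}]

def add_expense(payer: str, amount: float, description: str = "") -> str:
    # Backend write operation; in practice this would hit a database.
    return f"Recorded {amount} paid by {payer} ({description})"

def handle_message(text: str) -> str:
    """Route one unstructured WhatsApp-style message through the agent."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": text}],
        tools=TOOLS,
    )
    msg = resp.choices[0].message
    if msg.tool_calls:  # the interpretive layer produced structured output
        args = json.loads(msg.tool_calls[0].function.arguments)
        return add_expense(**args)
    return msg.content

print(handle_message("I paid 30 bucks for the team's empanadas today"))
```

The LLM only supplies the structured arguments; the write operation itself stays in ordinary, deterministic code.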

Francisco Ingham [00:12:40]: So depending on the maturity of the product, we start with typical vibe checks. Then we do a very small evaluation set that covers the whole solution end to end. And as it grows and scales, and we have more evidence that there's demand for something like this, we start doing evaluations for all the different modules within the solution, which may or may not include LLMs as the evaluators. One concept we have internally is that we try to do evaluations that don't include LLMs as much as possible; we try to have evaluations be non-stochastic wherever we can. For example, we would evaluate the input of a tool, which is a strict thing, except if it's a query, like in RAG, in which case you need either semantic similarity or an LLM. But for many tools you just need the right parameters given an input, and then you don't need an LLM to test that. For the whole system you usually do need an LLM, because the output is free text, and to judge text you kind of need an LLM. But we can do a lot without an LLM by evaluating the different modules in the application.
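A minimal sketch of that non-stochastic, module-level eval: assert that a given input yields the expected tool call with the expected parameters, with no LLM anywhere in the checker. The run_agent wrapper and the expected cases are hypothetical stand-ins for however the agent exposes its tool calls.

```python
# Deterministic module-level eval: given an input, did the agent pick the
# right tool with the right parameters? No LLM is involved in the check
# itself. `run_agent` is a hypothetical wrapper returning the first tool
# call as (name, args).
CASES = [
    ("I paid 30 for empanadas",
     ("add_expense", {"payer": "me", "amount": 30.0})),
    ("order me a milanesa for lunch",
     ("order_food", {"item": "milanesa"})),
]

def eval_tool_calls(run_agent) -> float:
    hits = 0
    for text, (want_name, want_args) in CASES:
        got_name, got_args = run_agent(text)
        # Strict, rule-based comparison: exact tool name, and every expected
        # argument present with the expected value.
        ok = got_name == want_name and all(
            got_args.get(k) == v for k, v in want_args.items()
        )
        hits += ok
    return hits / len(CASES)  # fraction of cases with correct tool use
```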

Francisco Ingham [00:13:51]: But for something that's not yet mature, when we have something we're just playing around with, we don't evaluate; we just do vibe checks and try it out ourselves. We are our own users.

Demetrios [00:14:00]: What is the point? How do you know that this has gone from vibe check to "we need to actually point the evaluation guns at this"?

Francisco Ingham [00:14:14]: I think it is very correlated with optimization. I see it as: it's worth investing in something proportionally to the evidence you have that it's actually useful and that it's settled that this is the direction you will follow. Sometimes when you're starting out and getting to know your users, you're still iterating on what the final architecture of your solution will look like, because you don't really know what they need. At that point you don't have a lot of evidence, so a large investment in your evaluation suite doesn't really make sense, because the architecture might change. And if it changes, the evaluations you made for your tools might not be useful anymore, because the tools themselves change. Then you start getting some traffic, and with it the incentive to have an optimized solution.

Francisco Ingham [00:15:03]: For example, you're changing your model, or you're trying to reduce your prompt or refactor a prompt into smaller sub-prompts, because cost or latency starts becoming a thing: you want the same utility, but as the solution scales you need stricter requirements. That's usually the point where evals start being important, because every optimization effort you make, you need to measure, and it's very hard to measure many changes with just vibe checks. So the requirements for optimization come hand in hand with stricter evals for us.

Demetrios [00:15:44]: And do you have a way to track all of these optimizations that you're tweaking?

Francisco Ingham [00:15:52]: Yeah.

Demetrios [00:15:54]: Or really, they're experiments to try to optimize, right? So you're experimenting with many different things, and then you want to be able to know that, okay, when all the knobs are turned to these different degrees, we get the best output.

Francisco Ingham [00:16:10]: Yeah, absolutely. It is basically necessary to have an evaluation set to track whether an optimization effort was successful, because you need to be able to make the claim that, while the cost went down or the latency went down, the accuracy stayed mostly the same. The only way you can really say to a stakeholder who is not part of the engineering team, like a manager, that the accuracy was not affected is to show it. So you need an evaluation set, at least an end-to-end one, what we could call an integration test for the whole system, where you say: these answers seem similar to what we had before, and now we can do this at a lower cost, based on this architectural change, or this reduction of the prompt, or whatever. As for how we track this: we usually use LangSmith, just because we are partners with the team and we like the product a lot. So we would have a dataset with inputs and outputs.

Francisco Ingham [00:17:10]: We would run the different experiments against the dataset within a notebook. We have different notebooks where we track each experiment, with the metric at the end, which is also tracked in the UI. We keep these notebooks as records of experiments, especially the successful ones, where we kept the accuracy steady and managed to decrease cost or latency significantly.
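As a rough illustration of that notebook workflow, here is a sketch using the LangSmith SDK's evaluate helper. The dataset name, the my_agent function, and the exact-match metric are hypothetical, and the SDK surface may differ between versions, so treat this as a shape rather than a recipe.

```python
# Sketch of running one experiment against a LangSmith dataset from a
# notebook. Dataset name, target, and metric are hypothetical; the import
# path may vary by SDK version.
from langsmith.evaluation import evaluate

def target(inputs: dict) -> dict:
    # The system under test, e.g. the agent with the new, cheaper model.
    # `my_agent` is a hypothetical stand-in.
    return {"answer": my_agent(inputs["question"])}

def exact_match(run, example) -> dict:
    # Deterministic evaluator: compare the output to the reference answer.
    got = run.outputs["answer"]
    want = example.outputs["answer"]
    return {"key": "exact_match", "score": int(got == want)}

results = evaluate(
    target,
    data="office-agent-eval-v1",          # dataset of input/output pairs
    evaluators=[exact_match],
    experiment_prefix="gpt-4o-mini-swap",  # shows up in the LangSmith UI
)
```

Each notebook run then becomes a record: the experiment name, the knobs that were turned, and the metric before and after.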

Demetrios [00:17:35]: Yeah, but when you're dealing with these agents, there are different steps along the whole end-to-end system, right? And you were talking about being able to play with tools and then going out and doing things. If you're changing the settings on the tools, or the way the agent interprets the tools and so on, is that still captured in the notebooks on those experiments?

Francisco Ingham [00:18:07]: Yeah, absolutely. That depends on how granular the evaluation sets are, because we go from the major to the minor. The first thing is a vibe check. It's very low effort and high value, I would say, because you can kind of tell, at least in the 80% of happy-path cases. It's not very good for testing edge cases, because you have unknown unknowns and you'll miss stuff, but on the straightforward, normal flow you can tell if something is seriously wrong by playing around with it. So that's high value, low effort. Then there's the full end-to-end evaluation test, which we run every time we make a change. That's maybe medium effort but also super high value, because we start to test those edge cases and include them in the dataset, so it raises a flag when some edge case breaks after a change.

Francisco Ingham [00:19:10]: And then the last level is high effort, and I would say it can be high value, but only if you have a mature product. That's when we start making these small evaluation sets for each tool, where you can test its functionality, like a unit test for that specific component, which you would have in the notebook. Then you make a change to your agent and you evaluate all the small parts of your system, plus the whole integrated system.

Demetrios [00:19:47]: So this is fascinating, and that comes last, once you've recognized the need. I really appreciate how you're saying: don't spend all this time on things you're probably going to throw out anyway. Just do the vibe check, make sure it's running, and then slowly peel back the onion.

Francisco Ingham [00:20:07]: Yeah, absolutely. Basically, invest in proportion to the evidence you have that your architecture is going to stay that way. Then you start having more granularity in the types of analysis you can make on the performance of your system, because you can tell, if it failed, where it failed, and go directly there. Yeah.

Demetrios [00:20:30]: You did mention something before that I want to keep pulling the thread of, which is that you try not to do LLM-as-judge. And it's funny, because I literally was just reading Eugene Yan's summary of something like 40 different LLM-as-judge papers and all the different ways you can use LLMs as a judge. And you're saying, yeah, the conclusion is: if you don't have to, don't use it.

Francisco Ingham [00:21:00]: Yeah. I think this extends to a more general conclusion, which is: if you can do it without an LLM, do it without an LLM. It's kind of paradoxical, because before I said try to use LLMs wherever you can. But the point is, the places where you insert LLMs must be places where it would be hard to do the thing without an LLM, where LLMs make possible something that was not possible before. For example, structuring unstructured input, that's something you need an LLM for.

Francisco Ingham [00:21:36]: It helps you automate a process you couldn't automate before. That's the sweet spot where you get a lot of value out of inserting an LLM, and you pay the stochasticity cost of knowing that the results are probabilistic and won't always work according to plan. You can obviously do a lot of verification and take a lot of measures to get quite a bit of assurance about what the output will look like. But at the end of the day, even if you use structured output, the values themselves are stochastic, so you have some uncertainty about how it will behave in each new interaction. The reason we try not to use LLM-as-judge where possible is exactly that stochasticity: it doesn't give us a clear view of whether the system is working correctly or not, because it changes with each run.

Francisco Ingham [00:22:35]: So you don't have a stable measurement of performance. You can do a so-called Monte Carlo analysis, where you run the judge many times, but even then you need to prompt the LLM correctly and you need to evaluate your evaluator. There's a lot of meta-work to make it behave appropriately, which sometimes is necessary, and that's where it fits for me: when you can't do it without one. For example, when you have an end-to-end agent that gives a natural language answer to the user, it's very hard to test that with just rules, so there LLM-as-judge makes sense. But only there; in the other parts of your system, avoid having LLMs as evaluators as much as possible. Rule-based evaluations give you much more control over your evals.
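A minimal sketch of that Monte Carlo idea: run the same stochastic judgment many times and inspect the spread. The judge_once function is a hypothetical callable that invokes an LLM judge and returns 0 or 1.

```python
# Monte Carlo check on a stochastic LLM judge: run the same judgment many
# times and look at the spread. `judge_once` is a hypothetical function
# that calls the LLM judge and returns 0 or 1.
import statistics

def monte_carlo_judge(judge_once, question: str, answer: str, n: int = 20):
    votes = [judge_once(question, answer) for _ in range(n)]
    mean = statistics.mean(votes)
    stdev = statistics.pstdev(votes)
    # A high stdev means the judge itself is unstable on this example,
    # so a single run would have been misleading.
    return {"score": mean, "stdev": stdev, "votes": votes}
```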

Demetrios [00:23:26]: And are you doing any specific kind of LLM-as-judge when you do use it?

Francisco Ingham [00:23:31]: I'm honestly not familiar with all the kinds there are, so I'll tell you what I usually try to do. I read the blog post, and I like the idea of having a binary output instead of a gradient of real numbers. So I like that approach of: this is the ideal answer, this is what was answered, is this a correct match? Or: which of these two is the better answer? That type of comparison is interesting to me. The other option is having a reference answer and some properties you want to assure, the style is similar, or the answer has broadly the same content, with measurements for each of those things and different evaluators for each aspect you want to analyze that isn't strictly derivable from rules. Depending on the use case we do one or the other; I don't have a strict position on the best way of doing it. It depends on the use case.
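As a sketch of the binary, reference-based judge described above, assuming the OpenAI Python SDK; the prompt wording and model are illustrative, not a prescribed recipe. The 0/1 verdict it returns plugs directly into the Monte Carlo loop sketched earlier.

```python
# Binary LLM-as-judge sketch: given a reference answer and a candidate,
# ask for a single yes/no verdict rather than a numeric grade.
# Assumes the OpenAI Python SDK; prompt and model are illustrative.
from openai import OpenAI

client = OpenAI()

def judge_match(question: str, reference: str, candidate: str) -> int:
    prompt = (
        "You are grading an answer.\n"
        f"Question: {question}\n"
        f"Ideal answer: {reference}\n"
        f"Candidate answer: {candidate}\n"
        "Does the candidate convey the same content as the ideal answer? "
        "Reply with exactly YES or NO."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # reduces, but does not eliminate, run-to-run variance
    )
    verdict = resp.choices[0].message.content.strip().upper()
    return int(verdict.startswith("YES"))
```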

Demetrios [00:24:50]: Nice. So what other kind of shit are you using agents for?

Francisco Ingham [00:24:55]: So this is what I was telling you about, trying to force agents in and seeing if it works, if it sticks. One thing I've been trying and failing at: we do a podcast every single week in Spanish, we're like the Spanish AI engineer community, and I've been trying to use agents to create presentations. I've tried every single solution out there, like Gamma and slides IO, which integrates with Google Slides. There are a few of them; they haven't been helpful for me.

Demetrios [00:25:27]: Presentations for what? For your podcast? Like after you've created the podcast, then you want a presentation from the information.

Francisco Ingham [00:25:34]: It's not so much a podcast as a live stream, where we from Pampa explain a specific concept and then open up for Q&A. So it has a presentation part and a demo part, and the presentation part uses slides. Many times I have the content in my head and I don't want to do design, so I'm trying to find something that will give me good design out of the structure I already know I want to present, and I haven't been successful with that yet. Another thing I try to do is use Raycast. You know Raycast?

Demetrios [00:26:15]: No, what's this?

Francisco Ingham [00:26:16]: Raycast is an all-around Mac tool with a lot of shortcuts, but I think if you're paying ten dollars a month you get AI integrated. So in any app, Slack, email, whatever, you can have your own custom shortcuts, like fix grammar or make more formal, and you can make your own LLM calls that map to a certain shortcut. That's useful for members of the team who want to improve their English, for example, or when you don't want to structure everything super specifically and just want to jot something down and then have it given a certain structure. I like using LLMs with a human in the loop rather than expecting a final output, because they don't usually do a great job at giving you the final output; they can get you 70% of the way. I like to jot things down, have the LLM structure them, and then send them. Another topic I'm very passionate about is what we internally call low-command-tab solutions: you don't need a lot of Cmd-Tabs to get what you want.

Francisco Ingham [00:27:29]: So, not changing screens: if I'm writing something in Slack, I can jot it down, hit a shortcut, and it changes it right there. I don't need to go to ChatGPT; I can just press enter. Cursor is a great example as well, where you have everything you need without moving: every single functionality, chat, autocomplete, copilot, all right there, and then you can just push. So that's another good example, like Raycast.
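A rough sketch of the fix-grammar kind of shortcut described above, written as a plain Python function that a launcher like Raycast could wire to a hotkey. It assumes the OpenAI Python SDK; the prompt and model are illustrative.

```python
# Sketch of a "fix grammar" hotkey action: take whatever text is selected,
# rewrite it in place, never add new content. Assumes the OpenAI Python
# SDK; wiring it to an actual hotkey/clipboard is left to the launcher.
from openai import OpenAI

client = OpenAI()

def fix_grammar(text: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Fix grammar and spelling. Keep the meaning, tone, "
                        "and language. Return only the corrected text."},
            {"role": "user", "content": text},
        ],
        temperature=0,
    )
    return resp.choices[0].message.content

print(fix_grammar("i has send the report yesterday, plis check it"))
```

The human stays in the loop: the rewritten text lands back in front of you for review before you press enter, matching the 70%-of-the-way framing above.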

Demetrios [00:27:56]: That one's fascinating to me, because I remember when ChatGPT came out and people were saying this is going to change the world, right? And there were some folks talking about how there needs to be a new operating system, an AI operating system, and I didn't really understand what that means, and I still don't. But then I saw some people talking about how chat can be your mission control: you have agents, you're in the chat all the time, and the agents go and do what you need them to do. I don't think it's reliable enough for that yet. And I also wonder if that whole assumption is flawed, and we're still going to be doing what we're doing in the specific places we like to do it, a la Slack or Cursor, but the AI, or the chat, will come to those places, as opposed to us being in the chat firing off commands for things to go and happen behind the scenes.

Francisco Ingham [00:29:09]: Yeah, that's a good point. I think we all enjoy seeing things and interacting with things, so I don't think that's going anywhere. Just a wide chat screen feels sad to me in some way; it doesn't feel like a good experience. Whereas this more integrated approach, where you have chat but lots of other things within the UI, feels less like a compromise or a trade-off and more like an enhancement, right? I've also seen the OS thing; I think Karpathy tweeted it back in the day. I don't understand it, but when Karpathy says something you don't understand, it's obviously on you.

Francisco Ingham [00:30:00]: You are the problem. Yeah, I really like having chat within whatever I'm doing, which is the case with Cursor, right? I'm coding, I have an IDE. IDEs are still a great thing, but now I have an LLM-enhanced IDE. That's also the LLM-native concept for me: whatever you already are, you can be an LLM-enhanced version of that, right? This applies to traditional industries and to professionals. It doesn't mean that if you do marketing, your marketing knowledge is obsolete. The opposite: it's worth more than before.

Francisco Ingham [00:30:41]: Now you can power that knowledge and get farther with it by using LLM apps. And the same with, for example, a traditional company: if you have users, and the users love what you do, you can serve them better, because you know them better than anyone else and you know how to apply LLMs to serve them. So domain-specific knowledge, for me, is at an all-time high in value, especially if you use LLMs to leverage it, be it on the job side or the product side. This goes against the narrative that every job or every product is going to be replaced by LLMs. I don't think that's the case. It's all about experience, and the experiences that have LLMs integrated in the right way will be enhanced, not replaced, I think.

Demetrios [00:31:37]: Yeah. My buddy Marcel likes to make the point that you still need to know what good looks like, or what great looks like. Basically, if you're an SEO expert on the marketing team and the LLM does stuff for you and gives you something, if you don't know what great looks like, you're kind of sitting there going, well, I guess it's good, let's ship it. But if you know what great looks like, you'll be able to say, okay, we need to optimize these H1 tags, it looks like we didn't get the metadata right on this, all those things the SEO folks like to talk about. You're going to see where it's okay and where it's not okay, and then add your value on top. But like you said, it gets you 70% of the way there, hopefully.

Francisco Ingham [00:32:31]: Absolutely, yeah. And I would add to that: AI engineers don't know what a good SEO result looks like. We don't know what a good marketing result looks like. We don't know what a good experience for the user in the construction industry is. So it's not like we can just automate everything away. These systems need direction, and the direction comes from knowing your users very well, which is the case in traditional industries. That's where I see this integration happening: AI engineers, but very, very close to the domain experts, right?

Demetrios [00:33:06]: Yeah. Because your background is in data science, or... what is it?

Francisco Ingham [00:33:12]: I studied economics in college and then quickly moved on to deep learning, because I became fascinated with it. So I have a hybrid background. Yeah.

Demetrios [00:33:22]: And in your eyes, what is the difference, if any, between a machine learning engineer and an AI engineer? Is it just a different name for the same kind of thing?

Francisco Ingham [00:33:35]: There's been a fair amount of debate on this topic, about what AI engineers really are, if they are anything. I do believe LLMs change things materially. It's not just another algorithm, because it's an automation opportunity for everything that existed before. It's not a module; it's transversal, like a capability. I don't like to get stuck on the terms, so the way I see it is: you have different job roles, a data scientist, a machine learning engineer, the traditional job roles, and a human resources expert. All these jobs have material expertise in a certain topic. For example, a data scientist needs to do good evaluations and have a good experimental process, because that's your bread and butter regardless of which algorithm you're using. That stands apart from whether LLMs exist or not.

Francisco Ingham [00:34:44]: What an AI engineer would be to me, and again this is probably a definition thing, is somebody who can do AI in an LLM-native way. An AI engineer is able to include LLM features in the product they're building, and able to leverage LLMs in their own development process. But I don't see AI engineer per se being a job. For me it's more an enhancement of, say, a data scientist, right? A data scientist who knows the traditional machine learning topics, evaluation metrics, experimental process, and enhances that with LLMs. It's like a 2.0, an LLM-native data scientist.

Demetrios [00:35:41]: Well, it's fascinating you say that, because the data scientist being on the hook for evaluation and experimentation is what we were just talking about earlier, right? When you're building out your agent, one of the things you're looking at is how you experiment to bring down cost or latency, and how you evaluate the effects of that experimentation. So yeah, it's fun to think about these things, because I know this topic has come up quite a bit.

Francisco Ingham [00:36:16]: Yeah, absolutely. And I think really, in terms of evaluation, things have not changed that much. Maybe the metrics change, and you now need to understand how to have an LLM as part of your evaluation.

Demetrios [00:36:31]: Right.

Francisco Ingham [00:36:32]: Like LLM-as-judge, that's a new thing. But evaluation in itself has always been something you needed to do; it's been part of the data science process forever. Having good evaluations, understanding what a training set means, what an evaluation set means, what a test set means, how many examples you should have in different evaluation sets, covering different aspects of how your solution might be used and different types of inputs, and trying to get ground truth for those inputs. That kind of thinking is basically science, right? A scientist's mindset, which doesn't really change. What changed is that some modules of those evaluations behave differently, so you need to adapt and understand the stochasticity behind them. But I see this a lot with some people coming into AI now as prompt engineers or AI engineers.

Francisco Ingham [00:37:39]: They have no background in doing science for a production environment. That kind of rigor is important, especially at the maturity stage of a product, and you either need to learn it or bring that knowledge from traditional machine learning experience, because it's not enough to just prompt, throw an LLM at your outputs, and wish for the best.

Demetrios [00:38:07]: You sound like you're talking from experience. You've seen the worst of them.

Francisco Ingham [00:38:11]: Yeah, I mean, it's understandable. Now, with LLMs, we can build whatever; the barrier is super low, everyone can get in, everyone can build a UI in one day. That's all true, and it's fantastic for PoCs. But when you start making solutions that actually need to handle real users, and users will use all of this in all sorts of unexpected ways, you need some systematic process for improving your solution and finding improvement opportunities.

Demetrios [00:38:42]: Yeah, it's the difference between having that agent that takes your food orders just for your own Slack, versus creating an actual Slack bot that you put on the Slack marketplace and let anyone use.

Francisco Ingham [00:38:59]: Yeah, and I love the scrappy attitude of, let's hack, let's break things. But that's a different mindset from the one you need when you're doing a product for an enterprise or a big company. It's very different. So I like having my place to play, which is the agents we use internally, without translating that mindset to everything.

Demetrios [00:39:28]: Well, so talk to me. When a client comes to you, are they like, all right, we need to do something with AI? I imagine they just come to you and say, we want to do AI stuff. Do they come with that broad a vision, or do they generally have it mapped out in their head, like, hey, we've got this use case and we think we could plug in some kind of LLM?

Francisco Ingham [00:39:50]: It depends. Sometimes it's more of a blank slate with a vision: we would like to build this, we don't know where to start. In that case, usually what you do is a PoC or MVP, throw it to some users, start learning from them, and work your way towards a mature evaluation suite, like we discussed earlier. Sometimes it's something already in production that has started to scale, and when you need to scale, the technical requirements for a solution are different.

Francisco Ingham [00:40:23]: You need more expertise at the table, so in that case it's more about optimization than zero-to-one. Usually clients do have some background: they have people on the team who have already been researching LLMs and playing around with them. They have background, and they want to be able to prioritize correctly and understand what is actually a good thing to add to a solution versus hype. A good example of this is RAG. RAG has an infinite assortment of variations and stuff you can do with it, super fancy things.

Francisco Ingham [00:41:10]: But sometimes when you have a RAG pipeline, it's more about understanding the nature of the data and thinking from that data about what needs to be done to retrieve well. You need to get deep into the use case. It's very hard to apply boxed solutions, just an API that you can hit and it hands you great RAG. Many times it's more about exploring the use case, understanding what exactly the problem is that the client is trying to solve, and then diving in. That's also why consulting makes sense, because otherwise you would just buy a product.

Francisco Ingham [00:41:43]: And there are many startups building products for every single sort of thing. But many times, to get good performance you need custom software, basically doing things tailored to your solution.

Demetrios [00:41:58]: And can you explain the different spectrums you see when it comes to RAG? How you've done certain things for some folks, and once you've gotten in there and gotten intimate with the data, you recognize, okay, we should probably do this with this type of use case or this type of data, versus a different use case where you need to take an entirely different approach.

Francisco Ingham [00:42:29]: Yeah, it really depends on the use case. Sometimes, when you just have documents and you want to answer a question based on them, there are all these questions about what you are embedding. Are you embedding the title? The content? What part of the content? A summary? All these options are on the table. What similarity, what embedding space do you have? Which model is creating your embeddings? Is it a custom model or a pre-trained model? All these questions relate to what's going to happen when the query gets embedded and you retrieve similar vectors: that similarity is defined by the priors your solution has, in terms of how you cut your documents and what you embedded. Asking those types of questions is the first thing if you have a traditional RAG pipeline. And if you start pulling from there, many times you end up doing stuff that actually has a name, but you don't know it. In the early days I was doing RAG and I didn't know what RAG was; when people spoke about RAG I was like, what are they talking about? Then I realized, oh yeah, it's just retrieval. It's funny how all these terms and these supposedly big techniques come out, and sometimes it's more of a common-sense approach to a certain use case, which is retrieving the right documents.
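A small sketch of those what-do-you-embed choices: indexing the same documents under different fields (title, summary, or content) so retrieval can be compared across them. The embed callable stands in for any embedding model, custom or pre-trained; everything here is illustrative.

```python
# Sketch: index the same documents under different embedding choices
# (title vs. summary vs. content) to compare retrieval quality.
# `embed` stands in for any embedding model, custom or pre-trained.
from typing import Callable, List, Tuple

Vector = List[float]

def build_index(docs: List[dict], field: str,
                embed: Callable[[str], Vector]) -> List[Tuple[Vector, dict]]:
    # `field` is "title", "summary", or "content"; this choice is a prior
    # baked into the index that shapes every similarity score later.
    return [(embed(doc[field]), doc) for doc in docs]

def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def retrieve(query: str, index, embed, k: int = 3) -> List[dict]:
    qv = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(qv, pair[0]), reverse=True)
    return [doc for _, doc in ranked[:k]]

# Building one index per field turns the embedding choice into an
# experiment you can evaluate, not an accident of the first implementation.
```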

Francisco Ingham [00:43:54]: Another big use case, with different requirements, is extraction. If you have a very large document and you need to find the right paragraph or the piece of information you're looking for, it's different, because the input is determined. You know what you're looking for; it doesn't depend on what the user says. You can have a certain number of options, but it's usually a limited list of data points you want to obtain. In that case you can do much more science. You can even do rules, specific keywords, hybrid search, getting really good at that finite list of terms you're trying to find so you can extract the right values, the right data points. So that's a different approach for a different use case.
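As a sketch of that rules-first extraction approach: because the targets are a known, finite list, cheap deterministic patterns run first, and only the misses would fall back to semantic search or an LLM. The field patterns below are hypothetical, not a real document schema.

```python
# Sketch of rules-first extraction from a large document: the data points
# are a known, finite list, so regex/keyword rules run first and only the
# misses fall back to semantic search or an LLM. Patterns are illustrative.
import re

FIELD_PATTERNS = {
    "contract_date": re.compile(r"dated\s+(\d{1,2}\s+\w+\s+\d{4})", re.I),
    "total_amount":  re.compile(r"total\s+(?:amount|price)\s*[:=]?\s*\$?([\d,.]+)", re.I),
    "party_name":    re.compile(r"between\s+(.+?)\s+and\b", re.I),
}

def extract(document: str) -> dict:
    found, missing = {}, []
    for field, pattern in FIELD_PATTERNS.items():
        m = pattern.search(document)
        if m:
            found[field] = m.group(1)
        else:
            missing.append(field)  # candidates for a semantic/LLM fallback
    return {"found": found, "missing": missing}

text = "This agreement, dated 12 March 2024, between Pampa Labs and Acme..."
print(extract(text))  # deterministic and testable without any LLM
```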

Demetrios [00:44:48]: First of all, I appreciate you breaking this down, because 100%, I was going to go down the route of, oh, I thought you just meant what kind of embedding model you're using, what kind of chunking strategy. But that's just one side of the spectrum where all those questions come up. Then you go to the other side of the spectrum, and basically you're saying it's more of a search problem than a retrieval problem. It's just like, hey, we're doing search and calling it AI.

Francisco Ingham [00:45:21]: Yeah, absolutely. And I have a background in ranking and retrieval, because I worked on recommendation systems at Mercado Libre, which was my previous company.

Demetrios [00:45:35]: Shout out to Leetox, who's been on here a few times.

Francisco Ingham [00:45:38]: Absolutely. For Litox and for Linia, Lina and Javier.

Demetrios [00:45:44]: The whole Mercado Libre team was instrumental in the early foundations of the MLOps Community. And I think that's how we sparked up our conversation when we met, isn't it?

Francisco Ingham [00:45:55]: Yeah, and I was part of that team as well, so it's interesting how it all comes together. For me it all came from embeddings; I worked a lot on embeddings at Mercado Libre back in the day. So when I started to understand that embeddings were a key part of many of these agentic systems, it automatically rang a bell for everything I knew from those times: vector spaces, latent spaces, custom embeddings, fine-tuning, when it makes sense and when it doesn't, and how you purposefully define your latent space based on your problem, which has different variables you can pull from. That was what really stuck for me: OK, now this is the field we're playing on.

Francisco Ingham [00:46:46]: The LLMs are outside, and now we're talking latent space. Let's make this work, and then let's see how it interacts with the LLM. The only part the LLM is involved in is how it creates the query, which is basically prompting. Once you understand the types of queries you want, you can treat the RAG system as a separate part, right? And all the traditional retrieval questions are exactly the same: you're trying to find a needle in a haystack. That's it.

Demetrios [00:47:20]: Dude, that's fascinating to think about. It's almost all the same foundation, but when you go to RAG you've got the prompts, and when you go to recommender systems it's just a bit different. I'm not gonna say a different abstraction, but a little bit of different spice involved when you're cooking up the different use cases.

Francisco Ingham [00:47:44]: Yeah. Maybe in recommender systems the input is given by the end user directly, whereas in LLM RAG pipelines the query is often LLM-generated. So there is a probabilistic step between the user and the retrieval system, but you still have an undetermined input that needs to find the right needle in a haystack. I actually think it's an advantage to have an LLM in the middle, because you can clean up whatever the user typed in a very untidy way, which is how users are, myself included; the way you use terminology, you're untidy if it works. You have that adaptive layer where you can clean whatever comes in as input, versus traditional recommender systems, where you just have what you have. This is the input the user gave you; if they missed a keyword or something, you're stuck with that. If it's misspelled, whatever. Obviously semantic models can sometimes account for that, but having a cleaner input helps you make good retrieval. It also helps in terms of evaluation, because with the LLM you can control what type of input you get, and then you can build a better evaluation set with that in mind.
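A minimal sketch of that adaptive layer: one LLM call that rewrites a messy user message into a clean retrieval query before it reaches the search system. It assumes the OpenAI Python SDK; the prompt and model are illustrative.

```python
# Sketch of the "adaptive layer": rewrite an untidy user message into a
# clean retrieval query before it hits the search index. Assumes the
# OpenAI Python SDK; prompt and model are illustrative.
from openai import OpenAI

client = OpenAI()

def clean_query(user_message: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Rewrite the user's message as a short, well-formed "
                        "search query. Fix spelling, drop filler words, keep "
                        "all product attributes. Return only the query."},
            {"role": "user", "content": user_message},
        ],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()

# e.g. "uh i want like those blue stileto heels but darker??"
# might come back as "dark blue stiletto heels"
```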

Demetrios [00:49:10]: Well, it's funny too. In this specific example you're saying the user is giving you information, probably typing what they want into a search bar. Let's say it's on Mercado Libre and I'm looking for some new high heels. Not for myself, for my wife, of course. And then I get a bunch of recommendations saying maybe I want blue, or maybe I want stilettos, whatever it may be. But I find it fascinating that with recommender systems you get something you don't really get in RAG. Again, it's a different spice but the same engine. In the recommender system you can use these features: oh hey, he hovered over that picture for a little bit, maybe we should show a few more of these blue stiletto heels.

Demetrios [00:50:08]: And oh, he was also asking about turquoise dresses; maybe we should try to show him a turquoise heel that goes with that. And again, not for myself. I don't wear stiletto heels or turquoise dresses. I just hang out with three women, because I've got two daughters and a wife.

Francisco Ingham [00:50:33]: Yeah, that's an interesting thing to consider. Specifically, how e-commerce recommendation systems will work now, with LLMs on the table, is going to be a very interesting development in the field. I'm already seeing stuff I really enjoy.

Demetrios [00:50:55]: What have you been seeing?

Francisco Ingham [00:50:56]: So, you know, the Shopify agent that you can just ask. What I really like is that you can have these fuzzy filters, where you can say, yeah, I like it, but a little bit darker red. There were papers written on this before the LLM era, like FashionCLIP, which I liked because I really enjoyed the e-commerce problem: being able to traverse the latent space with these kinds of latent variables, how dark the color is, for example, which is obviously not something you can filter for.

Demetrios [00:51:32]: Yeah.

Francisco Ingham [00:51:33]: So those kinds of queries start to feel a lot like a real sales experience versus just an e-commerce site. That's super interesting to me. And as you were saying, what you can use to find the right product is a very wide spectrum of multimodal data, because you have all this navigation information, the hover information; everything can be part of whatever you do to retrieve the right products. So there's a lot of information, and you need to prioritize how you're going to incorporate it into your recommender system. It's a very interesting problem.

Demetrios [00:52:17]: Yeah. I've also heard of folks using LLMs to help them get different feature ideas for their recommender systems, but I look at that as the training part of the whole recommender system. I haven't seen anyone quite marry the LLM up front for the user, where you can just talk to it. But now that you're talking about this Shopify agent, I do like that idea of talking to it. So if I'm understanding correctly, you get shown stuff and you can say, yeah, I want something like this but with a darker shade or a different color.

Francisco Ingham [00:53:06]: Yeah, exactly. You can ask, it will show you a few options, and you can continue the discussion, which is very compelling to me.

Demetrios [00:53:17]: And this brings up, for me, thinking that's very similar to a presentation we had, I don't know, six or eight months ago from Linus. He was talking about the different ways you can have interfaces with LLMs. You see it a lot at Notion, where he used to work: Notion has the ability to click, and it gives you a short list of things you're going to want to ask. In his presentation he was saying, we've already got the mouse and the ability to click, and not all the time do we want to be chatting, trying to ask more about whatever it is we're looking at. We just want a dropdown menu with maybe five options and we click on one of those, or we can continue chatting if we need to.

Francisco Ingham [00:54:25]: Yeah.

Demetrios [00:54:25]: So I find that interface fascinating too. If we can get something that is only a click away, it's much easier to click on it than to write something out, even if it's just one or two words.

Francisco Ingham [00:54:43]: Absolutely. And I think this touches on probably the key topic here for any company building with LLMs, which is that whatever you do in terms of features or implementations should pull from a vision of what a great experience looks like. And it needs to be that way. You can be non-LLM dogmatic or LLM dogmatic: we want to do everything with chat, or we don't want any LLM features, even in the backend. But it's all about giving a great experience, and that requires the pragmatism of having this whole assortment of tools and using them. It takes a lot of experimentation with the users as well.

Francisco Ingham [00:55:27]: But this is the path I like to see companies going down: they try things out and suddenly find these amazing experiences that were impossible before, without being dogmatic in any way. It's just the right combination for that kind of user. That's the layer two of solutions. At the foundation layer we've seen a lot of developments; now, how do we get these LLMs working for the 90% of people who don't care about technology and don't want to know what's going on behind the scenes? They don't want to know.

Francisco Ingham [00:56:01]: Just like, help me solve my problem with a great experience. So that's what I really like to see happen.
