We're All Finetuning Incorrectly
SPEAKERS

Tanmay is a machine learning engineer at Neeva, where he's currently engaged in reimagining the search experience through AI - wrangling with LLMs and building cold-start recommendation systems. Previously, Tanmay worked on TikTok's Global Trust & Safety Algorithms team - spearheading the development of AI technologies to counter violent extremism and graphic violence on the platform across 160+ countries. Tanmay has a bachelor's and master's in Computer Science from Columbia University, with a specialization in machine learning.
Tanmay is deeply passionate about communicating science and technology to those outside its realm. He's previously written about LLMs for TechCrunch, held workshops across India on the art of science communication for high school and college students, and is the author of Black Holes, Big Bang and a Load of Salt - a labor of love that elucidated the oft-overlooked contributions of Indian scientists to modern science and helped everyday people understand some of the most complex scientific developments of the past century without breaking into a sweat!

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.
SUMMARY
Finetuning is dead. Finetuning is only for style. We've all heard these claims. But the truth is we feel this way because all we've been doing is extended pretraining. I'm excited to chat about what real finetuning looks like - modifying output heads, loss functions, and model layers - and its implications for quality and latency. Happy to dive deeper into how DeepSeek leveraged this real version of finetuning through GRPO, and how this is nothing more than a rediscovery of our old finetuning ways. I'm sure we'll naturally also dive into when developing and deploying your specialized models makes sense, and the challenges you face when doing so.
TRANSCRIPT
Tanmay Chopra [00:00:00]: Hi, I'm Tanmay Chopra. I'm the CEO of Emissary. We're an AI infrastructure platform for model optimization. And how do I take my coffee? I actually do a double shot latte in the morning and then in the afternoon, every day.
Demetrios [00:00:16]: We're back with another MLOps community podcast. I'm your host, Demetrios. And today we got into how traditional ML systems can inform and help you level up your game on the new gen AI systems. Tanmay is an original thinker in this space, and I really enjoyed the last part of our conversation, when Tanmay talks about the Pythonic universe and how you really gotta have it when you are dealing in AI. Let's jump into this conversation. I just posted something and it said: if LLMs are so smart, why are AI products so dumb? And I cannot take any credit for that, because I completely stole it from one of the agent hours talks that we had, and it was from Shreya Shankar. But I think that resonates with you, right? What were you saying?
Tanmay Chopra [00:01:18]: 100%. I think that's kind of the big mismatch we're seeing right now, right? We see that these systems have so much potential, but realistically, in production, we're actually not seeing them utilize more than a small fraction of that potential. You're seeing this now from the leaders of large tech companies saying this is promising, but actually maybe not all of it works. Which is a kind way of saying, yeah, most of it doesn't work right now. And so, between the sentiment of AI systems could be amazing and AI systems are amazing, we're kind of in the gulf right now.
Demetrios [00:01:57]: Yeah, it's so funny that you mentioned that, because just yesterday I was talking to a friend who was saying, you know, apps won't exist in a few years, it's just going to be all agents doing stuff. And I'm like, dude, give me a break. I've heard this argument a few times and I really am having a hard time getting on board with it. Let's just take my favorite video editing software, DaVinci Resolve. I know where all the buttons are that I need to press to get what I need done. I know the hotkeys, I know how I really like the visuals in the color grading, all of that stuff. If now I need to just tell a large language model, or a diffusion model, whatever model it is, a foundation model, that I want you to edit this podcast and color grade it, then at the current rate, I don't think that is possible. And then when I have to debug it, it's going to take me more time than if I just used DaVinci from the beginning.
Demetrios [00:03:11]: So maybe helping me on a few steps is really where I see the value. But actually, what this guy was saying yesterday is, yeah, we're just gonna spin up apps. The agents will create apps anytime you need something done. And I'm like, dude, that is very hard to grapple with. And maybe I'm pessimistic, but I don't think that's a reality that we're going to see.
Tanmay Chopra [00:03:40]: Yeah, I think there's this really interesting distinction there, right, between what's possible and what's probable. I think it's very possible that this happens. I think the probability of this happening in the next five years is probably not that high. And so that's how we think about it. If you think about ChatGPT, we're on year three, right? In the post-ChatGPT era, we're on year three. And if you were to go back two and a half years and say, hey, in 2025, we're not actually going to have a lot of AI in production - we're going to have a lot of funded companies, but not a lot of AI in production - most people would be surprised, right? Because when this thing came out, everyone was like, hey, two year timeline, we're going to automate out particular jobs. And what we've really seen is there's two personas, right? You have sort of AI as your supervisor, and then you have AI as your assistant. So the scenario you're describing is AI as your assistant: you're a really good video editor and you're now being asked to use this AI system, for better or for worse.
Tanmay Chopra [00:04:49]: And the other scenario is you're a really bad video editor and you're being asked to use AI to just get started, right? You have no idea where to go, you have no hotkeys, you're sort of the 0 to 1 user. And I think CodeGen is a really good example of this. This has played out in front of me in real life most of the time. All of the, you know, fascination over CodeGen is coming from CTOs or CEOs, right? It's actually coming from people that don't write code as frequently as a frontline engineer. And I love CodeGen, for context. I'm a huge shill, a huge fan. But now if you go talk to an infrastructure engineer that's dealing with a genuinely hard infra problem, they'll look at you like you're high.
Demetrios [00:05:35]: Right.
Tanmay Chopra [00:05:36]: So given that they're an expert, this system actually slows them down, but it only slows them down in the context of infrastructure. If you ask an infrastructure engineer to do front end engineering, it's actually amazing. They all love them, right? So for the bucket of tasks where you're not the expert, but you have some verification capability, CodeGen is really good and it's taken up so much of that space, but you've really got to think about like, okay, what is that user journey when you're not an expert, so when AI is your coach and then that user journey where you are an expert so AI is your assistant. And in the latter, repeatability, determinism, accuracy, latency, these things matter a lot. An expert is not going to waste their time trying to prompt ChatGPT four times. They'll just go do the job. But someone who knows nothing, they'll take that alpha and actually they're learning in the process. So as the system fails, they're also learning.
Tanmay Chopra [00:06:38]: And so I think that's how I've been looking at these tasks more and more.
Demetrios [00:06:41]: So where I think this whole thing breaks down, from a user experience perspective - especially if we are trying to do tasks that we're not necessarily experts in, where we don't know all the lingo or the best ways of doing them - is in trying to explain what we want done. That is very difficult. Like, I was just playing around with this tool this morning where the whole thing is you create a prompt and it gives you a pipeline: it will create the workflow for you, and then you can go in and tweak it. Yeah, it didn't really work. And even me explaining it and then going back and trying to prompt it better - and I knew what I wanted, I knew what the workflow was going to look like - inevitably I had to go in there and create new nodes and create new calls. And so it's this low code, no code experience that I had with the GUI. And at the end of the day, it's like, yeah, cool, the prompt kind of got me there, but I'm not sure it really saved me that much time.
Tanmay Chopra [00:08:02]: Yeah, I think this is actually a big problem with generalized systems.
Demetrios [00:08:06]: Right.
Tanmay Chopra [00:08:07]: They're just not aligned enough. And so if you look at a model that's sort of trained on everything in the world, it's a great model for everything in the world, but it's not the best model for what you're trying to do. And so in an ideal world, right, whatever system was generating that pipeline out of your prompt would have seen hundreds of thousands of prompts, and the ideal pipeline that was generated out of each of them. Even then, it's not going to be perfect. But your edit distance from first output to what you finally ran is going to be a lot shorter. So there's a couple of things there. The first is inherently what AI, or old school ML, was for.
Tanmay Chopra [00:08:50]: ML was about trying to digitize systems and processes you could not explain. And so we are doing a little bit of morphing here, right, of what this was meant for. Machine learning was always meant for: you give me inputs and outputs, and I will converge to the right function. So don't tell me your process, because there are processes better than yours that I will discover in this larger search space. And so that was kind of why we used ML, right? If you think about all the use cases before: I cannot write software to do recommendations. There aren't enough lines of code that I could generate to create a recommendation pipeline for every user in the world.
Demetrios [00:09:32]: Great.
Tanmay Chopra [00:09:33]: That's a machine learning use case. If you physically cannot describe enough of the process, it's an amazing machine learning use case. So we are morphing that by saying, hey, actually now you have to explain your process in the prompt. Don't give me inputs and outputs, give me your actual chain of thought process.
Demetrios [00:09:53]: Right?
Tanmay Chopra [00:09:53]: That's what you're doing whenever you start writing sort of these chain-of-thought prompts. So that's one thing, and this is sort of a more general philosophical thing. There's two others. The first is: is the AI system itself good?
Demetrios [00:10:06]: Right.
Tanmay Chopra [00:10:06]: And then the second is: is it good for you? And those are not very different alignment vectors, but they are distinct alignment vectors. A lot of AI systems are still stuck at the first one. So they're not even thinking about hyper personalization for the user. They're thinking about, how do we be good? And I think this is coming more and more to the forefront. What was very popular towards 2021 in ML, and is now very popular in AI, is evals, right? And not LLM-as-a-judge. No knock on LLM-as-a-judge; I've just never seen it work.
Tanmay Chopra [00:10:42]: But the thing is, you need to know what good looks like for you. The first thing we ask every customer to do on our platform is to stop and think, right? Like, tell me what good looks like. Does it mean I have the right files in place? Does it mean I have the right text in place? In your universe, what does success look like? And ideally, this evaluation is not gut based, because if it's gut based, you're adding another layer of uncertainty. So we now have some folks that are just not evaluating their systems, and then we have some folks that are evaluating their systems on vibes. And the bucket of folks that are evaluating their systems deterministically is actually very, very small. But if you were doing that, everything would get better over time, because then you can start using math, you can use backpropagation to start building systems that are better over time. So that's kind of where we think a lot about this.
Demetrios [00:11:44]: But why do you feel like people aren't using evals? Is it because they don't know what good looks like? Is it because it is a new product and they want to get it out there? And then once they get users giving them that feedback, then they can create that eval loop. What is it that is missing?
Tanmay Chopra [00:12:05]: So I think it's two things. One, this is the obvious one. Eval is hard. So I don't think people are actively choosing not to do eval. I think this thought process of what does good look like in some cases is hard to describe. In some cases it overlaps between technology and business. So this is also why machine learning used to be hard.
Demetrios [00:12:25]: Right?
Tanmay Chopra [00:12:25]: You've got people trying to do calc, and then you need to kind of map that to a business metric, because you don't actually have any signal till the business metric comes out. So that's number one: this is actually a really hard problem. Number two is, a lot of people are thinking about AI systems a lot more statically than we thought about ML systems, at least for the last couple of years. They would ship something that looked decent and then kind of say, okay, this is it. That works in software: you can front load a lot of thinking in design and then you ship it, because over the years we've gotten really good at creating software abstractions that result in good software the first time, or the second time, or maybe the third time. AI systems are inherently perishable, right? So you kind of have to think, from moment one, about how you are going to keep retraining and improving this. But this is very new. Just as a paradigm, it's very new.
Tanmay Chopra [00:13:22]: And so if you're a software engineer building AI, that's the thing I always encourage folks to think about: how is this different from your software development life cycle? It's not one and done. It is inherently improving over time. That's a feature, not a bug.
Demetrios [00:13:39]: I remember there was a great article that came out from Google, probably in 2021, talking about continuous training, and it showed the whole maturity levels you could get to that would automate that retraining pipeline. And it's a little bit of going back to that. Even though now we're using large language models instead of these traditional ML models, you still want to look at it as: as soon as you get the model out there, in a way, you need to be ready, or you need to have a plan, for how you're going to keep iterating on making it better and what you can do to fine tune it. And I think a lot of people are having success with fine tuning prompts. I know that you don't necessarily like that idea.
Tanmay Chopra [00:14:33]: I don't think I dislike that idea. I think I've just seen that sort of hit a ceiling with the early movers from the last two years. And so in some ways I tried to preempt folks struggling when they hit that ceiling. But actually the approach we take is we say don't fine tune till you have to.
Demetrios [00:14:54]: Right.
Tanmay Chopra [00:14:54]: So do the best you can in the prompting world and if it's good enough, like, fantastic, let's help you maintain those systems through the retraining piece.
Demetrios [00:15:03]: Right.
Tanmay Chopra [00:15:03]: So part of what our platform does is it actually streamlines the retraining piece, because retraining used to be the most painful part of machine learning. If you have your own model, you're now signing up to retrain this model month over month. And so this is actually why we discourage people from fine tuning their own models until they have to. Because you need to know that it's a lifelong investment.
Demetrios [00:15:26]: Right.
Tanmay Chopra [00:15:26]: It's like having a child, where you kind of have to take care of that child month over month, year over year. You don't get to just give birth to a child and then be like, I'm out. And so I actually don't, you know, knock prompting or prompt fine tuning. I think it's more a function of: you should be intentional about when you're doing that versus when you're fine tuning your own model. So the big challenge with any optimization at the prompt level is that you can't change the objective function. The model still only cares about the next word. Most enterprise tasks care about things other than the next word.
Demetrios [00:16:07]: Right.
Tanmay Chopra [00:16:08]: You might care about some form of classification. You might care about, you know, you generate an investment memo and you're like, okay, did this memo map to reality? You don't actually care about the words of the memo.
Demetrios [00:16:19]: Right.
Tanmay Chopra [00:16:19]: Or the words of memo one versus the ideal memo. What you care about is: what was the grounding in reality of memo one versus memo two? So improving your model, or your AI system, based on your own objective function is something you do not have access to until you start owning your own models. That's the reason we encourage folks to think about this: it can improve for you.
Demetrios [00:16:46]: Right.
Tanmay Chopra [00:16:46]: It doesn't just improve for words, it improves specific to your AI system objective.
Demetrios [00:16:53]: So the idea here is that you're doing as much work as you can with foundational models, and then eventually, going back to what you said before, there's these two vectors that we need to look at: is the AI good, and is it good for you? Or, for the AI product that we wrap it up in: is it good, and is it good for your use case? You can get pretty far, I imagine, with the foundational models. And indeed, for a lot of companies, that's as far as they get.
Tanmay Chopra [00:17:25]: Yeah.
Demetrios [00:17:27]: Their AI Center of Excellence says, we've got a win under our belt. Let's go play around with other stuff and wait till the new model comes out. But what you just described is very much like going back into the old, traditional ML world, in a way. So how do you look at that? Is it not just a ton of burden? And when do you want to use those types of use cases? Because I think when these AI Centers of Excellence are talking about the use cases they're going to tackle, if you bring up, oh yeah, we now have to do AI like we did ML, it gives people cold feet.
Tanmay Chopra [00:18:14]: Yeah. So you bring up a bunch of interesting things. In terms of the burden side, that's why we're building Emissary, right? That's the whole purpose of our platform: to say, hey, it was actually really hard to build models before, and it shouldn't have been. So let's try and make it easier.
Demetrios [00:18:30]: Right?
Tanmay Chopra [00:18:31]: Like, how much easier can we make it? And our bar to success is really, can we make it easier than prompting? That is the level that we are pushing to go to. We're pushing to go to say, if you want to do a GRPO based optimization.
Demetrios [00:18:44]: Right.
Tanmay Chopra [00:18:44]: So the system that DeepSeek used to build their model: we want to make it as easy as it is for you to prompt. In fact, in the longer run we want to make it easier, because you don't have to be guessing around removing a word here and there and wondering, oh, did my prompt regress? You can just be like, hey math, please take care of this, optimize to this function. So burden is an infrastructure problem, right? That's something that, even if we don't solve it, somebody should and will solve if the industry becomes...
Demetrios [00:19:14]: Sorry to cut you off, but I want to stress this point. You said, hey math, go and figure this out. So it's not someone pulling something out of their ass with their prompt and trying to explore latent space. It is math equations. That resonated with me a lot the first time you told me, because it was like, huh, that seems like a much better way to systematically go about improving your product.
Tanmay Chopra [00:19:49]: We have a deterministic way to reduce the loss, right? That's all of what machine learning was. We were like, hey, if you can define what good looks like as some equation, we can use that equation to start approximating outcomes based on your input. This is what all of machine learning was. Somewhere along the line, I think we kind of lost the plot and said, let's not use that. Let's generalize to the next word, and let's try and guess the series of words that result in the right word being outputted, instead of actually using the math that we've been relying on for this long.
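To make the "hey math, please take care of this" point concrete, here is a minimal sketch of the old recipe: encode what good looks like as a loss function and let backpropagation reduce it. The model, data shapes, and loss choice are illustrative placeholders, not Emissary's internals.

```python
# Sketch: "define good as an equation, then let gradient descent reduce it."
import torch
import torch.nn as nn

model = nn.Linear(768, 2)        # stand-in for any trainable model or head
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()  # swap in whatever "good" means for your task

def training_step(features: torch.Tensor, labels: torch.Tensor) -> float:
    optimizer.zero_grad()
    loss = loss_fn(model(features), labels)  # deterministic measure of "good"
    loss.backward()                          # backpropagation reduces it directly
    optimizer.step()
    return loss.item()

# e.g. one step on a random batch:
print(training_step(torch.randn(8, 768), torch.randint(0, 2, (8,))))
```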
Demetrios [00:20:33]: But we made so many influencers so much money with their prompting guides.
Tanmay Chopra [00:20:40]: 100%. I still think it's useful. So here's the thing. I think the real LLM unlock is that you need a lot less data to bootstrap your first version, right? If you wanted to start doing machine learning three years ago, your first call to action was: I need to go to, you know, Scale AI or Surge AI or somebody like that and get 100,000 samples labeled. Now you can say, hey, let me start guessing some words, and this is where those guides are really useful, and get to a V0. This V0, if it's good enough for customers to use, becomes our pipeline to get better every day for the rest of our lives. Our only burden is the infrastructure burden.
Tanmay Chopra [00:21:25]: And maybe you can check out Emissary for the infrastructure burden, or you can check out other sources, but there are people helping you with this infra burden, and now you have this self improving system. And I know there's a lot of founders or a lot of AI people that say, hey, why don't you just wait for the next GPT? Or, what happens when there's the next GPT? And this is my standard response there: they don't care about every task in the world. They cannot. They can care about every task in aggregate, but they do not care about your specific workflow. And so GPT-5 could be much better. But will it be much better for your task? That's a very interesting question. That's number one. Number two is, if they do start caring about your task, that's a whole other can of worms, right? You see it with Anthropic and codegen: they started caring enough about optimizing for code, and so now you're competing with them, and you're not improving your systems based on your users' interactions, but they are.
Tanmay Chopra [00:22:29]: And so you're now in a universe where they came to your game six months or a year later, but because they're better at improving day over day, they're going to be better in weeks. That's the thing. So actually, you want to be as orthogonal as possible in terms of better-for-you versus better-from-the-foundational-models, if you want to keep space between what you do and a foundational model sort of trampling on you. Everyone has to go up against OpenAI or Anthropic at some level, right? If you're building an application layer company, at some point they will come for your application. You can't simultaneously believe that CodeGen will make all software moats zero and then go out to public markets and say our moat is our software, right? Those two things cannot exist in parallel. So you either say codegen is going to be really bad and so there will still be a software moat, or you say, oh shit, we need something more than just a software moat. Not to start with, but over time.
Demetrios [00:23:32]: It goes back to your last point, or one of the points that you made before, with the workflow solution that wasn't working up to standard: that is because it is trying to be a generalized workflow tool. It's not a "here's a workflow that scrapes keywords from SEMrush, then goes out and looks on Reddit at all the posts for those keywords, then looks at all the blog posts that have been written about those keywords, then creates an outline for a blog post you can write, then creates the intro and the body text, and then runs it all through some kind of SEO optimizer." I'm just describing the very deep workflow that I've seen my friends in marketing create for generating a shit ton of SEO-optimized blog posts from AI-generated content. Right. And whether or not that is actually useful for the Internet is a whole other conversation. But it's out there, it's happening. It is super advanced. And that happens because a marketer, or a few marketers, sat down and said, you know, it would be great if we had all these different steps. And then if I go and say I want a workflow that will do all these different steps, even if I know exactly which step is which and I prompt it, or try to prompt it, the model isn't going to understand.
Demetrios [00:25:14]: Or the magical workflow that is created from that isn't going to understand, because, again, going back to what you're saying, they didn't have thousands of these prompts. And so it couldn't know what I was trying to look for, because it is such a broad type of product. It's not that verticalized product.
Tanmay Chopra [00:25:37]: Yeah. And what's really interesting there is you added another layer of uncertainty, right? This is one of those things - everyone laughs when I say this - but you should use as little machine learning or AI as you feasibly can to build your system.
Demetrios [00:25:54]: Right.
Tanmay Chopra [00:25:54]: And obviously this is counterproductive to my business, but the idea here is you had two layers of uncertainty: you prompted, and that resulted in the creation of a workflow. So you have what we think of as orchestration uncertainty, right? And then that workflow needs to get executed. So these are different pieces that need to get executed, and so you have some level of execution uncertainty within each piece. That execution is also volatile, so you also have component uncertainty.
Demetrios [00:26:27]: Right?
Tanmay Chopra [00:26:28]: Now, if you look at the folks in marketing that are doing this: if they're marketers, the workflow is coming from them, so they have no orchestration uncertainty. They will say, do this first, then do this, then do this. They have an SOP and they'll execute on that SOP. They still have execution uncertainty, but even within that, because they're experts, right - this expertise is sitting in their head - they're actually a lot better at aligning these systems. So the whole game now is just a question of who is the best at aligning the models. We've kind of flatlined, to a large extent, in terms of how good we are going to get by throwing more compute at the problem. And this happens every so often.
Tanmay Chopra [00:27:15]: And so I'm sure 10 years from now, or five years from now, we'll have a much larger model that'll do much better than everybody else. What's really interesting is where the value is created around each foundational model, right? So when CNNs came out, it was huge. A lot of value was created. But actually, most of the value is captured between the time that one large model comes out and the time the next larger model, the one that blows the last one out of the water, comes out. This is what we're very excited about, right? The actual value capture in this journey has always been the applied AI people. It's always the people who are like: these are the new tools, we've got this new large model, this new way of training. How are we going to orchestrate these things to actually generate value for someone that'll be willing to pay for these systems, ideally more than what it costs you to run them? That's kind of what's really exciting about this period in time.
Tanmay Chopra [00:28:21]: The more uncertainty you add to the system, the less likely it is that the system will do well.
Demetrios [00:28:25]: Okay, so getting back to the question I asked before we went on this epic tangent: around these AI Centers of Excellence thinking, oh no, I don't want to go back in time to traditional machine learning methods.
Tanmay Chopra [00:28:40]: Yeah, I think I would focus less on method and more on outcomes. The thing we've been encouraging Centers of Excellence to focus on is the bigger bets. And what I mean by that is, more often than not, it's equally hard to build an AI system for customer service as it is to build something that will completely change the unit economics of your business. And so one of the things we focus really deeply on when we start working with folks is: what's the biggest problem you can tackle?
Demetrios [00:29:15]: Right?
Tanmay Chopra [00:29:15]: If you were going to commit 10% of R&D spend to a specific problem, or 20% of R&D spend to a specific problem, would it be worth it? If not, just buy the solution, right? There are companies out there doing amazing jobs at the things that are not worth 10 to 20%. And because these AI systems are perishable goods, you're putting somebody on the hook for maintenance every time you build a new system. If you can outsource that, you probably should. You want to build the stuff where your feedback loop is optimal. That's the shortest way I can put it. If you have a differentiated feedback loop.
Demetrios [00:29:56]: Right.
Tanmay Chopra [00:29:56]: If you're an insurance company and you write insurance policies like, figure out how you can deal with the insurance broker shortage by building a copilot for insurance workers. That's the alpha for you, right? Not trying to figure out like, hey, can I build a marketing AI tool? Because I think it's going to cost you 100 bucks a month to buy. It's probably going to cost you like a hundred thousand dollars to get right.
Demetrios [00:30:18]: Right.
Tanmay Chopra [00:30:18]: The economics don't work there, I think. Invariably - going back to your original question, I know I've been going in a couple of circles - AI systems are going to become a component in the ML system pipeline. So you're not going to forget classifiers, but your classifier might sit on top of an LLM. That's how we think about this universe shaking out.
Demetrios [00:30:46]: What does that mean, sit on top of an LLM? How does that work?
Tanmay Chopra [00:30:49]: So say you're building a chatbot.
Demetrios [00:30:51]: Right?
Tanmay Chopra [00:30:51]: Common use case. Right now there is so much work being done on trying to figure out how to get the LLM to only answer the questions that it should. The easiest way to do that is to build a classifier that says: should I answer this question or should I not? That stacks on top of an LLM. So basically, upstream from the LLM, a question comes in: should I answer, or should I not? If no, end the chat, or politely end the chat. If yes, let your LLM answer. Instead of trying to get this LLM to be this one unit that's perfect, just use old school ML to solve the problems that old school ML could solve. So I think there is that kind of fear.
Tanmay Chopra [00:31:34]: That's actually one of the things we've spent a lot of time doing is converting LLMs into classifiers, regressors, and other old school models. So you can still use their inherent knowledge.
Demetrios [00:31:44]: Right.
Tanmay Chopra [00:31:45]: That's what's exciting about them: the knowledge that sits between the first layer and the last layer. You can still use it, but you can use it for your tasks. So now, with 500 samples, you can build a classifier on top. We're really excited about this universe.
Demetrios [00:31:58]: Right?
Tanmay Chopra [00:31:59]: But I also understand that a lot of people are pretty hesitant to slip back into old school ML. I think you just have to get over that fear at some level.
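A minimal sketch of the gating pattern described above: a small classifier sits upstream of the LLM and decides whether to answer at all. It assumes the gate has already been fine-tuned, for example on a few hundred labeled chats; the base model choice and the answer_with_llm helper are hypothetical placeholders.

```python
# Sketch only: a binary "should we answer?" gate stacked upstream of an LLM.
# Assumes `gate` was fine-tuned beforehand (e.g., on ~500 labeled questions).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
gate = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2  # label 0 = decline, 1 = answer
)

def answer_with_llm(question: str) -> str:
    # Hypothetical placeholder for the downstream LLM call (API or self-hosted).
    return "...LLM answer..."

def handle_question(question: str, threshold: float = 0.9) -> str:
    inputs = tokenizer(question, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = gate(**inputs).logits.softmax(dim=-1)
    if probs[0, 1].item() < threshold:  # gate is not confident we should answer
        return "Sorry, I don't think I can answer that one."
    return answer_with_llm(question)    # otherwise, let the LLM answer
```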
Demetrios [00:32:09]: Yeah. Well, it's interesting you say that, because it is a much different scenario if it is only 500 examples. Even though at the end of the day you've got to have those chops to know what you're doing, it no longer becomes just "I'm pinging an API."
Tanmay Chopra [00:32:30]: Exactly. And that transition is going to be rough, but that transition is going to fundamentally change how you do business within the AI world. Because now you can get a precision recall curve. This is the most exciting part about a classifier. And I know I'm going to get assailed for saying precision recall curves are exciting, but to me, they are. Now you have error margins, right? This is game changing. You go from a world where you just have to hope that the next token is the right one to a world where you're like, oh, if your confidence score is below 0.9, just don't answer the question. And then you say, oh, we're making a lot of mistakes there.
Tanmay Chopra [00:33:06]: Let's raise that confidence score requirement. So if your confidence score is above 0.95, answer the question; below 0.95, don't answer the question. That's kind of the world we need to be in, right? Because if you're an enterprise, you can't go out there and not know. AI is never going to be perfect, but at least you need to know roughly how often it's going to go wrong.
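A minimal sketch of picking that abstention threshold from a precision/recall curve on a held-out set; the labels and confidence scores below are made-up stand-ins for real eval data.

```python
# Sketch: choose the answer/abstain threshold from a precision-recall curve.
import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])  # held-out labels (illustrative)
y_score = np.array([.95, .40, .90, .85, .60, .97, .30, .55, .80, .99])  # confidences

precision, recall, thresholds = precision_recall_curve(y_true, y_score)

# Lowest threshold that still meets the precision the business requires.
target_precision = 0.95
meets = precision[:-1] >= target_precision  # precision[i] pairs with thresholds[i]
threshold = thresholds[meets][0] if meets.any() else 1.0  # 1.0 = always abstain
print(f"answer only when confidence >= {threshold:.2f}")
```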
Demetrios [00:33:28]: And it's interesting: a classifier is one example. The other example that I've heard, from the folks at Prosus for their AI agent, is they want to know if their agents have enough information to go and execute a task. And that feels like something that could be a distant cousin, or a close cousin, of a classifier.
Tanmay Chopra [00:33:51]: Exactly. Same math problem, right? You're basically going to say: what's my threshold confidence score, above which I think we have the right answer? You train the model over time to learn, hey, when I'm below 0.9, I'm basically never right; okay, so I'm only going to answer above 0.9. It's very simple to improve AI systems when you start adding thresholds to them, because then you can just work in the world where you're good, right? And you can say, hey, we'll default to the old workflow when we're bad. Right now, there just isn't that distinction. The LLM is never able to say no.
Tanmay Chopra [00:34:28]: That's the manifestation of this problem that everyone's seen, right? It answers with equal confidence when it knows for sure and when it has no clue.
Demetrios [00:34:39]: That was the biggest problem that they ran into, in the podcast that I did on their data analyst agent: folks would ask it for data, like how many partners have done X, Y, Z in the last month, and it could know that information, but sometimes it wouldn't be given the right amount of detail. And so the LLM would come back with some kind of plausible answer. It would go off and it would be thinking for a little while, and then it would come back and give you this full story. And if you weren't really intimate with that data, you would be like, yeah, that seems about right.
Tanmay Chopra [00:35:19]: Yeah.
Demetrios [00:35:20]: And so that is a really dangerous scenario, because you base decisions off of that, which, if you are not careful, can be completely fabricated.
Tanmay Chopra [00:35:33]: 100%. That is the most dangerous chunk of answers: when it's plausible but wrong. Everything else in that grid is actually pretty fine. It's that part which is like, oh God, what do we do? And especially in regulated industries, right? If you're in finance, if you're in healthcare, if you're in cybersecurity, you don't get to make these mistakes more than once or twice before meaningfully having to pay for them. And so that's where we spend a lot of time thinking about how we make sure that when it's not confident, it's not answering. That's the fundamental problem you need to solve. No one's going to be mad at you if you default to the old ways.
Demetrios [00:36:23]: Right.
Tanmay Chopra [00:36:23]: So if you have a search engine and it always answers the question, but occasionally it says, hey, I'm sorry, here are the search results, I don't think I can answer the question - that's amazing. But every time it gives a made-up answer, you start trusting it a little less.
Demetrios [00:36:38]: Yeah, yeah. And it's really easy to lose that trust. That's the other thing: it's hard to gain and easy to lose. But the idea that you're bringing up, too, I have heard from Igor, who works at Wise, and he was talking about how LLMs should just be viewed as another node in a DAG that can do a bit more fancy stuff. You don't need to think of it as this whole revolutionary new product or new thing. Let's just think about it in a little more unsophisticated way and say, all right, cool: it can take unstructured data and make it structured. That's what an LLM is great for.
Tanmay Chopra [00:37:26]: Yeah. So I see it in two ways, right? One is as wet clay, so it can sort of fit into any position before you mold that node. Anything can be an LLM. That's the best part of it being so flexible.
Demetrios [00:37:40]: Right.
Tanmay Chopra [00:37:40]: You can treat it as a classifier today, you can treat it as a regressor tomorrow, you can treat it as a generator the third day. So as you're coming up with your MVP system, it's amazing, right? It's a prototype solution at that stage. And then when you harden that clay, it's really good at one thing. And that thing is fluency. So it's not good at decision making. And if you use it for decision making, at some point in your journey you will be like, oops, maybe we shouldn't have done that. But it's really good at fluency.
Tanmay Chopra [00:38:10]: So back when I was in grad school, I was doing some work around persuasive language modeling: how do you generate text that could potentially persuade people? And our biggest blocker there was fluency. So given a user query, we could come up with the best facts to try and convince you otherwise, but we couldn't present those facts in a fluent way. And now LLMs can do that. So this is four or five years ago.
Demetrios [00:38:41]: Right.
Tanmay Chopra [00:38:42]: And that's kind of where we were stuck with those systems. So if you start thinking about LLMs as another node in this like AI systems world or ML systems world, you start seeing a whole new set of problems that can be solved. It was the same with sort of the marketing use case.
Demetrios [00:38:57]: Right.
Tanmay Chopra [00:38:57]: Targeting has always been good, retrieval has been decent. The only thing we didn't have is fluency. So if I could target you with the right material, how do I actually give you that final piece of text that you're going to read, or video that you're going to watch? That's now unlocked. That's, I think, the hard clay version of the LLM. So there is a purpose for the LLM when you're demoing or MVPing, and then there's a separate purpose for the LLM when you're in production.
Demetrios [00:39:26]: What are some ways that you have seen folks actually having success with AI? And going back to what you were saying before, having good AI, I think is the key here. And specifically good for them too.
Tanmay Chopra [00:39:42]: Yeah. So I'm a big, big fan of the CodeGen universe, and specifically for dev tools or MVPs, I've seen it actually work. I'm very, very excited about the no code, low code dev tools universe, because basically what LLMs, or codegen, have made possible is having complete flexibility over your components without losing any control, while still having a lot of complexity abstraction. And I'll rephrase that: essentially, the trade-off always used to be flexibility for complexity, and that trade-off is now gone. That's what's really exciting to me.
Tanmay Chopra [00:40:31]: You can have all your code, but you can also have it generated in a few minutes. These systems, I have yet to see them work well with larger code bases. So I think code context retrieval is a really big problem that pretty much everybody in this world is trying to solve. But when you have no baggage, when you have no code base: that's dev tools. That's exactly it. You build a bunch of these things every day or every week, and now you can build them for a lot cheaper. So I think a dev prod team that now has five people can ship so much more than a dev prod team yesterday that had five people. So this one I'm very, very excited about.
Tanmay Chopra [00:41:14]: I think the other one is localization. That's been really hard for us to do.
Demetrios [00:41:20]: Right.
Tanmay Chopra [00:41:20]: So I used to be an ML engineer at TikTok and we spent a lot of time thinking about internationalization because we were in about 150 countries. I think that's one of those other low hanging fruits that I'm very excited about. Can you localize content?
Demetrios [00:41:34]: Right.
Tanmay Chopra [00:41:35]: And furthermore, hyper personalization is obviously one of those big bets. It's always been a big bet in advertising and now you can go even further. To be fair, these things are both hard.
Demetrios [00:41:46]: Right.
Tanmay Chopra [00:41:46]: There are a lot of problems in both worlds. With dev tools, you might want to think a lot about constraining universes. So you might not want it to use just any database, but if you use a third party dev tool system, like a codegen system, it could call any DB. And so there's some degree of constraining that needs to happen for a good dev tool codegen company. And then with advertising, there's a huge talent shortage in engineering for any industry that does localization, which is usually the content generation industries.
Demetrios [00:42:19]: I'm not sure what the localization use case is. Is that just serving the right ads to the right people?
Tanmay Chopra [00:42:26]: No. If I create an ad in the U.S., right, and I want to serve it in a manner that appeals to people in Portugal.
Demetrios [00:42:34]: Right.
Tanmay Chopra [00:42:35]: So say it's about coffee. You probably have coffee with a donut if you're in the US, but you'll probably have coffee with a pastel de nata if you're in Lisbon, right? It's a tiny change, but your ad means so much more.
Demetrios [00:42:51]: Uh-huh.
Tanmay Chopra [00:42:52]: And it's the same for blog posts.
Demetrios [00:42:54]: In Portugal it's going to be in Portuguese; in the US it's in English. And I imagine you would want more Portuguese looking people in the Portuguese one and American looking people in the American one. So...
Tanmay Chopra [00:43:06]: Exactly.
Demetrios [00:43:07]: They can relate to it better.
Tanmay Chopra [00:43:09]: And I think that'll happen for news, I think it'll happen for ads, I think it'll happen for blog posts.
Demetrios [00:43:17]: Right.
Tanmay Chopra [00:43:18]: Which case study are you serving to whom, and what do you emphasize where? There are a lot of these slices to that. We've always been trying to do this; it's just that fluency has been the blocker. Everything else in that pipeline works: I know where you're calling me from, I probably know something about your business, right? Identity resolution is pretty good. I just don't have the fluency engine, and now I do. So that's kind of what I'm excited about.
Demetrios [00:43:45]: But can you use AI to get that last mile? If we're talking about, oh, I want different looking people in the ad in Portugal versus in the U.S., then unless you're doing video generation, you're not really going to get that, right?
Tanmay Chopra [00:44:10]: Or not necessarily. You can make it a retrieval problem. How you design your AI system is actually very interesting, right? For example, here you could have a collection of models from all over the world, and whichever country you're in, you just retrieve that model, and now it becomes an interpolation problem. It's not a generation problem. You're not generating a fake Portuguese human.
Demetrios [00:44:36]: You're talking human models? When you say models, you mean human models?
Tanmay Chopra [00:44:39]: Human models, like influencers. So you get a Portuguese influencer, you get a picture of them, and you say, hey, please give us the right to put your picture in our ad; this is what the ad's for. And then all you have to do is inpaint, right? Which is a much different problem from generation. And actually, inpainting is a lot more reliable than pure generation. So now you also have a higher degree of determinism around your system. And so we actually think a lot about this, right? The way you design your AI system can be very, very different from system to system, and you're trading off for different things. So right now what we did is we turned a generation problem into a retrieval plus infill problem, and now we have a different pipeline where we can optimize both of those systems, and the retrieval problem can be purely deterministic.
Tanmay Chopra [00:45:35]: You don't need to use AI based retrieval. You literally just go to a database and say, who are the models from Portugal? And you can infill. So you've reduced your uncertainty drastically over like a one minute conversation.
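A rough sketch of the redesign just described: the retrieval step is a plain database lookup, and only the infill step touches a model. The registry, file paths, and the use of a diffusers inpainting pipeline are illustrative assumptions, not a description of any specific production system.

```python
# Sketch: generation recast as deterministic retrieval + inpainting.
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Deterministic retrieval: a plain lookup, no AI involved (paths hypothetical).
MODELS_BY_COUNTRY = {
    "PT": "assets/models/portugal_01.png",  # licensed Portuguese influencer photo
    "US": "assets/models/us_01.png",
}

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting"
)

def localize_ad(country: str, prompt: str, mask_path: str) -> Image.Image:
    base = Image.open(MODELS_BY_COUNTRY[country]).convert("RGB")
    mask = Image.open(mask_path).convert("RGB")  # white = region to repaint
    # Inpaint only the masked region; the retrieved photo stays untouched.
    return pipe(prompt=prompt, image=base, mask_image=mask).images[0]
```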
Demetrios [00:45:51]: Yeah, okay. And it's really about thinking creatively about that, and thinking about it in these different ways. If you've seen it enough, and I'm sure you, as a person who has worked with companies trying to do this, or done it yourself, have seen patterns, then you are able to go and say, well, yeah, I mean, you can do it like that, but the way that I would do it is like this. And so there still are almost these design patterns that sound like they're kind of stuck in your head, and you need to get them into a blog post or something, so I can go and scrape them and then have an AI generate a blog post that has my name on it, but it's really your blog post.
Tanmay Chopra [00:46:38]: Yeah. I think ML intuition is very tribal. This is actually a big problem that our industry has, right? Most of the people that have been building these systems for a while, we learned by failure. So I failed 50, 60, 100 times at building ML systems before I was like, oh, I'm starting to get a hang of this. I think enterprises at large need to be a little bit more patient with their ML teams or their AI teams, because they're also developing this intuition. It's a matter of time.
Tanmay Chopra [00:47:09]: One of the things we're doing on our platform is we're trying to bake some of this intuition into the platform. So the things that I'm telling you, like hey, maybe we should design it this way, or maybe you should use this base model instead of this one, for this reason: we're pushing some of that onto the platform, and the goal is to push as much of that as we can onto the platform. So we don't actually think of Emissary as a fine tuning platform. We think of it as ML intuition delivered through infrastructure. How do we make these AI engineers who are struggling with this stuff, who are maybe one or two or ten fine tunes or prompt optimizations in, start getting access to this brain that might have done hundreds of thousands of fine tunes or optimizations? That brain is the Emissary engine. So we are working towards lowering that barrier, because that is a big barrier right now. But yeah, in the meanwhile, we just hop on calls and say, hey, maybe this is how you should try it.
Demetrios [00:48:12]: Until you can take those opinions and bake them into the product. And I really look at it as, yeah, the pattern recognition that you've seen over the years. And I appreciate the fact that you are calling out the obvious, which is: it's a system, and it's not just, okay, we've got this ML model or we've got this AI model. We need to really look at the system. It goes back to that old D. Sculley paper from back in the day that everyone used to quote, where it was a lot of different boxes in the infrastructure and only one of these small boxes was the actual model. The rest were the different pieces of it.
Demetrios [00:49:00]: And granted, it's been updated since then, because you don't necessarily have the data pipelines anymore and you don't have that training pipeline. But you need to think about the AI product, and the AI system that is around it, as opposed to just, all right, cool, I'm hitting this API and then serving it up. And that does feel like what a lot of people are learning these days.
Tanmay Chopra [00:49:27]: 100%. And actually they're learning it. The one thing I'm always very excited about is they're learning it a lot faster than we did.
Demetrios [00:49:34]: Right.
Tanmay Chopra [00:49:35]: The ML industry, we took decades to learn some things that the AI industry is learning in months. And so it's very exciting to see the pace at which things are developing. Even evals: we took so long as an industry to realize how important they were, and I think it's taken AI maybe a year and change from the first systems, at least in demo, to get to a point where everyone's like, okay, let's stop and think about what a good outcome is here, and then we can optimize towards that. So I think that's a big step in the right direction.
Demetrios [00:50:12]: Yeah. And realizing that certain evals mean nothing, like the leaderboards, and you need to have your own evals that you keep close to your chest. Otherwise, in a few model iterations, they're also going to mean nothing.
Tanmay Chopra [00:50:26]: Yeah. You've got to keep your evals, sort of make them more robust over time, and your system has to be learning from those evals. It means nothing if you're like, my model is doing X, or is at quality X, if that's not feeding back into the model to make it X plus one.
Demetrios [00:50:47]: Right.
Tanmay Chopra [00:50:47]: Like, at that point, maybe don't do evals, because you're just wasting time. You have to close the loop from generation to evaluation to improvement. Without that last piece of the loop, your system is static, right? And that's the worst thing you can have in a perishable universe: the baseline is going down and you're not doing anything to keep improving it.
Demetrios [00:51:13]: Perishable goods, man, I love that idea: the vision that as soon as you put something out, it's already stale. And we knew that. It's so funny how, speaking to you, you're really bringing some of this stuff back. As soon as we would put out an ML model, we knew it was stale, and you would have to go and figure out the features, or you would have to retrain it as much as possible, and you would be looking at the model drift. And when it would hit a certain level of drift, or if there was some kind of big thing that would throw it out of whack, you would try and trigger a retraining and make that happen as fast as possible. And then you kind of have that champion/challenger as you deploy one new model, to see: is it working as well as this other one? Let's see.
Demetrios [00:52:05]: And so now we're taking that same idea, but we're saying, all right, well, what about these prompts? Can we have champion challenger prompts that will kind of work? Are they going to...
Tanmay Chopra [00:52:13]: Exactly. And I think once you start getting good evals out, right, it's very easy to start doing those experiments. So step two here, after that step one of what good and bad looks like, is: how can we start experimenting?
Demetrios [00:52:27]: Right?
Tanmay Chopra [00:52:27]: For us, it was so normal to launch a new model, do an offline eval, which was on your existing regression set, and then an online eval on, like, 5% of traffic, right? These were standard mechanics that we ran for every model that we trained, because you knew that a lot of the ground truth was limited in offline eval. So you had to see what this new challenger prompt looks like online. And you just have to get comfortable with this idea that machine learning is iterative, that you're almost always going to be experimenting. And I think that scares folks a little bit, because with software, you sort of, you know, put your head down and then release, and if you have to patch, it's a bug, it's a bad thing, right? In ML, you're always patching; that is all you do. You're just sticking patch on top of patch, and it's a feature, not a bug. So I think that mindset is something that's undergoing a transformation there, and I'm very excited about it.
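A minimal sketch of the online half of those mechanics: a deterministic, hash-based split that sends roughly 5% of traffic to the challenger, so each user sees a stable variant. All names are hypothetical.

```python
# Sketch: champion/challenger routing on ~5% of traffic.
import hashlib

CHALLENGER_SHARE = 0.05

def pick_variant(user_id: str) -> str:
    # Stable per-user bucket in [0, 100); no randomness between requests.
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "challenger" if bucket < CHALLENGER_SHARE * 100 else "champion"

# Log (variant, outcome) pairs downstream to compare the two online.
print(pick_variant("user-42"))
```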
Demetrios [00:53:28]: So basically, what you've seen as the graduation from just hitting an API, calling some kind of OpenAI API, Anthropic API, whatever, from any language that folks want, and embedding the prompt in that API call: that's great until you get to a certain maturity level. And then you have to do what?
Tanmay Chopra [00:53:53]: Then you have to start thinking about what other steps you might want to do locally, right? So you might want to have an AI server that's sitting right next to you; you might want to embed locally. This is one of the simplest ones: you probably don't need to call an API endpoint to embed your query when the user sends it. You can just host it yourself, because the embedding models are super small. So you can host your own embedding model on the same server, and now you've cut down the network latency by about a second.
Demetrios [00:54:20]: Right.
Tanmay Chopra [00:54:20]: So if you're running this a million times a month, you're reducing quite a bit of time for your users. And to do all of this, though, is actually pretty hard in any language other than Python. This is kind of a big challenge.
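For reference, a minimal sketch of that local embedding step; sentence-transformers and the model choice here are illustrative, one of several reasonable options.

```python
# Sketch: embed queries in-process instead of calling a remote API.
from sentence_transformers import SentenceTransformer

# Small enough to co-host on the same box as the application server.
embedder = SentenceTransformer("all-MiniLM-L6-v2")

def embed_query(query: str):
    # No network round trip: the model runs locally.
    return embedder.encode(query, normalize_embeddings=True)

print(embed_query("best coffee in lisbon").shape)  # (384,)
```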
Demetrios [00:54:38]: Why is that?
Tanmay Chopra [00:54:39]: Right. The ML community thrived in Python, and so all of the support that exists is in Python. That's not to say you can't load a model in another language.
Demetrios [00:54:48]: Right.
Tanmay Chopra [00:54:49]: That would be superfluous for me to say. It's that the amount of time it would take you to do that, and do that right, is usually just not going to be worth it. We have a decade's worth of, you know, ML support in Python, whether that's your PyTorch, whether that's TensorFlow, whether that's now Transformers. There are so many layers of abstraction that make your life a lot easier. And so we're seeing this sort of universe, and it could be a good universe or a challenging one.
Tanmay Chopra [00:55:19]: We'll know in a few months. Folks are building AI systems into their existing code bases; that is the fastest way to go. But that usually means that once you hit the ceiling of what you can do with foundational models, you have to rip everything out, create a separate Pythonic universe, and then start loading everything in there, starting...
Demetrios [00:55:43]: Over from scratch if you aren't doing it in Python.
Tanmay Chopra [00:55:46]: Exactly. And now, we could get lucky, and there could be support for ML in other languages by the time you need to do that. But this is why the gentle recommendation I tend to give is: I know it's going to be a big lift to now start working in a different language, but if you look at this AI universe very seriously, then you should probably start thinking about an AI backend, right? What does it mean for you to have a backend service dedicated to your AI systems? Then you can hand off eval to that system; you can hand off API calling, model management, gateway management. The separation of concerns is so clean when you start doing that. But there is an upfront investment of, hey, are we really going to take on a whole new language? So I think there is some back and forth there to be had.
Demetrios [00:56:38]: Yeah, because that was one of the key things: people aren't using Python, or not as much as you would suspect, especially if they're not used to living and breathing in the AI/ML world.
Tanmay Chopra [00:56:53]: 100%. I think people are using Python a lot less than folks would believe, because there is no immediate reason.
Demetrios [00:57:00]: Right.
Tanmay Chopra [00:57:00]: When you start hitting an API, you can hit an API in any language. So maybe you're not using, you know, OpenAI's and Anthropic's SDKs; they're very limited, I think there's three or four languages. But you can just call an API endpoint.
Demetrios [00:57:14]: Especially if you're used to coding in JavaScript. Why would you switch over to Python?
Tanmay Chopra [00:57:19]: Exactly.
Demetrios [00:57:20]: Until you hit that limit. And that's what you're saying: you try and tell folks in the beginning, when they're starting off their journey, and then they say, yeah, yeah, yeah, whatever. You don't know what you're talking about, kid.
Tanmay Chopra [00:57:36]: I think there's an interesting enterprise decision there, right? There is a big lift to move to Python. So maybe it is optimal to MVP in your existing language, as long as you know that if the MVP works, you're going to have to move, right? So maybe you say, hey, we don't want to move to another language before we see some AI value. But if you're convinced that your business is going to run on AI, or be AI native in the near future, then I always recommend go all in. If you're not convinced, then yeah, it might actually be worthwhile to stay. But think through that journey, make sure that you're cognizant of the decision you're taking; that's sort of my focus there, versus pushing anyone in one direction over the other.
Demetrios [00:58:28]: Well, and what was it you were saying about, okay, then you need to eventually bring your model in house? And I was saying, well, it's probably through the Microsoft OpenAI API or the Anthropic AWS version.
Tanmay Chopra [00:58:44]: But what was it?
Demetrios [00:58:46]: The model isn't necessarily the hard part here.
Tanmay Chopra [00:58:48]: Yeah, it's everything around it. So that's why we were talking about the querying stuff, right? Like, how do you generate a query embedding? Or if you're trying to rewrite your query, so query reformulation: there's no reason for you to call a foundational model to do query reformulation. You could just do it locally with a tiny model. So you don't need to use Anthropic through AWS, or Azure OpenAI. You just host that next to you, and it's like a dollar an hour if you're using an A10. And it's faster, it's cheaper; there's no reason that you wouldn't do that.
Tanmay Chopra [00:59:24]: But if you're not living in a Pythonic universe, you basically cannot do that, right?
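As a reference point, here is roughly what that looks like when you are in the Pythonic universe: a minimal sketch of local query reformulation with a small seq2seq model. The model and prompt are illustrative choices, not a recommendation from the episode.

```python
# Sketch: query reformulation on a tiny local model, no foundation-model API.
from transformers import pipeline

rewriter = pipeline("text2text-generation", model="google/flan-t5-small")

def reformulate(query: str) -> str:
    prompt = f"Rewrite this search query to be clearer: {query}"
    return rewriter(prompt, max_new_tokens=32)[0]["generated_text"]

print(reformulate("coffee shops lisbon open late cheap"))
```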
Demetrios [00:59:29]: Because you can't do that in Ruby or you can't do that in JavaScript.
Tanmay Chopra [00:59:32]: You can do that in any language. So you could do this in Ruby, you could do this in TypeScript. It just would be really hard, right? You'd have to rewrite a lot of libraries from scratch. That's kind of the trade-off there: if you're working in, you know, Java, for example, you basically will need to rewrite a lot of libraries. I actually went through this personally, where I was doing this for Go, right? And it became such a big blocker that eventually I was like, you know what, we'll just call the API endpoint. I really don't think I should rewrite Transformers in Go. And it's an interesting challenge to wrangle with.