LLMs in Production - How to Keep Them from Breaking // Vaibhav Gupta // AI in Production 2025
SPEAKER

Vaibhav is a software engineer with over 9 years of experience productizing research. At Microsoft, he worked on real-time 3D reconstruction for HoloLens. At Google, he led performance optimization on ARCore and Face ID. Now he's applying that same experience to bring better quality and speed to generative AI technology.
SUMMARY
Deploying Large Language Models (LLMs) in production brings a host of challenges well beyond prompt engineering. Once they’re live, the smallest oversight—like a malformed API call or unexpected user input—can cause failures you never saw coming. In this talk, Vaibhav Gupta will share proven strategies and practical tooling to keep LLMs robust in real-world environments. You’ll learn about structured prompting, dynamic routing with fallback handlers, and data-driven guardrails—all aimed at catching errors before they break your application. You’ll also hear why naïve use of JSON can reduce a model’s accuracy, and discover when it’s wise to push back on standard serialization in favor of more flexible output formats. Whether you’re processing 100+ page bank statements, analyzing user queries, or summarizing critical healthcare data, you’ll not only understand how to keep LLMs from “breaking,” but also how to design AI-driven solutions that scale gracefully alongside evolving user needs.
TRANSCRIPT
Ben Epstein [00:00:00]: We have here Vaibhav Gupta with BAML, and actually BAML's come up already in a couple of the talks. People are already using it for their systems, which is great. I think Vaibhav's take right now is, whether or not they're conscious, they can do a lot of really useful things. And let's see how we can make those things really, really reliable. Hi.
Vaibhav Gupta [00:00:26]: Hi.
Ben Epstein [00:00:26]: Welcome. Thanks for coming on.
Vaibhav Gupta [00:00:28]: Thanks for having me, Ben. It's always fun to be here.
Ben Epstein [00:00:33]: Yeah, I imagine. I know what you're going to talk about, but I won't ruin the suspense. Why don't you give like a quick background and then we'll jump into the talk.
Vaibhav Gupta [00:00:41]: Perfect. So I'm Vaibhav. I'm one of the co-founders over at Boundary, and we did something incredibly dumb. We made a new.
Ben Epstein [00:00:53]: Oh no, we lost Vaibhav.
Vaibhav Gupta [00:00:58]: Oh.
Ben Epstein [00:01:00]: Did everyone else lose him too? We're having some technical difficulties at one moment. Okay, let's see.
Vaibhav Gupta [00:01:31]: All right. Sorry about that. For some reason, that was like the.
Ben Epstein [00:01:35]: Perfect time to cut off. You said we started, we did something really stupid and then you dropped.
Vaibhav Gupta [00:01:40]: Oh, well, I'll start off on that. What we did is we made a programming language, and really what we tried to do was address this problem of how we actually ship agents. These LLMs have now been out in the hands of regular developers for over a year and a half, and yet I think most of the apps we've seen have been underwhelming compared to the expectations set by that sense of magic we felt when we first played around with ChatGPT, at least for myself. We have some thoughts as to why, and I'd love to share them today and then talk about some approaches that we've seen be really successful at actually taking these things from demos to real systems that are reliable and usable in many different ways. And that's the conversation I want to have today.
Ben Epstein [00:02:28]: That's awesome. If you go ahead and reshare your screen, I'll let you jump into code, talk, prez, whatever you have. While you're doing that, I will definitely call out that, like I said before, I'm an avid user of BAML; it has changed the way I write code. And so, yes, whenever you're ready.
Vaibhav Gupta [00:02:45]: To get everyone in context: I remember when I first wrote web systems, and if a web system failed 5% of the time, my company would stop every single person's actions and go fix that right away. And yet somehow, if we change this one word, our perspective on this whole thing changes and we're all suddenly just satisfied with these results. And the take that we have is that this is not okay. It says this is fine, but this is not fine. We really shouldn't be designing systems that are okay with 5% failures. And writing code for these kinds of systems is very, very hard. And the reason it ends up being very hard is because of this concept of fault tolerance.
Vaibhav Gupta [00:03:29]: Before, very few parts of our system had to be fault tolerant. AWS has to be fault tolerant. S3 has to be fault tolerant. But my app, I just assumed AWS is going to be up. I never built around it not being up. And that's really what needs to change in the world of AI in order to build AI applications. We need to build this concept of fault tolerance in not just this core infrastructure layer, but in every way we write code. Every part of the system can now suddenly fail.
Vaibhav Gupta [00:03:58]: And that's just a really hard thing to go do. It's similar to how we used to write websites when we first wrote websites. I'm assuming some of you have probably written websites that look like this. It's all strings. And now none of us write websites like this. We all write websites using React. And React does a few magical things for us that we don't think about. If I forget this closing slash, it's able to put squigglies under here and prevent me from merging my code in.
Vaibhav Gupta [00:04:29]: If I forget my slash here in the old string-based world, all that happens is I merge my code, it breaks the entire system, and my website breaks. React is able to use this magical system of useState so that if I call this function, React somehow knows it can re-render only the button object and nothing else. It's able to optimize that for me. So the old system really had a couple of problems. One, we had a bunch of strings, and strings, as we discussed, are totally useless. We had state management that was really hard to do; we had to use $ and jQuery somehow to do all the hard things. Dynamic components: I remember back in 2014 or 2015, when I was interviewing for software engineering roles for the first time, an infinitely scrolling newsfeed was a hard interview problem.
Vaibhav Gupta [00:05:18]: And now anyone can build it with React, and that makes it a lot easier. Reusable components never existed, and that's why websites looked like GeoCities. And I think the part most people underestimate about what React really did for us is that it changed the iteration loop. If I went back to my website and had to change a string, my back end had to reload, I had to re-navigate to the page that actually had that error, and I had to hope that the state was preserved in the exact same way. With React, you just press Ctrl+S on your TypeScript React TSX file and you get that specific component automatically re-rendering. It went from a task of minutes to seconds.
Vaibhav Gupta [00:06:01]: And really how I'd classify this whole system is: before, we used to have low engineering rigor. That's why front end wasn't really called front-end engineering; it was front-end work. HTML wasn't a real language. But React made front-end engineering actually engineering. It brought the engineering concepts that we needed to build reliable, complex websites. And that's exactly why Facebook made React in the first place. But really what it did is it allowed developers to not have to think about certain things. You never have broken HTML tags ever again. You don't think about state.
Vaibhav Gupta [00:06:36]: The complexity React takes on to do this, the whole useState concept, is something that most developers never think about. And that's the beauty of what a framework or a compiler can really do for you. Now, when it comes to agents, why does this matter? I'm sure many of you have probably written agents that look like this. This code is freaking ugly. That's my take, at least. It's just unreadable, it's unmaintainable. And when I think about a system, this looks like code that I would throw away the minute I'm done with it.
Vaibhav Gupta [00:07:14]: Really, it has the exact same problems that we had before. The reason it's throwaway code is that it's a bunch of strings. Managing context in any dynamic system is too hard with this approach. Everything has downstream impact in a way that's very hard to track. If my data model suddenly changes to make location a city-state pair, I have to somehow remember to change this magical string and change my data model at the same time. And then switching away from OpenAI becomes hard, because I have to build an abstraction around that. And again, most importantly, the iteration loop takes way too long. It's all low engineering rigor again.
Vaibhav Gupta [00:07:56]: What we really want is to be able to use the beauty of these models. The beauty of these models is that they are expressive; I can put anything into them. But we want to compose them in complex ways, and that requires the structure of code. How do we balance these two? How do we live on the seesaw? We've probably all seen some frameworks out here. Everyone seems to release a new framework every day; OpenAI came out with one again. There's this famous blog post by Hamel Husain, which is titled "F U,
Vaibhav Gupta [00:08:23]: Show Me the Prompt." You can Google it. But I think the whole point of that post is you cannot use an LLM if you don't know the prompt you're feeding it. That's similar to saying, how could React really have worked if I couldn't actually inspect the HTML it generated in the browser? I need to see the HTML because eventually, if things go wrong, I need the debuggability. The prompt is the most useful part of the system, and that is the real magic we have to use to make our applications do what we want them to do. Let me show you a few things that I think are useful on how we think about this and what happens when you build a system that can do this. So here I have this visa form. I'm going to do something kind of magical here, which is I'm going to have this visa form define the schema in BAML.
Vaibhav Gupta [00:09:12]: So, our made-up programming language, on the fly. It's going to go ahead and generate some BAML code that describes the schema. Then I'm going to run that code, and you'll notice that I'm able to get results out of this pretty reliably. In fact, it does a few things: it actually makes a data model with the right shapes for me, and it's able to turn these checkboxes into booleans. I can actually do this on real data too. If I take a webcam and just run it on the fly on some handwritten text here, I'll just run it and see what it says. The reason I can run this on the fly so reliably is because what ends up happening is it's actually going to go ahead and just, like, take the code out from here.
Vaibhav Gupta [00:10:04]: That's actually the text on here. The reason I can go do this is because LLMs, in our opinion, are really, really good. Even GPT-3.5 is really good. The hard part is how you make it good for today's systems. Similarly, here's a recipe website that I put together. We've all seen ChatGPT generate websites, but this doesn't feel like ChatGPT, even while it's loading. I can even play around with it. And building these dynamic UIs, I think, is what's going to make these LLMs a lot more powerful.
Vaibhav Gupta [00:10:36]: Using them as a primitive rather than as the end-all-be-all. Now I'm able to not only do that, but I'm able to show a loading spinner as it's going, with exactly what it's working on. And all this is coming from one prompt. If you build a Q&A system with RAG, instead of just doing simple Q&A, the user now sees something where it says, I found one citation, I found two citations, and it'll just keep going, I found three citations, and eventually it'll start answering. And when it answers, I can even build a deep link into exactly where it's supposed to go, and I can deep-link into it. Now, how does this all work, and why did we go down this route? The way this ends up working is we build a system called BAML where you define all your prompts as functions.
Vaibhav Gupta [00:11:25]: Functions take in parameters like every other piece of code, and they return data models. Now, unlike every language where functions are defined with if statements and for loops, functions in BAML are executed using an LLM. So in order to do that, you tell us what LLM you want to use, and you can use any LLM you want, and then you write your prompt. Now, this is where BAML starts to diverge from a lot of other tools. The first thing that we do is we actually give you a live preview of your prompt. You're able to see exactly what your prompt is, including test cases, in real time. I change something in my prompt, it shows up right there as I'm typing it out. I change something in my test case, it shows up right there.
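For readers following along without the screen share, here is a rough sketch of the shape of a BAML function as described above; the class, function, field names, model, and prompt below are illustrative sketches of the pattern, not the exact code shown in the demo.

    // Illustrative sketch only, not the exact demo code.
    class Resume {
      name string
      email string
      experience string[]
    }

    // A BAML function: typed inputs, a typed return, and an LLM as the implementation.
    function ExtractResume(resume_text: string) -> Resume {
      // Any model can be swapped in here.
      client "openai/gpt-4o-mini"
      prompt #"
        Extract the resume below into the requested schema.

        {{ resume_text }}

        {{ ctx.output_format }}
      "#
    }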
Vaibhav Gupta [00:12:06]: And we're even able to highlight the test case for you, to differentiate it from the rest of your content. That's the baseline for the prompt. Now, when you build tooling like this, you can start seeing token visualizations and see exactly what tokens you're feeding into the model. Why does this matter? Because check this token out right here. Adding a few more dashes fundamentally makes this two tokens instead of one. I may actually prefer this to be one token. I may prefer this to be this, which would change the tokenization yet again. And just having that context is useful as a developer, to know what I'm doing.
Vaibhav Gupta [00:12:39]: And then, I think most importantly, is the fact that you can actually see the raw web requests that the framework makes. As we said, the primitive is that we're calling an LLM. So if we call a different LLM, in this case Anthropic, you should be able to see exactly what the model is doing. If you call Gemini Pro, you should be able to see that it changes to the Google model. If you decide to do something more complex, like call a sequence of models, you should be able to see exactly what happens no matter what part of the sequence you're in, whether you're on Haiku or GPT-4o mini, because everything is described in your code in a way that allows you to go edit and see exactly what's going on. It's all programmatic in a way that tooling is designed to support. But I think the part that ends up really helping the most is that you can actually run test cases. I wrote my test case, and I'm actually able to do chain-of-thought and reasoning in the test itself, because my prompt says: before answering, list three incredible achievements of this person.
Vaibhav Gupta [00:13:39]: It will go ahead and outline the data, and BAML does this magic of pulling out the data right over here. The model responded with this, and BAML will pull out all the data models as described by your code: a resume has a name, it has an email, it has an experience array. And as we go down this, you'll notice that this actually happens while streaming. So while streaming, it will eventually start populating each field as it comes in. You don't have to think about it. Now, there's something subtle here that some of you may have skipped over.
Vaibhav Gupta [00:14:10]: The content being outputted here is not valid JSON. There are no quotation marks around our name. We're able to do this transformation in real time without using any LLMs, and we can do it for all sorts of data. This has a few unintended consequences, but the main ones are that things just get cheaper and faster, because now you don't need tokens for each of those quotation marks. Now, one thing I think a lot of people deal with in LLM pipelines is hallucinations. What if I don't want Gmail addresses for any email? There are two ways I could approach this. I could try adding a description that says no Gmail emails. And maybe this will work.
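As a rough sketch of the test-case and parsing behavior being described, reusing the ExtractResume function sketched earlier; the test name, input, and the example model output in the comments are invented for illustration.

    // Illustrative test case for the ExtractResume sketch above.
    test SampleResume {
      functions [ExtractResume]
      args {
        resume_text "Vaibhav Gupta, software engineer. Worked on HoloLens at Microsoft and ARCore at Google."
      }
    }

    // The model is free to reply with reasoning followed by loosely formatted
    // fields, for example:
    //   Three achievements: ...
    //   name: Vaibhav Gupta
    //   experience: [HoloLens, ARCore]
    // The parser coerces that text into a Resume without another LLM call,
    // even though it is not valid JSON (no quotes around the name, and so on).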
Vaibhav Gupta [00:14:50]: I don't know. We'll try. But now it just hallucinated this email for me, for some reason; that's not what I wanted. How do I prevent that? Instead, I can use a different type: a string which is guaranteed to not contain a Gmail email. What does this mean? The problem with LLM demos is there's sometimes lag. Right over here, it dumped out a [email protected] email.
Vaibhav Gupta [00:15:15]: But you notice that we pulled it out, because this string is not allowed to have a gmail.com email, so we gave you a null value instead. It allows you to tie the beauty of an LLM to the beauty of algorithms in a way that feels really ergonomic and easy to go write. But at some point you'll want to take this code and actually use it in your Python code. How would you go about doing that? You just do this: from baml_client import b, and we'll just write this out from scratch, b.ExtractResume. You pass in a string, and you notice that it takes in a string object, and resume.email and resume.name are going to be strings, as we'd expect. Now, if we decide to stream, we just write b.stream.ExtractResume.
Vaibhav Gupta [00:16:06]: And now, during streaming, name is not guaranteed to be a string, because the name may not be present yet. When we're done, we await stream.get_final_response(), and now name is guaranteed to be completely ready. But on the other hand, what if I change name? What if I change name from being a string to being a Name class with first and last? Well, my Python code will reflect that immediately, and during streaming it will reflect that immediately too. Similarly, over here, now I can do all sorts of expressive things. I can say things like: even during streaming, this has to be completed, not null. It's not allowed to be null while I'm streaming.
Vaibhav Gupta [00:16:54]: And now even my type system will actually guarantee that I will never get a None for name while streaming. It's this binding of LLM logic with Python code, with the static type checker, that allows us to be that flexible. But not only can you do this in Python, you can actually do this in any programming language of your choice. So in TypeScript, resume.name is going to be a Name object; I change this back to string, and it updates. The same BAML code can feed Python, TypeScript, Ruby, Java, whatever you want. We support almost every programming language as of today already. And this system, when you have systems doing the work instead of people, ends up leading to more reliable software. The whole point of BAML is to get rid of prompting and add more engineering.
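A hedged sketch of the constraint and streaming annotations described above; the field names are illustrative, and the attribute names (@assert, @stream.done, @stream.not_null) are a best-effort recollection of BAML's syntax that may differ by version.

    // Illustrative only; attribute names may vary by BAML version.
    class Name {
      first string
      last string
    }

    // A variation of the earlier Resume sketch.
    class Resume {
      // Only surfaced to streaming callers once it is fully present and non-null.
      name Name @stream.done @stream.not_null
      // Constrain the type itself instead of pleading in the prompt.
      email string? @assert(no_gmail, {{ "gmail.com" not in this }})
      experience string[]
    }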
Vaibhav Gupta [00:17:47]: And I think that allows us to build reliable and maintainable pipelines. Because that's the thing I think isn't talked about enough today. Anyone can go build an app on v0 and put a demo together. How do you build an app that doesn't have to be thrown away and is still going to work six months from now, where people can add to it or AI models can add to it? It needs to be good code; clean code has a lot of value. That's really it about BAML. I think I'll answer a few questions as they pop up, and Ben, if you have anything else you want to hop on and chat about, I'd love to just chat a little bit.
Ben Epstein [00:18:21]: Yeah. Yeah, that was awesome. Thank you. I've listened to a variant of this talk so many times, and, I mean, you keep adding new stuff, but I love it every single time.
Vaibhav Gupta [00:18:31]: The.
Ben Epstein [00:18:32]: There's a couple of things. The first one, the point on type safety: the fact that the output types are guaranteed is amazing, and it changes the way that you write code with LLMs, but it's also only a piece of it. Like, I use BAML in production. One of the things that I was doing recently, you and I have talked about this a bunch, is partially filling JSON types. Like, I have a model, like a Pydantic class, and I want to fill it, but it gets filled over time. And so I know that you guys are working on that, but I have also been working on it myself in some ways. And so my function takes in, like, a JSON object with its attributes, and I have a bunch of Jinja templating conditionals in it, and I made a change to it and pushed it up, and my CI test failed.
Ben Epstein [00:19:18]: And it failed because the BAML function, which then generates the Python function, the type changed, and I called it incorrectly somewhere else in my code. And I was like, I mean, I should have caught that. It was bad on me. But also, that was crazy. It's crazy that my LLM prompts take type-safe Pydantic classes that my production code has to abide by. Like, that's a crazy paradigm shift.
Vaibhav Gupta [00:19:47]: I think for me, that really was the first shift in web development that made me respect front-end engineering a lot. Because when I was doing, like, front-end code, honestly, I was just like, front end is a joke. It's not real engineering. But then when I finally learned that React and Next.js exist and you can build things that are much more complex with those pieces of software, I was like, holy cow, this is a real art and a real engineering beast to go build out. But when I first started coding, in 2012 to 2015, I just didn't think it was a real part of engineering. It was like a thing I had to do
Vaibhav Gupta [00:20:25]: to get the final thing out there. But the meat was in the back end. And now with LLMs, I think a lot of people view LLMs like this. We live in this dichotomy where some of us want LLMs to do everything, and we hope that that will happen, and we're waiting for Sam Altman to release GPT 26 or something, and maybe that'll do the magic. But in reality, I think we can do almost everything we want today already. I think it's about just composing systems and making a way to have composable systems. Like you said with type safety: that's the way software has been written.
Vaibhav Gupta [00:21:00]: That's why we don't use JavaScript, we use TypeScript. That bridge, I'm often surprised by how much it's overlooked.
Ben Epstein [00:21:08]: Me too. I mean, anytime I see someone post on, like, LinkedIn about the ways people work with agents, and they talk about agentic systems and these systems going all the way, I read so much and hear so much about evals, how important evals are and how to build evals around your systems. Every time I see one of those, I'm like, you're just not using BAML. If your fundamental primitive is the type of the thing that the LLM has to return, evals aren't a problem anymore. They're just unit tests. You're just comparing JSON.
Vaibhav Gupta [00:21:43]: It's just a test case that says, I want this to be true. And you don't use an LLM to do that validation, you just use an assert.
Ben Epstein [00:21:51]: Right. I have pytest tests that run, because I'm waiting for BAML's, like, CI test suite. So I have a pytest suite that runs right now, and it extracts and runs all these different things on all my different prompts. They take a minute to run because they're just comparing to a bunch of Python and regex outputs that I'm expecting. I could talk to you forever, but we have a couple of questions. Oh, go for it.
Vaibhav Gupta [00:22:14]: Yeah, I was gonna say there's like a couple questions that I'm gonna try and answer really fast.
Ben Epstein [00:22:18]: Are you in the. Okay, great. Go for it.
Vaibhav Gupta [00:22:20]: I can see the questions. One of them is, how do we get OCR to work really well? Well, it turns out LLMs are one of the best OCR tools in the world. Anyone that's using an OCR model today is almost definitely doing it wrong; I think you should just use LLMs. Now, the trick with LLMs is how you actually get them to work well for OCR. That turns out to be quite hard, because LLMs are just not going to produce data correctly all the time. For example, if you're processing bank statements, they don't always add dollar signs in front of every value. Now you have to build a system that can go ahead and detect whether this image has a dollar sign on every value or only on the very first value.
Vaibhav Gupta [00:23:02]: When you go do that, now you write two separate prompts: one optimized for dollar signs existing everywhere, one optimized for dollar signs only existing on the first value. And now you can build a slightly more reliable pipeline by doing that. Just breaking down the problem into really small composable pieces and then using an LLM to get each one working, I think, is the answer. And we have people processing 100-plus-page bank statements with LLMs with zero errors when they build these kinds of pipelines. I think there's a second question about why we opted for a DSL as opposed to language-native APIs. Well, one nice thing about BAML is it's kind of like a DSL in the way that JSON is a DSL. JSON is definitely a DSL, but it's compatible with every language, and there's a native plugin for every single language that makes it possible to not feel like a DSL. That's what we do with BAML as well. Yes, you write this tooling over here.
Vaibhav Gupta [00:24:00]: And writing the DSL allows us to build all sorts of tooling, like the playground and some new tooling that we'll announce hopefully in another two weeks, which is really getting me excited. But when you use it, it just feels like Python. These are just regular Python functions under the hood that call a bunch of Rust code. The entire system is built in Rust. And now what we can do is build connections to every single language that work exactly the same. So instead of a library that's implemented seven different times, once for each language, and breaks and has a whole bunch of dependencies, BAML comes with one dependency: BAML. That's it. Then you use it in Python and you use it to do whatever you want.
Vaibhav Gupta [00:24:39]: Use it in TypeScript, and you use it to do whatever you want. And it will work exactly the same in each of those languages, with the same guarantees. And this is actually why some larger companies have started using us, because we're one of the few people that actually support Java. We support Go. And it's kind of sad that everything that is built in AI has to be built in Python. I just don't like that world. Not that I hate Python; I just think every language has its merits, and why should we restrict AI to just Python?
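Going back to the bank-statement answer a moment ago, here is a hedged sketch of how that decomposition might look in BAML; every name here (the enum, classes, functions, and prompt wording) is invented for illustration, not taken from a real pipeline.

    // Illustrative decomposition: classify the document first, then route to a
    // prompt specialized for that formatting style.
    enum DollarSignStyle {
      EveryValue
      FirstValueOnly
    }

    class StatementRow {
      description string
      amount float
    }

    function ClassifyDollarSigns(page: image) -> DollarSignStyle {
      client "openai/gpt-4o-mini"
      prompt #"
        Does this bank statement put a dollar sign on every value,
        or only on the first value of each column?

        {{ page }}

        {{ ctx.output_format }}
      "#
    }

    function ExtractRowsAllDollarSigns(page: image) -> StatementRow[] {
      client "openai/gpt-4o-mini"
      prompt #"
        Every amount on this page is prefixed with a dollar sign.
        Extract each row.

        {{ page }}

        {{ ctx.output_format }}
      "#
    }

    // A second, near-identical function tuned for the first-value-only style
    // would sit alongside this one; application code calls ClassifyDollarSigns
    // and then picks which extraction function to run.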
Ben Epstein [00:25:10]: There's another piece there that I think is worth calling out for Joe. If you look at the threads in the forums on Pydantic AI, for example, which is a great tool, it's built really well, the creators are awesome, but if you ask them how they handle it when the model doesn't return perfect JSON, the response, from not just them but the community, is just retry it: hit it again with a different temperature and hope it responds better. And, I mean, being language agnostic is awesome, but to me that's the real reason. BAML guarantees, it actually guarantees, that you're going to get stuff back, which literally no other framework can do. It's the reason I use it in production. I have been using BAML in production for months, even way back when it was early, and I think I've had one failure since I started.
Vaibhav Gupta [00:26:02]: Yeah. And I think I'll give an idea as to why this ends up leading to better results. So we have this magical parser; we don't use an LLM. You can see here, clearly, no spaces, nothing sensible can be made out of this, and yet we produce the data here. And we don't use an LLM to do this; we just use good old-fashioned algorithms. But I have this prompt here.
Vaibhav Gupta [00:26:19]: This prompt is going to ask me to generate, like, where'd it go? This prompt is going to ask the LLM to generate an implementation of a binary search tree. I'm going to go run this test case, and when I run it, one, the model can do reasoning, so it can do something slightly better. But you'll notice that the way it outputs the code is actually in markdown format, while the way you get it out is as JSON. Why is this nice? Because I want you all to just imagine writing code like this. What is this, class Node? How do you syntax highlight this as JSON? I don't know. I don't know how to.
Vaibhav Gupta [00:27:08]: Oh, there we go. JSON. I could not write this code if my life depended on it. It's not possible, just like writing code on a whiteboard is really hard. Syntax highlighting helps us, formatting helps us, but when we do structured outputs, we're suddenly asking the model to generate code like this. The model also can't generate code like this. And I just don't believe JSON is the best serialization format for LLMs.
Vaibhav Gupta [00:27:31]: It's great for parsability, but it's not great for interpretability. So what we say is: why don't we just let the model do what it does best and dump out whatever the heck it wants? We run some good old-fashioned algorithms, take this data, and pull out everything that you want from it, so you never have to think about it. The model can just write things naturally. In this case, it actually added a bunch of newlines and stuff as well. We just add that into your output so you don't have to think about it; we just parse it. It doesn't have to be JSON compliant, it just has to be semi-readable.
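A rough sketch of the kind of function being demoed here, one that lets the model reason and write the code as ordinary markdown while the caller still receives a typed result; the class, function name, and prompt wording are illustrative.

    // Illustrative only. The model may answer with prose, reasoning, and a
    // fenced ```python block; the parser pulls the fields out of that text
    // rather than forcing the model to escape code inside a JSON string.
    class CodeAnswer {
      explanation string
      code string
    }

    function ImplementBinarySearchTree(request: string) -> CodeAnswer {
      client "openai/gpt-4o-mini"
      prompt #"
        Think through your approach first, then write the implementation.

        {{ request }}

        {{ ctx.output_format }}
      "#
    }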
Ben Epstein [00:28:04]: Yeah, it's awesome. In a couple of the classes and functions that I have, I actually ask the LLM to think, like thinking tags, because some of them are easy and some of them are harder, and having it think does a better job. And so it says thinking, and then it does tick, tick, tick, JSON, spits out some nonsense JSON. The thinking is thrown away, the backticks are thrown away, and I just get back my Pydantic base model. And that's crazy. That's, to me at least, obviously the way to be building LLM systems. And just to tie this back into every other talk that people have been giving: the fundamental primitive here, what BAML forces you into, just like all the other talks we've been hearing, is that you just have to define your problem first.
Ben Epstein [00:28:47]: Define your problem as a set of output types. That is, not always, but so frequently, how you can boil down your complex problems, even agentic workflows. BAML forces you to do that anyway. And then once you have it, you have a system that's just reliable and testable.
Vaibhav Gupta [00:29:03]: Yeah, exactly. And then I think one last thing to note is, if you do have existing prompts, just bring them in here. Because learning BAML, I get it, it's a new DSL, it sounds kind of intimidating. I don't think it's that hard to learn. But in case you do find it so, you just ask this chatbot how you do something in BAML, and it will, and this is obviously an agent built purely in BAML, actually generate a full system for you: it gives you all the BAML code and the Python code to run, to show you how it'll work. And it has the playground that we support. And what if you're in TypeScript? I need TypeScript.
Vaibhav Gupta [00:29:40]: You just tell it that, and it'll show you how to go use this code in TypeScript as well instead of Python. It's actually able to understand this for you and just help you navigate; it pulls from our docs and everything. So you can build fairly complex AI pipelines in BAML, but there are things we don't do, such as the agentic loop. And that's really because to build a really good pipeline, you need to own the agentic loop. And we can talk about that more later on.
Ben Epstein [00:30:10]: Yeah, he's actually coming on the MLOps Community show in a couple of weeks. Maybe he'll talk about that. That'd be really cool. I have to cut you off. Anybody who's interested, just join the Discord. I'm there, I'm chatting all the time, I'm asking questions.
Ben Epstein [00:30:26]: It's a fun group. People are really building awesome stuff. Bye.
Vaibhav Gupta [00:30:30]: Bye.
Ben Epstein [00:30:30]: Thanks so much for coming on. This was.
Vaibhav Gupta [00:30:31]: Thanks for having me. Always fun.
Ben Epstein [00:30:33]: See you later.
