MLOps Community

Getting AI Apps Past the Demo

Posted May 30, 2025
# Programming Language
# LLM
# BAML

Speakers

Vaibhav Gupta
CEO @ Boundary ML

Vaibhav is a software engineer with over 9 years of experience productizing research. At Microsoft, he worked on realtime 3D reconstruction for HoloLens. At Google, he led performance optimization on ARCore and Face ID. Now he's bringing that same experience to help bring better quality and speed to Generative AI technology.

Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.


SUMMARY

It's been two years, and we still seem to see AI disproportionately more in demos than production features. Why? And how can we apply engineering practices we've all learned in the past decades to our advantage here?


TRANSCRIPT

Vaibhav Gupta [00:00:00]: Like full expressive systems: if statements, for loops, while loops, where you can conditionally call an agent, abort to a human worker or human annotation system, and come back to it. That kind of system.

Demetrios [00:00:13]: Wow.

Vaibhav Gupta [00:00:14]: Yeah. Hey everyone, I'm Vaibhav. I'm one of the co-creators of BAML. I work for a company called Boundary as the founder, and we do not really drink coffee. Actually, no one on the team drinks coffee out here. I mostly drink water, and every now and then I'll have some orange juice in the morning.

Demetrios [00:00:37]: All right. Dude, what the fuck is BAML, and why do I keep hearing all about it?

Vaibhav Gupta [00:00:45]: What is BAML? Tangibly, it's a programming language. What it really is is a crazy idea my co-founder and I started out with about two years ago, about a year and a half ago now, that just started with the idea that we can't possibly write prompts using LangChain. That just feels wrong ethically.

Demetrios [00:01:10]: Because you were doing it a lot.

Vaibhav Gupta [00:01:12]: No, just because every single piece of code that I saw was ugly. We talk to everyone, and at least everyone I've talked to has always said that AI pipeline code feels like demo code. You throw it away. And with this vibe coding thing that's going on, everybody just vibe codes it and then ships it. And that works.

Vaibhav Gupta [00:01:35]: If you're starting from zero, and for front-end engineering where everything is strongly componentized, that works really well. But if you're building a backend that has like 15 intricacies with software you wrote five years ago, it's not that it's bad, it's just that I wouldn't let a junior engineer or new hire submit code without reading it carefully or talking about it with them. And people just don't have that natural tendency when they vibe code. I think if you add pull requests, add process, you get to that. But anyway, the point is, every single piece of prompt engineering code I've seen, even before vibe coding, felt like it was vibe coded. It looked like it was vibe coded.

Demetrios [00:02:16]: It was a phenomenon before it was a name.

Vaibhav Gupta [00:02:19]: Yeah. And I think ugly code is one of those things that not a lot of people put a lot of thought into. But there's a reason that every single company in the world has a CI/CD system which includes a linter, right? It doesn't even matter whether your linter's rules are right or wrong. All that matters is that you as a company have a process and an opinionated system for how you write code, because that helps you build pipelines that anyone who just joined from a different company can hop into and iterate on. And whether it's an AI agent writing that code or a human, you need that. And AI pipelines just don't seem to have that level of rigor. They just feel glommed together.

Vaibhav Gupta [00:03:03]: And that felt wrong. And that's kind of what inspired BAML.

Demetrios [00:03:07]: Okay, so you saw a bunch of LangChain examples. You thought, man, we can't LangChain-prompt our way to a better future. What, besides the linting, was it? The linting and ugly code, the vibe-coded-ness about it? What was it that made you think that?

Vaibhav Gupta [00:03:30]: Take web development, for example. When I first started writing websites, this was around 2012. React didn't really exist. TypeScript didn't really exist. The way you wrote a website was your backend returned a string, you'd put it in there, and that's what you shipped. And that was not bad. But there's a reason we all moved away from that to writing React. React does a lot of implicit stuff for us automatically that I thought was really helpful.

Vaibhav Gupta [00:04:01]: Like, one, if you forget a closing tag, say you open a <button> and forget the closing </button>, it yells at you; you can't even push code. But if you're returning it from your backend as a string, you can easily forget the closing tag, and now your whole website breaks. I remember when I was first applying for internships back then, an infinite-scrolling news feed was a hard interview problem, right? Because interactivity wasn't a thing that was built in. You had to write all this code and somehow merge jQuery, a bunch of dollar signs in your code, with your actual HTML, and it just looked bad. And then eventually React came around and said, what if we just pulled the syntax of HTML and CSS into your JavaScript and made it native? Now not only did it get prettier, because the button is syntax-highlighted in blue or whatever color your theme uses, differently from the actual content of the button, but it gave you command-click, it gave you compile-time checking, it gave you DOM re-rendering. The syntax that React exposed allowed it to do a lot of novel things. So we kind of have the same approach.

Vaibhav Gupta [00:05:23]: If you're going to write a prompt, you probably don't want a bunch of English all through your codebase that is not verifiable in any way. So what if we could verify the English? What if we could turn the English into something that looked a little bit more like code, but preserved the flexibility of English? So we do a lot of error correction for you under the hood. We do retries and other things as minimally as possible; they're effectively under your control. And most importantly, we can build tooling for you that lets you do things that are probably best seen and not heard. But if I were to describe it to someone that's just listening: imagine if you could see your prompt before you send it out to the model, like a markdown preview.

Vaibhav Gupta [00:06:11]: No one writes markdown files without actually looking at what they render to. No one ships a website without opening the browser and looking at it. Why is it that today, in every single framework that exists, the only way to see the prompt is to run your code and somehow monitor the web activity that's actually being sent out, because there are like 50 layers of abstraction under your prompt? So we just ripped that out and said, what if you could see your prompt live, in real time, in VS Code or Cursor while you coded? Small tooling changes like that have dramatically helped iteration loops.

Demetrios [00:06:46]: No, you said something there that I would love to understand better, which is verifiable English. What the hell does that mean?

Vaibhav Gupta [00:06:57]: Verifiable English? Yeah. What does that mean? That's a great question. Today, everyone treats LLMs like this magical box that somehow gets the answer right. And if it doesn't get the answer right, it's your fault. We like to take a step back and just think of the LLM as a really, really damn good calculator. So what is a calculator? A calculator takes two numbers and an operator, and based on the operator, it transforms those two numbers into a new number, right? You give it a plus sign, it adds them; you give it a multiply sign, it multiplies them. Pretty simple.

Vaibhav Gupta [00:07:31]: Now, you might use different calculators for different jobs. Like, you might use an abacus if you're in the 1600s, or the 1200s, and that's pretty efficient, huh?

Demetrios [00:07:42]: Or Montessori or.

Vaibhav Gupta [00:07:44]: Or a few other things, exactly. But you might use a scientific calculator today, or you might use MATLAB for something really numerically complex. They're all just calculators under the hood; different calculators just have different trade-offs. So what if we treat an LLM like a calculator? You give it some parameters, whatever your data ends up being, and you say, I want the LLM to guarantee it spits out something specific. So let's say I'm working with doctor-patient transcripts, and you want some audit control around whether the doctor confirmed the medicine they gave.

Vaibhav Gupta [00:08:21]: And all I have is an audio transcript. What that really looks like is: I want a calculator that can take in an audio file or the text of the transcript and spit back out a list of questions the doctor asked about medicine, which medicine it was, and whether or not the patient confirmed it. That's a calculator. Now, I can implement that calculator using a really powerful device, which is an LLM. And based on the LLM I use, I'll get different levels of precision. If I use a small model, I'll get good precision. If I use a big model, I'll get much better precision. But it's just a calculator.

Vaibhav Gupta [00:09:03]: And now the prompt I use is kind of the operator that goes into the calculator. It's like the plus sign. And if we view it that way, then the English in that prompt becomes completely verifiable, because all I have to verify is that the English I put in will produce the data model that I want out. Okay, I'm gonna show something, because I think it's going to be easier, and then we can talk through it for the people that are able to see it. This is a 15-page Notion doc that we made for a four-hour workshop we ran with a bunch of YC founders. I think when we talk about code, it's sometimes easier to show code than to talk about it orally. So I'll show a couple snippets of what I mean by verifiable English.

Demetrios [00:09:53]: Nice.

Vaibhav Gupta [00:09:54]: So in this case, I have a function called ClassifyMessage, and ClassifyMessage takes in a message, which is a string. It could come in from a review on Amazon, a tweet, or really anything, a text from my girlfriend, it doesn't matter. And all I want to know is its sentiment, as one of three things. So I want to build a calculator that, no matter what the message is, will return one of positive, neutral, or negative. Now, the calculator of my choice here is going to be OpenAI's GPT-4o, but I could use a Llama model, an Anthropic model, Gemini, it doesn't really matter. And then that operator I was talking about becomes this prompt. So now I'm able to verify the English in this prompt very easily by doing a couple of things.

Vaibhav Gupta [00:10:41]: Because this is in plain English, all I have to do is ask: is this English going to make this function do the right thing for a couple of different test cases? So if my girlfriend sends me a text that says, "I am incredibly upset with you," it should return negative. This calculator should calculate to negative. And when we view things like this, it becomes much simpler to compose them into complex systems and know exactly what contract I've built into the system. That leads to code that is usable by static analyzers, code where an AI can detect when things are going wrong, and, most important, a strong iteration loop. Because if, for whatever reason, that message comes back as neutral, I know there are two things that could be wrong: either I need a different calculator, maybe upgrade the model to, like, Gemini 2.5, or I need to go update my prompt and change the operator that this calculator is using.
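
For readers following along in text only: a minimal BAML sketch of the function being described might look like this. The names and prompt wording are illustrative, not the exact workshop code.

```baml
enum Sentiment {
  Positive
  Neutral
  Negative
}

// The "calculator": a message goes in, exactly one of three values comes out.
function ClassifyMessage(message: string) -> Sentiment {
  client "openai/gpt-4o"
  prompt #"
    Classify the sentiment of the following message.

    {{ ctx.output_format }}

    {{ _.role("user") }}
    {{ message }}
  "#
}

// The test case from the conversation: this input should calculate to Negative.
test UpsetMessage {
  functions [ClassifyMessage]
  args {
    message "I am incredibly upset with you"
  }
}
```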

Vaibhav Gupta [00:11:42]: Does that make sense?

Demetrios [00:11:44]: Yeah, yeah, I see that. And now, why can't you do that today with the tools that we have, in languages that are fairly common, like Python?

Vaibhav Gupta [00:11:57]: Yeah. Well, I think there's two parts to this. One, not every piece of software is Python. So if you do this in Python, that means my SQL database, which is written in C, can't support AI pipelines now, except by spinning up a Python microservice. That just seems wrong. All the code in the world that exists in Java will eventually want to use LLMs as well, because they're, again, really powerful calculators, and every computer system, every software application, will want to take advantage of that calculator.

Vaibhav Gupta [00:12:31]: So that's reason number one: we should be able to support all these languages. But then you get into really simple things, like, what would this code even look like in Python? One thing you'll notice in what I'm showing you over here is that this code has a string which is completely indented, and maybe it's easier to see how it actually looks in static analysis: you're able to highlight this in a very easy way, just like in React. So I'm looking at VS Code right now, showing the same code I had over there, written out with the syntax highlighting that we offer. And the first thing you'll notice is that if I were to do this in Python, my code would end up looking something more like this. And this is like a really, really small nit that ends up happening.

Demetrios [00:13:23]: Yeah, but it goes back to what you were saying earlier about the ugliness of the code, and almost this readability factor.

Vaibhav Gupta [00:13:33]: Exactly. So if I have a prompt now, every single prompt, which is a long string, is going to somehow be dedented all the way back to column zero. So if it's four layers deep in an if statement, suddenly your string is pushed all the way to the left of your code, and when you scroll, you won't know whether the if statement exists or not, simply because you have a long string. And most codebases aren't designed to have globally available long strings across them. And I think that's something people don't really think about: when I have a lot of constant strings, how have we typically used string variables in our codebase? We either load them from a database, or we have them as very short strings in some constants.py file that we then load in. But now my code, my business logic, is literally embedded into these strings, and it needs to be co-located in the area where it's used. And even worse, I build these strings in really dynamic ways.

Vaibhav Gupta [00:14:36]: Sometimes I add if statements to conditionally add statements into my string. Sometimes I add for loops to dynamically build it. Sometimes I have to change what string I use because the model is Gemini 50% of the time and OpenAI the other 50% of the time. That is not what strings are designed for, in any programming language. So how do you have a string that can be both extremely dynamic and extremely flexible? The only thing that gets close to this is React.
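
In BAML, that kind of dynamic construction lives inside the prompt template itself rather than in ad hoc string concatenation. A rough sketch; the function, fields, and wording here are invented for illustration:

```baml
class Order {
  id string
  items string[]
}

// Conditionals and loops live in the prompt template (Jinja syntax),
// so the "string" stays co-located with the function that uses it.
function SummarizeOrder(order: Order, include_policy: bool) -> string {
  client "openai/gpt-4o"
  prompt #"
    Summarize this order for a support agent.
    {% if include_policy %}
    Mention the refund policy if the customer sounds unhappy.
    {% endif %}

    Items:
    {% for item in order.items %}
    - {{ item }}
    {% endfor %}
  "#
}
```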

Demetrios [00:15:16]: Okay, well, it does feel like strings, in that regard, are for a different paradigm. You never would really use strings to control the computer, to control the outcome. But now, with the advent of LLMs, you are using strings as a control mechanism, so they are part of the code. And so, if I'm understanding your philosophy, you're saying that strings should be treated like first-class citizens, the way we treat other pieces of code.

Vaibhav Gupta [00:15:57]: Exactly. And they should be treated very cautiously; you almost need caution around your strings. If I'm Delta Airlines and I'm building a bunch of AI chatbots, not just one, how do I guarantee that an intern isn't going to come in and accidentally forget to add "you work for Delta Airlines" as the first part of every system message in every chatbot? That's not a thing we want to leave up to people. That's a thing we want a process to do. And there's no mechanism in any language today to statically analyze a string in that way.
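
One way BAML makes that kind of invariant mechanical rather than left to memory is a shared template string that every chatbot prompt references. A sketch with invented names, not Delta's actual setup:

```baml
// Single source of truth for the preamble every chatbot must include.
template_string CompanyPreamble() #"
  You work for Delta Airlines. Follow company policy at all times.
"#

function SupportChat(message: string) -> string {
  client "openai/gpt-4o"
  prompt #"
    {{ CompanyPreamble() }}

    {{ _.role("user") }}
    {{ message }}
  "#
}
```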

Demetrios [00:16:38]: And so you're saying, hey, this kind of process is added in natively with the programming language. It's something that only a programming language can do. Because I think that's the other question; it goes back to this whole idea of why we can't do this with the tools we have today and the languages that are already out there, right?

Vaibhav Gupta [00:16:59]: So we could do it in Python. What you could do is make a special class called PromptString, and now everywhere you use strings, you have to actually use a PromptString, not a string. But even if you do that, you still can't dedent strings properly. And you're relying on developers everywhere in your company to use PromptString instead of string. And let's remember, developers are really lazy. Well, I'm lazy, so maybe other people aren't, but I'm just lazy. Even if I knew, I wouldn't do it, because I don't want to type extra when I don't have to.

Vaibhav Gupta [00:17:32]: And.

Demetrios [00:17:33]: And then you're exposing yourself to risk. Yeah, I can see that.

Vaibhav Gupta [00:17:35]: Exactly. And I think the whole caveat of this whole system is that we have to decide where we want the burden to be. Do we want the burden to be on developers, or do we want it to be on a process? And it's fine to put the burden on developers sometimes, but tangibly, I think the best thing to do, if an action becomes frequent, is to put the burden on the process. And because these are such powerful calculators, it's really valuable to put it into the process rather than on a developer. And then the other big caveat, like I mentioned earlier, is that we want every single language to support these calculators, these LLMs. I haven't seen a good framework in Java yet, I haven't seen a good framework in Go yet, and OpenAI themselves barely support those languages.

Vaibhav Gupta [00:18:32]: That says a couple things. Either they strongly believe that Python and TypeScript are the only future that exists here and, you know what, screw all the other ones, or they believe that eventually someone will deal with it and they don't have to think about it. But I kind of have the belief that everyone should be able to take advantage of these systems. Kubernetes, for example, is built not in Python. Some of the most foundational systems that we have are not built in Python. And it feels like a pity to me if those systems are the last ones able to take advantage of AI pipelines, because they're the most useful.

Demetrios [00:19:09]: How does BAML then sync up with all these? How does it interplay with all the different languages?

Vaibhav Gupta [00:19:15]: So BAML's philosophy as a language is that we do some things as a language, but then we do code generation into every other language of your choice. So you write your code in BAML, and then we do codegen to give you a Python function. If you wrote a function called ClassifyMessage in BAML, we would generate an identical classify-message function in Python that actually calls the BAML code under the hood. And the BAML core is actually written in Rust, so it's extremely fast and it runs on every system. For those people that work in Python, it's like NumPy. NumPy is not written in Python, it's written in C, because to do really good numerical processing, you can't let Python do it.

Vaibhav Gupta [00:20:00]: You write it in C, make it way faster, and expose it to Python via a Python interface. That's what we do. We take the BAML code and expose it to you via a type-safe, autocompletable interface in every language of your choice. So it feels like you wrote it in Python; it just happens that you did it in BAML.

Demetrios [00:20:22]: So did I understand that correctly? Basically, you're writing BAML code, and it just translates it into your language of choice on the other side.

Vaibhav Gupta [00:20:38]: Exactly. That's literally how it works. You open up a BAML file.

Demetrios [00:20:43]: Yeah.

Vaibhav Gupta [00:20:44]: And then you press Command-S in VS Code or Cursor, and we run a CLI command called baml-cli generate, which turns the BAML files into .py files or .ts files or Go files or Java files or whatever the heck you want. And then you just use it natively. There's no internet dependency; it runs locally on your machine. And we're able to pack in a bunch of algorithms for you automatically. And because every language is taking advantage of the same exact core under the hood, we don't actually have the same kinds of bugs that a lot of other frameworks have. Most other frameworks implement once in Python, then they're like, oh my God, I guess people use TypeScript, so they re-implement everything in TypeScript, then they kind of forget about it and keep adding features to Python, and people are like, when is this coming to TypeScript? But because BAML is the same everywhere, we actually support every language with every feature, always, by default.
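
For the concrete mechanics: the target language is configured with a generator block in the BAML project. A sketch; the generator name, paths, and version pin here are hypothetical:

```baml
// "python/pydantic" and "typescript" are native targets;
// "rest/openapi" covers other languages through OpenAPI.
generator my_python_client {
  output_type "python/pydantic"
  output_dir "../"
  // Hypothetical version; match the baml package you have installed.
  version "0.89.0"
}
```

Running baml-cli generate then emits a baml_client package, so application code can call the generated function as an ordinary typed function in its own language.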

Demetrios [00:21:45]: Wow. Okay. Yeah. That idea reminds me a lot of apps that are specific to either Android or iPhone. Normally they go after iPhone first, and then it's, when can I get this app on Android? Right? And it's that same kind of idea: implement it in Python, and then, oh yeah, we've got to go do this in TypeScript too.

Demetrios [00:22:08]: All right, let's see.

Vaibhav Gupta [00:22:09]: Exactly. And that's kind of why I think React Native got a huge push, and why Dart and Flutter and all these other things had a huge push: because you could implement it once and get it for web, for Android, for iOS, all at once, because they did the groundwork so that your team only has to focus on it once.

Demetrios [00:22:27]: Yeah. It's a common pattern, and it goes back, again, to the persona that you're serving. It's going to be much more comfortable if you're just building once and it can work wherever it needs to work.

Vaibhav Gupta [00:22:41]: Yeah. And if you think about it from a larger organization's standpoint, most large companies use more than one language.

Demetrios [00:22:48]: Yeah.

Vaibhav Gupta [00:22:49]: They use many, right? It just naturally happens. So if you live in that world and you're a large company, it's great, because all my LLM code can be shared. All the techniques that one team discovers can be used by another team, even if they use different languages, and they get them for free.

Vaibhav Gupta [00:23:08]: So it cross-pollinates. It's like JSON. Why is JSON so powerful? Because every single language has a JSON.loads or JSON.parse and a JSON serialize built into it, because they adopted it. JSON is compatible with every language. BAML was designed with the same philosophy. We give you all the tooling that you want for LLMs, but then we build a compatibility layer with every language, so you don't have to rediscover all that tooling all the time.

Demetrios [00:23:44]: So you must have tried, at least, to not build another language. I imagine you guys thought really hard about how you could do this without building a language, because, as you said earlier, I'm lazy. And what you're trying to do by building a language is probably like playing the game not on hard mode, but on extremely hard mode.

Vaibhav Gupta [00:24:08]: Yeah, I think this is probably one of the worst ideas for a startup ever, is what I would say. Not a lot of companies have been able to do this. There's a few, but very few. And almost no company has done it for business logic, ever. And a lot of startups have this thing where you could vibe code it and ship something into the hands of your customers in like a day. We just sat in a corner and coded for eight months before we even got our first few users.

Demetrios [00:24:41]: Wow.

Vaibhav Gupta [00:24:42]: Because, imagine. I want you to put yourself in the shoes of our first few devs who used BAML.

Demetrios [00:24:49]: Yeah.

Vaibhav Gupta [00:24:50]: And back then it was called Gloo, so it was a different thing. That's a funny story: we got sued over our name, so we had to rebrand, and I think we have a much better name now.

Demetrios [00:24:58]: Yeah.

Vaibhav Gupta [00:24:59]: But the first few devs that used BAML used a compiler that was really flaky and had a bunch of bugs, because we wrote it in C, not Rust. They had no syntax highlighting, no autocomplete. We barely supported Python. We definitely didn't support any other language. And our type system wasn't very complete yet. And then we grew from four developers, and I don't mean companies, four developers, to five, which took us three months. From five developers to eight took another three months. Then we got to 10 like two months after that, and then we started booming. Now we probably have something like 700 companies deploying BAML into production.

Demetrios [00:25:42]: Nice.

Vaibhav Gupta [00:25:45]: Including some Fortune 500 companies, which has been really, really surprising.

Demetrios [00:25:49]: So cool.

Vaibhav Gupta [00:25:50]: But I think this is one of those projects and technologies that... you know those rides at an amusement park where you have to be this tall to ride?

Demetrios [00:26:00]: Yeah.

Vaibhav Gupta [00:26:00]: With a language, as it turns out, you need to be like 6-foot-2 to even get in the door, maybe 6-foot-8 to have a developer even consider you. And I don't think we understood the dauntingness of the task we took on at the very beginning. We knew it was hard; there's a reason there aren't a lot of programming languages. But one thing we underappreciated back then was how much it would pay off to build something stable.

Vaibhav Gupta [00:26:27]: Once you got to a point of stability and generality, the number of use cases that got unlocked was really, really surprising to us.

Demetrios [00:26:37]: Okay.

Vaibhav Gupta [00:26:38]: And the way that the tooling could evolve. So one of the things we do is this concept I was talking about, being able to see the prompt. Typically, when people want to run a prompt, they usually have to run some CLI command to go run tests. Right now, I'm showing VS Code and our VS Code extension. Because we own the entire stack, I can legitimately show you the prompt in real time, and you can watch it update as I'm typing. If I type something in here, it pops in right over here. If I change my test case, it pops in, and I can highlight my test case differently from my base prompt in the rendering of the prompt.

Vaibhav Gupta [00:27:22]: And that matters a lot, because if I want to quickly glance at this thing, I can easily see where the bug is. I can easily read the thing that's being sent out to the model. I can see the tokens that are actually being sent to the model under the hood. And depending on the model, the tokenizer changes.

Demetrios [00:27:38]: And you just did that by... With that tokenizer, what we're seeing is each word being highlighted in a different color as a token, right?

Vaibhav Gupta [00:27:53]: Yeah, because this is what the model is actually seeing, right? The model isn't seeing your English; it's seeing the tokens. So the strawberry question is a very common one people used to talk about: why is the model not able to answer how many Rs are in the word "strawberry"? It's because "strawberry" is like two tokens.

Vaibhav Gupta [00:28:07]: But if you do S, space, T, space, R, and spell out "strawberry" with spaces, it has a really easy time answering how many Rs there are in "strawberry," because the tokenizer can see the Rs, right? But that is really hard to explain to someone, especially a junior dev, if you don't have the right tooling. Here, you can just show it as you're debugging and understand what the model is doing. Better yet, there's the raw curl mode that we have under the hood. We all know every framework is going to, at some point, put a web request together. So if I swap from OpenAI to Anthropic, the raw web request itself changes. We make that zero-click, something you can see ahead of time, not after the fact.

Vaibhav Gupta [00:28:51]: And I think the biggest unlock is what I call the hot-reload loop of prompts, like in React.

Demetrios [00:28:58]: Hot reload. I like that name.

Vaibhav Gupta [00:29:01]: Right? So in React, what is your hot-reload loop? You go to your TS file or TSX file, you press Command-S, you look at the browser. If it doesn't match what you want, you go back to the file, you edit it, you go back. That's a really fast hot-reload loop. And you can't really do that without React, because of a whole bunch of things about how web component state works; I won't go into that. And prompting is kind of similar. You want to change your prompt, you want to run your test case. What I did just now is press the run-test-case button, and you can legitimately see exactly what the model did.

Vaibhav Gupta [00:29:35]: And if it doesn't match your expectations, you just go and edit the prompt, edit your data model, edit your input, and rerun it. You get into a really fast hot-reload loop, so you quickly converge on a good prompt for your test cases. And because you never leave your editor, you don't have to go to a web browser. You're using tooling you're already familiar with. You can use Cursor, you can use Claude Code, or anything else you want, without having to log into a SaaS of any kind.

Demetrios [00:30:06]: So it's really helping you speed up that workflow by giving you the rendering on one side of your screen of what you're working on on the other side of the screen.

Vaibhav Gupta [00:30:19]: Exactly. It's how I do web development. And because web development is so experimental, we needed to do that. Why is Jupyter Notebook so successful for data science? Because data science work is very experimental. I need the visual feedback, and I don't want to have to run the whole program from scratch every single time. Agents are also very experimental, and you need that. And the question we have to ask is: is every language going to build that experimental tooling individually, or can we just use something like BAML and plug into every language, with autocomplete and all the benefits, via a codegen kind of layer?

Demetrios [00:30:59]: Well, because I was envisioning it as a tool that you would use alongside... if we're talking about prompting specifically, there are a bunch of these prompting tools. I think PromptLayer is one of them, or Opik, or even MLflow does it these days, right? So how do you see those two things playing together? Is it still that I would have MLflow to, like, version my prompts?

Vaibhav Gupta [00:31:29]: Why? I think it's just: why would you reinvent version control? We have Git. It's a battle-tested version control system that has worked for decades on the most complex software out there. Why reinvent it? It's beautiful. It's really, really damn good. There are only a few companies that reinvented Git, and that's Google, Microsoft, and Facebook. And that's because their codebases were so massive that Git was too slow, so they made improvements to make it better.

Vaibhav Gupta [00:31:58]: And they've used Mercurial as well. Why would you use anything different?

Demetrios [00:32:03]: But isn't it because you would have so many different prompts? In what you were showing me, there was one test case, but I assume you're going to have a whole test suite when you're using this stuff.

Vaibhav Gupta [00:32:20]: Yeah, and we have that for regular software too. If you go back in time to when software was a lot smaller, just in volume of lines of code, people would have said, oh, this works for like 10 functions, but is Git really going to work when you have a hundred thousand-plus functions? It turns out it works when you have a hundred thousand-plus functions. And I think it's the same thing with your test cases and everything else. It's just code, and the best place to have code is your codebase. Now, you might want to load tests from a database. That's fine.

Vaibhav Gupta [00:32:52]: We know how to write tests that load from a database; we've done that before. You create a database call, you call it, and then you run the test case. We know how to write a pytest that does that. So it's true, you might store some instances in a database and some instances locally. That makes sense. But it's just code.

Demetrios [00:33:10]: Now, as you're working with different folks that are using BAML, what are they telling you that surprises you?

Vaibhav Gupta [00:33:21]: I think the scariest thing I heard was that they're using it in production. Well, no, that was fine. That stopped being scary maybe seven, eight months ago. It was scary seven, eight months ago because every time someone shipped, I was like, oh, did we break something? Because we ship for so many different languages.

Vaibhav Gupta [00:33:36]: I was like, oh my God, did we break something on, what's it called, the slim version of Debian that people use? Or Alpine. There's a minimal container image called Alpine that people use for deployments because it's very, very small, and we broke it once. Or there are compat layers. Small things like that used to give me anxiety, but almost all of those bugs have been addressed now.

Demetrios [00:34:02]: Nice.

Vaibhav Gupta [00:34:03]: But I think the scariest thing is when I heard someone had, like, 25,000 lines of BAML code in their repo. Wow. And I was like, that's a real codebase. That's scary to me.

Demetrios [00:34:16]: That's legit, dude.

Vaibhav Gupta [00:34:18]: Yeah. And they were like, can I please have namespaces? Because we don't have namespaces right now in BAML. And I was like, why do you need namespaces? And then they showed me why, and I was like, okay, I guess you do need namespaces.

Demetrios [00:34:32]: Yeah.

Vaibhav Gupta [00:34:33]: So we don't have namespaces yet. It takes a while for us to build out some features, just because they're so core and primitive. So features do take a little bit; some are fast, some are more work. But that was probably... go ahead.

Demetrios [00:34:47]: Oh, no, no, no. Sorry. Keep going, keep going.

Vaibhav Gupta [00:34:49]: No, I was gonna say, 25,000 lines of code scared me. The first time I met someone that wrote a thousand lines of code, that was scary. And we hired one of the first people that wrote 3,000 lines of BAML code into our company; they ended up joining us. But yeah, volume of code is probably the scariest thing.

Vaibhav Gupta [00:35:09]: It's like, oh, shit. People depend on this. For real.

Demetrios [00:35:13]: And how do you look at where to go next? Is it by talking to folks and seeing, okay, namespaces are something we need to do? I'm sure there's a laundry list of other things people are asking you for.

Vaibhav Gupta [00:35:28]: I think one of the best things about building a programming language is that, in the end, we're building for ourselves. There are very, very few tools that developers can truly build for themselves. One of them is editors, like Cursor or VS Code; you can just go do that. It turns out another one is a programming language, because you as a developer know what it feels like to write code, and you have an idea of what feels right and what's necessary. So we do listen to our customers to add different features, but we kind of just know what we have to do. I think if anyone had been asked whether we should make BAML, they would have said no.

Vaibhav Gupta [00:36:05]: They would have said, why would I want this? And some of our earliest users were like, why the heck are you making me write this code? And we were just like, trust us. And we hoped that we were right. But most features usually start in our heads. We talk internally on the team, we write a lot of pseudocode, and then we shop-test the concept with our community once we have it. We have a community of over a thousand people now.

Demetrios [00:36:28]: Nice.

Vaibhav Gupta [00:36:29]: And they've been really helpful in guiding the direction of where we should be going. But we usually don't lean on them for the first inception of an idea, because they're busy thinking about how to build the best applications.

Demetrios [00:36:44]: Yeah.

Vaibhav Gupta [00:36:45]: And what we see is... I'll give you an example: streaming. A lot of SDKs and frameworks around LLMs don't give you a great streaming experience. What do I mean by that? Streaming is a really nuanced concept. So I'm going to show you something, and maybe we can try describing it to everyone else. Take this recipe generator, for example. Everyone knows we can go to ChatGPT and ask it to spit out a recipe, and it'll dump out something. But when I use streaming, I can do something really incredible.

Vaibhav Gupta [00:37:21]: What if I could make it interactive while it was loading? If you take a look at my screen, you can actually see the loader icon telling me exactly what it's working on at any given moment versus what's done. And this kind of application is possible today, with today's SDKs; it's just really, really damn hard.

Demetrios [00:37:46]: And so what we're seeing here is you've got a little slider at the top. As the recipe is rendering, you can move the slider and get an interactive experience of the recipe being updated depending on the number of people you're trying to feed with this recipe. But you also see one of those little spinning wheels, so you know exactly which part the LLM is working on.

Vaibhav Gupta [00:38:19]: Yeah, yeah. And that's just not something that I think a lot of people spend effort on. Can you do this today? Yes, you could do this today, with today's LLMs. Remember, we don't modify the LLM at all. We use every model as is, without any modification. But the problem that I often see is not that you can't do this today; it's that it takes a lot of code to do it. What if you could do it with one line of code in BAML? That is the nuance of what we offer. And I think making something that needs to be common very easy is an undervalued thing.

Vaibhav Gupta [00:38:59]: But I think a perfect example of this that a lot of people probably know is Tailwind. I think Tailwind fundamentally changed the game of how styling should be done, and things became more standardized once you used Tailwind. Because CSS is one of those things where we all want it to be perfect, and we all thought the right way to do CSS was to have style.css files with a bunch of classes that we link. But it turned out CSS is very hyperlocal. I want to modify just the div I'm right at, and I don't want to hover over something to see what class it is. But the problem is, I don't want to write raw CSS there, because raw CSS is hard to read, and there are some attributes in there that are just, what the heck is that? I cannot human-read that.

Demetrios [00:39:44]: Yeah.

Vaibhav Gupta [00:39:45]: So Tailwind did something very simple: they added strings that were easy to read and programmatically possible to generate, and they put them in a spatially local place. They added a new syntax for defining CSS that then gets run through your build to render the actual CSS under the hood. And that lets them do optimizations for free, like only including the styles you actually use in the rendered output, so your stylesheet isn't super long like Bootstrap and other systems used to be. But more importantly, it became more ergonomic, and that allowed people to build perfect UIs for each part of their website, because it became easy to do it frequently. In my code, BAML tries to do the same thing with streaming. Instead of having to think really hard about whether I want to add a hundred extra lines here to make this thing stream perfectly, you add one line of code, and now it just streams the way you want it to stream.
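
For listeners, the "one line" refers to BAML's streaming attributes on the output type. A rough sketch with an invented recipe model; the field names are illustrative, not the demo's actual code:

```baml
class RecipeStep {
  // Expose per-field stream state (incomplete/complete) in the generated
  // types; a UI spinner can key off of it while this step is being written.
  text string @stream.with_state
}

class Recipe {
  // Hold the title back until it has fully finished streaming,
  // so the UI never shows a half-written heading.
  title string @stream.done
  servings int
  ingredients string[]
  steps RecipeStep[]
}
```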

Demetrios [00:40:42]: Huh. Now, what are you seeing users of BAML create? What are some cool projects?

Vaibhav Gupta [00:40:52]: There are a couple. There's kind of stuff all over the domain. There's some in the government industry that I thought were really interesting; we have a lot of government RFP generators built on BAML. We have some in the medical space, analyzing doctor-patient conversations for all sorts of EMR stats and admin work. We have agents that operate in the RFP automation space. We've all seen those Chrome extensions that extract data from web pages into a spreadsheet-like view.

Vaibhav Gupta [00:41:27]: I've seen those very generic dynamic systems built in BAML, and SQL chatbots kind of all over the place, RAG systems, a little bit of everything, which has been really surprising.

Demetrios [00:41:40]: And is there stuff folks are asking for besides the streaming, besides the namespaces you mentioned? What else is next on your list that you're thinking about tackling?

Vaibhav Gupta [00:41:52]: More language support is probably a huge one for us. We technically support every language today using OpenAPI. So we support Python, Ruby, and TypeScript natively, and then all the rest are available via OpenAPI through BAML. But we're going to add Go support soon, and following that will be native Java support as well. Now that we've figured out static languages, we should be able to unlock every other language without needing a sidecar kind of system. And then the next big one is actually an orchestration system that we're going to announce pretty soon. Not just prompts, but full workflows and orchestration, with a debugging experience unlike anything people have ever seen before for agents.

Demetrios [00:42:43]: And when you say orchestration, you're talking about what specifically?

Vaibhav Gupta [00:42:49]: Full expressive systems: if statements, for loops, while loops, where you can conditionally call an agent, abort to a human worker or human annotation system, and come back to it. That kind of system.

Demetrios [00:43:02]: Wow.

Vaibhav Gupta [00:43:03]: Yeah. And all that will still be exposed to every language of your choice.

Demetrios [00:43:07]: Oh, man. So now, thinking from a mindshare perspective, you'd like to garner more attention, get more developers using BAML. How are you looking at that? How do you think this is ultimately going to be something that's not just a flash in the pan?

Vaibhav Gupta [00:43:32]: Yeah, we've spent a lot of time thinking about this, because since we started, which was November or October of 2023, there have been a ton of frameworks that have come out, and a lot of them have died very fast. They've been flashes in the pan. One thing that is good is that people have sustainably continued; our users of BAML have increased over time without much churn. One thing I was worried about is: are people going to grow out of BAML? But like I talked about, that company with 25,000 lines of BAML code clearly seems to be adding more BAML, not less, over time. So that sort of stuff has assuaged a lot of our worries about whether people will grow out of it. Maybe they will, but we'll add namespaces; we'll add all the things that people expect out of a language. But I have a view on dev tools that I think is very different from a lot of people's. I was inspired by the TypeScript team.

Vaibhav Gupta [00:44:28]: I think as a company, you have a finite set of resources. You choose where you want to deploy your money and your time: you deploy into marketing or into engineering. We as a company are just saying, screw it, we're just going to write a shit-ton of code. We're just going to keep writing code and keep shipping. Our codebase is almost half a million lines of code or something, and we'll just keep going.

Vaibhav Gupta [00:44:50]: Because that's what we as a team love doing, and we can build some incredible features. And what we want to do is make it so that you as a developer aren't thinking about BAML being a bottleneck. We're just shipping, and you're just like, cool, I have that feature. Before you even imagine you need it, we've already thought of it and added it by the time you need it. And what we hope is that if people keep using BAML and they keep loving it, they tell their friends.

Vaibhav Gupta [00:45:19]: Developers are horrible to sell your product to. They're just the worst buyers ever. But they are the best referral system ever. If a dev loves your tool and swears by it, they will tell every single one of their friends how much they love it.

Demetrios [00:45:34]: Yes.

Vaibhav Gupta [00:45:35]: And I want to earn the trust of developers: that we will look out for them, and that we will do our best to make sure their problems go away. So rather than spending time on marketing and all these other things, let's just ship good code. Shipping is the best marketing, and that's what we want to keep doing.

Demetrios [00:45:54]: How do you think developers talk about BAML when they talk to each other?

Vaibhav Gupta [00:46:01]: I could read out some tweets, some messages. I get messages almost every week now from someone somewhere around the world who discovered BAML for the first time and literally says thank you, or, this is amazing, I don't know how you did it, but it was really good. And someone messaged me... I think the message that stuck out was: "I explored it three months ago and I didn't want to go learn it, because I didn't have time and I had to ship. But I tried it again this weekend because I was tired of seeing your posts, and I regret not having switched earlier. It would have saved me so much time."

Vaibhav Gupta [00:46:41]: It ended up taking them about two and a half hours to learn.

Demetrios [00:46:44]: Nice.

Vaibhav Gupta [00:46:45]: But I think it's that sentiment, the regret that they felt. I felt bad that I could have said things earlier to save them the time. But I'm really glad that they found value and it wasn't a waste of time for them.

Demetrios [00:46:59]: I want to highlight that he said he was tired of seeing your posts, and this is you not in marketing mode. I cannot imagine what the world will look like when you are marketing.

Vaibhav Gupta [00:47:11]: Yeah, I do post about BAML on LinkedIn. I think I'm really proud of what the team has done, and the things I post are mostly things that we ship.

Demetrios [00:47:19]: Yeah.

Vaibhav Gupta [00:47:20]: So I'll just talk about features, and that's our marketing, so more of our users know about the new features we release. I should probably send an email chain out; I feel like people do that, but I don't really know how to set that up. So we collect emails, but we've sent maybe four emails in the lifetime of the company, and I try my best not to do that, because I, as a developer, hate emails.

Demetrios [00:47:41]: Yeah.

Vaibhav Gupta [00:47:43]: So we do different things.

Demetrios [00:47:44]: Google groups going on.

Vaibhav Gupta [00:47:45]: That feels like, oh my God, I would die. I hope not. I mean, let's talk about Discord versus Slack. Why do we use Discord? I hate Discord. I play games and stuff, and Discord is for social, but Slack is more commercial and work-based. But one of the things we thought was: I don't want to give up my name and identity just to ask a question. That feels like it's raising the barrier. So we just said, what if we make a Discord, no login needed, just go and ask a question.

Demetrios [00:48:15]: That's cool.

Vaibhav Gupta [00:48:15]: It's kind of the philosophy that we always take: reduce the barrier to entry as much as possible. Because we have this one huge barrier, which is that you've got to spend two hours on a Saturday to play around with it. So every other barrier we try to remove out of the way.

Demetrios [00:48:29]: And what are some things that you feel like the folks that are using BAML highlight when they talk about it?

Vaibhav Gupta [00:48:39]: The biggest one is probably our parsing. We have a way to do structured outputs that is better than OpenAI's, better than Anthropic's, better than every other model provider's, on almost every benchmark. It's a new algorithm we created called schema-aligned parsing. People love that thing, and they love it because it lets you do chain of thought in one shot with structured outputs from the model, and you don't have to think about it. We just do error correction on the model's output. A lot of other frameworks do retries whenever the model doesn't give you the exact thing you asked for. We solved that same problem with milliseconds of work, with an algorithm.

Vaibhav Gupta [00:49:18]: And now it's been battle-tested in millions of calls, easily over tens of millions of calls, and all different data types. That's one thing people love. And the other thing they really love, after they get past that, is the iteration speed. The iteration speed of BAML is unlike anything else. The way I put it is: if it takes you five minutes to test one prompt at a time and you need to test 50 prompts to find the right answer, it'll take you 250 minutes. If it takes you five seconds to test a prompt, it'll take you 250 seconds. There's just no comparison when it's that much faster. That run-test button I showed earlier in our VS Code playground is probably the single best feature that we've shipped.

Demetrios [00:50:14]: I am blown away by this now.
