MLOps Community

Building Claude Code: Origin, Story, Product Iterations, & What's Next

Posted Nov 05, 2025 | Views 18
# Claude Code
# Agentic Coding
# Anthropic

SPEAKERS

Siddharth Bidasaria
Member of Technical Staff @ Anthropic

Software engineer. Founding team of Claude Code. Ex-Robinhood and Rubrik.

Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building Lego houses with his daughter.


SUMMARY

Demetrios Brinkmann talks with Siddharth Bidasaria about Anthropic's Claude Code: how it was built, key features like file tools and Spotify control, and the team's lean, user-focused approach. They explore testing, subagents, and the future of agentic coding, plus how users are pushing its limits.


TRANSCRIPT

Siddharth Bidasaria [00:00:00]: I've seen people with like 100 MCP servers installed or something like that. And I was like, what do you even do with 100 MCP servers? But hey, it works for them, you know, they're happy with it and it works for them. So, yeah, there's always these surprising things where all the composable parts of Claude Code are being used in such creative ways, in ways that we didn't quite expect them to be used.

Demetrios Brinkmann [00:00:34]: Dude, tell me about Claude Code. What's the genesis of it? Because I never heard the origin story.

Siddharth Bidasaria [00:00:42]: Yeah, so the origin story is we, Anthropic, had this team called the Labs team. And the charter of this team was just to prototype new products, to kind of play around with things and see what hits. More of an experimentation team, really. And that's the team that I joined when I joined Anthropic. And then one of my colleagues, Boris, he had this nifty little prototype when he joined Anthropic. He built it in his first week, and it was basically accessing Claude from a terminal.

Demetrios Brinkmann [00:01:22]: Nice.

Siddharth Bidasaria [00:01:23]: And it couldn't write code, and it didn't have any access to any tools. It was literally just calling the Anthropic APIs from the terminal, and you could type into it. And I remember the first demo that he showed us. He spun it up, and the first thing he wrote into it was two plus two. And I was like, bro, that's the worst thing you want to ask an LLM, ask it something nice. But it was funny, I think. So he kind of had that as a prototype, and then we were experimenting with other coding surfaces and other coding products. But at some point he ended up adding two tools to this little terminal thing he had.

Siddharth Bidasaria [00:02:03]: The first one was the ability to control Spotify.

Demetrios Brinkmann [00:02:07]: Yeah, nice.

Siddharth Bidasaria [00:02:08]: It was like, I want this thing to control my music. So he's like, all right, play me this. So he added that, from the terminal.

Demetrios Brinkmann [00:02:17]: Because who doesn't want to operate Spotify from their terminal?

Siddharth Bidasaria [00:02:21]: Exactly, exactly. Are you really a hacker if you can't operate your Spotify from your terminal? But the second thing he added was just file tools: file reading, file writing, stuff like that. And the moment he added that, he shared it with us. And I was like, holy shit. Instantly I was like, there's something here. It feels really ergonomic. It feels very different from web-based apps and things like that.

Siddharth Bidasaria [00:02:53]: So I think I was still working on some other surfaces then, but I was like, I think I just want to join you and hack on this. And he was also kind of getting pulled into other projects, but at some point we made a decision to be like, okay, let's just double down and work on this a little bit more. So we spent a couple weeks hacking and then we released it internally at Anthropic. And almost immediately it just caught on like wildfire. People were just using it. Everyone was super excited. And we were like, whoa. Within two weeks, we had some 300 daily active users.

Siddharth Bidasaria [00:03:29]: And this was when Anthropic was like 600 people in the whole company.

Demetrios Brinkmann [00:03:33]: Even the CFO was using it.

Siddharth Bidasaria [00:03:35]: Yeah. Controlling their Spotify, you know.

Demetrios Brinkmann [00:03:42]: Do you remember what model it was?

Siddharth Bidasaria [00:03:45]: This was 3.5. This was Sonnet. This was 3.5, I think. Yeah. Around there. It's like October, November of last year. Yeah. And I think that's kind of how it all started.

Siddharth Bidasaria [00:03:58]: And then once we got some momentum internally, there was talk about launching externally. But honestly, at the time, we weren't really sure if this was going to be a thing. We were like, you know, this is still in the terminal. It still feels a little jank. It doesn't quite feel as polished as, like, Cursor does. Cursor feels like, oh, they have this cool interface, and they have the tab autocomplete, and they're forking an IDE. This is literally just a terminal app. So we had no idea what would happen and how this would be received.

Siddharth Bidasaria [00:04:35]: I remember there was some FUD, and there were all kinds of discussions going around. But, yeah, ultimately we decided to do a small EAP with a few customers, to see what the response would be like. People just really liked it. They grew attached to it very quickly. And that's when we decided to launch more externally. And then once we launched it, after a couple weeks it kind of just blew up on social media and things like that and had a life of its own. But, yeah, it was not apparent at all. I feel like it was a lot of happy accidents that happened along the way that got us to that point.

Demetrios Brinkmann [00:05:17]: There's so many things that you said that are so fascinating right now. One is that there's just an R&D department to go and try and create cool new stuff, and you were on that team, which I love. The second is the one tool that you added that changed everything. And what was it about that tool? You were like, it's ergonomic, there's something different here. Why did you have that intuition there?

Siddharth Bidasaria [00:05:46]: Yeah, I think the file edit and file read tools. I think the biggest thing was that these files were local and you could spin up Claude anywhere. With most products before that, you'd have to have some sort of synchronization or setup step where you need to copy over your files, or you'd need to create a Docker image with your repo, or give them access to your git repository, or some form of sharing your files. And then it could start working on those things. And then it wasn't super clear how to be collaborative in a product like that.

Siddharth Bidasaria [00:06:25]: If everything is being exported over to a VM where your agent is running, how do I then very easily take what the agent's done and then continue that and improve that myself? The models are pretty good and they're getting better, but they're not good enough to zero-shot or one-shot everything. So I need to go in and make some changes, adjust things, make it so that it looks like actual code. And that was very hard to do with these other products, with this other worldview. And I think in the terminal, you can just spin it up, no matter what files you have, what repo, it doesn't matter. What matters is that you have the files locally. You can just spin up Claude, ask it to read a bunch of files, ask it to explore kind of how a human would.

Siddharth Bidasaria [00:07:18]: It just felt really magical. It just felt really low friction. It was a very low friction way of getting started. And I think that's kind of what felt really ergonomic about it.

Demetrios Brinkmann [00:07:30]: What's been your favorite feature that you added? I'll tell you mine first. The clear context window.

Siddharth Bidasaria [00:07:37]: Yeah. Slash clear. Yeah, yeah. Slash clear is cool.

Demetrios Brinkmann [00:07:43]: How about you?

Siddharth Bidasaria [00:07:45]: I think my favorite feature that I added would probably be, honestly, the to-do list. The to-do list, I think that remains my favorite feature. Just something about, like, tasks getting checked off and just like, you know, you.

Demetrios Brinkmann [00:08:00]: Aren't the one that is doing them.

Siddharth Bidasaria [00:08:02]: Yeah, yeah. It feels satisfying, you know, like, best feeling ever. It's just doing all this stuff for me. I'm just watching. It's like watching.

Demetrios Brinkmann [00:08:10]: Those TikTok videos where it's deep cleaning of cars or houses. And you're like, this is oddly satisfying, because I'm not the one that has to be cleaning right now.

Siddharth Bidasaria [00:08:19]: Yeah, exactly, exactly. And I think adding the to-do list also kind of just made the model stay on track longer for longer-horizon tasks. And that was cool to see too. A lot of times you'd be like, hey, I have these hundred files, can you go and edit their names or do something to them? And it would do, like, 30 of them and be like, ah, I'm tired now, you can do the rest by yourself, or something like that. And then you give it a to-do list tool, and it uses it to first create a to-do list for itself.

Siddharth Bidasaria [00:08:50]: And it batches the hundred files in batches of 10 and keeps checking them off one at a time, and it goes through the whole thing. And it's like, oh, this is quite satisfying. Now this is becoming a little bit more deterministic. Not super deterministic still, but it's a little bit more deterministic than it was before.
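For a concrete picture of the shape of such a tool, here is a minimal sketch of a to-do list tool in the Anthropic Messages API tool format. The tool name, schema, and handler are illustrative assumptions, not Claude Code's actual implementation.

```python
# Illustrative sketch only: a to-do list tool in the Anthropic Messages API
# tool format. The name and schema are assumptions, not Claude Code's real tool.
TODO_TOOL = {
    "name": "todo_write",
    "description": (
        "Create or update the to-do list for the current task. Break large "
        "jobs into batches and mark items off as they complete."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "todos": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "content": {"type": "string"},
                        "status": {
                            "type": "string",
                            "enum": ["pending", "in_progress", "completed"],
                        },
                    },
                    "required": ["content", "status"],
                },
            }
        },
        "required": ["todos"],
    },
}

def handle_todo_write(todos: list[dict]) -> str:
    """Tool handler: persist the list and echo progress back to the model."""
    done = sum(1 for t in todos if t["status"] == "completed")
    return f"{done}/{len(todos)} items completed"
```

Each time the model rewrites the list, the progress comes back in the tool result, which is part of what keeps it on track over long-horizon tasks.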

Demetrios Brinkmann [00:09:08]: Did you notice? Because I remember we had my friend David Hershey on here, who created Claude Plays Pokemon, and he was talking about how, for him, he saw such an unlock. It was almost like this latent unlock, where people didn't realize the potential a new model had when it dropped until they were playing around with it for a little while. And then you saw, whoa, this is really good at coding, and there's something here. Did you notice, like, big jumps from different models?

Siddharth Bidasaria [00:09:44]: Oh, a hundred percent, 100%. I think 3.5 was pretty good, but 3.7 is kind of where things really started to come together. I think you could see a step change in the complexity of tasks that the model was now able to handle. You could see a step change in the way that the model navigated through complex structures. It almost feels like, if I was to tackle a task, what would I do? It would do a lot of those same things to get to its conclusion. And that was really fascinating.

Siddharth Bidasaria [00:10:31]: And I think I saw that step jump between 3.5 and 3.7, where immediately 3.7 felt like it was just way more capable. And I think, because I'm using the models all day, every day, right, like a lot of people at Anthropic are, and even outside, you get quite attuned to model behavior. And then once you see the model behaving slightly differently, it jumps out. You can tell. It's like it has a slightly different smell to it. It's like, yeah, I can tell this is something different.

Demetrios Brinkmann [00:11:05]: Yeah, it's like, wait a minute, you just broke my frame of reference. That shouldn't do that.

Siddharth Bidasaria [00:11:12]: Yeah, yeah. And it's also kind of tricky, you know, because once you use agentic coding products a lot, you start to form an intuition about the kinds of things that it might be able to do versus the kinds of things that it might not be so good at, where you'd probably have to play a more hands-on role in getting that task to completion. But then a new model will come along that's actually much better, a step change. And then you have to rewire your brain into, like, okay, no, now it can actually do this. And that can be tricky for people sometimes. And I think that might also just be a blocker in adoption of AI tools going forward, because people form a point of view about what it can do, but then that point of view needs to be adjusted very frequently, or more frequently than people realize.

Demetrios Brinkmann [00:12:06]: It's funny you mentioned that, because we had my buddy JQ on here, and he was talking about how, if for some reason the model leans towards doing something in what you would consider the wrong way, then try to change your frame of reference into: can I make it fit into what the model's doing? If the model wants to always do this, can I see if I can change me to let the model do that, and see if it can then complete the task, you know?

Siddharth Bidasaria [00:12:39]: Yeah, yeah, that's a fascinating idea. Keeping the model on distribution. There's something to that, right? If you keep it on distribution, it's kind of doing what it really innately wants to. You're not trying to force it to go a certain way or steer it a certain way. Yeah, I think that's an interesting field of research and study too, just seeing how keeping the model on distribution might actually help, even if it's unintuitive to a human.

Demetrios Brinkmann [00:13:14]: There's something I want to talk about that's close to your heart, which is this idea of forward compatibility: making sure that when you're creating today, you're not spending a bunch of time on something that is going to change in the next model drop, or with some kind of unlock, like, oh great, now we can just use MCP servers for that. And so I wonder if you have examples of things that you have done in the past where you almost shot yourself in the foot, or it was just a quote-unquote waste of time. And I'll give you an example of one that my buddy Flores was talking about on the podcast a few months ago. He said, when ChatGPT came out, I spent so much time trying to extend the context window. I did everything I could, and it was so janky, it was so hacky, and all I needed to do was be patient and wait. The context windows for all these different models just got bigger by default. And so I wonder if you have felt that in different areas, where you're sitting there now, looking back, and you realize: I spent way too much time on this thing that I could have just been patient on.

Siddharth Bidasaria [00:14:35]: Yeah, there's a lot. I think one of the core philosophies of our team is we absolutely love deleting code and just deleting features.

Demetrios Brinkmann [00:14:48]: Nice.

Siddharth Bidasaria [00:14:50]: I think that's by design somewhat, because of this exact issue. There's a phrase for it: unhobbling the model. Basically, it translates to let the model cook. The model is the centerpiece and the showpiece here, and the harness is really just about providing the model with the tools that allow it to cook, as opposed to trying to steer it in a way that's unnatural, or steering it in a way that might be redundant in a few model releases. So an example of that is, when Claude Code first came out, we had a bunch of tools. We had the LS tool that was literally just for listing files. We had a few other tools that were just very file-system specific. And then we realized that we don't really need any of these tools.

Siddharth Bidasaria [00:15:48]: We can just literally give it the Bash tool, and it will know it can do all of these things just through the abstraction of a bash command. So we stripped them all out, and we're like, okay, we don't really want this in here, and just gave it the Bash tool. And it still is able to do the same exact things, maybe even better than before, just because it understands the concept of bash so much better than a tool that we invented and gave to it. So there's other fascinating examples of how much harness is too much harness. That's a question that I think everyone who's building AI apps is kind of asking, and there isn't a clear answer. There's a balance. It's more of an art, really.
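As a rough illustration of the "just give it Bash" idea, here is a minimal agent loop using the Anthropic Python SDK: a single bash tool instead of bespoke LS/read/write tools. The model ID, timeout, and output truncation are illustrative assumptions; this is a sketch of the pattern, not Claude Code's implementation.

```python
# Minimal sketch: one Bash tool instead of separate file-system tools.
import subprocess
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

BASH_TOOL = {
    "name": "bash",
    "description": "Run a bash command and return its combined stdout/stderr.",
    "input_schema": {
        "type": "object",
        "properties": {"command": {"type": "string"}},
        "required": ["command"],
    },
}

def run_agent(task: str) -> str:
    """Agent loop: the model lists, reads, and edits files via bash alone."""
    messages = [{"role": "user", "content": task}]
    while True:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",  # illustrative model id
            max_tokens=2048,
            messages=messages,
            tools=[BASH_TOOL],
        )
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            # No more tool calls: return the model's final text.
            return "".join(b.text for b in response.content if b.type == "text")
        results = []
        for block in response.content:
            if block.type == "tool_use":
                proc = subprocess.run(
                    block.input["command"], shell=True,
                    capture_output=True, text=True, timeout=60,
                )
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": (proc.stdout + proc.stderr)[:10_000],
                })
        messages.append({"role": "user", "content": results})
```

The point of the design is that `ls`, `cat`, `grep`, and `sed` are all already in the model's training distribution, so one well-understood abstraction can replace several invented ones.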

Siddharth Bidasaria [00:16:39]: It's like, you've got to balance it out, because you want to be useful now to your users, but at the same time you want to be able to adapt, and you want to be really agile, so that if the model changes tomorrow, you can just rip out entire pieces of your code base, do something else, and just let it cook. Take hooks, for example. I don't know if you've used Claude Code hooks yet, but they're a feature that allows you to hook into the lifecycle of Claude Code. And what I mean by that is, a simple example is logging, right? You want to log every single tool call that Claude Code makes as part of its operations. So you can create a hook. You can register a pre-tool-call hook, or I forget what the actual name is, but that's effectively what it does. It's a piece of code that the harness runs before each tool call. And that piece of code is provided by you, the user. So you can have a bash script, or any kind of code, that takes in some parameters for what tool is being called, what the context before that is, and a few other pieces of information, and it can return some information back into Claude.

Siddharth Bidasaria [00:17:57]: It can disallow that tool, or it can steer it in a certain way. So it's a very flexible system that makes Claude Code a little bit more deterministic and more composable, according to how you want it to be. I remember very distinctly one of my colleagues, Dixon, came up with this, and I remember thinking, okay, is this too much harness? Do we really want this? But I think his point was quite good. I mean, clearly a lot of people are using it, and a lot of people like hooks. But his point was that we need to be somewhere in the middle. This is a problem that we're facing right now, where the model does something that people don't want it to do and they want to steer it. So let's create an abstraction that works here. We're not trying to do something very heavy-handed; we just make lifecycle hooks, which are a very common concept in many kinds of dev tools.

Siddharth Bidasaria [00:18:55]: Right. You have these lifecycle hooks. Yeah. So it's always a struggle, trying to figure out how much harness is too much harness, but we lean towards being leaner rather than more harness-heavy.
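A rough sketch of the logging use case described above, written as a hook script. Claude Code's hooks documentation calls the pre-tool-call event PreToolUse and delivers a JSON event on the script's stdin; the exact field names and the log path below are assumptions, so check the current docs before relying on them. The script would be registered as a command under the hooks section of .claude/settings.json.

```python
#!/usr/bin/env python3
# Sketch of a PreToolUse-style logging hook. Field names are assumptions
# about the event shape; verify against the current Claude Code docs.
import json
import sys
from datetime import datetime, timezone

event = json.load(sys.stdin)  # e.g. {"tool_name": "...", "tool_input": {...}}

# Append every tool call to a local audit log.
with open("/tmp/claude-tool-calls.log", "a") as f:
    f.write(json.dumps({
        "time": datetime.now(timezone.utc).isoformat(),
        "tool": event.get("tool_name"),
        "input": event.get("tool_input"),
    }) + "\n")

# Exit 0 lets the tool call proceed; per the docs, a designated "block" exit
# code (2 at the time of writing) disallows it and feeds stderr back to Claude.
sys.exit(0)
```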

Demetrios Brinkmann [00:19:13]: It's so funny that you mentioned that specific thing, because last week I gave a talk all about how there are a lot of downfalls with the chat interface. And one of the downfalls that I see is that we don't have this granular steerability if we want it, because we're using words, and words are inherently fuzzy. They're like symbols of things that we want to happen. Right. And especially with language itself, you don't get a knob that you can turn. You have a word that you use, or you use a stronger word, or you use a weaker word. And so you saying that makes me rethink: well, do we even need steerability? Because your whole thesis is, just let the model cook.

Demetrios Brinkmann [00:20:06]: You don't need to steer it, you just need to get out of its way. And that was also kind of JQ's thing: when it's doing what you don't expect it to do, figure out how to make you fit into its patterns, as opposed to it fitting into your patterns. And so now I'm going to have to go back and rethink this whole idea of steerability.

Siddharth Bidasaria [00:20:31]: Yeah, it's a balance. It's a balance. Right. I don't think needing or wanting steerability is inherently bad. I think it makes sense. People are using it today, and they want it to be useful for them right now.

Siddharth Bidasaria [00:20:45]: And if what they want right now is steerability and the model is not able to give it to them, then really the only option you have as an application developer is to create some harness that would allow the user to steer the model in one way or the other. But there's kind of two things to it. One is the point that you were making earlier: should I be changing my frame of reference, and would that help? And the second is, what would happen if, let's say, two months from now, there's a new model that comes out and it just does it for you? Did you just spend a whole bunch of resources on it for nothing? So I think it's about being agile: having a tech stack that allows you to be really agile, a tech stack that allows you to experiment a bunch, where you don't feel bad about throwing away a lot of your code. That becomes really key, right? Because you should experiment with harnesses, and you should experiment with letting it steer, or with more heavy-handed steering mechanisms. But at the same time, when the time comes, it should be easy to rip out. It's not like you've now built five more things on top of that steering abstraction, and now it's impossible to take out from your code. And now you're fucked.

Siddharth Bidasaria [00:21:52]: Right? So it's a balance.

Demetrios Brinkmann [00:21:58]: It's like I can't delete that. It holds up the rest of this mountain of code that I have. And so you really like got yourself into a pickle, right?

Siddharth Bidasaria [00:22:07]: Yeah.

Demetrios Brinkmann [00:22:09]: What features are on the chopping block in your mind? Like you would be happy to delete.

Siddharth Bidasaria [00:22:16]: Ooh, I would be really happy to delete hooks. I think if the model was just working perfectly, in a perfect world, you wouldn't need hooks, right? You can just tell the model what you want to do, and you can have different ways of ingesting some sort of context into the model, and the model adheres to it perfectly every single time. Take probably the most common use case of hooks, which is logging. You can literally just tell the model, hey, every time you use a tool, log it using this endpoint. And if it adheres to your context or your request perfectly, you don't need hooks anymore. Because hooks are our way of adding determinism to an inherently non-deterministic process. So that would be really cool. Let's see what else.

Siddharth Bidasaria [00:23:18]: I think we've done a pretty good job of keeping our tool list pretty lean. We've been deleting a bunch of code, a bunch of tools over time. The other big one that comes to mind is maybe some memory features, right? If the model had perfect memory and perfect recollection of things that you've said to it in the past, let's say you have infinite context going forward, you just point it to the transcripts that it had before and it just knows what to look for. You don't really need any kind of special scaffolding for memory or user preferences. It just knows. These kind of sound like pipe dreams, but not really. I can see a world where it gets to that. But yeah, I think those would be two features that I might delete.

Demetrios Brinkmann [00:24:12]: Well, it's kind of like certain things are almost Moore's Law-ish in how much and how fast they're developing. One of them is that context windows are getting bigger. There's debate there around how effective it is to keep everything in context, and how useful these large context windows are, but we do see the push for bigger context windows. And then another one is the price per LLM call, which keeps dropping in a Moore's Law type of way, where you're like, it's a better model and it's cheaper than before. And so, yeah, I potentially could see it. Maybe it's not in the next month, but if we're talking a 10-year horizon, there's a world where that makes a lot of sense.

Siddharth Bidasaria [00:25:10]: Yeah, yeah, I agree. Yeah, I think like the timeline is always really tricky with these things. You don't really see it coming until you just see it like, you know, it's just like you're, you're oblivious, you're oblivious, you're oblivious and boom, it just smacks you in the face and like, oh shit, I can do this now. Or at least that's been my experience with it, at least. So yeah, timing is kind of a bit of a, you know, oddball. But I do think that there is a world where something like that exists. Yeah.

Demetrios Brinkmann [00:25:40]: Speaking of timing and how things have evolved, I know that we had your colleague Eric on the MLOps Agent Builder Summit panel that we did in San Francisco in May. Now we're in September. And one thing that Eric was talking about back in May was how hard verification is, and how that's something that keeps him up at night. You had mentioned there's been some advances, but it's still a very hard problem. Can you go into that a little?

Siddharth Bidasaria [00:26:13]: Yeah, for sure. For context, verification is the ability of the model to verify that the work it's doing is correct, or some form of course-correction mechanism where, if it's going down a bad path, it recognizes that and can course correct. I would break this down into two different sub-problems, really. The first is model behavior: does the model know to check its work regularly, and can you elicit that behavior? And the second is, do you have the tools, or have you given the model the tools, to effectively check its own work? And I think both of those things are advancing at their own pace. On the first one, the models now know about verification, and they know that it is a good path to success, and it's more on distribution for them to be doing this.

Siddharth Bidasaria [00:27:20]: And I think that's chugging along, and my hunch is that we'll see that keep getting better pretty rapidly over the next few months and years. But the second piece of it is the tools that you're providing to the model. So in the coding context, there's a really wide variety of tasks that you can be coding up, and I think we have figured out verification for a small subset of these. For example, if you're coding up a website and you're mostly dealing with JavaScript components and CSS and HTML and just the visual hierarchy of elements, that is a little bit easier to verify. Now you have some MCP servers. There's a Puppeteer MCP server that can open up a web browser for you, take a screenshot, send it back to the model, and the model can iterate based on that screenshot.

Siddharth Bidasaria [00:28:21]: So if it's changing the color of a button, or if it's changing the position of a button, it has instant feedback on what it's doing. But then there are more complex things here too. Let's say it's an animation. Let's say you're coding up an animation, or some sort of user interaction that requires animated frames. Those are still harder to verify. It's hard to provide the model with a video stream of what's happening, at least with how today's models are built. It's still possible; you could slow down the animation frame rate and take screenshots at certain times, but it's fuzzy, and it's unclear how effective those things are.
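To make the screenshot-feedback loop concrete, here is a minimal sketch that sends a screenshot to the model and asks for a verdict, using the Anthropic vision API directly. In practice the screenshot would come through a browser MCP server such as the Puppeteer one mentioned above; the model ID and prompt here are illustrative assumptions.

```python
# Sketch: ask the model whether a UI screenshot satisfies an expectation.
import base64
import anthropic

client = anthropic.Anthropic()

def check_screenshot(png_path: str, expectation: str) -> str:
    """Send a PNG plus a yes/no question; return the model's critique."""
    with open(png_path, "rb") as f:
        data = base64.standard_b64encode(f.read()).decode()
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative model id
        max_tokens=512,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64", "media_type": "image/png",
                            "data": data}},
                {"type": "text",
                 "text": f"Does this screenshot satisfy: {expectation}? "
                         "If not, describe what to change."},
            ],
        }],
    )
    # Assumes the reply starts with a text block, which holds for this prompt.
    return response.content[0].text
```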

Siddharth Bidasaria [00:29:02]: And then you've got things like performance work. Say you're working on some sort of inference thing, and it's super low-level performance work. Again, what does verification even look like here? How do you measure that? How does the model know to measure that? So the easiest thing to do, and this is a bit of a cop-out, I realize, is unit tests. I think that unit tests, and smaller units of tests that humans pay extreme attention to, and making the framework for writing unit tests and maybe even end-to-end tests much easier, will probably be the shortest path to verification success. And so if I'm creating a new product tomorrow, the one thing that I will for sure do is make sure that I have a unit testing framework that is able to test as large a surface area of my code as possible. Because if I do that, then I can leverage AI tools to write more of my code and trust the output of those tools. So, I mean, test-driven development has been around for ages, right? It's not a new concept, but I think it is a concept that is being rediscovered to some degree.

Siddharth Bidasaria [00:30:18]: I think everyone talks about test-driven development, but if I look back at my career, how many times have I really written a unit test before I write the actual code? Not that many, I'm going to be honest. But now, with the models capable of writing the code, like the unit test code too, it becomes easier to digest. It becomes a little bit lower friction to write code that way. So that's kind of the other piece of verification, which is unsolved. And I think it's going to be tricky to get to a point where we can completely solve that.
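A tiny sketch of that test-first workflow: write the tests first, then let the agent iterate on the implementation until pytest passes. The slugify function is a hypothetical example, and the implementation shown is the kind of thing the agent might produce to satisfy the tests.

```python
# test_slugify.py: the tests come first; the agent's job is to make them pass.
import re

def slugify(text: str) -> str:
    """Example implementation an agent might write to satisfy the tests below."""
    text = re.sub(r"[^a-z0-9]+", "-", text.lower())
    return text.strip("-")

def test_slugify_basic():
    assert slugify("Hello, World!") == "hello-world"

def test_slugify_collapses_whitespace():
    assert slugify("  a   b  ") == "a-b"
```

The verification loop is then just running `pytest` after each edit, which is exactly the kind of deterministic check the agent can run on its own.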

Demetrios Brinkmann [00:30:54]: This unit test thing sounds like it's going to take forever. Like, you gotta run the tests too, though.

Siddharth Bidasaria [00:31:03]: Yeah, yeah, you do have to run the tests, but hopefully the model just figures out which tests it needs to run, runs those, and optimizes that for you. So, yeah, there's degrees of being AGI-pilled, and depending on where you fall on that spectrum, you might see it differently.

Demetrios Brinkmann [00:31:23]: Yeah, the model's just going to learn from the worst of us and be like, fuck it, I'm testing in prod.

Siddharth Bidasaria [00:31:29]: Like a true pro.

Demetrios Brinkmann [00:31:31]: Yeah, exactly. I've seen enough of these models do that kind of stuff, like on GitHub issues or something, where it'll comment and you're like, there was a human behind it, wasn't it? This cannot be real.

Siddharth Bidasaria [00:31:45]: Yeah. Have you seen those? I think that was a trend; levels.io on Twitter tweeted about this, and then people started doing it. I think he called it Vibe DevOps or Vibe Infra or something like that, where he didn't have any deployment strategies or deployment code. There was no infrastructure as code. He would literally just SSH into his production box, open up Claude Code, and just tell it to deploy stuff and do database maintenance, or just upgrade the schema or something like that. And people just kept doing that across all their projects, and I was like, yes. Maybe this is the future.

Siddharth Bidasaria [00:32:31]: I don't know. Yeah.

Demetrios Brinkmann [00:32:34]: Oh, man, that makes me so nervous. But I am glad that somebody's trying it and pushing the limits.

Siddharth Bidasaria [00:32:40]: Yeah, for sure.

Demetrios Brinkmann [00:32:42]: Speaking of pushing the limits, I want to hear about the power users and how they're using Claude Code. How have you seen people just surprise you with what they're doing?

Siddharth Bidasaria [00:32:54]: Yeah, there's a lot of power users who have surprised me. I think the number one thing that surprised me was this one person who was using a fleet of 10 to 12 Claudes for one problem. And he was using the file system as a mechanism for all of these different instances to talk to each other. And he gave all of these different instances a persona. So he had, like, a backend engineer, a frontend engineer.

Siddharth Bidasaria [00:33:38]: And they all operated on different folders in his code base. And I was extremely impressed by that setup. So much so that I think the subagents feature in Claude Code today is inspired heavily by that approach.
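A rough sketch of that pattern: each persona gets its own folder, and instances coordinate by dropping message files into each other's inboxes. The directory layout and message format here are invented for illustration, not the setup described.

```python
# Sketch: file-system message passing between agent instances.
import json
import time
from pathlib import Path

AGENTS_DIR = Path("agents")  # agents/<persona>/inbox/*.json

def send(sender: str, recipient: str, body: str) -> None:
    """Drop a message file into the recipient's inbox folder."""
    inbox = AGENTS_DIR / recipient / "inbox"
    inbox.mkdir(parents=True, exist_ok=True)
    message = {"from": sender, "body": body, "ts": time.time()}
    (inbox / f"{time.time_ns()}.json").write_text(json.dumps(message))

def drain(persona: str) -> list[dict]:
    """Read and delete pending messages; each agent polls its own inbox."""
    inbox = AGENTS_DIR / persona / "inbox"
    if not inbox.exists():
        return []
    messages = []
    for path in sorted(inbox.glob("*.json")):
        messages.append(json.loads(path.read_text()))
        path.unlink()
    return messages

# e.g. the "backend" instance leaves a note for the "frontend" instance:
send("backend", "frontend", "GET /users is now paginated; update the table.")
print(drain("frontend"))
```

Because every instance already has file tools, plain files double as the message bus, with no extra infrastructure.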

Demetrios Brinkmann [00:33:55]: That makes sense.

Siddharth Bidasaria [00:33:56]: But yeah, it was extremely cool to see that. And I've seen people with like 100 MCP servers installed or something like that, and I was like, what do you even do with 100 MCP servers? But hey, it works for them. They're happy with it and it works for them. So yeah, there's always these surprising things where all the composable parts of Claude Code are being used in such creative ways, in ways that we didn't quite expect them to be used.

Demetrios Brinkmann [00:34:29]: Plot twist. The hundred MCP servers are just playing Spotify from the command line.

Siddharth Bidasaria [00:34:36]: Yes. Love it, love it. Yeah, I actually want to create this little web app that literally just controls your Spotify, and that's all it does. You just, like, auth into it, and it has Claude sitting on top, and all it does is connect to your Spotify and tell it what to play.

Demetrios Brinkmann [00:34:55]: I would love to be able to just sync with that on finding new music, and just give it a brain dump: I want a playlist with this kind of music, and try and explain everything that I'm feeling in that moment. And then it goes and creates it.

Siddharth Bidasaria [00:35:13]: Love it. Ooh, what if you gave it a webcam and a mic? Then you could set that up at a party in the corner, and it looks at the vibe of the party and plays music.

Demetrios Brinkmann [00:35:28]: According to it. It's the jukebox, the 2025 jukebox.

Siddharth Bidasaria [00:35:33]: Exactly, exactly.

Demetrios Brinkmann [00:35:35]: And then people can go and whisper requests into the microphone.

Siddharth Bidasaria [00:35:38]: Oh hell yeah. I think, I think we just created the next billion dollar app, dude.

Demetrios Brinkmann [00:35:44]: Yes. I'm going to save this episode and not release it until I get 2025jukebox.com, or the .ai website.

Siddharth Bidasaria [00:35:54]: Yeah.

Demetrios Brinkmann [00:35:56]: And so, I've heard subagents called agents as tools. I've also just been seeing how incredible it is to have that subagent kind of design pattern, for lack of a better word. How do you see or foresee subagents evolving?

Siddharth Bidasaria [00:36:18]: Yeah, with subagents, the possibilities are just too many. There's a lot of different ways that this can shape up. I think when we first launched subagents, there was no good way for everyday users of an agentic coding product to create their own custom subagents. And the idea really was, let's do the simple thing that works, right? We want to do the absolute simplest thing that we know will work and that people can play around with and experiment with. But there's so many other things you can do. Right now, the way it's implemented in Claude Code is subagent as a tool, an agent as a tool. So mama Claude, or the main thread, will call into these tools and then wait for those tools to return back to them at some point and then continue with the conversation. But then what if you had no master model, where you had multiple subagents all doing their own thing?

Siddharth Bidasaria [00:37:24]: There is no parent; they're all communicating with each other. The way that they communicate with each other becomes an interesting decision as well. Is it message buses? Is it point-to-point communication? And when they are communicating with each other, when do you actually inject those messages in? Do you preemptively inject them, or do you wait for them to finish? How important is that? Where are these subagents really running? Is the code that they're writing going to conflict with each other, or is it going to be read-only? Or do you designate some of them as read-only and some of them as read-write? It opens up a whole can of worms. But it's also really exciting. I think there's a lot of research being done on this, and a lot of open source repos exploring this using Claude Code as the base. So I'm extremely excited about this direction and what we find.
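For the flavor of "agent as a tool," here is a minimal sketch in which the parent thread exposes a tool whose handler is simply another model call with its own system prompt and its own clean context. The tool name, schema, and prompts are illustrative assumptions, not Claude Code's implementation.

```python
# Sketch: a subagent exposed to the parent model as an ordinary tool.
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-20250514"  # illustrative model id

SUBAGENT_TOOL = {
    "name": "dispatch_subagent",
    "description": "Delegate a self-contained task to a focused subagent "
                   "and return its final answer.",
    "input_schema": {
        "type": "object",
        "properties": {
            "persona": {"type": "string"},  # e.g. "code reviewer"
            "task": {"type": "string"},
        },
        "required": ["persona", "task"],
    },
}

def run_subagent(persona: str, task: str) -> str:
    """The 'tool call' is just another conversation with its own context."""
    response = client.messages.create(
        model=MODEL,
        max_tokens=2048,
        system=f"You are a {persona}. Complete the task and report back.",
        messages=[{"role": "user", "content": task}],
    )
    return "".join(b.text for b in response.content if b.type == "text")
```

Note that the parent blocks until run_subagent returns, which is the "call into these tools and wait" shape described above; the masterless topologies in question would replace that blocking call with some form of message passing.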

Siddharth Bidasaria [00:38:23]: To be honest, it is unclear to me right now whether some of these more complex agent or subagent topologies lead to better results. And unless we have some form of evidence that a given agent topology works really well, I'm hesitant to add that complexity to the product. Because it's not just the complexity of the code that you're going to add, but also the complexity for users to understand how it works. Yeah. And, you know, I feel like a lot of agentic coding tools are geared more towards power users to begin with. Power users are great, and I think I'm a power user, so I build for myself all the time. But what's more exciting to me right now is this whole other spectrum of users who are just getting their feet wet with this. They're skeptics, or they don't quite believe in AI, and they're like, oh, I can write this better, it's wasting my time.

Siddharth Bidasaria [00:39:35]: There's all of that stuff. But they're not wrong. I mean, I think they're right for certain things. But I'd love to have subagents, and the future of subagents, cater more towards them and help them get a better outcome from the product.

Demetrios Brinkmann [00:39:55]: There's so much that you said there. And one thing that stands out to me in all of that: when you have agents going about, and you add that extra complexity of a swarm of agents, or a hive of agents that are flying around and doing their own thing and then communicating with each other, figuring out how and when and where you want them to communicate, or where you want to add extra context, is complex in itself. I instantly think about what's in the context windows. Maybe it's a little less so for coding agents, but if we were to extrapolate this out to agents across the web, the whole thing that I think about, as you're talking about these complex agent intertwinings and passing around a lot of context from one agent to the other and not really having a superior agent: even if you did have a superior agent that is overlooking all of it, you know, the Eye of Sauron agent, you're still going to have a really hard time identifying what is sensitive in all of this information that's being passed around. There's no protocol right now for "this is sensitive." And I'm not even saying, like, PII. I'm saying stuff that I wouldn't potentially want going around the Internet, you know, and it's not like my Social Security number.

Demetrios Brinkmann [00:41:35]: It might just be like my YouTube preferences.

Siddharth Bidasaria [00:41:39]: Yeah, no, I get it. I think you kind of nailed that one. I think there's two problems. Observability is really hard for complex subagent topologies. And as a result of that, how you do permission management becomes complex too. So, for example, in Claude Code, let's say you spin up three parallel subagents that are going off and doing their own thing for you. Anytime any of them encounters some form of tool that it doesn't have permission to run, it bubbles up and asks you, hey, can I use this tool? And then you go and say, yeah, sure, whatever.

Siddharth Bidasaria [00:42:29]: And then it goes off and does its thing. This is not scalable, right? It works because the subagent implementation we have is quite simple. But if you have more complex agent topologies, it doesn't scale. What is the answer? I don't know. I'm not sure. I think it's something where we're going to have to come up with a better way of dealing with this.

Siddharth Bidasaria [00:42:58]: And then permissions is one thing, but once you get permission to do something, what do you do with that, and who do you pass it on to? And how do you know that that person or that agent will do the right thing with it? So yeah, I think it's dynamic.

Demetrios Brinkmann [00:43:16]: Like, you can't give permission forever. It's permission right now, and then maybe for five minutes or maybe for an hour, but I'm going to revoke that permission because I don't want you to be able to always do that. And so that's a little bit weird too. I find permissions fascinating, especially, again, extrapolating it out to just regular agents. Sometimes I do want to tell the agent to go buy something, and I want to give it permission to buy it. But then sometimes I don't, and I definitely don't want it to buy the same thing five times. So if I give it permission once, it potentially has that permission to just keep buying.

Siddharth Bidasaria [00:44:02]: Yeah. I mean, oh, I found a cheaper deal on this Ferrari you wanted to buy. $500,000. Boom. Yeah, that would be bad. Yeah.

Demetrios Brinkmann [00:44:13]: Especially if it's the fourth one it just bought.

Siddharth Bidasaria [00:44:17]: One for each day of the week. I thought that's what you wanted. I'm just trying to help you out.

Demetrios Brinkmann [00:44:21]: Exactly. Oh, exactly, man. Yeah. You can get into a lot of trouble there. I'll use, like, Claude Code, Amp, Lovable. And what I noticed, going back to that permissions part, is whenever it comes up and asks me, hey, can I do this? I joke that I just constantly hit yes so that I don't have to review it, and I'm like, I guess I'll know later when it breaks. It's kind of like the new "looks good to me" on a pull request.

Demetrios Brinkmann [00:44:55]: That's really how I feel whenever I press yes without looking at anything, I'm just like, send it. We'll see what happens.

Siddharth Bidasaria [00:45:02]: Yeah, we've joked about having this little red button that we sell, or even just give out to our customers, that just says "You are absolutely right." And you just press that, and it auto-accepts permissions for you every single time.

Demetrios Brinkmann [00:45:16]: Everything just like take it all. I don't want to be bothered again.

Siddharth Bidasaria [00:45:19]: Yeah, it's like super YOLO mode.

Demetrios Brinkmann [00:45:22]: Yeah, that's Pieter Levels style, that is. But also, you were talking about something fascinating: at scale, I'm not going to be able to save any time if I have 12 subagents that are constantly asking me for permission. It's a full-time job trying to hit yes, even if I'm not even looking at what it's doing. God forbid I have to actually look through the code it generates, right? But even if I'm not looking, I'm still going to have to just constantly be pounding the yes button.

Siddharth Bidasaria [00:45:56]: Yeah, I mean, my take on this, at least right now, is that as the models get smarter, we'll be able to rely more on them to do these things. You know, the Eye of Sauron agent that you have is going to get really, really good at understanding user intent and providing dynamic permissions. That's the only way that I see this scaling, to be honest. I don't see how you can create a really complex web of dynamic settings and dynamic config and auth; it's going to have a breaking point. The more I'm working with LLMs, the more I'm realizing that if your scaffolding is really complex, you will not survive. So maybe it solves the need right now, and maybe it is useful for a few months, but looking to the future, I think it is going to be offloaded to some sort of model that's highly tuned to something like this.

Demetrios Brinkmann [00:47:07]: How often are you going to the researchers and just saying, like, watch me work real fast and I'll tell you what's painful, and then maybe you guys can figure out how to fix that on the model side?

Siddharth Bidasaria [00:47:20]: Yeah, I mean, our team is quite in touch with research in many ways. I think that's one of the advantages of working at a lab: you have that, and what you come up with then is blessed and backed by research. And also, it's a flywheel. It's not just one-way information coming in; what you learn from user behavior and what you learn from product usage feeds back into research and what the research priorities are going forward. So that's a flywheel that we're definitely tapping into, and we make it a point to tap into it as much as we can.

Demetrios Brinkmann [00:48:07]: I imagine you also have the boon of being able to see what's coming down the pipe, and then see, okay, this new model has whatever characteristic; can we leverage it in some way, shape or form with Claude Code?

Siddharth Bidasaria [00:48:27]: Yeah, that's definitely a big advantage. We have access to some of the preview models and things like that, and that helps set product strategy a little bit earlier than we would if we didn't have that.

Demetrios Brinkmann [00:48:47]: Do you, and this might be above your pay grade, so we'll have PR listen to all this and they may tell us to cut this question completely, but do you foresee a world, or is it already happening, where there are just so many resources going towards Claude Code, and coding models in general, that it's now like a gigantic model machine, in a way?

Siddharth Bidasaria [00:49:18]: I mean, I think Anthropic has leaned into coding for a few model releases now. That's not to say that there's not other stuff happening, but there definitely is a focus on coding, just because models are really good at.

Demetrios Brinkmann [00:49:32]: Coding. And I would argue the best. I think it's kind of unanimously known. People try other things, but they're like, it's not it. Claude Code is what you benchmark everything else against.

Siddharth Bidasaria [00:49:46]: Right? Yeah. I think we've been kind of fortunate. And of course, props to everyone who built these amazing models. But yeah, I think there's definitely a focus on it. I can't say much about whether there's trade-offs or things like that. I honestly don't even know. But I know that it is an area of focus, and there is a significant number of people who are looking into it.
