Vibe Coding Changed Forever
SPEAKERS

Beyang Liu is the CTO and Co-founder of Sourcegraph. Prior to Sourcegraph, Beyang was an engineer at Palantir Technologies building large-scale data analysis tools for Fortune 500 companies with large, complex codebases. Beyang studied computer science at Stanford, where he discovered his love for compilers and published some machine learning research as a member of the Stanford AI Lab.

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.
SUMMARY
Demetrios chats with Beyang Liu about Sourcegraph’s AMP, exploring how AI coding agents are reshaping development—from IDEs to natural language commands—boosting productivity, cutting costs, and redefining how developers work with code.
TRANSCRIPT
AI Conversations Powered by Prosus Group
Beyang Liu [00:00:00]: So our primary objective is to save human time. You saw a lot of interesting demo videos, but I would say nothing that really worked day to day. The more we played around with tool use models, the more we realized, hey, the assumptions here have changed around what the model is capable of. Users of AMP, which is our new coding agent, are generating 80, 90, 95% of their code. We do spend a lot of time thinking about how to nudge the user toward the right way of doing things. So one specific example of this is.
Demetrios Brinkmann [00:00:38]: When we talked last. What, right when you released a coding agent.
Beyang Liu [00:00:43]: I think that might have been shortly before or shortly after we released Cody.
Demetrios Brinkmann [00:00:48]: I think it was. Yeah, yeah, it was right around when Cody got released.
Beyang Liu [00:00:52]: So I think that was maybe like a little bit after the initial release of ChatGPT. Yeah, and so that was at least like one AI era ago.
Demetrios Brinkmann [00:01:02]: That's a good way of putting it.
Beyang Liu [00:01:04]: And so what has changed since then? I would say at that point the dominant modality of using LLMs for coding was still this kind of copilot autocomplete mode, where you type a couple of characters of code yourself and then you get a line completion. And at that point, what we were really excited about was the whole RAG model. So we figured out pretty early, like, hey, if you combine search and information retrieval and use that to fetch relevant code snippets in conjunction with a really high quality chat based LLM, it's a really powerful way of doing code generation and technical question answering.
Demetrios Brinkmann [00:01:41]: You guys were uniquely like well suited for that if I remember correctly.
Beyang Liu [00:01:45]: Because, yeah, so Sourcegraph, of course, our first product to market was a code search engine.
Demetrios Brinkmann [00:01:50]: And so it's like this makes a lot of sense.
Beyang Liu [00:01:52]: Yeah, so our bread and butter is helping developers read, understand, and search code. And it turns out that's still very, very useful in the AI era. But the big shift since then has been the emergence, shall we say, of coding agents. And really it's the tool use and reasoning models that have driven that, or enabled that, because a lot of people tried building agents in the era of chat LLMs, and you saw a lot of interesting demo videos, but I would say nothing that really worked day to day. But now we have really good agentic tool use models that are enabling essentially a new application paradigm that we call agents, which is taking the level of automated code generation to the next level. It's going from 30 to 50%, which is what we saw in the chat based LLM era, to users of AMP, which is our new coding agent, generating 80, 90, 95% of their code.
Demetrios Brinkmann [00:02:59]: Your theory is the IDE environment is already dead.
Beyang Liu [00:03:03]: Yeah, so I think this is a general theme, which is the application architecture that was made possible by chat LLMs. So think LLMs in the era of GPT-3.5 or 4, that sort of ChatGPT era. There was a very specific type of RAG application architecture that was ideal for that kind of model. And our coding assistant Cody followed that model, as did many of the other tools of that era. So the AI IDE, the VS Code fork, was kind of the pinnacle of that era, I would say. But what has basically happened with this new generation of LLMs is that they have unlocked this new capability: tool use plus reasoning. Those two together provide this agentic capability, and that in turn unlocks a new set of interactions at the application layer, such that a lot of the UX that application builders built for the old era of chat LLMs is now outdated, and in fact, I would say, in direct tension with the ideal UX of coding agents.
Demetrios Brinkmann [00:04:16]: All right, this is where it gets spicy. So then what does the new world look like?
Beyang Liu [00:04:21]: So the new world is much less manual context management and a lot less GUI chrome and different toggles. I think what we've seen with the chat based LLM application architecture, the RAG bot application architecture, is there's just so many toggles now. You've got to manually specify things through different rules files and different ways of tagging in relevant code snippets. There's a lot of UX chrome around managing what goes into the context window so that the LLM can do a single shot.
Demetrios Brinkmann [00:05:00]: With agents. Sorry to interrupt. The funny thing there is, if you know certain tricks, it performs better. And so there's some people that are doing it really well because they've played around with it and they've been able to tune the knobs.
Beyang Liu [00:05:18]: Yeah.
Demetrios Brinkmann [00:05:18]: And then there's others who are like, yeah, like kind of helps, I guess, but it also kind of messes up.
Beyang Liu [00:05:24]: Yeah, exactly. So there's this kind of strategy for getting the most out of a chat based LLM tool. And now a lot of that strategy has essentially gone out the window, because agents have the ability to use tools themselves, fetch context themselves. And so it's much less the human manually managing what goes into the context window, and more you describing at a higher level what you're trying to do and letting the agent go figure that out. The analogy I like to draw is it's roughly similar to the transition from the Yahoo era of the Internet to the Google era of the Internet. In the Yahoo era, what did Yahoo look like? It was a million hyperlinks, and the UX was: I'm clicking through a nested series of links to find what I'm looking for. So there's a lot of pointing and clicking and manual following of links, a lot of knowledge I have built up for, you know, what pages are good. When Google came along, they essentially took that whole UI and said: you don't need that anymore.
Beyang Liu [00:06:27]: Just type what you're looking for and we'll get you to the right thing. And I think agents, coding agents specifically, are very similar in that regard, wherein a lot of the strategies that people developed for getting the most out of chat LLMs, that's like the pointing and clicking in the Yahoo era; you no longer need to do that. There is a new skill set that you need to learn, which is not as trivial as using Google. It's more like how you prompt agents to do their work most effectively without active human intervention. But that's a very different skill set, often in direct tension with the skill set that people learned in the chat LLM era.
Demetrios Brinkmann [00:07:05]: With the GUI.
Beyang Liu [00:07:06]: Yes, yes.
Demetrios Brinkmann [00:07:07]: Interesting.
Beyang Liu [00:07:08]: Because what the GUI leads you to do is it really leads you to kind of like micromanage the LLM, which was necessary in the old world, but now with agents, you kind of want to just give it the appropriate context, the appropriate feedback loop and let it run.
Demetrios Brinkmann [00:07:22]: Ah, yeah. And you want to let it do what it does and almost give it that freedom, because if you're putting too much micromanagement on it.
Beyang Liu [00:07:34]: Yes.
Demetrios Brinkmann [00:07:34]: In a way it's not able to get what it wants done because it's feeling restricted.
Beyang Liu [00:07:40]: Yeah. And the other thing that happens when you micromanage an agent is that you as a human also get frustrated. A common failure mode that we see is people who have over-indexed on the Cursor way of doing things. They're like, I want to be in there and I want to direct it at every turn, because that's what that UI trains you to do: at every turn, I want to review the change before I apply it. But with agents, what you want to do is give it enough context to figure out for itself what the right things to search for are, what the right feedback loop is. Another analogy I like to draw is that in the previous generation, the AI was like a coding student. You had to be there to review every single little thing they did.
Beyang Liu [00:08:30]: And they're like, okay, you did this right? Now let's go apply what you did. Now it's gotten to the point where it's more like a professional engineer, maybe like junior or mid tier engineer, maybe even senior in some domains. But with an actual professional engineer, what you don't want to do is you don't want to babysit them. You don't want to be instructing them at every turn. Like, okay, now read this file. Now go do this change. It's more like, hey, here's the overall context. I think you should use this command to run the tests in this case or use playwright to take a screenshot because you're iterating against UI code.
Beyang Liu [00:09:08]: Here's the general shape of the feedback loop that you want to construct. Now go off and figure it out yourself.
Demetrios Brinkmann [00:09:14]: It is very much like a declarative way of doing things.
Beyang Liu [00:09:18]: Yeah, I would say it's less in the weeds. It's more like, let me articulate the key points at a high level, and I'm going to let you figure out the low level.
Demetrios Brinkmann [00:09:27]: And how have you seen the best folks being effective in getting it the context it needs?
Beyang Liu [00:09:39]: I would say there's a wide spectrum of how far people have been able to stretch the coding agent. So, the top 1%: we love those users, because in some sense they're discovering the future along with us. When we look at the AMP user base, a lot of the forward looking things that we build are directly targeted at emergent behaviors that we observe in the top 1% of users. These are people whose token consumption is 10x, in some cases 100x, what the median user's is. And there we observe a couple different things. One is there's an emergent set of strategies or tips for instructing the agent to get as far as possible. This is, what sort of details do I put in the upfront prompt to enable it to construct the right feedback loops, to search for context in the right places.
Beyang Liu [00:10:36]: And then the other thing we notice is more and more parallelization. So AMP is available both as an extension inside VS Code as well as a CLI. There's a lot of people who use the editor extension for more complex tasks, tasks that involve more complex chains where you want to remain in the driver's seat, so to speak. And then they'll use the CLI for parallelizing a bunch of shallower tasks. So you'll have a tmux window where they have, you know, three or four AMP CLI instances going on different shallower issues or bug fixes.
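The fan-out pattern described here can be sketched roughly like this. Since the transcript doesn't specify the actual CLI flags, a harmless `echo` stands in for the real amp invocation, which keeps the sketch runnable:

```python
# Rough sketch of the parallel-CLI pattern: fan several shallow tasks
# out to independent processes and collect their output. A real setup
# would launch one amp CLI instance per task (e.g. in tmux panes);
# here `echo` stands in so the sketch stays runnable.
import subprocess

def fan_out(tasks):
    """Start one process per task, then gather each one's output."""
    procs = [
        subprocess.Popen(["echo", task], stdout=subprocess.PIPE, text=True)
        for task in tasks  # all processes run concurrently
    ]
    return [proc.communicate()[0].strip() for proc in procs]
```

The point of the pattern is that the shallow tasks don't depend on each other, so waiting on them in any order works.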
Demetrios Brinkmann [00:11:16]: That is so cool, dude. That is so wild to see.
Beyang Liu [00:11:19]: It's crazy.
Demetrios Brinkmann [00:11:20]: And how are you? So I guess you're just doing product feedback with all of these power users.
Beyang Liu [00:11:28]: So we've hired a good number of them, I like to say. Yeah, it's a great source of, you know, forward thinking devs, some of whom we've welcomed to the AMP core team. And we also talk to them a lot. We interact over social media channels, we have a Discord, we hop on phone calls. But it's great to talk to that set of users, because in some sense we're so early right now. People talk about AI as if it were one monolithic block, as if the wave were one giant wave since ChatGPT, but it's actually a succession of multiple waves, I would say. And we are so early in the agentic model era that a lot of our product development process is really partnering, sitting down with our power users and discovering alongside them what the possibilities are.
Demetrios Brinkmann [00:12:27]: Is that how AMP came to be? Because you saw that folks were clicking around too much and you realized maybe this isn't the best UI and UX that we can have.
Beyang Liu [00:12:39]: That was a big part of what motivated us to build something from the ground up. So we had Cody, which is an assistant that was really good in the chat LLM era. But the more we played around with tool use models like Sonnet 3.7 and now Claude 4, the more we realized, hey, the assumptions here have changed around what the model is capable of. And if you're holding it properly, you can actually get a lot more out of it than you could in the previous era. The problem is that a lot of the old UI paradigms are actively working against you getting the most out of coding agents. This is something that we realized in using it ourselves heavily and also in talking with a lot of our power users. In fact, one of the folks that we hired, a guy by the name of Geoff Huntley, he was at Canva at the time, actually wrote a blog post about how he thought most people were using AI coding tools incorrectly, because they were still using them like, you know, Google search, or in a very chat based paradigm. And we brought him onto the team because we're like, this is a guy that gets it and really understands, hey, you should be instructing these things.
Beyang Liu [00:13:59]: You should be. It's almost like you're programming them through natural language, if that makes sense. You're articulating a set of very precise instructions, in much the same way that you would articulate those instructions to a smart but still junior engineer. So you're giving a lot of context up front and you're allowing them to get much further on their own.
Demetrios Brinkmann [00:14:27]: And how about the idea of just validating when code is working or not?
Beyang Liu [00:14:33]: The beauty of agents is that they have this built in ability to construct these feedback loops. When you're using AMP, for instance, as it's generating code for you, it will seek out an appropriate feedback loop as part of that code generation process. So if you're doing frontend code, it can use a tool like Playwright, for instance, to screenshot the frontend of the application as it's working. So if you say, hey, go make this background red or green or blue, it can actually take a screenshot and verify whether a change it made to the code had the intended effect. Similarly, for backend code, it might be a unit test suite or some other command line invocation that it can use to validate whether something it did was correct, in much the same way that you as a human developer would seek out these feedback loops, right? It's like read, evaluate, print, that sort of core loop. That's what agents are good at figuring out. In some cases they need a little nudging, just as humans need a little nudging or some pointers in some cases. But by and large, if you can get the agent to figure out that feedback loop with very high confidence, it will iterate to something that is mostly correct.
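The verify-and-retry loop described here can be sketched as follows. `generate_patch` and `validate` are hypothetical stand-ins for the model call and the check (a test run, a screenshot diff, a compile), not AMP's actual internals:

```python
# Minimal sketch of an agent feedback loop: generate a change, validate
# it against some checker (tests, a screenshot diff, a compiler), and
# feed the failure back into the next attempt.

def run_feedback_loop(generate_patch, validate, max_iters=5):
    """Iterate until the validator accepts the patch or attempts run out."""
    feedback = None
    for attempt in range(1, max_iters + 1):
        patch = generate_patch(feedback)   # e.g. an LLM edit informed by the last failure
        ok, feedback = validate(patch)     # e.g. run the unit test suite
        if ok:
            return patch, attempt          # converged: return the accepted patch
    return None, max_iters                 # gave up; a human should step in
```

The design choice worth noting is that the validator's error message becomes the next prompt's context, which is what lets the loop run without a human in the middle.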
Demetrios Brinkmann [00:15:58]: Why do you feel like coding agents had this breakout success and were uniquely positioned for such a lift with LLMs and the whole AI revolution?
Beyang Liu [00:16:11]: I think the answer to that question comes down to the immediately preceding question, which is: how do you validate something is correct? And coding is one of those domains where you have a very strong validator in the form of a compiler or unit test runner. And because you have that validation point, you essentially have a very reliable way to generate high quality synthetic data. So model evolution is ultimately a data game. And there's two ways to acquire data. You can either collect it from the wild, or you can create a synthetic learning environment in which you place your kind of robot or agent and allow it to do stuff, with feedback about what's good and what's bad. Sort of like a reinforcement learning environment. And I think at this point we've exhausted the publicly available large corpora of data.
Beyang Liu [00:17:14]: So, you know, those sources of data are largely played through. But coding is one of those domains where you can create a simulation environment with unit tests.
Demetrios Brinkmann [00:17:25]: And are you guys doing that?
Beyang Liu [00:17:28]: To a certain extent. We don't do foundation model training as of yet, but for certain targeted use cases, we do that sort of validation and training.
Demetrios Brinkmann [00:17:44]: I've just been hearing about how more and more people are doing simulations. It's more common to do that just to figure out where you have strong capabilities and where you maybe are failing, silently sometimes, even.
Beyang Liu [00:17:59]: Yeah, essentially what you're doing is designing a game that approximates what you want in real life. In all the domains where AI has gotten really good, think playing chess or some other form of game, it's because you have this feedback mechanism that tells you, hey, you're winning or you're losing. As long as you have that feedback mechanism, you can turn it into a reliable source of training data. Because essentially what you do is take your model at a given snapshot and just run it. You say, go play the game. You simulate the game, and based on the moves the model takes, you say, okay, plus points or minus points. And that's essentially what you're doing in these coding reinforcement learning environments when you say, oh, compiler error, or oh, unit test failure.
Demetrios Brinkmann [00:18:48]: I like that way of looking at it. You just sit around all day thinking of analogies, huh?
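The plus-points/minus-points idea can be written down in toy form. The reward shape here (hard zero on a compile failure, otherwise the fraction of tests passed) is an illustrative choice, not a description of any production training setup:

```python
# Toy reward function for a coding RL environment: the compiler and
# test suite act as the "game score" for a candidate patch.

def reward(compile_ok: bool, tests_passed: int, tests_total: int) -> float:
    """Hard zero if the patch doesn't compile; else fraction of tests passing."""
    if not compile_ok:
        return 0.0
    return tests_passed / tests_total

def rank_rollouts(rollouts):
    """Sort candidate patches best-first so high-reward rollouts can be
    kept as training signal."""
    return sorted(rollouts, key=lambda r: reward(*r["result"]), reverse=True)
```

The key property is that the score comes from running the code, not from human labeling, which is what makes the synthetic-data loop cheap to scale.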
Beyang Liu [00:18:55]: Yeah.
Demetrios Brinkmann [00:18:57]: The thing that I'm also wondering is, it feels like we had a big jump from these RAG chatbots, and the way that we were copilot-style writing code, to then, all right, we're in the IDE and we're doing this almost click ops type of stuff, and very micromanaging.
Beyang Liu [00:19:17]: Yeah.
Demetrios Brinkmann [00:19:18]: Now you're saying we've got a whole new era that's being born with AMP, and how you're giving it this context, as much context as possible, and then letting it do its thing. Is that the last era, or do you feel like there's another one that you want to get to? It's just not yet possible, or it's in the works?
Beyang Liu [00:19:38]: I don't think we're in the final era. I think things will continue to evolve. So, you know, one of the things we're doing is thinking about how to combine multiple models effectively in this new agentic paradigm. In the old world, the name of the game was simple RAG. Every AI coding assistant had a model selector where, whatever model you want to use, you can use that, and then we'll just fetch the relevant snippets, put them in the context window, and generate the response. I think in the agentic era you have to be much more thoughtful about the models that you use. So we use one model for the core tool use and agentic driving of AMP, and we just shipped a feature that allows you to use another model, o3 actually, for in depth reasoning. Because it turns out there are certain types of nuanced problems you might want to tackle where these reasoning heavy models can do a lot better than the models that were trained primarily for agentic tool use. So that's one way in which the paradigm continues to evolve.
Beyang Liu [00:20:46]: It's now moving beyond just simple agents to maybe like reasoning agents or agents that can use different types of models to do more things in depth.
Demetrios Brinkmann [00:20:57]: And you want to abstract that away from the user or you want to have it that every time I go and I give a task to an agent, I can say here's your three or four models you can choose from. You figure it out.
Beyang Liu [00:21:10]: I would say we want to enable kind of a spectrum of use. So for the first time user, you know, you don't have to know that we have this. The tool is called Oracle, because o3 is such a powerful reasoning model; it's like talking to an oracle of sorts. And we don't want that to be a prerequisite to being able to use AMP. So if you don't know about what tools AMP has access to, you don't need to. It will just select what it thinks is the best tool for the job.
Beyang Liu [00:21:43]: But at the same time, instructing coding agents, in our view, is a pretty high ceiling skill set. You can get good at coding agents in the same way that you get good at your editor of choice or your programming language of choice. And for our power users, we do see prompting or query patterns where they're saying, okay, I want you to use the Oracle in this case, because this is a bit of a hairy problem, it's more nuanced, I want some more in depth thinking. So there is some exposure. But it's not at the point where it's like, okay, decide what LLM you want to use for this case; that is now an implementation detail. It was best practice to expose that to the end user in the chat LLM era, but now I think it's not even needed.
Demetrios Brinkmann [00:22:31]: Yeah, it's funny. Are there any other anti patterns that you're starting to see, that maybe surprised you?
Beyang Liu [00:22:38]: The number one anti pattern is people trying to use coding agents in just the same way that they used chat based coding assistants. And I would say those anti patterns largely fall under this umbrella: in the chat based world, the human had to be in the inner loop of the back and forth between you and the model, out of necessity. Right? Because each model invocation was a roll of the dice. And in the chat based world, the probability of it just working was probably lower than 50%. More likely than not it would make some subtle bug, and it had no way of correcting itself, because it couldn't iterate against feedback, it couldn't use tools, and so it couldn't fix its own mistakes. And so as a consequence you wanted to be in the loop, so to speak, as a human, to constantly course correct it. With agents, it now has the ability to gather that feedback on its own. And so if you instruct it properly, in many cases the fidelity you get from a single model invocation, like a single file edit or a single bash command that it runs, is closer to 90, 95, 99%.
Beyang Liu [00:23:59]: So you can get out of the way much more if you use it properly, and it can do more for you. But that almost requires an active rejection of a lot of the best practices that people learned in the chat based LLM era. So it's almost ironic: some of the people who are struggling the most to use coding agents effectively were the ones who early adopted chat based coding tools.
Demetrios Brinkmann [00:24:31]: It's a little microcosm of the macrocosm. Yeah, just let go. Trust the process, man, it's gonna work out.
Beyang Liu [00:24:40]: Yeah.
Demetrios Brinkmann [00:24:42]: Oh, that's hilarious. So I had a question about that. I can't remember now. It's not coming to me, because I was thinking about trusting the process and not the... Hold on, this is the power of post production.
Beyang Liu [00:24:54]: We can trust the process.
Demetrios Brinkmann [00:24:56]: Yeah, trust. Let me trust the process of my question asking capabilities. So you mentioned the different power users and how they're parallelizing different things, and that seems to be one very advanced way of doing it. I wonder if you've found nice tricks other than that, that maybe aren't the power user tricks, but are just in the way that you're prompting or asking the agent to do things. My mind instantly goes back to the early days of ChatGPT, when we started asking it, think step by step on this, and everybody was like, whoa, it's so much better when you do that. Have you found any of those almost prompt tricks, or maybe there's other tricks that aren't even in the prompt or in the way that you're asking?
Beyang Liu [00:25:45]: Yeah. So by and large, the best way to discover these things is really through experience. I will do my best to tell them to you in the moment, but it's no substitute for actually using it and building the intuition. But in my experience, there are kind of three buckets of prompting tricks. Number one is what I would call context hints. Number two would be feedback loops. And number three is kind of like structured approaches: planning. So the first bucket is really about helping the agent figure out what tools it needs to invoke, or essentially where to look for the relevant context.
Beyang Liu [00:26:35]: So especially in large code bases, oftentimes it can be a little bit tricky, even as a human, to find the exact spot that's relevant to a particular task. And with agentic LLMs, the context windows these days are much larger than they used to be. I think Sonnet 4 has 200,000 tokens total; Gemini now has a million. So the context windows are larger, but they're still finite. What that means is the more information you give to the model about where to look, the fewer tokens it has to expend finding the general vicinity of what's going to be relevant. So the more hints you can give, like, oh, look in this part of the code base, or I think it's under these directories, or maybe use this tool to fetch the context, that can help a lot.
Beyang Liu [00:27:39]: The second thing is feedback loops. I was telling you before about feedback loops and how they're critically important. These are essentially like, hey, use this tool or this command to validate your approach. Oftentimes it will infer the appropriate tool to use on its own, but in cases where it's not trivial to figure that out, again, you can save on the main context window by nudging it in that direction. And then the third approach is just adding structure to the overall approach it uses to solve the problem. In the simplest form, this is just, hey, before you go do this large and complex task I'm about to give you, first write out a plan of steps, and maybe even let me as a human review that set of steps so I can ensure they're correct. And then, more and more, we've built additional features into the product where knowledge of those features can help.
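The three buckets can be combined into a single upfront prompt. This helper and its field wording are invented for illustration; it is not an AMP feature, just a way to make the buckets concrete:

```python
# Illustrative assembly of an agent prompt from the three buckets:
# context hints, a feedback loop, and a plan-first structure.

def build_agent_prompt(task, context_hints, feedback_cmd):
    """Front-load everything the agent needs to run without babysitting."""
    return "\n".join([
        f"Task: {task}",
        "Likely relevant locations: " + "; ".join(context_hints),     # bucket 1: context hints
        f"Validate every change by running: {feedback_cmd}",          # bucket 2: feedback loop
        "Before editing, write out a step-by-step plan for review.",  # bucket 3: structure
    ])
```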
Beyang Liu [00:28:41]: So one of the things that we've shipped, well, I guess not so recently now, it was like a month and a half ago, so it's like eons ago in AI land: we shipped subagents. Subagents are essentially, as the name suggests, agents within an agent. The main agent can invoke a subagent to go do a subtask, like searching the code base or implementing a feature in one part of the code base. And so having knowledge of what subagents are good at, and nudging the main agent to use them where appropriate, can help you conserve the context window. Because the beauty of subagents is, once they complete their subtask, the tokens they used, their context window, gets garbage collected. They don't use up the context window of the main agent. So that allows you to get further in complex tasks, because you're essentially chaining together these subagent calls that don't eat into your overall token budget in the main agent. Does that make sense?
Demetrios Brinkmann [00:29:48]: Yeah, I've heard that described as agents as tools. That's like the hot buzzword these days. It's like, oh, agents as tools. It's coming.
Beyang Liu [00:29:57]: And yeah, everything is a tool, and tools are just function calls at the end of the day.
Demetrios Brinkmann [00:30:02]: Yeah, I was laughing with a friend because I was saying, you know, even humans are tools at the end of the day. When you're asking for the human to give you the feedback, it's like, invoke the human tool.
Beyang Liu [00:30:12]: Yes, yes. It's like, is the agent the tool, or am I the tool for the agent to get its job done? Sometimes the line blurs a little bit.
Demetrios Brinkmann [00:30:20]: Yeah, yeah. You just broke down really well how to conserve that context window, in a way so that you're not using it all: A, because it's finite and maybe you don't have the ability to throw everything at it, but B, it's really good for the cost and keeping the cost lower. I can imagine when you're looking at agents and folks that are using agents, there's probably two lenses that you look from. The consumer is trying to keep their costs low, but they're interfacing with Sourcegraph, in a way, and AMP. And so, yeah, you also have to be wary of cost and passing on the right cost to the users, and the pricing and all of that fun stuff.
Beyang Liu [00:31:10]: Yeah.
Demetrios Brinkmann [00:31:11]: How are you looking at all of these different costs? And how do you feel? We've all heard this idea that, oh well, LLM calls are basically going to zero, right? And so it's just getting cheaper and cheaper. But now, if you're talking about agents using subagents that are doing super complex tasks, yeah, it still could be like 50 cents for a task to get done, or maybe even five bucks, who knows?
Beyang Liu [00:31:40]: So our primary objective is to save human time, because that is still the most valuable resource by a huge margin. And so one of the core principles we've adopted is essentially to not worry too much about keeping the cost of the agent super, super cheap. So, you know, agentic coding tools look expensive relative to chat based coding tools. Your average token spend is growing from, on average, 10 to $20 per user per month to in the hundreds, in some cases in the thousands of dollars per user per month. When we talk about the top 1% of users who are really redlining what the model can do and what the coding agent can do, oftentimes they're pushing into thousands of dollars per month territory. And so that looks expensive to people. But if you look at the amount of human labor that's being saved, given how productive people are with these tools, it's like a no brainer trade off. And I think a lot of other tools over index on, hey, how can we keep the cost low compared to the chat based LLM era? And I think that's a very poor trade off to make, because it's kind of like saying, hey, I have this magic tool that can save a human hours per day.
Beyang Liu [00:33:18]: Your time is super valuable. Human developer time is still by far and away one of the most precious resources within your engineering org. And even if you're spending $1,000 per month, that comes out to what, like 30, 40, $50 per day? How many minutes of human developer time does that translate to saving? And so I think what you're going to see is more and more people start to have this realization over time: oh, I shouldn't be thinking about the baseline cost of this, because really the upside is far greater. The amount of additional productivity and the amount of additional feature velocity that I can unlock in my engineering team with coding agents is far greater than the cost that I will be paying for them. So right now you still see CFOs and people in accounting and the finance department being like, oh, it's difficult to forecast. But I think it's just going to play out over time where the companies that trust the opinions of their developers and encourage people to make the most out of coding agents are just going to move much quicker, and over time the market will reward the people that prioritize developer productivity, essentially.
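The back-of-the-envelope arithmetic here can be sketched like this (all figures are illustrative assumptions, not AMP's actual pricing or any real salary data):

```python
# Rough break-even sketch: agent spend vs. developer time saved.
# Every number below is an assumption for illustration only.
agent_cost_per_month = 1000.0   # a heavy "top 1%" user, in dollars
working_days_per_month = 22

cost_per_day = agent_cost_per_month / working_days_per_month  # roughly $45/day

dev_cost_per_hour = 100.0       # assumed fully loaded developer cost

# Minutes of developer time the agent must save per day to break even:
break_even_minutes = cost_per_day / dev_cost_per_hour * 60

print(f"${cost_per_day:.2f}/day -> break even at ~{break_even_minutes:.0f} min/day saved")
```

Under these assumptions the agent pays for itself if it saves under half an hour of developer time per working day, which is the trade-off Beyang is describing.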
Demetrios Brinkmann [00:34:38]: Yeah. Do you feel like we are not going to be doing much of the... actually, so I've played around with a lot of the different tools, and I feel like there is something that's happening right now where I no longer want to click around to get things done. I also don't even want to code to get things done. I just want to say it, and then it goes and does it for me. That is very easy to do with almost any coding-tool-type thing.
Beyang Liu [00:35:19]: Yeah.
Demetrios Brinkmann [00:35:19]: Like if I say, hey, connect my website to this database. Great, that should be possible.
Beyang Liu [00:35:27]: Yeah.
Demetrios Brinkmann [00:35:28]: I don't know if there is a possibility for the world to look like this in the future, but I would love it if I could do that with any application. Yeah. I no longer want to have to write things out or click around to get things done. And I almost feel like we are getting spoiled, in a way, with the software being able to do it when you're coding or when you're creating your application with whatever. But are we going to be able to do that in Jira and Confluence at some point? Are we going to be able to do that with just, like, voice next? Right? You've got to think it's going there.
Beyang Liu [00:36:17]: So I saw a tweet the other day. It was something to the effect of: the GUI, the graphical user interface, was a blip in between command lines and agents. Which, I think, rang very true, in the sense that it's exactly what you said. I think what we think of as a software application today is going to look much different a few short years from now. It's not that I think visual interfaces are going to go away entirely. Precisely what I think is going to happen is that the primary input modality to computers, or to software applications, is going to shift from graphical modes of input, like pointing and clicking, to more textual forms of input, like typing or speaking. Now, the output, what you get back from the computer, might still be visual. It's like, I type in what I want, and then show me the results.
Demetrios Brinkmann [00:37:23]: Maybe like Airbnb listings with photos. Yeah, exactly. Certain things that chat cannot describe.
Beyang Liu [00:37:30]: Yeah, exactly. Like, I don't want to read all that text, just show me a picture. A picture is worth a thousand words. But in terms of articulating what I want out of the application, I just think of it in terms of bit rate, right? What's the bit rate of these input modalities? Pointing and clicking is a very low bit rate, a couple bits per second at most, because it takes time for you to drag your mouse cursor to the right button and then click. It's a very primitive form of communication; monkeys and apes do that, right? Humans, we have this innovation called language, and language is beautiful because it's, relatively speaking, a high-bit-rate form of communicating our intent. And now computers can actually understand language and also translate that language into a series of actions that actually perform what you want them to do.
Beyang Liu [00:38:24]: And so, to your point, I think a lot of the application input experience is going to shift toward textual or voice-driven forms of articulating what you as a user want.
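Beyang's bit-rate framing can be made concrete with a rough sketch. All the numbers below are ballpark assumptions for illustration, not measured figures:

```python
# Rough information-rate comparison of input modalities.
# Rates and choice counts are illustrative assumptions only.
import math

def bits_per_second(choices_per_action, actions_per_second):
    """Information rate if each action picks one of N roughly equally likely options."""
    return math.log2(choices_per_action) * actions_per_second

# Pointing and clicking: say one click every 2 seconds, choosing
# among ~30 clickable targets on screen.
clicking = bits_per_second(30, 0.5)

# Typing: ~60 words/min, ~5 chars/word, ~2 bits of real information per
# character (English text is highly redundant).
typing = 60 * 5 / 60 * 2

# Speech: ~150 words/min at similar per-character information content.
speech = 150 * 5 / 60 * 2

print(f"clicking ~{clicking:.1f} b/s, typing ~{typing:.0f} b/s, speech ~{speech:.0f} b/s")
```

Even with generous assumptions for the mouse, language-based input comes out roughly an order of magnitude higher, which is the intuition behind the "couple bits per second" remark.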
Demetrios Brinkmann [00:38:38]: So what are some gnarly things that you encountered while building out AMP? Maybe on the infrastructure side, not necessarily the users and the evals and all of that stuff, but stuff where, as you're building it out, you're like, oh shit, man, I didn't think it was going to be this hard.
Beyang Liu [00:38:57]: Yeah, there's a lot of trickiness and nuance to designing the user experience, and I think it really requires thinking from first principles about what you're after. So we've given a lot of thought to how to bake in the appropriate feedback loops, how to help it get out of common failure modes where it loops and tries the same thing over and over again. That's a common issue with a lot of agents. How to conserve the context window, we've given a lot of thought to that. Where to use the appropriate models, like which models are best for which task. But these are very different questions than the questions that we traditionally asked around UX design, because traditional UX design is very visual.
Beyang Liu [00:39:48]: It's like, you know, how do I lay out the button panel, so to speak. Right.
Demetrios Brinkmann [00:39:52]: What color should the button be?
Beyang Liu [00:39:54]: Yeah, exactly.
Demetrios Brinkmann [00:39:55]: Whereas... I remember reading a blog post on how people look at web pages, and there's, like, the F shape of where their eyes go and the attention goes.
Beyang Liu [00:40:05]: Yep. And so, you know, AMP doesn't have any of that, really. The input interface is very simple. It's just a text box: write what you want. Or in some cases people like to do the voice mode, so they'll use the macOS voice input to speak to the agent, and that's the primary way of getting what they want. But it's a very different set of questions. So you can't rely on the rules of thumb that people developed in the point-and-click GUI world.
Beyang Liu [00:40:41]: You really have to think from the ground up: okay, as a user, what do I actually want? How do we want to guide the user to this behavior that's new in this new paradigm, that unlocks the capability but still feels familiar enough? Fortunately, developers are accustomed to using command-driven interfaces a bit more than the average computer user, so that helps. But we do spend a lot of time thinking about how to nudge the user toward the right way of doing things. So one specific example of this: AMP is maybe one of the only, I think we're one of the first, agents where typing Enter in the input box does not submit the query. Typing Enter just introduces a new line. You have to hit Command-Enter to actually submit your request to the agent. And the reason we did that was it was a subtle nudge to encourage users to create longer prompts.
Beyang Liu [00:41:46]: Because the more information you give the agent, the more reliable it becomes and the more it can do for you. And so that was a subtle nudge to users to say: look, don't stop here. This is not Google Search. Don't type four keywords and then expect it to read your mind. In many cases, getting to what you want, especially if what you're trying to do is more out-of-band or unique, you actually have to give it the information, because it's not a mind reader. The information has to come from somewhere.
Beyang Liu [00:42:19]: Either it's baked into the priors it learned during training, or it's going to be embedded in the words and tokens that you give it.
Demetrios Brinkmann [00:42:29]: Right. Now, did you think about adding certain shortcuts or hotkeys? I've seen this done where you have the prompt box, but you also have little boxes underneath that you can click on, where it's like, here are some common workflows or some common questions, that type of thing.
Beyang Liu [00:42:48]: So I think this is one of the things that we did very differently. It was sort of a contrarian take. We actively didn't want to add additional toggles or modalities at the bottom because, number one, that's mental overhead, and it makes it so you have to point and click again, which is, I think, what we're now...
Demetrios Brinkmann [00:43:08]: Trying to get away from.
Beyang Liu [00:43:09]: Yeah, we're trying to get away from that. It's the age of agents. You should just describe what you want and be able to get what you're looking for. The second thing is that with a lot of other applications that do this, the more toggles and switches you add, the more you exponentially complexify the interface surface area of your application. Other coding tools, for instance, have an ask mode or an agent mode or a planning mode. If you introduce a toggle that has three different modes, now you essentially have three different mini product experiences that are all very different from one another. If you introduce another toggle, again with three modes, now it's three by three, because you have this combinatorial question: okay, what if I choose option one from the first toggle and option three from the second? Now there are nine possibilities for what the product experience looks like.
Beyang Liu [00:44:09]: And it's very hard to maintain a high quality product experience if all your users are using essentially different products, right? Different product experiences, because they all have some different configuration that they're using.
Demetrios Brinkmann [00:44:22]: And how about the times where you want to zoom in on specific things? Because I know, if you mess around with Lovable, they have the bullseye, and you can click on that and it's this new mode where, all right, it's got the specific parts of the web page that I can click into and then prompt it so that it changes that. Maybe you thought, well, we don't need this, we're just going to have the user put that into the chatbot.
Beyang Liu [00:44:49]: So if there's some capability or some behavior that is more specific to a particular use case, the way we like to do it is to put that into the form of a tool that the agent can use, and then enable the agent to invoke that tool in the right situations, or enable the user to nudge the agent to use that particular tool in the right situation. Like, hey, go use Playwright for iterating on this UX. And the reason why we think that is better than modalities is that with tools, it's more like linear growth in complexity rather than combinatorial growth in complexity. Each additional tool that gets added is just another tool that can be used. It's O(N) in terms of complexity.
Beyang Liu [00:45:50]: Whereas if you added a new modality, now it's like M by N by K. You get this kind of exponential blow-up in terms of the search space.
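The modes-multiply versus tools-add argument can be sketched in a few lines (the specific toggle counts are just the examples from the conversation):

```python
# Modes multiply: every combination of toggle settings is a distinct
# product experience that has to be designed and tested.
from math import prod

def mode_configurations(toggle_sizes):
    """Each toggle contributes a multiplicative factor to the surface area."""
    return prod(toggle_sizes)

# Tools add: each new tool is one more entry in a flat registry.
def tool_configurations(num_tools):
    return num_tools

two_toggles = mode_configurations([3, 3])       # two 3-way toggles -> 9 experiences
three_toggles = mode_configurations([3, 3, 2])  # add one binary toggle -> doubles to 18
many_tools = tool_configurations(18)            # 18 tools is still just 18 tools

print(two_toggles, three_toggles, many_tools)
```

The point of the sketch: adding one more toggle multiplies the number of product experiences to maintain, while adding one more tool only increments it.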
Demetrios Brinkmann [00:45:58]: And so you have tools that are standard, off the shelf, for the users to use, and there's a description and documentation about them.
Beyang Liu [00:46:05]: Yep.
Demetrios Brinkmann [00:46:06]: But then users can also bring their own tools, I imagine.
Beyang Liu [00:46:09]: Yes. So we have three kinds of tools. There are the built-in tools that are, as the name suggests, built in: basic things like reading and writing files and executing Bash commands, as well as more advanced built-in tools like the Oracle, which uses different models for advanced reasoning and other use cases. There are what we call connections, which are tools that call out to third-party APIs. So think about bringing in your issue tracker or bringing in your observability tool, pulling in additional context from those sources. And then the third type of tool is MCP servers. MCP is, you know, by now everywhere, right?
Beyang Liu [00:46:50]: And so the beauty of that is that you have all these different tool builders out there that have built MCP servers that front their application or their service, which can then be pulled in to AMP.
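One way to picture the three tool categories is a single flat registry behind one shared interface. This is a hypothetical sketch with made-up names, not AMP's actual internals:

```python
# Hypothetical sketch: built-in tools, connections, and MCP-provided
# tools all share one interface, so the agent sees a flat list
# regardless of where each tool comes from.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    kind: str          # "builtin" | "connection" | "mcp"
    description: str   # the agent reads this to decide when to invoke it
    run: Callable[[str], str]

registry = [
    Tool("read_file", "builtin", "Read a file from the workspace",
         lambda p: f"<contents of {p}>"),
    Tool("bash", "builtin", "Execute a shell command",
         lambda c: f"<output of {c}>"),
    Tool("issue_tracker", "connection", "Fetch an issue from a third-party tracker",
         lambda i: f"<issue {i}>"),
    Tool("acme_internal", "mcp", "Call a customer's own MCP server",
         lambda q: f"<mcp result for {q}>"),
]

# Adding a tool of any kind is just one more append; the interface
# the agent reasons over does not change.
builtin_names = [t.name for t in registry if t.kind == "builtin"]
print(builtin_names)
```

This flat-list shape is what keeps tool growth linear: a customer's own MCP server slots in next to the built-ins without introducing a new mode.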
Demetrios Brinkmann [00:47:05]: How are you thinking about the tools that you support and are natively giving to users, versus the other two?
Beyang Liu [00:47:15]: Yeah, so I would say all the tools that are necessary for the day-to-day core inner loop of software development, where you're in the code, you're reading and understanding the code, you're writing code, that's our bread and butter. So we like to bake those in as really good first-class tooling experiences. And then there's also a second wave of third-party tools that are just so common that we also want to make sure those integrate well. And that's why we have these connections, these first-party connections to third-party services, bringing in things like Linear or GitHub or Sentry. The way we bring these tools in, the description of the tool and the set of arguments and how to invoke them, is very difficult to abstract fully. It's not like the same set of tools that works really well in a coding agent works just as well in, I don't know, a generic enterprise knowledge-retrieval agent or whatnot. And so we try to refine the tool definitions for those.
Beyang Liu [00:48:27]: But then we also recognize that there's going to be a long tail of things that people want to integrate, especially because we serve a lot of the Fortune 500. And each Fortune 500 codebase is like its own special environment, with all its different internal tools and unique combinations of external tools. And so we also want to enable our users and customers to build tool providers, MCP servers, that connect out to the unique tooling environment within their own company and bring that into AMP as well.