MLOps Community

Coding with AI // Chip Huyen

Posted Nov 21, 2025 | Views 214
# Agents in Production
# Prosus AI
# AI-assisted Coding

Speaker

Chip Huyen
Researcher @ Tep Studio

Chip Huyen runs Tep Studio at the intersection of AI, education, and storytelling. Previously, she was with Snorkel AI and NVIDIA, founded an AI infrastructure startup (acquired), and taught Machine Learning Systems Design at Stanford.

She was a core developer of NeMo, NVIDIA’s generative AI framework.

Her first English book, Designing Machine Learning Systems (2022), is an Amazon bestseller in AI and has been translated into 10+ languages. Her new book, AI Engineering (2025), has been the most-read book on the O’Reilly platform since its launch.


SUMMARY

This talk covers an overview of AI coding tools and different levels of coding automation. It also discusses workflow patterns that have emerged and how they will change over time.


TRANSCRIPT

Chip Huyen [00:00:03]: Okay, so today is an agentic event, right? I think one of the biggest use cases for AI agents is coding agents, and I'm extremely excited about them. I feel like I'm one of those people serious engineers hate, because I say I can't really code without AI anymore, and I want to talk about that. This is a bit tricky, because I usually try to give my talks in front of an audience: one of the fun parts about AI coding agents is that we're still learning about them, and we still want a lot of interaction and discussion about what's going on. So I like to get feedback from the audience, and since I can't see anyone here, if you have any feedback or thoughts about the process, feel free to send me an email, or reach out on my website, LinkedIn, or Twitter.

Chip Huyen [00:00:49]: Okay, so today I'm going to talk about coding with AI. When I ask at a tech conference who has been using AI for coding, almost everyone is. And from the hiring-manager perspective, someone recently told me that if he's interviewing a candidate for a software engineering role and that person has not experimented with AI coding, he would consider that a red flag, because it shows the person is not eager or willing to adopt new technology. When I asked this question online, some people said that not everyone has the opportunity to try a new tool, and that's totally reasonable. But regardless of your view on AI for coding, I would say that AI for coding is everywhere, and if you have not experimented with it, that might actually hurt you more than help you. For AI coding there are many different types of tooling and interfaces. The most popular one, or the earliest one, is probably IDE-based: code editors like VS Code or Cursor, where you can do autocompletion and so on.

Chip Huyen [00:02:07]: Then there's the next generation of AI coding tools that I really love: CLI-based, in the terminal. You open the terminal, type into Claude Code or Codex, and it can help you do stuff. Another interface that I was really excited about, but haven't seen a lot of use of, is GitHub-based. The idea is that you have a GitHub or GitLab repository online, and maybe somebody posts an issue, so you invite a coding agent in and say, hey, given this code base and this issue, try to submit a PR to address it. Or if there's a PR, you might invite an agent in to review it. And the last one is the web interface, which is especially common for web applications.

Chip Huyen [00:02:55]: Let's say you have a mock-up of the interface you want. You can post the image into the tool and have it generate an application with that interface. When I talk to companies, I usually ask people, hey, which interface do you prefer? This is one of the companies, with quite a lot of engineers, hundreds to thousands on the call, and when they ran this survey, the vast majority said the IDE. At the same time, I also wanted to ask my friends, so I put all my friends on a Zoom call and made them answer the same question. And I was like, wait a second.

Chip Huyen [00:03:39]: The preferences can vary widely from person to person, and there are several reasons for it. First, the company told me they have only adopted IDE-based tooling, so people haven't had experience with other tools, and that's why they prefer the IDE. When I talked to my friends, they told me their preferences actually evolved over time. There's a book I really like; the title refers to the principle of least effort.

Chip Huyen [00:04:15]: The idea is that as humans, when we face a challenge, we start with the solution that requires the least effort, and when that doesn't work, we try something that requires more effort. That's how a lot of people operate, though some people just like to suffer and jump straight to the most effortful option, and that's totally fair. So say you have a coding challenge. The least-effort solution is to start with something extremely automated: hey, here's a spec, build it. You might give it to Claude Code or Codex or Gemini CLI, and it works. If it doesn't work, then you might actually have to jump into the code base and look at it, and that's why you need to open the editor and actually look at the code.

Chip Huyen [00:05:05]: And that's when you use something like Cursor or other editor-based tools. People told me that the more they use AI, the less time they spend actually looking at the code, and that's why their preference for IDE-based coding tools declines. When we talk about AI coding tools, one question that always arises is how we measure productivity. With coding, we used to have traditional metrics like engineering time spent. I'm not sure about you, but when we wrote design documents or API docs, the question was how much time it would take, maybe two or three weeks of engineering time. Some people also use lines of code to measure code-base complexity. However, these metrics don't quite hold up in the era of AI. First, engineering time is not the same as mental energy, and I want to explain this because it's a little abstract.

Chip Huyen [00:06:06]: For me, say a task takes a day. If I have to spend all my energy and pay full attention for the whole day, then I cannot do anything else. But say a task takes a day and all I have to do is hand the task to the agent, go do something else, and check back a day later to see the result. Even though it takes a day, the actual mental energy I spend on it is maybe half an hour or an hour, and I'm totally fine with that. So I've realized that with AI coding agents, I'm okay with things taking more time if they require less mental energy from me. If I don't have to babysit it, I can spend my time on other things, and instead of doing one task at a time, I can spin up 20 or 30, or however many I'm willing to spend money on. So it actually allows me to get things done a lot faster.
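As a rough sketch of this fire-and-forget pattern (not any particular product's API), the snippet below fires off many tasks at once and collects the results later; `run_agent` is a hypothetical stand-in for whatever call actually launches a coding agent on one task.

```python
# Rough sketch of the "fire many tasks, walk away, check back later" pattern.
# run_agent is a hypothetical placeholder, not a real SDK call.
import asyncio

async def run_agent(task: str) -> str:
    # Stands in for minutes or hours of real agent work on one task.
    await asyncio.sleep(1)
    return f"done: {task}"

async def main() -> None:
    # Spin up 20 tasks at once instead of doing one at a time.
    tasks = [f"task-{i}" for i in range(20)]
    results = await asyncio.gather(*(run_agent(t) for t in tasks))
    for result in results:
        print(result)

if __name__ == "__main__":
    asyncio.run(main())
```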

Chip Huyen [00:07:01]: I can start a lot of tasks, go to bed, wake up, and get the results. Another thing is lines of code. AI is really, really good at generating new code. Whether the code is good or not is another question, but it is extremely good at generating it. Before, when we had to write code by hand, we felt very protective of it: if we needed to fix an error, we wanted to reuse as much of the code we had written as possible. But now a lot of people just try to get rid of it. Sometimes with an existing code base, it becomes too hard to fix.

Chip Huyen [00:07:35]: So we just get rid of the code base and build a new one instead of fixing the existing one. Okay, I cannot see anyone, so I'm not sure how we are doing, but I'm going to continue. If you have any feedback, feel free to reach out on LinkedIn, email, or Twitter. Another thing I look into when we talk about coding is automation, because AI coding tooling exists to help us automate our work, and I want to understand how autonomous AI coding agents can be. So I started looking into this framework. I borrowed it from Graham Neubig, the OpenHands creator and a CMU professor, who in turn borrowed it from self-driving cars.

Chip Huyen [00:08:26]: To track the progress of self-driving cars, we have different levels of automation to see where the industry is and how far we are from the holy grail of full automation. Level one is autocompletion: AI only suggests code completions, and you can accept them or not. We've been doing that for a while. The first such tools I used were back in 2017 or 2018, so it's not new, and it's still very strong in Cursor and other tools. At that level you still have to write code by hand, and AI only helps with very specific pieces.

Demetrios Brinkmann [00:09:07]: I just want to say that and I'll pop off. Bye.

Chip Huyen [00:09:09]: Oh, that is interesting. I'm curious what preferences the audience here has. Okay, so level two is partial automation: for specific tasks, like writing documentation for a certain function, AI can do it for you, but it's very limited. The next level is conditional automation: you get full automation for a wider range of tasks, but still pretty specific ones.

Chip Huyen [00:09:37]: For example, building a new feature or a new app from scratch. The next level is high automation. Level four would be: AI can do it, and only fails for very specific types of tasks. For example, legacy software with a lot of Oracle or COBOL, where AI may not be able to do it for you, or very low-level work like CUDA optimization. Many people have tried that and realized there isn't a lot of good CUDA code to train models on, so that task is still pretty hard. And of course level five is full automation: AI does everything. I would say that nowadays most tools are between levels one and three, and I would be very excited to see when we reach level four; I have not seen level four yet. So when we talk about automation: what is high automation? Is there a threshold such that, once you reach it, you can say it's highly automated?

Chip Huyen [00:10:39]: I also borrow this concept from self-driving cars: the interruption rate. Let's say you have a Tesla and you turn on full self-driving mode for a week, and you realize that every 10 minutes you have to take control of the car because it's doing something dumb. Then you're constantly in paying-attention mode: you have to keep watching how it's going, and you can't relax. But say you turn on self-driving mode and can go a month without having to jump in and take control of the wheel. Then you can say, okay, now I have a lot more trust in this car and I can just let it go. We can have a very similar concept with AI coding agents.

Chip Huyen [00:11:26]: Say you assign the AI a task to fix an issue, and you don't have to jump in; you can comfortably check the result when it's done. Then you have high confidence in it. I built a tool to measure my own interruption rate with AI coding agents, and when I tweeted about it I actually got quite a lot of support. Not industry-wide, but quite a few people messaged me that they are adopting this metric. For example, the CEO of kanji.dev, which is an AI coding tool, talked about the intervention rate as the new build time. In the early days of software delivery, you wanted to measure how long it took to build and deploy software, because the shorter the build time, the faster you can iterate.

Chip Huyen [00:12:23]: The intervention rate plays a very similar role for AI coding automation. There are a few reasons why it's important to reduce the interruption rate. First, if you don't have to interrupt the AI, if you don't have to spend mental energy, you can do more things. I'm not sure about you, but I've realized I'm really bad at parallel processing when babysitting agents: I cannot keep track of more than three at a time. I have a friend who can do five at a time, but I haven't seen anyone who can keep track of that much context across many tasks efficiently. So a high interruption rate actually limits my ability to do many things at the same time. Also, if you reduce the interruption rate, you increase the potential for sub-agents.

Chip Huyen [00:13:16]: When I ask people if they're familiar with sub-agents, a lot say yes. For example, Claude Code has a concept of a task: the main agent can spin up a task independently and only return the result to the user when it's done. I was looking at the system instructions for sub-agents and tasks, and something that stood out to me is that the user cannot interrupt a task or a sub-agent in progress. Because the main agent spins up the task and then collects the result, users only see the result when it's done. That means if the sub-agent is doing something dumb, it can waste a lot of tokens and a lot of money without any useful feedback. So the main agent should only spin up a task or a sub-agent if it has high confidence that the task can be completed.

Chip Huyen [00:14:14]: The more tasks agents can complete with high confidence, without requiring human intervention, the more tasks they can spin up in parallel. And the last thing is reducing context load, because every time you interrupt an agent, you add more context to it. Maybe the agent has a plan it is executing, and when I interrupt it, it has to stop and re-evaluate. Reducing interruptions lets the agent use its memory more efficiently. I can't ask people here, but when I give a talk in person I usually ask what everyone's current interruption rate is, and it varies very widely from person to person. So I built this tool that looks at people's AI coding logs and sees how often they interrupt.
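A minimal sketch of what such an interruption-rate counter could look like is below. The log format is hypothetical (one JSON event per line with a "type" field of either "agent_turn" or "user_interrupt"); real coding-agent logs differ and would need their own parsing.

```python
# Sketch of an interruption-rate counter over a hypothetical JSONL session log.
# Each line is assumed to be a JSON object with a "type" field such as
# "agent_turn" or "user_interrupt"; real agent logs use different formats.
import json
from pathlib import Path

def interruption_rate(log_path: str) -> float:
    turns = 0
    interrupts = 0
    for line in Path(log_path).read_text().splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        if event.get("type") == "agent_turn":
            turns += 1
        elif event.get("type") == "user_interrupt":
            interrupts += 1
    # Fraction of agent turns where the user had to step in.
    return interrupts / turns if turns else 0.0

if __name__ == "__main__":
    print(f"interruption rate: {interruption_rate('agent_session.jsonl'):.1%}")
```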

Chip Huyen [00:15:03]: These are just from people who contributed data; I do not know them. You can see that the rates range anywhere from 0% to almost 30% or 50%. I started asking people about it and realized there are certain factors that impact someone's interruption rate. The first is user background: say you have engineers and people who come from a non-technical background. If someone comes from a non-technical background, they are less likely to interrupt the AI coding agent. When they give the agent a task and it starts writing code, they don't know how to write code themselves, so it's hard for them to tell whether it's doing the right thing or not. So they are less likely to interrupt the agent.

Chip Huyen [00:16:01]: Engineers, on the other hand, are much more likely to interrupt. The next factor is senior versus junior engineers. I usually ask people who they think would be more likely to interrupt coding agents, and a lot say, oh, senior engineers are more likely. But in my very limited data, which could be totally biased, I find that senior engineers are actually less likely to interrupt. The reason is that senior engineers, people who have worked in software for a long time, who are used to writing design docs and communicating technical requirements to other stakeholders, are much better at writing prompts and specs for agents. They have a pretty good picture in mind of what they want the AI coding agent to do, and because they give very good instructions, the AI can do the task better and they're less likely to have to interrupt.

Chip Huyen [00:17:02]: Whereas people who are more junior often don't know what kinds of pitfalls or errors the AI might commit. They might not have a full picture of the spec or the requirements, so they figure it out as they go: they put in some short prompts, and when things don't work out, they interrupt and change course. So they are more likely to interrupt. Another behavior that has emerged, which I find very interesting, is this: say you start with an idea of an application you want to build, you write a spec for it, and you give it to AI. Because you're still building and thinking about it, you don't have a very good picture of what it should be, so you just figure it out as you go.

Chip Huyen [00:17:48]: So the AI starts building this, and maybe it suggests an interface that's not the interface I want: I want this to have separate pages, it has to have an admin view, and so on. Or maybe it's over-engineering things, so I want to specify the scale. Personally, I learn as I go: I do a dry run, just put things out there, and learn as the agent builds. After that I have a pretty good picture of the spec I want. So now I have a fully fledged, very detailed spec and requirements. Then I just scrap the code base and give the complete spec to the coding agent, and this time the agent can build from scratch very nicely without me having to interrupt, or with far fewer interruptions. So from the first run, when you're still exploring, to the second run, when you know better what you want to do,

Chip Huyen [00:18:43]: the interruption rate drops significantly. Another factor is task type: depending on the complexity of the task or the tech stack, AI might do well or not. I think people have found out, and it's not a secret anymore, that AI is usually much better at writing new code from scratch, like new features on a new code base, than at working with an existing code base. The corollary is that a lot of people are now discussing how to make existing code bases friendlier to AI coding agents. For example, if you have a code base that is poorly structured, with a lot of intertwined components and no modularity, you might need to restructure or refactor it to make it modular, so it's easier for AI to work on a small part of the code base. There's a whole school of thought and discussion on how to make existing large code bases AI-agent friendly. The same goes for the tech stack: there are different tools.

Chip Huyen [00:19:50]: Obviously AI will be better at using popular tools with a lot of tutorials online, and less good at using new tools with no documentation. The same goes for languages. I usually use the example of three different languages: JavaScript, Python, and Rust. I ask people which language they think AI would be best at, and the answer is always surprising. I'm going to leave part of it as an exercise for the reader, but I'm curious what people here think. I can say something about JavaScript and Python. Demetrios, I saw you jump on screen. Do we need to run

Demetrios Brinkmann [00:20:38]: That and we need like 10, 15 seconds before people in the chat get the feed and then they write it in. So I'll come back momentarily to give you the results.

Chip Huyen [00:20:52]: Yeah, no worries. So JavaScript versus Python is interesting, because I personally tested it out and found that AI is pretty bad at JavaScript. I talked to a bunch of friends who noticed the same thing. I also asked friends who are actually model developers what is going on: why are AI models so much worse at JavaScript than at Python? They told me it's because there's a lot of bad JavaScript code on the Internet.

Chip Huyen [00:21:24]: AI models have been trained on a lot of code from the Internet, and the average JavaScript code on the Internet is just way worse than the average Python code. As for whether AI is better at Rust than at Python or JavaScript, I'll leave that to your imagination, but I do have some interesting data points on that as well.

Demetrios Brinkmann [00:21:52]: A lot of people in the chat are saying Python, they think Python's going to be the best. And that's hilarious on JavaScript. All right, I'm out of here. The interruption rate from me is going down. Sorry, I'll be out of here.

Chip Huyen [00:22:05]: Yeah. Can we make this more interactive? I would really love to interact with the audience.

Demetrios Brinkmann [00:22:11]: Yes.

Chip Huyen [00:22:12]: So, yeah, that's the beauty of events, right? Okay. Another metric I track when I work with a coding agent is how long it takes the agent to complete a task I give it. A lot of the things I want to do aren't simple anymore; they require a lot of back and forth. For example, one project I worked on required 5,000 back-and-forths. It's a ton.

Chip Huyen [00:22:40]: Sorry, it's a lot of commands. And every time I give it instructions, I have to wait for it to return the results. When I give it instructions, it starts thinking about how to complete them and comes up with different steps, and the more steps it takes to complete a task, the longer I have to wait. So I started counting the average number of steps the agent does in the background for each command. Usually the graph looks like this: it grows over time.
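As an illustration of the metric being described, here is a minimal sketch that computes a rolling average of background steps per command; the step counts below are toy data, since extracting them from real agent logs depends on the tool being used.

```python
# Sketch of the "average steps per command" metric: a rolling mean of how many
# background steps the agent takes for each instruction, to see if it grows.
from statistics import mean

def rolling_average_steps(steps_per_command: list[int], window: int = 5) -> list[float]:
    averages = []
    for i in range(len(steps_per_command)):
        chunk = steps_per_command[max(0, i - window + 1) : i + 1]
        averages.append(mean(chunk))
    return averages

if __name__ == "__main__":
    # Toy data: steps per command creeping up as the code base grows.
    history = [3, 4, 3, 5, 6, 7, 6, 9, 11, 12, 14]
    for i, avg in enumerate(rolling_average_steps(history)):
        print(f"command {i:2d}: rolling average steps = {avg:.1f}")
```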

Chip Huyen [00:23:13]: The reason is simple: the code base gets more complex over time, and a lot of the steps it takes are just search. Say you tell the AI, hey, go fix this chart. It first has to figure out where in the code base the chart is written, so it starts looking into different files, then from the files it searches for the function name, and so on. The more complex the code base, the more search steps it has to do. And I get annoyed because I'm very impatient, so I don't like this chart growing upward and upward.

Chip Huyen [00:23:46]: What I try to do is make it flat: I want to structure my code base so that the complexity per instruction doesn't increase, and the agent can do more, faster. Here's an example. I gave it a task to build an app, and the AI coding agent's first intuition was to build the backend in JavaScript. I was like, why would you do this in JavaScript? It's horrible. So I asked it to change to Python.

Chip Huyen [00:24:15]: You can see the complexity immediately dropped, which is great because now it can complete things a lot faster. I also try to refactor long code files, because longer files are harder for AI to parse and understand. I make the code very modular and I'm very deliberate about the structure of my code base, so that AI knows where to find things and where to put things, and I avoid duplicate code.
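One of the hygiene checks mentioned above, flagging files that have grown too long for an agent to parse comfortably, might look roughly like the sketch below; the 300-line threshold is an arbitrary illustrative number, not a figure from the talk.

```python
# Sketch of a simple hygiene check: list Python files longer than a threshold,
# on the theory that long files are harder for a coding agent to parse.
# The 300-line cutoff is an arbitrary number for illustration only.
from pathlib import Path

def long_files(repo_root: str, max_lines: int = 300) -> list[tuple[str, int]]:
    flagged = []
    for path in Path(repo_root).rglob("*.py"):
        n_lines = len(path.read_text(errors="ignore").splitlines())
        if n_lines > max_lines:
            flagged.append((str(path), n_lines))
    # Longest files first, as refactoring candidates.
    return sorted(flagged, key=lambda item: -item[1])

if __name__ == "__main__":
    for path, n_lines in long_files("."):
        print(f"{n_lines:5d}  {path}")
```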

Chip Huyen [00:24:41]: So I hope I've convinced you that the interruption rate is important, not just to help us be more productive, but also as a mirror for understanding how well we're using AI. Sometimes when I find myself interrupting the AI a lot, it's not because the AI isn't good, but because I'm not giving it clear enough instructions. A lot of the time the issue is me: I fail to specify what I want, and of course the AI misunderstands it. Reducing the interruption rate happens at several levels. One level is model developers: they can build models with more reasoning, better thinking, and more logical planning, so the models can help with a lot more tasks. Another layer is tool developers.

Chip Huyen [00:25:29]: Given a base model, they can build better agents. First of all, they can have better tool design: they can give the agent the right setup, maybe a dedicated search tool instead of raw bash, or some other modular tool. Tool design is very challenging, because you want to give agents a tool set that is large enough to do a lot of tasks but not so large that the agent gets confused. When I talk to developers, at this point they usually avoid giving an agent more than 20 tools, because it's very hard for an agent to learn to use that many. This is actually a challenge with things like MCP.

Chip Huyen [00:26:07]: When you onboard an MCP server, that server can bring a lot of tools with it. It's so easy to keep adding MCP servers until you have a huge number of tools the agent has no idea how to use, and that can make agents perform worse rather than better. The last layer is, of course, users: we need to learn how to work with agents. We need to understand what mistakes they make so we can structure our workflow and code base in a way that avoids those mistakes. When I look at a good human-AI workflow, I think of it as three steps. The first step is planning.

Chip Huyen [00:26:49]: You give the agent a spec, and it generates a step-by-step plan. Then comes execution: it writes the code. And then you have to verify whether the execution, the generated code, satisfies the original goal and plan. For users, you do need to write the instructions well, and I don't think that's going to go away. Demetrios, you mentioned from the earlier panel that one takeaway was that prompt engineering is going away. For me that's a very strange concept, because saying prompt engineering is going away is like saying communication is going away.

Chip Huyen [00:27:30]: AI is not the only intelligent entity out there in the world. We have 8 billion people's worth of human intelligence out there. Do we stop communicating with each other? I don't think so. A lot of the challenge is being able to communicate to AI what we want to do, and for me that is prompt engineering, so I don't think it's going away. For execution, I think a lot of people now have a workflow where they don't write code; they let AI do it.

Chip Huyen [00:28:03]: They only interrupt when automation fails. The time they actually spend in an IDE is very, very limited; they spend most of it in the terminal, or in markdown files reading and writing specs, and then they go into GitHub to review the code. They don't write code anymore. For the verification step, it really depends on the task. Sometimes it's very easy to verify: if you build a simple app, you can just look at the app and see that it's working fine. But sometimes it's a lot harder, so it really depends on the kind of task. I do think we are reaching the point where a lot of us are reviewing code more than writing code.

Chip Huyen [00:28:47]: A company I worked with recently told me they have been restructuring their teams so that junior people and AI agents write a lot of the PRs, and senior people spend a lot of time writing architectural designs and specs and then reviewing code. People got upset when we had that discussion, mainly because they were like, wait, so am I just going to be the person reviewing code? I'm not sure about you, but I have never met a single engineer who enjoys reviewing code. Everyone considers it a chore: people like building, no one likes reviewing. But if you think about it from a career-progression perspective, we do move more toward reviewing as we get older. When we're more junior, we do a lot more hands-on execution. But if you become a manager, a lot of the time you don't actually do things yourself anymore.

Chip Huyen [00:29:46]: A lot of the time the job is: how do I assign tasks to my team, how do I evaluate their performance, and how do I guide them to do the right thing at the right time? So I do think that with AI agents we are moving up another layer of abstraction: less hands-on work, more reviewing, guiding, and verifying. Wait, did I delete this slide? Oh no, it's just the wrong one. Okay, so I'm very bullish on spec-driven development. The idea is that you give the agent very clear instructions and have a very good understanding of what you want to build and why you want to build it.

Chip Huyen [00:30:33]: I also set rules. For example, in my rules file I usually put things like which tech stack I want it to use; if I hate something, I don't want it to use it. I force it to actually read documentation, and not just make up API calls and things like that. I also make the scale very clear, because if you don't tell it, hey, this is just a test project and I'm the only user, it might go crazy with the design, handling thousands of requests, which is unnecessary. It's also very important to do error analysis: looking at the errors that AI agents make and trying to reduce them. For example, here's a chart I made with a tool.

Chip Huyen [00:31:19]: It looks at when you interrupt, or at the error messages from the coding agent, and tries to break down what kinds of errors it makes, and then you try to reduce them. So with all of that, spec-driven development and error analysis, I think it comes down to problem solving and systems thinking. I'm very bullish on systems thinking; I think it's one of the most important skills we need nowadays. That's pretty much my talk for today. Thank you so much, everyone. If you have any questions, feel free to reach out.

Demetrios Brinkmann [00:32:01]: Yeah, there are a lot of questions coming through here in the chat, and this is how we're going to do it, because I promised some of your books would be going to the attendees. We already gave away some AirPods; we're now going to be giving away some straight knowledge. Chip, that was awesome. I've got to say, before we go any further, that was brilliant. You articulated things that I've been feeling and seeing in such a way that it's like, oh yes, especially the "time doesn't equal mental effort" piece. So well put.

Demetrios Brinkmann [00:32:33]: And I'm probably gonna steal that and try and re quote you later on because it's really well done. Here's what we're gonna do.

Chip Huyen [00:32:42]: Thank you.

Demetrios Brinkmann [00:32:43]: You wanna play along with me? We'll go. And everyone that is now in the chat, there's a Q and A section and you can vote thumbs up for the questions that you like the most. Either write your question in right now or go through and look at all the questions, because there's a whole ton of them. And upvote your favorite one. And I'm going to ask five questions of Chip. And those five people that I ask your question, reach out to me so I can send you one of Chip's books. All right, let's do it. I'm gonna give them.

Demetrios Brinkmann [00:33:19]: What should we give him? Like, 10 seconds, 15 seconds to upvote, read through all the questions.

Chip Huyen [00:33:24]: You're the expert.

Demetrios Brinkmann [00:33:26]: They're gonna speed read all of these questions, and then they're going to be able to upvote whichever ones they like. I'm gonna start with my favorite first because I'm making up the rules. That's what we can do. And the next one I'm going to ask is going to be the one with the most upvotes. All right, so. And then we'll go sequentially down the row with the most amount of upvotes. All right, so it was this one. Let me find it again.

Demetrios Brinkmann [00:34:03]: How do you prevent over automation from creating brittle ML systems?

Chip Huyen [00:34:14]: How do I prevent over-automation from creating brittle systems? I think over-automation means relying on AI to do tasks it's not able to do. So it requires understanding what AI can and cannot do, and that's tricky because it requires you to actually spend time with the AI, try it out, and learn over time what it's good at and what it's not. But AI is also evolving so fast that you have to keep doing that constantly. One thing I try to measure is how much complexity AI can handle in planning. In early 2024, I found that for a lot of tasks it could not reliably solve things that required more than five or ten steps. But in 2025, a year later, I can see models reliably performing things that take, I don't know, 14 steps. It's growing very fast.

Chip Huyen [00:35:25]: So yeah, I don't think there's one good way of doing it. Like a lot of learning, it's not a process you can solve instantly.

Demetrios Brinkmann [00:35:35]: Yeah, I do like that. It goes back to that slide you shared, which showed how much complexity you can let the agent handle, how big a system, how many steps, and how that has gone up over time. It seems like it was a little logarithmic; it's not going straight up now. But who knows, maybe with the new models you'll see.

Chip Huyen [00:36:02]: Yeah. You can see that newer models just perform better, right? Older models drop off at like three or four steps, and newer models can go up to, what, seven or eight. So it's quite interesting.

Demetrios Brinkmann [00:36:13]: Yeah, that is fascinating. All right, next one we've got. This is awesome that somebody asked this. They want to know Chip's opinion. Is MCP overhyped?

Chip Huyen [00:36:32]: Do you want me to make enemies? How could I possibly say anything negative about anything publicly?

Demetrios Brinkmann [00:36:39]: It's a lose lose situation.

Chip Huyen [00:36:41]: Yeah. I do think that standardization is good: anything that makes it easier for people to collaborate and reuse things is good. But of course we will never have a standard that can cover every edge case, so we need to understand its limitations. We have seen this with MCP, and it depends on the users as well.

Chip Huyen [00:37:03]: The companies I work with have very good internal guidelines on what kind of MCP servers to adopt. First of all, they prefer MCP servers created by the tool developer. Say you have the MCP for Google Calendar from Google versus a Google Calendar MCP from someone else; they would probably prefer the one from the developer, because the tool, Google Calendar, changes over time, and the API changes over time.

Chip Huyen [00:37:31]: You need to trust that the MCP server will be kept up to date, and that it's not going to try to steal information or do something crazy. So it's really up to the users. We had an example in the talk of people who keep adding MCP servers without looking at the set of tools, and the AI agent suddenly has access to maybe 100 different tools with very similar-sounding names and has no idea what to do. You have to be very mindful of the tools you're giving the agent. So I like standardization, but standardization can never be perfect, and a lot depends on the way people use it.

Demetrios Brinkmann [00:38:12]: Well, that was a very diplomatic answer. I like it and I'm going to go with it. Well said. All right, we've got two more that have been upvoted the most. I'm going to go with Abel's: when using AI to generate production-grade code, what are the most overlooked failure points in data flow design or system architecture that engineers should proactively guard against?

Chip Huyen [00:38:41]: Oh, there are two different threads here: AI writing code for you, and AI building infrastructure. I don't think there's much point in AI automating infrastructure building for you, because at a lot of companies you don't build infrastructure from scratch. You usually have, okay, we signed a contract with Google GCP, so now we're stuck with BigQuery, we're stuck with that.

Chip Huyen [00:39:14]: Not "stuck" as a bad thing; a lot of people love BigQuery. My point is that infrastructure is usually not about what the optimal tool is; it's a legacy system we have to work with. So I don't think AI can automate that. We just need to be able to specify the system we have and treat it as a constraint for the application. So that goes into the spec: here is what we want to do, here's what we have, and here's what we cannot change.

Chip Huyen [00:39:45]: And then have it come up with a plan to do it.

Demetrios Brinkmann [00:39:48]: Yeah, no AI migrations yet.

Chip Huyen [00:39:55]: Yeah. It's a weird thing, because when we work with customers, one of the first things we ask is which companies they have big contracts with, because they don't change those very often. Although I think one company changed providers every time they changed CTO, and the engineers hated it; they went back and forth between GCP and AWS, things would break, and everyone was like, what is going on?

Demetrios Brinkmann [00:40:26]: Oh, that's so painful. Yeah. Okay, last one for you, because we're starting to run over time and I'm going to serenade everyone good night as our closing act. Rashab is asking: given that senior employees are more experienced prompters, do you see the early-career job market declining further in comparison to the rest of the market?

Chip Huyen [00:40:53]: It.

Demetrios Brinkmann [00:40:54]: And there's a little bit of an assumption. Sorry, I didn't mean to cut you off. There's an assumption there that it's declining, because I don't actually see it declining. But let's take Rashab's question as is, since it got a lot of upvotes: given that senior employees are more experienced prompters... You remember it. All right, go for it.

Chip Huyen [00:41:13]: It's actually a really, really hard problem, and people debate it a lot. Some companies told me they think, okay, if AI can automate a lot of junior workers' jobs, then we can just hire senior engineers and let them review the code and design the infrastructure for agents to work in. But the question is: if we don't hire junior engineers, how do we get senior engineers? If nobody is entering the market, how do we have senior engineers in 20 years? It's a hard question.

Chip Huyen [00:41:49]: I don't know the answer, but I want to bring up the model of internships. In college, I had a lot of friends who got very fancy internships at fancy companies, and yes, I also got a fancy internship at a fancy company, and we got paid a lot. When you look at the paycheck, it's like, there's no freaking way we were doing that much. Okay, if you're my old employer, please don't take that too seriously.

Chip Huyen [00:42:13]: I appreciated it, but the point stands: it's very hard to justify a crazy paycheck for people who come in for just a few weeks and have no context. But companies do it because they think of it as nurturing talent. They get interns into the pipeline, train them, let them learn the workflow, and get a chance to evaluate how good they are. Then maybe after they graduate, when they've become much better engineers, they join the company. So I do think companies will need a pipeline for nurturing junior talent,

Chip Huyen [00:42:41]: even if they don't necessarily need them right now for a lot of tasks.

Demetrios Brinkmann [00:42:49]: Yeah. And I would say I'll go out on a limb and say that a lot of the newer devs or junior engineers are learning how to use these coding tools in different ways, and they're coming up, learning it and being able to lean on it much more than you would necessarily think. So it's a fascinating one. You always got to learn the fundamentals and the principles, and you got to put in your time and get burned and have those experiences to learn what not to do. But it's cool to see too, that there's now new opportunities and new doors opening for the folks who are junior that can use this right out the gates.

Chip Huyen [00:43:30]: Oh, I love that idea, actually, what you mentioned about junior people using AI to level up quickly. AI makes it very easy to build things end to end. You can launch things yourself, test out apps, send them to friends, and try to grow them yourself. You can learn a lot about the process and become much more experienced.

Demetrios Brinkmann [00:43:50]: Exactly. Chip, this has been great. Folks who I asked your question to, please hit me up in the messages. I'll send you one of Chip's books. Thank you so much for doing this. There was no better way that we could have ended this conference than with your keynote. Chip, I learned a ton, as I always do, and I really appreciate you coming on.

Chip Huyen [00:44:13]: Thank you so much for having me. I really loved the conference, so thank you again. And everyone, if you have any questions, you can reach out. Have a good day.
