MLOps Community

Agent Use for Coding in 2025 // MLOps Reading Group January 2026

Posted Jan 26, 2026
# AI Agents
# Coding Agents
# LLMs
# Devs Code

Speakers

Valdimar Eggertsson
AI Development Team Lead @ Snjallgögn (Smart Data inc.)

Raised in Reykjavík, living in Berlin. Studied computational and data science, did R&D in NLP, and started making LLM apps as soon as GPT-4 changed the game.

Lucas Pavanelli
Senior AI Engineer @ Stone
Rohan Prasad
Staff Software Engineer - ML Platform @ EvolutionIQ
Jimin (Anna) Yoon
Tech Lead / Senior Software Engineer @ Statsig
Arthur Coleman
CEO @ Online Matters

Arthur Coleman is the CEO at Online Matters. He has previously held three other roles, including VP Product and Analytics at 4INFO.


SUMMARY

AI agents aren’t “helping” devs code anymore—they’re starting to run the workflow. This panel pokes at the uncomfortable question: are engineers still in control, or just supervising very confident machines that are slowly replacing how we think, design, and build software?


TRANSCRIPT

Arthur Coleman [00:00:00]: Welcome back to the reading group for 2026. Glad to have you all here. Glad to know that when 2025 ended, we didn't suddenly lose all of you. Today we have an interesting topic, and you'll see it's a very different session. It's "Professional Software Developers Don't Vibe, They Control," an article about AI agent use for coding in 2025. Now, it's six months old.

Arthur Coleman [00:00:24]: By the way, that's an important note as we get into this; I'll come back to it in a minute. The agenda is: we're going to start with Valdimar doing the overview section, then Lucas is going to go through the methodology, and the results and takeaways are going to be done by Rohan. We're going to spend all of 15 minutes on this, because then we're going to open it up to a series of questions. Similar questions to what were posed in the article, slightly different, but ones that get at the more practical side: how do you do it? What techniques, tips, tricks, and workarounds have you developed to make LLMs and agents make your coding more efficient? And I ask that as one of the people out there dealing with this.

Arthur Coleman [00:01:11]: It is one of the most infuriating things to deal with, because the LLMs are sometimes the most intelligent animals on the planet and sometimes they are like the dumbest intern you hired for the summer. So, as a reminder of the rules: these sessions belong to you. Everybody knows that; I think I've said it enough times. But today this session is really about you, and it's highly interactive. As I said, we're all going to share best tips, tricks, and workarounds for coding with LLMs. Anna will be our moderator.

Arthur Coleman [00:01:46]: She will ask questions to the speakers and then she's going to open it up to the audience. Note, this is important, because if you've ever worked on the W3C committees, they always warn you about this: if you consider a technique that you want to share proprietary to your company, or your company considers it proprietary, please be aware that if you share it, it becomes public domain. So be very careful if you think there's something your company doesn't want you to talk about. I know that may reduce some of the interaction, but I don't think it's going to be huge.

Arthur Coleman [00:02:18]: But I did have to provide that warning. You can give your request to speak or put questions into the Google Doc; I'll put the link in the chat in a moment. And is Anna on at this point? No. Okay. I don't know if I'm going to be running the Q&A panel or Anna will, but when she gets on, I'll ask. And by the way, I'm going to break a rule I've never broken before: I'm willing to stay on afterwards. The speakers don't have to, and no one here has to. But if you want to, and we want to continue sharing best practices, things that work for us, I'm willing to stay, because I guarantee you, if I stay 5 or 10 extra minutes and someone shares one trick that can save me hours of coding time, that's a good use of my time.

Arthur Coleman [00:03:03]: So I'm willing to stay, but no one else has to. And if people don't want to and we lose everybody, then I'll shut the session down. With that, the last comment is: don't forget to fill out the post-event survey. This is your event; we want to know how we can do better events for you in the future. And at this point I think I will turn it over to Valdimar. Thanks, Valdimar.

Valdimar Eggertsson [00:03:24]: Thank you, Arthur. So hi, welcome everybody. I'm going to talk about "Professional Software Developers Don't Vibe, They Control: AI Agent Use for Coding in 2025." I will share my screen. We have this Figma here; let's just use that. I'm going to try to get into full screen mode.

Valdimar Eggertsson [00:03:56]: Sorry, ignore this for now. So, you see my screen here.

Rohan Prasad [00:04:06]: Okay.

Valdimar Eggertsson [00:04:07]: So this paper was the most popular one in the public vote we had in the Reading Group channel, and it's a very interesting topic. However, the paper itself was not as interesting as I thought, but we'll make it an interesting session. This is basically a study some people did about six months ago, trying to investigate empirically, using methods maybe from sociology, how people use AI agents. And the gist of it is literally just in the title: developers don't vibe, they control. And then we have 28 pages of proving that somehow.

Valdimar Eggertsson [00:05:00]: So they did two things. They had surveys. For example, here in the intro we have a quote: "I've been a software developer and data analyst for 20 years and there is no way I'll ever go back to coding by hand. That ship has sailed and good riddance to it." That comes from the survey, which is one part of it. And then to complement the survey, they had observations: video calls with developers who were interested in AI coding and using it, and interested in showing it off, showing how they use it.

Valdimar Eggertsson [00:05:34]: So they had these four research questions regarding the motivation, strategies, suitability, and sentiments of the developers, investigated with these video observations and surveys. Number one, what do experienced developers care about when incorporating agents into their software development workflow? Number two, what strategies do experienced developers employ when developing software with agents? Number three, what are software development agents suitable for and when do they fail? And finally, what sentiments do experienced developers feel when using these tools? Yeah, that's the gist of it. Then we go into the methods and the results. Lucas is going to take care of the methods, so I'm not sure I need to comment any more on this for now. I have lots of things to say about it, but we can do that in the round afterwards. So I think I'll hand it over to Lucas.

Valdimar Eggertsson [00:06:49]: I'll stop sharing my screen and turn off my.

Lucas Pavanelli [00:06:54]: Thank you very much. So let's take a closer look at the methods part.

Rohan Prasad [00:07:02]: Let me just.

Lucas Pavanelli [00:07:14]: Okay, I guess you can see my screen. So as Valdimar said, this was a study conducted with experienced developers. They define an experienced developer as someone who has at least three years of professional development experience, and they study those four research questions through a two-part study. The first part is an in-depth field observation and the second part is a more in-breadth qualitative survey. For the first part, the field observation, they recruited 13 participants, and for each participant they arranged a study session with two parts: for 45 minutes they observed the participant using AI tools, and for the last 30 minutes they conducted a type of interview. They asked the participants to bring some tasks that they were working on, using their preferred setup. Here in the paper they have a table listing the tools; the tools range from Claude Code, Cursor, and Codex to GitHub Copilot.

Lucas Pavanelli [00:08:32]: Each participant used a different type of tool, and they observed the participants using these tools to do some tasks, right? They list here what the tasks are, but it's basically writing software or maybe doing some experiments. And then in the last part they asked questions about the use of AI tools in general, how they use them. So that's the first part, the field observation that they conducted. The second part is the survey. The survey is a 15-minute questionnaire that they gave to participants; they invited, if I'm not mistaken, 99 people to answer it, again considering just experienced developers. The survey has 13 questions in total, which we can see here. They're framed as: regarding the last task you did using AI agents, answer these questions. The questions are basically: what was your task? What prompting techniques do you use? Do you run agents in parallel? We will talk about this in the last part of the session too. And at the end there are two more open-ended questions, which they can answer in a more free-form way.

Lucas Pavanelli [00:10:16]: And I think basically that's it. They have some data regarding the participants; in the paper they describe the different development experience each participant has, and they also conducted a data analysis to then generate the results. Well, that's the main part of the methodology. One thing that caught my eye regarding limitations is the heavily male participant pool; there were basically only two women in this survey, right? And obviously there are also limitations regarding the sample size and the variety, because it's basically 13 field observations plus the 99 developers in the survey.

Lucas Pavanelli [00:11:10]: So yeah, I think basically that's it. So we can jump into the results.

Rohan Prasad [00:11:16]: Thanks Lucas. I'm going to slide through the results pretty quickly, because I think what's more interesting about this paper is that we have the ability to kind of live-take the survey here. Let me just make sure I'm showing the right screen. The TLDR on the results: we go back to the four questions that Valdimar went through, which is figuring out their motivations, what kind of strategies they're using, does this actually work, what do I want to use this for and what do I not want to use this for, and what's the general sentiment. I'll similarly run through the results section, starting with motivation. Well, some people come in with interesting motivations. Obviously this quote stands out: when someone has a serious issue, such as carpal tunnel or some other severe arthritis, and they can't actually code using their hands.

Rohan Prasad [00:12:19]: Agents actually help those people be a lot more productive. But in addition, I think the big sentiment here is that it really just helps people feel more productive, and it allows them to focus on quality attributes. So instead of just coding out some particular functions: how do I think about what it really means to be maintainable and testable and correct? The ending takeaway is that they really appreciate the tools and they feel like there's a big boost in productivity. That's something I would say I share as well. And they tend to focus on software quality when working with agents, which allows them to focus on that particular area as opposed to the entire breadth.

Rohan Prasad [00:13:10]: In terms of strategies, there are two different ways people really talked about it in this paper. One is controlling the software design and implementation. This is really focusing on prompt and context engineering: how do I design a prompt with very, very specific detail? What are the specific features to be implemented? Having a draft plan, making sure I have a really good understanding of what's going to happen. As you can see here, a couple of the 13 actually had plan files and context files. Most people, at the very least though, were heavily focusing on the other part of it, which is checking behavior outside of prompts: reviews and edits, and using things like Git to essentially undo changes. And a lot of people are relying on, you know, smell tests: I've built this kind of thing before, so I can go back and use my existing knowledge to check this agent and say whether this is correct or not.

Rohan Prasad [00:14:15]: So the takeaway from this section is that experienced devs are focusing a lot more on distilling the information in their heads into very clear, very explicit pieces of instruction, and really letting the agents hone in and focus on a particular task. But outside of that, once again, the best practices outside of using agents are still relevant: version control, reading through PRs and treating them as discussion forums, asking for changes, testing out the architecture, seeing how maintainable it is. In terms of suitability, which is the next area in focus: if you look through this chart right here, I'll give you a high-level overview. When people think about agents, they think about repetitive tasks or things that are very, very straightforward, like documentation, a couple of small fixes, refactoring. Generally it accelerates productivity and moves things forward. But you can see the hard theme is small, simple, straightforward tasks; that's the general "suitable" bucket. Where it gets a bit more controversial, or people tended to disagree a lot more, is when folks wanted to focus on high-level plans, like: should you let an agent design the entire architecture? In the interviews a lot of people said no, but quite a few people also said yes.

Rohan Prasad [00:15:45]: And they actually liked agents as a copilot for helping them design high-level plans. Where everyone was kind of on the same boat about not wanting to work with agents is really complex, ambiguous tasks, and where things are very security-specific, really vague, unclear. The general gist is: if I can use agents for specific tasks, I'm very much going to focus on that, and I'm not going to use them for high-level complex things where I don't even have a clear vision in mind. And once again, agents are statistical machines at base, so they're not going to give you perfect code all the time. So it goes back to the prior section: how do you build strong reviews and other guardrails around them to make sure the agent is successful? To sum up this section, they're suitable for straightforward, repetitive, scaffolding-like tasks. If you ask an agent to build a to-do app, it's going to be very, very successful, because there are probably 5 million examples of that out there.

Rohan Prasad [00:16:55]: But if you ask it to build something bespoke and special, that's when you need to give it more guidance, and things get a little bit more complex and challenging. On sentiment: generally, people are really happy; 5.116 is a pretty good score. Most people felt pretty excited about using agents, especially when working with older code bases; that's something that made the approach generally more exciting. They did stress that people need to stay in the loop. You can't just, you know, close your eyes, type a prompt, and hope magic happens.

Rohan Prasad [00:17:33]: There's still a lack of trust in agent capabilities, but that comes back to being kept in the loop. And a lot of people like using agents, and this is something that I really like doing as well, to validate and stress-test ideas. So almost using an agent as a rubber duck, where you're talking to it and providing information, and it's responding with other things and giving you additional strategies. And everyone knows, I think at this point it's no surprise, to talk about agentic coding as the future of software development. And I think there's still a very strong place for humans to interact as a part of this. So the high level here, kind of getting towards the end: once again, from the motivations perspective, people appreciate the boost; it allows them to focus on fundamental software quality, and that's the way they hedge against any issues. It's really about being concise and clear and specific in your prompts.

Rohan Prasad [00:18:34]: It's really suitable for repetitive tasks; that's the sentiment that came from this paper. And generally, people enjoy working with agents.

Erin H [00:18:48]: Cool.

Rohan Prasad [00:18:49]: I'm going to pass the mic back to Anna or Arthur, if we want to go into the discussion.

Arthur Coleman [00:18:58]: Anna is on her way but she's not here yet. So I'm going to break protocol already, because there was a comment right at the top from Erin that I think is really important. It touches on my first lesson in working with these machines: I wrote my first code in a day, on a site for my first business, and it took me four days to debug it. That was a lesson about testing. Erin, could you ask your question? I think it's really valid; I'd like to get people's comments on it, please.

Arthur Coleman [00:19:34]: And if you want to, go on video. I can't see everybody, unfortunately, with this interface, but it'd be great if we could see you.

Erin H [00:19:40]: Yep, I'm here. Video on. You can see me.

Arthur Coleman [00:19:43]: Yes.

Erin H [00:19:45]: So yeah. My question is about test-driven development, being a little more precise with the language: meaning writing the test before you write the code, not just writing tests and calling that test-driven development, because I think that was a little bit fuzzy in the paper. That's one place where Kent Beck, someone well known for extreme programming (things not everyone actually does in practice), has said: no, I write the tests, I do not let the agent modify the tests. That's how I know it works. Because the agents are kind of inclined to change the tests into something that's maybe easier to implement, or more statistically probable and therefore has more examples they can pull from, or something like that. So I was just curious: in your experience, are you doing a strict TDD workflow, or, like some of the participants in this paper, is it more that the fact that there are tests, and that you read the tests, is what makes you feel confident, even if the tests were also written by the agent?

Arthur Coleman [00:21:00]: We're going to do a lightning round. Oh, Anna's on. You want to take it, Anna, or can I? Let me do this one and then you can take over. We're going to lightning-round it: we'll go through the speakers and each of you can take the question. We'll do four responses. Lucas, why don't you give it a start?

Lucas Pavanelli [00:21:19]: Sure, I can talk a little bit about my experience with this. For the test-driven development part, I don't normally do it as strictly as it should be in the beginning, but I've found that using agentic tools to help me write the tests is really useful. There are many parts that are basically just templates you need to write into the tests, for example for unit tests. And I also think that writing the test itself is less likely to be a problem, because it's not facing production, for example. So I actually do use agentic tools to help me write the tests.

Arthur Coleman [00:22:10]: Rohan, you're up. We're going to do this very fast because there's so many things we can learn here.

Rohan Prasad [00:22:18]: The short of it: I wouldn't say I'm a purist here who only writes the tests myself and never has agents write them. I think that would be disingenuous of me. That being said, it depends on the level of tests and the strategy around them, to some extent. The way I think about it is, I see the end-to-end and integration tests as the stopgap that will catch something really blowing up. I'm a little more lax when it comes to unit testing; I tend to prefer heavier review on that particular section, and then validating edge cases. For what it's worth, I think agents have sometimes been better at catching edge cases in my unit tests than I would have. But for the functional requirements of something, that's where I tend to be more stringent: you need to see this button here, you click this piece right here, and so-and-so happens. That's where I tend to be a little bit more strict. Where I tend to scrutinize a lot more is the testing strategy, and I actually focus on having agents work within that.

Rohan Prasad [00:23:25]: And across the spectrum of development, when I've used plan mode in Cursor, it tends to skew towards writing the implementation first and then the tests. So that's the way I kind of approach it. Same question over to you, Valdimar.

Valdimar Eggertsson [00:23:41]: Yeah, a few comments about using LLMs to test. LLMs are trained to be helpful and trained to make the user feel like they're doing what you want, so they tend to take shortcuts and cheat. I have some experience with just relying on the AI to make the tests for me without being specific enough about what the task is, and then it just goes and builds something that fits what I asked for but is totally not what I wanted. So you have to be really specific about how the test is supposed to be. I'd say it's a really good strategy to do test-driven development.

Valdimar Eggertsson [00:24:29]: I have spent a horrible amount of time debugging code that was put into production by someone, code that had been tested a lot. But all the tests were written by AI, so edge cases weren't caught. And yeah, that's my two cents.
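For illustration, a minimal sketch of the strict test-first workflow Erin and Kent Beck describe: the human writes and owns the test file, and the agent is only allowed to edit the implementation until the tests pass. The slugify module and its expected behavior here are hypothetical stand-ins, not from the paper.

```python
# test_slugify.py -- written by the human FIRST, before any implementation.
# The agent is instructed: "make these tests pass; never edit this file",
# so it cannot weaken the spec into something easier to implement.
from slugify import slugify  # hypothetical module the agent will write


def test_lowercases_and_hyphenates():
    assert slugify("Hello World") == "hello-world"


def test_strips_punctuation():
    assert slugify("Agents, Agents!") == "agents-agents"


def test_collapses_repeated_separators():
    # An edge case pinned down by the human; as Valdimar notes above, an
    # agent-written test might quietly choose weaker behavior here.
    assert slugify("a  --  b") == "a-b"
```

Run with pytest; the loop ends when the suite is green without any diff touching the test file.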

Arthur Coleman [00:24:50]: Okay, I'm going to open it up to the audience. Bauke, you made a comment which rings so true to me. I'm working on a new business and a project; I have over a thousand tests, unit and integration. Would you please talk about how you do it? Very specifically: when you develop the tests, for example, do you write the tests at the same time you write the code, or let the agent do it? Do you run those tests? How often do you run the entire test suite? Take us through your process.

Bauke Brenninkmeijer [00:25:19]: Yeah, I'm not sure I'm the best example, but here's my process. In applied research, we deal with a lot of greenfield situations where we build functionality, and then once we have a certain confidence that it's right, we add tests to it. So it's really easy to write a relatively complex prototype and then add, you know, a ton of tests to make sure it will keep working in the way you intended. And then slight deviations from that with further increments are easily detected. That way, once you reach that kind of v1, it's really effective. But yeah, obviously there can be issues in that v1 that you don't immediately notice, and that can be a bit of an issue sometimes. But that's part of the vibe coding experience, I think.

Arthur Coleman [00:26:17]: How do you run the tests? You talked about when you create them, which I find interesting, so let me just ask: how often do you run them? When do you run them? How many do you run at a time? How do you figure out your strategy for using tests?

Bauke Brenninkmeijer [00:26:35]: So, yeah, typically they're fast. So every time there's a commit, or anytime there are changes from AI, I make sure that the AI also runs the tests and fixes any issues that are there, with the instruction to remain true to the original prompt and not just, you know, remove tests or change tests in a way that's not actually in line with the desired functionality. But if they're fast to run, it doesn't really add that much overhead. You can do it often and make sure that everything stays aligned.
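A minimal sketch of that habit as a git pre-commit hook, assuming a pytest-based project (git runs any executable placed at .git/hooks/pre-commit; the pytest flags are one reasonable choice, not the speaker's exact setup):

```python
#!/usr/bin/env python3
# .git/hooks/pre-commit (make executable: chmod +x .git/hooks/pre-commit)
# Runs the whole suite before every commit and blocks the commit on failure.
import subprocess
import sys

result = subprocess.run(["pytest", "-q", "--maxfail=1"])
if result.returncode != 0:
    print("Tests failed; commit aborted. Tell the agent to fix the code,")
    print("not the tests.")
    sys.exit(1)
```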

Arthur Coleman [00:27:10]: Yep. All right, Anna, turning it over to you to take the questions and go on to the next set. Did we lose Anna? Anna, you there?

Anna Yoon [00:27:27]: Sorry, I think I had a little misunderstanding about this, but I will go into asking the lightning questions to the folks here. I think there are some good questions we can cover. The first question that I really want to cover is from Bruno, who asked: what do people think of the importance of keeping prompt logs of conversations, to be able to go back and see where things went wrong when you discover issues later? Does Bruno want to first share his experience coming up with this question?

Bruno Lannoo [00:28:11]: Oh, yes. Yeah, yeah. So I try to do that when I'm doing more serious work. Not always when I interact with an LLM or with an agent, but when I'm doing stuff that I feel is important and I still want to outsource it to the LLM, because I think they do enable me to think deeper by having a partner to bounce things off. But what I do discover is that maybe I'm 20 prompts down the line, and I'm suddenly starting to realize that the thing I thought we fixed three hours ago was actually not correct.


Bruno Lannoo [00:28:46]: And at that point, I am happy to have these logs. And I also do this meta-analysis. Let's say I will not reread the logs myself, because if reading them the first time was already too much, reading them a second time is going to be even more painful; rereading something is not very fun. But I will ask the LLM, or a different LLM, to reread these logs, confront them with what I now suddenly realize, and figure out what these things are. So yeah, that's the kind of experience I wanted to share here, and I'm wondering if other people feel like that's also something they either do or are interested in doing.

Anna Yoon [00:29:28]: God, that's such a huge topic these days, because everyone is using AI tools every day, and you just don't have the same memory power as those agents. It all comes down to: can the agents memorize all of this for me, so that it's easy for me to retrieve it just by asking? It is definitely a huge topic. I wonder if anyone else has a similar experience around this.

Rohan Prasad [00:29:54]: Yeah, I actually really like this topic. One of the things I've found a lot of success with, and I think this is in a similar vein, maybe a slightly different approach: if you've ever been in a really legacy code base, there's a comment that says "don't change this function", because of how many hours people have wasted changing it. I like to take those and put them in the prompt as non-goals, things not to do, and keep track of that over time. Because when you're working on a greenfield code base it's very easy to just spin up stuff and do whatever, but the challenge is in complexity, business logic, or domain information that's been lost over time.

Rohan Prasad [00:30:41]: So I don't necessarily keep a copy of all the logs and whatnot, but I do like to compact things and keep goals and non-goals, things that you should never touch and never do, based on past interactions and history, and use that as a thing to feed forward. The part that I haven't really figured out, the part that keeps me a little bit worried: over time my agent's contexts are slowly growing and growing and growing, and the number of goals and non-goals I'm now putting in is starting to grow too. I do experience signs of context rot, and that's something I haven't truly figured out yet. But that's kind of been my experience with it.

Anna Yoon [00:31:22]: Yeah. Speaking of the golden pieces and the other pieces that you're iffy about, there are existing tools where you can break your prompt down into chunks of the message. So there's message chunk 1, 2, 3, 4, and then in code you can decide whether to switch each of those chunks on and off. That's how you can experiment with different prompts, to see: oh, when I turned off chunk 2 the output was like this; when I turned on chunk 2 the output was this. And you can compare the performance of your context, of your prompts, that way. So I just wanted to mention that.
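A minimal sketch of that chunk-toggling idea (the chunk names and the send() client are hypothetical; the tools Anna mentions wrap the same pattern):

```python
# Split the prompt into named chunks, then flip chunks on and off to
# compare how each one changes the model's output.
CHUNKS = {
    "role": "You are a senior Python engineer.",
    "goals": "Goal: add retry logic to the HTTP client.",
    "non_goals": "Do not touch the public API or the tests.",
    "style": "Prefer small, well-named functions.",
}

def build_prompt(enabled: set[str]) -> str:
    """Assemble the prompt from only the enabled chunks, in a fixed order."""
    return "\n\n".join(text for name, text in CHUNKS.items() if name in enabled)

# Experiment: same request with and without the "non_goals" chunk.
prompt_a = build_prompt({"role", "goals", "non_goals", "style"})
prompt_b = build_prompt({"role", "goals", "style"})
# send(prompt_a) vs. send(prompt_b)  -- send() is whatever client you use
print(prompt_b)
```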

Praveen K [00:32:10]: And also, I think one thing which helps is, instead of putting it into paragraphs, you try to get it into markdown code block format using ChatGPT. That is very helpful, and it gets you very quickly to what you want. So instead of writing prose, you summarize, you give the goals, you say this is how I want to do it. And then you say: give this as a prompt, in a markdown code block format, to whichever tool you want. Then you just paste it, and the output is 90% close to what you are looking for.

Anna Yoon [00:32:49]: That is really cool. All right. Chatting about prompting is always so fun. But the next question that we'll cover is also from Bruno. Do you just want to speak about your question yourself?

Bruno Lannoo [00:33:03]: Actually, I was almost going to bring it up myself, because the contextual thing is related.

Anna Yoon [00:33:07]: Yeah, yeah, yeah, go ahead.

Bruno Lannoo [00:33:09]: Of course, it connects to that other thing, which is: I'm trying to work on this thing I don't have a great name for, but I'm calling it vibe reading, vibe documenting. It's about trying to figure out a way of dealing with how we gradually accumulate more and more processing with the LLM without losing track of it. What I do is I create this docs folder, and inside the docs folder I create an LLM folder, which I tell it: that's your scratchpad, you can work in there. And I ask it to collect insights and make little markdown files of these insights. But I also give it a rule. I heard once, and I don't even know if it's true, but I just like to work with it: the human mind can handle about seven elements at one time. So any system should be split up into sets and subsets of seven. So you can have seven subfolders, and in each subfolder you can have seven subfolders again.

Bruno Lannoo [00:34:01]: And at the end you can have seven files, and you can have this kind of thing. I put that rule in there, asking the LLM to generate these insights in markdown, but anytime there are more than seven files, it needs to create subfolders, sub-categorizing them and pushing things further down. I also ask it to create index files that help you discover where these files are, guiding you in what order you should read them and when you should move further. And the last thing is, I'm starting to experiment, and I currently have two levels of granularity: I ask it to give a human-readable, rapid-to-skim level for a highly experienced senior developer, and an LLM-targeted level that is more of an overview of all the insights.

Bruno Lannoo [00:34:47]: I also push it hard to never duplicate actual documentation that's in the code base, made by the engineers, but only hyperlink, and to use as many links as possible between these different sections. Where I run into a little bit of trouble is that in the IDE I'm using, PyCharm, you can link to a markdown file but you cannot link to a section inside a markdown file, so that makes it a little less convenient. But I have this idea: what if you would create this structure of knowledge that is not too broad but gradually growing, with multiple levels? Something that is also always in my mind when I think of this: many, many years ago I saw a presentation about C4. I think it's Simon Brown who talks a lot about C4, saying: wouldn't it be cool if we saw our code like we see Google Maps, where when you zoom in all the way, you see the city, the streets, the houses, the single commands and the single keywords and the single variable names? But if you zoom out, there's a point where it doesn't all fit on your screen. And there's something somewhat magical about Google Maps: it knows that the street name for that short street can disappear before the street name for this major street needs to disappear.

Bruno Lannoo [00:36:09]: And it's something that gradually chooses what to remove, so that you can keep zooming out and always have a reasonable amount of detail visible for the level of zoom you have. And I'm thinking of these different folders as the detailed level versus the more high-level and the very, very high-level, trying to create these scopes and a lot of hyperlinks to be able to jump between them. I think that could be very helpful for managing this vibe coding. Yeah, it is useful in some ways.

Bruno Lannoo [00:36:44]: It's here to stay, I think. But it is hard to deal with the quantity of stuff we need to read because of what gets generated, and having a technique that can help us navigate that would be quite valuable. I'd love to hear what people can contribute to it, to tune it or improve it, or look at it differently.
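A minimal sketch of the seven-item rule Bruno describes, as a checker over his scratchpad folder (the docs/llm path is his convention; everything else is an illustrative assumption). It only reports violations; the agent is then asked to regroup files into subfolders and update the index files:

```python
# Flag any folder in the LLM scratchpad holding more than seven entries,
# per the "sets and subsets of seven" rule.
from pathlib import Path

LIMIT = 7

def check_folder(folder: Path) -> None:
    entries = [p for p in folder.iterdir() if not p.name.startswith(".")]
    if len(entries) > LIMIT:
        print(f"{folder}: {len(entries)} entries (limit {LIMIT}); "
              f"ask the agent to categorize these into subfolders "
              f"and update the index file")
    for sub in entries:
        if sub.is_dir():
            check_folder(sub)  # recurse: each subfolder obeys the same rule

check_folder(Path("docs/llm"))  # hypothetical scratchpad location
```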

Lucas Pavanelli [00:37:05]: Yeah.

Arthur Coleman [00:37:07]: Go ahead. Sorry.

Praveen K [00:37:08]: Which LLMs are you using? Are you building your own to do this? Because the existing LLM tools you might be using, for example Copilot or Claude, are already doing all of that by creating restore checkpoints each time your prompts grow. So unless you're building your own, I think it's all taken care of. All these models, whichever you are using, are already creating their restore checkpoints; when you come back, or you're switching between models, they've already created their restore checkpoints. And coming back to not disturbing the existing documentation: I think when you get experience with these prompts, it can follow that. Whatever you're trying to do, that's already happening.

Arthur Coleman [00:37:56]: Bruno. Anna, can I jump in? I want to share my screen, if that's okay. I know I'm supposed to moderate, but this is such an important topic. Can you see my screen? Okay. So, to whoever was speaking, whose name I lost, and to Bruno's point: what I've learned is that it especially matters when you change context windows, new conversations. I don't use Claude Code; I tried it, and I won't go into the details of how I tried it.

Arthur Coleman [00:38:26]: I use Claude itself, because I find the context window is longer and it's more sophisticated; it uses a different version of Opus. So I'm using that. The one thing I've learned to do is create a collaboration process doc, and it grows over time as I learn things. And you can see: here's what is covered; here are the problems that the AI and I have learned about working together. And then there are rules for how I'm supposed to do things, and rules for how Claude is supposed to do things. It's there at the beginning of every conversation; it's in my project.

Arthur Coleman [00:39:05]: Claude allows you to bring files into a project file store. So I put this in the project file store, and it's always there; even as I change conversations, it's there. And this is how Claude knows (there are other reasons too; it has project instructions as well). But this is one of the things I use to make sure it follows a set of dang rules every time I keep coding, because otherwise it goes off track. And in particular, these are the things it does that drive me absolutely nuts.

Arthur Coleman [00:39:37]: So anyway, that's to your point, Bruno. That's an example of a document that I keep. How do I stop sharing... there we go. And the AI writes it. Then as I learn, I say, okay, we missed that.

Arthur Coleman [00:39:49]: Write me a section. And then I cut and paste it into the document. I'll stop. Sorry.

Anna Yoon [00:39:55]: I think that was golden. It was really nice to see how you create this feedback loop into the agent, almost. That's so cool. Bruno, is the process that you follow to create all these logs similar? Is there something that you do differently? We'd love to hear how you handle or manage your logs.

Bruno Lannoo [00:40:18]: There are some similarities with the things that have been shown. I have a guidelines document that describes the way I want the agent to create this whole documentation overview, and a lot of my frustrations with things the agent doesn't end up doing well are captured in there. I also allow it to extend these guidelines: whenever I express a frustration, I tell it, well, if you noticed I wasn't happy about something, maybe change the guidelines so that you do it less often. As for this maybe already being implemented: I use Junie mostly, sometimes Claude Code, because it just integrates into the tool, and because I have easy access to these tools through my work.

Bruno Lannoo [00:41:05]: And yeah, I do think there are these hidden files where they do similar kinds of recordings.

Bruno Lannoo [00:41:10]: The one difference is maybe that I'm more in control. I haven't really dived deep into how they do these recordings and whether I can control or influence them. So that's where maybe I chose to go more my own way, because I felt more control; but maybe that's just a lack of insight into what they're doing, and I should reuse a bit more of that, because reusing other people's work often leads to a lot of discoveries. On the thing Arthur showed about the guidelines: I do think I try to split things up into multiple files a bit faster, because I see it's a relatively large file with a table of contents. For linking and for quick overview, I personally prefer having short files that fit within one or two screens, and then having many of these files and wiring them together with hyperlinks.

Bruno Lannoo [00:41:59]: But that might also be dependent a bit on the tool you're using because the tool I'm using is quite convenient to hyperlink between files and not as much between sections of a file. So depending on if you have that capability, that might evolve very differently. But there are similarities for sure.

Anna Yoon [00:42:14]: Very cool. Yeah, I think being able to handle the links will be really important, because from my experience working on production code, links are something I do not want to mess up and don't want to let the agent generate. So I always do a post-processing pass to make sure these hyperlinks work and exist within my project. But it's really cool to hear that there's a tool out there that just manages it well for you.

Rohan Prasad [00:42:46]: I think the other interesting part about this, in terms of linking and such, is that it sounds like you're embedding your entire code base with context, based on how you're creating those various nested layers of documentation. So maybe you only need a cursory overview of this particular module, but maybe you need to double-click on that module and go a couple of layers down. To me, where that gets interesting is that it starts feeling more like a search problem: if you're able to create a set of embeddings on top of all those layers, then before you start a new task you can search across your entire code base efficiently and know where you have pieces that you need to pull in, and, going back to the earlier example, where you have things you actually want to completely avoid. Like: avoid folders Foo and Bar, but I want you to double-click on Baz and on Buzz and go a couple of layers deep to touch these particular sub-packages and sub-modules. For what it's worth, one of the things I've been trying to do is, whenever I have Claude make a change, I document it in an architectural decision record, which lives at the top level. But I haven't figured out the whole nested structure thing that you were talking about, which I think is super cool.
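A rough sketch of the retrieval idea Rohan is gesturing at, assuming the sentence-transformers package and a docs/llm layout like Bruno's (the file layout and the query are made up):

```python
# Embed the per-module summary docs once, then search them before starting
# a task to decide which modules the agent should "double click" on.
from pathlib import Path

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

docs = {p: p.read_text() for p in Path("docs/llm").rglob("*.md")}
paths = list(docs)
doc_emb = model.encode([docs[p] for p in paths], convert_to_tensor=True)

query = "retry logic in the HTTP client"
q_emb = model.encode(query, convert_to_tensor=True)
scores = util.cos_sim(q_emb, doc_emb)[0]

# The top matches become the "pull these in" list for the agent's prompt;
# low-scoring folders can be named explicitly as areas to avoid.
ranked = sorted(zip(scores.tolist(), paths), reverse=True)
for score, path in ranked[:5]:
    print(f"{score:.2f}  {path}")
```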

Anna Yoon [00:44:05]: Right. That was a great discussion. Arthur, do you want to take the next one?

Arthur Coleman [00:44:10]: Yeah, this is a good one. I really do want to get this out on the table. I'm actually doing something called vibe entrepreneuring, which is: I do it all. DevOps, coding, marketing, business planning, all of it. So when I get into the development part, the question is: here, AI, here's what I want to build, let's go architect it. Great.

Arthur Coleman [00:44:34]: The problem is that it doesn't think about the alternative mechanisms it should. You know, it'll want to write code when Google Cloud has a function that's already there. So how do people on this call ensure that it follows the most efficient, most cost-effective approach, versus the one that first comes to mind? Is that a good way of phrasing it, Anna?

Valdimar Eggertsson [00:45:02]: For sure.

Anna Yoon [00:45:02]: And I am curious, one other piece that goes into it is: how much human-in-the-loop process do you want? Do you want Claude, for example, to lay out different options and you pick the best approach, or do you want it to lay out all the options and also make the decision itself?

Arthur Coleman [00:45:31]: Who do you want to answer, Anna? Choose somebody.

Anna Yoon [00:45:34]: We'll go with you first, Arthur.

Arthur Coleman [00:45:38]: No, I asked the question; let someone else answer. I'm not the expert.

Praveen K [00:45:42]: So usually what I do is I say: act like a senior engineer, you are an architect, you have a lot of experience dealing with this, and this is what we want to do. I haven't used Claude much, but I've used Grok. Grok doesn't give much explanation, but the GPT models usually give these approaches: for this problem you can approach it this way, this way, or this way, and probably this one might be best. And then you reason on top of it: if you go with this approach, say it's a multi-threaded application, do we run into any thread or deadlock kind of thing? What do you think? That's how the conversation keeps going, and you can pick one of those. That's what I have done.

Anna Yoon [00:46:35]: Nice. Nice.

Lucas Pavanelli [00:46:37]: Yeah, I can talk a little bit about that as well. Normally I tend to spend a lot of time designing the architecture first and then jumping to the code. So what I normally do is use, for example, Gemini: not the coding environment or the coding agents.

Lucas Pavanelli [00:46:59]: I do that part of getting some options: what are the options I can use to implement this feature I need to build? Then I keep chatting, also using my own knowledge, to see if I'm reaching some cost efficiency and whether it's actually solving the problem. Only after that do I go to the coding agent to implement it, and then I keep reviewing what the agent's doing. For features that are important, that I want to face production, I like to do this, because at the end of the day, with this first iteration with Gemini, for example, I also get some type of documentation of what I'm implementing, which I can share with the company as well.

Bruno Lannoo [00:47:47]: Cool.

Anna Yoon [00:47:47]: Rohan.

Rohan Prasad [00:47:50]: Yeah, it's an interesting thing. I think there are a couple of different ways that I go around this, and I'm going to be pretty upfront and say I don't think I have a perfect solution. One, I heavily use research modes, for example on Gemini or Anthropic, for defining architectures or making decisions, or even exporting a particular architecture decision about what I want to use in a particular place. The other thing I have started adding more, from a testing perspective, is perf-based testing, even in unit tests. At the end of the day, the thing that matters to me is whether the code performs at a certain level and has a certain rigor and quality around it; there are aspects around readability and maintainability and whatnot as well. So I have a lot more tests now that validate that a function runs on a good, well-known set of data and runs within one second or two seconds or whatnot. That's how I take a bit more of a tack on performance characteristics.

Rohan Prasad [00:49:00]: Examples also help. If your code base is filled with bad examples, then, and I've seen this, my agent will pick up the bad examples, so I try to split those away as well. It's kind of a hand-wavy response, but that's the way I approach this particular thing.
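A minimal sketch of such a perf-guard test, in the spirit of what Rohan describes (the dedupe_records function, the data shape, and the one-second budget are all hypothetical):

```python
# An agent-written change that regresses performance should fail this test,
# just like a functional bug would.
import time

from mylib import dedupe_records  # hypothetical function under test


def test_dedupe_perf_budget():
    # A well-known workload: 100k records collapsing to 1,000 unique ids.
    records = [{"id": i % 1000, "v": i} for i in range(100_000)]
    start = time.perf_counter()
    result = dedupe_records(records)
    elapsed = time.perf_counter() - start
    assert len(result) == 1000                                  # functional check
    assert elapsed < 1.0, f"took {elapsed:.2f}s, budget is 1s"  # perf check
```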

Anna Yoon [00:49:22]: Yeah, totally makes sense. Valdimar?

Arthur Coleman [00:49:27]: Yeah.

Valdimar Eggertsson [00:49:30]: So it's been a while since Arthur asked the question, but it's about making the design decisions, not trusting the agents with all of the decisions.

Anna Yoon [00:49:43]: Yeah. What's your approach to making sure that the agent is implementing the best approach possible, both low-cost and, you know, high-performance?

Valdimar Eggertsson [00:49:57]: I mean, it depends on whether it's something I know as well as or better than the agent. I would start with using the deep research mode, maybe, to find strategies that work, review them, and maybe point out: okay, these don't work given our limitations, but let's go with this one. Make a strategy document, hand that strategy document over to another agent to figure out an on-point plan, and from that you can get specs of what the software should be like, and from there you can map out tasks, and then you can run it through and get something running. But at every step of this process, if you want it to work properly, you should put in as much of your knowledge and intuition as you can. And often I don't know it well enough, so I just trust the agent. But if I know it well enough, then I can really help.

Anna Yoon [00:51:12]: I think that's a pretty common dynamic between you as a developer and the agent. Cool. Praveen.

Praveen K [00:51:22]: No, I didn't have any question. I think I answered my part.

Anna Yoon [00:51:27]: Cool, cool. Okay, we have a stream of new questions that came in. Erin, are you up to ask?

Bruno Lannoo [00:51:38]: Can I say a last word on the previous question? Because I really liked it, and I have a very different perspective on it, actually. I already see this the other way around. The original question was: how do you make sure the agent doesn't go with the first thing that comes to mind? But I feel like that's a human problem. Like: how do you make sure I don't go with the first answer that comes to my mind, and my colleagues don't go with the first answer that comes to theirs? I feel like that was always a very challenging problem that I never knew how to solve until LLMs came around. It's not solved by default, but it's quite easy to solve with an LLM, I find, because you just have to ask the LLM: can you come up with three different solutions? And it is more likely to come up with different solutions than a human would. On top of that, if that doesn't work, you can go with the roleplay approach: you start a first conversation and ask it for one solution; then you start a new conversation, ask for a summary of the first conversation to explain the position, and say: you are an engineer who does not agree with the previous engineer; explain your alternative. That way you really can get different ideas, which was so difficult with humans, because even with myself, it is very hard to be creative when I think I already have a solution.
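A rough sketch of that two-conversation roleplay, here using the OpenAI chat API as a stand-in (any chat model works; the model name and prompts are illustrative, not what Bruno uses):

```python
# Conversation 1 produces a proposal; conversation 2 starts fresh and is
# forced to disagree with it, surfacing alternatives a single thread
# would often be too agreeable to offer.
from openai import OpenAI

client = OpenAI()

def ask(system: str, user: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    )
    return resp.choices[0].message.content

problem = "Design a job queue for nightly batch processing."

first = ask("You are a senior engineer.", problem)

rebuttal = ask(
    "You are an engineer who does not agree with the previous engineer. "
    "Explain a genuinely different alternative and its trade-offs.",
    f"Problem: {problem}\n\nPrevious engineer's proposal:\n{first}",
)
print(rebuttal)
```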

Arthur Coleman [00:53:00]: I've got to come back to you on that, Bruno. Because what I learned is that architecture at the beginning is design patterns, and if you don't lay out the right design pattern, you go off track very quickly. And yet there are lots of design patterns, and I'm not an expert on design patterns. So I need the machine to recommend two, three, four of those in order to understand. Then what I do is go research what the machine tells me, and then I learn, and then I can make an informed decision.

Arthur Coleman [00:53:35]: But I need the LLM to first recommend different approaches to me. So that's how I hear you, but I disagree slightly.

Bruno Lannoo [00:53:44]: Yes, I can definitely see different perspectives, and I don't think the thing I was saying is guaranteed to be right. It's just a very different perspective that I definitely wanted to throw in there.

Anna Yoon [00:53:59]: Cool.

Rohan Prasad [00:53:59]: For what it's worth, there's a comment in the chat, I think Erin put it in, about LLMs being too agreeable. And to your point, Bruno: adding extra steps, or asking an LLM to fact-check another LLM, or to completely disagree, enforcing that, I think, leads to a lot of interesting patterns. A lot of these models are trained on being really, really agreeable and pleasant, so actively forcing patterns that force contention is pretty useful.

Bruno Lannoo [00:54:37]: It varies from LLM to LLM, though, because I have definitely noticed large differences in agreeableness between different LLMs. Right.

Valdimar Eggertsson [00:54:46]: I saw someone yesterday prompting the coding agent to behave as if it were Linus Torvalds. Maybe that's a strategy. There's a raised hand in the audience, also; we've been missing some raised hands, they don't show up so easily in the UI.

Arthur Coleman [00:55:04]: Raised hands, please, Dennis.

Valdimar Eggertsson [00:55:06]: Okay, took it down. Sorry.

Arthur Coleman [00:55:11]: By the way, we are at 9 o'clock, almost, like one minute away. I'm letting this go on as long as people want to let it go on. Anna, if you can stay, great; I know you're really busy. If not, I'll pick up the moderation. Everybody's got a busy day, but for those who want to stay on and learn more, there's loads to learn. We've hardly even scratched the surface of all the different workarounds that I know I've had to do. I would love to hear yours.

Arthur Coleman [00:55:37]: I've learned so much already. People who see this video are going to enjoy this a lot. So I will stop there. Anna, you can keep going, or let me know what you want to do.

Anna Yoon [00:55:50]: Yeah, I do have a hard stop, but it was really interesting to see how different people use these tools and their strategies. Thanks for the discussion.

Arthur Coleman [00:56:02]: All right, let me let people go who are going to go, and then let's see how many folks stay; that way I know who's asking questions, who's still here. Give it a minute, folks. I'm just going to let people drop. Benoy, is it okay to let the video run on?

Valdimar Eggertsson [00:56:31]: There's some timer on this call, down in the left corner, but we have another icon for that if need be.

Rohan Prasad [00:56:38]: Just in case, I set it to one hour and 15 minutes.

Arthur Coleman [00:56:44]: Okay, I can ignore it.

Arthur Coleman [00:56:47]: That's cool. Let's. But you're okay. The video can keep going for longer. Okay.

Lucas Pavanelli [00:56:57]: All right.

Arthur Coleman [00:56:58]: Guillaume is up. And Guillaume, I don't know if you're still here. Otherwise I'll ask the question for him. Gam. Yes.

Praveen K [00:57:11]: No.

Arthur Coleman [00:57:11]: Okay. AI can write code better than me. No kidding. My task now is more about architecture, engineering decisions and so on. How can I learn that while working with an AI agent if I'm distant higher abstraction from the work itself more generally, how can I learn the skills that will be valuable to this work if I'm increasingly delegating the engineering and decisions to AI? Who's here from our speakers? Anybody know? Our speakers have all left, so would anyone like to take that blue call?

Valdimar Eggertsson [00:57:51]: I'm one of the speakers, but I'm great.

Arthur Coleman [00:57:55]: Take it, please.

Valdimar Eggertsson [00:58:00]: Yeah, I just think it's a great question. It just sparks some thought in me and I don't have an answer to it. Like we. If you don't use it, you lose it. So like it's kind of pointing out the way we. How can I learn the skills about architecture, engineering decisions and so on if I'm distant from the work itself? Yeah, I have no answer, but I think it's interesting to ponder. Does anyone have comments in particular?

Arthur Coleman [00:58:31]: Does anyone who's been an engineering manager? Because that's a management question. This is the problem engineering managers have they when they move upstairs, they lose hands on with the code. Is there any engineering manager who's working with AIs right now?

Valdimar Eggertsson [00:58:44]: I see Guillermus on the call with a raised hand. There's two raised hands, so please be.

Arthur Coleman [00:58:48]: I can't see the raised hands. I somebody else. I can change the view now. There we go. Okay.

Valdimar Eggertsson [00:59:00]: And they dropped their hands.

Arthur Coleman [00:59:04]: They dropped their hands. Okay, does anybody have a comment? I do, but does anybody else have a comment on this? Okay, I really want everybody ideas that we share here guys. So please, I don't want to be the only one with me involdem are the only one talking. And Aaron, I see you so feel free to jump in on this topic. So let me ask you, since I can see you, do you have an answer to this question?

Erin H [00:59:36]: It's a bit of a conflict of interest: we made a course on this topic at Udacity that teaches basically fluency, at a 10,000-foot level, in what design patterns and software architecture patterns are: what these things are, what they're called, and what they do. But I do think that's not really a proven way to learn this. The more proven way is being in a company that is making software and picking up along the way the things people inherently expect everyone to know, like what this or that design pattern is. There are certainly some design patterns that are like that, like the adapter pattern. Everyone knows what that is.

Anna Yoon [01:00:26]: Right.

Erin H [01:00:27]: So I don't know what the personal situation is, what kind of environment you're in where you're now expected to have this management role without necessarily the on-the-ground experience. But really, if you can be embedded in some of those conversations, to me that's where a lot of these things are, where I learned them. A course or a book can be more challenging, because you're only getting the sort of clean, official version of things.
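As a concrete anchor for the kind of pattern fluency Erin describes, here is a minimal sketch of the adapter pattern she mentions; the class and method names are invented for illustration, not from any panelist's codebase.

```python
# Minimal adapter-pattern sketch (all names are illustrative): a legacy
# logger exposes write_line(), but newer code expects log(message).
# The adapter bridges the two interfaces without changing either side.

class LegacyLogger:
    def write_line(self, text: str) -> None:
        print(f"[legacy] {text}")


class LoggerAdapter:
    """Wraps LegacyLogger so it satisfies the new log() interface."""

    def __init__(self, legacy: LegacyLogger) -> None:
        self._legacy = legacy

    def log(self, message: str) -> None:
        # Translate the new-style call into the old-style one.
        self._legacy.write_line(message)


def run_job(logger) -> None:
    # New-style code depends only on .log(), not on the legacy class.
    logger.log("job started")
    logger.log("job finished")


run_job(LoggerAdapter(LegacyLogger()))
```

The point is less the code than the vocabulary: when a colleague says "just wrap it in an adapter," this is the whole idea.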

Arthur Coleman [01:01:05]: Guillaume, I have to ask you a question on your question, which is: is this meant for someone who has been an engineer and has already done some work, or is it for a new person coming out of school who doesn't know yet and has to get trained? This is the worry everybody has with AIs: nobody's going to learn how this stuff works anymore. It's going to be like calculators versus knowing how to do real math. So can you tell me who you're targeting in this question?

Valdimar Eggertsson [01:01:37]: He made a comment in the chat saying I'm here but can't speak. So.

Arthur Coleman [01:01:41]: Okay, all right, you can put it in the chat. Okay, so it's more for the engineer. Any comments on that? Valdimar or Erin or anyone else?

Valdimar Eggertsson [01:02:01]: There's a question of how necessary it is to have the hands-on experience or know the low-level details. You know, when they're designing processors, there are like nine layers of abstraction, where you only have to know one level to be an expert and it doesn't really matter what's lower down.

Bauke Brenninkmeijer [01:02:22]: So.

Valdimar Eggertsson [01:02:25]: Yeah, I don't know. Arthur, you have experience with managing software. So what do you think?

Arthur Coleman [01:02:32]: Yeah. First of all, let me give you a little background. I'm a product guy, a chief product and engineering officer, and I've only recently run engineering. I haven't been a hands-on coder in I don't know how long. So I come at this without a lot of the years of writing code and working up the engineering ladder. I've done some, but it's been a long time. So what I've done is, when I go to events or do anything, I ask engineers. I ran into a problem, you know, on my first project.

Arthur Coleman [01:03:05]: And, and by the way I think the best way to learn is, is to do small projects first or take one problem that you have to have and do that as an AI driven project and learn about the things that go wrong and take them one at a time. And for me the lesson, the two lessons from my first project was regression testing and architecture. But I didn't know anything about design pattern, I never heard of design patterns all my years. So happened, I went to an age inverse event for Google, sat next to a head of engineering and I said I have this architecture problem, how do I deal with it? He said read this book, Design Patterns. And so I took that and used that, I fed it into the AI and I watched and learned and that's how I learned. You know, to those of you on the phone who are on the call, who are relatively young and new, you know, again we're going through a major change in the nature of work and you know, learning how to use a calculator is, is the same sort of example. I don't need to know how to add 2 +2. I can use the calculator and I have a mental model in my mind of how that works and I don't have to be the expert.

Arthur Coleman [01:04:17]: You learn something in school, but then you move on from basic math, you know, the commut, commutif commutivity rule to actually just not thinking about it. And so I think you have to accept in this new world order that you're not going to be the expert on everything or even necessarily anything. And the AI will be your detail and as you find holes in its work because it's an intern, basically think of it as your intern or your junior. You then go through and teach yourself. So self learning Udacity. I love Udacity by the way. I knew the founder of it. I'm a big Udemy fan and I worked at Udemy by the way.

Arthur Coleman [01:04:56]: So, you know, I think you have to come to accept that the nature of work is changing and how you think of your knowledge base and what you need to know and how you learn it is changing as well. And I don't know if anyone agrees or disagrees with that. Aaron, what do you think?

Erin H [01:05:23]: Yeah, I think it's changing. We have a sort of self-interested stance here that it's not that the work is going away; it's that the level of abstraction is changing, so there are different skills you need. It's not that the LLM is truly replacing anyone's skills; it's changing their productivity level, and it's becoming more that you need to know what good looks like rather than being able to write the good code yourself. But we're still very much on a frontier here. I don't know what it's going to look like in 2027. It could be that none of these problems are there anymore with Claude Code, that Claude Code is perfect a year from now.

Erin H [01:06:15]: And that would change a lot of things too.

Arthur Coleman [01:06:18]: Here's a question. It's not about what we do; it's about what our vendors are doing. One of the issues is memory: I've heard that the context window, losing it and having to restart, is a big issue. Has anyone heard about the vendors now adding long-term memory to these platforms so that we don't have to worry about context windows? Or is that still going to be an issue in 2026?

Valdimar Eggertsson [01:06:46]: I know it's like the main focus; the memory is the moat. That said, OpenAI doesn't have a moat, nobody has a moat to protect their thing, but how well you memorize who the user is and what exactly they want, that's going to make the user stick. So I think at least OpenAI and these big companies are really focusing on this now. That's what I've heard. But yeah, it's more important than I realized until somebody pointed it out to me recently.

Arthur Coleman [01:07:21]: Anybody else? Mag, Andre, Anna, Lorenz? Anybody have a view on both that and how they're dealing with the context window in the short term?
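Nobody on the call takes this up directly, so as a rough illustration only: the usual short-term workaround is to trim or summarize older turns on the client side so the conversation fits the window. A minimal sketch, assuming a plain list of chat messages and a deliberately crude token estimate (both are assumptions for the example, not any vendor's API):

```python
# Rough sketch of client-side context trimming (illustrative only, not a
# vendor feature): keep the system prompt, then drop the oldest turns
# until a crude token estimate fits the budget.

def estimate_tokens(text: str) -> int:
    # Very rough heuristic: roughly 4 characters per token in English.
    return max(1, len(text) // 4)

def fit_to_window(messages: list[dict], budget: int) -> list[dict]:
    """messages: {'role': ..., 'content': ...} dicts, oldest first.
    The first message is assumed to be the system prompt and is kept."""
    system, history = messages[0], messages[1:]

    def total(msgs: list[dict]) -> int:
        return estimate_tokens(system["content"]) + sum(
            estimate_tokens(m["content"]) for m in msgs
        )

    kept = list(history)
    while kept and total(kept) > budget:
        kept.pop(0)  # drop the oldest turn first
    return [system] + kept

history = [
    {"role": "system", "content": "You are a careful coding assistant."},
    {"role": "user", "content": "Refactor the parser module."},
    {"role": "assistant", "content": "Done; see the diff above."},
    {"role": "user", "content": "Now add regression tests."},
]
print(fit_to_window(history, budget=20))  # oldest turn gets dropped
```

Real setups usually replace the dropped turns with an LLM-written summary rather than discarding them outright.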

Valdimar Eggertsson [01:07:37]: Okay, I have one thought to share related to what we talked about earlier, about not having the hands-on knowledge, just having the high-level view without understanding the details. There was this old computer science paper; it's one of the things that stuck with me, and it's increasingly relevant with all this agentic coding. It was called Programming as Theory Building, by one of the founders of computer science, Peter Naur, who helped invent ALGOL and some principles of computer science. When he was an old guy, in the 80s, he made this message to the programming community saying: programming is not about generating texts of code and artifacts out there. It's about building a theory in the mind of the programmer, who needs to know it and live with it and maintain it. And I think we're kind of losing that. At least when I'm in a rush, having the AI build everything for me, then I get stuck. I don't know it. So, yeah, we lose the theory in our heads.

Valdimar Eggertsson [01:08:48]: And this is like why you cannot hand over a programming project to a new company or new developers, because even though you document everything really well, then you lose the theory.

Arthur Coleman [01:09:01]: Well, that's what you learn in school. We can go back to how you learn two plus two and how to do various functions; then you don't need to do that anymore because you have a calculator, but you learned the theory. That's probably how education will evolve. Let me move on to the next question. This is a good one. And anybody on the call, please, people, jump in. There's so much we can learn from each other.

Arthur Coleman [01:09:21]: When the agent goes in a direction, boy, I wouldn't have gone in, but it seems okay. Do you deal with it? Do you let it go? Do you follow it down the rabbit hole? Or after three hours later, you discover, you know, that it's been a total waste of time and you have to start over. How do you avoid that rabbit hole trap? Give it five or ten more minutes, Benoit. I think we're pretty much at the end. Lauren, you asked a question earlier. Can I impose on you? I don't mean to point people out though. It's really meant to be interactive. I'm trying to learn for you guys.

Arthur Coleman [01:10:05]: I don't know everything. So I would love to hear if you've ever been down the rabbit hole. Lorenz. And you know, if you think that there's a way to avoid that, that you've learned going forward.

Valdimar Eggertsson [01:10:20]: Lorenz commented that he's in the library.

Arthur Coleman [01:10:23]: Oh, okay. All right, I'll turn it to you, Valdimar. I'm gonna let you in.

Valdimar Eggertsson [01:10:34]: Yeah, I'm here. Just thinking about it: yeah, I just follow it down the rabbit hole, get stuck for a few hours, and wish I wouldn't do that. It's good to take a break, step away from it, and come back. Yeah, I have no solution to that. Do you check in?

Arthur Coleman [01:11:02]: You check in along the way. You try to remind yourself; you know, the problem is, as we talked about earlier, forgetting that whole conversation. Do you find ways to break the work down into small chunks? That's one thing I've learned to do, so that as I go through the chunks, if I see something really dumb, I stop and go, wait a minute, guy, are you really thinking about this? Have you had any techniques like that that you use to prevent going too far down the rabbit hole?

Valdimar Eggertsson [01:11:30]: Yeah, I guess having plans, hierarchically structured plans where you have sub-plans inside, and staying on track for the high-level thing without going deeper and deeper. This may be related to the rule of seven or something.

Arthur Coleman [01:11:53]: Yeah, that's what I do. By the way, I could show a document, but I'm not going to do it now. I make a plan with the machine, we agree what the plan is, and it's broken down into very specific steps. Then I go one step at a time; it can't memorize the whole thing. One thing I learned about managing engineers, which was new for me coming into engineering the first time, is they don't like you to give them the vision. They're really good if you give them one task.

Arthur Coleman [01:12:19]: And I'm not trying to insult engineers at all. What I've learned is focus on one task, let them focus on it, because that's their art form, and then move on to the next. So I treat the AI very much the same. We do one task, we focus on that task, we get it done. I go back to the plan. I literally strike out that work from the document, say we're done, and go forward to the next step. And I find that that's the way I not only keep track of the entire conversation, it also helps me structure the logs, which I do do, post conversation. And as a result, I can keep track across conversations.

Arthur Coleman [01:12:54]: I don't lose context. Sheen doesn't go down a rabbit hole too much and I'm able to back it out. And I learned that lesson the hard way in terms of when. The first time I did it, it went through a long thing. You got a long way down and a totally long two days of work to that. So that's the technique that I developed to prevent that from happening. Aaron, what's your approach?

Erin H [01:13:18]: I'm fairly spoiled in that I'm not really using AI to do things that I wouldn't know how to do myself. I tend to be very opinionated, and I'm mostly creating educational code or visualizations. So I will keep telling it, no, you did it wrong; but the particular situation of it going off in another direction that seems okay, I don't really hit, because I'm not building software where the functionality is the point of it. I think I'm in a pretty different situation than most people here.

Erin H [01:14:02]: So, yeah, I tend to be very, like, nope, this is what I said. Do it again in my attitude and not really being like, oh, I see where. How that could work. Because it's usually something, again, extremely specific that I'm trying to get out of it. And pretty short in terms of lines of code. Like, I'm usually in the hundreds of lines of code, code maximum for what I'm working on. So I think it's probably pretty different than a lot of people who are building, like, bigger projects.

Valdimar Eggertsson [01:14:36]: Okay.

Arthur Coleman [01:14:37]: All right, we're going to stop here, Benoit. The last question is: does it make sense to have a part two? And there was a big exclamation point, yes. So maybe we continue this with a very structured approach in the future, where we go through specific topics that each of us has found problematic and work through really practical solutions. I'll come back to that.

Arthur Coleman [01:15:02]: Anyway, everybody, thanks for staying so long. I hope the extra time was worth it. I hope we gave you a few pointers that will make you more efficient at your work. This has been a fabulous session. I've learned a lot from the speakers, so thank you again. All right, all, live long and prosper. Have a great day with coding with AI. Take care.
