
This One Shift Makes Developers Obsolete

Posted Mar 31, 2026 | Views 37

# AI Agents

# Software Engineering

# AI in Production


Speakers

Jens Bodal

Senior Software Engineer II @ Independent

Jens Bodal is a senior software engineer based in Edmonds, Washington with nine years of experience building developer tooling, internal platforms, and web infrastructure. He spent seven years as an SDE II at Amazon, working on teams including Amazon Games Studio and the AWS Events Management Platform. His work has focused on developer tooling, CI/CD systems, testing infrastructure, and improving the developer experience for teams operating production services. He is particularly interested in developer experience and the growing ecosystem of local tools that help engineers build and run AI systems on infrastructure they control.


Demetrios Brinkmann

Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.


SUMMARY

AI agents are shifting the role of developers from writing code to defining intent. This conversation explores why specs are becoming more important than implementation, what breaks in real-world systems, and how engineering teams need to rethink workflows in an agent-driven world.


TRANSCRIPT

Jens: [00:00:00] If your specification is so detailed that there's no possible way for the agent to like derail from that plan, there's no reason to essentially read the code because your understanding of the code is less important than the specification that you put in place that it has now to

Demetrios: Dude. Well, should we talk about the conference? You came down, and I'll set the scene for the listeners, because you popped up, I think in Slack, and you were like, Hey, I'm coming down for the conference. It looks awesome. Um, pretty excited. And I was like, dude, that's cool. Where are you coming from? And you said Seattle, and I was surprised.

Demetrios: I didn't think that anyone was gonna travel for the conference, 'cause I had basically put a bet on everyone that came to the coding agent conference being from South Bay. And people from SF, [00:01:00] I wasn't even expecting them to travel the hour to South Bay. And then when you said that, it started opening my eyes to like, oh, well maybe this conference is gonna be pretty cool and it's gonna be people traveling from outside.

Demetrios: Turns out a lot of people traveled. There was a guy that told me he traveled from Germany for it. So hat tip to him. But then I told you, well, I'm staying here, and if you want, we're having a team dinner the night before, so come hang out with the team. You took me up on it.

Jens: And I didn't even know what that meant.

Jens: I was like, what team? I didn't, I didn't even know if I realized that you were the actual community leader for MLOps. I thought you were just like someone I met on Slack when I joke.

Demetrios: You just were like, sure, dinner beforehand, I'm going, I'm in. That is amazing to me. You were saying yes. And then we met at the team dinner the night before.

Demetrios: You got to hear all of our stress the [00:02:00] night before the event. Like, oh my God, that person didn't turn in their slides. What are they gonna do? Or Dex gave us 200 slides. How is this possible? He is not gonna give a 20 minute talk with 200 slides. That's like not physically possible. And spoiler alert he did.

Demetrios: So that's wild. Uh, and he did finish.

Jens: Yeah.

Demetrios: Yeah. He finished on time and actually kind of early, so that's wild. And. What I wanted to do here is just talk through some of the stuff that you've been thinking about and I've been thinking about since the conference happened. Like there's some key pieces for me that I remember and I learned from that event.

Demetrios: I imagine there's some stuff for you. So let's cross-pollinate and talk about it. Maybe first I can talk about what I did as an asset for the post-event.

Demetrios: We had the six-hour live stream that's still up, but we're making [00:03:00] individual videos right now and they should be out by the time that folks are listening to this podcast. So if you go to our YouTube, you should see all the individual videos.

Demetrios: I took the live stream transcript, fed it through Claude, and asked it to identify areas where we talked about certain skills: what skills worked, what skills didn't work, and what our favorite skills are. Because throughout the day, people were talking about this skill is great, this skill is great, blah, blah, blah.

Demetrios: And then I asked it to either link to existing skills if we were talking about existing skills like Claudeception or something. And then if we weren't and someone explained the skill, create that skill. And so we now have all the transcripts in a GitHub repo. We have all these new skills that people were talking about.

Demetrios: And since we've been doing these lunch and learns on Fridays, I've just been continuing to add to that as we have lunch and learns and we [00:04:00] talk about new skills that are very useful for us.

Jens: I think it's awesome that you're able to now do all the things you could envision doing if you could, like, think back to how you can parse all this data that you have.

Jens: And so I did something similar, of course, with my own setup. I was in the process of, like, I was just gonna write, you know, a simple blog post about how my experiences went. And then next thing you know, I'm doing a lot more than I normally do, but I, uh,

Demetrios: You go on those side quests.

Jens: Yeah. I always go on big side quests. So, I mean, I have a few research docs put together. I didn't have it prepared to talk about right now, but here, I'll see if I can pull up what I have.

Jens: One thing I found very interesting as I started diving into all this has been just understanding how to learn and how to understand these concepts, and kind of [00:05:00] going back to taking information in. I dunno if you remember, in the 2000s, the scrollytelling style of websites. There was, I think, a New York Times article or The Atlantic or something, and it was an article titled Into the Deep, and the style is called scrollytelling.

Jens: So, you know, scroll-y-telling, and it's like a long-form website where the information is shown to you in infographics, but in a dynamic way.

Demetrios: Oh, that's awesome.

Jens: And so it, it was like a fan. Anyways, it's a really cool way to mess around with vibe coding, uh, if you're like looking to get something explained to you.

Demetrios: So you asked Claude to explain it to you in the form of scrollytelling?

Jens: Sort of. The way I processed the information was I took all the YouTube videos.

Jens: You take those videos, download them, [00:06:00] parse 'em, like take all the key frames using, you know, ffmpeg or something similar, and you split that up and grab the audio transcript and then correlate together that context with the context from the images to kind of fill that in. And then you gather in the other information from the comments and have it do additional research there to kind of fill that out.
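
A minimal sketch of the pipeline described here, under stated assumptions: it samples one frame every few seconds with ffmpeg's `fps` filter as a stand-in for true key-frame selection, extracts mono audio for a transcription step that is not shown, and lines frames up with transcript segments by timestamp. The function names, the `Segment` type, and the file names are hypothetical, and the commands are only built here, not executed.

```python
from dataclasses import dataclass, field
from pathlib import Path


def frame_command(video: Path, out_dir: Path, every_seconds: int = 10) -> list[str]:
    """Build an ffmpeg command that saves one JPEG every `every_seconds` seconds."""
    return [
        "ffmpeg", "-i", str(video),
        "-vf", f"fps=1/{every_seconds}",
        str(out_dir / "frame_%04d.jpg"),
    ]


def audio_command(video: Path, out_wav: Path) -> list[str]:
    """Build an ffmpeg command that drops the video stream and writes 16 kHz mono audio."""
    return ["ffmpeg", "-i", str(video), "-vn", "-ac", "1", "-ar", "16000", str(out_wav)]


@dataclass
class Segment:
    """One transcript segment with start/end times in seconds."""
    start: float
    end: float
    text: str
    frames: list[float] = field(default_factory=list)


def attach_frames(segments: list[Segment], frame_times: list[float]) -> list[Segment]:
    """Attach each frame timestamp to the segment whose time range contains it."""
    for seg in segments:
        seg.frames = [t for t in frame_times if seg.start <= t < seg.end]
    return segments
```

Each command list can be handed to `subprocess.run(cmd, check=True)` on a machine with ffmpeg installed; the transcript segments would come from whatever speech-to-text tool is in use.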

Jens: And then once I have that, then it's kind of, uh, I have a few different ideas, but basically just having a focus on breaking it out into the way that you would think about it in terms of semantic understanding of the concept. So knowledge graphs, mind maps, or topic graphical kind of ordering of the information.

Jens: And like once you have it all split out into like that kind of a graph type network, it's very easy to then do stuff with that data, you know? Wow. And it's all about just kind of pre-processing it. So once you have processed it and [00:07:00] you have it in a way that you can now do something with, there's, you've taken out a lot of those steps of what you want to do elsewise.
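
One way to picture the "graph type network" idea, as a rough sketch only: tag each transcript segment with topics (the tagging step is assumed to happen elsewhere) and count how often topics co-occur. The function names and the topic labels in the usage note are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_topic_graph(segment_topics):
    """Return an undirected co-occurrence graph: graph[a][b] = shared segment count."""
    graph = defaultdict(lambda: defaultdict(int))
    for topics in segment_topics:
        # Deduplicate and sort so each unordered pair is counted once per segment.
        for a, b in combinations(sorted(set(topics)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def related(graph, topic):
    """List neighbours of `topic`, strongest link first, ties broken alphabetically."""
    return sorted(graph[topic], key=lambda t: (-graph[topic][t], t))
```

With segments tagged `["agents", "evals"]`, `["agents", "skills"]`, and `["agents", "evals", "security"]`, `related(graph, "agents")` puts `"evals"` first because it shares two segments with `"agents"`.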

Jens: So your skill idea is something you could expand on along with any other idea you have, you know, once you have your data in a way that you can easily mess with it. So yeah, that's kind of the main thing I always think about, and it can be too much of a thing to focus on, but like how to make it into a process so you're not repeating the work.

Jens: Um, and it's also when you're not, you know, depending on how you're paying for tokens and stuff like that too, then it becomes a whole other conversation for how

Demetrios: Yeah.

Jens: How much you wanna mess with this, you know?

Demetrios: Yeah. Especially if it's like six-hour live streams, but 10 of them, and yeah, nonstop.

Jens: Yeah, so I just ran all that through my local setup that I have, and, um

Demetrios: Oh, [00:08:00] nice.

Jens: Yeah,

Demetrios: I really like that you took it a step further, and you're referencing the images and you're also referencing the YouTube comments. I didn't even think about that, because that can give you a much broader perspective, and it can show you where questions are being asked or what types of questions are being asked.

Demetrios: What's the discussion in the comments.

Jens: Throughout this whole time I've kept saying I'm gonna finally publish something to GitHub or something like that. Maybe I'll try to publish my little YouTube video skill flow. I like that. So yeah, you just use it as a skill. I should be able to at least pull up the research document I have.

Jens: I mean, I have a few of them from the conference now, because I, that's the other thing is like, when do you stop? When do you stop saying like, I need to stop, you know, messing with this. I, I always go a step further and then I'll usually go the step further until it's not working anymore. Or I have 30 versions and I don't know which is the one I liked anymore, and then I just kind of forget I did it in the first place.

Demetrios: Yeah.

Jens: Um, [00:09:00] let's see,

Demetrios: But so you've created all this stuff. Is there stuff that jumps out at you as, maybe it's not what you've been able to do, but just things that stuck with you that you now do differently because of the conference? And so for me, like,

Jens: oh yeah, I, so basically for me, the conference absolutely was about agents.

Jens: I mean, that was one of the biggest takeaways for me. And ultimately the biggest takeaway for me was just from a personal standpoint: it's very hard in this kind of environment to really understand what it is you do or don't know. And, um, you know, there's a very mixed community in terms of who's open to talking about using AI to replace ourselves or, you know, whatever other existential questions we want to ask.

Jens: But I just want to learn about it. I'm just excited. I love, it's a lot of fun. And so, yeah, that was my takeaway is the agents are, you know, the way [00:10:00] everything's moving. And so I've, you know, been talking to a few different companies and that's definitely one of the main focuses that they're looking for, is people that understand the concepts of.

Jens: Basically parallelizing work and everything's moving towards orchestration and evaluation of the agents and how to, you know, gather those metrics. And it's basically like all the things that everyone would kind of, you know, eventually want to lead to with the MLOps side of things. But it's understandably a bit complex because it's not like a, you know, extremely qualitative type or quantitative type, uh, measurement you can take.

Jens: So,

Demetrios: Dude, actually something that came up, I remember you mentioning, you were talking to my buddy Sharam from Cleric, and he said to you like, Hey, it's really cool that you are here learning about this because they're trying to hire, right? And they wanna hire people that are [00:11:00] thinking about this and being at these types of events, just because you're already attacking the problem in a way that, it's like their philosophy on how they think work should be done too.

Demetrios: So I thought that was, that like, was something that I probably won't forget. And because I didn't realize that it was so hard, and this is naive, I, I know people in Slack have been like, dude, we're in a bubble right now. You think that people are using coding agents, but really they're not, like, half the folks don't even know what skills are.

Demetrios: Uh, and the fact that he was sitting there telling you like, yeah, I want my next hire to be someone who is doing this on the weekends, already thinking about how they can parallelize subagents.

Jens: Yeah. And, and I don't like to think of it just to kind of not focus on the weekends aspect of like, you know, looking for people that are spending like a bunch of extra time.

Jens: It's just more the, the interest. And like I, I did fly down there legitimately because I, I've had a, I [00:12:00] love going to conferences. I love learning and. Uh, I've been watching a lot of stuff on YouTube, and at some point I was like, okay, I gotta make sure that, like, I wasn't even sure if you were real. Like, for all I knew you were gonna be like an AI generated, you know?

Jens: I don't know. Like, I didn't know what I was getting myself into either. I'm flying to San Francisco to meet strangers at, uh, an online event that no one I knew of was going to, or, and I, I mentioned it to one person. They're like, what? So, yeah, it all worked out, but, uh, you know, we don't know what's real sometimes

Demetrios: still.

Demetrios: Uh, we spent the whole week together and the jury's still, I,

Jens: I'm like 99, 95% sure, uh, you know. That is hilarious. Yeah, I mean, a robot made my coffee and drove me to my hotel down there, so I don't know.

Demetrios: We did take a Waymo. We took your first Waymo and we found out that if you say Waymo dangerously skip position or permissions and go on [00:13:00] YOLO mode.

Demetrios: It does some wild things in traffic, 'cause those microphones are always on, apparently, and you can guide it. And, uh, yeah, it was doing some weird stuff when we were in that car. But anyway,

Demetrios: there was a few things. One was when Sid from Anthropic was talking about one of his favorite agent architectures, which was the adversarial agent, to always just kind of probe the code that's getting submitted, look it over, and be like the bad guy.

Demetrios: I thought that was cool. I never use a subagent or an agent like that. And so now I wanna think about how I can continuously have those adversarial agents spun up.

Jens: Well, so, you know, one of the interesting things that I kind of go, like agents being the overall kind of like [00:14:00] thing to talk about, it's what is an agent.

Jens: I mean, that would be kind of the question that it seems like such a simple question, but if I was to go into like one of those no stupid questions type rooms, like that would be my question. I, because I want to hear what people's opinions are because it, it's, it, it's, it's a very broad concept. And like I was asked that question like, uh, it was kinda like one of those things I was asked during, not like during an interview, but just during a conversation it's like, uh, what agents do you use?

Jens: And, uh, I was just like, all of them, I, I I, I, I didn't know, like I, I didn't know the right answer. 'cause it's like, yeah, I use Claude, I've used Gemini, I use Codex, I've used the ones on the CLI and all that things, but like. I think really though the, the two concepts are like, you either have an agent that is like not in, not one that you're talking to, or you have one that's running either all the time or it gets invoked, but you either have one you're like working with or otherwise.

Jens: I mean, there's more than [00:15:00] two probably paradigms there, but you know, it's like transferring these things from agent to agent or from Claude to, you know, to the other one. It's like all the concepts make sense, but the actual, you know, mechanism to invoke that capability varies drastically. And that's, you know, part of the harness.

Jens: So like with Claude, yeah, especially like as time went on, it became easier or harder or different to even invoke skills. Like I think when skills first came out there was a lot of issues with people having, you know, to like issues with like invoking them and things like that. And uh, but like, I, I think it's also what was really helpful going to the conference though, is also bridging this gap between people that are now very familiar with like.

Jens: Coding from the aspect of vibe coding and, you know, vibe coding can be a, maybe a taboo or tenuous topic, but like, I think of it as just like using AI to AI assisted development. So like you, for example, you, you don't have a development [00:16:00] background. I totally thought you were a developer. Like the way you, you, you know, you've, you've using, you're using Claude, you're creating apps like, you know, that's just a much more common thing.

Jens: It's becoming like a common language, but it gets confusing when new people start using your common language, but maybe they don't have the same technical background that you're expecting because of the other vocabulary that you're using. And, yeah, so talking about subagents and things like that, it was kind of hard for me to understand how the conversation has developed over time, because that was the very first thing I did when I got access to ChatGPT: okay, how do I put this into my command line?

Jens: Like. I don't wanna copy paste. And then it was like, once I got that, it's like, okay, well I don't want to do that either now I want it to like automate my browser to do this. And it's like, okay, well, you know, if I'm gonna do this amount of work and this is my context window, well then I need to have one agent do this task and the other agent do this task.

Jens: And then using just standard programming, you [00:17:00] know, software development practices, you fan out, fan in, you know, as a kind of common type of workflow. And so if you've spent time building, you know, like AWS has a service called AWS Step Functions, and that's a great example. Or just n8n would probably be an even better example of just orchestrating a workflow, a step builder type thing.
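
The fan-out, fan-in pattern mentioned here can be sketched with plain `asyncio`. This is an illustration only: `run_agent` is a hypothetical stand-in for a real model or agent call, and the task names are invented.

```python
import asyncio


async def run_agent(task: str) -> str:
    """Hypothetical agent call; a real one would await a model or tool API."""
    await asyncio.sleep(0)  # yield control, standing in for network latency
    return f"done: {task}"


async def fan_out_fan_in(tasks: list[str]) -> dict[str, str]:
    """Fan out one agent per task, then fan in by collecting results in task order."""
    results = await asyncio.gather(*(run_agent(t) for t in tasks))
    return dict(zip(tasks, results))


if __name__ == "__main__":
    print(asyncio.run(fan_out_fan_in(["parse transcript", "extract frames"])))
```

`asyncio.gather` returns results in the order the awaitables were passed, which is what makes the `zip` back onto task names safe.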

Jens: And then depending on what the task is, you know, you can parallelize it. So when you're talking about agents and the subagents, there are two aspects.

Jens: One, it's the idea of using it. Like, yes, I would want to parallelize my work, but the other thing is, no, you don't always want to parallelize your work. And two, even if you want to, now you're kind of maybe struggling with the actual usage of this harness that you're using, which differs between vendors.

Jens: So you have like those two layers. And I, I can't remember where I tangent it off here, but you were talking about subagents and Claude and like, I was just like, yeah, it's, it's great to, to talk about, but [00:18:00] it's, you know, sometimes it's as simple as just saying like, use a subagent to do this. And so, and what was your discovery, I guess there, I guess coming from like a non-coding background, like,

Demetrios: well, I think there was some cool things, uh, happening with the subagents or how folks are approaching subagents and the um.

Demetrios: I think probably the reason you went off on that tangent is 'cause I was mistakenly saying the adversarial subagent, when really I think it's a whole different flow. It's like folks use different agents to review code or PRs than they do to create the code.

Jens: Yeah.

Demetrios: And so it's like the adversarial agent is just another agent that will be in that workflow, checking the code as it's being created to try and poke holes in it.

Demetrios: Uh, so whether that's a subagent or not, I'm not sure if you wanna architect it that way. But the, uh, other thing when, since we are on the topic of subagents [00:19:00] that. Feels pretty cool is both Rob from Brumi, he was there and he's creating Brumi. That is all about subagents, right? That's, or how to make using subagents easier, depending irrespective of which harness you're using or which LLMs you're using.

Demetrios: And then the other piece is the Warp talk from Zach, which I thought was super cool too, because Warp has all of those ways that they made it really easy to use subagents, but also kick off, like, subagents in the cloud. Or maybe they're not called subagents. It would be like parallelized agents and

Jens: Yeah, I mean, I,

Demetrios: out thing,

Jens: yeah, like this is also where the terminology can become kind of a communication barrier too, because you know, what is an agent, what is a subagent?

Jens: And so for, for the subagent part, I think subagent would be, basically you have. [00:20:00] It's a sub-process basically. So you have the main agent and it's basically that same instruction set with maybe a different prompt, but new context window that reports back. And then any other agent you have is kind of where the landscape is now.

Jens: And that would be one, one word I have for a type of agent would be a co-agent. That's what I call my agents that talk locally to each other.
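
The subagent definition given here (same instruction set, possibly a different prompt, a new context window, and a report back to the main agent) can be sketched roughly as follows. The `Agent` class, its placeholder `run` method, and the `delegate` name are hypothetical and do not reflect any particular harness.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    instructions: str
    context: list[str] = field(default_factory=list)

    def run(self, prompt: str) -> str:
        """Record the prompt in this agent's context and return a placeholder result."""
        self.context.append(prompt)
        return f"summary of: {prompt}"  # a real agent would call a model here

    def delegate(self, prompt: str) -> str:
        """Spawn a subagent with the same instructions but a fresh, empty context."""
        sub = Agent(self.instructions)
        report = sub.run(prompt)
        # Only the report comes back; the subagent's working context is discarded.
        self.context.append(report)
        return report
```

The point of the sketch is the last two lines of `delegate`: the parent's context grows by one short report rather than by everything the subagent read along the way.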

Demetrios: Uh, yeah, I see that. Yeah. Then the, the stuff

Jens: because, and then, so like the, so the, the agent you were talking about with the, uh, and so the, the term, uh, and that, uh, for the adversarial agent, um, I kind of think of those as just like ephemeral agents.

Jens: So they may or may not have state, but they are basically invoked as part of a. A, a trigger. And they are basically, you know, it, it, it's kind of like LLM as a judge kind of thing, where you're having another agent [00:21:00] review another agent's work and then provide the feedback for, you know, further, further parts in the loop.

Jens: Right?
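
As a rough sketch of that review loop, assuming the reviewer and reviser are simple callables: the reviewer returns a list of issues, the reviser produces a new draft, and the loop stops when the reviewer has nothing left to flag or the round budget runs out. The toy reviewer and reviser below are stand-ins for model calls and are purely illustrative.

```python
from typing import Callable


def review_loop(
    draft: str,
    reviewer: Callable[[str], list[str]],
    reviser: Callable[[str, list[str]], str],
    max_rounds: int = 3,
) -> tuple[str, bool]:
    """Alternate review and revision; return the final draft and whether it passed."""
    for _ in range(max_rounds):
        issues = reviewer(draft)
        if not issues:
            return draft, True
        draft = reviser(draft, issues)
    # Out of rounds: run one last review so the flag reflects the final draft.
    return draft, not reviewer(draft)


def toy_reviewer(code: str) -> list[str]:
    """Flag leftover TODO markers; a real adversarial agent would probe much harder."""
    return ["contains TODO"] if "TODO" in code else []


def toy_reviser(code: str, issues: list[str]) -> str:
    """Replace TODO markers with `pass`; a real reviser would be another agent."""
    return code.replace("TODO", "pass")
```

Because the reviewer is invoked per trigger and keeps no state between calls, it matches the "ephemeral" framing: it exists only for the duration of one review.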

Demetrios: Yeah. Yeah, that makes a lot of sense. In different steps you would want to invoke it and you would wanna see just to like stress test things and the prompt should say, be as brutal as Lenis Augh. Yeah. On my throat.

Jens: Well, I, I, I thought that doing the, the eval stuff that they were doing in the talks, I mean, that was the, the really interesting part is seeing how to, actually, that's still kind of a black box though, is, uh, evaluating sometimes your own feelings of, you know, using like these things in the, in the terminal.

Jens: But, um.

Demetrios: Are you talking about like Jess's talk?

Demetrios: I liked Jess's kind of evals that she did. It was a little different. It wasn't necessarily on the coding agent side of things.

Demetrios: It was on vector search versus grep. Do you remember that? And she talked about how, like, yeah, grep [00:22:00] does better in our test. But then she gave a huge caveat with like, but I only ran one test and I'm not like a vector search or embeddings expert. And so there's a lot of things that probably could be better and blah, blah, blah.

Demetrios: And after that talk, we had the break and I saw her and the Qdrant guys. And the Qdrant guys, obviously they're a vector database, so they were like, well, wait a minute. What embedding model did you use? How much compute did you give this vector search that you got? We need to talk about things. The TLDR of that, to me, felt like vector search is a much more powerful option.

Demetrios: You have more knobs to turn, so you can potentially get a way better solution. It just requires a little bit more time and effort.
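
To make the grep versus vector search contrast concrete, here is a toy comparison. It is not the eval from the talk: the "embedding" is a plain bag-of-words count rather than a learned model, and the three file snippets are invented. It only shows the shape of the trade-off, where grep needs an exact pattern and vector search ranks by similarity and exposes knobs such as `top_k`.

```python
import math
import re
from collections import Counter

# Invented snippets standing in for a codebase.
DOCS = {
    "auth.py": "def login(user, password): check the password hash",
    "billing.py": "def charge(card, amount): bill the customer card",
    "session.py": "def refresh(token): renew the user login session",
}


def grep(pattern: str, docs: dict[str, str]) -> list[str]:
    """Return the names of documents whose text matches the regex."""
    return [name for name, text in docs.items() if re.search(pattern, text)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real system would use a model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def vector_search(query: str, docs: dict[str, str], top_k: int = 2) -> list[str]:
    """Rank documents by similarity to the query and keep the top `top_k`."""
    q = embed(query)
    ranked = sorted(docs, key=lambda name: cosine(q, embed(docs[name])), reverse=True)
    return ranked[:top_k]
```

Swapping `embed` for a real embedding model, changing `top_k`, or changing how much text is returned per hit are the kinds of knobs that make a single benchmark run hard to generalize from.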

Jens: Well, it's like the, the nuances there, again, it's so hard to, like, there's a lot of questions that they, they could ask. And I had a lot of questions too, and, [00:23:00] uh, because yeah, it does depend on the, the, the dimensions you use for the embedding model and the amount of context that gets returned back with it, and whether or not your question is even necessary for it.

Jens: The main point of that talk, right, was basically optimal token usage for doing code edits, right?

Demetrios: Yeah.

Jens: That was the primary, yeah. I'm trying to remember though, was that eval the one that was for doing code edits, or was it one where it needs to explain something about how the code works as part of the functionality?

Demetrios: Code edits, I think.

Jens: Either way, you know, there's always gonna be multiple ways to do it. And so doing the eval thing, I mean, there's a lot of ways that you could even consider that eval, because, you know, there's the token cost, there's the length of time it took, there's whether or not it was accurate the first time or the second time. I mean, you could also eval just based on, you know, the cost of the operation.
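
The dimensions listed here (token cost, wall-clock time, and whether the result was right on the first or second attempt) can be captured in a small record and summarized per strategy. This is a sketch with hypothetical field names and a caller-supplied price per thousand tokens; it contains no real measurements.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class EvalRun:
    strategy: str                     # e.g. "grep" or "vector"
    tokens: int                       # total tokens consumed by the run
    seconds: float                    # wall-clock duration
    attempts_to_pass: Optional[int]   # None means the run never passed


def summarize(runs: list[EvalRun], usd_per_1k_tokens: float) -> dict[str, dict[str, float]]:
    """Group runs by strategy and report first-try rate, mean time, and mean cost."""
    by_strategy: dict[str, list[EvalRun]] = {}
    for run in runs:
        by_strategy.setdefault(run.strategy, []).append(run)
    report = {}
    for strategy, group in by_strategy.items():
        report[strategy] = {
            "first_try_rate": sum(r.attempts_to_pass == 1 for r in group) / len(group),
            "mean_seconds": mean(r.seconds for r in group),
            "mean_cost_usd": mean(r.tokens for r in group) / 1000 * usd_per_1k_tokens,
        }
    return report
```

Keeping the three numbers side by side, rather than collapsing them into one score, leaves room for the trade-off to stay visible: a strategy can be cheaper in tokens and still lose on first-try accuracy.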

Jens: You know, is it cheaper to assign this work to, yeah, I don't know.

Demetrios: Yeah, there's so many pieces to it, right? And how much compute do you need? How much time do you need, and all of that.

Jens: Yeah. And so with the vector [00:24:00] search, you can always throw a bunch of resources at a problem and solve it, but at some point you gotta realize how much resources you're throwing at it.

Jens: So, you know, using a vector search for doing code edits, I'm not sure it's always gonna be a relevant thing to leverage, right? Because you might not necessarily need the context of everything that you would leverage from doing that. You would probably be much better off, um, with that, but also, with the way that they were doing the code edit with her eval, I think you can use like an AST parser, and then there's another, I can't remember the solution that they were using.

Jens: But I've seen this done a few times. There's a tool called Serena MCP, and this is a very similar type of thing, where it uses this like code sandbox to make code edits, but it uses the language server, like LSP, to facilitate the code edits. It's like that same exact thing that she was doing her [00:25:00] evals on.

Jens: Serena is a similar tool, and it's fun to mess with. And I've seen another MCP tool out there too where it reduces the token usage, but you might start introducing hallucinations, and that's where the evals can get complicated. And so

Demetrios: Yeah, there's no free lunch.

Jens: Yeah.

Demetrios: You basically, you gain on one aspect, but you lose on another.
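
On the AST-parser idea for token-efficient code edits raised a moment ago, here is a minimal sketch using Python's standard `ast` module: locate one function's line span so only that slice, not the whole file, needs to go into a model's context. This is not how Serena or the tool from the talk works internally; the helper names are hypothetical.

```python
import ast
from typing import Optional


def function_span(source: str, name: str) -> Optional[tuple[int, int]]:
    """Return the 1-based (first_line, last_line) of the named function, if present."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name:
            return node.lineno, node.end_lineno
    return None


def extract_function(source: str, name: str) -> Optional[str]:
    """Return just the source text of the named function, or None if it is missing."""
    span = function_span(source, name)
    if span is None:
        return None
    start, end = span
    return "\n".join(source.splitlines()[start - 1:end])
```

The trade-off from the conversation shows up directly: sending only the extracted slice saves tokens, but the model no longer sees callers or imports, which is one way errors can creep in.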

Demetrios: And that's the fun part about figuring it out and, yeah, having folks come and say, Hey, here's what I learned in my little mini tests and my benchmarks. What have you seen, or what do you know? So I think that that's a great one. You were gonna talk about, though, which one was it, um, evaling on the vibes? Did you want to hit on that, or, I've got a few other things that stood out to me.

Demetrios: I think there was a few cool hot takes that I like and that have stayed with me since. [00:26:00]

Jens: Yeah. So my biggest takeaway though from that was, I mean, as we were talking about with the agents and doing the evaluations, there's no one way to do this. No one has a solution. There are multiple people kind of doing similar things, as you've seen with like OpenClaw and the NanoClaw things. Not that those were talked about at the conference, but yeah, those ideas are very simple in practice and are billion-dollar ideas.

Jens: So it's, uh, and you know, that idea is just an agent in a loop, and that's what a lot of this is talking about, you know, and, and hooking them up to, to tools and things. And it's obviously more complicated than that. And the real problem comes when you want to have this, you know, outside of your own system.

Jens: I think one of the things I was really surprised about though is that there was very little discussion at all about, um, like the security aspects, or like the, um, the major concerns.

Demetrios: Yeah. Maybe the last talk there with Sam or, [00:27:00]

Jens: yeah, friends, like some, um, I mean, I, my, I I, you could have various reasons to think why it's not discussed more.

Jens: And I guess one of 'em could be is the more, like as soon as Claw Bot came out, you know, there's more people with access now to these autonomous things and there's more problems that happen. People had API keys exposed and things like that. Um, but I just remember seeing in like December, this talk from uh, 3 9, 3 C or some security conference thing, and it just showed, uh, using the, uh, cloud code agent, uh, with like a simple MCP web server and just to use prompt injection to have it click a link to install a rat, to then get remote access to the machine.

Jens: And he showed it using like five different prompt injection techniques to have it do it. And it was like. Every time, like even in like the times where it like gave him like three, you know, responses, you know, it's not blah, blah, blah. He got it to just, you know, do it. And then once all it has to do is like, visit that page and then [00:28:00] boom, it, it gets infected and then you get full remote access and that's for like a full rat.

Jens: I'm sure depending on the exploit that you're, you know, trying to exploit, you don't even need that much, you know, like, so anyways, it's, it's very trivial and um, you know, the, a lot of this is now kind of more focusing on container containerizing stuff, these run times, but also, yeah, it's just that side of things.

Jens: That's where a lot of the eval side can touch on that more. And I'm sure we'll see more of that come up in these talks as time goes on and more people start to follow the SRE patterns, or put adversarial agents in their orchestration loop to help keep things within the guardrails.

Jens: Yeah, I just thought the fact that there [00:29:00] wasn't a lot of security mentioned was interesting. But one question I still have, that I haven't really seen discussed: I can't imagine that people aren't using standard best practices against data exfiltration, basically just DevOps or security operations with your own infrastructure, for things such as an agent visiting a page that you don't want it to, or something like that.

Jens: You know, ultimately all this stuff can get filtered at the API calls, or by your own proxies that you have in your own infrastructure. So everything can be a walled garden if you want it to be. And I would imagine that as more people understand the virtualization of these environments, the more it's gonna move naturally that way.

Jens: And there's [00:30:00] gonna be less people installing these with root access to the machine, and there's gonna be a lot more kind of proper facilitation of tool calling and standardization in that area.
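
The proxy idea above can be sketched as a simple egress allowlist. This is only an illustration, not anything shown in the conversation: the host names and the function names `is_allowed` and `filter_tool_call` are assumptions made up for the example, and a real proxy would sit at the network layer rather than in the agent's own process.

```python
from typing import AbstractSet
from urllib.parse import urlsplit

# Hypothetical allowlist; these hosts are placeholders, not from the talk.
ALLOWED_HOSTS = frozenset({"api.example.com", "git.internal.example"})


def is_allowed(url: str, allowed_hosts: AbstractSet[str] = ALLOWED_HOSTS) -> bool:
    """Return True only for http(s) URLs whose host is explicitly allowlisted."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    # urlsplit lowercases the hostname; a missing host becomes "".
    host = parts.hostname or ""
    return host in allowed_hosts


def filter_tool_call(url: str) -> str:
    """Decide what a proxy would do with an agent's outbound request."""
    return "forward" if is_allowed(url) else "block"
```

Matching on the exact host, rather than on a substring, is the design choice that matters here: a look-alike host such as `api.example.com.evil.test` is blocked.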

Demetrios: Yeah. We didn't have anybody from a sandbox company give a talk, and I would've liked to have seen a little bit more of that because that's kind of, well,

Jens: Wasn't there, um, what was the name of that company?

Jens: The people that were building the file system for agents, Mesa. Oh yeah, I talked to these guys. This was a really cool thing. I don't know a lot about Jujutsu, which is called jj. Is it a fork of Git? It's a superset of Git. I know at least that it fully supports Git, and then it does a bunch of other things that I don't understand.

Jens: But it gives them the capabilities to build these kinds of things. That's how the blame feature kind of works. [00:31:00] And this was also one of these ideas where I would love to understand why simple GPG keys and signed commits, and following that workflow,

Jens: isn't standard for identity with these agents. I mean, I'm not doing this now, but the idea I would have is that I have one SSH key or one private key, and then I would use subkeys to delegate my permissions out to agents, so that they're authoritative on my behalf, but I can still track my evals or my metrics to individual agents, or the blame thing here.
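
As a rough sketch of the attribution half of that idea: if each agent signs with its own subkey, per-agent metrics reduce to a lookup from signing-key fingerprint to agent. Everything below is hypothetical, including the fingerprints, the agent names, and the `commits_per_agent` helper; how the fingerprints are read out of the repository is left out.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical mapping from signing-subkey fingerprint to agent name.
SUBKEY_TO_AGENT: Dict[str, str] = {
    "AAAA1111": "planner-agent",
    "BBBB2222": "coder-agent",
}


def commits_per_agent(
    commits: Iterable[Tuple[str, str]],
    subkey_to_agent: Dict[str, str] = SUBKEY_TO_AGENT,
) -> Counter:
    """Count commits per agent.

    Each commit is a (commit_id, signing_key_fingerprint) pair. Commits signed
    with a key that is not in the mapping are counted under "unknown".
    """
    counts: Counter = Counter()
    for _commit_id, fingerprint in commits:
        counts[subkey_to_agent.get(fingerprint, "unknown")] += 1
    return counts
```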

Jens: Um, but even on the, the blame thing for the agents, it's, it's, it's still an interesting thing. 'cause as you think about subagents and you think about what is an agent and the agent identity, it can get almost like very existential because at what point have you defined, you know, the agent? Is the agent the prompt?

Jens: Is it the instructions? Is [00:32:00] it the memory? I mean, it's all of it.

Demetrios: Yeah.

Jens: It's the harness, it's the day, it's the time, it's, you know,

Demetrios: so then how, yeah, how you look at that for that like. Agent blame and what it is and what you can take from it is, is

Jens: a, yeah. So that, that would be a question I would've loved to ask though is kind of where do you draw the line in terms of the traceability for, for that?

Jens: Because, you know, as I've explored a little bit with multi-agent architectures and things like that, what I have found works best is having essentially fewer direct sessions, but higher-quality direct sessions that then orchestrate out from there. And that's kind of the idea I play with.

Jens: Kind of like, I'll use one harness to communicate with other harnesses, but then I'll do that multiple times for different work sets.

Demetrios: Oh, fascinating.

Jens: [00:33:00] But as far as tying that all together with the agent, I have an idea in my head for how to standardize on the definition of an agent and the question of, like, what is an agent?

Jens: And that's kind of where I was going with my, the blog post I was gonna write is I started kind of looking into this and then I just started doing a lot and now it's Monday. So that's how it goes.

Demetrios: Well, you were going on a little bit of a different route before I threw you on this tangent of how you are doing things, and I know you have a cool little setup with your OpenClaw in this walled garden, right?

Demetrios: That was, I want you to explain a little bit more about how you see things with the walled garden and the proxies to help on that security side of things.

Jens: And that would be a good way to tie this back as well to how going to that [00:34:00] conference helped relate a lot of the things people do in industry to what I can do at home.

Jens: And so yeah, at home I probably have a much more complicated setup than is even necessary, but it's just kind of my own playground to mess around with. I've used OpenClaw a little bit. Not much, and not for any reason other than I have plenty of other things that I'm already messing with.

Jens: I don't need a 50th tool, like, uh, trying to follow all the latest tools and stuff is, you'll go insane. Uh, but Agent Zero, that's one I highly recommend looking at. Um, it's on GitHub, um, open source project, but it runs in Docker. It's a really cool concept. It also uses kind of like a self-learning prompt system and I mean, they've been working on it I think since May of last year.

Jens: I'm not associated with the project, but it's essentially an OpenClaw-type thing that you can run. And that's not my primary driver, but it is a fantastic environment to experiment with, 'cause it's really easy to [00:35:00] set up agents or skills and mess around with them in an interface and stuff.

Jens: But the way that I am currently working with my OpenClaw development setup is, I have a bunch of tmux sessions and terminals on multiple machines, and they're all connected through a self-hosted Matrix chat server that I host. So I have a Proxmox box. Proxmox is basically an OS, more or less, for running containers.

Jens: It's like a hypervisor. And so I have my own Git instance, and my own Discord kind of thing. It's called Matrix, and it's a lot more than that. And then, you know, all this other tooling. So my whole pipeline and everything I experiment with happens locally. I messed around a lot early on and put some secrets on GitHub real quick, and shut that [00:36:00] down.

Jens: And actually yesterday, for the first time, my system opened up a PR for me that I did not ask it to. I made a mistake. Yeah, it opened up a PR on GitHub against Cognee, one of these memory layer packages I've been looking at. And it was a very minor PR, and just an unnecessary one.

Jens: But it's like, wow, that's, I didn't think that would happen with the way I've been doing this, but it did, it, it can happen. So

Demetrios: it got out of the

Jens: box. I mean, that, that's like the biggest, I mean, I leaked sec some security keys that didn't matter at all. 'cause like all the stuff, um, I mean, I highly recommend people look into some basic ways to secure your stuff and like, go pass is a great thing to use, for example.

Jens: But that was the first thing that happened that I really didn't want to happen. Any of the keys I leaked last year would've just been, like, whatever. It's gonna be from some free [00:37:00] account. But yeah. Anyway, it's just such an interesting space.

Demetrios: Yeah. Did you see, so many of the open source devs are in a tough spot right now because they're just getting inundated with slop PRs, and you can imagine, you didn't even want yours to do that.

Demetrios: Imagine somebody that just pointed it at a certain repo and was like, let's go help this maintainer out and edit all of the code and then submit a bunch of crap PRs. Someone just had a blog post about that recently, I think. Yeah, a few people have been talking about it.

Demetrios: And now they're having to put disclaimers in the different projects saying, we don't do PRs anymore, we only do issues, or things of that nature, because yeah, everyone's trying to figure it out in real time.

Jens: Yeah. No, I can imagine how crazy it is. And I saw the same thing. I tried to make a legitimate pull request to a repo in the last [00:38:00] few months.

Jens: And I spent a lot of time on it before I actually submitted it, so when I submit it, I'm ready for someone to actually review it. And then I get an immediate response from a ChatGPT bot or something like that, with feedback on the PR.

Jens: And it's like, I don't know what to do at this point. Do I respond to the bot? Do I go make edits that I'm not confident I need to make? Do I try to bother the maintainers? And I just left it. So,

Demetrios: yeah,

Jens: I, I just, I, I just stopped. I'm like, I'm sure this problem won't even be a problem in a week because everyone's deploying, you know, three releases a day.

Jens: And that was true, though. My PR became irrelevant essentially a month later, because that package I was fixing is now a different version or a different package. These things change all the time. So what I have actually found, and what I started doing last year as well, was I [00:39:00] just pull the source code for a package and use that locally rather than constantly going and fetching it online.

Jens: And like, I, I found it easier to generally just essentially like fork an idea. That's kind of what I consider it, uh, than like taking on a dependency that I can't trust.

Demetrios: Yeah. The dependencies can be weird there. That's,

Jens: It's such a dangerous thing. And on the side of security, I was a little surprised, or I shouldn't say surprised, but I'm just very hesitant when I hear these

Jens: people talking about access to 5,000 skills or 6,000 MCP servers and all these tools and things like that. And it's like, I don't want that. I don't want access to 6,000 tools I didn't write. I'll take some dependencies on things I trust. But, yeah, I mean, if you've ever worked with JavaScript, you can't just start throwing things in there.

Jens: I mean, if you started accepting pull requests every [00:40:00] time a developer wanted to add a new dependency, your package would just be a mess. At some point you gotta write your own. The dependency is not worth taking on, because some of these might not even be intentionally set up wrong, but they might have a bunch of

Jens: code that's packaged with them when you just need one small string or something like that. Anyway, you could keep going down that route. It's fun to use the things that other people write, but you've also gotta read some stuff sometimes. I think Dex's talk was very interesting, with how he talked about changing his stance on reading the code versus reading the PRD, or

Demetrios: uh,

Jens: uh, and

Demetrios: what was the gist of that?

Demetrios: Because I remember somebody giving, asking the question like, Hey, six months ago you told us not to read the code. Now you're telling us to read the code. What's going on here?

Jens: Yeah. And I mean, he's just talking about, [00:41:00] you know, not the de facto way to vibe code, but in general, when vibe coding, I believe he had previously said, um,

Jens: don't bother, as long as you have a solid doc. Every few months things kind of shift, and there was a period of time where spec-driven development, I think, was the buzzword. And it's a very real thing, but basically, if your specification is so detailed that there's no possible way for the agent to derail from that plan,

Jens: there's no reason, essentially, to read the code, because your understanding of the code is less important than the specification that you put in place, which it has now adhered to.

Demetrios: Wait. And so that was what he walked back: the idea of, hey, if you have spec-driven development and you've specified everything so clearly, you no longer need to look at the code.

Demetrios: [00:42:00] He actually came back in this talk, right? And he said, ah, maybe I was wrong. And we, we do need to look at the code.

Jens: Yeah. And with the specification part, you know, if you just have one typo somewhere in that specification, basically it can derail the entire project, or derail the entire specification, if the agent then reads that one little bit.

Jens: And so, uh, the other, the other aspect though of that talk was that at the time the model was different. I think Opus 4.5 had just come out and of course, so as you, you know, they had one workflow in place for how things were working really well. And then when the model updates, it becomes like, either not necessary or doesn't work as well.

Jens: Now you gotta rethink how you're doing it. And so they rewrote all their prompts, I believe, to change how they were doing that. But I think either way, you gotta read both. You have to read what matters. And you can [00:43:00] throw a lot of planning at a simple solution and get it right.

Jens: You can also throw a little planning that is well thought out. Uh, one of the, oh, what I was gonna say is one of the things he, he mentioned, not during the talk though, but uh, he has previously mentioned, is this thing about learning tests. And, uh, that is a fantastic idea that people should consider. I haven't heard of the term learning test in this way, but it makes perfect sense.

Jens: But a learning test is essentially the result that you would get out of doing a spike for a body of work in agile development. So if I was going to, let's just say, create a web app that authenticates with Stripe, my first learning test might be that I can create an agent that calls the Stripe API through an MCP call and gets my account ID.

Jens: Let's just say that's a learning test. So I set up a spike, [00:44:00] because my development's gonna require this workflow to work. That thing I did to get there, that spike, is now a learning test, which is essentially like a regression test. And that regression test can be semantic or codified or whatever, however you can do it reliably.

Jens: But now you kind of have a regression test on your PRD, on your whole idea. Because, you know, overall this idea of test-driven development is always gonna be a great idea, but you don't always know what it is you're going to test, or how to test it, or how to give the agent access to test it.

Jens: And I think thinking of it as a learning test is a much easier way to frame it.
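
A minimal sketch of what such a learning test could look like once the spike has been codified, assuming the Stripe example above. The real API or MCP call is not shown: `fake_get_account` is a stand-in invented for this example, and the check only assumes the response is a dictionary with an `object` field and an `id` starting with `acct_`.

```python
from typing import Callable, Dict


def fake_get_account() -> Dict[str, str]:
    """Stand-in for the real call the spike proved out (hypothetical data)."""
    return {"id": "acct_test_123", "object": "account"}


def learning_test_account_id(get_account: Callable[[], Dict[str, str]]) -> bool:
    """Pass if the call still returns an account-shaped result with an id.

    This is the spike turned into a regression check: it says nothing about
    how the account is fetched, only that the workflow still works.
    """
    account = get_account()
    account_id = account.get("id", "")
    return account.get("object") == "account" and account_id.startswith("acct_")
```

In use, the fake would be swapped for whatever function wraps the agent's actual tool call, and the check would run whenever the spec, prompts, or model change.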

Demetrios: But are you. You're specifying the learning test in the specs, or maybe

Jens: You're specifying it, yeah, as part of your requirements, basically. But I would assume, whether or not he mentioned this explicitly, that the logical next step is that you [00:45:00] codify it to the best of your ability.

Jens: So, you know, especially if it's a dynamic agent action, you can't have a hard pass/fail all the time. You could if you're leveraging one of the frameworks for structured outputs. But at any rate, you have to have an eval mechanism in place for your learning test.
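
One way to get a hard pass/fail out of an agent is to require structured output and validate it. The sketch below uses only the standard library; the field names in `REQUIRED_FIELDS` and the `evaluate_output` helper are assumptions for illustration, not the schema of any particular structured-output framework.

```python
import json

# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"account_id": str, "status": str}


def evaluate_output(raw: str) -> bool:
    """Deterministic eval: True only if raw is a JSON object with the
    required fields, each of the expected type."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    return all(
        isinstance(data.get(name), expected)
        for name, expected in REQUIRED_FIELDS.items()
    )
```

Free-form answers fail the check outright, which is what makes the result usable as a regression signal; a non-deterministic, semantic eval would need a different mechanism.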

Jens: And the learning test can either be deterministic or not. It really depends on what it is. But tying that all together, on whether you should read the code or the spec: you should do a mix of all of it. I think ultimately the real thing is, you really want to make sure it's doing what you want it to do.

Jens: And the really hard thing, if you haven't been coding a lot, is knowing if it's doing it well, or in a way that's not gonna, like... and that's what people find as they start doing this. And I think a great analogy, or whatever the word is for it, is that 10% of your time is spent doing 90% of the work, and 90% of your time is spent doing the last 10%.

Demetrios: Exactly.

Jens: So you can [00:46:00] always get a POC working, you can always get something out there, but as soon as you wanna start adding a feature or making it a little bit more advanced, depending on what you're doing and depending on how it was set up, it could get very complicated and very frustrating to, to debug, especially if you're not looking at the code.

Demetrios: That's literally me right now. Yeah. With this little remotion app that I created for in podcast shorts. But that's a whole nother story.

Jens: Well, it's so relevant, though, because you can go down these... I've spent, like, hours talking to these coding agents for something.

Jens: And I finally just go, all right. Maybe I'm on my phone and I can't figure it out. I get home and I go look, and it's like, the file had a wrong permission, or it was just something so trivial. But because of whatever was in the context window, or whatever's going on,

Jens: it didn't matter what you were gonna say, it was not gonna get there anytime soon.

Demetrios: Yeah. And then you're banging your head against the wall. And [00:47:00] it's so frustrating sometimes when you can't get that.

Jens: Well, like, one example was, I didn't realize it was trying to save one of my chat exports that's just broken

Jens: in ChatGPT, this thing I was trying to export. And so it just kept trying. It was determined to export this thing, and all I had to do was go look, see, oh, it's broken, and say, hey, you can skip it. And it was done. That was the last thing. And I was just like, why is this not working? And I finally was just like, all right, I gotta go check this out.

Jens: I was like, oh, that's okay.

Demetrios: Yeah. I think there are a few fun things that came out of this, with the fact that you're spending a lot less time actually doing that, really dropping into the codebase and figuring out why something is not working.

Demetrios: Whereas before, if you're writing it all by hand, you have to have that understanding. And so maybe you can debug easier, but your velocity might be a little bit slower, right? But now your velocity's much faster, and every once in a while you do have to drop in and figure out what the hell is going on here, because it's just not working.

Demetrios: So there's this kind of new way to think about it.

Jens: There's that aspect, and then it's also about whether or not you actually have to, if someone else has to read this code ever and if it's actually gonna be used in production too, you know?

Demetrios: Yeah. [00:48:00] Well that goes on to the idea of code reviews. I know there was a lot of conversations around that because the simple narrative, right, that we've all been hearing so much is, oh, well now the pr, like reviewing the code is the bottleneck.

Demetrios: It's not creating the code. And I think that to me is a very simplistic angle on how it's done. Yes, you generate a lot more code, but as we know, you can have agents go and look at that code, or they can be adversarial and they can try and find problems. You can have security scans already. There's lots of things you can do before you have a human look at it.

Demetrios: One thing that I thought was an interesting piece is really guarding the human's time on what they look at. And so Leo was telling me, Leo, from the MLOps community, local chapter, he was talking to me about how he heard a few people mention. You only want to [00:49:00] escalate different parts of the code to someone else when you are not sure of it, but you only wanna be really scoping down the, the parts that you're not sure of or that are really like you're shaky on.

Demetrios: And so you own the code that your agent produces and you are the first line of defense. You have to be, yeah, very confident that if you are going to submit that, that you've checked it over, you're not going to just outsource the checks to someone else. Right? 'cause that's a little bit rude. But then if there is something that looks a little fishy in what you've created, only that very scoped down piece of code is what you are asking for a sanity check on.

Jens: Yeah, it was kind of like assigned lanes for ownership of how that would work with the agent-produced code. And ultimately they were responsible for what it [00:50:00] produced. That was a very good insight into how things are working at places that are using these in production, and how the ownership works. You know, who's ultimately responsible? Like the PR that I opened yesterday.

Jens: I mean, if you want to think about it more, who's responsible for that happening? Ultimately it's me, obviously, 'cause it opened up on my behalf. But it could have been something where, no matter what I did, it could have been a bug in the software or something like that, where it's like, I'm gonna ignore the permissions you set.

Jens: And I mean, they do that anyway, and people don't realize. Like, you set "deny bash" so it's not gonna execute anything. Well, then it just uses a different tool to do the same operation.
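
The gap described here, denying a tool by name while another tool offers the same capability, can be shown with a toy policy check. The tool names and capability labels below are invented for the example and do not describe any real harness's permission model.

```python
from typing import AbstractSet, Dict, Set

# Hypothetical registry: which capabilities each tool exposes.
TOOL_CAPABILITIES: Dict[str, Set[str]] = {
    "bash": {"exec"},
    "python_repl": {"exec"},
    "read_file": {"fs_read"},
}


def allowed_by_name(tool: str, denied_tools: AbstractSet[str]) -> bool:
    """Name-based deny list: only blocks the tools that are listed."""
    return tool not in denied_tools


def allowed_by_capability(tool: str, denied_caps: AbstractSet[str]) -> bool:
    """Capability-based deny list: blocks every tool that can do the denied
    thing, and denies unknown tools by default."""
    caps = TOOL_CAPABILITIES.get(tool)
    if caps is None:
        return False
    return caps.isdisjoint(denied_caps)
```

With `denied_tools={"bash"}`, `python_repl` is still allowed by name even though it can execute code; with `denied_caps={"exec"}`, both are blocked and `read_file` remains available.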

Demetrios: Yeah.

Jens: Uh,

Demetrios: I think that was Scott's talk, from Kilo Code, and he had a few hot takes, especially at the end of his talk, where he was [00:51:00] saying that they have one engineer for one feature, and so one person is responsible for one feature, which is totally different from how I think a lot of us normally think about it, where it's like you need a team and all of that fun stuff.

Demetrios: The two pizza team,

Jens: I was gonna say, uh, Sharam had mentioned the same thing at Cleric that, uh, each, each dev owns one feature. I was, yeah,

Demetrios: yeah. No. And so that potentially is a new paradigm on how folks are looking at it. But then you had Scott saying that they don't have PMs or they don't believe in PMs at Kilo Code, which I, I know, got a lot of ruffles from the crowd.

Demetrios: It was like, oh, what? No PMs shots fired and. I thought that was a fascinating piece too, because you're like, okay, well a PM and an engineer, they're kind of meeting in the middle, and so you should be expected to do a little bit of the other person's [00:52:00] job no matter where you sit.

Jens: Yeah. You know, with, with this, it's, it's just, it's fun seeing, A lot of it is just people experimenting with

Demetrios: Yeah.

Jens: Different ways to try things out. And, you know, even before AI, not every team needed a TPM or a PM or a dev lead. It depends on what you're working on and what scope is needed. And ultimately it's gonna come back full circle. Best practices are gonna kind of get worked out, and that's what MLOps is kind of trying to achieve here, though: some level of

Jens: sanity in a consistent way to communicate and to solve problems. And that's ultimately what people are trying to do. And that's one of the reasons I went there: I really needed to understand what it is I do and don't understand, because if you just spend your time getting gaslit by AI, you don't really know, necessarily. Or you're on Twitter.

Jens: I'm joking. But, [00:53:00] you know, if you're just throwing in a bunch of tokens that you're creating, and you're just giving yourself a feedback loop of yourself, you could go one way. So, like I said, I didn't know if you were gonna be real. You were real.

Jens: And the content seemed, seemed legitimate and people seemed to, to communicate with me and I didn't, you know, so it was definitely very, and I'm coming to your learning, learning experience. Yeah, you are. That'll be fun in Seattle.

Demetrios: I know, man. I'm invading Seattle for our next event. We're doing another agent's event there where you're at.

Demetrios: And that's pretty cool to get to hang out with you again. I'm gonna crash your team dinner.

Jens: Yeah, my team.

Demetrios: Your team.

Jens: Team Bodo.

Demetrios: Exactly.

Jens: Yeah.

Demetrios: Uh, there's, there's a few other quick things that I wanted to mention before we end it because there was a lot of good learnings from it and also just like cool pieces.

Demetrios: Uh, the guy [00:54:00] that created the superpowers pack, like skills pack, he came, his name's Jesse and I love those skills. I've played with them a lot. They've helped me a ton, especially the brainstorming skill. It helps before I do anything, it will just go back and forth with me and really like try to dissect why and what I'm trying to create.

Demetrios: And Jesse told me, oh yeah, the way that I created that skill was because I just channeled my days of when I used to have to. Be the boss of a lot of different interns. And so I found myself asking the interns constantly all of these questions, and I tried to bottle that up into a skill and that's how brainstorming got created.

Demetrios: So I thought that that

Jens: was fun. That's a, and what did you, what was your kind of like mechanism for iterating on the skill?

Demetrios: Well, [00:55:00] I just use that skill before I do anything, like before I create a PRD, before I even sit down. Like, I have an idea of a feature or a product or whatever it may be that I'm trying to create, and I'll use that skill, because it is much better than just Claude Code at getting to the root of what I'm trying to do and then creating some kind of a PRD from it.

Jens: Gotcha. Ha. Have you, uh, how much time have you spent like building skills yourself?

Demetrios: Not a lot to, uh, and I know that's, uh, that is something cool, like that's one road I want to go down, but I've, I outsourced that to Claude Ception. Yeah. And I let Claude Ception do it, and I just kind of like roll as it goes.

Jens: Yeah.

Demetrios: And the other road that I really want to go down is hooks. So actually this Friday we're doing one of the coding agent lunch and learns that we have been having every Friday. [00:56:00] And I want to just have the whole session be on hooks, because I feel like I'm not using hooks as well as I could be.

Demetrios: I know there's a lot of really novel ways that folks are using hooks to just get better results from their sessions.

Jens: Well, using hooks is absolutely one of the most deterministic ways you can get things in place in your pipeline. I mean, I use them in numerous ways. One of the most fun ways...

Jens: So hooks are definitely something you want to use as part of your evals. That's one of your easiest mechanisms to log data at set points, as well as to do a lot of other things. I'm trying to think of one of the things. Oh, that's right.

Jens: So [00:57:00] basically, because I have all my agents hooked up to this chat room thing, I found that I might send a message and then not always know if they're doing something. And there are different ways I could set up monitoring for that agent, and things like that.

Jens: But I just set up a simple thing so that before they start work, it sends a message to the room, and again when it has finished. And then I also use that start hook and the end hook to log a metric for the amount of time spent on that task. And then you can also save the prompt that you used and the tool calls, and toss all that stuff in.

Jens: There's a tool called Promptfoo. Yeah, I think OpenAI actually bought that now.
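
The start and end hooks described here can be sketched as a small class, assuming the harness can call a function when a task starts and when it ends. `TaskHooks`, its method names, and the message format are hypothetical; `send` stands in for whatever posts to the chat room.

```python
import time
from typing import Callable, Dict, List


class TaskHooks:
    """Post start/finish messages and record how long each task took."""

    def __init__(
        self,
        send: Callable[[str], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._send = send      # e.g. a function that posts to the chat room
        self._clock = clock    # injectable so the timing can be tested
        self._started: Dict[str, float] = {}
        self.metrics: List[Dict[str, object]] = []

    def on_start(self, task_id: str) -> None:
        """Start hook: remember the start time and announce the task."""
        self._started[task_id] = self._clock()
        self._send(f"{task_id}: started")

    def on_end(self, task_id: str) -> float:
        """End hook: log the duration metric and announce completion."""
        duration = self._clock() - self._started.pop(task_id)
        self.metrics.append({"task": task_id, "seconds": duration})
        self._send(f"{task_id}: finished in {duration:.1f}s")
        return duration
```

The same two hook points are a natural place to also capture the prompt and tool calls for later evaluation, which is left out of this sketch.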

Demetrios: Yeah. They just got bought. That's true.

Jens: Yeah. So good for them. That's a cool thing to use. I've messed with it a bit, and that's what I've been using now to start evaluating what I've been messing with.

Jens: And I'm hoping to share something [00:58:00] about what I'm doing. But, yeah, doing the evals,

Demetrios: My buddy Paul did something similar to that, except every time that an agent was doing something, it would log it on their own calendar. So you could see when they were doing different tasks and how long it took them.

Demetrios: And then they had this record in calendar form. It was like this Google calendar of everything that it had done.

Jens: That's, yeah, that's very much like it. I feel like I could do my own separate little thing on all the aspects I have with hooks and how this works, but yeah.

Demetrios: Well, come on Friday.

Jens: What's that?

Demetrios: Come teach me about hooks on Friday. That's what I need, dude.

Jens: Yeah, no, I'll, uh, what time is it? I'll try to read my email. I gotta get better at that now that I have to, like,

Demetrios: get those agents to read it.

Jens: Yeah.

Demetrios: It's at nine. It's at nine on Friday. Nine your time. 9:00 AM.

Jens: Okay. Yeah, I'll [00:59:00] be there.
