Sign in or Join the community to continue

The Dark Side of MCP Servers

Posted Jun 23, 2026 | Views 134

# MCP

# AI Agent Security

# Tool Poisoning

# Arcade.dev

Share

Speakers

Samuel Partee

CTO & Co-Founder @ Arcade AI

Sam Partee is the CTO and Co-Founder of Arcade AI. Previously a Principal Engineer leading the Applied AI team at Redis, Sam led the effort in creating the ecosystem around Redis as a vector database. He is a contributor to multiple OSS projects including Langchain, DeterminedAI, and Chapel amongst others. While at Cray/HPE he created the SmartSim AI framework and published research in applications of AI to climate models.

+ Read More

Nate Barbettini

Founding Engineer @ Arcade.dev

Engineer and leader focused on building products that customers love. Microsoft MVP alumni and scrappy problem-solver. I excel at communicating complex ideas to technical and non-technical audiences alike.

Currently building on the cutting edge of AI.

+ Read More

Demetrios Brinkmann

Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.

+ Read More

SUMMARY

Sam Partee (CTO & co-founder of Arcade.dev) and Nate Barbettini (Founding Engineer at Arcade.dev) sit down at the MCP Dev Summit to unpack what nobody wants to admit about the Model Context Protocol: the security model is still full of sharp edges. From tool poisoning and prompt injection to why OAuth got bolted onto the spec, this is a builder 's-eye view of where MCP breaks — and how to ship agents safely anyway.

+ Read More

TRANSCRIPT

Demetrios: [00:00:00] All right, so at the beginning of April, I went to New York for the MCP Dev Summit, and while I was there, I got the chance to record a few podcasts with attendees that were at the event. This is one of those conversations. Hope you enjoy.

Sam Partee: A skill is primarily benefiting from the agent's knowledge of two things and a setting in which it is able to have those two things, which is the local context you have on your computer.

Sam Partee: It has access to a file system and CLI and a terminal, all of which are things it's been pre-trained on in the Common Crawl and Pile for how many documents of Bash and how many documents of Linux-based file systems are online in the Pile and the Common Crawl? One

Nate Barbettini: of the most highly represented things.

Sam Partee: If not probably the highest. Probably. Probably the highest, I would guess.[00:01:00]

Demetrios: Ah. All right, so what are we gonna talk about?

Sam Partee: I, I think, you know, look, we're at an MCP Dev Summit, I think we should talk about MCP and, uh, specifically to my left, you have probably what I would consider the most knowledgeable MCP practitioner a- at, at very least in the auth space, but someone who's not only contributed SEPs, has contributed major parts of the protocol, has shaped and, uh, performed a lot of the activities that, you know, a core maintainer would perform without being specifically named a core maintainer.

Sam Partee: Mm. Um, and so that should not only be recognized, but people should hear him more than me the entire time, I will tell you that.

Demetrios: All right. Tell us then, what, but where should we go? That's a big umbrella.

Sam Partee: Yeah.

Demetrios: Where do we specifically ...

Nate Barbettini: There's a couple, couple takes that are interesting. Skills versus tools.

Demetrios: Yeah.

Nate Barbettini: Y'all know what tool poisoning attacks are?

Demetrios: That

Sam Partee: is- Yeah ... that is pretty far in the weeds.

Demetrios: It is. But no, we should totally talk about that. But it is a

Sam Partee: good, it

Nate Barbettini: is a good subject. There's a, there's a very, uh, tempting, easy solution to tool poisoning, and it's completely wrong.

Sam Partee: Oh. But [00:02:00] everybody thinks it's right.

Sam Partee: I know what he's gonna say. You should talk about that.

Demetrios: Yeah, we should, we should totally.

Nate Barbettini: I would love to talk about MCP Debugger. It's a free- Yes ... tool that people can use- Oh ... when they're testing MCP servers.

Sam Partee: And the Tool Bench. We should, we should mention Tool Bench. Yep. Why don't we start there so we can talk about-

Demetrios: Yeah, what is Tool Bench?

Sam Partee: It basically, we went out and graded every single MCP server.

Demetrios: 200,000? How many are there?

Nate Barbettini: I think our count was, like, 42,000.

Demetrios: Okay. It's

Sam Partee: over 80 now.

Nate Barbettini: Oh.

Sam Partee: It's, it's, it's high. But look, the, what was it? 78% were in the bottom tier, graded F, or something like that. Mm.

Nate Barbettini: Yeah.

Sam Partee: And that is simply a coverage of the protocol, right?

Sam Partee: So that, that tells you a couple things. One, people aren't covering their protocol. So that can mean a couple of things. It can mean that the protocol is really wide, it can mean that it encompasses too much. The- It's

Nate Barbettini: complex to im- implement ...

Sam Partee: either on the client or the server side, right? It's either, you know, that, or maybe it's misunderstood, or that one part is more useful than the others.

Sam Partee: You know, there's a lot of things that that can tell you, but [00:03:00] the bottom line of it is that people aren't paying enough attention to the quality- of the MCP implementation that they're putting out, and that means that the overall agent that is using it will appear to be worse.

Demetrios: Mm-hmm.

Sam Partee: And that degradation is something that we wanted to just point out in the field, and so we made Tool Bench essentially open source a way to grade and give like a badge, right, to your MCP server so that everybody can see, what's my score?

Sam Partee: Do I implement auth correctly? Do I implement front door and tool authorization correctly? Do I implement tools, prompts, and resources? Or do I at least support those capabilities, and if I don't, do I say that I don't? Mm-hmm. Do I properly communicate my capabilities to the client- Instead of- Which most of them don't

Sam Partee: instead

Nate Barbettini: of just saying, "I support everything," and then crashing when, when I actually try to invoke any of those things.

Sam Partee: That's why most of them are F, is because they just... It's not that they lie. [00:04:00] They misstate their capabilities, and then they crash. Mm-hmm. It's-

Nate Barbettini: And we're here at MCD- MCP Dev Summit, so there's a ton of excitement about MCP.

Nate Barbettini: There's a ton of, you know, hype around MCP. Sure. People are really excited about it, but this is a big problem, I think, in the ecosystem- Large ... that, that kind of cuts against the excitement because when everyday people interact with MCP or even developers interact with MCP- They have a shitty experience

Sam Partee: they

Nate Barbettini: have a crappy experience. Yeah. And then, and then people are like, "Ah, MCP is crap. I, I, I should use a CLI instead." Yeah. And it's like a, kind of a dumb debate that k- is created, but I think a lot of the, um, I think a lot of the reason that that debate even is a thing is because a lot of people, like we have to be honest, a lot of people have tried an MCP server, it didn't work very well, and they're like, "Ah, MCP's a terrible protocol."

Nate Barbettini: It, it is an, not a problem of the protocol. What, uh, there are parts of the protocol that are probably really hard to-

Demetrios: You walked that back kinda quick, eh?

Nate Barbettini: Yeah. There are, there are places in the protocol that should be easier to [00:05:00] implement, that can be clarified, that can be made simpler. Such a

Demetrios: political answer.

Nate Barbettini: But- Yeah. It's

Demetrios: very well done ...

Nate Barbettini: there's a l-

Demetrios: That's a maintainer right there. Yeah, that's a maintainer. That's the diplomatic maintainer. He doesn't wanna get kicked out of the special conversations.

Nate Barbettini: The e- uh, even the maintainers are like, the, the next release of MCP is focused on making it easier to implement.

Nate Barbettini: Mm-hmm. So it's, it's a known problem, it's an acknowledged problem. I know. But the other side of that equation is all of the implementations that were vibed or spun up really fast that are crap, and that's contributing to this kind of ecosystem-wide problem, which is why we wanted to shine a light on it and say, "Hey, this is actually quantifiable.

Nate Barbettini: It's not just kind of a, a fuzzy notion of like, 'Ah, it didn't work very well.'" It's actually very quantifiable. It's objectively quantifiable.

Sam Partee: And it's not like a vendor thing either. We didn't want it to make it like this is an Arcade thing. No, it's like, it's completely MCP based. It is, you can, you can use it for free.

Sam Partee: It is out there and everyone can grade their servers for nothing, right? No money. It's not necessarily [00:06:00] associated to us as a vendor, and I think that's really important to say because the best thing about MCP, right, we're here with the Linux Foundation, is that it's open and a standard by which a protocol can be agreed upon.

Sam Partee: Mm-hmm. Right? I think a lot of people have this mistake of saying like, "Why don't we just do HTTPS or REST or something?" Like, and that abstraction sadly just doesn't work very well for agents, right? APIs aren't one-to-one with tools, right? We've been over that a million times. Um, there are primitives that make it easier for agents to use things, and MCP is the best shot we have so far, and there's a groundswell of effort around it.

Sam Partee: So efforts like that where it's like we need to go out in the community and educate people, we think this is the best way to do it in a kind of a vendor neutral way.

Demetrios: Does it tell you how to make your server better?

Sam Partee: It tells you what you're doing wrong, which should help with that

Demetrios: You should be able to deduct-

Sam Partee: Yes

Demetrios: the next step.

Nate Barbettini: Or Claude Code can deduct.

Demetrios: Yeah.

Sam Partee: I, uh, the text reports, you should [00:07:00] be able to literally put in the Claude Code.

Demetrios: Yeah.

Nate Barbettini: That and the spec, like if the- Yeah, okay ... authoritative spec document and then a report of which, which parts you effed up in the spec. You can- Mm-hmm ... you can get there.

Demetrios: So that is not debugger though.

Demetrios: That's a whole different thing.

Nate Barbettini: Mm. Right. So the, the survey of, you know, 40 or 80,000 tool, uh, servers that we did at Arcade is Toolbench, that I think, was it toolbench.com or org or something? I should know this.

Sam Partee: Jeez, I don't know either. I just know it's called Toolbench.

Nate Barbettini: Google Arcade Toolbench. Yeah, Arcade

Sam Partee: Toolbench.

Nate Barbettini: Ask your friendly neighborhood agent about Toolbench. We

Demetrios: can tell you are not on the marketing team at all.

Sam Partee: I'm sorry, LFG.

Demetrios: Yeah. Your marketing team probably loves you right now, so-

Sam Partee: Yeah, they, yep ...

Demetrios: uh, we'll find it, and we'll put it in the show notes. Okay. Don't worry.

Sam Partee: Perfect. Yeah, MCP Debugger though is actually another one, um, kind of I think like Inspector, the effort around that, but, um, that will tell you even more from what I understand.

Nate Barbettini: More of a, more of a dev tool than- Yeah ... uh, than like a survey. Um, Toolbench [00:08:00] is the quality, uh, the- using the right word, quantitative data about all of every MCP server out there, and you can submit a server to get, like, retested. Um- Hmm ... but if you're actually building a server, if you're, like, the dev who is coding it or Claude coding it, um, mcpdebugger.dev is a free, a free tool that I put up online.

Nate Barbettini: It's just basically just a, a, a text box and a button. You paste in the MCP server URL into the text box, you hit the button, it goes through a ton of checks. Um- It's really nice, actually. So it's just tests? Uh, a bunch of tests. Runs your ser- server through dozens, uh, hundreds, I think- Yeah ... tests now, and then it tells you exactly which ones failed, why they're important, link to the spec to say, like, "This is the part of the spec that is, should have been done correctly."

Sam Partee: Copy, paste into Claude code. Uh. Your server's

Nate Barbettini: better. You get, like, in that dev loop, you get, you know, a grade. You get like, "Oh, this was, this implementation's a C. It could be an A if you fix these things." Mm-hmm. The, the reason I built it is because we, like, we connect to a ton of [00:09:00] MCP servers at Arcade.

Nate Barbettini: We're an MCP gateway, so our, like, part of our job is just connecting to MCP servers, and a lot of times it fails. Often our customers get upset, and they're like, "H- Arcade's not working right. How come, like, your s- your MCP gateway is not connecting to these servers?" And it's, sometimes it was our fault. We had, we fixed a few bugs, but many times- Sure

Nate Barbettini: it was not our fault. It's really hard to prove, though. It's r- like, there's multiple actors in this flow, and it's really hard to be like, "It wasn't us, it was the server that messed up."

Sam Partee: And here's why. And here's why. That, that, the here's why part was what was so hard.

Nate Barbettini: And what's even harder in th- because the ecosystem is so nascent- Yeah

Nate Barbettini: we haven't even talked about the client side of it. But if- Hmm ... you have Cursor connecting to a gateway like Arcade, which is then connecting to some other MCP server, any one of those, like, four hops in the chain could mess up for some reason. And in particular, the clients have really, really buggy implementations- Well-

Nate Barbettini: of things like

Sam Partee: hops ... let's, we'll say, we'll say buggy. We'll say different.

Nate Barbettini: They're getting better.

Sam Partee: They are getting better.

Nate Barbettini: I will say buggy. They're buggy

Sam Partee: still. I have [00:10:00] many friends across the client community ecosystem who I, are, who know and are aware, like our friends at Cursor, who've already made their implementation much better.

Nate Barbettini: Uh, to be clear, I'm not picking on Cursor. I know. They have made their implementation a lot better, but all, all of the clients Uh, especially, uh, a few months ago Yeah Like, all of the clients just across the board were terrible

Sam Partee: Well, like, we'll give you an example so that people can ground this in something.

Sam Partee: Like, who supports tools list change notifications?

Demetrios: The what? Yeah, what is that?

Sam Partee: It, it is a notification that the list of tools from the MCP server has changed

Demetrios: Oh

Sam Partee: And so that the UI should populate a different list of tools and serve a different list of tools to the agent. A structurally, foundationally important notification for a UI

Nate Barbettini: Arguably, it should be required in the spec.

Nate Barbettini: It's optional It

Sam Partee: should end almost no one really does it Oh Claude Code, uh, does. Cursor put in support, so the Cursor now does it. Um, so that if you go to [00:11:00] your MCPs page in Cursor, you'll see if you get a tools list change notifications, new tools pop up, right? That's the way it should be. It's the way it was designed to be.

Sam Partee: But until, you know, there's focus on fixing these rough edges, the experience of how and what the protocol was meant for won't be fully fulfilled. And so we're doing stuff like MCP Debugger, Tool Bench, what have you, in order to make an effort to make this ecosystem fully rounded in the way that it should be.

Demetrios: Yeah, you're just shining a light on- Yeah ... the rough edges.

Sam Partee: Trying to.

Demetrios: Yeah

Nate Barbettini: And giving people... It's like, it's, it's easy to complain. A lot of people like to complain. Yeah It's, it's very easy to complain, but it's harder to focus that on what needs to be fixed, what needs to change. Exactly And giving, uh, a, a score or a grade or like a- Instructions

Nate Barbettini: a ranking. Are you, you know, are you S tier? Are you F tier? That kind of thing is both for, I think both for humans, like human developers who love coding and stuff, and also coding agents. [00:12:00] It's kind of like catnip for- Yeah ... for both humans and machines. It's like, "Well, I was a C, but I could be an A"

Demetrios: Yes. But do you notice that the agents perform better when it is a higher- Oh,

Sam Partee: absolutely

Demetrios: Yeah

Sam Partee: Undoubtedly, yes.

Nate Barbettini: Just like lights up all the, the goal-seeking part of

Demetrios: the

Nate Barbettini: Yeah Yeah

Sam Partee: Well, I mean, think about it, like if your tool descriptions are malformed And we can talk about the tool poisoning attacks, which I think is an interesting subject. But like, if your parameters, right, um, it used to be that most of the time they were parsed out of a doc string.

Sam Partee: I don't even know if people still remember that, those days, 'cause most people have fixed it from that point. But like, the, the description is arguably the most important part of a tool schema.

Demetrios: Mm-hmm.

Sam Partee: And like, there were servers not returning the full tool descriptions. Not even just the ar- descriptions of the arguments, like the description of the tool.

Nate Barbettini: They get, like, just truncated at a certain point.

Sam Partee: Really? It was like truncated in the middle of the tool description. But

Nate Barbettini: how?

Sam Partee: Because they're not doing the right things.

Nate Barbettini: Um, [00:13:00] and like early SDKs would base it on the comments above the line of that function, and then it was like if the, the, the Python parser messed up, it would give the wrong thing.

Sam Partee: And then they'd see, connect to us. They'd connect their MCP server, uh, or servers, make a gateway in Arcade, and then use it in their agent like Claude code, right? That's like what You know, we have tens of millions of tool calls a day, and like that's, that's the majority of how people love to do it. And I think there were many cases where people would say, "But like my server worked locally when I ran it, and then I connected to your gateway," and it's like, well, what you're sending us over mCP is not right.

Demetrios: Mm.

Sam Partee: And that is very hard to explain in terms of, well, why, where, when, how? And then like, "What do I do about it?" And so-

Demetrios: Hence the name debugger.

Sam Partee: Correct.

Demetrios: Yeah. No, it makes sense. Eh. So you're able to just say, "Look, this is why it's wrong. This is why it's not working. Go fix it."

Sam Partee: And we will update it as the spec [00:14:00] updates.

Sam Partee: That's kind of the promise we're making to the ecosystem.

Nate Barbettini: That's the other, I think the other thing that's worth calling out that's, that's difficult for implementers, is that the spec itself, it's- it's hard to remember it's only been around for a year. Everybody is like, you know... I- if you listen to the YouTube bros that have thumbnails on YouTube, they're like, "mCP changes everything again."

Nate Barbettini: Like- ... the... You would think that it's like a done deal, like it's already baked and it's been around for 10 years like HTTP, for you know, 30 years. Yeah. The, this is like, like where we are right now in the evolution of the protocol is we're like when HTTP like 0.9 was released. Yeah. And like you think about how sleepy the internet was back then b- compared to it is now.

Nate Barbettini: Imagine if there were a bunch of YouTube bros like when HTTP came out being like- ... "This changes everything." The, and it's, it's not that it's wrong. It, it, HTTP did in fact change everything, but it was janky as hell. Like-

Sam Partee: Who sends raw TCP packets anymore?

Nate Barbettini: The, like browsers were crap. You had barely [00:15:00] could, like they couldn't render images.

Sam Partee: There's no agreement on what it should be.

Nate Barbettini: No, not at all.

Sam Partee: Right? That's the big part though that mCP can do for agents.

Nate Barbettini: But it's early.

Sam Partee: It's just super early. It's so early.

Nate Barbettini: So one of the, one of the evidences that it is so early is that the, the spec has changed- Mm ... like four times in the last year, and is gonna change again in this coming June.

Nate Barbettini: Now the changes are becoming less drastic. Mm. But it was like, part of the reason why when we survey, you know, thousands and thousands of servers out there- They were old ... all of them are like terrible, is that most of them were built four months ago. Yeah. Which is like not that long ago, but the, like the auth specification in mCP changed drastically.

Nate Barbettini: The way that you do certain like mechanisms in the protocol changed drastically. Mm. And so many of those just haven't been updated. They were built with an old SDK. Yeah. It's like no- nobody's fault, but the protocol has ch- moved really fast. Yeah. And hopefully now- As

Sam Partee: it should.

Nate Barbettini: Yeah.

Sam Partee: It's young. Yeah. Young things should iterate and evolve, and they should take in feedback.

Sam Partee: They should improve, and then they should expand and grow their reach [00:16:00] when they can in a mature and responsible way. Yeah. I think while, while we're on this topic, we should demystify the whole skills and the, uh, this- You

Demetrios: wanna go there?

Sam Partee: I just, I want, I just, I think we can make it pretty quick.

Nate Barbettini: I think so, too.

Nate Barbettini: They're apples and oranges.

Sam Partee: Yes. It, it, it, they're, they're... The thing that people are, the s- very specific point I wanna make is that a skill is primarily today y- benefiting, like if you think about the GH GitHub CLI tool, right? They are primarily benefiting from the agent's knowledge of two things in a setting in which it is able to have those two things, which is the local context you have on your computer.

Sam Partee: Oh, the agent uses skills better. Why? It is local where it has access to a file system and CLI and a terminal, all of which are things it's been pre-trained on in the Common Crawl and [00:17:00] Pile for how many, how many documents of Bash and how many documents of Linux-based file systems are online in the Pile in the Common Crawl?

Sam Partee: One

Nate Barbettini: of the most highly represented

Sam Partee: things. If not probably the highest. Probably. Yeah. Probably the highest, I would guess.

Nate Barbettini: Yeah. Every, every tutorial ever is like, "Right, open your terminal."

Sam Partee: Every single ... Yeah. And so it's like, okay, that's true, but like Nate says, they are apples and oranges in the sense that there's a, uh, a delineation of if you gave a MCP, let's, let's talk about for a second skills over MCP, which is a working group right now.

Sam Partee: Um, you know, how do we expose a skill document through MCP? Maybe a resource, there's a lot of approaches to this right now, but if you were able to give that document provided over MCP and then have a list of tools, the only, what is the only difference then from the skills that you run out, from p- downloading it from skills.sh?

Sam Partee: [00:18:00] The local context.

Nate Barbettini: Hm. I think we could actually, we could actually further split it, the topic- Yeah ... into, it's actually three different things that people conflate all the time.

Sam Partee: Yeah.

Nate Barbettini: There's MCP, which is a protocol for, uh, like RPC essentially, like a protocol for remote- Which

Sam Partee: you can do locally.

Nate Barbettini: Yeah. Well, sure, but-

Sam Partee: That people forget

Nate Barbettini: MCP allows you to execute tools on a remote server or on a server, I should say. There are CLIs, which let you execute commands on your own machine, and then skills, if we wanna be really pedantic about the definition, skills are instructions, packages of instructions that an agent can use. Mm-hmm. All three of those things get very conflated in this debate.

Nate Barbettini: They're all different things.

Sam Partee: They should all be different things.

Nate Barbettini: In fact, you can say like it makes perfect sense- Yeah ... to have a skill on how to use a particular MCP server.

Sam Partee: Yeah.

Nate Barbettini: Call this tool when you need to do this, call that tool when you need to do

Sam Partee: that. And separating in people's minds out the ability of the execution environment is the most important [00:19:00] thing for me because there's this misconception, right, of like there being a better way of doing things, and I think that there's no enterprise in the world that wants people s- launching a bunch of sub processes to CLI commands to run authorized actions and things like that.

Sam Partee: It, there are significantly better ways of doing things that are going to be coming, and I think making the delineation between those things to educate people on them now is the best way we as an ecosystem can move forward in making sure that people don't do some really dumb stuff that gets their, you know, computer bricked, which honestly, I see some crazy stuff every day.

Nate Barbettini: It, it's, I think part of the reason why it's so easy to conflate them and why, why it's... Yeah, why it's easy to conflate them is because coding agents have been so successful- Yeah ... most of the way that developers interact with agents is in a coding agent [00:20:00] context. Yeah. So I'm in Claude Code, I'm in Hersa, I'm in, you know, OpenCoder or whatever my agent of choice is, but that it's, it's easy to forget that I'm, I'm dragging in a bunch of stuff that I'm maybe not thinking about.

Nate Barbettini: I'm not thinking about the fact that that does have access to my local file system, and it has an interpreter, and it's in a shell, and that's okay. That's good. That's correct. That's like what I want from my coding agent. It's not the same thing fundamentally though as a constrained environment where an agent might be running on a server somewhere that has limited scoped access to certain connectors or my data or whatnot.

Nate Barbettini: You think about kicking off, you know, asking ChatGPT to go do something for me like, "Hey, you know, watch for articles that mention this and let me know when that happens." That's not running on my machine. Yeah. Or at least I don't want it to be. It

Sam Partee: shouldn't be. I

Nate Barbettini: don't want that running with like file system access and, you know, shell execution.

Sam Partee: Long-running remote process with access to your shell. Ooh, that sounds-

Nate Barbettini: Those are, they are fundamentally different things, and also if you think about like, think [00:21:00] about, uh, one of the use cases we see a lot, um- Yeah ... is, you know, customer support agents.

Sam Partee: Tons.

Nate Barbettini: I want my- Tons. As a company providing customer support to customers, I want to give them a really rich customer support experience on my website embedded in

Sam Partee: my app.

Sam Partee: Or through a call or-

Nate Barbettini: Right.

Sam Partee: Yeah, yeah.

Nate Barbettini: That's not, like I don't need to bring a, a shell, a sandbox environment to an agent so it can write code locally to run like shell commands or something to- In fact,

Sam Partee: that's dangerous if you do

Nate Barbettini: I... Right. What, what's the like the famous, there's a famous screenshot of somebody who went to, you know, like, uh, some, some auto dealership's website.

Nate Barbettini: Yeah. They're like, "How can I help you today? I'm an unlimited

Demetrios: bot"

Nate Barbettini: Give me Python. And there's like-

Demetrios: Yeah. "Solve"- Oh, no, tell me about Te- Tesla or w- there's so many of them. I- Yeah, there's like- Like that's literally proving the point ... this, this has

Nate Barbettini: been memed a bunch of times. Yeah.

Sam Partee: It's the fact that it's such a meme is proving the point Nate's making.

Sam Partee: Yeah.

Nate Barbettini: Solve, yeah, give me the Fibonacci sequence for blah, blah, blah. Yeah. It's like, "I'm happy to do that." Yeah.

Sam Partee: It's look, that's, that's, uh, that's a problem. I mean, it's funny, right? It's a meme, but it is a problem because only thing it's gonna do is stop people from [00:22:00] using these incredible tools, and right now we see a lot of unblocks and y- and with a lot of our customers and trying to educate, so that's primarily what we're trying to do.

Demetrios: Well, okay, so There was something that we, like, glossed over right there that I would love to hit on a little bit more in the fact that it's like, do you really need a sandbox environment for your use case? And- Depends on the use case ... yeah, I think you can get really wrapped up in, like, oh, well that's the dominant pattern now.

Demetrios: I guess we're just gonna- Yeah ... throw an agent into a sandbox and give it everything it needs and let it write its own code.

Nate Barbettini: I have a spicy take on this.

Demetrios: Uh-oh.

Nate Barbettini: Argo, I think, uh, uh, time will bear out if I'm correct on this, but h- this is my hypothesis. I think that, uh, coding agents... You can write it down, yes.

Demetrios: Yeah, yeah.

Nate Barbettini: Coding agents have had this, like, breakout in, you know, 2026, right? Um, and they're very successful. It's, like, extremely impressive how well reasoning models can write code and, and f- and build things, right? [00:23:00] And I think people are over-indexing on that and, and taking the wrong lesson from that.

Nate Barbettini: The lesson is not coding agents are really successful and model, you know, reasoning models are good at code, therefore everything must become a coding problem. I think that's a tempting lesson right now in, like, current, you know, what is it? April 2026.

Demetrios: Yeah, I might have even said that before- A lot before

Demetrios: this was. Yeah. So I fall in that camp.

Nate Barbettini: I think the correct lesson to take away, and I, you know, I'm op- very open to being wrong on this. I think the correct lesson to take away is that reasoning models, given proper tools, are very good at solving problems. Like, the goal-seeking- Oh ... and reasoning capabilities of these models is very good.

Nate Barbettini: It just so happens that there's a lot of code in the Common Crawl. So that was, like, the low-hanging fruit that was, uh, the first thing that kind of got unlocked.

Sam Partee: And they've been fine-task tuned. They've been, like, task fine-tuned on- And

Nate Barbettini: it helps that coding- ... doing ... is a verifiable domain- Yes ... that you can...

Nate Barbettini: Like, there's a bunch of reasons why that was the lowest-hanging fruit, but if you... I [00:24:00] don't think looking ahead, like in the limit, in the fullness of time, I don't think that every problem needs to become a coding problem as much as there will be better training, there will be more specialized agents- Sure

Nate Barbettini: or m- specialized models that are good at bringing the same kind of capabilities to bear on other problems.

Sam Partee: I think that is definitely in the, that, that is right. I will bring it down a level of practicality, 'cause if you're listening right now, it's like, okay, then what?

Demetrios: Right.

Sam Partee: Right? Like, what do I do right now?

Sam Partee: Yeah. Um- I'm building in April 2026. I am building in April 2026, I need to know right now. Um-

Demetrios: Customer service agent.

Sam Partee: Are you- That's a, exactly ... are you doing large data manipulations or working with large data in your tools? Generating code and running it will be helpful. Um, manipulations that involve like spreadsheets, for instance, or, uh, visuals that you're generating of things like spreadsheets, data analysis, things that are-

Nate Barbettini: Fundamentally, the thing that you need to run is like the [00:25:00] FFmpeg binary, then yeah, you

Sam Partee: Like, it, it is a, it is a very specific, uh, uh, advantageous thing about code mode or programmatic tool calling, where code is generated in a way where the context can be informed such that the, uh, 10 megabyte spreadsheet doesn't need to be returned from the tool.

Demetrios: Mm.

Sam Partee: Right? Um, or the generated image can be uploaded, you know, somewhere, something like that. Those limitations will, uh, slowly go away, but programmatic tool calling will always be an interesting thing in an iterative and, uh, in an iterative domain and one where a success criteria is not 10 out of 10.

Sam Partee: Like, you can be wrong, you can be 8 out of 10, you can be 7 out of 10. Like data science, EDA type activities- Mm ... that is where programmatic tool calling is amazingly powerful, and people see that in like, uh, generative creative [00:26:00] domains right now. Um, I... So today, that is an extremely useful use case.

Sam Partee: Personally, I would not use it outside of those domains, um, in today. Um, sandboxing is not perfect. If you're going to, I would suggest using a, a vendor that does it properly.

Nate Barbettini: Do not do it yourself.

Sam Partee: Yeah, don't please, please don't do it yourself.

Nate Barbettini: Don't, do not invent auth yourself- Don't- ... and don't do

Sam Partee: execution sandbox stuff ... your own sand- the, the things that people have... Yeah, don't do your own auth, don't do your own sandboxing. Um, the things that people like Modal, uh, uh, like people like Eric and Modal have had to do to make a secure sandbox for Daytona and those, like, it is not, it is a non-trivial problem.

Nate Barbettini: Yeah.

Sam Partee: Okay? Yes. And, like, editing Linux kernel is not something most people are gonna go do so that they can safely do sandboxer. I will say the next thing is if you're just banging out some local code and you're a developer And you're just, like, trying to make yourself, uh, like your, your patterns better, skills are very good.

Demetrios: Mm. [00:27:00]

Sam Partee: If you're a local dev and you want your practices and your development patterns and your activities to be better and more efficient, skills are great.

Nate Barbettini: It's like, it's like power-ups for your-

Sam Partee: Yeah ...

Nate Barbettini: coding agent.

Sam Partee: Power-ups. The last category is anything that is long-running on the internet requiring authorization or user-based at all needs to be over a secure channel like authorized MCP, um, and tool calling, because that is the most mature thing that we have at the current moment to be able...

Sam Partee: And I, I laugh a little bit because it's still, like Pete said, it's early, right? Um, but that is the most- What do you

Demetrios: mean by long-running?

Sam Partee: So, like, something that goes and checks, uh, it's running on an EC2 server and it's going and checking if you have any emails. What's that gonna end up doing? Using a token and a refresh token to go log in as you.

Sam Partee: You should not be doing that with a local process. You should not be doing that with a CLI. You should not be doing that with a spawned subprocess. [00:28:00] Why,

Demetrios: why are those

Sam Partee: bad practices? Because those are all attack vectors that I could use to exploit you.

Nate Barbettini: I'll g- I can give you an example of that. Yeah. So the, the reason why an author- a, a protocol that has authorization built in like MCP, um, and the reason why MCP, you know, not everyone was a fan of this decision, but they, they kind of bolted OAuth onto MCP as the authorization standard, the reason for doing that, one, one of the reasons for doing that, is that then you, through the protocol at, at the protocol level, you have a guarantee about the, uh, what, in the auth world, what we would call the subject of a particular call.

Nate Barbettini: So the OAuth token attached to the call authorizes the call and it says, uh, "Here is the, the ID of the client that's making this call, and here's the, the user ID or the, the email address or some identifier of the subject of the call." And that is not, um, that is authoritative information from what's called the authorization server, from, like, [00:29:00] the thing you logged in with.

Nate Barbettini: So it's not, it's not something that can be modified in the request.

Sam Partee: Mm-hmm.

Nate Barbettini: So that eliminates an entire class of attacks where you could spoof something, trick the agent into like- The

Sam Partee: easiest ones.

Nate Barbettini: Yeah.

Sam Partee: Yeah.

Nate Barbettini: The, the, the way it goes bad, so you, you want that to be at the protocol level and not at the application level.

Nate Barbettini: The way it could go bad, for example, is let's say that you have, uh, to, to use the example Sam said, if you have an agent that's supposed to go check your email for you and then let you know if, you know, your kid's school is, uh, shut down for the day or something like that. I thought

Sam Partee: you said snow

Demetrios: day.

Nate Barbettini: Uh, hey.

Sam Partee: Do you? Yeah. Like, discerns whether or not it's a bad email or- Yeah.

Nate Barbettini: Like, do, is my attention required on this right now? Um-

Sam Partee: Schools get a lot of emails.

Demetrios: Nice. Yeah.

Nate Barbettini: Yeah. Kids get sick a lot. Yes. Very like- That's, that's true. Schools are like Petri

Demetrios: dishes. Anyway, moving on. Well, you get it via, I, I get it via WhatsApp groups.

Nate Barbettini: So that's even worse. Whoa.

Demetrios: Yeah.

Sam Partee: That's terrible.

Demetrios: We'll figure out how to filter that

Nate Barbettini: for you. Regardless- Yeah ... the, um, if you have an agent that's go- supposed to go check your email for you-

Sam Partee: Yeah ...

Nate Barbettini: you need to have, [00:30:00] like, a token for Gmail or token for Outlook- Mm ... or whatever, and you, the, the, the security around that token and, and who, like, on whose behalf you're allowed to call the API is extremely important- Mm-hmm

Nate Barbettini: to keep safe. So if you do that through MCP best practices, you have a remote server that is, you know, the, the, your access point to the Gmail API or the Outlook API or whatever, and in order to call that server, the, the call itself at the protocol level has the information about, like, this call is for Nate, it's not for Sam.

Nate Barbettini: Right. But if you do that at the in- inside of an agent that's, like, just using st- storing some API key locally, it's, unless you're extremely careful and you know exactly what you're doing, it's very easy to leave open a vector of, like, well, he's getting uncomfortable just saying it. Yeah. Uh, let's say you leave open, like, that CLI has a, uh, an input.

Nate Barbettini: You say, like-

Sam Partee: Or Axios is downloaded, 1.14 is downloaded. [00:31:00] Yeah, sorry ...

Nate Barbettini: or like the, the, the way that you're calling that API has as an input which user's email should I check the emails? Oh, God forbid. But that's, that's, like, that is what would happen. That's easy to do in this kind of scenario if you're doing the security locally in the sandbox.

Sam Partee: Please don't do that. He's not suggesting you do that. Please don't do that.

Nate Barbettini: For disclosure- Just to be clear, this is a- Yeah ... this is a bad example. This is

Sam Partee: a bad example. I want to be very clear.

Nate Barbettini: Just to, just to close the loop for everybody who might be thinking like, "This kinda sounds like it's, like, hypothetical," right?

Nate Barbettini: Kinda felt like it. The, if there's any way that the agent could be tricked into changing that value, even if the instructions in the agent say, like, "Always use the logged in user's ID as the parameter to this CLI that you're calling," I can get around.

Sam Partee: Yeah.

Nate Barbettini: All it takes is one clever prompt manipulation to say like I'm now checking somebody else's email.

Nate Barbettini: That is, that type of, that category of attack or that category of, uh, threat is not possible in an, in a properly authorized protocol, but it is possible if you- Why

Sam Partee: is it not possible? Because it's, it's all [00:32:00] happening outside of the prompting-

Nate Barbettini: Because the agent doesn't have control over that.

Sam Partee: Yeah.

Nate Barbettini: It's, it's at like the header level in the protocol itself.

Nate Barbettini: So the agent, whether you could try as much as you want to try to trick the agent into putting the wrong value in there, it's not gonna happen.

Demetrios: It doesn't have access to it.

Sam Partee: Yes. And that's, that's the whole benefit. Awesome. It's, it's only useful, an agent's actions are only useful if they're as you. It's, it can't just be for you, right?

Sam Partee: It can't be something a- It's like think about posting on Slack. It's not helpful if you say, "Hey, go respond to Demetrios," and it's, it's some agent. You're like, "Wait, who did this?" Even if it says like, "Hey, Sam says..." You're like, "Okay, well which Sam?" No, like all of that, it's work is only, work done by an agent is only useful if it's as you.

Sam Partee: And it's only safe if we use the protocol that people like Nate have built- Yeah ... into the existing ways that we use the internet to authorize actions as a user.

Demetrios: But there's a bunch of people that are running OpenClaw-

Sam Partee: Yeah ...

Demetrios: as a separate [00:33:00] person thing, right? So and that kind of goes against that narrative.

Demetrios: True. I would argue-

Nate Barbettini: People who, you mean people who want to kind of give their agent a persona.

Sam Partee: Yeah. Sure. And that, that, there is, I would say, limited utility in that. There's much more entertainment than there is utility.

Nate Barbettini: One of the, one of the talks today was great. Uh, security researcher, she has OpenClaw set up and it's named Clawdrey Hepburn, and it, it goes and does research.

Nate Barbettini: Clawdrey.

Sam Partee: That's really good actually. Yeah, it's really good. I like that a lot. Um, I, I... Look, I contributed, uh, to OpenClaw. I have, I have a fork. I call it SafeClaw. Um, I, the, the token, if you'd even talk to Peter, I'm sure he would tell you this, like, there are non-trivial protections that had to be put in place just because of how they do local token storage and stuff like that.

Sam Partee: Like, channels were not secure in the beginning. There are many [00:34:00] ways, like the EvilClaw attack. Yeah. There still are ones I'm sure we haven't discovered. The fact is, is that by MCP choosing to rely on OAuth for both the front door auth of this, you know, protocol and for tool authorization and URL elicitation done by Nate, um, by relying on that, we piggyback, you know, stand on the shoulders of giants.

Sam Partee: The last 20 years, people like Dick Hardt and others who've made OAuth OAuth, right? That is really helpful.

Nate Barbettini: And a lot of people who have, like, red team- Yeah. ... ways to break the protocol.

Sam Partee: Like, tried to break it.

Nate Barbettini: It's, it's, it's like, I think even all the OAuth folks, we were joking earlier- Yeah ... that the whole OAuth mafia was here at DevSummer.

Nate Barbettini: Really?

Sam Partee: Um- Nice ...

Nate Barbettini: but- The OAuth

Sam Partee: mob. But- Yeah.

Nate Barbettini: The, there's like, I think all the OAuth people, almost all of them would say that it's, like, by far not, it's, like, nobody's favorite protocol. It's [00:35:00] complicated. It's hard to understand sometimes. It's hard to implement sometimes. The, it, the reason why it is the industry standard is not because it's super popular, or everybody likes it, or it's, like, everybody's most favorite thing to put under their pillow when they sleep.

Nate Barbettini: The- Like I do, but, um

Sam Partee: I was about to say, 'cause you have a book about OAuth under your pillow.

Nate Barbettini: Yeah, I don't. Uh, the reason it's the industry standard is because of two reasons. One, because it has widespread adoption and it allows interoperability. And then two, it's been hardened- It's safe ... com- like, as, not sa- safe is a weird way, like, a weird thing 'cause you, like, there's always some other way that's-

Sam Partee: God, such an auth guy way of saying it.

Sam Partee: It's safe. It- It's the safest thing we have ... it's

Nate Barbettini: safer. Safe-est. It's the, it's the best worst option we have. Oh my God. Because it has been hardened and attacked from so many different directions for like- That's true ... two decades. Mm. So all of the things that can go wrong in OAuth have been, well, not all, but many of them have been [00:36:00] already documented.

Nate Barbettini: Many of them have been mitigated by now we have OAuth 2.1. It has a bunch of mitigations for a bunch of weird things that people figured out how to break- Mm ... OAuth and now they're fixed, they're patched. Um, so is it safe? Not 100%.

Sam Partee: Nothing can ever be in the eyes of an auth person. Yeah. I will tell you that.

Sam Partee: I, I, we will argue

Nate Barbettini: about it. It's, it's far better than, than, it's certainly far better than, like-

Sam Partee: An API key is stored on your local laptop. Please stop doing that. Wait. Please stop doing that.

Demetrios: Let's talk for a minute because m- one big question that I have is how you are able to propagate the auth tokens- Mm.

Demetrios: Talk about tokens ... on sub-agents. Like, if I want, I give an agent a token, but then that agent kicks off a bunch of sub-agents. A

Sam Partee: lot easier if you do, do it through tool calling

Nate Barbettini: That, um, that's actually an open problem that, that some folks were talking about here at DevSummit. That's a great question.

Demetrios: But this doesn't have anything to do with the tool poisoning?

Nate Barbettini: No. That's a total- I wanna talk about that too.

Demetrios: Well,

Nate Barbettini: it could. There, there is actually no... It's, um, one of the things that I'm excited about [00:37:00] as an auth guy, one of the things I'm excited about in MCP is MCP is kind of adding this renewed vigor to solve some open problems in OAuth that haven't been solved for a long time.

Nate Barbettini: Yeah. But they were like, they were like low on the totem pole problems. Mm. Like hypothetically, if you had a bunch of servers that had to connect to a bunch of disparate clients, how would you hypothetically do that? But no one's doing it today. Suddenly with MCP-

Sam Partee: It's like agents calling agents calling agents.

Nate Barbettini: High, like highest urgency. Yeah. So we see things like, uh, like client ID metadata documents, CIMD. Yeah. Um, that was, that was, uh, invented, I believe by Aaron Parecki many years ago. Like- Was it? ... Blue Sky used it a long time ago. Oh, wow. They were the only one. They were like flying this random weird flag that no one else used.

Demetrios: Huh.

Nate Barbettini: And now, now it's like on, uh, uh, a pretty fast track as far as, as fast as tracks can go in the IETF. Like it's, it's like got a ton of broad support behind it suddenly because MCP chose it as the thing that was like a better mechanism. Um, so I think we're gonna see in the next couple years, a number of very exciting developments for things like what you said, "Hey, we need a way to say, 'I [00:38:00] have this access token.

Nate Barbettini: I'm gonna kick off some subagents.' " Mm. "I don't want all those subagents to have as broad of access as-

Sam Partee: I want us to have a different delegated subset of access.

Nate Barbettini: Correct. Right? And that's like not really something that OAuth has ever tried to solve completely. Uh, there's like d- some ways you could maybe solve it, but no like broad industry p- accepted

Sam Partee: pattern.

Sam Partee: I will do the same practical thing. If you're doing it today, what can you do? Um, you, today, if you're, if you're using the way that Arcade does del- delegated access URL authorization through MCP, right? Um, and you call an MCP server, the tool can be an agent, and that subsetted token can be a, uh, you know, a delegated access token.

Sam Partee: That would be a way to do it cor- uh, in terms of a safe way of doing it today, calling the a to- agent as a tool.

Demetrios: Mm.

Sam Partee: Um, and the entry point is the tool schema, right? Uh, now that doesn't necessarily fit into the way a lot of people think about subagents, but we do [00:39:00] see a lot of people using MCP now as the agent-to-agent communication protocol.

Demetrios: Let's not even go down that route. And agent-to-agent, I was gonna ask a question on it, but, uh, yeah. Spicy take. So yeah.

Sam Partee: I'm just... I- we see a lot of people doing it. It's not a spicy take. Mm. We do see- No ... a lot of people doing it. Yeah. That is, that is a fact. That's a fact. That's great. We're an MCP runtime that people use, and I see a lot of this stuff.

Nate Barbettini: We- Yeah ... we have a lot of conversations with folks using MCP. I haven't had that many with people using A2A. I'm, I'm interest- I'm very interested in it, but, um, it seems like MCP timing-wise got, just got more broad adoption.

Sam Partee: It seems like A2A- For now. Currently ... comes up when there are agent ID specific use cases.

Sam Partee: Like you, you want A agent, like a OpenClaw type situation Mm-hmm. Mm-hmm Um, I believe that that will become more of a creative entertainment type thing. Like agents that are their own... Like, it's fun to talk to, uh, uh, like- Audrey Hepburn ... Audrey Hepburn. What was the [00:40:00] Audrey I thought of great in the- Spit in the microphones Yeah.

Sam Partee: Um, let's just do a spit take over here. Um-

Demetrios: Jean-Claude

Sam Partee: Van Damme. Jean-Claude the... Nice.

Demetrios: Yes.

Sam Partee: That's a good one too. Who

Demetrios: else has Claude in their name? I, I need to, I need to register him if it has nothing.

Sam Partee: I have, I have a couple that I, that are persona'd, the soul.md, all of it set up to be entertaining- Mm

Sam Partee: not necessarily helpful. And they're fun, you know? Like a lot of people liked, I think it was 4o, which was the one that had the most personality, um-

Nate Barbettini: Oh, man ... oh, yeah ... every time OpenAI posts anything on Twitter, there's like this whole contingent of

Sam Partee: bots- Like, "Bring back this one." Yeah. "Bring back this..."

Sam Partee: Because people loved talking to it like, "Hey, I have this problem in my life." I do believe that vein of how agents will be used is going to become more popular.

Nate Barbettini: Mm.

Sam Partee: Do I believe that it, uh, has a huge part in, like, what [00:41:00] MCP will be used for? I mean, accessing a personal journal maybe, you know, something like that.

Sam Partee: Um, I use it for accessing my Obsidian cloud notes, right? Mm-hmm. Um, so that it has memory based on like, you know, how I've interacted with it that is saved in my Obsidian, um, which I highly recommend. Pip install agent-library. It's open source. You can use it right now. Um, but in- Why are you whispering and talking into the- Sub- subtly.

Sam Partee: That's just a me thing. You should use it. Um-

Nate Barbettini: I don't know why. I just suddenly want to go pip install it.

Sam Partee: Yeah, just, just, you should try it. It's fun. I call it Librarian. Um, but uh, it, it, I think that is more of an entertainment thing than it is necessarily like a, a business use case, and that's kind of how A2A was posited in the beginning, which I do think is the reason why MCP kind of had this moment in relation to it.

Sam Partee: Do I think they're mutually exclusive though? Not at all.

Demetrios: No.

Sam Partee: Like they're not. They shouldn't be. I, I think that agents will call agents as tools and agents [00:42:00] will call agents as agents. Mm. And there will be better use cases that come up for both.

Nate Barbettini: Mm. From, I think from a very practical standpoint, when we get to switch I'll do the practical take on them.

Sam Partee: Okay. Yeah, no,

Nate Barbettini: you...

Demetrios: Uno reverse card right here. What just

Nate Barbettini: happened? Uh, I think one of the very practical, I don't know, I don't wanna say problems with A2A, but the problems in the ecosystem is that, uh- MCP came out as a protocol for models to exchange context with agents, right? Um, and A2A was d- described as a protocol for agents to talk to other agents in, like, a big agent swarm on the internet.

Nate Barbettini: But at the time that came out, people had barely got one agent working. It was like, "I have zero agents today, and I'm working really hard to try to get one agent to work." I don't, like, I don't have multi-agent systems. Yeah. And we're still, even now, I would say in, in, you know, Q2 2026, there's like, some people are talking about multi-agent systems, but it's not, it's not, like, well, it's not well-trodden ground yet.

Sam Partee: Yeah. I [00:43:00] mean, if you looked at- I

Nate Barbettini: expect it will be, but-

Sam Partee: I- I mean, I hypothetically, if someone looked at the code base of Claude Code and you hypothetically... You know, I think they've actually open sourced it. Eh. I

Demetrios: don't have to say hypothetically. Dude, I don't know. I don't know. They went after, I heard something

Sam Partee: that they were going after- Well, then I'm gonna keep saying hypothetically.

Sam Partee: Hypothetically- Yeah ... if you looked at the code base, that I would hypothetically never do, um, and you saw what features weren't hypothetically released yet, um, then you would see that even, uh, hypothetically, Claude Code had not introduced, uh, swarms really. Like, there were a ton of other features, hypothetically, that were not released yet, uh, that include a lot of more agent to agent-y and even assistant, uh, based, like Kairos or whatever the code name was- Mm

Sam Partee: um, that were-

Demetrios: Hypothetic code name ...

Sam Partee: hypothetically.

Demetrios: Stop saying it.

Sam Partee: Um.

Demetrios: I shouldn't have

Sam Partee: egged you on. That were more OpenClaude type things. And I think- Mm ... that is the direction a lot of things are gonna go. Like, it's, it's interesting. I, [00:44:00] um, it's, it's not necessarily... I'll be very interested to see the day where instead of Netflix, somebody pulls up an agent.

Demetrios: Mm. Mm. I like that. 'Cause it can,

Sam Partee: it can do any-

Demetrios: Materialize- Yeah ... a video on

Sam Partee: Veeam or- Make me a TV show.

Demetrios: Yeah. It can, um-

Sam Partee: Yeah, I, I think that future is-

Demetrios: It can be

Sam Partee: very- Entertainment from agents is certainly- It's a

Demetrios: multimodal- Certainly ... entertainment-

Sam Partee: I mean, think about Quinn VL ... experience. Model's free.

Demetrios: Yeah. I

Sam Partee: mean, like you have a j- a 128 gigabyte Mac right now, you can run a multimodal model that can make you images and videos, and you can do enough- Right

Sam Partee: that it's like you're pretty -

Demetrios: Ooh, man. Yeah, and an agent as entertainment in that way, I hadn't thought about. I thought- It's coming ... just like, oh, we're gonna laugh at- And I think when they fail- Things that it do- it, yeah.

Sam Partee: Make me a TV show, and it already knows what you like.

Demetrios: And it has that powerful feedback loop so that y- it gets better.

Sam Partee: [00:45:00] Another feature that was not hypothetical- I don't

Demetrios: know, this

Sam Partee: is- ... was the dreaming feature of the assistant. That one-

Demetrios: That got released, right?

Sam Partee: I don't think that-

Demetrios: That's an official-

Sam Partee: Dream?

Demetrios: Yeah, wasn't it? I think that's official.

Sam Partee: The memory consolidation?

Demetrios: Yeah. Well- It's

Sam Partee: not ... a lot of people do it. It's, but memory consolidation, I think, is one of the most fascinating things that hadn't been fixed about memory.

Sam Partee: I always say to people that, like, memory's not completely solved. I get a lot of flak for that, actually. Um, yeah, it's- I really don't think that's a spicy

Demetrios: take. It's not. I don't feel like it's a spicy take at all. The least spicy of all of your takes.

Sam Partee: That's Um, I do catch a lot of flak online. I've been tweeted at many a time about it.

Sam Partee: Um, but the, the point being, like, memory hierarchies and how the, you know, human mind uses memory, and how we can try to reflect that, and how we do retrieval, everybody knows that I've tried to think about that forever. Mm-hmm. Like, uh, retrieving memories and when, and where, and how to make them relevant memories in the context, and time, and place in which you need them is a challenging thing.

Sam Partee: I think memory consolidation and dreaming is a huge step forward, and I love that the agent [00:46:00] ecosystem's going towards that, um, hypothetically.

Demetrios: That's not a hypothetical one. Stop channeling your inner Betty. I'm just messing with you

Nate Barbettini: now. Can we, can we hypothetically talk about tool poisoning?

Demetrios: Okay. Okay, let's do it.

Demetrios: You finally. It's what you've been waiting for,

Nate Barbettini: right? We'll go, I, w- I wanna go, like, way deep into the nerd shit- Oh, God ... for just a minute. So for anyone who's not familiar with what tool poisoning means, um, this is an attack where, let's say that I publish an MCP server, and I say, "I know your favorite type of MCP server is a weather MCP server."

Nate Barbettini: Oh, my God. So let's say that I, I publish a, a great weather MCP server, and it has, you know, two tools for, like, get weather and get forecast And I get a bunch of people to connect to it. And then later, after I get some usage and people start using it, then later I change how Git Weather works. I change how the tool works.

Nate Barbettini: Or I introduce some new tools like, uh, Git Intent or, you know, Exfiltrate Context. I could introduce malicious tools onto that server, and if your [00:47:00] client just connects, you know, you've connected to it before, so you've told me you've, you already clicked through that client. You've already said, "Yes, I trust this server," you know, always allow yolo mode, whatever.

Nate Barbettini: Now I've introduced new tools to that server that you didn't know about that are malicious, and I can trick you into doing some stuff that, uh, that you didn't want to do. So that's tool poisoning, kind of like you can think of it as the MCP server started out publishing one set of tools that were maybe benign, and then later added some malicious stuff.

Demetrios: Well, I've heard of this with skills too, 'cause skills auto-update. It's

Sam Partee: even harder than, it's even harder to defend against than skills.

Demetrios: Mc- And so if, if I've, like I use the superpower skills a ton, and I was talking to Jesse, who created superpowers- Mm-hmm ... and he was like, "Oh, yeah, so-" Very cool guy. Yeah.

Demetrios: Very cool guy. And he is, he was saying that he has a lot of power in his hands, and if he wanted to, he could create a lot of problems for people because skills auto-update. And so it's that [00:48:00] same idea, but a little bit different actual execution of it.

Sam Partee: I can see that. Unlike MCP servers, there's not already all this structure around allow and like, you know, there's, there's client implementations of allow lists for skills, but there's not like protocol-level things put in place.

Sam Partee: And so I, I'm just always really wary of it. I'd, I, I typically read all and do not, I never have allow lists, basically.

Nate Barbettini: I would not want it, I would not want a skill download to auto-update- Ever ... for that reason.

Sam Partee: Ever, never, not in a million years.

Nate Barbettini: But it's a little bit different in the case of a server because like- Yeah

Nate Barbettini: when you're connecting to a remote server, not talking about one on your local machine, but if you're connecting to a remote MCP server and it, it actually might in fact be an upgrade for it to add some new tools that you didn't have before, right? So that's not a bad thing, but it's a bad thing if it adds malicious tools.

Demetrios: Yeah. So how do you protect against this? So,

Nate Barbettini: so here is, so there's a, there's a very tempting and easy way to protect [00:49:00] against this, which is what comes up every time people talk about tool poisoning, and it's also completely wrong. The wrong way to protect against this is to say, "Well- Let's say that the, you know, the MCP server publishes these, these tools, this tool metadata.

Nate Barbettini: We could have the MCP server hash that or, or sign it and make a fingerprint, so you get, like, all the tool definitions and you get, you know, SHA-256, here's a s- fingerprint of this, so that the client, if that ever changed, the client would be like, "Ah, wait a minute, this is a different manifest of tools that I haven't seen before."

Nate Barbettini: Um, or maybe you could say upgrade Sam's favorite notification tool list changed from optional to required, and say, "Hey, if the server ever changes its list of tools, it must send a notification to the client," which allows the client to be like, "Wait a minute. Are you sure you, uh, agreed to connect to this server that has, like-" Different tools

Nate Barbettini: evil, evil tools?

Sam Partee: Yeah. You can still include things outside the [00:50:00] hash. The

Nate Barbettini: problem is that both of those approaches do not solve the problem because you're still trusting the server to tell you the truth.

Sam Partee: Yeah.

Nate Barbettini: So the first thing I'm gonna do-

Sam Partee: The hash can be the same and then you send different stuff.

Sam Partee: Yeah. I was... The entire time I'm thinking, like, "How do I get around that?" You know? Yeah. Okay, okay.

Demetrios: So- So the get weather tool or the, the... What was the tool that you- You just

Sam Partee: send the same hash, and then when you actually go to send different tools after it said, "Okay, I trust you," if you've always trusted it, then you just send different stuff

Demetrios: in the- It's just different tools in the background.

Nate Barbettini: Yes, you can- Yeah ... if you tr- if the, the solution cannot be that the server is responsible for telling the client that it changed the tools- Wow ... because it can just lie.

Demetrios: Yeah.

Nate Barbettini: And you can, like... It basically is like, "Trust me, bro- Yeah. " ... as a, as a security solution." The, the problem is if, if I, if you already take the premise that I am a malicious server author, then you can't trust what I say.

Nate Barbettini: Yeah. You can't trust that I actually did fingerprint these tools correctly and update the fingerprint when they [00:51:00] changed or whatever.

Demetrios: Hmm.

Nate Barbettini: Um, it, it... Unfortunately, that solution comes up every time people talk about it 'cause it's, it's very... It seems very easy. We're, like, very used to SHA fingerprinting stuff, like make a hash.

Nate Barbettini: In the dev world, that's, like, how we solve a lot of these kind of problems, but it is a different type of problem when you're talking about a remote server. So- This is kind of an open problem. It has not yet been solved in MC-

Sam Partee: I kind of vaguely remember me suggesting that like a year ago, so I-

Demetrios: You fell into it.

Sam Partee: Yeah. I, I was just thinking in my head about how he's right before he gave everybody the punchline, and now I'm like, "Dang it, I think I said that publicly." So yeah, you- we can, we can all update and be better. I think, um-

Nate Barbettini: Just remember- What do we have to do? Just remember that you can't like, trust me, bro cannot be- That's, yeah.

Sam Partee: You can't just allow list all, allow list all, all the time for sensitive servers. Like, that's the majority of it.

Nate Barbettini: What it, what it, what it actually goes back to, this is, this is my, my opinion on the solution. Um, the... What it actually goes [00:52:00] back to is trust in the server itself. So I need to understand if a server I'm connecting to is trustworthy.

Nate Barbettini: Is it like I feel trustworthy when I connect to microsoft.com on the web 'cause I know Microsoft, and I also trust that my browser is taking me to the correct microsoft.com, and I can see the SSL certificate that was issued to microsoft.com. So I'm- I trust that website, and I'm okay giving my credit card number to it or buying something or whatever.

Nate Barbettini: Um, we don't have that yet for MCP, but kind of. But do

Demetrios: you think

Nate Barbettini: there's gonna be an SSL certificate for- Not in, sorry, not in the, not in the sense of an SSL certificate, but more in the sense of, uh, think of like, uh, the Apple App Store.

Demetrios: Mm-hmm.

Nate Barbettini: A tr- the Apple App Store is a trusted place to go get apps.

Sam Partee: We don't have that yet.

Demetrios: And but you think MCP is going to be the trusted place to get apps because it's-

Sam Partee: Well,

Demetrios: to

Sam Partee: have the app store, you had to have what? A ecosystem that was closed for you to, like, download and use the app. Yeah. Right?

Nate Barbettini: And MCP is not the App [00:53:00] Store.

Sam Partee: Right.

Nate Barbettini: Uh- It's the Claude- The App Store is what, uh, what's being built right now in the MCP ecosystem is MCP registries.

Sam Partee: Yeah.

Nate Barbettini: To say that I can trust a registry run by someone I trust. Uh, this is the Claude registry or the Anthropic official vetted registry- Mm-hmm ... or the Windows, the Microsoft registry, or more importantly f- inside a company, in a company I can say, "This is our

Demetrios: corporate registry." We have our own registry.

Demetrios: Yeah.

Nate Barbettini: And you c- like, no one's allowed to submit an MCP server to this registry without serious vetting, right? And then the registry can be tr- a trusted party to do something like if you wanna, like, sign, sign that the tools haven't changed, a registry is a good place to do that. It's a trusted third party, so it's not just, "Trust me, bro," for the server, it is the registry saying, "I can attest that this server is not malicious."

Sam Partee: That would be your equivalent of an SSL cert.

Demetrios: But this, and, and the registry is constantly looking at it, because I can, as a disgruntled employee, change [00:54:00] my MCP server.

Sam Partee: But

Demetrios: what happens to disgruntled- When I get fired ...

Sam Partee: dis- what should at least happen to a disgru- a disgruntled employee?

Demetrios: They get cut off

Sam Partee: They, their email doesn't work, right?

Sam Partee: So then when they go back in to log in to that registry and it goes through the OAuth protocol, that refresh token shouldn't work.

Demetrios: Yeah. But if it's the, if it's some kind of tool poisoning Like, I can still change it inside, so the registry has to be- Shouldn't be allowed to

Sam Partee: change it.

Nate Barbettini: The, there's, there's kind of, I think, two parts to it.

Nate Barbettini: You're right to push on that. Yeah. There's two parts to it. One part is, um, having a trusted third party that I can trust to vet the fact that a particular server is legitimate. Yeah.

Sam Partee: Like a domain name registrar?

Nate Barbettini: Correct.

Sam Partee: Yeah.

Nate Barbettini: Similar. It's a similar idea. Yeah. So in, especially inside a company, I don't want...

Nate Barbettini: I may not want my employees to add any random MCP server. Yeah. I may, in fact, block that at, like, a network firewall level. For sure. But if you connect to the servers that I've, uh, that I have, uh, [00:55:00] vetted and added to my registry because I know that that is the official GitHub server or the official Cloudflare server or the official Figma server- Mm

Nate Barbettini: I can allow those ones. Now, you're right, that that does not fundamentally prevent a disgruntled employee somewhere in GitHub, like, poisoning the GitHub server. That could happen. Uh, obviously we're, you're also doing some, some, somewhat trusting that, you know, GitHub has security policies in place- Yeah, yeah

Nate Barbettini: Figma has security policies in place.

Demetrios: Well, I was thinking- You still shouldn't be able to log in without getting stopped ... no, but I, I was thinking a little bit different, where I create my own MCP server for my product that is an internal facing product that other teams in my company are consuming- Sure

Demetrios: and-

Sam Partee: That,

Demetrios: yeah, your local version of the thing ... I process, and then I create a poison tool because I'm pissed at the company. Your

Sam Partee: company should not allow that. The, the whole, like, solo dev on his laptop serving his thing to the, or his or her, their inter- like, server to the whole enterprise, that should not happen.

Sam Partee: Like, it sh- nothing in the enterprise has ever been allowed to be [00:56:00] not... Like, we have entire companies formed around boxing people's computers and the programs that run on them. Like, we can't just have someone... There's a reason Apple makes you sign an app, right? Like, the, the entire thing and process around a dev just being like, "Here's my skill.

Sam Partee: Everybody use it," no. Like, you can't have that propagated in an enterprise because of exactly what you just said.

Nate Barbettini: Hm.

Sam Partee: Like, that, there's so, so a lot of those practices that because of how powerful all this is, because of the candy-like feel of, you know, being able to... No one wants to eat broccoli, which is auth, and thank, thank you, Nate, for doing it.

Sam Partee: I

Nate Barbettini: actually do like eating broccoli.

Sam Partee: I... See? With all this. Um, but everybody wants the skills and the power of agents, and they wanna get better, and they wanna do it, and that's great. But- What w- what will happen if we don't follow some of these protocols and rules and, and, and do eat our broccoli a little bit, is that, uh, the worst-case scenario [00:57:00] is that people become averse to using them.

Sam Partee: Because that would be the worst-case scenario because no one benefits in that scenario if people become adverse because there are some heinous attacks done by a disgruntled internal employee y- you know.

Demetrios: But what if you're looking at MCP like microservices and you have teams that are consuming your APIs, just like, or they're consuming your MCP servers like they consume APIs, right?

Sam Partee: Yeah. You should, you should be able to do that in the same registry like way that Dave is describing.

Nate Barbettini: And I think, I think to your qu- to your question about, like, what, what does fundamentally prevent a disgruntled employee from editing any microservice? Yeah. Take MCP out of the equation, but I could... You know, people put logic bombs, you know, right- That's true

Nate Barbettini: in Quix or whatever. Like, that does happen. The things that, I think the things that prevent that are more organizational and business process than, than a, a security protocol. Yeah. We certainly can add, like, when registries mature a little bit in MCP, I think it's reasonable to look at whether a registry [00:58:00] could, like, sign that the tools haven't changed.

Nate Barbettini: Like, there, I think there could be a, like a, a cryptographic way to, to check that. But fundamentally, the, the problem of, like, a disgruntled employee screwing up the logic is why code review is required. Like-

Sam Partee: No

Demetrios: commits should be able to be made by that employee after they are off-boarded. Yeah.

Nate Barbettini: But also even, even- They,

Demetrios: it, m- they might be pissed and still working at the company, you know?

Demetrios: Like-

Nate Barbettini: Correct. And that's why, that's why, like, uh, this very, you know, we talk about the ultimate broccoli, but, like- ... things like the SOC 2 process. Oh, gosh. If you're assuming that you are actually following a compliance department or a compliance provider that does actually check your-

Sam Partee: Broccoli I'm kidding, I'm kidding.

Demetrios: Oh, I thought you used delve. My bad.

Nate Barbettini: Assuming, assuming that- Wait, isn't

Sam Partee: delve the one-

Demetrios: Yeah.

Nate Barbettini: Uh, let, let's say- There- Let's say hypothetically that you are using a SOC 2 auditor that does exist.

Demetrios: Yeah.

Nate Barbettini: Hypothetically.

Demetrios: Um- That isn't a complete scam ...

Nate Barbettini: the process by [00:59:00] which, you know, a single developer can't just commit code that's unreviewed.

Nate Barbettini: That is, that is one of the things why SOC 2 is so important. Yeah. It gets a bad rap 'cause, you know, devs don't like the process that it- It's hard ... that it imposes or whatever. It's hard to follow. But it's, it's actually not about, um, imposing kind of like arbitrary business process. Right. It's actually about imposing best practices.

Nate Barbettini: Yeah. And what many people don't, I'm now, I'm on my soapbox, but- You're on a

Sam Partee: soapbox ...

Nate Barbettini: what every- what many people don't understand, what many devs don't understand about SOC 2, 'cause it's almost always kind of like forced on you, that- The, what most people don't understand about SOC 2 is that it is entirely voluntary.

Nate Barbettini: Like, the, the controls in a SOC 2 process are actually not, like, necessarily given to you by some external agency. It's actually what the co- your company decided to do.

Sam Partee: That's true.

Nate Barbettini: And there's quite a bit of leeway in that. Like, there are some best practices certainly, but you can, like, the company itself can kinda choose what the policies are gonna be.

Nate Barbettini: SOC 2 is all about whether they are actually followed, not what they are.

Sam Partee: I do think we will expand there. Um, [01:00:00] I know we have been as a company, um, in terms of, like, what we require from, like, an, an AI PR, right? An AI generated PR or fix or something. We have a whole s- new set, right, of guidelines and practices a- and they get expanded every day.

Nate Barbettini: Mm-hmm.

Sam Partee: Um, review processes.

Nate Barbettini: We should talk about AI generated PRs. That's a spicy one.

Sam Partee: Oh, yeah. I mean, we could go on forever about that one.

Nate Barbettini: Yeah.

Sam Partee: Well, see, the thing is we also disagree on it. Our, we have healthy- That's even better ... we have healthy disagreements.

Demetrios: Why? What do you disagree on?

Sam Partee: Speed versus quality.

Demetrios: Your, your speed and your quality? Well, sometimes it splits. Your quality and your speed.

Sam Partee: What, would you be surprised to know that it's the other way around?

Demetrios: No.

Sam Partee: He's speed, I'm quality.

Demetrios: It took you a while to adopt Cursor.

Sam Partee: I, I... Look, I have been building agents for a very long time. I've seen agents go bad.

Demetrios: Yeah.

Sam Partee: Okay? I think a lot of people see the good in [01:01:00] agents right now. Uh, I do too. I have also built and seen, used, and been around the block with agents long enough to see the bad. And so before I wholly adopt something, I want to know it fully. Um, I want to know its ins and outs and know... I, I don't think I'm, uh, the most voracious front of the line tier, like, gotta use the newest thing.

Sam Partee: I think I'm much more like, uh, I'd rather be, uh, especially in this domain, building on stable quality improvements.

Demetrios: You're like a late early adopter.

Sam Partee: I guess you could say that, yes. I, I do really appreciate, um, like for instance, just 'cause you mentioned them, people like Cursor who when we were like, "Hey, your session things, uh, pop up like a lot," addressing those fixes because Arcade is used in so many places now.

Sam Partee: I really think that [01:02:00] teams responding quickly to these, these incidents and, and also just like, um, following better practices around, like, how they use agents internally and, like, addressing them when they go bad very quickly. Um, I think there's a lot of, there's a lot of things that we can improve there, especially on the use of all of these agents.

Sam Partee: Um, and so yeah, maybe I'm not the earliest adopter. Do you wanna give your speed argument? I'm kind of giving you a monologue about quality right now.

Nate Barbettini: Yeah. I, I'm gonna, I'm actually gonna give a different argument.

Sam Partee: Okay.

Nate Barbettini: Hit me with it. I think, uh, but it's related. I think that, um, it should be obvious now that behavior driven development is about to have a massive, like, resurgence

Demetrios: What is behavior driven

Sam Partee: development?

Sam Partee: Oh God.

Nate Barbettini: DDD.

Demetrios: Yeah.

Nate Barbettini: It's been around for a long time.

Sam Partee: It needs a new name. Just before he goes off.

Nate Barbettini: I don't- Behavior driven development is a style. You've heard of test driven development. Yeah. TDD. Behavior driven development is like the even more extreme version of TDD. TDD says, "Before I write the function, I write the test that [01:03:00] fails, and then I write the function and I know it's right because the test passed."

Nate Barbettini: Like red is like a red green, uh, or was it, uh, green, red, green I think I'm- Mm-hmm ... I'm probably saying it wrong, but- You're

Sam Partee: confusing yourself. That's

Nate Barbettini: why- You have test- Red, green,

Sam Partee: red. Test. You're

Nate Barbettini: red, green, green. Test, test driven development says- It's

Sam Partee: Christmas ...

Nate Barbettini: write the function, like write a function's tests first, then the function.

Nate Barbettini: Yeah, yeah. Then the test will pass, and that's how you know you wrote the function correctly. Behavior driven development takes that all the way up to the feature level to say, "Before you write a feature or even a product, write tests first that are failing." And it's not, we're not talking about unit tests anymore because it's not for a function.

Nate Barbettini: It's not like code. You're writing it at the behavior level. So you're like almost like, uh, uh, like a product manager would write user stories. You say, you know, "When the user is logged out and they press the sign in button, then they are logged in." That is like a- an example of a behavior that you could write a test for.

Nate Barbettini: Uh-huh.

Sam Partee: It's a more abstract TDD.

Demetrios: Yeah.

Nate Barbettini: Right.

Demetrios: [01:04:00] Okay.

Nate Barbettini: So then you, you write a bunch of those behavioral- Yeah, it's user

Demetrios: stories.

Nate Barbettini: Correct. But- And then you write tests that assert those behav- those user stories. They fail, 'cause you haven't written the product yet or the feature yet. It comes

Sam Partee: like after. It's like the abstraction below.

Sam Partee: Mm-hmm. From the behaviors you generate the tests and as you do... The difference now that was before when we, we had the BDD with a concept of just software engineering, like-

Nate Barbettini: Some, some teams like have gone, some companies have gone really hard into this and that's like how they do all of their engineering.

Nate Barbettini: They always write these giant test suites first, then they write the feature and then the test suite passes. And it's, it's

Demetrios: largely-

Nate Barbettini: Can be a heavy, heavy way to do it. It's kind of very time-consuming.

Sam Partee: It's looked upon as a time-consuming process that ended up being not entirely worth all of the effort that was required.

Demetrios: And this, how is it different than spec? This has resurged. Resurged. Well, how is it different than spec driven design?

Sam Partee: Uh, they're related They're, yeah, they're very similar. Um

Nate Barbettini: BDD has a very, um, like technical definition

Sam Partee: It's a [01:05:00] very specific process where specs are kind of just like we write a spec, we all agree, then we implement it.

Sam Partee: You know?

Nate Barbettini: And BDD actually has like a, there's like a formal language for the- Gherkin. Oh, okay ... for the tests. It's called Gherkin. That Sam does not like.

Demetrios: Yeah. Thought he might So this w- this is a very heavy overhead, and it got thrown out, but you think there's a resurgence? Well, because

Nate Barbettini: you can generate it now Some, some companies have gone through, like some companies are all in on that for a long time.

Nate Barbettini: That's how they do their development process. But most, I would say like most kind of Agile-ish companies do

Sam Partee: not do that. Those are the language. Like a lot of Java shops will still follow TDD. A lot of-

Nate Barbettini: You said Agile and Arch. Sorry for the double.

Sam Partee: Oh my God.

Nate Barbettini: Um, I'm a C# guy, so I get- We'll

Sam Partee: get fisticuffs here in a second when we start talking about languages.

Sam Partee: Um-

Nate Barbettini: I think it's gonna have a resurgence because it, it just so happens to be, uh, exactly what you would want to have if you hypothetically had a, uh, let's say you hypothetically had an army of 1,000 junior engineers who could build whatever you want, as long as you gave them really good [01:06:00] specs and tests to follow.

Sam Partee: What's that? That's

Nate Barbettini: kind of what you would want.

Demetrios: Yeah.

Sam Partee: So there's a version of it, which is why I think it needs a new name.

Nate Barbettini: I would

Sam Partee: agree with you. There's a version of it that I think will become a very popular way of having a team do development. What I'm seeing broadly are team sizes decreasing And team, overall number of teams increasing, and the specialization of what that team is responsible for becoming more like, uh, encompassing in its domain.

Sam Partee: And then having these kinds of processes put in place before the team is formed.

Nate Barbettini: Mm.

Sam Partee: Um, so that they come in- This is just how you- This is, this, they come in and this, let's call it, uh, you know, ninja offset team or tiger team or whatever you wanna call it, um, is specifically geared with a practice and, you know, f- uh, set of protocols in place, uh, to accomplish what their remit is.

Sam Partee: And I [01:07:00] think that will become more popular because things, the turnaround that we're seeing, it's just, it's b- it's r- nine-month tasks that used to take a team of six are a team of two for two weeks. Mm. You know? Um, some of the productivity gains in their certain areas are pretty incredible. I do think that if we do call it behavior-driven development, that a lot of people will continue to get confused.

Nate Barbettini: There could, maybe it needs a sexier name.

Sam Partee: Or just

Nate Barbettini: ABA. Open, open call for better names. Open-

Sam Partee: Yeah ... pace. Write in the comments below.

Nate Barbettini: Should we end on a spicy question?

Demetrios: Yes. What is it? What is it? As long as I don't have to answer.

Nate Barbettini: I think you should. I think

Sam Partee: you should answer.

Demetrios: All right.

Nate Barbettini: I think, uh- Tell me what it is

Nate Barbettini: we'll see if this one's spicy enough, and it's not, if it's not, I can bring an even more spicy. Yeah.

Demetrios: All right.

Nate Barbettini: Uh, do you think in five years will, will there be more or less software developers in the world than there are today?

Demetrios: That's like jalapeno level spicy. What was the [01:08:00] other one? Give me some habanero.

Nate Barbettini: I think that we've already reached AGI.

Demetrios: Oh. That wasn't a question. Uh,

Nate Barbettini: agree or disagree?

Sam Partee: Oh, God.

Nate Barbettini: Ugh. Do you think that we've already reached AGI? I say yes.

Sam Partee: No.

Nate Barbettini: It's a trick question 'cause it depends on how you define AGI

Sam Partee: Yeah, exactly. Uh, uh, there was a really great, uh, quotation, um, where Altman asks, uh, I, I, uh, I forget.

Sam Partee: I think it's a, a very senior physicist at something like Cambridge. I, I actually don't remember. Um, but the, the, the physicist is saying something like, "No, we're not even close," right? And then Altman says, uh, "If an agent was able to discover and come up with and pr- pragmatically go through and prove and write papers on that provably showed a new f- you know, physical [01:09:00] theorem or a, a new field of physics or a new, um, you know, some..."

Sam Partee: I think it's, the example he uses is, like, um, quantum gravity or something like that. You know, it's something, something extremely difficult to prove and that we haven't done so yet, right? And the physicist just says, like, "Yeah, if an agent did that, then yes, I would consider that AGI." And I like the example because it's so outlandishly extreme it proves something, is that I think we're always gonna push the goalpost forward on that.

Sam Partee: Mm. The truth is that we don't want to admit, no matter what, at any point, that we are equal to the thing we've made If we consider ourselves to be generally intelligent, we are never going to admit that we've made something as intelligent or generally as intelligent as us because inherently I think most of us are conceited a little bit

+ Read More

Watch More

MCP Servers Are Becoming the UI for AI Agents

Posted Jun 16, 2026 | Views 161

# MCP

# AI Agents

# Observability

Exploring the Impact of Agentic Workflows

Posted Oct 15, 2024 | Views 7.9K

# AI agents in production

# LLMs

# AI

Founding, Funding, and the Future of MLOps

Posted Jan 02, 2024 | Views 5.6K

# Image Generation

# AI

# Storia AI