From Single-Player to Multi-Player: Operating AI Agents at Scale
Speakers

James is the CEO and co-founder of Guild.ai, the control plane for AI agents where engineers build, reuse, and evolve intelligent software together. Previously, he was VP of Engineering at Meta Devinfra, leading the developer tools, platforms, and services that power Meta’s products, and helping build the company’s first internal AI agent. During his time at Meta, he co-created Diem and served as Head of Engineering at Instagram. Over a 40-year career, James has founded and led world-class engineering teams at companies including Yahoo (following the acquisition of his startup Luminate), Lightspark, LiveOps, Tellme (acquired by Microsoft), Netscape, Oracle, and Borland.

At the moment, Demetrios is immersing himself in machine learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building LEGO houses with his daughter.
SUMMARY
James Everingham is the CEO and Co-founder of Guild.ai — the AI agent control plane for production teams. With roots at Netscape, Instagram (Head of Engineering), and Meta (Head of Dev Infra, leading a 1,000-person org), James brings rare, hard-won expertise to the challenge of operating AI agents at scale.
TRANSCRIPT
James Everingham: [00:00:00] I'm not worried about how I would transfer context from one agent to another agent in my own company. I'm worried about this: if I pull in infrastructure or agents from another company, how do I transfer context to them so they can do their job well?
Demetrios: probably be the least formal podcast you've ever been on.
James Everingham: It's good. I like it.
Demetrios: So yeah, hopefully whatever bar you had for the expectation, I've lowered it as far as it can go down.
James Everingham: You know, I'm your perfect guest for this, because I'm gonna try to out-lower the bar of every guest you've had on.
Demetrios: Oh, perfect. Well, how do you guys think of yourselves? I've heard you called a control plane for agents, and I'm not exactly sure what that means now that there are all these new terms [00:01:00] being thrown around. I try to stay hip to them, but I would love to know your definitions.
James Everingham: Yeah, absolutely. In classic networking and hardware, a control plane is the layer that lets you provide governance and control over your network and your data traffic.
James Everingham: Now we're starting to see AI agents working inside companies' infrastructure, and they're running autonomously, so you need the same sort of controls around them. You need something that provides governed access to the data, and you need something that provides access to the operations these agents might perform.
James Everingham: And that includes everything from budget control to security to observability to understanding how these things are performing. That ends up looking a lot like an operating system layer that runs on your infrastructure. So you have agents. They [00:02:00] need a place to run where they can be safely managed and controlled, where you put policy that you can configure and set up between them and the data they access and the things they execute. That is what a control plane does.
Demetrios: And there's obvious policy that I can think of off the top of my head-
James Everingham: Sure ...
Demetrios: especially because every other week we see a post go viral like, "Oh, the agent blew up my database," or, "Oh, the agent did this thing I didn't want," and those are probably the sensational stories. What other types of policies are you seeing put in place that are effective?
James Everingham: Sure. Well, I'd dimensionalize these policies by the very different things they protect you against. One is the example you used, a database outage. That is probably a policy you'll want to set up so that your agent, just [00:03:00] like a human, doesn't have access to your critical configuration data.
James Everingham: Right now, a lot of people are running agents and giving them unbridled access to a lot of the infrastructure, either intentionally or unintentionally. So you set policies around what they can and can't do. Can they read this? Can they write to this? You need different actors in your system to have different policies.
James Everingham: If you think about how humans are managed in a large company, you might have different fences around your finance team and your engineering team. You don't want your finance team going in and deleting code, and you don't want your engineering team to be able to get into your financial data.
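The per-actor fences described above can be pictured as a deny-by-default table that is consulted before any agent operation runs. This is only an illustrative sketch: `PolicyTable`, `execute`, and the role and resource names are hypothetical, not Guild.ai's API, and a real control plane would also need identity, auditing, and a way for each team to edit its own rules.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyTable:
    """Deny-by-default rules: (actor role, resource) -> allowed actions."""

    rules: dict[tuple[str, str], set[str]] = field(default_factory=dict)

    def allow(self, role: str, resource: str, *actions: str) -> None:
        self.rules.setdefault((role, resource), set()).update(actions)

    def is_allowed(self, role: str, resource: str, action: str) -> bool:
        # Anything not explicitly granted is refused, for agents and humans alike.
        return action in self.rules.get((role, resource), set())


# Hypothetical roles and resources, for illustration only.
policies = PolicyTable()
policies.allow("finance-agent", "ledger", "read", "write")
policies.allow("eng-agent", "source", "read", "write")
policies.allow("eng-agent", "prod-config", "read")  # may inspect, never modify


def execute(role: str, resource: str, action: str, operation):
    """Mediate an agent operation: check the policy first, then run it."""
    if not policies.is_allowed(role, resource, action):
        raise PermissionError(f"{role} may not {action} {resource}")
    return operation()
```

Keeping the check in one mediating function, rather than inside each agent, is what lets the security or finance team change a rule without touching agent code.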
James Everingham: So setting up those policies is one. Some of the more interesting policies are things around compliance and regulation, right? You hear a lot about AI code review. That's [00:04:00] one of the big, impactful use cases of AI: getting feedback on your code.
James Everingham: But if you're in a regulated company, it's not clear whether an AI agent can do that review or whether it has to be a human. So setting these policies up to match your company's policies is pretty important. We think this is going to be a big area for agents.
James Everingham: They're non-deterministic. Humans are non-deterministic too, so just like with humans, you're going to need to put guardrails around them and make sure they can only access the proper things, and that they do it in a way that's compliant.
Demetrios: And I think of guardrails in layers.
Demetrios: I don't want the model saying things it shouldn't be saying. That's one type of guardrail. But then there's doing things it shouldn't be doing.
James Everingham: That's right.
Demetrios: Right? [00:05:00] And then there are also guardrails on having access to things it shouldn't have access to. So are you attacking guardrails at all of those levels?
Demetrios: Or do you see them as the same thing at the end of the day? Because if it doesn't have access, then it's not gonna be able to do something or say something.
James Everingham: I think they're a little bit different, but I do think what you're hitting on is correct. Infrastructure that needs to be reliable and stable needs to be highly deterministic.
James Everingham: So in order to operate this type of technology, which is not deterministic, you need a layer that will either isolate you from that or facilitate it. Let me give you a few examples. Even with AI, you can build confidence in how accurate something is by running it through evaluations, [00:06:00] where you have a set of tasks that are representative of what you're trying to achieve and you know what the answers are. You run it through, and you can verify, or at least get high confidence, that it's accurate based on those evaluations.
James Everingham: Now, that confidence level can go down quite a bit if it's not able to solve the problems or complete the workflows. So you need some measure of confidence in how well these things will operate. And the different actors in your company, the different operators like your security team or your finance team, are gonna have different perspectives on that, so they need to be able to go in and configure those things themselves.
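One way to turn evaluation results into the kind of confidence measure described above is to run the agent over tasks with known answers and report a conservative lower bound on its pass rate, so that a small eval set earns less trust than a large one. This is a minimal sketch under those assumptions: the exact-match scoring, the Wilson score interval (a standard statistic swapped in here, not something named in the conversation), and all function names are illustrative.

```python
import math
from typing import Callable


def run_evals(agent: Callable[[str], str], tasks: list[tuple[str, str]]) -> tuple[int, int]:
    """Run the agent on tasks with known answers; return (passed, total)."""
    passed = sum(1 for prompt, expected in tasks if agent(prompt) == expected)
    return passed, len(tasks)


def wilson_lower_bound(passed: int, total: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for the true pass rate (z=1.96 is ~95%)."""
    if total == 0:
        return 0.0  # no evidence, no confidence
    p = passed / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / denom


def meets_threshold(agent: Callable[[str], str], tasks: list[tuple[str, str]], required: float) -> bool:
    """Each team sets its own `required` confidence for the agents it governs."""
    passed, total = run_evals(agent, tasks)
    return wilson_lower_bound(passed, total) >= required
```

With this choice, ten passes out of ten gives a lower bound of only about 0.72, while a hundred out of a hundred gives about 0.96, so a security team demanding 0.9 would accept the second agent and not the first.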
James Everingham: Let me give you two practical examples here. The finance team, for example, might be the ones signing [00:07:00] off on your third-party model budgets, right? So they might want to go in and set certain budgets or circuit breakers around agents, teams, and people, saying, "This is the token budget these people can use."
James Everingham: So they set that policy up, whereas the security team might want something different: "We want to sandbox this off, and we don't want this agent to be able to access these very critical systems." Those are two very key functions you need to be able to perform.
James Everingham: You need to be able to measure and understand how critical something is and what your confidence is in your agents being able to complete their tasks, and then manage them differently based on those expectations.
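The finance team's token budgets and circuit breakers could look something like the following sketch, assuming every model call is routed through one layer that knows which person, team, and agent it is made on behalf of. `TokenBudget`, `BudgetExceeded`, and the scope strings are hypothetical names. A production version would also need persistence, time windows, and concurrency control.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a call would push a scope past its token budget."""


class TokenBudget:
    """Token circuit breakers per scope, e.g. "team:payments" or "person:alice"."""

    def __init__(self, limits: dict[str, int]) -> None:
        self.limits = dict(limits)
        self.spent = {scope: 0 for scope in limits}
        self.tripped: set[str] = set()

    def charge(self, scopes: list[str], tokens: int) -> None:
        """Charge `tokens` to every budgeted scope, or to none if any would overflow."""
        budgeted = [s for s in scopes if s in self.limits]
        for scope in budgeted:
            if scope in self.tripped:
                raise BudgetExceeded(f"circuit open for {scope}")
            if self.spent[scope] + tokens > self.limits[scope]:
                self.tripped.add(scope)  # stays open until someone resets it
                raise BudgetExceeded(f"{scope} would exceed {self.limits[scope]} tokens")
        for scope in budgeted:
            self.spent[scope] += tokens

    def reset(self, scope: str) -> None:
        """For example, called by finance at the start of a new budget period."""
        self.spent[scope] = 0
        self.tripped.discard(scope)
```

Once a scope trips, every later call against it is refused until someone resets it, rather than being retried automatically. A breaker like this only helps if calls actually go through it: spend that bypasses the management layer stays invisible.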
Demetrios: I love this point that every department, or at least the key departments, will have key policies they want enforced, so you're [00:08:00] almost building for different users.
James Everingham: That's exactly correct. To one degree, you can think of agents the same way you would think of employees. Now, I don't want to reinforce the message you hear a lot that AI is going to replace everyone, because I'm actually not a big believer in that.
James Everingham: But that being said, they do have a lot of the characteristics of people, especially the non-determinism. I'm non-deterministic. If you ask me the same question twice, I'm gonna give you a different answer each time. So you have to take that into account for this type of infrastructure, and I think it's going to be needed for a long time.
James Everingham: I don't see a clear path to LLMs being 100% deterministic.
Demetrios: Yeah, I kinda threw that out the window long ago. [00:09:00] I don't know if that is ever gonna happen unless there's some really big breakthrough.
James Everingham: Yes. I don't either. And I don't know how it could. It's probably almost a philosophical discussion at that point, but if you do make them 100% deterministic, you also kind of kill one of their key strengths.
James Everingham: So...
Demetrios: Well, I was just talking about this with a friend. Part of it, and I'm not gonna say the appeal, is that we live in a subjective reality, and a lot of the data LLMs are trained on was written by the winners of history. So you're looking at a subjective interpretation of something that doesn't really have [00:10:00] a ground truth where you can say, "Yes, this is the way it happened."
Demetrios: Partly because maybe all of the history written about something back in those days was written by the winners, or it just doesn't exist and we've had to piece it together with little understanding of it. So that makes me think, yeah, there's not really going to be one clear truth, no deterministic way to ask LLMs something and get an oracle answer that always gives you the same, quote unquote, "right answer."
Demetrios: But that's a tangent we could go down. When it comes to policies, there was one other thing. I know policies are the most interesting and exciting topic in the world, but I wanted to ask you about actually having [00:11:00] policies in place when an engineering team is spinning up agents. Do you view policies in their workflows as, "Hey, at this enterprise you've got access to these sandbox providers or these GPU providers, and this is how you spin up an agent or a series of agent workflows or workloads"? Versus what I think a lot of folks have happening, which is just kind of the Wild Wild West.
Demetrios: You get ahold of some agents. Maybe there are some policies, like, all right, we've got some enterprise agreements in place with LLM providers, but that doesn't necessarily get you all the way there, because then you have all of the other stuff around the agent that you need. What are the policies in place for that [00:12:00] when the agent is the one taking those actions?
James Everingham: Yeah. If we back out to the engineering teams and what we're trying to accomplish as companies and as an industry, we're all trying to move faster and more effectively. So let's start from there. The way you move faster in a company is you make it safer for your team to go and do things, and that means putting proper guardrails in.
James Everingham: I think of policy in this case almost as being enforced... My background, way back, was operating system design and languages, right? If you think about the way an operating system is designed, you have this kernel, and it goes through different layers where you have your device drivers and your applications, and you set secure APIs that dictate whether you can reach in or reach out.
James Everingham: And [00:13:00] if you reach in, it's very dangerous, right? But if it's reaching out to you, there's more control. So putting a layer inside your company that allows that type of separation is pretty key to moving very fast and very safely. One way to think about it when you're defining this policy is to look at your infrastructure as an abstraction not dissimilar to an operating system.
James Everingham: It's like, okay, here's my engineering team. Here are the things I want to sandbox them into. I wanna put guardrails in to allow them to go do these specific things. That's much harder to do when you don't have a centralized framework for it in a company, and that's what we're seeing right now. It's the same pattern you've seen with pretty much every technology [00:14:00] change throughout history: people are just experimenting.
James Everingham: They're running things on their developer servers. We had one partner we were working with come to us and say, "We had one engineer who went through our entire Anthropic budget in seven hours on their laptop, and no one knew." These are the types of things you see without a management layer around it.
James Everingham: I don't know if I answered your question, but getting back to policy, I map it directly to looking at your company like a machine, like an operating system layer, and setting up the right guardrails to allow each team to go as fast as possible without worrying about breaking stuff.
Demetrios: Did you ask that company if the engineer got a raise or did they get fired? Which one was it?
James Everingham: I didn't, but you're definitely [00:15:00] touching a nerve with a lot of these token-maxing things out there. Be careful what you optimize for. I can burn a lot of tokens and get very little done, so...
Demetrios: Yeah.
Demetrios: I'm very good at that. As I was joking with a friend before, I'm the reason open source projects go closed source, because of all the sloppy PRs that get sent from my GitHub account. But again, another tangent. Let's stay on track. I'm gonna keep you honest here. I was also thinking about this whole way of working, and you mentioned, "Hey, let's try to move faster here." I know you've had some experience building the developer platform at Meta.
Demetrios: Am I right?
James Everingham: That is correct.
Demetrios: They're known for the whole move-fast mantra. [00:16:00] What are you seeing and thinking about in terms of how to get developers on the train, and the gap between adoption and the actual productivity gains you get from the agents?
James Everingham: If I go back to the late '80s and '90s, when I first moved to California, I was working on languages at a company called Borland, and we were sort of pioneering the IDE. You had separate tools for your compiler, your debugger, and your editor, and you'd go back and forth, and somebody pulled it all together into one thing.
James Everingham: It was Borland with Turbo Pascal, et cetera. No engineers thought that was a serious tool. Everyone was like, "This is bad. Real engineers aren't gonna use this integrated thing. We're all command line." You can see this pattern with senior engineers, [00:17:00] and the way I describe it is: you have to prove the value of the tool.
James Everingham: Over time it became obvious, and the IDE earned the adoption of developers. They learned they could be more effective, but it took time. Changing behavior is hard. Even for me, if I have 20 years of Emacs macros, that's gonna be hard. When I was engineering, I considered myself a craftsman, and that was my tool set, and you're saying, "Throw away the tool set you put your 10,000 hours into and try this."
James Everingham: So fast-forwarding to now, the mistake I see is that a lot of leaders are mandating AI tool usage. You want the tools to earn the usage by demonstrating value. So the question to me is, as a [00:18:00] leader in a company, how do you inspire your team to use these tools in a way where it's clear what the value is?
James Everingham: I'll tell you one way not to do it: saying, "You all have to move faster and submit more PRs and generate more code," right? That's like saying, "Hey, you have this oven. You're cooking at 325. Just turn that thing to 700 and cook faster." No, that's not the way this works. First off, you need to be really clear about what outcome you're trying to get, right?
James Everingham: One, "move faster" means nothing. Tell me specifically what you want to move faster. Do you want more features? Do you want more experiments? Get really specific about that. But the thing I found, even at Meta, that became really fun was putting challenges out to the teams [00:19:00] that they could only really solve using these new tools, and this seemed to really resonate and get a lot of traction.
James Everingham: For example, we put this challenge out to the team: can you eliminate code freeze during the holidays? That ended up becoming something we open sourced called Diff Risk Score. The team built agents that sat between your CI/CD systems and your source control. It would pull a diff and use an AI analysis agent to measure the risk, taking into account the engineer, the part of the code base it was written in, and everything else, and say, "This is either very risky or not risky."
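The gating idea can be sketched as a risk score in [0, 1] per diff plus a threshold that only applies while a freeze is in effect, so that low-risk changes keep landing. This is not Meta's implementation: in the system described, an AI analysis agent produces the score, whereas `risk_score` below is a hypothetical placeholder heuristic, and `Diff`, `CRITICAL_PREFIXES`, and the 0.3 threshold are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Diff:
    author_commits_in_area: int  # rough proxy for the author's familiarity with this code
    lines_changed: int
    paths: list[str]


# Hypothetical "sensitive" areas of the code base.
CRITICAL_PREFIXES = ("infra/", "config/prod/")


def risk_score(diff: Diff) -> float:
    """Placeholder heuristic in [0, 1]; stands in for the AI agent's assessment."""
    score = min(diff.lines_changed / 500, 1.0) * 0.5  # bigger diffs are riskier
    if any(path.startswith(CRITICAL_PREFIXES) for path in diff.paths):
        score += 0.4  # touches sensitive code
    if diff.author_commits_in_area < 5:
        score += 0.1  # author is new to this area
    return min(score, 1.0)


def may_land(diff: Diff, freeze_active: bool, threshold: float = 0.3) -> bool:
    """Outside a freeze everything proceeds; during one, only low-risk diffs do."""
    if not freeze_active:
        return True
    return risk_score(diff) < threshold
```

Swapping the heuristic for a model call would change only `risk_score`; the freeze policy in `may_land` stays the same.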
James Everingham: It turns out a lot of code isn't risky, so we could just let it go through. So they were eliminating this long-standing code freeze from December through January, which had all its own issues. [00:20:00] Another one was: can you make the code base onboard the engineers itself, make the code base sentient?
James Everingham: One of the teams built this onboarding agent. It was amazing. Back when I was supporting Instagram, I remember bringing in engineers and saying, "Hey, I want you to go work on this specific camera filter." It's such a massive code base with such a steep learning curve that they'd spend a week just figuring out which files.
James Everingham: With this onboarding agent, you could basically say, "I wanna work on this feature," and it would say, "Oh, that's these files. Do you want a system diagram? Do you wanna know the history? What can I tell you about this?" That had a huge impact and got engineers measurably productive faster.
James Everingham: So that was one thing. Now, the thing that really helped earn adoption, and it's also something we build into our platform because it inspired us [00:21:00] when we decided to build Guild, was centralizing all of these agents in one place where engineers could go see what agents were running and what they were doing.
James Everingham: They could build on them, fork them, and see that these types of technology were actually running in the platform and what they were doing. That is the thing that really supercharged adoption inside Meta. As soon as one engineer did some very high-impact thing and it was published, the other engineers were like, "Okay, I can see how I can apply that," and they could build on it.
James Everingham: So putting a spotlight on what technology is working is another way of doing that. Sorry, that was a very long-winded answer. I can't even remember what the question was, but I hope I answered it.
Demetrios: Well, you started out trying to out-tangent my [00:22:00] tangent, so I think you win. But the funny piece is I am 100% on board with shining a light on some of the things folks are doing.
Demetrios: It gets the creative juices flowing. We do these Friday lunch-and-learn sessions where folks will just come and talk about what they're doing and how they're using things, whether it's the coding agents, the new features of the coding agents, different pieces around the harness, or how they've built their own special harness, et cetera.
Demetrios: And what I've seen is that just by going to these, even though I'm not necessarily going to recreate what they're doing, I have learned and incorporated so much of their learnings into my own workflows. It's like what you're [00:23:00] talking about. When you make it public and showcase it, it gets that inspiration going, and you go, "I think I could use this in mine, too."
James Everingham: Exactly. That's one of the benefits of open source. My background is also open source. Back at Netscape, we open sourced Mozilla and did all of that, and when we first open sourced it, the quality was so low that it would load and crash immediately.
James Everingham: But then that ended up being the core components for Firefox and all of the modern browsers, even parts of Chrome. So I think that's one way. One thing we like is this thing we have called Agent Hub, which is basically a centralized place, very GitHub-ish, where you can look at agents.
James Everingham: You can look at them across organizations. You can pull them in and use them [00:24:00] in your infrastructure. And of course, making them visible isn't only inspirational; it makes them more secure, right? Getting everyone's eyes on them, you get a lot of the problems pointed out by experts pretty quickly.
James Everingham: And then providing a safe place to run them is also key. You don't want people to feel unsafe pulling something off the net and running it in their infrastructure.
Demetrios: Yeah.
James Everingham: It's an interesting time right now. We had a whole industry pop up for firewalls and security software against intentionally bad actors trying to compromise your infrastructure.
James Everingham: And now these agents are like unintentionally bad actors inside your infrastructure, and that almost requires a new type of layer to protect yourself from them.
Demetrios: Yeah. It's the mutiny within.
James Everingham: Yes, exactly.
Demetrios: There's [00:25:00] another thing I wanted to hit on with these different products you built out at Meta-
James Everingham: Mm-hmm
Demetrios: that I find fascinating, which is basically the "can I ship on Friday" product, or on holidays, whatever it was. I have seen this pattern come about, and I'm not sure if you've been seeing it too, where folks will set up simulations. Instead of shipping to production to find out that, oops, they did something wrong there, they'll set up a simulation.
Demetrios: I think I've heard of a company doing this. I was just reading about it. It's like Vera or something, and they simulate the whole cloud environment so that, all right, cool, you've got your stuff ready to push. Push it, see what happens, and hopefully catch mistakes before they happen.
James Everingham: Yep. I think that's [00:26:00] pretty important. At Meta we were pretty advanced with that. We didn't only run simulations. At the lowest network levels, we could capture packet-level data and then replay that actual packet-level data back into the infrastructure at scale.
James Everingham: So you could actually not just simulate but replay actual traffic on a system. And simulations are challenging, because once again, humans are non-deterministic, and they're always gonna do something outside your simulation boundaries. So you still need a protection layer regardless.
James Everingham: Some of the more modern production engineering techniques, like CI/CD, prevent some of that: put lots of little changes out instead of one big one, and you get, theoretically, small breakage instead of big breakage. But all of that needs to be reinvented now.
James Everingham: Agents are very different. They're different from [00:27:00] code. They're different from anything we've done, even in the way they access or write code. Think about it: source control itself was designed for humans collaborating. Now you have agents writing a lot more code and collaborating.
James Everingham: What does that look like? That's a very different system. And searching a lot more.
Demetrios: Yeah.
James Everingham: Yeah. You need model info. You need to record identity and model info, map it to evals, and do all of these things that are really different from what you would see in a traditional engineering org.
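Recording identity and model information alongside agent-written changes could piggyback on git commit trailers, the `Key: value` lines at the end of a commit message (as in the familiar `Co-authored-by:`). This sketch assumes that convention; the trailer keys, `AgentProvenance`, and the model and eval-suite names are hypothetical, and the parser only handles the simple case of trailers in the final paragraph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentProvenance:
    agent_id: str          # which agent wrote the change
    on_behalf_of: str      # the human or team accountable for it
    model: str             # model identifier used for this change
    eval_suite: str        # eval set the agent/model was last scored against
    eval_pass_rate: float  # score on that eval set


def with_provenance(message: str, p: AgentProvenance) -> str:
    """Append provenance as trailers in the final paragraph of a commit message."""
    trailers = [
        f"Agent-Id: {p.agent_id}",
        f"On-Behalf-Of: {p.on_behalf_of}",
        f"Model: {p.model}",
        f"Eval-Suite: {p.eval_suite}",
        f"Eval-Pass-Rate: {p.eval_pass_rate:.2f}",
    ]
    return message.rstrip("\n") + "\n\n" + "\n".join(trailers) + "\n"


def parse_trailers(message: str) -> dict[str, str]:
    """Read `Key: value` lines from the last paragraph back into a dict."""
    last_paragraph = message.strip().split("\n\n")[-1]
    return dict(
        line.split(": ", 1) for line in last_paragraph.splitlines() if ": " in line
    )
```

Because the metadata travels with each commit, a later query such as "which changes came from this model version" can be answered from history alone.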
Demetrios: I was just reading something about how Lovable can't use GitHub because they're creating so many repos per second, or per hour, that it maxes out GitHub's capabilities. And on top of all the fun stuff that's been happening with GitHub lately, I can imagine it's like, [00:28:00] wow, there are these random edge cases you hit when you start realizing the bursty nature of agent workloads. It goes and does something, it spins up and needs all of this information, it parallelizes the shit out of a job, and then it might just sit around waiting for a human approval for hours.
Demetrios: You know, if you go out to lunch thinking the agent's working, and then you forget, like, oops. Or you shut down the laptop, whatever it may be. So it's a very on/off type of thing, and when it's on, it is turbo on.
James Everingham: Yeah, one thing we saw at Meta, and there aren't a lot of companies with that level of scale, including in developer support:
James Everingham: we had to write our own, like... No source control system would handle Meta's code base. It's a monorepo, billions [00:29:00] of lines of code. We even had to write a custom file system, because you needed an on-demand file system running on your laptop to pull down parts of it, since the whole thing wouldn't fit on a drive.
Demetrios: Yeah.
James Everingham: So that was with humans writing code at their speed. Now you're gonna get AI writing a lot more code, so you're gonna hit those limits much faster. I think a lot of the version control systems people are trying to retrofit to handle new AI use cases are gonna break pretty quickly.
James Everingham: So how that evolves is definitely something to keep an eye on.
Demetrios: Well, where else do you see agent workloads changing things? We mentioned the reads and writes, the search, the ability to spin up sandboxes or simulate things, whatever it may be. It feels like there are a lot of areas, and parallelization is another piece of it, because now [00:30:00] when we're searching, especially searching the web, we can parallelize those searches. A human will normally search, click on a link, and read: "Is this what I want?
Demetrios: Is this not?" Maybe they save it for later, then go back and click on another link. An agent can ingest so much more information in a search if it's out on the web. If it's searching code bases, it's just grepping around all over the place and doing much smaller searches. So I feel like the shape of almost every workload we're used to is changing, and if you're thinking about databases, that too. There are all of these different pieces.
Demetrios: I don't know if you've pondered any of that, or, like, where else you're seeing the differences.
James Everingham: Yeah. Well, let's just ponder it right now. One way my thinking has evolved around this is that a lot of people are [00:31:00] thinking of agents as a single unit that goes and does some big workflow optimization, the way that if I were hiring a person, I'd want that person to go own this complete thing.
James Everingham: I think that's the wrong way to look at agents. I look at them more as capabilities now, and I think engineering's gonna evolve into more of a capability architecture. Let me give you a practical example. Even on our platform, we have two agents. Well, actually, we have more.
James Everingham: We have, like, eight agents that work together, right? One of them looks at the logs on our servers. It does it hourly, and if it finds an error, it automatically creates an issue for it. That causes another agent to wake up when the issue gets submitted, and it goes into a planning agent, and the planning agent comes up with a [00:32:00] plan for fixing and remediating it.
James Everingham: It also goes to a coding agent that writes a fix and a testing agent that builds a test for it. If you built one agent that did all of that, it would be pretty hard to even troubleshoot. If something goes wrong, how do you even find out, if it just goes into a black box and comes out?
Demetrios: Mm-hmm.
James Everingham: But the way these systems work is you take a workflow and throw a bunch of capability at it. Each agent should be specialized at its task, and you probably want them trained differently. They may even need to use different models, because some frontier models are better at coding and others are better at other things.
James Everingham: So you probably want some control over that as well. It reminds me of the mainframe pattern back in the day. Before modern data centers with racks of servers, everyone was trying to just build a bigger computer, you [00:33:00] know, the Cray X-MP: "Oh, we need more terminals. Let's build a bigger computer."
James Everingham: It turns out microservices and racked servers were a much better solution, right? You could build things that would scale predictably. You could troubleshoot them. And I think we're gonna see that same pattern apply to agents. Agents are more like Legos than complete machines, and engineering is gonna be about putting those Legos together well, making sure they're working and producing the right results, which is the hardest part.
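The log monitor, issue, planner, coder, and tester chain James describes can be sketched as an in-process event bus. This is a minimal illustration under stated assumptions, not Guild's implementation: the event names, the `Bus` class, and the model names in `MODELS` are all hypothetical, and each agent is a stub that only emits the next event.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass
class Event:
    kind: str
    payload: dict


class Bus:
    """Minimal in-process event bus: each agent subscribes to an event kind."""

    def __init__(self) -> None:
        self.handlers: dict[str, list[Callable[[Event], None]]] = defaultdict(list)
        self.trace: list[str] = []  # every hop is recorded, so a failure can be located

    def subscribe(self, kind: str, handler: Callable[[Event], None]) -> None:
        self.handlers[kind].append(handler)

    def publish(self, event: Event) -> None:
        self.trace.append(event.kind)
        for handler in self.handlers[event.kind]:
            handler(event)


# Hypothetical per-agent model assignments (placeholder names, not real models).
MODELS = {"planner": "reasoning-model", "coder": "coding-model", "tester": "coding-model"}


def wire(bus: Bus) -> None:
    """Connect four specialized stub agents; each does one job and emits one event."""

    def log_monitor(e: Event) -> None:
        # Would run hourly in practice; here it scans the lines it is given.
        for line in e.payload["lines"]:
            if "ERROR" in line:
                bus.publish(Event("issue_created", {"error": line}))

    def planner(e: Event) -> None:
        plan = f"{MODELS['planner']}: plan for {e.payload['error']}"
        bus.publish(Event("plan_ready", {**e.payload, "plan": plan}))

    def coder(e: Event) -> None:
        bus.publish(Event("fix_ready", {"by": MODELS["coder"], "for": e.payload["error"]}))

    def tester(e: Event) -> None:
        bus.publish(Event("test_ready", {"by": MODELS["tester"], "for": e.payload["error"]}))

    bus.subscribe("logs_scanned", log_monitor)
    bus.subscribe("issue_created", planner)
    bus.subscribe("plan_ready", coder)
    bus.subscribe("plan_ready", tester)


bus = Bus()
wire(bus)
bus.publish(Event("logs_scanned", {"lines": ["INFO ok", "ERROR db timeout"]}))
print(bus.trace)
```

Because each hop lands in `bus.trace`, a failure can be pinned to one specialized agent instead of disappearing into a single black-box agent, and each agent can be pointed at a different model.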
Demetrios: Well, we had a conversation with folks at iFood, which is a food delivery app in Brazil, and they were talking about how they've hit this point where they have so many agents in play that they're trying to figure out [00:34:00] shared memory, or shared capabilities, between these different specialized agents.
Demetrios: A lot of times a user will talk to the food delivery agent about problems with their order, but that agent has been specialized in serving up the best food recommendations for you. As a user, you don't really think, "Oh, I want to go to this specific customer service agent to file my complaint. I just wanna go directly to the app and type my complaint into whatever text field there is."
Demetrios: Then on the back end, you've gotta figure out how to make that work, right? That's the user experience that feels the most normal to all of us, because we're lazy and we don't wanna click through things. But that means that for the actual product, [00:35:00] you have to have this knowledge: maybe the food delivery agent needs to know that your last experience was kind of bad, so it can offer you a discount this time, or do something else. You need that shared intelligence and shared memory across agents.
James Everingham: Yeah, for sure. And there are two levels of that, actually. One is how you safely give agents access to tools or databases from which they can retrieve data about your business or your customer and make decisions on it. And then there's in-memory context, right? Like, okay, I want to hand this session off to another agent, and it needs to know what context this one was operating in.
James Everingham: In the example you used, maybe there's one generic agent that they're trying to reuse in different flows, and [00:36:00] it doesn't really understand the context, right? Another reason you need a centralized platform like Guild is that you can take that same agent, and an engineer would go, "Oh, I see the problem."
James Everingham: They can just fork it or build on it. They can set up a different context in that workspace and say, "Here's the context. It actually inherits from this one." Context often disappears when an agent and its runtime disappear, so you might need long-running runtimes that stick around until a whole customer issue is closed, so they don't throw away that context.
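The "it inherits from this one" idea maps naturally onto layered lookups. A minimal sketch, assuming context is a flat key-value map (the keys below are hypothetical), uses Python's `collections.ChainMap` so a forked workspace overrides only what differs, plus a JSON snapshot so the context can outlive the runtime that produced it:

```python
import json
from collections import ChainMap

# Hypothetical context the original recommendation agent ran with.
base = {"customer_id": "c-123", "tone": "friendly", "task": "recommend food"}

# A forked workspace "inherits from this one": lookups fall through to `base`,
# while overrides and new keys live only in the child layer.
support = ChainMap({"task": "handle order complaint"}, base)
support["last_order_rating"] = 1  # writes always go to the first mapping

print(support["customer_id"], support["task"])  # c-123 handle order complaint
print(base["task"])                             # recommend food (parent untouched)

# Persist the flattened view so the context survives the runtime that built it.
snapshot = json.dumps(dict(support), sort_keys=True)
restored = json.loads(snapshot)
```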
James Everingham: We're all figuring this out. These are great times. While we are finding new problems in AI and agentic systems, they're actually not so different from problems we've solved before. The biggest issue right now is how we will standardize this.
James Everingham: [00:37:00] I'm not worried about how I would transfer context from one agent to another agent in my own company. I'm worried about this: if I pull in infrastructure or agents from another company, how do I transfer context to them so they can do their job well?
Demetrios: What are things that you think about in that use case?
Demetrios: Because I know there's obviously MCP, and you think, oh, well, we'll hit their MCP server, and that's how we get the context we need.
James Everingham: Yeah, I think that's one way, although we're finding that MCP needs to be extended to have better input and output for these types of use cases. Maybe it will be, or maybe another standard will come up.
James Everingham: But I think one way is that you need to agree with your customers on what well-defined input and output is, right? [00:38:00] Here's what to expect. Here's how we transfer it. Maybe that ends up in some new standards around context. I can imagine a clearly defined markup language for transferring context between agents.
James Everingham: So I think we just need to agree on what the protocol, the standard, is, and if we can do that... you saw what MCP did. Once the MCP standard was out there, suddenly a lot more tools were useful to AI, because people could build against it. But it's not gonna stop there. You need much richer context standards to enable this for an industry and for people to start productizing agents.
Demetrios: And when you say richer context, are you saying, like, being able to send voice notes in context or photos and that type of thing? Or-
James Everingham: Well, context [00:39:00] to me is probably a lot of metadata. I'm not going to make any decisions about what the actual data is, but I want to make an agreement with my customers on what the structure of that data is.
James Everingham: So I might want to say, "This field is all customer data. This field is all business data. Here's call history." How do you define that in a structured way, so that another system can take it and translate it into something usable?
James Everingham: So it's the format, more than the actual data itself, that you want to agree on.
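One way to agree on structure rather than content is a versioned envelope. The sketch below is an assumption, not an existing standard: the field names (`customer_data`, `business_data`, `call_history`) mirror James's example, and `SCHEMA_VERSION` is a hypothetical tag both sides would check before trusting a payload.

```python
import json
from dataclasses import asdict, dataclass, field

SCHEMA_VERSION = "0.1"  # hypothetical version tag agreed on by both parties


@dataclass
class Call:
    timestamp: str
    summary: str


@dataclass
class ContextEnvelope:
    """Hypothetical agreed structure: the parties fix the fields, not their contents."""

    customer_data: dict
    business_data: dict
    call_history: list[Call] = field(default_factory=list)
    schema_version: str = SCHEMA_VERSION

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)  # nested Calls become dicts

    @classmethod
    def from_json(cls, raw: str) -> "ContextEnvelope":
        data = json.loads(raw)
        # Reject anything that does not match the agreed shape.
        if data.get("schema_version") != SCHEMA_VERSION:
            raise ValueError(f"unsupported schema_version: {data.get('schema_version')!r}")
        missing = {"customer_data", "business_data"} - set(data)
        if missing:
            raise ValueError(f"missing fields: {sorted(missing)}")
        calls = [Call(**c) for c in data.get("call_history", [])]
        return cls(data["customer_data"], data["business_data"], calls, data["schema_version"])


env = ContextEnvelope(
    customer_data={"customer_id": "c-123"},
    business_data={"plan": "premium"},
    call_history=[Call("2024-05-01T14:00:00Z", "late delivery complaint")],
)
assert ContextEnvelope.from_json(env.to_json()) == env  # round-trips intact
```

The transport here happens to be JSON; the point of the sketch is that the receiver validates the agreed structure before acting on the data.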
Demetrios: Yeah. Because JSON isn't good enough.
James Everingham: Maybe it is. I don't know.
Demetrios: Maybe, maybe not. Yeah, and I like the idea that customers [00:40:00] can be internal stakeholders or external stakeholders.
James Everingham: Yeah, sure, and they can even be... At Meta, in the developer infrastructure teams, we identified a third customer, which was our servers.
James Everingham: It's like, oh, the servers are consuming our technology, so what do they need? Well, they need CPU efficiency. They need memory efficiency. So you also need to think about your infrastructure itself as a customer, and now AI is turning up the dial on that. You even need to think about your AI agents as customers: what do they need?
Demetrios: Yeah, and that goes back to the whole idea of how their workloads are different, and-
James Everingham: Their workloads are different. They need to be managed differently. They need different mechanisms for transferring knowledge, for orchestrating... You need to be able to observe them differently. [00:41:00] You need session history.
James Everingham: What does session history for an agent mean? You need rich logs. Especially in a regulated company, I need to be able to tell: what did this agent do at 2:00 PM three years ago? How are you gonna answer that question? You need this type of technology in order to make it work.
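Answering "what did this agent do at 2:00 PM three years ago?" needs an append-only record keyed by agent and time. A minimal in-memory sketch, with hypothetical agent and action names; a production system would need durable, tamper-evident storage rather than a Python list:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AuditEntry:
    at: datetime
    agent: str
    action: str
    detail: str


class AuditLog:
    """Append-only in-memory log; entries are never edited or removed."""

    def __init__(self) -> None:
        self._entries: list[AuditEntry] = []

    def record(self, entry: AuditEntry) -> None:
        self._entries.append(entry)

    def what_did(self, agent: str, around: datetime, window: timedelta) -> list[AuditEntry]:
        """Everything `agent` did within +/- `window` of `around`."""
        lo, hi = around - window, around + window
        return [e for e in self._entries if e.agent == agent and lo <= e.at <= hi]


log = AuditLog()
t = datetime(2022, 3, 1, 14, 2, tzinfo=timezone.utc)
log.record(AuditEntry(t, "refund-agent", "db.read", "orders for customer c-123"))
log.record(AuditEntry(t + timedelta(hours=3), "refund-agent", "api.call", "issue refund"))

two_pm = datetime(2022, 3, 1, 14, 0, tzinfo=timezone.utc)
hits = log.what_did("refund-agent", two_pm, timedelta(minutes=30))
print([e.action for e in hits])  # ['db.read']
```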
Demetrios: Yeah, I've also seen a few patterns where folks are committing their chat history with their PRs so that you can audit that too.
James Everingham: Yeah. And if you take that to an extreme, which we did at Meta, it gets back to why source control needs to be different: you have a combination of humans and AI authoring code together.
James Everingham: So when that PR gets submitted, how do you know what code was written by a human and what was written by [00:42:00] an LLM, or even influenced by an LLM? That requires more than the line-level tracking most version control systems do. They do a diff between lines. You need character-level tracking.
James Everingham: I need to be able to have provenance, to look back and see where each character came from, in order to measure this effectively and to train your models to get better at it. You need to know what they did right, what they did wrong, and what the human did. So all of this stuff needs to be reinvented right now.
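Character-level provenance can be sketched by tagging every character with its author, so edits keep attribution that a line-based diff would lose. This is a toy illustration under that assumption, not the system Meta built; the `AttributedText` class and the author labels are hypothetical.

```python
from collections import Counter


class AttributedText:
    """Toy buffer that records which author produced each character."""

    def __init__(self) -> None:
        self.chars: list[tuple[str, str]] = []  # (character, author) pairs

    def insert(self, pos: int, text: str, author: str) -> None:
        self.chars[pos:pos] = [(c, author) for c in text]

    def delete(self, start: int, end: int) -> None:
        del self.chars[start:end]

    def text(self) -> str:
        return "".join(c for c, _ in self.chars)

    def authorship(self) -> Counter:
        """Number of surviving characters per author."""
        return Counter(author for _, author in self.chars)


doc = AttributedText()
doc.insert(0, "return x\n", "human")
doc.insert(8, " + 1", "llm")   # the LLM edits inside the human's line
print(repr(doc.text()))        # 'return x + 1\n'
print(doc.authorship())        # Counter({'human': 9, 'llm': 4})
```

A line-level diff would mark the whole `return` line as changed; here the four characters the LLM added stay attributed to it.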
Demetrios: What is the most interesting thing on your mind these days? Yeah, what gets you the most excited?
James Everingham: Sure. Well, one of the things I'm incredibly excited about right now is that I feel we're on the edge of this agentic technology really exploding, right? Our vision is that a couple of years [00:43:00] from now, if not sooner, inside an enterprise where you traditionally had a bunch of applications, web apps, and internal systems that you operate your business with, those are gonna get pushed down a bit, and there are going to be thousands of agents optimizing workflows, everything from the software development life cycle all the way up to HR and legal, right?
James Everingham: And I think one of the things that is going to be really cool is when we start seeing shareable agents in public. If you pattern-match to the past, that's how these technologies have really taken off. Think of the App Store with Apple: when people could start seeing what was useful and start building on it, suddenly that ecosystem exploded.
James Everingham: So I'm really excited about that happening with agents. I do think agents, to some degree, are the new applications, [00:44:00] and once the right centralized place comes about, where people can share and build on these in a wide public arena, I'm super excited about that. That's what our Agent Hub is, and...
James Everingham: I couldn't be more excited to watch that evolve over time.
Demetrios: So I've heard about this marketplace for agents a few times. Actually, my friend Dan Jeffries, back when ChatGPT first came out, that was his big thing, and he went and started a company around it. Later he came back to me and said the problem he found was that everybody's individual workflows were so unique and so custom that you couldn't, at that time at least, just put an agent up.
Demetrios: What he was doing back in those days was this newsletter agent: it would summarize Hacker News and [00:45:00] Reddit, create a newsletter, and then send it out to his followers. That was his way of doing it, because he used a certain newsletter distribution platform.
Demetrios: He would also scour these subreddits, and blah, blah, blah. But nobody wanted to replicate that exactly. So how do you see that happening, especially inside the enterprise, right? You're going to have lots of very bespoke agents that are just one or two dials off from what I want. Maybe it's gonna be so good that you can just prompt it and say, "I want this agent," or you clone the agent and say, "Except for that last part, change it to XYZ."
Demetrios: I could see that happening maybe.
James Everingham: Yeah. Well, I have two answers to that. One, I see it the same way, and that's one of the things we [00:46:00] also saw inside Meta and that we're providing in Agent Hub: here are agents that are working, here's what they're doing. You can fork them, copy them, reuse them, pull them privately into your own infrastructure, rebuild them, and republish them.
James Everingham: But also, going back to the point that you want to break down your agents and think of them more like Legos than one completed piece of architecture: with the example you used, I would think about breaking it up into five or six agents, each with a different capability. How are those composed?
James Everingham: What's the orchestration? This ends up looking a lot more like distributed software systems and microservices than one large thing. So I think what we'll see is not just agents. Even the Agent Hub we're pushing out has this: you want to look at a template or a [00:47:00] project of how 10 agents are set up, the integrations they need, how they're all attached, and how they work together.
James Everingham: And then you may just want to replace one agent, give one agent slightly different context, or completely rewrite several of them. I was a lazy programmer. I would steal open source and learn from it, and by the time I got my task completed, I had either learned enough to rewrite it or to augment the code I needed to get it done.
James Everingham: I don't think that pattern is going away anytime soon, and I think that's what will allow these more complicated things to happen: decomposition, composable workflows built from small, specialized, and generic agents, just like we do in regular software.
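Forking a shared multi-agent template and swapping one piece might look like this. The `WorkflowTemplate` and `AgentSpec` types, the agent names, and the model labels are hypothetical; the sketch only shows the wiring being reused while one agent is overridden, using `dataclasses.replace`.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AgentSpec:
    capability: str
    model: str


@dataclass(frozen=True)
class WorkflowTemplate:
    """Hypothetical shareable template: named agents plus how they are wired."""

    agents: dict[str, AgentSpec]
    edges: tuple[tuple[str, str], ...]  # (upstream, downstream) pairs

    def fork(self, **overrides: AgentSpec) -> "WorkflowTemplate":
        # Reuse the wiring; swap in only the agents named in `overrides`.
        unknown = set(overrides) - set(self.agents)
        if unknown:
            raise KeyError(f"no such agents: {sorted(unknown)}")
        return replace(self, agents={**self.agents, **overrides})


newsletter = WorkflowTemplate(
    agents={
        "fetch_hn": AgentSpec("summarize Hacker News", "model-a"),
        "fetch_reddit": AgentSpec("summarize subreddits", "model-a"),
        "compose": AgentSpec("draft newsletter", "model-b"),
        "send": AgentSpec("post to newsletter platform", "model-a"),
    },
    edges=(("fetch_hn", "compose"), ("fetch_reddit", "compose"), ("compose", "send")),
)

# "Except for that last part": keep everything, replace only the sender.
mine = newsletter.fork(send=AgentSpec("post to team Slack channel", "model-a"))
```

The original template is left untouched, so many teams can fork the same published workflow and diverge only where their needs differ.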
Demetrios: I [00:48:00] guess the only question I have on it is: why do you need a hub, and why aren't the agents just able to do that themselves?
James Everingham: You mean, why can't the agents rewrite themselves?
Demetrios: Yeah, exactly. Why wouldn't an agent be able to compose what it needs on the fly, and then discard it once it no longer needs it?
Demetrios: I understand the idea of, all right, if we made something and it works, let's keep it because it works, so you don't depend on something that may or may not work when the agent creates it on its own. I guess that would be my own answer to my question.
James Everingham: Well, I think we may get there, but there are a couple of things standing in our way.
James Everingham: One is [00:49:00] hallucination and non-determinism. Right now these things need to be set up by humans who are exercising judgment, and judgment is not one of the strengths of an LLM. They're very goal-oriented. They will do whatever it takes to achieve their task, be it right or wrong.
James Everingham: So you still need that. Agents writing agents is actually a really cool area, and our platform actually does that. We're not really an agent creation system, we're an agent management system, but when you're setting up one of these workflows, we even have an agent builder agent: if you look for something to achieve a task, it says, "Hey, we don't have that agent. Do you want me to build it for you?"
James Everingham: And it will work with you to build it. So that's just one step away from what you [00:50:00] said.
Demetrios: Yeah.
James Everingham: But if you think about it, you still need evals. You need to set them up to guarantee that the flow is working as expected.
James Everingham: So this is probably the future of engineering: capability architecture, evals, applying judgment and taste to all of these flows. It's higher-value work, and if we're all successful at it, it'll create even more of it. The nature of this work is changing in that direction.
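A minimal eval harness, assuming each case pairs a representative task with a task-specific pass/fail check. The stub agent, the cases, and the 0.9 threshold below are hypothetical placeholders:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    prompt: str
    check: Callable[[str], bool]  # task-specific judgment of the agent's output


def run_eval(agent: Callable[[str], str], cases: list[Case], threshold: float) -> tuple[float, bool]:
    """Return the pass rate and whether it clears the bar for shipping the flow."""
    if not cases:
        raise ValueError("an eval needs at least one case")
    passed = sum(1 for case in cases if case.check(agent(case.prompt)))
    rate = passed / len(cases)
    return rate, rate >= threshold


def stub_agent(prompt: str) -> str:
    # Hypothetical agent under test; a real one would call a model.
    return prompt.upper()


cases = [
    Case("fix the null check", lambda out: "FIX" in out),
    Case("add a regression test", lambda out: out.endswith("TEST")),
    Case("update the docs", lambda out: "CHANGELOG" in out),
]
rate, ok = run_eval(stub_agent, cases, threshold=0.9)
print(f"{rate:.2f}", ok)  # 0.67 False
```

The checks are where the human judgment and taste go; the harness just makes that judgment repeatable every time an agent in the flow changes.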
Demetrios: And when do you see the need for a full-blown, quote-unquote, agent, whatever that means, versus, oh, I just need to write a skill or a few skills?
James Everingham: Yeah, I think there are a few things. When you say skill, I think of a model-dependent thing, like Claude Skills, right? If you want to maintain [00:51:00] flexibility, you probably don't want to build your architecture, your infrastructure, around one model right now, because, hey, what happens...
James Everingham: Or maybe here's another question. If you build around Anthropic, and then one of the other vendors comes out with a dramatically better model and you want to use it, what in your stack do you have to change?
Demetrios: Yeah.
James Everingham: So I think the right way to do that is to abstract that out, right?
James Everingham: Make it so you can think of a model, an LLM, as a runtime, a reasoning runtime, rather than the application itself. The problem with model-dependent skills is that you're getting locked into a specific vendor, and that may not be great over time.
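Treating the model as a swappable "reasoning runtime" can be sketched with a `typing.Protocol`. `VendorA` and `VendorB` are hypothetical adapters that simply echo their input; real ones would wrap a vendor SDK, whose calls are deliberately not shown here.

```python
from typing import Protocol


class ReasoningRuntime(Protocol):
    """The only thing application code may assume about a model."""

    def complete(self, prompt: str) -> str: ...


class VendorA:
    # Hypothetical adapter; a real one would wrap vendor A's SDK here.
    def complete(self, prompt: str) -> str:
        return f"A:{prompt}"


class VendorB:
    # Hypothetical adapter for a newer, better model from another vendor.
    def complete(self, prompt: str) -> str:
        return f"B:{prompt}"


class TriageAgent:
    """Application logic depends on the interface, never on a vendor."""

    def __init__(self, runtime: ReasoningRuntime) -> None:
        self.runtime = runtime

    def triage(self, ticket: str) -> str:
        return self.runtime.complete(f"Classify this ticket: {ticket}")


# Switching vendors is a one-line configuration change, not a rewrite.
RUNTIMES: dict[str, ReasoningRuntime] = {"a": VendorA(), "b": VendorB()}
agent = TriageAgent(RUNTIMES["b"])
print(agent.triage("order arrived cold"))  # B:Classify this ticket: order arrived cold
```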
Demetrios: Would you then advise [00:52:00] folks to create their own harnesses also? Because it's kind of like that same argument.
James Everingham: I would not. Provided we have the same definition in our heads of what a harness is, that's literally what we're building. It's unbelievably complicated to do.
Demetrios: Mm-hmm.
James Everingham: That's what everyone is doing right now, though. Some people are vibe coding their own governance layers and centralized layers, and this is a hard area of computer science. With Guild, basically, when you run agents, they're running in a container, and they cannot reach out.
James Everingham: They cannot escape that container. They go through a proxied infrastructure layer, and if an agent accesses data or tries to execute something, the layer checks a policy. It's super important to be able to have that level of control.
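A default-deny policy check of the kind a proxy layer might run before each data access or execution. This is an illustrative sketch, not Guild's policy engine: the `Rule` fields, glob matching via `fnmatch.fnmatchcase`, and the agent and resource names are assumptions.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Rule:
    agent: str     # glob over agent names, e.g. "log-*"
    action: str    # exact action, e.g. "read" or "execute"
    resource: str  # glob over resource identifiers


class PolicyProxy:
    """Every data access or execution request goes through check(); default is deny."""

    def __init__(self, allow: list[Rule]) -> None:
        self.allow = allow
        self.decisions: list[tuple[str, str, str, bool]] = []  # audit trail

    def check(self, agent: str, action: str, resource: str) -> bool:
        ok = any(
            fnmatchcase(agent, r.agent) and r.action == action and fnmatchcase(resource, r.resource)
            for r in self.allow
        )
        self.decisions.append((agent, action, resource, ok))
        return ok


proxy = PolicyProxy([Rule("log-*", "read", "logs/*"), Rule("coder", "execute", "tests/*")])
print(proxy.check("log-monitor", "read", "logs/app.log"))     # True
print(proxy.check("log-monitor", "execute", "tests/run.sh"))  # False: wrong agent for execute
print(proxy.check("coder", "read", "db/customers"))           # False: no rule, default deny
```

Recording every decision, allowed or not, also feeds the kind of audit trail discussed earlier in the conversation.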
Demetrios: Yeah. [00:53:00] Actually, now that you say that, another tangent: I think Meta came out with a SWE-bench type of thing that was very much along these lines, right? Where the agent can't get out, and you just want to check its ability inside the container.
James Everingham: Well, with SWE-bench, we built a lot of that internally in my team; we called it Meta SWE-bench. There is a public framework called SWE-bench. When Devin came out, they had these 3,500 GitHub issues that they knew the answers for, and they'd run Devin over them and say, "Oh, we can solve 70-some percent of those," and then that'd be...
James Everingham: With SWE-bench, you can now publicly see the models and which ones are doing better. The problem with those things is that a lot of the model vendors are optimizing just for SWE-bench. [00:54:00] When you're building an eval, what you need is something that's representative of the tasks you're trying to achieve.
James Everingham: And we built Meta SWE-bench because the problems we had internally were very different from what you would see in public. The level of scale is sort of unimaginable. Having worked there for 10 years on this stuff, my head still explodes when I see the sheer size of the infrastructure this stuff is running on, right?
James Everingham: And you have very complicated software, right? We had one process called dub-dub-dub (www) that was the main runtime for most Meta software: a 500-megabyte binary with 4,000 threads. It's running all of our bespoke languages and everything, and if you go and change one line of code, it can free up 10,000 servers.
James Everingham: It's that level. You can't [00:55:00] go vibe coding in that kernel. That is just not going to end well. You just look at that thing funny, and it kind of seizes up. So you have to be really careful about where you apply this stuff. And once again, that's why these guardrails and this knowledge are important.
James Everingham: Look at LLMs not as "who can I replace with this?" Look at them as a strength. What is that strength, and what are the things inside my company I can throw at it? It's really good at taking a lot of data and surfacing insights I might not have seen, so throw a bunch of data at it and ask it questions.
James Everingham: You know, things like that.
