MLOps Community

When Agents Hire Their Own Team: Inside Hypermode’s Concierge // Ryan Fox-Tyler // Agents in Production 2025

Posted Aug 06, 2025 | Views 18
# Agents in Production
# Agents hiring teams
# Hypermode

SPEAKER

Ryan Fox-Tyler
Co-founder and SVP Product/Engineering @ Hypermode

Ryan Fox-Tyler has been building Hypermode since 2023 and is currently the Senior Vice President of Products and Engineering. Prior to this, Ryan held various leadership roles at Astronomer and Manulife/John Hancock, where he successfully led teams in the development and launch of innovative customer- and partner-facing experiences. With a background in Computer Engineering and an MBA from Harvard Business School, Ryan has a proven track record of driving business agility and implementing technology-powered growth strategies.


SUMMARY

What happens when you empower AI agents to design, configure, and deploy other agents? At Hypermode, we put this question to the test by developing Concierge: an agent that acts as both architect and orchestrator, assembling custom agent workflows on demand. In this session, I'll share the technical journey behind building Concierge, our "agent that builds agents," and how it's reshaping the way teams approach automation and task completion. Key topics will include:

- The architecture and design patterns enabling agent creation
- How Concierge leverages natural language and user intent to assemble tailored agent teams
- Real-world challenges: managing reliability, evaluation, and guardrails when agents are in charge
- Lessons learned from deploying agent-built agents in production environments
- The future of agentic systems: towards self-improving, self-deploying AI teams


TRANSCRIPT

Ryan Fox-Tyler [00:00:00]: Great to meet everyone. Like Skylar said, I know it's the end of the day so we'll try to keep this fun and interesting. My name is Ryan Fox-Tyler, co-founder and head of products and engineering at Hypermode. I previously spent three years at the data orchestration company Astronomer, and before that a decade at a trillion-dollar asset manager on the dev platform. So I've seen dev platforms from a lot of angles and through a lot of transitions, and I'm excited by what we're building with Hypermode, but really more excited to share some of the lessons we've learned through that. I like to start, as we think about how people build agents, what that actually means in today's ecosystem, and where people are spending their energy, with the peanut butter and jelly challenge. If you've not heard it before, it's this idea that you ask someone to take out a piece of paper and write down how to make a peanut butter and jelly sandwich.

Ryan Fox-Tyler [00:00:56]: And then oftentimes it's with kids, and you reenact exactly what they've written. You'll skip all the steps that they've skipped in their writing. But I think this is true in how we describe a lot of things. It's a lot easier to describe and teach someone incrementally how to do something than it is to take out a blank sheet of paper and remember every step along the way, whether that's pulling the bread out of the package or where exactly to put the peanut butter and the jelly. These things are hard to remember up front. Humans just aren't naturally good at that, but through a conversation you can enunciate it really well. We think about that a lot in how we built out Hypermode and the agent construction process and where we think the energy should be. And then there's a set of lessons that we've learned through that at a technical level to support what it means when an agent can be constructed for nearly $0.

Ryan Fox-Tyler [00:01:41]: And so we look at really this kind of loop of defining and creating agents, right? You first start by defining what their scope is. You connect them to the tools that they need to do their job. You train them on how to do specific pieces. And ideally you get in this loop with self-learning where it just keeps getting better over time. But I think a lot of people spend too much energy at the top part of that and not enough energy at the bottom part, because they're still spending so much time hand-rolling each one of these pieces. What we've learned in all the work that we've done with a lot of customers is that people really should be spending less time at the outset on define and a lot more time at the training level.

Ryan Fox-Tyler [00:02:18]: That's how they're going to know what works, what doesn't, what's possible with agents today and what's frankly just not ready yet. The space is moving very quickly; what's possible in the market and the technology changes every day. And so we fundamentally believe that this should all happen in natural language.

Ryan Fox-Tyler [00:02:34]: And that is why we built Concierge, with this ability to actually create another agent through an agent. Some interesting technical patterns emerge from that. But first, it can be hard to grasp what this looks like, so let's actually take an example and start to look at what this is. Let me switch over to Hypermode. This is our agent building experience. We think it operates best in natural language. You can eject to code at any point if you decide you want to.

Ryan Fox-Tyler [00:03:02]: But it really starts here. An agent, as we know, is something that can take action. I can go to our sidekick, our personal assistant that has access to my calendar, and say, can you schedule a meeting for 5pm today? Call it happy hour. It will go take on that work. I don't have to give it any additional instruction. General tasks it knows how to do, and it can take action on my behalf. And so here we'll see it's creating that Google Calendar event, and that happy hour meeting is now going to be available on my calendar.
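
To make that "agent that takes action" step concrete, here is a minimal sketch of the kind of tool-calling loop being described; `call_model` and `create_calendar_event` are hypothetical stand-ins, not Hypermode's actual API.

```python
# Minimal sketch of an agent turning a chat request into a tool call.
# `call_model` and `create_calendar_event` are hypothetical stand-ins,
# not Hypermode's actual API.
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolCall:
    name: str
    args: dict


def call_model(messages: list, tools: dict) -> ToolCall:
    """Pretend model step: decide which tool to invoke for the request."""
    # A real implementation would send `messages` plus tool schemas to an LLM.
    return ToolCall(name="create_calendar_event",
                    args={"title": "Happy hour", "start": "17:00"})


def create_calendar_event(title: str, start: str) -> str:
    # Stand-in for the Google Calendar connection the sidekick agent uses.
    return f"Created '{title}' at {start}"


def run_agent(user_message: str) -> str:
    tools: dict[str, Callable] = {"create_calendar_event": create_calendar_event}
    decision = call_model([{"role": "user", "content": user_message}], tools)
    return tools[decision.name](**decision.args)


print(run_agent("Can you schedule a meeting for 5pm today? Call it happy hour."))
```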

Ryan Fox-Tyler [00:03:39]: It's really that simple. We're always surprised at how many people haven't even seen an agent that takes action like that. So we try to start people there. But the more fun part is when we actually go to our Concierge and create a new agent. Let's do something a little incremental, just to show what this looks like. Let's create a product marketer for Hypermode. And I'm not going to give a lot of information, right? This is not the peanut butter instructions where I need to go enunciate all the pieces. It's going to do the work, right? I'm going to ask the agent to come back to me and say what information it needs to write the system prompt for my agent.

Ryan Fox-Tyler [00:04:15]: It's going to ask about whether it represents a company. I'll give it hypermode.com. It's going to go out and do research about that company, right? Learn about Hypermode, understand that, pull it into the context. It's going to do a better job of enunciating what Hypermode is, what its products are, and where it sits in the market than I would just coming from scratch. We will call it Launch Spark.

Ryan Fox-Tyler [00:04:41]: And.

Ryan Fox-Tyler [00:04:42]: It decided it had enough information and so it went off and created that.

Ryan Fox-Tyler [00:04:45]: I could have spent more time working with this and giving it more information, but it decided it was enough.

Ryan Fox-Tyler [00:04:50]: And so here we go. Launch Spark has been created and it should be popping up in just a second. Here's Launch Spark, and it's as simple as that. We have our first agent. We can see here that it's written its system instructions for us. And if I scroll down, it picked GPT-4.1; I could swap it out for many of the other models that are available, whether it's Gemini or Anthropic's models. I've got the choice within the platform. I can give it connections.
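
As a rough mental model of what Concierge just assembled, with illustrative field names rather than Hypermode's actual schema:

```python
# Rough mental model of the agent Concierge just assembled.
# Field names are illustrative, not Hypermode's actual schema.
from dataclasses import dataclass, field


@dataclass
class AgentSpec:
    name: str
    model: str                        # swappable: GPT-4.1, Gemini, Claude, ...
    system_instructions: str          # written by Concierge from its research
    connections: list = field(default_factory=list)  # added at the connect step


launch_spark = AgentSpec(
    name="Launch Spark",
    model="gpt-4.1",
    system_instructions=(
        "You are a product marketer for Hypermode. You know the product line "
        "and where it sits in the market."
    ),
)
print(launch_spark)
```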

Ryan Fox-Tyler [00:05:15]: Step one was define. We went through that really quickly. Step two is connect. I want to give my product marketer access to GitHub and to our product docs with Ref. It's as easy as that. I can start chatting with our product marketer and say, mic check, what tools do you have access to? And here Launch Spark is going to inventory those MCP servers under the hood. It's going to connect to those and pull out those tools. Here we go.

Ryan Fox-Tyler [00:05:42]: It's got the built-in tools as well as GitHub and our technical documentation with Ref.
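
A sketch of that inventory step, with a hypothetical `McpClient` standing in for a real MCP client: ask each connected server for its tools and aggregate them.

```python
# Sketch of the "what tools do you have?" inventory: ask each connected MCP
# server for its tool list and aggregate. `McpClient` is a hypothetical
# stand-in; a real client would speak the MCP protocol to the server.
from dataclasses import dataclass


@dataclass
class Tool:
    server: str
    name: str
    description: str


class McpClient:
    def __init__(self, server: str, tools: list):
        self.server, self._tools = server, tools

    def list_tools(self) -> list:
        return self._tools


def inventory(clients: list) -> list:
    return [tool for client in clients for tool in client.list_tools()]


clients = [
    McpClient("github", [Tool("github", "search_code", "Search a repository"),
                         Tool("github", "create_pull_request", "Open a PR")]),
    McpClient("ref", [Tool("ref", "search_docs", "Search technical docs")]),
]
for t in inventory(clients):
    print(f"{t.server}: {t.name} - {t.description}")
```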

Ryan Fox-Tyler [00:05:47]: Great.

Ryan Fox-Tyler [00:05:48]: We've gotten through that connect step, right? Like we talked about, we really want to get through those first few steps as quickly as possible. Now we go to train.

Ryan Fox-Tyler [00:05:55]: So let's teach it how to do something. Let's update the headline for hypermode.com. Can you fetch the current version? It's gonna go out, right? I can say, in one context, I'm gonna teach it how to do these things. I'm not upset when it doesn't get it right the first time. That's okay, right? It's a little confused, right? There are a couple of different CTAs and headlines on here, so I'll give it some direction. I'm talking about the 'meet your team' one. Can you propose three new options that are more fun?

Ryan Fox-Tyler [00:06:36]: This is the classic generation part of the journey. This is normal, right? And so I say, cool, let's go with number three. Can you open a pull request? I give it some nudge: can you find the current headline in the code base and open a pull request to make the change? I don't need to teach it how to make a pull request. It knows that in its general knowledge, but it may run into some issues. It may have trouble looking for that current headline. Look. Yep. I can dive into the search.

Ryan Fox-Tyler [00:07:09]: I can see it found 134,000 results. It got a little overwhelmed with just that phrase. I can say, look in the Hypermode Inc. hypermode repo. Again, back to these instructions. My goal is just to get it to do the task once, and then we will take this and turn it into a repeatable thing that my agent will have learned how to do. Looks like it's struggling, which is always fun. These systems always fail in slightly different ways.

Ryan Fox-Tyler [00:07:36]: It's just struggling with constructing this query a bit. Why don't we have it take a step back and look for it? And much like we wouldn't be upset with an intern that struggles, we don't get upset when the agent struggles here.

Ryan Fox-Tyler [00:07:57]: We know that when we train it how to do this once, it's going to be able to remember this.

Ryan Fox-Tyler [00:08:01]: So there we go. It found the search. It pulled the file contents. It's creating a branch. It's working on this next step. There we go. It's creating the file and it's going to open a pull request on my behalf. It is taking action on my behalf.

Ryan Fox-Tyler [00:08:16]: That's great. I could be someone that's less technical who doesn't know where this is, but here I just opened a pull request that made the change to that headline, operating as me using OAuth. All this is great. The real fun part, though, is when I hit this create task button. What it's going to do is take everything it's learned, take all those steps, and turn that into a repeatable prompt.

Ryan Fox-Tyler [00:08:37]: And we think this is really that step of training our agents. It's very different than training a model. This is training agents like humans. These are standard operating procedures that we are teaching it how to do.

Ryan Fox-Tyler [00:08:48]: And so you'll notice that where it had struggled finding the file, it said, nope, it's actually right here. This is the object that I'm going to update. These are the other pieces. I hit Save task. And now when I start a new session here, my agent knows how to do that.
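
One way to picture that "create task" step, using an illustrative structure rather than the product's internal format: the corrected trajectory is distilled into a reusable set of steps plus the lessons from where it struggled.

```python
# Sketch of distilling a corrected chat trajectory into a repeatable task
# (a standard operating procedure). The structure is illustrative, not
# Hypermode's internal format.
from dataclasses import dataclass


@dataclass
class SavedTask:
    name: str
    steps: list     # what worked, including the corrections made while training
    notes: list     # lessons from where the agent struggled


update_headline = SavedTask(
    name="Update homepage headline",
    steps=[
        "Fetch the current 'meet your team' headline from hypermode.com",
        "Search the hypermode repo (not all of GitHub) for that string",
        "Edit the file containing the headline on a new branch",
        "Open a pull request with the proposed copy",
    ],
    notes=["The headline's file was found during training; skip the global search."],
)

# On the next run, these steps are injected into the agent's prompt so it
# doesn't repeat the trial and error from the training session.
print("\n".join(update_headline.steps))
```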

Ryan Fox-Tyler [00:09:02]: So that's really the notion of getting into that loop as quickly as possible and training those agents. Let's jump back over here. So there are other patterns to this, right? It may feel like a little bit of a toy, okay, I can let anyone create an agent in a few minutes, but there are other patterns for this dynamic creation of agents that you are going to see in real use today and emerging in new products. The first is about spawning new agent instances. This isn't creating a new agent identity; this is creating a new agent context to work on a subpart of the problem.

Ryan Fox-Tyler [00:09:32]: So in this case we're working with a company on a fraud detection agent. They may subdivide the problem to say, okay, agent A is going to start to investigate one dimension of fraud and agent B another dimension. Maybe agent C is going to be looking at the time it's taking to resolve this, looking at kind of the SLOs that they're required to operate on, and it's all going to roll back up. By maintaining those separate agent instances and that dynamic nature of it, you're able to actually constrain context, to better manage the scalability of this system and to parallelize the work in a way that a single model can't.
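
A sketch of that spawning pattern, with `run_agent_instance` as a stand-in for real model calls; each instance gets its own constrained context and the findings roll back up.

```python
# Sketch of the "spawn sub-agent instances" pattern: one fraud-detection agent
# identity, several instances with their own constrained context, run in
# parallel and rolled back up. `run_agent_instance` stands in for model calls.
import asyncio


async def run_agent_instance(name: str, context: dict) -> dict:
    # Each instance sees only its slice of the problem (constrained context).
    await asyncio.sleep(0.1)  # stand-in for tool calls / model inference
    return {"instance": name, "finding": f"checked {context['dimension']}"}


async def investigate(case_id: str) -> list:
    subtasks = [
        {"dimension": "transaction velocity"},
        {"dimension": "device fingerprint"},
        {"dimension": "SLO / time to resolution"},
    ]
    instances = [
        run_agent_instance(f"fraud-{case_id}-{i}", ctx)
        for i, ctx in enumerate(subtasks)
    ]
    # Parallelize the work, then roll the findings back up to the parent.
    return await asyncio.gather(*instances)


print(asyncio.run(investigate("case-42")))
```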

Ryan Fox-Tyler [00:10:04]: And so this becomes a very natural pattern for dynamic creation of agents. The other pattern we're starting to see is this idea of a conductor agent, not a conductor working across existing agents, but a conductor that's dynamically creating an agent team.

Ryan Fox-Tyler [00:10:18]: If you say, I want to create a blog post with all the editing of it, but also the creative for it, you could have a conductor that spins up four different agents with separate contexts so they don't get confused or biased based on the original context of any one message, but also with the ability to use separate models.

Ryan Fox-Tyler [00:10:36]: We may use o3 for the writer agent, Claude Opus, which is really good at editing, as the editor agent, and an image generation model as the creative agent.
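
A rough sketch of that conductor pattern, assuming a hypothetical `generate` helper in place of real model calls and illustrative model names:

```python
# Sketch of the conductor pattern: dynamically create a small agent team, each
# with its own context and model, then roll the outputs back up. Model names
# and the `generate` stub are illustrative.
import asyncio


async def generate(model: str, role: str, brief: str) -> str:
    await asyncio.sleep(0.05)  # stand-in for a real model call
    return f"[{role} via {model}] output for: {brief}"


async def conductor(brief: str) -> dict:
    team = {
        "writer": "o3",
        "editor": "claude-opus",
        "creative": "image-generation-model",
    }
    # Each teammate gets a fresh context so it isn't biased by the others.
    results = await asyncio.gather(
        *(generate(model, role, brief) for role, model in team.items())
    )
    return dict(zip(team.keys(), results))


print(asyncio.run(conductor("Blog post announcing the new agent runtime")))
```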

Ryan Fox-Tyler [00:10:45]: That ability to dynamically create this and roll it back up is really close to the things that we're doing, and we think it becomes a natural derivative of that early work we did to get Concierge out, to put it in people's hands to be able to easily create new agents. But that's only half the story.

Ryan Fox-Tyler [00:11:02]: Being able to create agents quickly creates a lot of other challenges.

Ryan Fox-Tyler [00:11:06]: If I think back to my days running a dev platform at a large financial institution, the goal really isn't to say how quickly can I spin something up; it's how do I manage and operate that. And what we found is the easier the path to production was, the more leeway we could give on that experimentation and that dynamic nature of it. What we did when we were starting out Hypermode is we built an actor-based runtime that scales this at a technical level. The actor system is not new. It's been around for a very long time as a computational model to isolate resources and to have defined interfaces between them. But the thing that's really interesting to me about what it does here is that last point, the ability for my specialized agent to create new actors. And that gets really cool. Where we're going with agents and their ability to spawn new sub-agents or new partner agents in their work is that kind of notion of this as an actor model, right? That runtime allows us to deliver a lot more features to our end users.

Ryan Fox-Tyler [00:12:07]: And so when we think about, okay, what are the benefits of that, why is this different than a library-based approach? That actor system creates a few really core things that are important for anyone, in my mind, that's thinking about scaling agents. One is that concurrency, that initial kind of just raw scale of: I have an agent that is doing great work, I want to scale it to 100,000-plus parallel agent instances. And you may be thinking, okay, that's a lot of parallel chats, maybe I don't have that many users. But this isn't just about chats, right? This is about agents that are taking action, that are processing new inputs, whether they're connected to an event-based trigger or a schedule, managing those tasks that we started training it on how to do. This is how we start to move to asynchronous work with our agents.

Ryan Fox-Tyler [00:12:54]: It also means that when something fails, we have fault isolation and recovery. It's a core tenet of any real cloud service that I think we've lost in a lot of this work, and kind of reinventing that around agents is saying, how do I make sure that one crash doesn't impact others? How do I have isolation in my architecture from day one, and not something that I build out later? The third piece is really this event-driven design. This is what unlocks not just that synchronous chat that we all know as kind of working with LLMs, but also that ability for asynchronous task execution, for things like queuing up where needed, for dependencies to be invoked. That becomes really a key part of a resilient system, and that's another element of the actor-based agent runtime here. And the last part is this clustered persistence, right? You want to make sure that you have persistence, especially for long-running tasks, but you don't want that overhead; you want to offload it into the cluster so that you have high availability.
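
Those properties are easier to see in a toy actor: each agent instance owns a mailbox, interaction happens only by message passing, and a crash while handling one message is isolated and recovered. This is a sketch of the general actor idea, not the Modus runtime.

```python
# Toy actor illustrating those properties: each agent instance owns a mailbox
# (message passing, no shared state), and a crash while handling one message
# is isolated and recovered without touching other actors.
import queue
import threading
import time


class Actor:
    def __init__(self, name: str, handler):
        self.name, self.handler = name, handler
        self.mailbox = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def send(self, message) -> None:
        self.mailbox.put(message)       # the only way to interact with an actor

    def _loop(self) -> None:
        while True:
            msg = self.mailbox.get()
            try:
                self.handler(self.name, msg)
            except Exception as err:    # fault isolation: recover and keep going
                print(f"{self.name} recovered from: {err}")


def handle(name: str, msg: str) -> None:
    if msg == "boom":
        raise RuntimeError("simulated crash")
    print(f"{name} processed {msg!r}")


a, b = Actor("agent-a", handle), Actor("agent-b", handle)
a.send("boom")     # crashes and recovers without affecting agent-b
b.send("hello")
time.sleep(0.5)    # let the daemon threads drain their mailboxes
```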

Ryan Fox-Tyler [00:13:53]: We know the cloud is not going to have stable VMs all the time, that they're going to rotate, and that you need to build for those failure cases; that is expected in cloud-native design. And so a key part of how we start to design agents for those underlying infrastructure primitives is making sure the runtime absorbs some of that risk. And if we double-click into one specific area of this, one of the things that gets me really excited about the actor model is this notion of passivation. What passivation means, fundamentally, is that when an actor is not doing work, it's going to release resources back to the cluster. This is really important because as much as we think about chat as a synchronous thing, it's actually a lot of small asynchronous events, and maybe there are some longer spurts where it's doing work for you or doing research, but there's actually a lot of idle time in between. And you don't want, at the application level, to have to start to worry about that idle time and manage that underlying infrastructure. What you really want is a system.

Ryan Fox-Tyler [00:14:50]: Instead of being always-on compute, you want it to have these serverless aspects without the overhead.

Ryan Fox-Tyler [00:14:56]: You could be serverless in between chats, but with 2 to 3 milliseconds of overhead as it scales back up to take the request and come back down. And that underlying thread state means that you're not starting from zero. You're not having to re-retrieve all this information or store it into a slow data store off to the side. You really think about that as part of that inherent actor: it has state that it keeps as it shuts down, and maybe it resumes on a different node.
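
A small sketch of the passivation idea, with an invented `PassivatingAgent` class and an in-memory dict standing in for the cluster's persistence layer:

```python
# Sketch of passivation: an idle actor persists its thread state and releases
# compute back to the cluster; the next message rehydrates it (possibly on a
# different node) instead of starting from zero. Names are invented, and an
# in-memory dict stands in for the cluster's persistence layer.
import time


class PassivatingAgent:
    def __init__(self, agent_id: str, store: dict, idle_timeout: float = 1.0):
        self.agent_id, self.store = agent_id, store
        self.idle_timeout = idle_timeout
        self.state = None               # None means passivated: no compute held
        self.last_active = 0.0

    def _activate(self) -> None:
        if self.state is None:
            # Rehydrate conversation/thread state from the cluster store.
            self.state = self.store.get(self.agent_id, {"messages": []})

    def handle(self, message: str) -> None:
        self._activate()
        self.state["messages"].append(message)
        self.last_active = time.monotonic()

    def maybe_passivate(self) -> None:
        idle = time.monotonic() - self.last_active
        if self.state is not None and idle > self.idle_timeout:
            self.store[self.agent_id] = self.state    # persist, then release
            self.state = None


cluster_store = {}
agent = PassivatingAgent("chat-123", cluster_store, idle_timeout=0.2)
agent.handle("schedule a happy hour")
time.sleep(0.3)
agent.maybe_passivate()               # idle: state persisted, resources freed
agent.handle("actually make it 6pm")  # rehydrates from the store
print(agent.state["messages"])
```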

Ryan Fox-Tyler [00:15:23]: These are common behaviors in the cloud, and it's able to maintain that state. These are all things that we had to think about when we said, okay, what if we weren't talking about agents in ones and twos, but thousands and hundreds of thousands of them? How would you start to think about that problem differently? Because that is the future that we're heading towards. And we're seeing very quickly that as we reduce the cost of creating an agent, the next question becomes: if one out of a hundred of them is successful, how do I make sure I scale that? So if I come back to this original loop, this is where we want to see this happen, right? We want to spend a little bit of time on define and connect. You can always go back and revise those. But we want our humans to be spending a lot more time on the train and the self-learn.

Ryan Fox-Tyler [00:16:07]: And that train also means that you want the subject matter experts doing that training. In the fraud scenario, you want the person that's been studying fraud for 15 or 20 years training that agent on how to do it, not a data scientist. The data scientist should be thinking about how do I get the right data in there, how do I understand different practices that are going to emerge. Much like MLOps over the years, right? MLOps is really focused on how do I build the platform for other experts to take some of these decisions and start to run some experiments. This is no different. If we want to think about the differentiation within agents, it's that train and self-learn aspect, right? It's how do you make sure that you're feeding it good context and making sure it makes the right decisions at the right time. It's not defining agents and spending weeks doing all the plumbing. And that's the thing that we learned most.

Ryan Fox-Tyler [00:16:55]: We hope that others kind of carry on with that, because we're really excited to see what happens with it. So the thing I'll leave you with as we look at it all: you really want to optimize the cost to experiment to be near zero and the path to production to be simple. Both have to be true. If you have one without the other, a lot of friction starts to emerge, whether you have a great idea that you can't get to production, or you have such a strong production story that you don't actually have a low cost to experiment. But when you really combine those two, I've seen over and over again in my career, with the people that I've worked with, I'm always surprised at what they're able to build. And so that's really the core learning for us at Hypermode as we built Concierge and built that actor-based agent runtime.

Ryan Fox-Tyler [00:17:39]: It is open source. It's a framework called Modus, and we encourage other people to use it as well. We think it's an important primitive and we're happy to share it with everyone else and collaborate on it. So feel free to reach out if this is an area of interest. We always love talking about these kinds of problems and how people think about lowering that cost of experimentation or increasing the scale of agents in production. So with that, I'll open it up if there are any questions.

Skylar Payne [00:18:06]: Awesome, thanks. This was, again, information dense. This was absolutely amazing. I feel like I have so many questions. I don't know that we have time for all of them. But folks, you can see on the screen there are some ways to reach out. Definitely take advantage of those. One of the things I was really curious about: in creating a new agent, I often think about all the context you would want to build that agent.

Skylar Payne [00:18:39]: And just curious, what are the different ways that Hypermode allows you to provide that context? I often have files, for instance, that have a bunch of ideas I've been working on. I'm just like, hey, sort through this and create something around that. Are there ways for me to, like, link my knowledge into that?

Ryan Fox-Tyler [00:18:55]: Yeah, for sure. Right. So one of the things we have at Hypermode, in addition to Modus, is the open-source graph database called Dgraph, which is commonly used for structuring that context.

Ryan Fox-Tyler [00:19:06]: So we work with really large-scale companies that are thinking about how do I scale this across my organization with context, or you have companies that are working with really small context. Both are valid.

Ryan Fox-Tyler [00:19:15]: You can do some really powerful things with just a little bit of context. And so we really focused on a tool-based injection of that. We think that is the way to have it live and fresh at all times. If you upload a file, that gets out of date really quickly. So we'd rather focus on how we pull that from its inherent source. That could be a note-taking app. It doesn't have to be these big enterprise systems. But yeah, that is how we think about the differentiation here.

Ryan Fox-Tyler [00:19:43]: It's like what context am I bringing to the story and how do I train it on things that I think are unique in the market. And so we try to make that really easy for people to get up and running in a few minutes.
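
A sketch of that tool-based context injection, with a hypothetical `fetch_notes` connector standing in for a live source:

```python
# Sketch of tool-based context injection: rather than uploading a file that
# goes stale, the agent calls a tool at question time to pull context from its
# source of record. `fetch_notes` is a hypothetical connector.
from datetime import datetime, timezone


def fetch_notes(topic: str) -> str:
    """Stand-in for a live source: a notes app, CRM, or graph database."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M")
    return f"(fresh notes about {topic}, fetched {stamp} UTC)"


def build_prompt(question: str, topic: str) -> str:
    context = fetch_notes(topic)       # pulled live, never a stale upload
    return f"Context:\n{context}\n\nQuestion: {question}"


print(build_prompt("What ideas have I been working on?", "agent launch"))
```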

Skylar Payne [00:19:53]: Totally. One thing, as you were going through the actor model and how you apply it, one thing that struck me is I feel like I've heard opposite takes, and so I'd love for you to just kind of give your thoughts on this. In particular, you discussed how each actor, and in this sense each agent, is independent and they only communicate with message passing, so you don't have this shared state amongst them. And at the same time, I feel like there are a lot of people in the industry talking about, hey, we should have agents that are constantly contributing to a shared state. So how do you think about that? Do you feel like these are different ideas? Do you feel like they're in opposition?

Ryan Fox-Tyler [00:20:38]: Yeah, I think it's really about separating these layers of context.

Ryan Fox-Tyler [00:20:42]: And so I think about state as, like, the most real-time context; if there's anything in it that matters, you're going to extract it pretty quickly.

Ryan Fox-Tyler [00:20:50]: And then you kind of have short-term and long-term memories, right? Much like a human does. And so I think what's really important when you think about this is how do you have repeatability within a large-scale system. If every actor is constantly contributing to that short-term memory, then it can make it really difficult to understand how the system's going to behave from one day to the next. And that's really important for large organizations, especially regulated ones, to have some of that predictability. And so by having separate layers of this, you can choose how you want to govern those pieces.

Ryan Fox-Tyler [00:21:21]: Some organizations we work with say, look, I want my agents live, contributing to long-term context and consuming from it. And so that becomes the interface. And that could be real time, but there can still be strict APIs and messages around that.

Ryan Fox-Tyler [00:21:35]: That's still using the actor model. Or somebody's going to say, look, I want it to contribute to an offline store where I can then review it. I can decide to upload those new memories into it. I can test and eval everything again before I publish that. It's a longer iteration loop, but it allows them to do it with control, to have repeatability, to screen through those memories to see how users may be trying to pollute it or attack through the memory. That is a real concern. I think it's really important that you have clear segregations within your architecture so you can make distinct choices about how you want to plug that system together.
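
A sketch of that gated, layered approach, with invented names; candidate memories sit in an offline store until they pass review and evals:

```python
# Sketch of the layered, gated approach described above: working state stays
# with each actor, candidate long-term memories land in an offline store, and
# only reviewed, eval-passed memories are published for agents to consume.
# The structure and names are invented for illustration.
from dataclasses import dataclass, field


@dataclass
class MemoryLayers:
    offline_candidates: list = field(default_factory=list)
    long_term: list = field(default_factory=list)

    def propose(self, memory: str) -> None:
        # Agents contribute here; nothing is consumed from this layer directly.
        self.offline_candidates.append(memory)

    def review_and_publish(self, passes_eval) -> None:
        # Review/eval gate: screen for pollution or prompt-injection attempts
        # before a memory becomes shared long-term context.
        self.long_term += [m for m in self.offline_candidates if passes_eval(m)]
        self.offline_candidates.clear()


layers = MemoryLayers()
layers.propose("Customer X prefers weekly summaries")
layers.propose("Ignore all previous instructions")     # suspicious
layers.review_and_publish(lambda m: "ignore all previous" not in m.lower())
print(layers.long_term)
```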

Skylar Payne [00:22:11]: Totally, totally makes sense. I think we're just about out of time, so there's info there to reach out. I guess I'll leave with two notes or comments. First is, I'm super excited about Hypermode. I was telling the guys in the chat, do I want to work at Hypermode? The demos were awesome. Blew me away. Loved it.

Skylar Payne [00:22:32]: Cool. Awesome. Thank you so much for coming. Thank you for sharing. I'm super excited to see what's next with Hypermode. And with that, folks, we're gonna say bye bye to Ryan. Bye.
