MLOps Community
Sign in or Join the community to continue

How Sierra AI Does Context Engineering

Posted Dec 10, 2025 | Views 53
# AI Systems
# Agent Simulations
# AI Voice Agent
Share

Speakers

user's Avatar
Zack Reneau-Wedeen
Head of Product @ Sierra

Zack has been at Sierra for almost two years, first as an Agent PM with some of Sierra's earliest design partners, then helping build out the Voice platform, and now as Head of Product. Before Sierra, he led the product and design teams at CoinTracker and spent 7 years at Google working on Search, where he was the founding PM for Google Lens and Google Podcasts. Zack studied Computer Science and Economics at Yale and holds an MBA from Harvard.

+ Read More
user's Avatar
Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.

+ Read More

SUMMARY

Sierra’s Zack Reneau-Wedeen claims we’re building AI all wrong and that “context engineering,” not bigger models, is where the real breakthroughs will come from. In this episode, he and Demetrios Brinkmann unpack why AI behaves more like a moody coworker than traditional software, why testing it with real-world chaos (noise, accents, abuse, even bad mics) matters, and how Sierra’s simulations and model “constellations” aim to fix the industry’s reliability problems. They even argue that decision trees are dead replaced by goals, guardrails, and speculative execution tricks that make voice AI actually usable. Plus: how Sierra trains grads to become product-engineering hybrids, and why obsessing over customers might be the only way AI agents stop disappointing everyone.

+ Read More

TRANSCRIPT

Zack Reneau-Wedeen [00:00:00]: It is useful to understand energy levels and think about when light hits atoms, how electrons will change energy levels, but it is more accurate to think about the electron cloud. And so that what you sort of modeled with the web is probably more accurate. But decision trees and flow diagrams are still potentially useful constructs to think about how an agent should behave.

Zack Reneau-Wedeen [00:00:33]: The difference about AI is that instead of the software being super cheap, super fast, super reliable, it's actually quite expensive to run. Kind of slow, it gets slower and slower the better the reasoning models get, and non deterministic. So it can be creative, it can be flexible, but it can also hallucinate. And so the methodology that you need at each of those different junctures in this flywheel that you're trying to build is pretty different from with traditional software.

Demetrios Brinkmann [00:01:03]: So it's like traditional software does this. Let's just take the opposite of all of those.

Zack Reneau-Wedeen [00:01:08]: A good example is like unit testing, right? So with unit tests, you run them once. If they pass, you're good with simulations, which is what we call our flagship testing product in Sierra. We might simulate a given conversation 5, 10, 15 times, and then you'll look at what data changes happened as a result of the conversation, which is a similar methodology to what we publish with TAO Bench, which is the open source framework for eval, but we'll also do using an LLM as a judge or a third agent. So you kind of have the user agent, the agent agent, and then the evaluator agent. The evaluator agent will say, did this conversation meet this checklist? And that's pretty different from unit tests or even integration tests in traditional software.

Demetrios Brinkmann [00:01:57]: Yeah. Do you even put that in the CICD pipeline?

Zack Reneau-Wedeen [00:02:00]: You do? Yes. So we have the concept of what we call critical simulations, which are simulations that run when you actually will merge a commit to main or schedule a release or what have you.

Demetrios Brinkmann [00:02:13]: Feels like that could be really slow.

Zack Reneau-Wedeen [00:02:15]: It could be. We have a high degree of parallelism. So if you have 300 simulations that you need to run, you're not waiting for them to run one after the other, they're going off and being executed in parallel.

Demetrios Brinkmann [00:02:27]: And when you say simulation, are you only talking about it for voice or are you talking about it for everything?

Zack Reneau-Wedeen [00:02:33]: It's interesting you asked that question with voice first. I feel like most of the time I get that question, it's are you only thinking about it for chat? But it is both. So it's actually pretty interesting to listen to voice simulations. The way that they work is we'll have a library of background noise, a library of voices, a library of all these sort of different entropic things that you can introduce to try to simulate what it would actually be like to get a phone call. And our customers are all over the world in all different environments, in all different industries. Exactly. So you have people calling in from the car. SiriusXM is one of our larger customers.

Zack Reneau-Wedeen [00:03:12]: And people will call in because their radio is not working and they're in the car or by the, you know, on a busy street or something. And so you have the background noise, you have different accents, you have different microphone quality. Not everyone's talking into this, you know, magical podcast mic. And so you have to deal with all that. And our simulations have kind of gone through that punch list, you know, row by row and said, okay, we're adding this to the product. And then if you run it 5, 10, 15 times over 3, 400 simulations, you have a pretty good idea that if those are all passing, your agent is doing what you would expect it to do.

Demetrios Brinkmann [00:03:47]: Do you also look at simulations a little bit like red teaming?

Zack Reneau-Wedeen [00:03:52]: Definitely. So there's, I don't know if I would say there's two classes of simulations, but there are definitely some where it's just the happy path, oh, I need to reset my radio because the encryption keys expired. And so an API call goes out and a satellite beams down the new encryption keys and you can listen to sports talk again or Howard Stern or whoever you like to listen to. But then there's also the more adversarial simulations where you're trying to, we call it abuse the agent. And so we have a number of custom models that are just about detecting abuse and those are often customer specific as well. Where some customers, they might have younger audience and things like self harm and bullying are top of mind, Some customers might have different audience and things like hacking into the system are top of mind. So you have to be pretty customer specific in how you think of it. The nice thing is this is a very valuable problem for our customers and so they're deeply engaged with us.

Zack Reneau-Wedeen [00:04:54]: And we've gone over the last six months or so making the entire product fully self serve so that it's self explanatory. You know, you can get up to speed. And while it is this deep thought process, it's kind of up to you on the spectrum of how much you want to take that into your own hands, how much you want to work with Sierra's agent development team.

Demetrios Brinkmann [00:05:13]: How are you creating these nefarious simulations?

Zack Reneau-Wedeen [00:05:17]: There are three agents Involved. There's the user agent, there's the agent itself, the AI agent, although they're all AI, and then there's the LLM as a judge. And so the user agent basically can take a system prompt as part of the simulation, but you don't have to.

Demetrios Brinkmann [00:05:33]: Massage it to get it to be, like, really bad.

Zack Reneau-Wedeen [00:05:37]: So I think to get it to be really bad, you would.

Demetrios Brinkmann [00:05:41]: Yeah.

Zack Reneau-Wedeen [00:05:42]: And typically we won't have the really bad simulations as part of the run. You can also, because you want to.

Demetrios Brinkmann [00:05:52]: Catch that before it's out or trying to get to production.

Zack Reneau-Wedeen [00:05:55]: I think the honest reason is that typically the list of really bad stuff, you just want to make it so not everyone has to look at that every couple minutes. Also, I don't think that AI models are that good at imagining some of the terrible things that people say, probably because the model providers have gotten out in front of that. That being said, we also have the concept of verbatim simulations where you can just hard code a script of what the user agent should say when the AI agent says X. And so for those ones where you really want to test against some of this behavior that you definitely want to prevent against or prompt hijacking or these kinds of things, you basically will do a verbatim simulation instead of an AI user AI simulation. Does that make sense?

Demetrios Brinkmann [00:06:51]: Yeah, the really bad stuff. I'm not necessarily thinking really bad in what humans are capable of.

Zack Reneau-Wedeen [00:06:57]: Yeah.

Demetrios Brinkmann [00:06:57]: I'm just thinking really bad of, like, as a prompt hijacking type of thing, where I'm just trying to get some data out of the model. If I know that I'm now dealing with an AI agent, let's see what I can get from it in that way. Maybe it's naive. I feel like the.

Demetrios Brinkmann [00:07:19]: Models that are out there today are not that good at trying to, like, hijack themselves.

Zack Reneau-Wedeen [00:07:24]: There's two things that come into play here as well. One is that the actual access to sensitive information is always safeguarded in a deterministic fashion. So we'll have more traditional unit tests that will make sure that the access controls to a system are no different from what you would have if you were on the website, logged in as a user. As an example. We don't leave it up to the model in any case, whether to expose sensitive information. We leave that up to deterministic systems.

Demetrios Brinkmann [00:07:57]: How does that work? Like the connection between access and model.

Zack Reneau-Wedeen [00:08:03]: So often, let's say you go to.

Zack Reneau-Wedeen [00:08:06]: I'll think like Sonos, for example, and you're logged in. The fact that you're logged in means that you have access to a set of data in Sonos database, maybe the speakers that you've connected to the networks in your home. And the AI model also has access to that same information. So you would never be able to use the model to get information that isn't already available to you as a logged in user on the website.

Demetrios Brinkmann [00:08:33]: Yeah. So it's almost like as you log in, then the model just has this sandbox of your world.

Zack Reneau-Wedeen [00:08:41]: Correct.

Demetrios Brinkmann [00:08:42]: And then if it needs to go and do something because you have a complaint, then it like gets out of your world or it.

Zack Reneau-Wedeen [00:08:48]: Well, if you have a complaint, that would be something you could do in your world as well. So the models, they're, they're taking action all the time, but they're not taking actions that you wouldn't have permission to do as a user.

Demetrios Brinkmann [00:08:58]: Yeah. And so then you're just protecting on the permission side of things, like, and that doesn't seem like it's like that's a tried and true thing that we've been doing in software and that didn't necessarily flip like you were talking about back with that cycle thing. It's not one of those slow, non deterministic. If you can make sure that the model doesn't have access to it, it just gets access to what it gets, then whether or not it gives the information is where things get wonky.

Zack Reneau-Wedeen [00:09:27]: That's exactly right. And you said earlier like, oh, so you're just kind of doing the exact opposite of the software development lifecycle. And I kind of just took it in a different direction because that's actually not true. A lot of the times you need to do the exact same thing that's tried and true from traditional software development. Sometimes you need to do the opposite and sometimes you need more of a hybrid approach. So it's more of a first principles analysis of each of the different steps of kind of plan, build, test release, optimize, analyze, plan build, test release, etc. Etc. But just kind of from the first principles of what's changed using AI systems along with deterministic systems, as opposed to only deterministic systems.

Demetrios Brinkmann [00:10:12]: I wonder how different the simulations are for voice and for text or if they are. I know that of course with voice you got to add in all these special things like you were saying, like noise, background noise. Also I, I've talked at length with people about conversational design. You can't let somebody know that the model's going off and doing something right, or you don't want to have Somebody thinking that somebody hung up, the agent hung up because the model is off and doing something. So you want to give like cues and all of that stuff. So I imagine that's one type of simulation and that you're only going to find in voice.

Zack Reneau-Wedeen [00:10:53]: Right.

Demetrios Brinkmann [00:10:53]: And then in text, you have different types of simulation. And how do you see those two being different?

Zack Reneau-Wedeen [00:10:59]: It's a really good question and one that's very close to my heart.

Demetrios Brinkmann [00:11:02]: So.

Zack Reneau-Wedeen [00:11:02]: Because my personal story at Sierra was I spent the first nine months building agents with customers. So I mentioned Sirius XM because they're one of our larger customers that we're proud to work with, but also because I helped build the very first version of their agent. Then SiriusXM, as large as they are over chat, they're even larger over voice. So just seeing that having it beat us over the head of hey, Voice is actually the huge opportunity here.

Zack Reneau-Wedeen [00:11:31]: Led me and a couple others from the team to start prototyping what would it look like to build out a voice version of their agent? And so we iterated on that and eventually kind of moved up the hierarchy of needs from just being able to go back and forth to actually responding to verbal cues and interruptions. One example is an interruption is not actually a super objective thing. If you start talking right now, you might be agreeing with me, encouraging me, interrupting me, sort of telling me to go in a little bit of a different direction. But most automated systems today, if you say something, they will just stop and start over, no matter what. You said.

Demetrios Brinkmann [00:12:16]: So painful too, as the user.

Zack Reneau-Wedeen [00:12:18]: Yes, yes. And so it actually feels like they're talking over you because you're like, you go, no, I go. And I don't know if everyone's had this where you're walking down the hall and someone's walking towards you and you're like, who's going to move? And that's what it feels like all.

Demetrios Brinkmann [00:12:32]: The time, dancing in the hallway. Yeah.

Zack Reneau-Wedeen [00:12:34]: And like, obviously once you get so deep in the matrix, you notice everything. So as an example, the two of us were just talking at the same time, but it wasn't a problem. Yeah. Another example is.

Zack Reneau-Wedeen [00:12:47]: If I pause for a very long time, like I just did in the middle of a sentence, you know, to wait for me. But if I'm done speaking, you know that it's your turn to speak. And so context is so important. Both, was this an interruption and is it my turn to speak? And you can't just boil it down to a number of milliseconds. You need to actually Think about what's being said and start planning your response before the other person's done talking.

Demetrios Brinkmann [00:13:13]: Everyone almost has their cadence. Their way that they talk is unique to them. And so I may give longer pauses and I may speak slower and I may not want to finish, but then actually I do finish. Or things like I speak weird.

Zack Reneau-Wedeen [00:13:29]: Right, right.

Demetrios Brinkmann [00:13:29]: And we all kind of have our own quirks when we speak.

Demetrios Brinkmann [00:13:35]: And you can't have a universal setting of. All right, well, the model is going to be like this. The model almost has to adapt to each individual way of speaking. But I haven't seen anywhere. And maybe you've messed around with this where, where you can have the model interpret the first whatever, 10 seconds and go, oh, okay. This person takes long pauses in the middle of sentences or this person speaks a little bit slower. And so I'm going to give them more time between words.

Zack Reneau-Wedeen [00:14:03]: Right. It's a really good point. At the very high level. You can do some of this for customer specific. As an example, you know, if a customer has mostly older callers, you might wait longer. If a customer has a lot of authentication flows, you know that it takes a while to talk through your email or your password or these kinds of things. So some of it is shared. That being said, I think to do it in a personalized manner like humans do, you probably will have the most success with some of the speech to speech models where the audio tokens themselves are the things that are being inferred upon.

Zack Reneau-Wedeen [00:14:44]: And so that actually all kind of gets boiled down into the training data. That being said, right now, most of those systems for most production environments are just a little bit too hallucinatory. Our approach has been we built a modular architecture. So on the transcription side and on the synthesis side, and by the way, even if you're using the speech to speech model, you need transcription because you can't make an API call with a bunch of voice tokens. Transcription is part of the pipeline no matter what. But we do have support for models like.

Zack Reneau-Wedeen [00:15:20]: The model that underlies the real time API, the GPT audio models, as well as the Gemini audio models, which are these kind of fully speech to speech models. And I think that's going to be the future. It's just being honest with you right now. The amount of hallucination that we've seen in our production tests is such that for the larger customers, the text to speech is still the most reliable way.

Demetrios Brinkmann [00:15:48]: Yeah, I've heard that from just about everybody I've talked to where it's like, we want it with that speech language model. But we can't have it because it's just, I can't put that into production with any amount of confidence.

Zack Reneau-Wedeen [00:16:00]: Yeah. And I would go a step further than we want it, which is, you know, that I believe in it. I think if you and I were passing notes across the table right now, it would be a much less interesting and productive conversation. And that's basically what AI agents are doing. So we're willing to make the long term investment in our architecture where we support that out of the box today, even though it's not quite ready. And we're following the OpenAI in particular, just released a new real time audio model. And every time one of those things happens, we, you know, test it out, figure out are there places where we could use this. And obviously there's a risk tolerance and risk likelihood spectrum for each of our customers.

Zack Reneau-Wedeen [00:16:41]: So I think we will find that for certain languages, for certain customers, for certain use cases, this is going to be an absolutely magical experience. And then over time it will kind of disrupt the text to speech approach. At least that's my personal belief.

Demetrios Brinkmann [00:16:55]: Do you have a pipeline that you'll run models through as soon as they come out just to see like. Because obviously when you look at benchmarks, it's just a bunch of bullshit, right?

Zack Reneau-Wedeen [00:17:06]: We do. Some benchmarks are really cool as well. Like we love our TAO benchmark. It's really cool when the Math Olympiad is won by an AI. But when it comes to putting systems in production for our customers, I think what you said is roughly accurate.

Zack Reneau-Wedeen [00:17:22]: We do immediately when a model comes out, we have a whole suite of evals that will run, but it's usually an iterative process because one model might be better if you prompt it in a different way or if you few shot it in a different way or even if you fine tune it in a different way. So it's not, you have a good instinct immediately, but it's often a multiple days, multiple weeks process to know if the ceiling of the new model is higher than the ceiling, you know, the local maximum that you've climbed to on the previous model. So it's, it's. And then I would say also for voice, language is a huge deal as well as like dialect and locale. So Brazilian Portuguese being different from Portugal Portuguese. And so you have to, we basically have a rubric and we have, you know, basically callers that we work with who can call in and test out each model.

Demetrios Brinkmann [00:18:15]: Oh, nice.

Zack Reneau-Wedeen [00:18:15]: You know, because you need people that actually speak the language that can be in realistic scenarios or else you're going to most likely overfit to some data or not be representative.

Demetrios Brinkmann [00:18:26]: Yeah. My buddy was telling me how he was doing a bunch of simulations on the OpenAI real time, and one funny thing that he found was when you speak English with an Italian accent or no, when you spoke in Spanish with an Italian accent.

Zack Reneau-Wedeen [00:18:43]: Yeah.

Demetrios Brinkmann [00:18:44]: It would switch to this, like, made up language that was like a mix between Spanish and Italian.

Zack Reneau-Wedeen [00:18:50]: Oh, wow.

Demetrios Brinkmann [00:18:50]: And when you spoke German, when you spoke English with a German accent, it would just switch to German and speak to you in German. And so it was like, what is going on speaking English? You know, I want it to be in English, but it keeps switching to German.

Zack Reneau-Wedeen [00:19:05]: Yeah.

Demetrios Brinkmann [00:19:06]: And at the end of the day you're like, but is it the end of the world? Because if the German accent is there, that probably implies they speak German. And that's what I was. Why not just speak into in German?

Zack Reneau-Wedeen [00:19:18]: Exactly. It's like, like, I don't know. Do you speak any other languages besides English? Okay, so if you go to a restaurant, let's say, and someone greets you in English, but they have an accent that suggests they speak Spanish or Portuguese, you do you kind of feel that urge of like, oh, I want to practice my Spanish, or like, you know, could we have a discussion in another language? Wouldn't that be fun? And I guess that's sort of what's going on in the model.

Demetrios Brinkmann [00:19:42]: Like, oh, yeah, I get to speak.

Demetrios Brinkmann [00:19:47]: And on the simulation, I know that our mutual friend Willem has told me about how he uses heat maps to see where the agents are good and where they're bad. And this is like on the task completion side of things.

Zack Reneau-Wedeen [00:20:01]: Yeah.

Demetrios Brinkmann [00:20:02]: Have you tried to do any of that with like. Because that feels very orthogonal to simulation in a way. It's like you, you simulate things and you see where it's good, where it's bad. But then if it's an agent, maybe it's like the next step is, can it do this? Can it actually, like complete the tasks that we ask it to do?

Zack Reneau-Wedeen [00:20:19]: It's a really important piece of it is not just what did I do wrong? But how do I do it right next time? And I think without a clear understanding of where you went wrong, which I think may be what you're getting at with the heat map, you can't really understand what the fix would be. So to give a few examples, sometimes the agent will go in the wrong direction because it doesn't have the right answer. Sometimes the standard operating Procedure is wrong. Sometimes it was asked to reason about too many things at the same time. And so you want to know those things at the point where there's an error so that you can fix it most quickly. And in simple cases the agent can fix it itself. So one example is a number of conversations from Sierra. It's remarkably low at this point with a lot of our agents, but a number of conversations end up getting transferred to a human, especially if there's something that you need to do that just requires oversight or regulated industries, etc.

Zack Reneau-Wedeen [00:21:17]: In those cases though, you can actually learn from what the human did post transfer and detect. Is there anything missing in my standard operating procedure? Is there anything missing in my knowledge base and have AI author draft article of hey, you should, maybe you should add this knowledge. We had 150 different agents all mention this thing, but the AI agent doesn't seem to know about it yet. And sometimes it's like no, no, I don't want the AI agent to know about that. And sometimes it's like oh yeah, that was missing. And it's remarkable how often it's the latter because a lot of the companies we work with, one of the things we hear a lot on our discovery calls is like, oh, I don't know if I'm ready because my knowledge base isn't in a good place. And so our response to that and one of the reasons that we built this product is, well, actually the best way to get your knowledge base in a good place is to launch an AI agent and just make sure that it's scoped appropriately. But you'll actually find the edges of what it knows.

Zack Reneau-Wedeen [00:22:17]: And our system, the Sierra platform will create basically a prioritized list of the knowledge that your agent should have but doesn't have or where it's a little bit off or implying the wrong thing.

Demetrios Brinkmann [00:22:29]: It reminds me of the continuous integration where you're seeing where's the edge case. Okay, cool. Now let's integrate that back into the knowledge base so that hopefully it's not an edge case next time.

Zack Reneau-Wedeen [00:22:40]: Yes. And I think there's two other things here. I know I'm doing a lot of twos. There's two other things here that are very, very important. One is it's a really important business problem for customers. So when you create the system that allows you to kind of pull in the right direction and just create that upward spiral of improvement, they're going to pull on it. So we have customers where they have 10 plus full time people evaluating conversations and Looking for opportunities for improvement because it's such an important business problem. And then the other piece, I know it's not a business podcast, but I think it's kind of geeky too is we have this outcome based pricing model where Sierra only gets paid when the agent does the job well.

Zack Reneau-Wedeen [00:23:25]: And so you have the incentive incentives on the customer side and you have the incentives on the Sierra side. And so as sort of a systems thinker, it's this very, very fun environment where if you create the system, the business incentives on both sides will just pull the agent in an upward spiral. And I think that's accountable for a surprising amount of our success. It's just that the incentives are so aligned and the table is set to improve the agent.

Demetrios Brinkmann [00:23:53]: I always love the idea of outcome based pricing, but feel like it is insanely dangerous.

Zack Reneau-Wedeen [00:24:01]: Why do you say that?

Demetrios Brinkmann [00:24:03]: Because you. It's on you. And if your agent up and then you don't get the outcome, you pay money. Like you eat that price of how many?

Zack Reneau-Wedeen [00:24:14]: Oh, dangerous for us.

Demetrios Brinkmann [00:24:15]: Yeah, exactly. Like there's a potential world where you don't get the outcome, but you end up spending money. Obviously it's like we're talking sense here, potential, but at gigantic scale and you don't recognize there is a problem, soon enough you can burn through some cash.

Zack Reneau-Wedeen [00:24:33]: I think though, these are two correlated events in a way, which is it kind of turns a lagging indicator into a leading indicator. We'd actually rather know that sooner rather than later that our agent isn't performing well. We'd rather feel the pain of lighting money on fire so that we have to fix it. And then when it comes time to renew a contract a year from now, we actually might feel better about our relationship with our customers. So I think our NPS with our customers is much higher because of this, because we actually find out about issues and we have to fix them because we're feeling the pain of like a hole in our pocket in the moment instead of 12 months from now where they're like, hey, yeah, it wasn't performing as well as we thought.

Demetrios Brinkmann [00:25:15]: Yeah, that's a great point. You're not going to get that conversation as often because you're going to be charging them when they have successful cases.

Zack Reneau-Wedeen [00:25:25]: Right.

Demetrios Brinkmann [00:25:26]: I could see that.

Zack Reneau-Wedeen [00:25:27]: Right.

Demetrios Brinkmann [00:25:27]: One thing that I've heard folks talk about a lot is that getting the. And especially in the customer success lane, getting it working for one or two.

Demetrios Brinkmann [00:25:43]: People when they're coming in and complaining or doing something is not, I'm not going to say Easy, because it's not easy, but it is doable. And then where it really gets hard is when you have to service a company that has millions of users and then the customer support is trying to service all of these tickets at scale. Have you. How do you think about like that? And is that right? Because I've heard it from folks. I want to like fact check it real fast and then see how you think about it.

Zack Reneau-Wedeen [00:26:16]: And you're talking about the AI agents or human agents. Yeah, because we, we definitely see that on the human side where, you know, it's not a coincidence that so many of our customers are large consumer companies where they have millions of customers. I think, I think it's true that most of our customers have millions of their own customers. Most of our customers have over a billion dollars in revenue. So that kind of stands to reason. If they're a consumer customer, they probably have millions. And they see this scale such that the power of bringing in AI just changes the nature of their relationship with their customers. Where previously maybe they had a strong incentive to hide their phone number on some backwater page of their website, now they can plaster it on the front.

Demetrios Brinkmann [00:26:59]: That's so funny, offering that company.

Zack Reneau-Wedeen [00:27:03]: Yeah. Or you're like trying to click through.

Demetrios Brinkmann [00:27:05]: Do I contact these guys?

Zack Reneau-Wedeen [00:27:07]: Yeah, yeah. It's like a meme at this point. And so I think when you think about, you know, the, the AI agent as kind of the analogy to the Internet website or the mobile app, it's also about the actual addressable market of contacts with a business growing, because now you can provide a great experience to anyone who calls in. I think though, what you said about seeing just a different distribution when you get scale, the stat that's stuck with me a lot. I spent the first seven years of my career at Google and to this day, something like 20% of all Google searches have never been made before.

Zack Reneau-Wedeen [00:27:50]: That's because different things are happening every day. New language changes, the things people are curious about change, et cetera. And so you need a system that's actually like resilient, because the same thing is true for our businesses. The reasons people are chatting in are going to change each day. You know, if someone is doing an interview in the SiriusXM studio, you're gonna get different questions from a day where they had an outage or something like that. And so each day there's just different questions and you have to build a system that's truly resilient. And one of the things that I look back on Google really fondly about is Just the amount of invention occurring in the company. You'd go to the all hands, which is called tgif, and you would hear, oh, we use machine learning to improve the way that cooling works in the data center and it's saving us a billion dollars.

Zack Reneau-Wedeen [00:28:46]: It's a lot of money. For an algorithm. Or.

Zack Reneau-Wedeen [00:28:50]: Before the latest iterations of large language models, we are understanding search queries better. We created this BERT classifier or analyzer, which can tell us a lot more about the actual semantic meaning of a search query, not just the keywords that are in it. Or our data centers, they run 20 degrees hotter than any other data centers because actually machines failing isn't a big deal for us because we have failover different from everyone else. There's this sense of invention and building things and creating differentiation at the technical level, not just at the product and marketing level. And Sierra is the first experience I've had since then where I feel like, because AI is moving so fast and because we just have some of the most genius thinkers in the world and builders in the world, you have that feeling of we're inventing new things. And I think the. What we call our agent SDK, which is the software layer that our agents are built on, both the developer ergonomics and tooling and release cycle and what we call the agent development lifecycle put into a product. But also the software constructs and the agent architecture itself are just novel and inventive and flexible.

Zack Reneau-Wedeen [00:30:05]: I think it's what I said. I definitely feel a lot of just admiration for the team, but it's also being out on the experience curve ahead of some other companies. I think just building with customers from the beginning, starting early in 2023, less than six months after ChatGPT came out. And just getting reps with customers means that the abstractions that we have are a little bit less leaky, a little bit more robust for all the different things that could come in. At least that's my theory. And then I think we have the right. Sorry to ramble a little bit here, but I think we have the right layers of the stack where you have the platform, but then you also have the flexibility of agent development. So most of our agents at this point are being built in no code.

Zack Reneau-Wedeen [00:30:53]: Many of them are being built by our customers directly. But you still have that ability to express anything you need by dropping down into code and kind of just interoperating between the two. And I think that's where you want to be working with the customers. We work with where their business leaders might have the best understanding of the goal we're trying to achieve, the metrics we care about, the standard operating procedures. But then you still need to connect to APIs. You need your engineering team to get involved. Sometimes you need to drop down and do something with not your chef's knife, but your paring knife. And so this platform was just thoughtfully architected by being so customer obsessed over the last two and a half years that what we have as a reward, as a system that can handle scale in that way.

Demetrios Brinkmann [00:31:40]: Thinking about three years ago, when you started versus now, and how much you've had to reinvent or try and create new, or try and like test out, and it's perpetual R and D, in.

Zack Reneau-Wedeen [00:31:54]: A way, I think that's exactly right. And if there's one thing that I feel like has helped me, you know, have more good ideas than I otherwise would have, it's spending those first nine months at the company, you know, working as an agent, product manager, and agent engineer, and just being so close to customers, feeling their pain, you know, having to tell them directly face to face when the agent did something they weren't expecting. Getting to celebrate and go out to dinner when it, you know, hit some business target.

Demetrios Brinkmann [00:32:24]: Yeah, nice.

Zack Reneau-Wedeen [00:32:25]: Those, those experiences are kind of, you know, in my neurons in a way that I feel like is just. I'm just so grateful for. And I think anyone building a startup, the. The more you can kind of feel what your customers are feeling, the. The more integrated your decisions will be.

Demetrios Brinkmann [00:32:41]: Huge win. I remember a conversation I was having with a buddy of mine who was saying, look, man, we have found the biggest lift on our AI systems from just gathering the whole floor of the company together, no matter what their job role is, and having a labeling party and getting a bunch of pizzas. And we just get together and we look at the conversations that we've been having with the AI agent, the agent and the user, like real conversations. And everyone just labels across like five metrics. Did it do this? Did it do this? What? What would you give it as a score? And by the end of it, the business side of the house understands the AI so much better. And they can say like, oh, yeah, okay, I see what's going on here. And then you catch all these anomalies that you probably, if it was only engineers looking at it, they would have never caught because it would seem like, yeah, that seems fine. But then you have this subject matter expert looking at it and they're going, why? What? No, we can't say this.

Demetrios Brinkmann [00:33:46]: We shouldn't be saying this, this doesn't make sense.

Zack Reneau-Wedeen [00:33:48]: Or.

Demetrios Brinkmann [00:33:49]: Or sometimes it's just like, this is stupid. No.

Zack Reneau-Wedeen [00:33:51]: Right.

Demetrios Brinkmann [00:33:52]: So I think about that and enabling the low code. No code is almost like another way of doing that same thing.

Zack Reneau-Wedeen [00:34:02]: Yeah, it's exactly right. And I think what we've seen working with hundreds of customers now is when we bring together the business side and the technical side. I wouldn't say it's the only way you can get results, but it's certainly a heck of a lot easier. And it's the only way you can get really, like the best results because these two stakeholders are so involved. And we've done that enough times now where we have some of those repeatable processes to bring everyone in the room and make sure that all of the different viewpoints are represented. The other thing, this is kind of like a pinch me moment from working at Sierra, but I don't know if you're familiar with the book Unreasonable Hospitality.

Demetrios Brinkmann [00:34:39]: Yeah. Why have I heard of that, though? I think.

Demetrios Brinkmann [00:34:43]: Why? Yeah, why. Why do I know that? It's not a. It's not a Greek guy that wrote it, is it?

Zack Reneau-Wedeen [00:34:48]: His name's Will Guidera. I'm not sure at all his actual origins. He's from New York. He ran a restaurant called eleven Madison park for a while, which Rose. Yeah, they got, you know, they became the number one restaurant in the world. Is the spoiler alert, like the five.

Demetrios Brinkmann [00:35:05]: Michelin star, where it's like you could only get three.

Demetrios Brinkmann [00:35:09]: How do you have five?

Zack Reneau-Wedeen [00:35:10]: Right, right. And it's, you know, this beautiful ballroom environment in New York, but was kind of a pretty good middle of the road bistro for a while. And he just, along with the team there, took it to the next level in terms of service and hospitality. And he's one of our advisors at Sierra and helps a lot with how we think about tone and language. And so it popped into my head because what you're saying of like, get the whole floor together and have an eval party or labeling party. We've also actually gotten to do that with him where he'll come in to the office and spend an hour just looking at conversations with us and we say, what would a three Michelin star restaurant do in this scenario? Of course, sometimes they would do things that are not within the budget of our customers, but sometimes they would do things that are totally free. It allows us to think about aspirationally, how can we make an agent where it feels like it understands you the way the person, the host who greets you when you go to 11 Madison park or one of the best restaurants in the world and that kind of feeling of service, and I think that's one of the reasons why it's fun to work on is yes, there's the part of, you know, companies are hiding their phone number and we want to just bring it up to a baseline. But then there's also that part of how can we actually make the relationship that you have with your businesses that, you know, good and great and you know, something that feels like you're going to a Michelin star restaurant?

Demetrios Brinkmann [00:36:34]: Yeah. Do you remember any of the things that he said like, oh, we could try this or maybe they would be that were the free ones not right.

Zack Reneau-Wedeen [00:36:44]: Not going out flying you dinner. Yeah, I do, I do. I remember it very vividly because one of the things that stood out kind of to your point actually is that even though he at that point hadn't had as much experience with our technology, he had such an intuition of what a good answer would be. And so I had this picture of I understand each part of the stack and that helps me throughout the stack. But then I put him in this other category of just like savant at the one thing that matters. And so we were reading through a number of voice conversations and listening to them and hearing for different tone and awkwardness and.

Zack Reneau-Wedeen [00:37:25]: The ability to kind of just connect and drop the barrier. And what actually stood out to me is there was one message in a long, I think it was like a 45 minute conversation. And there was one message where the agent sounded really empathetic and like clearly connected with the user. And we were going through each and being hypercritical. And I think on that one it was less of the I'm so sorry that this is frustrating and more like, man, this sounds really tough.

Zack Reneau-Wedeen [00:37:58]: We've been working through this and we can't figure it out yet. And it was a very difficult technical troubleshooting for a basically banking or financial services application. So not an easy problem to solve. But what really stood out was that ability to just connect and be like, I hear you, I see where I see you. Like, let's try and figure this out.

Demetrios Brinkmann [00:38:21]: Yeah.

Zack Reneau-Wedeen [00:38:22]: And the, you know, I would say I would give us, you know, maybe a constructively like a B minus on the particular conversation that, that we were reviewing. But in that particular message it was like, that's what we want. And I think someone like Will and some of the people that have really good taste, which is a one of the few like appreciating assets, you Know, in the labor market right now, maybe.

Zack Reneau-Wedeen [00:38:47]: That really good taste often comes out when you're like, not necessarily criticizing all the bad things, but being like, that's what we want. And so I was really impressed with his ability to kind of put his finger on those moments, those bright spots that we want to scale.

Demetrios Brinkmann [00:39:02]: So he tells you that, and then what do you do? How do you incorporate a lot of that in. In a way that it's not just like. You're absolutely right.

Zack Reneau-Wedeen [00:39:10]: Right. So yes, this is where the full stack understanding becomes relevant. Again.

Demetrios Brinkmann [00:39:16]: Classic.

Zack Reneau-Wedeen [00:39:16]: There's a number of different ways you could do this. We've had success.

Zack Reneau-Wedeen [00:39:23]: Or I should say one of the ways we've had success actually in this particular case is with fine tuning of models. To phrase things in a certain way. A lot of the base foundation models have been trained primarily for chat achieving. AGI doesn't really care if it's written or spoken. I think when you look at where the Frontier Labs are going, they're actually going sometimes in the opposite direction of a low latency conversational voice experience.

Demetrios Brinkmann [00:39:53]: Yeah.

Zack Reneau-Wedeen [00:39:54]: So having a little bit of control at the model level for style can help a lot on that particular problem. Now, when we do have fine tuning, if it's with data that's in any way derived from the particular customer, then it's a customer specific model. And if it's more just, you know, general tone and style, it can be useful across customers. But you also have to balance the fact that anything that is being put in the weights could be regurgitated. And so you have to make sure that there's nothing in those weights that any of the users who would interact with those weights wouldn't be able to see. So that process is, as you can imagine, much more labor intensive. And so it's really those specific circumstances like the style of voice, where it's very specific, very high leverage, very important, where it makes sense to kind of drop down to the weights. Most of the time, I would say the solutions come at a higher level in the agent architecture or in some of the underlying task prompts or changing the model itself, those kinds of things.

Demetrios Brinkmann [00:40:57]: Yeah. It's only when Will comes in for the day and he's like, that piece right there, you're like, shit, I guess we're going to have to fine tune.

Zack Reneau-Wedeen [00:41:05]: And you want to make sure it's durable too, because, you know, if like style is something that most brands don't want to change too often too markedly. Whereas if you're fine tuning this new promotional Offer into your weights. Then two weeks later when the promotion ends, your model is out of date. So the right tool for the job is definitely one of those things we think a lot about.

Demetrios Brinkmann [00:41:27]: This is probably a good segue into the ensemble of models and how you work with that.

Zack Reneau-Wedeen [00:41:32]: So it's actually mind blowing to look at one of the conversation traces with a Sierra agent. If the CRS agent says, hello, how can I help you today? And I say something like, oh, you know my, I'll try not to use SiriusXM. You know, my groceries weren't delivered and I'm not sure where they were. You know, what's going on here between me finishing talking and the agent responding, there might be 10, 15, 20 different models that get invoked.

Zack Reneau-Wedeen [00:42:05]: Some of them might be as simple as an embedding model to do retrieval, augmented generation. Some of them are frontier models, maybe even with some reasoning tokens to make sure that we're able to handle a complex reasoning space. Some of them are fine tuned, fast but cheap classification models to just, you know, understand the task at hand. And it would be very slow to run these all in series. So sometimes there are certain models that need to return a token before the next model can start. But most of the time this is happening in parallel. And so I think what's really impressive is the ability of the agent architecture to break problems down into tasks that can be handled by different models and then selecting the right models for each of those tasks and doing it super quickly. We call that whole process kind of our constellation of models that we use to respond to a message in a way.

Zack Reneau-Wedeen [00:43:01]: So it feels really simple. But I think my Google background showing underneath the iceberg, there's just a lot going on between when you type something in or when you say something and when you get a response.

Demetrios Brinkmann [00:43:13]: Okay, so this is wild, man. Tell me more about the way that you are. So you're doing everything async. Yeah, it's basically you're gathering, it's that same input data.

Zack Reneau-Wedeen [00:43:24]: Right.

Demetrios Brinkmann [00:43:24]: But then it just spurs out and there's a bunch of different variants of it. And you do have dependencies on certain parts of this output. Like does it already know the taken path? Because I can't imagine that it would know. All right, well I. How do I. How does it know that it needs a dependency and it needs like. I don't understand this. Break it down for me more.

Zack Reneau-Wedeen [00:43:48]: So this is one of the reasons why I think voice is still an area where we're doing extremely interesting things from an engineering perspective. And where we're still significantly ahead on a product perspective, it's because latency is so important. So to give you an example, retrieval, augmented generation, like, if we need to access a knowledge base to answer a question, the agent will often make a decision to look up the knowledge before it even knows, or simultaneously while it's deciding whether it would be a good idea to use the knowledge. And so oftentimes, if you're just building something, naively, you would think, oh, should I go look that up? Okay, I'll go look that up now I'll give you the answer. But what an agent is doing, because its tokens are more parallelizable and less expensive than ours, it's saying, just in case I need to look something up, I'm going to go do that. Meanwhile, I'm going to use a smarter model to reason about whether it's even a good idea to have gone and looked that up. And then if I decide, yes, I should have looked it up, then I already have the information.

Demetrios Brinkmann [00:44:50]: But you're using that as like a tool. It's a tool call that gets invoked.

Zack Reneau-Wedeen [00:44:55]: Yes, conceptually. Okay, so the lookup tool would be running in parallel with a classifier to decide if the lookup tool was even necessary. And then you already have the information. And that's just a very small microcosm. But there's tons of speculative execution throughout the each agent turn. And then you kind of like, you know, and this is one of the reasons why, like, that was not a term I was familiar with when I started working at Sierra. But we have a number of people that have, you know, worked on building compilers and, you know, very complex systems from scratch and kind of understand these concepts of software engineering. You know, Brett, our CEO, was one of the principal authors of Tornado, which is a one of, I think the first, maybe still the most popular, like async driven Python web server.

Zack Reneau-Wedeen [00:45:43]: And so we have this, you know, 20, 30 years of software engineering history and conceptual understanding that was very new to me when I started. But it's like a kid in a candy shop for someone who's just kind of a geek.

Demetrios Brinkmann [00:45:56]: Yeah, that speculate. I haven't heard that, but it makes a lot of sense. Like it's better to have it and not need it than to need it and not have it.

Zack Reneau-Wedeen [00:46:06]: Right, right, that's exactly right. And our agents are always thinking that exact same thought.

Demetrios Brinkmann [00:46:11]: And so then this is only for voice or this is for everything.

Zack Reneau-Wedeen [00:46:15]: Well, I don't think we would have been forced to build it this Thoughtfully, if it weren't for voice. But now everything is powered by this.

Demetrios Brinkmann [00:46:23]: Yeah.

Zack Reneau-Wedeen [00:46:24]: And you know, sometimes like email, you probably don't need this. Right. And you might just want to save the money on your little rag, look up and just wait. Because a 5 second email response versus a 3 second email response is not a very big deal. But in voice, 5 seconds and you know, 1.5 seconds are very different. And so you want to make sure that you're doing everything in parallel. Yeah.

Demetrios Brinkmann [00:46:47]: Because real time is. It's always been a hard problem in ML in general. I know that when it's whether it's like real time fraud detection or real time recommender systems.

Zack Reneau-Wedeen [00:46:56]: Right.

Demetrios Brinkmann [00:46:57]: Or now real time voice, it adds this level of complexity to an already complex thing where you're talking about how complex the voice is and that like rich medium, but how we all have our own ways of talking. And now to add on to that, this whole piece of we need to make sure that we're getting the right information at the right time. Am I thinking about this correctly with decision tree type of thing like that style?

Zack Reneau-Wedeen [00:47:24]: I think conceptually, if a decision tree would be useful to a human, then it might be a useful concept to one of our agents. Similarly, if a standard operating procedure would be useful to a human, it might be a useful concept. But the ways that our agents are architected is probably more similar to how you and I would learn how to complete a task or how to help someone with a task than sort of a decision tree in the sense of traditional software and some of the legacy chatbot systems. So it's useful concept in that it models a flow that you might want to follow. But a lot of times I think our agents are more correctly described as kind of goals and guardrails, where you say.

Zack Reneau-Wedeen [00:48:11]: The goal right now is to help the agent reset their radio or help them make a payment. And you mentioned fraud detection and recommender systems as other things where real time matters because there's often money involved and click through rates matter a lot and conversion matters a lot. And that's true for a lot of voice conversations as well. For example, if you're.

Zack Reneau-Wedeen [00:48:35]: Thinking about whether you want to maintain your subscription to a streaming service or whether you'd like to cancel, and the agent can make you an offer for a 50% reduction for three months, but they take too long thinking it over, it starts to get frustrating and you're like, you know what? I just want to cancel. So these things matter a lot. And so the goal might be, hey, you want to save the customer from canceling. But some of the guardrails there might be around, the types of offers that you're allowed to provide and the experience that the customer should have, and making sure you're telling the truth and only the truth and all that stuff. And so, you know, decision tree, flow diagram, et cetera, these are all useful concepts for just thinking about operations. The one that I think is the most aligned with our architecture is probably this goals and guardrails architecture.

Demetrios Brinkmann [00:49:25]: And it does make sense. Now, the constellation metaphor, in a way, it's like a web of different models in different ways, as opposed to some straight path that you take a decision tree on.

Zack Reneau-Wedeen [00:49:38]: This is going to go, you know, to the side. I won't say I'll go over people's head with this side, but I think, like, there's kind of the model of the atom where you have energy levels for each electron and then there's the electron cloud. And it is useful to understand energy levels and think about when light hits atoms, how electrons will change energy levels. But it is more accurate to think about the electron cloud. And so that what you sort of modeled with the web is probably more accurate. But decision trees and flow diagrams are still potentially useful constructs to think about how an agent should behave.

Demetrios Brinkmann [00:50:15]: And then we go back to the heat maps. That makes a ton more sense of why those are so important. You have this web that's so intertwined and you want to know where things are going wrong and you're trying to debug that could quickly turn into a nightmare.

Zack Reneau-Wedeen [00:50:32]: That's right. And the ultimate expression of that is just the LLM itself, where.

Zack Reneau-Wedeen [00:50:39]: There'S some interesting research around model introspection. But in general, most developers don't know how LLMs made the decisions they made. So what we've had to do is build a system where you know exactly why a decision was made and you can fix it without. You can plug this hole without juice squirting out the other side. And so that's, I think, the sort of special sauce, not to extend the juice metaphor too far, is kind of being able to have both at the same time.

Demetrios Brinkmann [00:51:09]: So with no code and code, how do those two mix and match?

Zack Reneau-Wedeen [00:51:14]: So when Sierra started, we were primarily a developer platform, and most of the agents were built by employees of Sierra in collaboration with our customers. And as we've scaled, we found that a lot of these concepts, because we understand them from doing them hundreds of times, can be abstracted and actually implemented through no code tools as well. And so you know, within the last six months, I'll call it, most of our agents are now being built in no code. But it is, you know, the fancy word I've learned in the last few months is isomorphic with the code. So you can go, it's never a one way door to choose to use code or choose to use no code. What this means is someone from a customer service organization at one of our customer companies can go and build a journey themselves and then they can say, oh, it would be nice for this particular tool to have it implemented in code because it actually needs to use an API in a specific way with streaming. And I just don't want to go through the process of wiring that up in no code. It will be much more idiomatic to just have my engineer write that thing and it will integrate these folks.

Demetrios Brinkmann [00:52:26]: When you say tool, like we're talking a tool call for the agent, or we're talking like, oh, this tool, this program that I'm used to using as.

Zack Reneau-Wedeen [00:52:35]: A tool call for the agent.

Demetrios Brinkmann [00:52:36]: Okay.

Zack Reneau-Wedeen [00:52:37]: There might also be areas where most of this at this point you can set up with no code, but a situation where you might have an integration. And you already have built that integration out in code, but you want to store the metadata through environment variables, basically. So what's happened over the last six months is this has.

Zack Reneau-Wedeen [00:53:01]: Gone from being mostly something you do with our team to mostly something that our customers do on their own with the support of our team. So we still have that kind of high touch, consultative, highly accountable model because we think it's just effective. But we also offer the control and customizability that you mentioned at the beginning where it's just like, here's the product, go for it. And so we've landed on kind of the best of both worlds where you can build an agent yourself, you can use our no code platform, you can also use our code platform as well and you can use the two together. And it's kind of like the productized instantiation of what you said, which is when the customer experience org or the business Org and the technology Org are working so well together. We just tried to make that a product, having seen it work so well as a process.

Zack Reneau-Wedeen [00:53:55]: That's another thing where I think the technical depth and understanding of the code was necessary to make the no code product so robust and extensible.

Demetrios Brinkmann [00:54:05]: Yeah, because if you try to start talking to the business side of the house about like, oh yeah, our evals are failing here.

Zack Reneau-Wedeen [00:54:14]: Right.

Demetrios Brinkmann [00:54:15]: Like there's a certain level that they just don't care.

Zack Reneau-Wedeen [00:54:19]: Right.

Demetrios Brinkmann [00:54:20]: And so I imagine you're abstracting a lot of things away from that type of user with the no code part, and you're allowing the software developer, if they want to figure out like, oh, let's go write a tool, or let's figure out how to make this work in whatever way with this API, then they can do that. But for the most part you're like, yes, I just care about, like, can I have it do this? Here's the, what did you call it? Goals and guardrails. Yes, here's the goal, here's the guardrails. That's all I want to give you.

Zack Reneau-Wedeen [00:54:53]: Right? Yeah, it's exactly right. And we work with companies that have, you know, very robust change management processes and they have their own concept of releases or continuous integration and deployment. And so being able to fit into that on the code side while also fitting into the quality assurance and review, oftentimes, which involves large teams doing review, filing issues, making improvements, has been a huge accelerator for the business. Because oftentimes when we were mostly a developer platform, we had companies we would sell to where they're like, hey, we really want our operations team to own this and we love your product, but just can't get on board with that. Over the last six months, now that we have equally good product for operations and CX teams that we have for developers, we've seen it be much more of a two thumbs up from all parts of the organization.

Demetrios Brinkmann [00:55:48]: Yeah, the amount of reviews that you need to have and the amount of meetings that you have to align on. Somebody told me the other day they were like, man, launch a model with one click, right? Like one click deploy. They're like, you get laughed out of the room in the enterprise if you have that. Because the one click part is not the hard part. Like, it's all those meetings and all that alignment and all of the reviews with the regular or the lawyers and the everybody that you have to talk with beforehand before you get to the click, that's the hard part.

Zack Reneau-Wedeen [00:56:22]: Right. And the business side as well usually has the outcome in mind that we're trying to drive here. And so having them on board, integrating with their change management processes is actually a huge accelerator for the deployment in a lot of cases where you can say, oh, things are moving so slowly, it's taking us months and months, but when you're actually on board with them, you get that same force but in the other direction. And so we've had customers like, you know, one of the 10 largest businesses in the United States. We went from the, you know, basically first meeting with them to live in about three months.

Demetrios Brinkmann [00:57:00]: Wow.

Zack Reneau-Wedeen [00:57:02]: Wow.

Demetrios Brinkmann [00:57:04]: I think that's a good place to end it.

Zack Reneau-Wedeen [00:57:06]: Okay, Demetrios, thanks so much for having me.

Demetrios Brinkmann [00:57:09]: Yeah, this has been awesome. I really appreciate it.

Zack Reneau-Wedeen [00:57:31]: One thing I didn't have on the list is our. We just launched, we just announced year two of our APX program, which is our program for new grads. Kind of combines engineering, product management and entrepreneurship. That's pretty fun. So I helped kind of conceive of that program and get it off the ground. And we have five people in the first year. They're amazing. So we're doubling down on it for.

Demetrios Brinkmann [00:57:59]: The second year and they stayed at the company. So yeah, break down with the company.

Zack Reneau-Wedeen [00:58:02]: They're like three months in and it's a one year program.

Demetrios Brinkmann [00:58:06]: Okay.

Zack Reneau-Wedeen [00:58:06]: Uh, we had it as 18 months and then things were moving so fast that we were just like, okay, this can only be 12 months.

Demetrios Brinkmann [00:58:14]: No way, dude. Wait, so this is somebody that got kind of gets to like parachute into meetings and is now.

Zack Reneau-Wedeen [00:58:22]: No, they're like a full time. The way it works is six months of product management, six months of engineering. Half the APX start on one, half start on the other and then they kind of flip. And you're working directly with customers, building agents, and then you have kind of some very large customers that you're working with where you're filling that role, as well as some smaller customers where you're kind of the end to end entrepreneur doing all the coding, customer management, product management. And it's a really good group, I think because it's so multidisciplinary. It attracted a lot of people that want to be founders and that are just awesome.

Demetrios Brinkmann [00:58:58]: Yeah. And you get the chops, you get to learn while you're doing it.

Zack Reneau-Wedeen [00:59:01]: Yeah.

Demetrios Brinkmann [00:59:02]: How do you think of that?

Zack Reneau-Wedeen [00:59:05]: So our two co founders, Brett and Clay, were in the first and third class of Google's APM program back in like 2002 or 2003.

Demetrios Brinkmann [00:59:14]: Yeah. Didn't Brett create something with Gmail or some. Some.

Zack Reneau-Wedeen [00:59:19]: I think his co founder created Gmail at a previous startup and he co created Google Maps.

Demetrios Brinkmann [00:59:25]: That's it. Yeah, Maps. He was the one with Maps. Right.

Zack Reneau-Wedeen [00:59:28]: There's the. He's told it on enough podcasts that I probably don't need to because everyone asks him, but basically rewrote the whole front end, which was the, I guess complicated part in a weekend after lots of different explorations. And I think is proud of the you know, reduction in the binary size that they got out of that.

Demetrios Brinkmann [00:59:49]: Nice. So then you saw that they did it and you were like, why don't we do that?

Zack Reneau-Wedeen [00:59:53]: So it's so Bread and Clay were class one and class three, and just remember it really fondly. And then I actually started my career at Google too, about 10 years later, also in the same program.

Demetrios Brinkmann [01:00:05]: Oh, really?

Zack Reneau-Wedeen [01:00:06]: And we have 10 or 11 people in Sierra who started as APMs at Google. So we had a lot of appreciation for that program, a lot of excitement. But then we also wanted to kind of reimagine it. In the era of AI, where you can be a bit more of a generalist, you want to be super technical, but you also want to put your product hat on and think. And we thought about what are some of the most successful people in these roles. It's future entrepreneurs that are going to go start their own company 2, 4, 5, 10 years from now, in Clay's case, 18 years from now. And so we wanted to create a program that we thought kind of honored both of those influences, and that's why we have kind of apx. It's kind of just the multidisciplinary intersection of a lot of different things.

Demetrios Brinkmann [01:00:51]: A lot of times you get entrepreneurial folks that have so much ownership that they, like, have a lot of opinion on how things should be done. And then I've heard from friends who have not necessarily the same type of program, but the similar things where they're like, oh, we want like an entrepreneur type to come, but then after they haven't for a while, they're like, actually, we don't want you to run the company. So do you see, like, how do you battle with that kind of, oh, the extreme ownership. And I want to do it my way, and I'm very opinionated. And this is how we need to do it, because that entrepreneurial spirit versus, like, hey, but we're also. We have the boss already. You. You're not the boss.

Zack Reneau-Wedeen [01:01:40]: I think there are two things that stand out a lot. One is kind of creating the systems and conditions where people can have that level of autonomy and ownership. It works pretty well at Sierra because as an enterprise company, we're working with a number of different customers. And so each agent is kind of a place where you can have that autonomy and ownership and customer relationship building and user researching. And so I feel like there's kind of a. I wouldn't call it a sandbox because it's in production, but there's a place where you can really you know, where autonomy is working as intended.

Demetrios Brinkmann [01:02:15]: Yeah.

Zack Reneau-Wedeen [01:02:15]: And then I think the other big piece is just having great leadership in the company, where even if you yourself feel like a leader or someone with strong opinions, you have a level of respect for the direction of the company, the thought that goes into it. And kind of, certainly in my case, and I think in each of the APX's case, view our founders view some of our early employees as role models that you kind of want to model yourself off of, which takes the edge off of kind of, you know, being more stubborn and those kinds of things. Because if you're objective about it, you know that you also need help, need to listen to people.

Demetrios Brinkmann [01:02:50]: Yeah. Especially at that point in your career, I would hope.

Zack Reneau-Wedeen [01:02:53]: Yeah, Yeah. I think.

Zack Reneau-Wedeen [01:02:57]: The humility, especially with respect to customers and also with respect to some people who have done it a few times before, that was a big reason I decided to start my career at a big company. You know, Google's full of some of the smartest people, and you can find the right people that you want to learn from.

Demetrios Brinkmann [01:03:14]: Yeah.

Zack Reneau-Wedeen [01:03:14]: And I think one of the cool things about Sierra is you have the pace and size of a startup, but you also have the experience and accomplishments and wisdom of, you know, if you were to take the people that you'd like to work with at a company like Google. And so that's been really fun.

Demetrios Brinkmann [01:03:30]: Yeah. And higher percent likelihood of. When you do get a mentor, it's a good one, because if there's a lot of people, there's also probably a bigger spectrum of what you can hit.

Zack Reneau-Wedeen [01:03:43]: Exactly. Yeah. You have to be selective in a large organization here. You probably will get lucky.

Demetrios Brinkmann [01:03:50]: Yeah. Well, all right, let's. Let's talk a little bit about.

Zack Reneau-Wedeen [01:03:54]: Is this a good height, by the way?

Demetrios Brinkmann [01:03:55]: Yeah, I think it. Wait, keep talking.

Zack Reneau-Wedeen [01:03:58]: Yeah. Can you hear me?

Demetrios Brinkmann [01:04:00]: Yeah, I think it's perfect.

Zack Reneau-Wedeen [01:04:01]: Okay.

Demetrios Brinkmann [01:04:01]: All right.

+ Read More

Watch More

Beyond Prompting: The Emerging Discipline of Context Engineering Reading Group
Posted Sep 17, 2025 | Views 966
# Context Engineering
# LLMs
# Prompt Engineering
Context Engineering, Context Rot, & Agentic Search with the CEO of Chroma, Jeff Huber
Posted Nov 21, 2025 | Views 245
# Context Rot
# Search
# AI Agents
# AI Engineering
# Chroma
Who Does What (And When) in MLOps?
Posted Oct 24, 2022 | Views 1.1K
# Maturity Level
# Team Agreement
# MLOps Journey
Code of Conduct