MLOps Community

Reliable Voice Agents

Posted Nov 18, 2025
Tags: AI Voice Agent, Voice AI Simulation, Coval

SPEAKERS

Brooke Hopkins
Founder @ Coval

Brooke Hopkins is the Founder and CEO of Coval, where her team is building the enterprise-grade reliability infrastructure for conversational AI. Coval provides simulation, observability, and evaluation tools that help companies rigorously test and monitor voice and chat agents in production.

Previously, Brooke led evaluation job infrastructure at Waymo, where her team was responsible for the developer tools that launched and ran simulations to validate the safety of autonomous vehicles. This work was foundational to scaling Waymo’s simulation capabilities and ensuring the reliability of self-driving systems.

Now, she’s applying those learnings from testing non-deterministic self-driving agents to the world of voice AI, bringing proven simulation and evaluation methods to a new class of complex, real-time AI systems.

Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.


SUMMARY

Voice AI is finally growing up—but not without drama. Brooke Hopkins joins Demetrios Brinkmann to unpack why most “smart” voice systems still feel dumb, what it actually takes to make them reliable, and how startups are quietly outpacing big tech in building the next generation of voice agents.


TRANSCRIPT

Brooke Hopkins [00:00:00]: We are recording.

Demetrios Brinkmann [00:00:01]: Yeah, yeah, we're recording. We're on.

Brooke Hopkins [00:00:02]: Oh, okay.

Demetrios Brinkmann [00:00:06]: We'll try again. So we did the Agent Builder Summit. You gave a workshop before that. It was awesome. I had a blast. Hopefully you did, but I wouldn't expect you to tell me if you didn't right now in front of everyone.

Brooke Hopkins [00:00:28]: No, it was amazing. We built a voice agent from scratch with Live Kit and then actually went and built out evals as we went along, which was really fun because it was actually how you build agents in production. Like, how do you actually take a voice agent, build a test, edit it, build a test, append new capabilities, add in different things and improve the agent over the course of the workshop.

Demetrios Brinkmann [00:00:53]: Yeah, there were three guys that left and I was like, how was it? And they all looked, like, shell-shocked because they were like, it was really good. I wasn't expecting that, but it was. It was good. And they, like, almost couldn't talk because of how good it was. And so that was a little bit of like, yes. Yeah. You know, it's good. A win.

Brooke Hopkins [00:01:12]: That's amazing.

Demetrios Brinkmann [00:01:13]: Yeah, yeah, it was cool. It was very cool to see. And there was a lot of voice AI companies at the Agent Builders big event too, which is cool because I tried to get a few of them there to make sure and showcase that, like, this is happening and a lot of really good products are being made with voice agents.

Brooke Hopkins [00:01:38]: Totally. I was really excited by how many people were interested in voice AI yesterday. Because in my head, over the past year it's certainly changed: last summer it was still very niche, a couple of companies playing around with voice AI, people saying, oh, this is finally working. So this is a really exciting space, but it was small startups experimenting. And now I think we see large companies. Almost every company is thinking about, how does voice AI fit into my agentic vision or our agentic plans? And that's really exciting to see, everyone from big companies to small startups thinking, how can I incorporate voice AI into my product? So we saw a lot of that yesterday.

Demetrios Brinkmann [00:02:21]: And I told you, I go nuts when I have to deal with the old voice. Like, what is it? IVR? Yeah, I go absolutely crazy whenever I get on a call and it's that. And so I hope that transition that you're talking about, of every company having this, happens sooner rather than later.

Brooke Hopkins [00:02:45]: Well, it's funny because I think people, when they say, I don't want to be talking to a bot on the phone, are really saying, I don't want to be talking to a bad bot on the phone. But really, if you could go. I think similar things were said about websites. And why would I want to go to a website versus just calling someone on the phone and making it happen? Or like, I don't trust a website with my credit card, or I don't trust things that are on this website. And then obviously there's a shift where millennials would much rather go to a website than talk to anyone. And I think that with voice AI, it's actually creating this really natural interface where you can really quickly just imagine, you can just call and change your appointment versus going to a website, logging in, forgetting your password, finding a new time, et cetera.

Demetrios Brinkmann [00:03:29]: Funny that you mentioned how much easier that is when we created these websites. And you would think like, oh, we've got it all on the website. Why don't you use the website? You can change everything there. You can do it. But then the actual act of doing it is so painful sometimes for that very fact, like clicking around, you don't know the interface, you forget your password, you gotta sign in, all that. So, yeah, 100% agree.

Brooke Hopkins [00:03:54]: Yeah. And I think that we forgot that talking is actually a very efficient way of doing things. Humans have developed speech over thousands of years and communicate information very efficiently. And so there's a reason why, when you're working on something, you're like, it's easier to just talk over the phone about this. And I think voice AI has this capability to create a very natural interface, especially in the age of AI, when you're talking to either a humanoid robot, or even if you're just trying to get a task done: you can say, okay, here's what I'm thinking, I need to book a reservation or I need to get this task done. Voice can be really natural.

Brooke Hopkins [00:04:33]: And so I think what we're going to see is that almost every company is going to have a voice interface of some kind for the interactions that make sense in a similar way that happened to mobile after web, where you had. Not everything is on mobile, but a lot of things are on mobile in different forms. And so I think every company will have to think about what interfaces make sense via voice for us and how can we make that really seamless so that instead of having to wait on the phone for 20 minutes to talk to a human, you actually have like a really seamless, fast interface to get things done.

Demetrios Brinkmann [00:05:06]: I know that the customer support is a huge use case. What other ways are companies thinking about voice?

Brooke Hopkins [00:05:14]: Yeah, I think what's interesting about voice right now is 80% of the use cases fall in the same couple of buckets, where customer support is overwhelmingly the use case. This is because the infrastructure is already there. So it's an interesting case of, you already had some level of automation with IVR. It's over the phone, so it's actually a very defined channel. You have one person talking to another person, so it's much easier to navigate. You have a pretty predefined set of things in place. And most companies, for customer support, will have standard operating procedures.

Brooke Hopkins [00:05:51]: Like, if you're a customer support person, these are the things you have to say. So I think that's why it's taken off so much. And also, it's a very large cost center for a lot of companies. But then we're also seeing things like healthcare. So being able to file claims, book appointments, all of these things that have to happen over the phone, because these legacy industries had very little software permeation: there are just so many providers, and such a high need for security and all this. But voice AI is cool because you have this universal API where you can have a human talking to an automated system. So you don't need to have software integrations. I think we see this with logistics too.

Brooke Hopkins [00:06:41]: So truck drivers, there's so many different parts of the logistics system and a lot of those don't have, you know, a truck driver isn't going to integrate a software system for if they're just doing, maybe they have 50 trucks or maybe they have 50 drivers. They're a small business and they're not necessarily looking to adopt all of these new software systems. And voice AI again is really cool because it doesn't matter whether or not all of them adopt it. One of them can adopt it, automate some things and then still be talking to a human on the other side.

Demetrios Brinkmann [00:07:16]: Wow, the universal API is such a good metaphor. It's like you can use tech now, but you don't have to have everyone else using tech, because everybody can talk. Mostly everybody. And so me, sometimes better than others. No, I wanted to make a joke there but it didn't come out. But the idea is having almost this behind-the-scenes thing going on where you can leverage tech, and the folks that are using the tech don't even need to know. Yeah, in a way they'll know because it's AI.

Demetrios Brinkmann [00:08:02]: And so yeah, I'm talking to a bot, but they don't need to know that. Like, well, you might have a really cool complex system in the background and they can just sink into it with this universal API.

Brooke Hopkins [00:08:13]: Exactly. And so it doesn't. Yeah. And so you can have doctor's offices or claims providers or truck drivers or HVAC system providers all of a sudden be able to use this voice interface. Especially if you're, let's say, driving around in the car all day servicing different HVAC systems, you're not necessarily in front of a computer, you're not necessarily looking to acquire a bunch of complex software integrations and.

Demetrios Brinkmann [00:08:42]: Setups, UIs that were just so painful.

Brooke Hopkins [00:08:45]: Exactly. But you can get in the car and say like, ok, what's my next appointment? Or have someone answer the phone for you while you're installing something. And now you haven't lost that next client. And I think there's also, there's all of these use cases that haven't been explored as much because there, there weren't even really like automated systems there before. So you know all these people that are in cars all day, how can you help augment those? Like if you're like, if you're a delivery person or if you're a police officer or if you are like doing, if you're like a Uber driver or like how can we integrate this into Waymo and self driving cars? How could you control that interface with, with voice AI.

Demetrios Brinkmann [00:09:30]: Yeah. Yeah. I was thinking about how you might want to help. I know I've heard of a company that is doing something around for doctors because they have to write a lot of crap after they see a patient and a lot of their time is spent writing things down about the visit that they just had. And so you can kind of extrapolate that out with a lot of different jobs. You have to have this component of. I need to write down what I just did and can we have it be a voice agent that interviews you about what you just did and then writes it down for you so you can just answer questions and you don't have to sit there and go into the UI and type it down or whatever. Or even better, like can the voice agent just be with you in whatever you're doing and then write it down and say, hey, here's what I got.

Demetrios Brinkmann [00:10:27]: Do you want to change anything?

Brooke Hopkins [00:10:28]: Totally. Yeah. I think that's definitely a big. It's like so much of our day to day lives happens via voice, right. You're like talking to your coworkers, you're talking to someone and then you want to recap it. You're like talking about the. You're trying to communicate your plans, you're trying to communicate changes. And so I think all of these places that you're talking to other people, there's opportunities for Voice AI to do, to free up a lot of that time to then do other more interesting things.

Demetrios Brinkmann [00:10:59]: Yeah. So now there's a big idea that we want to hit on, which is how to make this reliable.

Brooke Hopkins [00:11:07]: Exactly. I think that's the biggest, that's the biggest thing that people are nervous about with Voice AI is like, that's great that I could automate this. But these are my customers or these are my business partners or people that are important to what I'm doing every day. I don't want the AI to all of a sudden go rogue, which it.

Demetrios Brinkmann [00:11:26]: Has been known to do.

Brooke Hopkins [00:11:27]: Exactly. I don't want it promising refunds. I don't want it giving discounts, saying the wrong thing. And I think that's where something that's really hard is that every step in the conversation is an opportunity for failure. Right. So if you have a 10% failure rate at every point, it's pretty high.

Brooke Hopkins [00:11:52]: But let's say you have some percent of a failure at every point. That failure could potentially compound throughout the stack. So the chance that any one turn fails then all of a sudden becomes very high. So you might say, how will voice AI ever be reliable in that case? But we've actually built out systems like this across all sorts of other parts of software. For example, the Internet or a web application: every single part of the stack is super unreliable. Servers are unreliable, network connections are very unreliable, all of your APIs and all that jazz. But you create reliability out of redundancy. So you can have these fallbacks.

Brooke Hopkins [00:12:37]: You can. We've actually gotten to six nines of reliability for large scale web applications. And so I think the thing that I'm thinking about all day, every day is how do you create that same amount of reliability for agents while also still making them really autonomous?
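A back-of-the-envelope sketch of the compounding Brooke describes, with made-up numbers (the 1% per-turn failure rate and 30-turn call are assumptions for illustration, not figures from the episode):

```python
# Sketch: independent per-turn failures compound over a conversation.

def conversation_success(per_turn_failure: float, turns: int) -> float:
    """Probability that every turn in the conversation succeeds."""
    return (1.0 - per_turn_failure) ** turns

# A 1% per-turn failure rate over a 30-turn call:
print(conversation_success(0.01, 30))  # ~0.74, so roughly 1 in 4 calls hits a failure

# To keep 99% of 30-turn calls failure-free, each turn must be far more reliable:
print(1.0 - 0.99 ** (1 / 30))          # ~0.00034 allowed per-turn failure rate
```

This is why redundancy matters: the per-turn reliability needed for long conversations is much stricter than intuition suggests.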

Demetrios Brinkmann [00:12:51]: It reminds me a little bit, and I'm no expert and I definitely wasn't around in tech then, I just heard stories, of databases being super unreliable back in the day. And so that's why you had to create different kinds of databases. Maybe you shard databases, and Cassandra comes up. Or maybe you have these new ways of thinking about databases so that you have a database that's specifically never going to fail, and CockroachDB comes up, you know, it's meant for that reliability. And so now that we're at the beginning of this on the voice side, you're probably going to start seeing some of that, I imagine. Like, you're going to start seeing, okay, this voice agent is specifically, it's like the CockroachDB of voice models. It's not gonna have any problems.

Demetrios Brinkmann [00:13:50]: Or maybe what you're talking about too is also each piece of the system needs to be like a cockroach and that it doesn't fail or if it does fail, we have ways so that the end user doesn't realize it fails.

Brooke Hopkins [00:14:07]: Yeah, exactly. And that redundancy or fallback mechanisms. But yeah, I think that's something that people forget, is that building reliable web applications feels so obvious today. The idea that your application would go down if you have 10,000 users go to your site. It's actually just by default: if you deploy on AWS or Vercel and use serverless, et cetera, it just works. It just works out of the box. Everything is pretty secure out of the box. You have to try pretty hard to make a terrible web application today.

Brooke Hopkins [00:14:46]: But I mean, that's not to say that it's still easy, but there's a lot of things that have just become infinitely easier than they used to be. And even using RDS from AWS or any of these cloud databases, there's redundancy, they have leader replicas and then read replicas and all these things that you don't even have to think about. You don't have to think about your server racks, you don't have to think about all these things. But today with voice AI, you still very much have to think about a lot of these things. If my TTS provider and my LLM provider and my STT provider all have some variance in their latency, you have to be ready for all three of those things to not respond for two seconds. That's going to be a six second latency. And if you don't respond for six seconds, I'm going to be like, hello, anyone there? What's happening?

Demetrios Brinkmann [00:15:49]: I'm totally checked out.

Brooke Hopkins [00:15:50]: Yeah, exactly. It's super unnatural.

Demetrios Brinkmann [00:15:54]: Six seconds is an eternity in a conversation. Yeah, maybe for like normal.

Brooke Hopkins [00:16:02]: Yeah, that's like, if I don't respond for six seconds: hello, is anyone there? And that's not even, you know, API reliability, where a 99th percentile of a 1 second delay is notable. And so I think when you're building voice AI applications, you have to think about your latency budget across everything, and not only your latency budget on average, but your latency budget in the 99th percentile. Because if you have a 30 minute conversation, think about how many turns that is in the conversation, and how many opportunities for failure. And if any one of those fails, if the agent doesn't respond for six seconds at any point along that conversation, there's a chance the user just hangs up, or thinks that nothing is happening, et cetera. And now that whole conversation has failed.
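The latency-budget point can be sketched with illustrative numbers. The per-stage p50/p99 figures below are assumptions for the example, not benchmark results for any provider:

```python
# Per-turn latency budget for a cascaded pipeline (STT -> LLM -> TTS).
# p50/p99 figures are illustrative placeholders, not provider measurements.

budget_ms = 1000  # rough target for a natural-feeling response

stages = {
    "stt": {"p50": 150, "p99": 800},
    "llm": {"p50": 400, "p99": 2000},
    "tts": {"p50": 200, "p99": 900},
}

p50_total = sum(s["p50"] for s in stages.values())  # 750 ms: fine on an average turn
p99_total = sum(s["p99"] for s in stages.values())  # 3700 ms: a long, awkward pause

print(p50_total, p99_total)
```

The average turn fits the budget, but if every stage hits its tail at once, the pause is several times longer than the budget, which is why the 99th percentile matters as much as the median.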

Demetrios Brinkmann [00:17:01]: But the fallback, or the ideas that you've seen work, is it just, if something fails, you have a go-to line that you say: sorry, something's not working, I'm working to make it better? Or, I don't know, how do you deal with that? If you're noticing, all right, there are these failures that we've kind of figured out, but we know, since there's such high unpredictability, we're going to have some at some point. And so we need to give the user some idea that we're still there. It might be taking six seconds, but I have a quick canned response that I can use.

Brooke Hopkins [00:17:46]: Well, so this is where you actually need to have, if OpenAI doesn't respond, or this particular model provider or model instance, you can use similar models of the same family and hit those. But you can also have a fallback of, if OpenAI doesn't respond, then I'm going to use Gemini. You can have these quick fallbacks, or, whichever one responds first, I use that. This fallback logic is actually pretty non-trivial. That's why voice AI orchestration libraries like Pipecat or LiveKit are really useful, I think even more so than for other agentic applications, because a lot of this real-time streaming voice orchestration is pretty non-trivial. So how do you actually create these fallbacks? You can configure them with these libraries to fall back. They handle all the fallback orchestration so that you don't have to implement that from scratch.
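A minimal sketch of that "whichever responds first" idea in plain asyncio. The two provider functions are hypothetical stand-ins with simulated latencies, not real SDK calls, and orchestration libraries like Pipecat or LiveKit handle this kind of thing for you:

```python
import asyncio

# Hypothetical provider calls; the sleeps simulate provider latency.
async def generate_openai(prompt: str) -> str:
    await asyncio.sleep(0.3)
    return f"openai: {prompt}"

async def generate_gemini(prompt: str) -> str:
    await asyncio.sleep(0.1)
    return f"gemini: {prompt}"

async def first_response(prompt: str, timeout: float = 2.0) -> str:
    """Race both providers; return the first completed answer, cancel the rest."""
    tasks = [asyncio.create_task(f(prompt)) for f in (generate_openai, generate_gemini)]
    done, pending = await asyncio.wait(
        tasks, timeout=timeout, return_when=asyncio.FIRST_COMPLETED
    )
    for t in pending:
        t.cancel()
    if not done:
        return "Sorry, give me one moment."  # canned line if everything times out
    return done.pop().result()

print(asyncio.run(first_response("hi")))  # the faster simulated provider wins
```

The same shape works as a pure fallback (try provider A, then B on timeout) by awaiting sequentially instead of racing.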

Demetrios Brinkmann [00:18:44]: Yeah, that's cool. And the other thing is, isn't it a big vulnerability if at any one point you fail? It's almost like this can ruin everything for the user. Let's say that the user is patient enough to stick around. But do you have the potential of losing the context, or not? I think you were telling me that on the reliability aspect it's more high stakes. But is it just because the user is not going to be patient, or is it because, if you lose something, then you potentially lose the whole conversation and you've got to start over from scratch?

Brooke Hopkins [00:19:34]: Yeah, it's more that the user is impatient. So I think that's why we do a lot of these benchmarks around not only model latency but also model variation in latency. The thing that's just as important as the 50th percentile, as I said, is the 99th percentile. Because if the 50th percentile is amazing but the 99th percentile is much higher, then that person's going to be really impatient. Not responding for 30 seconds is all of a sudden a crazy amount of time, and that person is just going to assume that the whole thing failed.

Demetrios Brinkmann [00:20:13]: Oh yeah, yeah. If you don't respond for 30 seconds.

Brooke Hopkins [00:20:16]: You're. Somebody's hanging up, somebody's hanging up.

Demetrios Brinkmann [00:20:18]: Six seconds. It's awkward. 30 seconds, game over.

Brooke Hopkins [00:20:21]: Game over.

Demetrios Brinkmann [00:20:22]: Yeah, you see that?

Brooke Hopkins [00:20:24]: And so I think that's similar. So that's just one type of failure. But then there's all these other types of failures. I think these are the ones that get more airtime for enterprise leaders where you are issuing refunds or you're saying factually incorrect information.

Demetrios Brinkmann [00:20:40]: Air Canada.

Brooke Hopkins [00:20:41]: Yeah, exactly. Your voice agent is selling someone the car for $1.

Demetrios Brinkmann [00:20:47]: Chevy. Saying that Tesla is better than Chevy. Yeah, all those problems come up. And like you said, it's a very valuable conversation that you're having, so you could mess it up big time.

Brooke Hopkins [00:21:04]: Yeah, totally. And, and it's with customers too, which is a business.

Demetrios Brinkmann [00:21:09]: That's it. It's like a high stakes.

Brooke Hopkins [00:21:12]: Yeah, exactly. It's high stakes. And so I think that's where it's not only just performance, but now you have this quality aspect that I think is very different than SaaS tools previously as well. Where before it was like, the button should work every time, but that's really a performance thing: what percentage of the time is the button working? And it's usually pretty deterministic. But with voice AI it's now super non-deterministic, and what quality looks like is very different. There's obvious things where it goes right and obvious things where it goes wrong, and there's this hole in between of: it wasn't quite perfect. And so how do you do instruction following evaluation? Did it do the right thing at the right time? Did it take the right action on your behalf via function calling? All of this jazz, which opens up a whole can of worms of how do you make sure that not only is your performance there and you're saying the right thing, but you're saying the right thing at the right time.

Demetrios Brinkmann [00:22:07]: But you tested this, right?

Brooke Hopkins [00:22:09]: Yeah.

Demetrios Brinkmann [00:22:09]: With all the different providers.

Brooke Hopkins [00:22:11]: Yeah. So we've done benchmarks across all of these different areas. So TTS: measuring latency, measuring word error rate or the accuracy of the model, cost, and how do you balance these? I think there's a very common balance across all of these models: cost, latency and quality.

Demetrios Brinkmann [00:22:33]: That's the Venn diagram.

Brooke Hopkins [00:22:34]: Yeah, that's the Venn diagram. And some models are just going to be better at everything, but really there's no one-size-fits-all. Every model is going to have its trade-offs. And so how do you choose, for TTS, for STT, for LLMs, for voice activity detection? The really interesting thing about voice AI is that you have so many models all working together in order to orchestrate this. It's not just one LLM; you have all of these models chained together in order to produce a really high quality output. That's why we've done a lot of the benchmarks: because we want to be able to help as you're evaluating all these different models. Constantly, every single week, there are new models coming out. OpenAI just released a new real time model.

Brooke Hopkins [00:23:19]: Should I use that one? And all of these TTS providers and STT are constantly releasing new things. And so I think with voice AI more than ever, having real time benchmarks that are constantly monitoring the performance is super important for being able to understand what's happening.
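The word error rate metric Brooke mentions is a standard STT measure: the word-level edit distance between the reference transcript and what the model heard, divided by the reference length. A minimal sketch (not Coval's implementation):

```python
# Word error rate: word-level Levenshtein distance over reference length.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# One deleted word and one substitution: 2 errors over 6 reference words.
print(wer("book me a table for two", "book a table for you"))
```

Production benchmarks typically normalize text (casing, punctuation, numerals) before scoring, which this sketch skips.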

Demetrios Brinkmann [00:23:40]: I guess that's the important part for the majority of people. It's like how much is it going to cost, how fast is it going to be and how well is it going to actually say what it needs to say? But then on the how well does it actually say what it needs to say? There's a lot of nuance there, I would imagine.

Brooke Hopkins [00:23:54]: Definitely. Yeah. So that's why instruction following benchmarks are really interesting: is it following the right set of steps for a five minute conversation, for a 10 minute conversation, for a 30 minute conversation? And that's where I think, with real time models, it's still not obvious that you should use real time over cascading models, because the controllability isn't there. The instruction following is getting better and better, but if it's not following the instructions, once it's happening, it's kind of just happening. There's all sorts of ways you can inject context, which is interesting, and I think we're still in the early days of different voice architectures with real time.

Demetrios Brinkmann [00:24:35]: What do you mean by inject context?

Brooke Hopkins [00:24:37]: So for example, as the conversation is going, you can have a background agent saying, is this conversation going well, or are we missing any information? And so you can imagine that you have a real time model that's going back and forth, and then you have another model that's saying, what other information do I need? The person asks a question, and then it goes and does a lookup and then feeds that context back into the next turn. Regardless of whether you're doing real time or a cascading model (cascading is where you have speech-to-text, LLM, text-to-speech, versus using a voice-to-voice model), as the conversation progresses you want to inject the right context for what that turn is trying to do. And so I think that's where voice AI has actually been very early to context engineering, because there's just far too much context to feed into every single turn.

Demetrios Brinkmann [00:25:41]: That's exactly what I was saying. Like how do you keep that context on and the, the most important things right on a 30 minute call?

Brooke Hopkins [00:25:49]: Yeah, exactly. You're exceeding the context window. But even beyond that, you very quickly start to have this needle-in-a-haystack problem, because there's all sorts of extra conversation. Or if you're doing a RAG lookup, those tend to be costly lookups. And so doing that in a really efficient way, so that you can still fit within that one second latency budget when you're already using the latency budget for lots of other things, has always been a very hard thing for voice AI. And so a lot of interesting things are happening around people having graphs. So as you move through the graph, it's dynamically injecting context.

Brooke Hopkins [00:26:36]: Once you get to appointment scheduling, the availability has already been preloaded, or you're making sure that you inject the right step for the next turn. So instead of saying at the beginning, throughout this whole conversation, I want you to first ask the person for their name and their phone number and their availability and then send them an email follow-up, putting that all in one prompt might not perform as well as having a graph that's heuristics-based. And you say, you're at node 1 on this graph. And then after the next turn you're like, okay, now you're at node 2, now you're at node 3, and this is the step that you should be doing right now.
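One way to picture that graph idea: each node carries only its own instruction plus any data preloaded for that step, and the prompt for a turn is assembled from the current node rather than one giant prompt. Everything here (node names, the preload hook, the slot data) is illustrative:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Node:
    instruction: str
    # Hook for data fetched ahead of time, before the node is reached.
    preload: Callable[[], dict] = field(default=dict)

# Illustrative conversation graph for an appointment-booking agent.
GRAPH = {
    "collect_name": Node("Ask for the caller's name and phone number."),
    "schedule": Node(
        "Offer one of the open slots and confirm a time.",
        preload=lambda: {"open_slots": ["Tue 10:00", "Wed 14:30"]},
    ),
    "confirm_email": Node("Read back the booking and offer an email follow-up."),
}

def turn_prompt(node_name: str) -> str:
    """Build the per-turn prompt from only the current node's instruction and data."""
    node = GRAPH[node_name]
    return f"Current step: {node.instruction} Context: {node.preload()}"

print(turn_prompt("schedule"))
```

The per-turn prompt stays small regardless of how large the overall procedure is, which is the context-engineering win Brooke describes.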

Demetrios Brinkmann [00:27:16]: Yeah, I could see that. For if you're trying to also close somebody on something, close like a ticket or close a deal or whatever. And remember, you're stuck at node 4. Like, please try to move to node 5. It seems like you're not getting past node 4 for some reason. What's going on? Yeah, you can also add a little context there. So I could see that that seems like it could be very valuable. Now, what else was there on the reliability side, I think that there's like.

Brooke Hopkins [00:27:51]: So many interesting voice AI architectures that are coming out. I think earlier this year, or even last year, most people were just trying to make things work end to end. You were just trying to say, does the voice AI application respond to you with something relevant in a reasonable amount of time? And that was kind of step one, the first problem, the lowest bar. And it's been really cool to watch the industry evolve over the last year, where that's the first thing everyone cares about. Even in the workshop yesterday, as we were setting things up, we talked a lot about latency.

Brooke Hopkins [00:28:31]: And the reason we talked about latency is that nothing else matters if you can't even respond. Right. That's step one. And it's still not trivial to get really low latency for voice AI. Then step two is, can you finish the conversation, can you have some direction to the conversation and get to the end? And then, can you do more and more complex actions? I think that's where we're seeing a lot of really interesting voice architectures: having this graph method, or having background agents, or having multiple agents responsible for different things, one agent that's responsible for responding, another agent that's responsible for coming up with context. All of these different things people are coming up with. It's been cool to watch these architectures evolve in real time in front of us.

Demetrios Brinkmann [00:29:32]: Yeah. Zach yesterday was saying that they have a constellation of models or agents. And a lot of times, when the conversation is happening, or when the user is talking and they think that they might need something, they'll just fire it off. Because in the unlikely case that they might need it, they would rather have it and not need it than not have it and need it.

Brooke Hopkins [00:30:01]: Yeah, exactly. It reminds me a lot of Instagram, which in the early days was able to have insanely low latency, not because they were better at loading images than anyone else, but because they did predictive loading. Even before you entered your password, they were pulling up the first images and all sorts of content that you would see immediately as you logged in, or preloading the one to the left, the one to the right, the previous and the next one. I think there's tons of opportunity to do similar things with voice AI: I think this conversation is going in this direction, so I'm just going to preload this type of information, or I'm going to start thinking about what to do next.

Demetrios Brinkmann [00:30:48]: Oh, yeah. I didn't even think about the idea of, hey, the conversation seems to be moving to node 5, let's get all of that stuff ready right now, because it may take us a minute. I was thinking about, like, in the moment: I'm a slow talker, especially for a computer. I talk very slowly. Right. That didn't come out right.

Demetrios Brinkmann [00:31:11]: That made me sound like a computer.

Brooke Hopkins [00:31:12]: But you are.

Demetrios Brinkmann [00:31:14]: Especially as a robot.

Brooke Hopkins [00:31:16]: As a robot, I found it really hard. This entire conversation has just been generated video.

Demetrios Brinkmann [00:31:23]: Yeah.

Brooke Hopkins [00:31:24]: Brooke and Demetrios have not actually been sitting here.

Demetrios Brinkmann [00:31:28]: It would be amazing once it gets this good. But until then. We were talking about this earlier. It's still a little uncanny valley on the video generation.

Brooke Hopkins [00:31:37]: Yeah. It's like a little awkward head bob.

Demetrios Brinkmann [00:31:43]: Now you're trying to act like it. Yeah. So we are human, as far as we know.

Brooke Hopkins [00:31:51]: Yeah.

Demetrios Brinkmann [00:31:53]: We're here. And anyway, I was thinking, in the moment that I'm saying things, it's grabbing it, like, oh, well, the whole thing that Demetrios just said was two seconds long. They can get a lot of information in two seconds, and then they'll have that ready in case they need it and then be able to respond. If the whole process took three seconds, it's a one-second response. At least what I feel is a one-second response. Right. So I like that preloading idea.

Brooke Hopkins [00:32:28]: Yeah. I think there are also really interesting things around following the right steps when the steps are known, like with standard operating procedures, customer service, these types of things. But what about when you want special exceptions, or negotiating, or these really autonomous behaviors? I think those are really interesting. How people can start to handle this is: an agent can negotiate with, these are the bounds in which I'm kind of comfortable, this is how I would be reasoning about this social interaction. How you could start to get agents to navigate those types of situations autonomously is really interesting.

Demetrios Brinkmann [00:33:07]: Yeah, I'm always worried about the negotiation. I always feel like I'm not going to get a good deal here or the agent is going to just go to the lowest amount.

Brooke Hopkins [00:33:18]: Yeah, well, I think there are a lot of interesting techniques from sales, or like recruiting. Right. Large companies, when they're hiring, have hiring bands, and you can create incentives and trade-offs. I'm not sure; it's still very early days in that area. But how do you give an agent a set of tools where maybe they have a negotiation budget for customer service? Or maybe they have: this is the amount of refunds we can give out. Based on all the conversations you've had today, which ones are the most deserving of those refunds? Or memory across conversations. I haven't seen that happening today.
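The negotiation-budget idea can be sketched as a tool the agent calls, where hard bounds set by a human, not the model, decide the outcome. This is purely illustrative; the class name and the limits are invented for the example.

```python
class RefundBudget:
    """Bounded refund tool: the agent proposes, the bounds decide."""

    def __init__(self, daily_limit: float, per_case_limit: float):
        self.daily_limit = daily_limit
        self.per_case_limit = per_case_limit
        self.spent = 0.0

    def grant(self, amount: float) -> bool:
        # Approve only if both the per-case and the daily bound hold.
        if amount <= self.per_case_limit and self.spent + amount <= self.daily_limit:
            self.spent += amount
            return True
        return False

budget = RefundBudget(daily_limit=500.0, per_case_limit=50.0)
ok_small = budget.grant(40.0)   # within both bounds
ok_large = budget.grant(80.0)   # exceeds the per-case limit
print(ok_small, ok_large)
```

Keeping the policy in plain code rather than in the prompt means the agent can negotiate freely while the worst case stays capped.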

Brooke Hopkins [00:34:03]: But I think that's a really interesting.

Demetrios Brinkmann [00:34:06]: Area where like a cross conversation, it's like, oh yeah, Susie was just dealing with this problem too.

Brooke Hopkins [00:34:12]: Yeah. And we solved it this way. I think that's actually another area that people are exploring more: kind of reinforcing certain pathways. Like, I know I solved this problem this way previously and that worked. How can you embed that within your architecture so that you can have memory of it? Maybe it's memory, or reinforcement learning; there are lots of ways you could implement this. But how can you remember what things worked and then use that to make the agent better and better over time?
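One naive way to reinforce pathways is to record which resolution worked for which kind of problem and prefer the historically most successful one next time. The sketch below is a toy stand-in for the real memory or reinforcement-learning approaches Brooke mentions; all names are illustrative.

```python
from collections import defaultdict

class PathwayMemory:
    def __init__(self):
        # (problem, resolution) -> count of times it worked
        self.successes = defaultdict(int)

    def record(self, problem: str, resolution: str, worked: bool):
        if worked:
            self.successes[(problem, resolution)] += 1

    def best_resolution(self, problem: str):
        # Pick the resolution with the most recorded successes, if any.
        scores = {r: n for (p, r), n in self.successes.items() if p == problem}
        return max(scores, key=scores.get) if scores else None

memory = PathwayMemory()
memory.record("double_charge", "issue_refund", worked=True)
memory.record("double_charge", "escalate", worked=False)
memory.record("double_charge", "issue_refund", worked=True)
print(memory.best_resolution("double_charge"))
```

A production version would also need to handle the failure mode discussed next: cases that look like a known pathway but differ in a way that matters.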

Demetrios Brinkmann [00:34:51]: Yeah, that: better and better over time. And knowing, I've seen it this way and I can do it this way, so that's how I'm going to do it. I remember, again referencing the conversation yesterday with Zach, he was saying that still to this day, like 80% of Google searches have never been searched before.

Brooke Hopkins [00:35:14]: Yeah.

Demetrios Brinkmann [00:35:14]: And so the thing is that a lot of them are similar but not the same. And he said that's when the agents will get a little bit messed up, because they think they know and then they don't. They're like, oh yeah, this is the same as Susie. And no, actually it's a little different in this way. And the voice agent will be like, no, it's Susie, I know how to do this, this is how to do it. So that can get super tricky, I imagine.

Brooke Hopkins [00:35:44]: Definitely. I think memory in those cases is still really, really hard. Yeah, 80% of use cases haven't been searched before. It also resembles really autonomous systems like self-driving, where you've never seen this exact scenario before, right? That exact person with that exact dog, on this exact street, with this lighting and these weather conditions, with that car: every single step is new. And how can you create a system that's really autonomous while also being incredibly reliable? I think there are a lot of parallels with voice AI. You're going to have the same situations where you haven't heard this voice before, but with this background noise, with these frequencies, asking for this exact thing. And you want to be able to autonomously navigate that with incredibly high reliability. I think that's just a very exciting area: how can we create these truly autonomous systems?

Demetrios Brinkmann [00:36:48]: Yeah, that's such a great point. It's such a great parallel.

Brooke Hopkins [00:36:54]: And I think the sentiment towards voice AI reminds me so much of how people talk about self-driving, where people are saying either, you know, tomorrow we're going to have self-driving cars: one year from now, no one's going to be driving a car, no one will have a license, it will all just be autonomous cars, there won't even be regular cars allowed on the road. And then there's this other extreme of people saying self-driving is never going to work: look, it failed right there, it will never work, we'll never have autonomous cars.

Brooke Hopkins [00:37:32]: And the true answer is somewhere in the middle. It's going to take longer than we expect, and it's going to look different than we expect, but getting really autonomous systems that are reliable is totally in the realm of possibility for voice. And so I think we're both underestimating and overestimating voice at the same time, where we throw voice at something and say, it sounds really realistic, so it should just work for everything today. And then on the flip side, people are saying it failed on this one case, so we can never trust it. And I think there's somewhere in between. Right.

Brooke Hopkins [00:38:04]: It's going to take a few years and, like, iterations, but we'll totally get there.

Demetrios Brinkmann [00:38:18]: What is it? ISR oh, my God.
