Designing Voice Agents For Trust // Allegra Guinan // Agents in Production 2025
SPEAKER

Allegra is a technical leader with a background in managing data and enterprise engineering portfolios. Having built her career bridging technical teams and business stakeholders, she's seen the ins and outs of how decisions are made across organizations. She combines her understanding of data value chains, passion for responsible technology, and practical experience guiding teams through complex implementations into her role as co-founder and CTO of Lumiera.
SUMMARY
Voice agents are increasingly handling our most sensitive data, from healthcare records to financial information. We inherently trust voices more than text, a psychological bias that creates a unique responsibility: we must design voice agents that honor the trust users naturally place in them. This talk explores how thoughtful design choices shape responsible voice AI deployment. We'll examine how interface design affects meaningful consent, how conversation flows impact privacy, and how voice patterns influence trust. Drawing from real-world examples, we'll cover practical design principles for voice agents handling sensitive data. As voice becomes the primary interface for AI systems, getting these design fundamentals right isn't just good UX; it's an ethical imperative.
TRANSCRIPT
Allegra Guinan [00:00:11]: Thank you. Hi everyone. Welcome to Designing Voice Agents for Trust, a lightning talk addressing the top misconceptions and recommendations for trustworthy AI voice agents. My name is Allegra. I'm the co-founder and CTO of Lumiera. We are a boutique advisory firm focused on responsible AI strategies for senior leaders. In today's session, we'll cover the basics of how voice agents work, five misconceptions about building voice agents, and then five recommendations when designing for trust.
Allegra Guinan [00:00:41]: You'll walk away with an understanding of the importance of design in voice interfaces and how to put trust first while avoiding common pitfalls when designing voice agents. Obviously this won't be comprehensive as I have less than 10 minutes, so I'll do my best. Before we can talk about trust, we need to define what a voice agent actually is. It's a system that listens, understands, decides, and responds in natural language. These systems can turn human speech into structured, actionable interactions. The most common setup is sometimes referred to as a chained or modular architecture: the system converts voice input to text using a speech-to-text model, interprets it with an LLM, converts the response back to audio using a text-to-speech model, and then responds in spoken output. And that's the flow you can see at the top of this diagram.
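To make that chained flow concrete, here is a minimal Python sketch of the modular loop. The `transcribe`, `generate_reply`, and `synthesize` functions are hypothetical placeholders, not any particular vendor's API.

```python
# Minimal sketch of the chained (modular) voice agent loop:
# speech-to-text -> LLM -> text-to-speech.
# transcribe(), generate_reply(), and synthesize() are hypothetical
# placeholders for whichever STT, LLM, and TTS providers you wire in.

def transcribe(audio_in: bytes) -> str:
    """Speech-to-text: turn the raw audio into a transcript."""
    raise NotImplementedError("plug in your STT model here")

def generate_reply(transcript: str, history: list) -> str:
    """LLM step: interpret the transcript and decide what to say."""
    raise NotImplementedError("plug in your LLM call here")

def synthesize(reply_text: str) -> bytes:
    """Text-to-speech: turn the reply into audio for playback."""
    raise NotImplementedError("plug in your TTS model here")

def handle_turn(audio_in: bytes, history: list) -> bytes:
    transcript = transcribe(audio_in)                 # voice input -> text
    reply_text = generate_reply(transcript, history)  # text -> decision / response
    history.append({"user": transcript, "agent": reply_text})
    return synthesize(reply_text)                     # text -> spoken output
```

A speech-to-speech setup would collapse the first and last steps into a single model, which is where the latency savings she mentions next come from.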
Allegra Guinan [00:01:30]: And more recent applications are bypassing that middle step and using speech to speech or STS models the flow at the bottom of the diagram. And this naturally decreases latency but isn't necessarily the best choice for complex deployments. And these speech to speech models are still nascent, which means more susceptible to issues, so. So choose intentionally. There. Voice agents are commonly deployed in customer service, consumer electronics, smart devices, healthcare for things like transcription, an important call out here. Voice agents are not text based agents with a voice interface. And I'll get into that a bit later.
Allegra Guinan [00:02:07]: Okay, with that very base understanding, let's get straight into it. Misconception number one is that AI voice agents are unbiased and that gendered voices are harmless. When people talk about voice, they're also talking about the AI's vibe, and getting this wrong is off-putting. You unconsciously judge the AI based on the voice, and if it doesn't meet expectations, you hesitate. There are studies out there that show we lean more towards positive or neutral tones. And you may notice AI voice assistants commonly default to female names and voices, such as Alexa, Siri, et cetera. This is a significant bias rooted in historical and societal factors linking female voices with traits like cooperation, while associating male voices with authority.
Allegra Guinan [00:02:49]: So Designing AI voice agents based on these assumptions that those are the best to use, risk perpetuating these harmful stereotypes, while also excluding a potentially large group of users that don't resonate with those narrow range of voices, which can affect trust in your product. To mitigate this, you can offer customizable or gender neutral voice options and implement adaptive strategies to tailor voice characteristics based on user interactions, rather than relying on style static gender assignment for voices and also being sure to support varied accents, dialects and speech styles. And it's really not just about matching the voice or gender to the users, it's more about whether the voice fits the job. For example, we have expectations around a commanding voice for something like a supervisor role versus a helpful assistant tone. And if there's a mismatch with those subconscious expectations, it can feel like the AI is less competent. So even if it's technically fine, it messes with cognitive trust. And it's amazing how much comes across just in voice. So like the pitch, the speed, whether you sound confident or hesitant.
Allegra Guinan [00:03:51]: So the agent needs a consistent character that fits the brand in what it's doing. Misconception number two. More options equal better user experience. I'm sure some of you have built a chatbot with a single massive prompt that tried to handle everything. It feels logical, especially when you see how capable models are getting. And it can work at first, but then reality hits and what's actually happening in production is agents are getting stuck. They go down these rabbit holes where they start chatting instead of accomplishing the tasks. They lose track of where they are in multi step processes, and it can be difficult to integrate additional functionality for risk of breaking the conversation flows.
Allegra Guinan [00:04:29]: So what you have to be careful of here is that when users deviate from your expected paths, which they will for sure, you need a way to guide them back. So this monolithic model can kind of get lost trying to handle everything at once. So the solution here is delegation architecture, having a frontline agent that manages the continuous user interaction that's optimized for conversation flow. And this agent uses tool calls to interact with specialized backend agents for complex tasks. And you want to break down large tasks into small clear goals. So instead of just handle customer service, you could have identify user intent or collect this required information. And I would really recommend starting with limited tools to maintain focus. So not giving your frontline agent access to everything on day one, but instead adding capabilities gradually as you understand the actual conversation patterns of your users.
Allegra Guinan [00:05:21]: And the key insight here is that conversation management and domain expertise are different skills. So your frontline agent should be great at talking to humans, and your backend agents should be great at solving specific problems, rather than trying to have one model do both. Okay. Misconception number three. Speed isn't critical for voice. Many people think that because users expect AI to think or process information, voice interactions can tolerate delays. And this assumption is especially common when we're coming from text based systems where a couple second delay is hardly noticeable and we think the user will understand that the system is doing something complex. But in reality, those awkward silent moments during tool execution completely break conversation flow.
Allegra Guinan [00:06:05]: Imagine you're having a human conversation, somebody asks you a question and you just stare at them for five seconds before responding. That's what slow voice agents feel like. And that's when you lose user confidence and you get this interruption chaos when they don't know if the system heard them, so they repeat themselves or they try to interrupt, and then the system may not be able to handle that conversation pattern. So the solution here is responsiveness and confirmation. So for natural conversation, you need to respond in under a second. This is about you. This is not about user preference. It's really about the illusion of maintaining a conversation.
Allegra Guinan [00:06:38]: And tool actions need to be announced and voice agents need to provide immediate confirmation. So instead of going silent while it's processing, the agent should say something like, let me check on that for you, or I'm looking up your account information. So you're basically buying time while the tool executes and the user knows what's happening. And from a technical perspective, you can optimize your pipelines through things like model selection here and audio streaming and edge deployment to make sure you're focused on that low latency. And the key insight here is that in voice interfaces, perceived responsiveness is more important than actual processing speed. So users will forgive a system if it takes three seconds to think if it tells them what it's doing. They won't forgive a system that leaves them hanging in silence, even if it's just for one second. Misconception number four.
Allegra Guinan [00:07:23]: Text to voice is simple translation. Back to what I said at the beginning. Voice agents are not text based agents. With a voice interface, it's common to think that converting existing text content like an FAQ is into a voice interface is efficient and effective. But this doesn't account for natural speech patterns. We don't speak the way we write, and we can't design voice systems based on text conversations. There are so many variables to account for, like think about your own personal speech, your intonations Your accent, filler words you use, the background noise. All of this needs to be accounted for.
Allegra Guinan [00:07:54]: And the mitigation here is conversation first design. So training on actual conversations with human agents, expecting the unexpected, thinking of all those different variables and testing them, and making sure that AI can gracefully recover from a misunderstanding, it makes it feel more reliable. It's also an opportunity to reengage this failing forward mindset, always guiding the conversation back. Agents should also do that. That's part of the conversation first design. And it doesn't have to be perfect. When users don't see AI as perfect, they might actually use more clear language, deliver more specific requests, which could lead to a better outcome. Finally, last misconception, trust is inherent.
Allegra Guinan [00:08:32]: So people trust voice more than other interfaces. That doesn't mean they will naturally trust your voice assistant. And you have to earn that trust. And that's the final recommendation. Trust first, security. So security is normally invisible unless you design for visibility. And your trust, which is rooted in emotion, depends on things like feeling your data is locked down. That means super clear policies on how your data is used, being really careful about logging sensitive information.
Allegra Guinan [00:08:58]: And people don't trust agents that make the same mistake twice. So the AI needs to get smarter over time. Feedback mechanisms are not decoration. They should be intentional design choices for you to make informed iteration decisions. So to wrap it up, voice agents should embrace diversity, use specialized architecture, show appropriate uncertainty, design for spoken conversation, and prioritize transparent security. Trust is earned through consistent, respectful and transparent interactions. Designing AI voice agents for trust is way more than just the tech specs. Trust is the interface.
Allegra Guinan [00:09:32]: Hopefully you've taken away what I promised at the beginning. I know I sped through that. So if you have any questions or want to dive deeper or are interested in learning about what we're doing to help organizations, please feel free to reach out. Thank you.
Skylar Payne [00:09:46]: Awesome. Thank you so much. I love that. You know, even though this was focused on voice agents, I found myself thinking, oh, this is also important for non-voice too.
Allegra Guinan [00:09:56]: So 100%. I think all of this could probably be applied to any AI agent that you're building or any application in fact.
Skylar Payne [00:10:05]: Cool. So it looks like there's some ways to get in touch here. So folks, please, you know, get in touch if you have questions. We might have a question in the chat. Okay, yeah, there's a question from someone in the chat that we can kind of run through real quick before you hop off. So Chris asks. Most people that call in for customer service are confused, frustrated, or require a walkthrough of the process. How do we handle all the variables with the stable and controlled talk track? Are we silently switching between voice agents depending on circumstances?
Allegra Guinan [00:10:41]: Really good question. And I think this is where training on actual conversations is super important. And again, going back to the really clear goals, rather than having this very open ended voice agent that you're deploying that can, you're assuming can handle any of these variables, especially someone that's upset, all of that emotion, having very clear goals to guide the person that they're speaking to to answer the questions that they can then take action on. So if they need some piece of information from the customer that's calling in to go make some tool call or fill out some form pull information, being able to guide the conversation to accomplish that specific goal rather than having it be really open ended and I think having fallback mechanisms always in place and making sure that you have human in the loop. One of the other things you can do here is have a complimentary agent. So I mentioned like the front line and the back end ones, but having one that's sort of on the side monitoring conversation for drift to see if it's going off track so that you can, you know, if you need to have some sort of intervention.
Skylar Payne [00:11:43]: Awesome. Thank you so much. I think that's the end of our time here, but we definitely appreciate you coming and sharing. Again folks, make sure to use those get in touch links. Make sure to scan this QR code to check out the AI Leadership Accelerator program. There's a great slide deck on the other side of that QR code, so feel free to check it out. And with that we're going to say goodbye and take care.
Allegra Guinan [00:12:10]: Thanks everyone.
