Real-Time Voice Agents in Production: Lessons from Building Human-Like Conversations // Panos Stravopodis
SPEAKER

Panos Stravopodis is the Co-founder and CTO of Elyos AI, a London-based startup building AI voice agents that bring real human-like conversations to call centres. Before founding Elyos (a Y Combinator-backed company), he led large-scale engineering organisations as VP of Engineering, where he managed 150+ engineers building B2B SaaS platforms in Climate Tech and FinTech. With over 15 years of experience in distributed systems, MLOps, and product engineering, Panos is passionate about bringing together large language models and real-time voice synthesis to solve real customer service challenges. When he’s not scaling AI infrastructure, he’s usually sailing or experimenting with photography.
SUMMARY
At Elyos AI, we’re not building AI to pretend to be human - we’re building AI that can perform like one: completing real tasks, with real quality and real transparency. Our agents will tell you they’re AI. We’re not replacing humans - we’re building trust that AI can help humans work better. In this talk, I’ll share what it actually takes to make that vision real in production. From designing low-latency pipelines, to managing dialogue context and emotional tone across long calls, we’ve learned that delivering natural, useful conversations is as much an infrastructure problem as it is a language one. We’ll cover: How Elyos built a resilient real-time stack combining STT, LLM orchestration, and neural TTS. Engineering patterns for error recovery, context engineering, context retention and conversational coherence. The metrics (and human feedback) that actually predict trust and task completion. If you’re building or deploying AI agents that interact with people, this is a behind-the-scenes look at what breaks, what works, and how we’re bringing transparent, high-performance voice AI to the real world.
TRANSCRIPT
Panos Stravopodis [00:00:05]: Hi everyone. Very excited to be here. I'm Panos, co-founder and CTO at Elyos AI. We are building AI agents for home services. So plumbers, electricians, HVAC installers, fire and security companies, you name it. And today I'm here to talk about what we have been building, how we build it, and a couple of things that we found out as part of this journey. So first of all, we're Elyos. We offer end-to-end customer experiences for home services.
Panos Stravopodis [00:00:37]: As I said, what we do is that we really integrate with the CRMs and the ERPs of our customers and we are able to track all their communications in one platform. And we of course offer end-to-end experiences like booking appointments, doing invoicing, doing scheduling, taking payments, everything you would expect a human agent to be able to handle. We do that by building highly specialized workflows for our vertical. And that's something I want to cover as well as part of the tips and tricks around building agents in a bit. Let me take it back and start from one of the most typical scenarios that we see at Elyos, and this is out-of-hours calls, right? The typical scenario is that somebody on a Saturday evening realizes that the boiler is not working. They have a family, of course, and there's no hot water or heating. The first available appointment is probably on Monday.
Panos Stravopodis [00:01:41]: That is like 36 hours away and they need to figure out a way to tackle all that. Right. In this scenario you would expect the customer to try and google a couple of options and then start calling each of these businesses trying to see if they offer emergency appointments. The important thing here is that they need someone to attend ASAP. Sometimes it might be within one or two hours, sometimes maybe within four hours, depending on the type of service being offered. So what we do in this scenario is that we are able to answer all calls, emails and messages 24/7. So you're going to call one of our customers, the agent's going to pick up, it's going to help you triage the issue, because sometimes it might be, for example, that you have a fault in your boiler and it can be easily resolved. So our agents can really do the triaging and understand what the possible resolution paths are.
Panos Stravopodis [00:02:45]: If you need a call-out, it's going to go and book your appointment, send you an invoice and take the payment before the engineer is dispatched. And then at the back of that, the agent is going to kick off an on-call process to notify the engineer. Right. So we are able to handle all that. So after you book your appointment, we're going to start a proper on-call process, trying to find an engineer that can do the task. If they say yes, all good, we're going to assign the job and we're going to track resolution time after that. So taking a step back, everything we talk about is of course real time.
Panos Stravopodis [00:03:30]: You have customers that might be really stressed as part of this journey, and of course you have voice agents that need to perform really well at what they're doing. In the beginning you might think a voice agent is simple, especially if you're using a cascade architecture rather than an end-to-end real-time model. You assume that, well, we're going to get a speech to text component, we're going to connect it to an LLM, and then we're going to connect it to text to speech, and that's all done. It's a bit more complicated than that, because real-world interactions always come with noise, either as part of the phone call or because the communication is not very clear. So it's quite possible that the LLM is not going to be able to understand what's going on. Humans are really used to conversing in a very specific way, and each turn probably takes about 400 milliseconds. Agents, of course, need more time. Right, so this is something you definitely need to see how you can address.
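To make the cascade concrete, here is a minimal sketch of such a pipeline, with hypothetical stand-ins for the STT, LLM and TTS providers (not any specific SDK); the point is that each stage streams into the next so audio can start playing before the full reply is generated:

```python
import asyncio

# Hypothetical stand-ins for streaming STT, LLM and TTS clients; a real stack would
# swap these for provider SDK calls (a streaming speech-to-text client, an LLM API,
# and a neural text-to-speech API).

async def transcribe(audio_chunks):
    """Yield a transcript per caller utterance (stand-in for streaming speech to text)."""
    async for chunk in audio_chunks:
        yield f"<transcript of {len(chunk)} bytes>"

async def generate_reply(transcript):
    """Yield response tokens as they arrive (stand-in for a streaming LLM call)."""
    for token in ["Sure,", " let", " me", " check", " availability."]:
        yield token

async def synthesize(token_stream):
    """Turn the token stream into audio frames as text becomes available (stand-in for TTS)."""
    async for token in token_stream:
        yield b"\x00" * 320  # fake 20 ms audio frame per token

async def handle_call(audio_chunks):
    # Each stage streams into the next, so playback starts before the reply is complete.
    async for transcript in transcribe(audio_chunks):
        async for frame in synthesize(generate_reply(transcript)):
            pass  # ship the frame to the telephony leg as soon as it is ready

async def main():
    async def fake_audio():
        yield b"\x01" * 3200
    await handle_call(fake_audio())

asyncio.run(main())
```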
Panos Stravopodis [00:04:43]: And then the conversations can be much more complicated than simple tasks. If you have a workflow, you expect that, okay, I'm going to start from the starting node and I'm going to be able to move through all the different steps. Real conversations are not as simple. Very often we see users calling for one thing and then realizing that they actually need to do something completely different, and we need to be able to understand intent and, if things change as part of the call, adjust to them very, very quickly. So on the same diagram, if we try to add some average latency values to each of these components, we're going to see straight away that even in the best case scenario you wouldn't get to the 400 milliseconds, at least not easily, right? So we assume that speech to text is going to be anywhere between 100 and 300 milliseconds.
Panos Stravopodis [00:05:45]: I have to say this is one of the most consistent components, so it should definitely be there. Even if you use all the different providers, Deepgram, Speechmatics, Gladia, AssemblyAI, they all come in this range. Now, the LLM, I'm going to talk about it in more detail in a couple of minutes, but you should expect a range too. And this is of course the time to first token. We've seen a lot of different latency values depending on the provider and depending on which model you're going to use. Even with OpenAI, let's say, we've seen that GPT 5 and 5.1 are much slower than 4.1. So these are always things that you need to keep an eye on to make sure that you find the right model for this use case.
Panos Stravopodis [00:06:43]: And then text to speech is another interesting component. The reason is that you should again expect a range here, but depending on the voice that you're using, depending on the language, depending on where you are hosted, you will see things starting from 200 milliseconds and going all the way up to 400, 500 or 600 milliseconds. So in the best case scenario, the P90 range here would be somewhere between 500 and 1200 milliseconds, which is a bit challenging. So let's try to break it down and see what other things we can do to figure out how to improve it. The magic answer here is orchestration. The way we see success is essentially in four pillars: latency, consistency, context and recovery. With latency, of course, we covered the core components, but there's a bunch of other things that we need to take into consideration.
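Before getting into those other considerations, here is a quick back-of-the-envelope check of the numbers just quoted; the LLM time-to-first-token range is an assumption chosen so the quoted P90 total adds up:

```python
# Rough per-turn latency budget for a cascaded voice pipeline (illustrative values only).
budget_ms = {
    "speech_to_text": (100, 300),   # fairly consistent across providers
    "llm_first_token": (200, 300),  # assumed range; varies a lot by model and provider
    "text_to_speech": (200, 600),   # depends on voice, language and hosting region
}

best = sum(low for low, _ in budget_ms.values())
worst = sum(high for _, high in budget_ms.values())
print(f"First-audio latency: ~{best}-{worst} ms vs ~400 ms human turn-taking")
# -> First-audio latency: ~500-1200 ms vs ~400 ms human turn-taking
```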
Panos Stravopodis [00:07:46]: The first one, of course, is that depending on the framework you're using, or if you're building your own solution, when you get your workers set up you need to make sure that you can accommodate warm starts, right? So your workers need to be ready to go. When the call starts, you need to have figured out already which providers you're going to use, which providers are performing really well at this point, and how you make sure that you're not going to drop the call, at least from your infrastructure point of view. So assuming that you have a deployment, let's say, and you have workers spinning up and down, make sure that you have a very good, consistent way to handle all that and that you can drain all these connections, because you might have calls lasting for 15 or 20 minutes and it's not very easy to just say, well, okay, let's have the caller call back. What we found was that having regional clusters works really, really well. So try to host your pipelines close to your customers and close to the telephony providers. It's very common that, for example, if you're using Twilio, the default region is of course in the US, but then if you're hosted in Europe, your telephony hop is probably going to go all the way to US East 1 and come back to Europe. You definitely don't want that because it adds to the total latency. Another interesting thing we found was keeping your tools close to the orchestration layer.
Panos Stravopodis [00:09:20]: It's super important, so you should avoid introducing network latency when you can. Even if you're using microservices, try to reduce that and see if you can keep your code base really close to your orchestration layer. If you're using, for example, LiveKit, definitely see if you can just reuse your code base straight from your worker definition. And another interesting one was the LLM provider routing, right? You'll be surprised how often, depending on usage, you'll see different latencies from providers, even if we're talking about the same model. So in an OpenAI scenario you have of course the OpenAI-hosted endpoints, and you're also going to have the Azure endpoints, right? And depending on the time of day you're going to see that latency can spike quite a lot. So having something in place to monitor when you should use which location, and how you can quickly move between these endpoints, is really, really useful. So as I mentioned, speech to text models are pretty consistent, and for the LLMs, with the routing, you can figure out how to choose the right location and make sure they're always as efficient as possible.
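A minimal sketch of that kind of routing, picking between, say, an OpenAI-hosted and an Azure-hosted endpoint based on recent time-to-first-token measurements (the endpoint names, window size and percentile choice are all made up for illustration):

```python
from collections import deque

class EndpointRouter:
    """Route each request to the endpoint with the best recent time-to-first-token."""

    def __init__(self, endpoints, window=50):
        # e.g. {"openai": "https://api.openai.com/v1", "azure": "https://example.openai.azure.com"}
        self.endpoints = endpoints
        self.samples = {name: deque(maxlen=window) for name in endpoints}

    def record(self, name, ttft_ms):
        """Record an observed time-to-first-token for an endpoint, in milliseconds."""
        self.samples[name].append(ttft_ms)

    def pick(self):
        def p90(values):
            if not values:
                return 0.0  # no data yet: treat as best so the endpoint gets traffic and we learn
            ordered = sorted(values)
            return ordered[int(0.9 * (len(ordered) - 1))]
        return min(self.endpoints, key=lambda name: p90(self.samples[name]))

router = EndpointRouter({"openai": "https://api.openai.com/v1",
                         "azure": "https://example.openai.azure.com"})
router.record("openai", 450)
router.record("azure", 280)
print(router.pick())  # -> "azure"
```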
Panos Stravopodis [00:10:49]: Now, text to speech is very interesting because we've seen a lot of different spikes. In general the providers are pretty consistent and they're doing pretty well. However, there are times when they make the conversation very weird, and the reason is that if the generation of the voice is not consistent when you need it, the conversation starts becoming clunky. Right. So you need to make sure that you test the different providers, but also that you have ways to assess whether the generated voice is really good. We've built something in-house to do that, and we of course monitor all calls after the conversation has ended. But it's really, really important. Most of the time people spend their time monitoring time to first token on the LLMs and making sure that it is optimized, but this text to speech piece is super, super important.
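A sketch of the kind of per-request TTS timing you could record to spot those spikes after the call (the fields and the synthesize interface are illustrative, not a specific provider SDK):

```python
import statistics
import time
from dataclasses import dataclass, field

@dataclass
class TTSMonitor:
    """Record time-to-first-audio per TTS request so spikes show up in post-call review."""
    samples_ms: list = field(default_factory=list)

    def timed_synthesis(self, synthesize, text):
        start = time.monotonic()
        first_chunk = next(synthesize(text))  # wait only for the first audio chunk
        self.samples_ms.append((time.monotonic() - start) * 1000)
        return first_chunk

    def report(self):
        if not self.samples_ms:
            return {}
        ordered = sorted(self.samples_ms)
        return {
            "p50_ms": statistics.median(ordered),
            "p90_ms": ordered[int(0.9 * (len(ordered) - 1))],
            "max_ms": ordered[-1],
        }

def fake_tts(text):
    yield b"audio"  # stand-in for a streaming TTS call

monitor = TTSMonitor()
monitor.timed_synthesis(fake_tts, "Your engineer is booked for Monday.")
print(monitor.report())
```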
Panos Stravopodis [00:11:46]: And of course invest early in observability. The sooner you realize where your pain points are, the better it's going to be. And I have to say, most providers, speech to text, LLM and text to speech alike, are pretty good at supporting you when you know exactly what's going on and you can provide request IDs and all the details they need. Moving to the next pillar, consistency. That's a very interesting one because, as we know, LLMs are stochastic, right? And they're creative because of this. They can figure out a lot of things, but the reality is that you need to make these models as deterministic as possible. You should define your expected outcomes and allow for humans to be part of your loop if things don't go well. And always try to minimize the flows that you have per journey. So be very concise, don't try to do 10 different things in one runtime.
Panos Stravopodis [00:12:56]: Most of the time it might work, but you're always going to have this variability where you don't know when it's going to fail. The next one is an interesting one, and I'm saying here: avoid unnecessary RAG. With models being so fast and able to think, you might actually want to use the just-in-time context technique that I'll explain in a bit rather than doing very complicated RAG. And the reason is that by default your RAG is going to have an accuracy linked to it. So by fetching the wrong chunk from your pipeline, you risk introducing many, many errors into the whole conversation. So unless you really need to do it, try to think how you can fetch the right information in a more efficient way. We found that a lot of the time you can make this pretty deterministic, right? Or even inject it as part of the prompt when you know what the intent is going to be.
Panos Stravopodis [00:14:04]: And the last bit, which has been very useful to us, is to always benchmark your workflows against human agents. There's an assumption that human agents might be able to perform better than AI agents, but if you do a head-to-head comparison, the results will definitely surprise you. Let's talk about context engineering real quick. Right, so I mentioned the just-in-time context. And what that means is pretty much: always start with a minimal prompt. Even when you're using sub agents, try to strip it down to the very, very basics, and then figure out a way to inject, based on the intent and the classification you might be running behind the scenes, what exactly the next step is and what the model should have as part of the conversation.
Panos Stravopodis [00:14:55]: Try to avoid putting everything in context, and also try to be very smart about using sub agents. One of the problems with the current architecture of agents and sub agents is that the main agent is not always aware of what's happening in the sub agent. They still complete the tasks, but at the same time it's very possible that you're going to lose some of the context. So make sure that you always keep control of the general context, somewhere maybe in your backend, maybe in a different model, but there are certain ways that you can achieve that. Try to use smart summarization. And by smart I mean make sure that you have a way to understand what's important and what's not in your scenario, and always ensure clean handoffs. Right? So if you have different workloads or sub agents doing things, make sure that you define really well what the expected outcome is and how you see it coming back. One interesting thing we found was that if you leave tool calls in the context long after they've been used, you risk confusing the model and essentially getting your context to rot.
Panos Stravopodis [00:16:18]: So always have a way to summarize the results of the tool call, and especially if you don't need it, just take it away completely. Always assume that your context is going to be out of date soon, so have a way to validate that the conversation is still going in the right direction and still using the important information. And especially for voice, track sentiment and tone; it's very important because you can understand a lot about how the conversation is going. The last bit I want to talk about is recovery. It's really useful to track state in your backend and approach everything as a state machine. All your workflows and all your journeys should really be state machines, because then you can use really clear errors to guide the LLM through the recovery and you can increase the probability of getting a really good outcome. Always do post-runtime reconciliation. You know what you're trying to achieve, and you can have one or more models at the back of the conversation making sure that everything is correct, and if it's not, working out what steps to take to reconcile all that. Don't annoy users when input is unclear. Speech to text models, for example, are pretty bad at getting emails and postcodes, especially in the UK, right.
Panos Stravopodis [00:17:46]: So if you see that things are not going as expected and the model needs to ask a couple of times about the postcode, have a path for that uncertainty and make sure you don't spend too much time on it. Always have an escalation path to a human. That's very, very important. Sometimes, especially when callers might be stressed, they want to have this option and it's very important to offer it. And it can also be an escalation path that doesn't necessarily transfer to a human, but more like having a human in the loop, where essentially they can give you instructions for the workflow or approve something as part of the workflow. And the last one is to run a parallel stream with a fast LLM as a judge. The idea is that this runs in parallel to your conversation.
Panos Stravopodis [00:18:34]: Have a fast model always going through what's going on and making sure that you're still going in the right direction. If not, use your context to pull the model back to what you expect to happen. Before we move to questions, these are the key metrics that I think everybody should be monitoring. Of course, as we said, time to first token is super, super important. But then when we are talking about the conversation, groundedness is super critical. Making sure that everything the agent is doing makes sense, that you don't have hallucinations, and that you're moving according to the different scripts and workflows that you have.
Panos Stravopodis [00:19:18]: And then of course, when we're talking about conversation quality: interruptions, repetition of words, and after the call, the outcome. Did it work? Did the agent actually do what it was supposed to do? These are all very, very important. Two other things to keep in mind: general sentiment, both for the caller and for how the caller felt their request was handled, is something you need to keep an eye on, because it can give you a lot of insights on how you can quickly improve what the agents are doing. And the last bit is the most common failures, right? By just focusing on these scenarios, you can improve the quality of the agents very, very fast. And if you can automate that as well, that's super, super important. I think that's all good. Let's move to questions.
Adam Becker [00:20:13]: Awesome. That was incredible. Panos, thank you very much. I'm sure that we're going to give a minute for all of the folks to submit questions, but until they do, voice agents is a topic that is just so close to my heart. I have about 555 questions for you.
Panos Stravopodis [00:20:33]: That's great. I mean, while we wait, you can fire away.
Adam Becker [00:20:38]: Actually, the first one is coming in. Reem is asking: how did you handle the interruption from clients while the AI is speaking?
Panos Stravopodis [00:20:45]: Yes, that's one of the hardest problems, as everybody in the industry knows. There's a bunch of things you can do there. It's more about how you handle the text to speech triggering rather than the actual interaction, if that makes sense. Because what you want to do is have a way to make the generated voice pause in a natural way, because you're always going to have these interruptions.
Panos Stravopodis [00:21:14]: Right. And it's absolutely fine. But of course, as most of our agents are half duplex, it means that essentially only one of the participants can talk at the same time. By default, of course, LiveKit has a few techniques for handling interruptions. I would say the thing people should be doing differently is when to trigger text to speech. And some of the newer models already have this. I think Cartesia, as well, introduced something in this space where essentially you can have some backoff, so it can start generating voice.
Panos Stravopodis [00:21:56]: But it's very good at pausing at the phrase rather than pausing at a letter, as they currently do. Because instead of cancelling the whole text to speech, you can figure out how to complete a full word or sentence, depending on how far you want to go.
Adam Becker [00:22:14]: Okay, I think I got maybe 50% of that.
Panos Stravopodis [00:22:18]: So.
Adam Becker [00:22:19]: So are you saying that the idea is you can combat the interruptions by allowing the agent to kind of like pause, not mid word, but let's say at some point in the phrase, and therefore give an opening for a human to interrupt at a place that feels more natural? Or it's just that the interruption then doesn't sound nearly as like audibly corrupt.
Panos Stravopodis [00:22:43]: Exactly, it's the latter. So essentially it's similar to this conversation here, right. If you try to interrupt me, I'm still going to finish a word rather than cut off mid-sound, and that's pretty much what the models should be doing.
Panos Stravopodis [00:22:59]: It's similar to having an observer, right. So you have an observer pattern where essentially you monitor the conversation and you just make sure that you cancel text to speech when the full word or sentence, depending on how far you want to go, has actually been spoken. And that makes the interaction a bit better. So you're not handling the interruption per se, you're handling the perception of the interruption, if that makes sense.
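A rough sketch of that observer idea, deciding where playback should stop once the caller starts talking; the function and boundary choices here are hypothetical, not LiveKit's or Cartesia's actual APIs:

```python
import re

def playback_cutoff(spoken_so_far: str, full_text: str, stop_at: str = "word") -> str:
    """Given what has already been spoken when the caller interrupts, decide where to stop.

    Returns the prefix of the scripted reply that should still be played out so the
    agent finishes a word (or whole sentence) instead of cutting off mid-syllable.
    """
    boundary = r"[.!?]" if stop_at == "sentence" else r"\s"
    match = re.search(boundary, full_text[len(spoken_so_far):])
    if match is None:
        return full_text  # nothing left to trim; finish the utterance
    return full_text[:len(spoken_so_far) + match.start() + (1 if stop_at == "sentence" else 0)]

reply = "I can book an engineer for Monday morning. Does nine o'clock work?"
print(playback_cutoff("I can book an engi", reply))                      # finish the word
print(playback_cutoff("I can book an engi", reply, stop_at="sentence"))  # finish the sentence
```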
Adam Becker [00:23:20]: Yeah. Yeah, that's very cool. Nice. Subtle. I like that. Let's see. Reem, if that answers your question. Awesome.
Adam Becker [00:23:29]: Otherwise, feel free.
Panos Stravopodis [00:23:31]: I have a slide here at the end which is my LinkedIn. So if anyone has more questions, by all means add me on LinkedIn and we can have a chat.
Adam Becker [00:23:40]: Awesome. And stick around in the chat afterwards too. So we're going on to thank you.
Panos Stravopodis [00:23:44]: Reem.
Adam Becker [00:23:44]: Brian's question: thank you for your talk. Do you let the agents update the state themselves or do you trigger state...
Panos Stravopodis [00:23:52]: I'm not sure how to help you with that.
Adam Becker [00:23:54]: Not you. This is a poor management of interruption. So do you let the agents update the state themselves or do you trigger state updates from different specific events that occur?
Panos Stravopodis [00:24:09]: That's a good question. So it really depends. For flows where we've seen it working pretty well and consistently, you can have the agent, well, I wouldn't say necessarily update the state, but more figure out the intent and be able to choose tools and where to go next with more freedom. We have some very sensitive workloads though, for example invoicing, payments, things like that, where you definitely don't want the agent to automatically update the state. As I mentioned, this is kind of in the backend, right. So approach this as a state machine.
Panos Stravopodis [00:24:49]: Make sure that you have a very flexible way to define workflows and different steps. A good way to think about it is that it's similar to a step function. So essentially you move from point A to point B. What are the success criteria for this step? What does it look like if it doesn't work? Push that feedback back to the LLM, right. LLMs are really good at reading error messages and doing things differently, and you continuously iterate over that.
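A minimal sketch of that pattern, with a backend-owned state machine and tool errors fed back to the model; the step names and structures are illustrative, not the actual Elyos engine:

```python
from enum import Enum, auto

class Step(Enum):
    IDENTIFY_CUSTOMER = auto()
    BOOK_APPOINTMENT = auto()
    TAKE_PAYMENT = auto()
    DONE = auto()

TRANSITIONS = {
    Step.IDENTIFY_CUSTOMER: Step.BOOK_APPOINTMENT,
    Step.BOOK_APPOINTMENT: Step.TAKE_PAYMENT,
    Step.TAKE_PAYMENT: Step.DONE,
}

def run_step(step, tool_result):
    """Advance only on success; on failure return a clear error for the LLM to act on."""
    if tool_result.get("ok"):
        return TRANSITIONS[step], None
    # The error string goes back into the conversation so the model can ask the caller
    # for the missing detail or retry the tool with corrected arguments.
    return step, f"{step.name} failed: {tool_result.get('error', 'unknown error')}"

state, feedback = run_step(Step.IDENTIFY_CUSTOMER,
                           {"ok": False, "error": "no customer found for postcode SW1A 1AA"})
print(state, feedback)  # still IDENTIFY_CUSTOMER, with an error the LLM can read
```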
Adam Becker [00:25:18]: Yeah, I think that actually connects with another question somebody said. So I'm going to jump. I'm going to skip ahead a couple of questions and then we'll come back. So no worries.
Panos Stravopodis [00:25:27]: Sure.
Adam Becker [00:25:28]: We have one more. Someone is asking: can you say more about having a flow run a state machine in the background so that you can guide it towards better results?
Panos Stravopodis [00:25:38]: Yes. So as I mentioned, think of it as a set of step functions, right. In any journey you can have a predefined set of steps, or a mix of predefined, so deterministic, and stochastic steps. You can define workloads in a way where you always know that you're going to go from point A to point B. But point A or point B might either be a deterministic piece, essentially a piece of code that you're going to run through a function, or it can be another invocation of a model, right.
Panos Stravopodis [00:26:11]: So you can do a lot of smart things if you combine these two worlds. I'll give an example from our world, right. Let's say that I'm calling to book a boiler service. The first thing I'm going to need to do is identify the customer. If I don't identify the customer correctly, I cannot really move on to booking an appointment.
Panos Stravopodis [00:26:36]: Right, because I don't know who the customer is. But then you might want an LLM to actually understand, based on where the customer is, what the contract says and all these kinds of things, what the response time should be. So this is a good example of a workflow where you start with deterministic steps: identify the customer, deterministically create the job, essentially. And then you have a stochastic piece where you put a bunch of information in there and try to work out: can I service this customer? What's the cost? Based on where they live, what's the coverage, and all these kinds of things.
Panos Stravopodis [00:27:10]: Right. But then based on this output, you can continue to actually book an appointment, do the invoicing and everything else. Yeah, I hope that makes sense. It's kind of complicated, but yeah.
Adam Becker [00:27:25]: I'm really curious about the actual implementation of something like this. So first of all, you predefine these steps, and you also sort of guarantee a particular sequencing of them. So you say, well, first you've got to do this, otherwise you can't go and explore in the stochastic domain; you have to go through a deterministic phase before.
Panos Stravopodis [00:27:44]: Right, that's exactly it. And we built it all in house, right. It's our workflow engine. It's one of the things that we've built in house and it's working really well for us. Think of it as a combination of, you know, Zapier or n8n with the tool capabilities and the different functions that we have built in house.
Panos Stravopodis [00:28:06]: So it's complicated because essentially what you want is to give it enough freedom to be able to move from step A to step B, even if the inputs are slightly different. But then you're optimizing for very high reliability. As I said, if you're doing invoicing, payments, all those kind of things, they need to be 100% correct.
Adam Becker [00:28:29]: And for folks who are, let's say, not building these things in house, are there other frameworks that you would recommend that you think do a decent job?
Panos Stravopodis [00:28:38]: So we've tried a couple of frameworks around orchestration. The challenge is that a lot of them are pretty slow for voice; they do a pretty good job when you have email or WhatsApp communication, but most of them add both complexity and latency to the mix. So frameworks are great, but they're very opinionated, so you need to be very careful when choosing something to introduce, especially into latency-critical paths. But I would say that if the only thing you're doing is email or text communication, it's absolutely fine to use, you know, LangChain and the like.
Adam Becker [00:29:21]: On the other hand, we got a question from Diego. Hi Diego. What is your experience with cascade versus E2E architectures? Are there performance gains from using cascade for complex tasks?
Panos Stravopodis [00:29:34]: I would say real-time models started getting better after the latest release of Gemini and the 4o mini from OpenAI. Earlier we would see a massive difference in performance and reliability. Before that, especially in multi-turn conversations and on function calling, the real-time models were pretty bad, and I know, because we're pretty close with the team at OpenAI and the Google team, that they're doing a lot of work on that. The other limitation that we had with the end-to-end models was if you're looking for different languages or for particular accents, right. We are based in the UK, for example, and in order to handle all the different accents, the real-time model wasn't doing a great job, whereas we have some custom-trained models around ASR, so it's pretty important to be able to choose your components. But from a latency perspective, of course, I expect real-time models to be much better, because you don't need to do any more trips. Everything is running over there.
Panos Stravopodis [00:30:43]: But yeah, TL;DR: we are not there yet, but the future is very promising.
Adam Becker [00:30:49]: Okay, along the same theme then. Question from Komal Vendidandy: great insights, Panos. One, how much data is needed to train a new TTS agent, and can you share more details on the process? And two, can you share what percentage of calls are rerouted to humans? Maybe we'll keep that for after you're done with the first one. Okay, so training a new text to speech agent. How much data is needed?
Panos Stravopodis [00:31:12]: Yeah, we're talking about a text to speech model, I guess, not an agent, right?
Adam Becker [00:31:18]: I think that's.
Panos Stravopodis [00:31:20]: Yeah. So it really depends on the language and what you want to do. On average, most of the models can do pretty well with two or three hours of audio. The trick there, of course, is that you really need to simulate the tone and conversations that you want this voice to perform. You cannot have one voice, for example, doing customer service and then the same voice doing different things, because the tone is quite different, the interpretation is quite different, the pace is quite different. So you need to have at least three hours of really high quality audio for that.
Panos Stravopodis [00:32:05]: Now, the trick is, of course, if you want this to sound more natural over the phone, you might want to have a similar environment as well, so essentially recording in a similar room, because all these things are very important for the tone and the flow of the voice. Yeah. And what was the second part of the question? How many calls are being transferred?
Adam Becker [00:32:27]: Yeah. If you have a sense of the percentage of calls that are rerouted to humans, I think you said that you should build it in such a way that there's a. You can have.
Panos Stravopodis [00:32:35]: Yeah. So it really depends on the customer. We have some customers where we have 0% transfers, and the reason is that, because of the systems they have in place and the way the flows work, you don't really need to do any transfers. So the agents can handle 100% of the traffic. We have some other customers that want the transfers, mainly because it might be customers calling for quotes, let's say, in solar, right.
Panos Stravopodis [00:33:02]: And because the value of these deals is pretty high, they actually prefer talking to humans, because we're talking about projects worth maybe 20 or 30 thousand. So essentially it's pretty big value for them. But I would say, on average, about 15% of all calls require some kind of human touchpoint. Not all of them get transferred, but these are the ones where you would potentially like to involve a human.
Adam Becker [00:33:31]: Yeah. Even after the fact, you can go and review it and say, yeah, maybe a human would have been good here.
Panos Stravopodis [00:33:37]: Yes. So essentially we do that out of the box. As a call comes in, if we feel that this call should be handled by a human or that we should give a call back, it gets flagged in the platform. But the reality is it's mainly around the harder topics, for example complaints, or if there is a question that's very complicated and we don't integrate with the relevant system. It might be, for example, that they're calling about a job they did five years ago that was in a different system. So these are the kind of edge cases where you definitely need a human to handle it. Yeah.
Adam Becker [00:34:16]: Okay, so let's do a few other ones quickly. We'll do them lightning style. Quick question here: recommended agent architecture for complex workflows. We sort of covered frameworks, but architectures?
Panos Stravopodis [00:34:37]: I would say if it's very, very complicated, try to avoid too many sub agents, because if you go for more than seven or ten sub agents, let's say, things might not go as expected. So the more complicated it is, the more you should maintain, as I said, state in the backend and move between nodes from there. So keep the model as deterministic as possible. Make sure that your tools also allow for really good error handling and pass good feedback to the model, so that as you go through the different steps the model really knows what's happening, and, for example, if something fails, why it failed, because the models are really good at going back and figuring it out after that.
Adam Becker [00:35:26]: So this is a question that I wanted to ask as well. Ruslan is asking: can you explain the just-in-time context a bit more? What is it? How are you actually doing it? I mean, you seem allergic to RAG when it's not necessary. How does the just-in-time context help with that?
Panos Stravopodis [00:35:45]: The just-in-time context, again, is very domain specific. Of course, for a horizontal solution you still might need RAG, or to look at other options. But for us, because we know these workflows, what we can do is connect the expected context that needs to be in the agent so we can keep moving forward. For example, if I know that you're doing, let's say, an emergency call-out, the first step is going to be to provide the context to identify if it's an emergency or not. So in the next turn, what I'm going to add to the agent is pretty much: okay, identify if it's an emergency.
Panos Stravopodis [00:36:29]: These are the top five, or 10 or 20, job types you're going to check against. Then in the next turn you don't need that context anymore. The only thing you need is pretty much: it is an emergency. That's it. So you made it deterministic, right. You don't need to add again the conversation history around how you figured out that it is an emergency.
Panos Stravopodis [00:36:50]: Then you go to the next step.
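A small sketch of that just-in-time idea: inject the triage instructions only for the turn that needs them, then carry forward just the decided fact; all the prompts and structures here are illustrative:

```python
def build_turn_messages(base_prompt, facts, turn_context, history):
    """Assemble one turn's prompt: minimal base prompt + decided facts + this turn's context."""
    system = base_prompt
    if facts:
        system += "\nKnown so far: " + "; ".join(f"{k}={v}" for k, v in facts.items())
    if turn_context:
        system += "\n" + turn_context
    return [{"role": "system", "content": system}] + history

history = [{"role": "user", "content": "My boiler died and we have no hot water."}]
facts = {}

# Turn 1: inject the triage guidance just in time.
msgs = build_turn_messages(
    "You are a booking agent for a heating company.",
    facts,
    "Decide whether this is an emergency. Emergencies: no hot water, gas smell, major leak.",
    history,
)

# After the model answers, keep only the decision; the triage guidance is dropped next turn.
facts["is_emergency"] = True
msgs = build_turn_messages(
    "You are a booking agent for a heating company.",
    facts,
    "Offer the earliest emergency slot and confirm the address.",
    history,
)
```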
Adam Becker [00:36:52]: So first of all, you clean out the context that isn't necessary.
Panos Stravopodis [00:36:56]: Yeah, that's the just-in-time part. So essentially just in time in, just in time out. If you don't need it, and you're very confident that this is the correct decision, you just need that: well, this is an emergency. You don't need the context of how you got to that anymore. And it's the same with tools. I mentioned it on my slides as well. It's very common that people leave the results from tool calls in the context for a very long time after they're needed.
Panos Stravopodis [00:37:27]: And that can lead to a lot of different issues, because if for whatever reason the agent needs to call the same tool again with different parameters, then you start seeing it getting confused: well, should I take the result from the first tool call or from the second tool call?
Adam Becker [00:37:43]: Yeah.
Panos Stravopodis [00:37:43]: And of course they have different tool call IDs and everything, but it's all in the context, right. So it's a bit unclear.
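One way this pruning could look, assuming OpenAI-style chat messages where tool output arrives with role "tool"; the placeholder text standing in for a summary is illustrative:

```python
def prune_tool_results(messages, keep_last_n=1):
    """Collapse all but the most recent tool results into short placeholder notes.

    Assumes OpenAI-style chat messages where tool output arrives as {"role": "tool", ...}.
    Old tool payloads are replaced so the model cannot confuse them with the latest
    call to the same tool.
    """
    tool_indices = [i for i, m in enumerate(messages) if m.get("role") == "tool"]
    stale = tool_indices[:-keep_last_n] if keep_last_n else tool_indices
    for i in stale:
        messages[i] = {
            "role": "tool",
            "tool_call_id": messages[i].get("tool_call_id"),
            "content": "[stale result removed; see later calls for current data]",
        }
    return messages
```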
Adam Becker [00:37:51]: Yeah. Ricardo is asking: how do you handle memory in the context of low-latency applications like voice agents, and how do you inject the necessary information into the context without increasing latency? Well, you partially answered that question.
Panos Stravopodis [00:38:09]: Yeah, but I assume with memory we're talking more about previous communications, not necessarily the context of the current conversation, right. So we built something for that where essentially we have a very slim and consistent way for the agent to know about previous communications and interactions. Think of it as a quick, on-the-fly summarization: well, this caller has called, for example, two other times, they asked for this and that, and these are the IDs of the jobs that were created, let's say. But what you also need there is a flexible way to go back in time, because you might have, for example, a customer calling about something that happens every two years. You don't want to add all of that to the memory slash context, of course. But there are smart things that you can do if you know that a particular company runs campaigns every few months or every year, right.
Panos Stravopodis [00:39:11]: Or whatever, you can actually prepare all that for the agent, because you provide context to say: well, currently we're running this, that's how it works and that's how we expect things to be.
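A sketch of that slim memory injection: a few pre-computed lines about previous jobs, prepared when the call is answered rather than fetched mid-turn; the record shape is made up:

```python
def memory_snippet(previous_jobs, max_items=3):
    """Render a compact summary of past interactions to prepend to the system prompt.

    previous_jobs is assumed to be pre-fetched when the call is answered, so no extra
    lookups (and no extra latency) happen mid-conversation.
    """
    recent = sorted(previous_jobs, key=lambda j: j["date"], reverse=True)[:max_items]
    if not recent:
        return ""
    lines = [f"- {j['date']}: {j['summary']} (job {j['job_id']})" for j in recent]
    return "Previous interactions with this caller:\n" + "\n".join(lines)

print(memory_snippet([
    {"date": "2024-11-02", "summary": "annual boiler service completed", "job_id": "J-1042"},
    {"date": "2023-10-28", "summary": "radiator leak call-out", "job_id": "J-0877"},
]))
```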
Adam Becker [00:39:23]: Moving on, Panos. What do you do with languages other than English? I mean, it looks like many of the services seem to only exist in English.
Panos Stravopodis [00:39:34]: Right now it does. Well, we mainly operate in the UK and we have a couple of customers in the US, so it's all English based. But I am originally from Greece, so I've built a few agents that can do Greek, and we also have some people in the team from France, so we've built some agents that can do French. The reality is that models are doing much, much better in English. However, it depends on the language, and that's another beauty of the cascade architecture.
Panos Stravopodis [00:40:10]: Right. Because you're always going to have, let's say, Gladia doing a much better job in French, because Jean Luc of course is French, so they've prioritized the language. So being able to choose the right components is very important. The quality of the output is not the same, especially because of the way you prompt the agent and what it understands; in Greek, for example, it's not doing a terrific job. It works.
Panos Stravopodis [00:40:44]: But there's a lot of work that needs to be done there. I had some good results with translations though, because you can use Tory, let's say, or some other tools to do on-the-fly translations. And this is where it does a much better job, because then you still do all the function calls and all those kinds of things in English, so deterministic, and then you do the language translation on the fly.
Adam Becker [00:41:10]: Panos, I don't know if this is a stereotype or not. Last question. Is it true that the Greeks are sailors? Are they good sailors? Are you yourself?
Panos Stravopodis [00:41:21]: I am myself. Well, I mean, historically it's been true, and our presence in shipping worldwide is pretty big for the size of the country.
Adam Becker [00:41:35]: Are you sailing the Atlantic at any time soon?
Panos Stravopodis [00:41:38]: Not anytime soon. I really want to do it. I'm hoping to do it in the next couple of years. But yeah, it's a time-intensive dream, unfortunately, and I'm busy with other things right now, I would say.
Adam Becker [00:41:50]: I can imagine. Well, thank you very much. Keep us posted if you do decide to cross the ocean.
Panos Stravopodis [00:41:55]: Absolutely.
Adam Becker [00:41:56]: Please stick around the chat. I think a lot of folks have questions for you.
Panos Stravopodis [00:42:00]: Yeah, absolutely.
Adam Becker [00:42:01]: Or LinkedIn there as well.
Panos Stravopodis [00:42:03]: Sounds good. Yeah, absolutely.
Adam Becker [00:42:06]: It's an absolute pleasure.
Panos Stravopodis [00:42:07]: Thank you very much. Thank you everyone.

