Transforming Healthcare with AI: Automating the Unseen Work // Shaun Wei // Agents in Production
Shaun is the co-founder and CEO of HeyRevia, an AI company transforming the way healthcare providers manage complex phone interactions. With a background at Google Assistant and experience leading autonomous vehicle projects at Pony.ai, Shaun brings deep expertise in AI to his current venture. HeyRevia aims to streamline healthcare processes by automating tedious phone calls, making providers' lives easier and more efficient. Backed by Y Combinator (S24), Shaun and his team are committed to pushing the boundaries of AI voice technology to empower both healthcare professionals and patients.
This talk explores how AI can automate the invisible processes in healthcare, like insurance calls, to reduce administrative burdens. I'll share our journey with HeyRevia and how we create AI solutions to help providers focus on patient care.
Andrew Tanabe [00:00:06]: Hi everybody, welcome back to track 4 of Agents in Production. We've got a really great talk coming up from Shaun, who is going to talk about some of their work in healthcare and the call center. So Shaun, I'll turn it over to you. You've got about 20 minutes here for your presentation. I'll come back and give you a little bit of a time check towards the end of that, and then we'll have five minutes of Q&A from the audience. So Shaun, take it away.
Shaun Wei [00:00:34]: Thank you Andrew. Hi everyone, this is Shaun, and I'm the CEO of HeyRevia. Today I'm going to talk about creating AI call center agents for healthcare. Thanks also for inviting me to Agents in Production; hopefully you've been learning a lot. Let's get started with a little bit about me. I've been building AI for the last decade. I started my career at Google, building Google Assistant. If you remember, back in 2018 there was an AI calling restaurants and hair salons on your behalf.
Shaun Wei [00:01:12]: That was part of Google Assistant. After Google, I joined Pony.ai, building self-driving cars and learning how autonomous vehicles and autonomous agents work in production. Then I built an open source voice agent project with 6,000 GitHub stars, and personally I'm an AGI believer; you can follow me on the AGI Show. As for today's topic, we're going to cover why we should have AI call agents for healthcare and how the voice agent landscape has shifted over the last couple of years. Then I'll introduce how HeyRevia works and some of the real-world use cases HeyRevia handles right now for our providers and patients. First, why AI call agents for healthcare? If you don't know, more than 30% of healthcare operations run on phone calls. Those phone calls can include very simple calls, like appointments and check-ins.
Shaun Wei [00:02:22]: But the provider side is more complex. Providers need to work with insurance companies to handle things like credentialing, to help them get in-network, and to handle claims; if a claim is denied, they have to negotiate with the insurance company. Also, before you even go to the clinic, they're responsible for checking prior authorization with your insurance so that your insurance will pay for your visit. That's why the whole healthcare infrastructure runs on phone calls, and all those phone calls are really tedious and repeat themselves many times a day. So what is the current solution? If you own a clinic, or if you provide professional business solutions for the healthcare industry, you pretty much work with BPOs and call centers all the time. What actually happens is there are two sides of the business in healthcare: whether you're a clinic, a professional business provider, or an insurance company, you pretty much hire agents in call centers to talk to each other. Sometimes they may be sitting right next to each other, but they still have to go through phone calls to talk.
Shaun Wei [00:03:51]: Somehow there's no better way to handle this than going through all those phone calls, and those phone calls really do repeat themselves. When we looked across healthcare, we found there should be a better solution for all those call center agents, and AI agents have already gotten really good at talking to humans and understanding a scoped task. That's why we started building HeyRevia: to bring AI call agents into those call centers. So let's talk about the voice agent landscape: why people are excited about voice agents, and what new things you can build with the newer tools. One thing you'll have noticed over the last two years is that speech-to-text and text-to-speech have gotten so good, and large models themselves can now understand, starting with text, and now voice audio input directly.
Shaun Wei [00:05:05]: If you've seen the newer OpenAI Realtime API, you'll realize it can hold really fluent voice conversations. But then you get the question: with all that audio input and audio output, how do you actually control the voice agent in production? On the right is a simple graph of the pipeline-based voice agent platforms. These platforms, including popular solutions like Vapi, Retell, and Bland, are all built on top of the design in the graph on the right. Starting from the very bottom, you have the telephony layer, something like Twilio or Telnyx. Then you have the streaming input, which you have to handle with WebSocket and WebRTC. Then you have the speech-to-text ASRs, like AssemblyAI, Deepgram, or Whisper; those solutions convert the audio into actual text. That text is then understood by a text-based large language model. The large model right now doesn't work with your text stream; it works with text chunks.
Shaun Wei [00:06:25]: Meaning you have to make sure you have the complete text, when the sentence finishes, before you send it into the large model. Once you get the text back out, you use text-to-speech to generate the audio for the voice agent. That's how it works: it goes through the same pipeline from the telephony system all the way to text-to-speech. So what are the limitations, you may ask? With all these voice agents, you have to control the audio bytes in, the audio bytes out, and everything in between, which means any piece can add latency or fail. For example, if Twilio is down, your system is down.
Shaun Wei [00:07:16]: No phone calls can be made at all. The problem is that with these voice agents you have to control every little bit of information. Twilio gives you audio every 20 milliseconds; if your system isn't performant enough and gets delayed by 100 milliseconds, the user on the other side is going to notice a significant delay. That's why a lot of systems built on top of these pipelines really suffer on latency: once latency goes beyond 500 milliseconds, people notice.
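To make the pipeline architecture Shaun describes concrete, here is a minimal sketch of such a loop. Every component below is an invented stub standing in for a real vendor SDK (Twilio or Telnyx for telephony, Deepgram or Whisper for ASR, and so on); the frame sizes and the end-of-utterance heuristic are illustrative assumptions, not how any production system actually detects them.

```python
# Minimal sketch of a pipeline-based voice agent: telephony frames in,
# STT -> LLM -> TTS, synthesized audio frames back out.
# All components are stubs standing in for real vendor SDKs.

class StubSTT:
    """Buffers audio and emits text once an utterance looks finished."""
    def __init__(self) -> None:
        self.buffer = bytearray()

    def feed(self, frame: bytes) -> str | None:
        self.buffer.extend(frame)
        # A real ASR emits partials plus an end-of-utterance event; here
        # we pretend every ~1s of audio (8 kHz, 16-bit mono) is one utterance.
        if len(self.buffer) >= 16_000:
            self.buffer.clear()
            return "what is the status of claim 1234"
        return None

def stub_llm(utterance: str) -> str:
    # The LLM gets the *finished* text chunk, not the raw text stream,
    # along with the call's goal and context in its prompt.
    return f"You asked: {utterance}. Let me check that for you."

def stub_tts(text: str) -> bytes:
    # A real TTS returns synthesized speech; we return a second of silence.
    return b"\x00" * 16_000

def handle_call(incoming_frames):
    """Telephony delivers ~20 ms audio frames; every stage below adds
    latency, and beyond ~500 ms total the caller starts to notice."""
    stt = StubSTT()
    for frame in incoming_frames:
        utterance = stt.feed(frame)
        if utterance is None:
            continue  # mid-sentence: keep buffering, don't call the LLM yet
        yield stub_tts(stub_llm(utterance))  # stream audio back onto the call

# Fake call: 100 frames of 20 ms silence (320 bytes each at 8 kHz/16-bit).
for audio_out in handle_call(b"\x00" * 320 for _ in range(100)):
    print(f"sending {len(audio_out)} bytes of synthesized audio")
```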
Shaun Wei [00:08:06]: There are also the common issues with large models, like hallucination, and in healthcare that's really serious. For example, say you're working with a pharmaceutical company and the agent is telling the person on the other side something about a prescription. If it gets the milligrams or the volume of the medicine wrong, it might kill someone. That's why hallucination in healthcare is really critical, sometimes even deadly. Also, for a voice agent, holding a natural conversation with a human for 10 minutes while making sure the agent completes the task you asked it to do is extremely hard. So what the voice agent ends up doing is retrying the same types of calls, and sometimes, if it gets really confused, you'll have issues with the agent arriving at a random result.
Shaun Wei [00:09:11]: Then on the other side you have human callers. If you work with human callers, you know they can only make one call at a time, they're very limited in how much information you can share with them, and they have to be trained before they can start making those phone calls. So that's where HeyRevia comes in.
Shaun Wei [00:09:32]: What we do is put AI agents into AI call centers, at a fraction of the human cost and already with better-than-human latency and accuracy. How did we do that? We started by building an agent that thinks like a human. A lot of agents are built on top of pipelines, but our agent lives in the phone call, trying to understand the context through live input. Our AI actually understands IVRs: it understands why you're making the call, and it works out what the menu items are as it listens to the phone call.
Shaun Wei [00:10:19]: Then it decides: okay, should I press a button, or, if music is playing, should I wait on hold for a human to enter the call? It all goes through a similar parallel pipeline. Once the human enters the call, it first recognizes it's talking to a human; then, based on the objective you've set for the agent, it starts to negotiate with the human agent on your behalf. For us, handling a 30-minute phone call is pretty simple, the reason being that our AI works like a human. With a lot of voice agents, the issue you run into is that you don't get to control what's happening in the phone call. But since we process all the information in real time and make the AI agent work like a human, you can actually jump into the call and take it over, telling the AI: you just sit in the back seat.
Shaun Wei [00:11:17]: I'm going to take over the phone call and talk to the human directly, and then, after I give control back, you just continue the conversation for me. So the question is, how does this actually work under the hood? If you've seen the voice pipelines, you know every step has its own separate component. For Revia, this part is highly inspired by how self-driving cars work. When we say it understands all the information in real time, there's a perception layer: it's trying to understand all the audio bytes and all the information passing through the phone call, including whether music is playing, whether it's already talking to a human, or whether it's talking to an IVR system.
Shaun Wei [00:12:07]: It knows all of that through perception. Then it tries to predict the right way for Revia to behave. A very common issue humans run into is being put on hold for 30 minutes; there's nothing you can do, right? You just waste your time there. But Revia knows when it's waiting on hold, and what it does internally, once it knows it's on hold, is pause all the processing power, all the large-model work, for that phone call. It just sits silently and waits for the human to join the call. That saves a ton of tokens, just by predicting: okay, we should wait on hold for the human to answer the call.
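As a rough illustration of that perceive-then-predict loop, here is a toy version of classifying the call state and pausing model work while on hold. The states and the classifier are assumptions made for this sketch, not HeyRevia's actual implementation.

```python
# Toy perception layer: classify what the call audio currently is, and
# skip expensive LLM work while on hold. The classifier is a fake lookup;
# a real one would run on audio features (music/speech/IVR detection).
from enum import Enum, auto

class CallState(Enum):
    IVR = auto()
    HOLD_MUSIC = auto()
    HUMAN = auto()

def classify_audio(segment: dict) -> CallState:
    # Stand-in for a real audio classifier.
    return CallState[segment["label"]]

def step(segment: dict) -> str:
    state = classify_audio(segment)
    if state is CallState.HOLD_MUSIC:
        # Prediction: nothing useful happens until a human joins, so make
        # no model calls at all while waiting -- this is the token saving.
        return "idle: listening quietly, no LLM calls"
    if state is CallState.IVR:
        return "navigate: map the menu prompt to a keypad press"
    return "converse: hand the live transcript to the LLM to negotiate"

for seg in [{"label": "IVR"}, {"label": "HOLD_MUSIC"}, {"label": "HUMAN"}]:
    print(step(seg))
```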
Shaun Wei [00:12:58]: The next part is planning. If you listened to the previous talks, a lot of speakers discussed why voice agents are different from humans right now, and I think it's primarily the planning aspect. As humans, we know that when you're making a call there are certain steps you have to go through, and things you'll have to tell the other side, before you can complete the task. But if you only use a large model with one big prompt, there's no way to give the AI the information it needs to do the right thing at the right time. With a planning layer, the agent is able to think ahead over all the information it has and predict the next few steps, and if something goes wrong, it's able to correct the course of how it handles the phone call. The last bit is control. If the AI starts to hallucinate or get off track, it creates a simple guardrail to make sure the AI sticks to the main objective. For example, if we're working with pharmaceutical companies, you don't want the AI talking about your meals or your lunch, things like that. That's where the control comes in.
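A minimal sketch of what such a control-layer guardrail could look like: vet each candidate response against the call's stated objective before it is ever spoken. The keyword check below is a deliberately crude placeholder for whatever policy- or classifier-based check a real system would use.

```python
# Toy control layer: keep the agent on its main objective by vetting
# each candidate utterance before it reaches TTS. The keyword check is
# a crude stand-in for a real policy/classifier-based guardrail.
ALLOWED_TOPICS = {"prescription", "prior authorization", "claim", "member id"}

def on_objective(candidate: str) -> bool:
    text = candidate.lower()
    return any(topic in text for topic in ALLOWED_TOPICS)

def speak_or_redirect(candidate: str) -> str:
    if on_objective(candidate):
        return candidate
    # Off-track (e.g. chatting about lunch): steer back to the objective.
    return "Let's get back to the prior authorization request, please."

print(speak_or_redirect("Can you confirm the claim status for this member ID?"))
print(speak_or_redirect("By the way, what did you have for lunch?"))
```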
Shaun Wei [00:14:22]: So then you might ask: okay, this sounds great, Shaun, but how do you know if the AI really outperforms a human? Our philosophy is that if you're asking an AI agent to do human-like work, you have to evaluate it like a human. You can see how the graph works: we benchmark how a human and the AI perform in similar scenarios. We go through the transcripts, everything the human said, and on the AI side we compare the two against each other: how the human performs versus how the AI performs. You can see here that, based on our data, our AI already outperforms humans in a lot of those scenarios.
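A toy version of that evaluate-it-like-a-human idea, under my own assumptions about the rubric: score human-handled and AI-handled calls for the same scenario on one shared rubric and compare the averages. The criteria and the scores are invented for illustration.

```python
# Toy evaluation harness: score human and AI transcripts for the same
# scenario on one rubric, then compare. Fields and scores are made up.
from statistics import mean

def score_call(call: dict) -> float:
    # One point per rubric criterion met; a real grader might be an
    # LLM judge or a human reviewer reading the full transcript.
    rubric = ("task_completed", "info_accurate", "no_repeat_questions")
    return sum(call[criterion] for criterion in rubric) / len(rubric)

human_calls = [
    {"task_completed": 1, "info_accurate": 1, "no_repeat_questions": 0},
    {"task_completed": 0, "info_accurate": 1, "no_repeat_questions": 1},
]
ai_calls = [
    {"task_completed": 1, "info_accurate": 1, "no_repeat_questions": 1},
    {"task_completed": 1, "info_accurate": 1, "no_repeat_questions": 0},
]

print(f"human avg: {mean(score_call(c) for c in human_calls):.2f}")
print(f"ai avg:    {mean(score_call(c) for c in ai_calls):.2f}")
```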
Shaun Wei [00:15:16]: Next: Revia is now a standalone AI agent in itself; it's like a human, right? So you can ask, what's the right way to work with this new generation of voice agent that can act on your behalf? There are two types of interaction with Revia. One is through our Call Center API: when you work with Revia this way, you care about the work Revia generates for you, so we have something called the Work API, where you submit call work to Revia. If Revia hallucinates during a phone call and fails that call, it knows the right way to collect the needed information from you.
Shaun Wei [00:16:04]: For example, when we help providers call insurers, if the provider ID or NPI number is wrong, Revia identifies the issue during the phone call. It will either ask you for that information live, or complete the phone call and ask you for the information before retrying the call to the insurer. Through the Work API itself, Revia is able to identify its own issues, learn from its mistakes, and keep calling on your behalf. That's what we call the Call Center API.
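To illustrate the shape of that interaction, here is a hypothetical Work API client. The endpoint, payload fields, and `needs_info` retry flow are guesses at what such an API might expose, not HeyRevia's published interface.

```python
# Hypothetical Work API client: submit a call job, and if the agent
# flags bad input (e.g. a rejected NPI), supply a correction and retry.
# Endpoint paths and payload fields are invented for this sketch.
import requests  # third-party: pip install requests

BASE = "https://api.example-callcenter.invalid/v1"  # placeholder host

def submit_work(payload: dict) -> dict:
    resp = requests.post(f"{BASE}/work", json=payload, timeout=30)
    resp.raise_for_status()
    return resp.json()

work = submit_work({
    "task": "eligibility_check",
    "payer": "ExampleInsurer",
    "provider_npi": "1234567890",
    "member_id": "ABC123",
})

if work.get("status") == "needs_info":
    # The agent noticed mid-call that the NPI was rejected; fix and retry.
    corrected = {**work["original_request"], "provider_npi": "0987654321"}
    work = submit_work(corrected)

print(work.get("status"))
```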
Shaun Wei [00:17:03]: If you want more control over Revia yourself, we also have a call center UI. You can see we're making 10 or 15 calls at the same time, and just like when you supervise a call center, you can jump into any individual call. If you spot an issue, you can take over the phone call and talk to the human on the other side. That's the call center UI. Now, some real-world examples. A 30-minute phone call in healthcare really looks like what I've pasted here: you go through the IVR system trying to find the right path to a human; you provide a lot of repetitive information, like the NPI and member ID and a bunch of other things; you wait on hold for someone to come on; and finally you negotiate with the human for the information you need.
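Those four phases map naturally onto a small state machine. Here is a toy rendering, with phase names and transitions assumed from the description above rather than taken from any real system.

```python
# Toy state machine for the 30-minute insurance call described above:
# IVR navigation -> identification -> hold -> negotiation -> done.
def advance(phase: str, event: str) -> str:
    transitions = {
        ("NAVIGATE_IVR", "reached_queue"): "PROVIDE_IDS",
        ("PROVIDE_IDS", "ids_accepted"): "WAIT_ON_HOLD",
        ("WAIT_ON_HOLD", "human_joined"): "NEGOTIATE",
        ("NEGOTIATE", "answer_obtained"): "DONE",
    }
    return transitions.get((phase, event), phase)  # unknown events: stay put

phase = "NAVIGATE_IVR"
for event in ["reached_queue", "ids_accepted", "hold_music", "human_joined",
              "answer_obtained"]:
    phase = advance(phase, event)
    print(event, "->", phase)
```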
Shaun Wei [00:17:48]: Then, once you have everything you need, you wrap up the phone call. A lot of the work in healthcare is credentialing, prior auth, referrals, consultations, and billing inquiries. Those are the things that are really important for healthcare, and they take a lot of time to complete. But with Revia we're able to handle pretty much 99% of those cases at really high accuracy. So yeah, I have a quick demo here, but it's also on the HeyRevia website, and I think we may not have enough time to go through all the examples.
Shaun Wei [00:18:37]: So yes, I'll give the time over to Q&A, if you have any questions.
Andrew Tanabe [00:18:42]: Yes, thank you so much, Shaun. Really, really interesting work that you're doing in the healthcare space there, and great to get a sense of the architecture, and of the ability for humans, like management, to step into each of the calls and provide that level of accountability. It's really quite interesting. We do have a couple of questions here in the Q&A, and one that I wanted to ask first, again on this theme of working in healthcare as a regulated industry: are there any particular scaling challenges in your agents, your agent frameworks, or maybe your testing that are unique to healthcare, or to being under regulation like that, that you might not run into in a different industry?
Shaun Wei [00:19:38]: Yes. So in healthcare, the question people ask a lot is about compliance: how do you handle data and retention? For us, we host all the large models ourselves, so we have a retention policy for all of them and make sure each client's data, and any AI training on their knowledge, is only used for that one client at a time. That data is never shared across different providers, patients, or businesses. That's one thing that's super important to them. The second is how you handle all the PII in your systems. SOC 2 and HIPAA are mandatory if you want to operate in this space, and I think they're a really good starting point. Beyond that, you have to make sure all your solution providers, whether speech-to-text, text-to-speech, or any other service vendor, are also SOC 2 and HIPAA compliant.
Shaun Wei [00:20:50]: That's another important part for everyone who works with us: making sure everyone else you interact with is HIPAA compliant too. Then third, a government agency might come in, well, hopefully, and things are going to change, so if there's ever going to be a leak, you need to make sure all your systems are up to date with the security standards. Make sure there are no bugs, make sure security issues have been patched, and stay on top of any security issues you may encounter. Those are the things we make sure we follow.
Andrew Tanabe [00:21:28]: Yeah, yeah, I mean it's a lot, especially in the American context; maybe a little bit different in other countries, but I'd imagine quite a lot to handle. Another question that we have, from Kofo: are you fully integrated into the EHR systems directly, or is that something that you layer on top of when you're making these calls?
Shaun Wei [00:21:54]: Yes. So right now we layer on top of EHRs; we don't work with the EHRs directly yet, since we're still showing proof of work. So we rely on our customers to work with us. Since we're more like an AI call center, we make those calls on your behalf, but we don't directly integrate with your system yet.
Andrew Tanabe [00:22:18]: Yeah, no, that's great. So earlier in the conference today we heard a little bit about perspectives on the performance of these agents when they get out into the real world and are compared to benchmarks, whether academic benchmarks or real-world benchmarks like customer service numbers, like you were showing earlier. Your numbers are quite positive, which is really incredible. One question we have here is, in terms of customers and businesses using this technology: are they looking for performance better than their current solution, which is folks making these calls following various SOPs, or are they more concerned with not underperforming the worst performers in their current solution? Do you know what I mean? There are sort of two ways to set your threshold there: are you trying to be better, or are you trying not to be worse? What do you find in the market?
Shaun Wei [00:23:26]: Yes. So I think all our customers start out concerned with whether AI can actually outperform a human on these phone calls. But our results show them that we're much better than humans on these calls, and much more efficient. And yes, there are two levels of metrics. Large models do make some simple, stupid mistakes, and the harder part is how you prevent and catch those during the phone call. But we've shown our customers that with the right technology for handling those calls, even while cutting the cost, you can make the AI much better than a human in the same situation. One example: for some claims, it usually takes a human two to three phone calls to find the real reason a claim was denied, but it takes us only one to two calls; our AI can negotiate with the human representative and push back to get the actual answer. So yes, it actually outperforms humans in a lot of scenarios.
Andrew Tanabe [00:24:38]: Cool, great. Well, thank you so much, Shaun. I think that's all the time that we have for now. Really appreciate you coming in and sharing your thoughts here, and hope to see you soon.