MLOps Community

How LiveKit Became An AI Company By Accident

Posted Sep 22, 2025 | Views 89
# Product Market Fit
# Open Source Project
# LiveKit

speakers

Russ d'Sa
Founder and CEO @ LiveKit

Russ d'Sa is the founder and CEO of LiveKit, the infrastructure company powering real-time voice for OpenAI's ChatGPT, Character.ai, and numerous AI applications.

Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.


SUMMARY

Russ d'Sa shares how LiveKit went from a small open-source project during the pandemic to powering voice interfaces for giants like OpenAI. He talks about the turning point when LiveKit teamed up on ChatGPT’s voice features, the challenges of making AI sound human, and why voice could be the future of multimodal AI. It’s a story of chance, big shifts, and building the backbone of tomorrow’s AI.


TRANSCRIPT

Russ Dsa [00:00:00]: We started LiveKit right in the middle of the pandemic. Grew really fast. We had a lot of, like, big companies that started to use it.

Demetrios Brinkmann [00:00:07]: Product market fit right from the get go.

Russ Dsa [00:00:08]: But we didn't have a product. I mean, we had an open source project. I got pinged by a really large cloud provider and they're like, can we buy you? Can we license LiveKit Cloud? Or maybe we'll just kill you with open source.

Demetrios Brinkmann [00:00:28]: You're the accidental AI company.

Russ Dsa [00:00:31]: Wow. I didn't think you were going to open with that, but yes, yes, we are.

Demetrios Brinkmann [00:00:34]: Jumping right into it.

Russ Dsa [00:00:36]: Right in. Let's go.

Demetrios Brinkmann [00:00:38]: How did that happen?

Russ Dsa [00:00:40]: Oh, man, that's a crazy story. I've told it once on a podcast, but I'll tell it again since you probably have a bigger audience than they do.

Demetrios Brinkmann [00:00:48]: I think you're overestimating me. My mom listens to this podcast and that's about it.

Russ Dsa [00:00:54]: I can tell you a funny story. My mom listens to a lot of podcasts. This one, not this one, probably. She's a huge fan, though, of Dylan Patel.

Demetrios Brinkmann [00:01:02]: Oh, I just interviewed him yesterday.

Russ Dsa [00:01:04]: Did you?

Demetrios Brinkmann [00:01:05]: That's so funny.

Russ Dsa [00:01:05]: Yeah, it's so crazy.

Demetrios Brinkmann [00:01:06]: The sad part is that this machine right here crapped out 10 minutes in. So it wasn't a podcast, it was a conversation.

Russ Dsa [00:01:14]: Oh, no. Yeah, yeah. He's so smart.

Demetrios Brinkmann [00:01:16]: He is.

Russ Dsa [00:01:17]: My mom told me he's so smart. He wasn't necessarily demonstrating how smart he was when we were talking, but you should put this part in too. Shout out to Dylan. I can tell you a funny story about Dylan too. He'll be mad, though, if I tell you that story on air.

Demetrios Brinkmann [00:01:32]: Off air.

Russ Dsa [00:01:33]: Off air, off air. We'll talk about that. He's awesome, though. So how did we become an AI company? So LiveKit's been on kind of a journey. We started LiveKit right in the middle of the pandemic, kind of towards the start, in 2020. The pandemic was a weird time in the world. You couldn't leave your house.

Russ Dsa [00:01:53]: Right.

Russ Dsa [00:01:54]: And the only way you could connect with other people was through the Internet. And the way people did that predominantly was using a camera and a microphone on their computer.

Russ Dsa [00:02:04]: Right.

Russ Dsa [00:02:07]: It turns out that when you go to do that, like transmit data from your camera and your microphone, the Internet as we mostly have known it for the last 30 years wasn't designed for that. HTTP stands for...?

Demetrios Brinkmann [00:02:28]: You're testing my knowledge. Huh? No idea.

Russ Dsa [00:02:32]: It stands for hypertext transfer protocol.

Demetrios Brinkmann [00:02:36]: Yeah, I never would have got that.

Russ Dsa [00:02:38]: So it was designed for you to transfer text, HTML, which is just text, over a network. It wasn't called Hypervoice Transfer Protocol or Hypervideo. And during the pandemic, we needed a protocol for transferring data over a network that was richer than text: audio and video data from your camera and your microphone. And so there is another protocol for doing that called WebRTC. But it's just not commonly used. Right up until the pandemic, there were really only three apps at scale that were using that protocol.

Russ Dsa [00:03:19]: Google Meet, Discord and Zoom. And Zoom uses kind of a custom version of it. So it should be no surprise that when the pandemic hit, those are the three apps that everyone was using, because they were the ones designed for this kind of world we found ourselves in. But if you were a developer that was trying to build your own application that needed real-time audio and video, you had to build a ton of infrastructure yourself. Think of it as like you didn't have Stripe for this kind of need. You had to go build all of this stuff yourself to interface directly with the payment gateway. In our world, here, we're talking about networks, raw kind of network protocols.

Russ Dsa [00:04:04]: And so I discovered this limitation of how much I would have to build myself by actually working on a side project. During the pandemic, I was working on a Clubhouse for companies when Clubhouse was kind of...

Demetrios Brinkmann [00:04:16]: Oh my God, Clubhouse.

Russ Dsa [00:04:17]: I know, right?

Demetrios Brinkmann [00:04:18]: Forgot about that.

Russ Dsa [00:04:19]: Throwback.

Demetrios Brinkmann [00:04:21]: Very pandemic.

Russ Dsa [00:04:22]: And so I was trying to build a Clubhouse for companies and I had to build all of this infrastructure myself to actually ship the product. And so I'm like, maybe this infrastructure that I had to build myself could be valuable to other people. So I pinged my, you know, now co-founder David. We had done a company before that and sold it to Medium. And so I pinged him and I said, hey, I want to build this side project, but I also think it'd be cool if we could give this infrastructure away to other developers so that they can use it. And that was what turned into LiveKit. It was an open source project that built real-time audio and video streaming infrastructure in a world where you needed to connect with other humans over the Internet, and it was very hard to do so or to build an application that needed to do so. And so that's how we started.

Russ Dsa [00:05:23]: We grew really fast. We had a lot of like big companies that started to use it. You know, Spotify and Adobe and eBay and Oracle and all this stuff.

Demetrios Brinkmann [00:05:32]: So product market fit right from the get go.

Russ Dsa [00:05:34]: It was product market fit almost from the get go. But we didn't have a product. I mean, we had an open source project. Yeah, okay, but we didn't have a commercial product. And so we decided, hey, like, these big companies are using us, let's go talk to them and find out why. And what they said was like, hey, we love the product you've built. We can see your code clearly. You guys are good programmers.

Russ Dsa [00:05:50]: Or my co-founder is, maybe not me so much. Full credit to him. But can we have a commercial product? Like, we don't want to deploy and scale this ourselves. We want you to deploy and scale this for us and run a network and we'll pay you money. So we said, okay, well that's a path to continue working on this for the long run, you know, so why don't we do it? So we raised some money and we started working on this LiveKit Cloud system, which is like this global network for streaming audio and video. We won't go too much into the details of it because that's not really the question you asked me, is it? How did we become an AI company? But I promise this part is kind of relevant. So we launched the cloud product at the end of 2022. And at the end of 2022, ChatGPT the website comes out and I'm like, wow, this GPT-3 model is like so good. Or 3.5 is so good.

Russ Dsa [00:06:43]: It feels like I'm texting with a human. I'm like, what if I took LiveKit and I paired it with this website and like I combine the two things together, you know, run chat.openai.com in a Puppeteer session, a headless browser, and then like take the speech, convert it into text, pipe it into the headless browser into like the div tag, hit the submit button.

Demetrios Brinkmann [00:07:06]: You were so ahead of your time.

Russ Dsa [00:07:07]: Like, what if I jerry-rigged this thing together where you could talk to ChatGPT instead of text with it? Voice AI wasn't a space, right? I think Deepgram had been around. But I put out this demo, I tweet it, and I'm like, I'm for sure going viral.

Demetrios Brinkmann [00:07:23]: Like, you felt it in your bones.

Russ Dsa [00:07:25]: I'm like, this is Her, like Samantha from Her. I'm definitely going viral. Except without the voice. And it just goes nowhere. Nobody notices. Well, 90 people faved it, so maybe 90 people noticed, but it definitely didn't go viral. So I was pretty disappointed. Fast forward five months later, there's a really long, amazing story that I won't tell because it'll take a long time. Something happened in that morning that was pretty interesting and I'm like teasing it. Maybe you can just cut it out if it's too long.

Demetrios Brinkmann [00:08:01]: Yeah, we got full director's cut.

Russ Dsa [00:08:04]: I got pinged by a really large cloud provider. I think it was August 4th, 2023. So five months after I put out this demo, I get pinged by a large cloud provider and they're like, I go have lunch with them for five hours and they're like, can we buy you? Can we license LiveKit Cloud and deploy our own network? Or maybe we'll just kill you with open source. And I'm like, damn, that's usually what happens after seven years. It's happening after two. I was like, I'm flattered and I'm also terrified. And so they say, well, they're like, look. I'm like, well, how much would you offer? Just curious, you know, maybe I'll.

Russ Dsa [00:08:45]: Maybe I'll sell. Who knows? What's the number? And they're like 20 mil.

Demetrios Brinkmann [00:08:49]: And I'm like, not a bad payout.

Russ Dsa [00:08:52]: Not bad for a two-year project, but we raised 15.

Demetrios Brinkmann [00:08:55]: Oh, okay. So I was like, not the best outcome.

Russ Dsa [00:08:58]: Not going to be the best. So sorry, guys. They're like, wait, wait, wait, wait. They're like, look, if you can somehow tie what you're doing, this audio, video streaming stuff, if you can tie it to Gen AI.

Demetrios Brinkmann [00:09:12]: No, you're like, you didn't see my tweet.

Russ Dsa [00:09:16]: 200, 200 million, if I could tie it to gen AI. And I said, dude, that's ridiculous that you said that. And I can't tie it to gen AI. We're a video conferencing, live streaming company. I didn't think about my demo.

Demetrios Brinkmann [00:09:28]: No.

Russ Dsa [00:09:30]: So I leave this meeting. My co-founder and I are talking. David and I are talking. We're like, terrified. This large company's coming after us. What do we do? We're like, okay, I guess we just got to move fast. That's the only advantage we have. Driving home after, you know, during rush hour, I get this email that comes in and it says, hello from OpenAI, and I jump on a call with them.

Demetrios Brinkmann [00:09:55]: They found you because of what you would.

Russ Dsa [00:09:57]: And they're like, we found your blog post and the demo you built that you tweeted. And we have been wanting to build a voice interface to ChatGPT. And so three weeks ago we signed up with a personal Gmail, so you didn't know it was us. This is, this is the content right here.

Demetrios Brinkmann [00:10:20]: Serendipity.

Russ Dsa [00:10:21]: And we love the platform and now we want to build this thing for real and ship it. And so I was like, okay.

Demetrios Brinkmann [00:10:36]: You still didn't connect the 200 million at that point. You were like, I got to go back to those guys.

Russ Dsa [00:10:40]: You know what's funny? I say now that I did and I'm like, wait, I have a gen AI story. I didn't really. Actually, I didn't connect it. I don't know. I think I was too like, just mind blown. What's also interesting is that, you know, we got in a month of building this with them and we shipped voice mode in September, the first version of it, before Advanced and all of the GPT-4o and all of that stuff.

Russ Dsa [00:11:04]: And in that moment was when I realized that this is an AI company. And this soundbite exists everywhere in other places too, because I've said it a lot. The realization was, oh, man. OpenAI wants to build AGI. AGI, in my opinion, is a human-like computer. It's a synthetic human.

Russ Dsa [00:11:30]: If we're successful and get all the way there. Synthetic human. How are you going to interact with a synthetic human?

Demetrios Brinkmann [00:11:37]: You're going to talk to it.

Russ Dsa [00:11:38]: You're going to interact the way that humans interact. Like how we're interacting, and we use eyes, ears and mouths predominantly to interact with one another. The equivalent sensors, the equivalent senses for a computer, how it gets that information, are cameras, microphones, speakers. It turns out that LiveKit, which was built for connecting humans to other humans using cameras, microphones and speakers, you can swap out one of the humans, because all of a sudden that computer is now human-like. And you can use the same technology to connect a human to a machine. And so if OpenAI and Anthropic and Gemini and the labs are going to build the brain, LiveKit has an opportunity to build the nervous system to carry signals to and from that brain wherever they originate.

Demetrios Brinkmann [00:12:33]: Wow.

Russ Dsa [00:12:34]: And with ultra low latency. And so then I was like, wow, this is going to be. This could be. It has a good shot of being a core piece of infrastructure that is the backbone of multimodal AI. And in the future, you won't have to qualify it as multimodal AI, it'll just be AI. Right, because that's where this is all going to go. Once the model gets smart enough and capable enough, it's all going to be multimodal. So that's how we became an AI company.

Demetrios Brinkmann [00:13:09]: I was not expecting that when I asked the question. I did not think that it was going that deep. But what a story. What a way to just live serendipitously and recognize that it's crazy. You just rode the wave, man.

Russ Dsa [00:13:24]: You know what? Justin Kan is a friend and an investor. First check, actually, into LiveKit.

Demetrios Brinkmann [00:13:31]: And he was the guy that started Twitch, Justin.tv, right?

Russ Dsa [00:13:34]: Yeah. Justin.tv was Twitch.

Demetrios Brinkmann [00:13:35]: Twitch.

Russ Dsa [00:13:36]: And we were just talking about this the other day, where Emmett, one of his co-founders, who was the CEO of Twitch and, very briefly, maybe, the CEO of OpenAI.

Demetrios Brinkmann [00:13:45]: Oh, my God, I remember that too. But so many dramas.

Russ Dsa [00:13:48]: I once ran into Emmett at a party at YC in the early days and I had taken a job at 23andMe and like, anyway, that's not really part of the story. I ran into Emmett at a YC party in the early days, and I was asking him, how is Justin.tv going? It wasn't Twitch yet. And he was like, you know, we're doing great. We're running these interactive ads for horror movies. We're getting paid. But then there's this other thing going on on Justin.tv, which is everyone keeps streaming StarCraft videos, so we're thinking about doubling down on people streaming StarCraft videos. And, you know, that became Twitch. And throughout my whole career as an entrepreneur, I've just been like, why the heck can't that happen to me? Why can't I get lucky? Where people just start to use something in a way that I never predicted and then I can just ride the wave into success.

Russ Dsa [00:14:39]: You can't engineer that.

Russ Dsa [00:14:40]: Right.

Russ Dsa [00:14:41]: And so it's kind of surreal, honestly, that it ended up happening, because I've been thinking about, like, why can't I just have that happen? Why can't I get lucky? And then it ended up happening. It's wild.

Demetrios Brinkmann [00:14:53]: Maybe there's things that now you're like, okay, I've seen this five times and it works out in the end, so I'm not going to be so stressed about it.

Russ Dsa [00:15:00]: I think it's interesting because it's like my fifth company and I have a lot of learnings from the previous ones, but I think the learning that I have can be kind of like distilled down into one important point. Not even necessarily that I've seen it play out and I missed something; more so that the thing that I didn't do with my previous companies was I did not think long term. Always think long term when you're doing something, right? What kind of value are you creating long term? What kind of movement are you attaching yourself to long term? In the past, I've kind of optimized for more reactive things, like how do I compete with this other person, or how do I position myself around what is like the zeitgeist at this moment and build for that.

Russ Dsa [00:15:51]: Right.

Demetrios Brinkmann [00:15:51]: You guys don't have an MCP server.

Russ Dsa [00:15:54]: You know what? We actually might, but I. You don't. You.

Demetrios Brinkmann [00:15:57]: But that's not what you're talking about.

Russ Dsa [00:15:58]: We might. That's not what I'm talking about.

Demetrios Brinkmann [00:15:59]: You know, and that's not what stresses you. I imagine whether or not you have an MCP server isn't something that keeps you up at night these days.

Russ Dsa [00:16:06]: It's not. And for what it's worth, Shane built the MCP server if we have one, so blame him. I had nothing to do with it. I wasn't part of the conversation. So it's all about him. We're going to talk about that later actually, because he's not thinking long term. I'm just kidding. Hi, Shane.

Russ Dsa [00:16:21]: No, but Shane's awesome, by the way.

Demetrios Brinkmann [00:16:23]: Sounds like.

Russ Dsa [00:16:24]: Yeah, but you know, in my past companies we've always optimized around some kind of like myopic or small or near-term thing. What we did differently this time was we actually didn't start LiveKit to be a company.

Russ Dsa [00:16:44]: Right.

Russ Dsa [00:16:44]: We started it as an open source project.

Demetrios Brinkmann [00:16:46]: Because you had a pain.

Russ Dsa [00:16:47]: Because we had a pain. And I actually was. This is the truth. DZ, my co-founder, and I, we thought that we were going to build a Bitcoin wallet.

Demetrios Brinkmann [00:16:56]: DZ sounds like a Bitcoin name.

Russ Dsa [00:17:00]: And I think he puts like "Bitcoin maxi" in his Twitter bio. Maybe he does, I don't know. But if he doesn't, he is a Bitcoin maxi, so don't talk to him about crypto. You'll never hear the end of it. But. Hi, DZ. He's awesome too. But no, we were like, you know what? We want to work in crypto. Crypto's getting hot again.

Russ Dsa [00:17:20]: See, not thinking long term. But in the meantime, like, we don't really have a good crypto idea. So we'll like just try to build some value for people. Hey, you know what, like we're in this pandemic and people need to like use audio and video a lot. Why don't we make that infrastructure easier for developers? We've been developers, we've benefited from open source. Why don't we give it back and like build something so that they can go faster. And so that was really what it was. We're like, you know, there's going to be more video on the Internet, not less.

Russ Dsa [00:17:47]: The Internet's going to get more real time, not less real time. And so that's truly like how we started. And then when the AI stuff happened, we said, okay, well we have this magical LLM technology, right. We now have a probabilistic computer that's like kind of like a raw material. How are you going to interact with that raw material? How do you harness it and like leverage it in an application and deliver that value to users and scale it? And so it turns out that you need a lot of infrastructure around like a lot of deterministic, regular old code infrastructure around this like statistical or probabilistic or stochastic computer to really leverage it and harness it. And so that's kind of been our guiding light is we're going to build all of that infrastructure, all of that muck, so that developers who are building apps that need to use this magical technology don't need to.

Demetrios Brinkmann [00:18:45]: Yeah, you're the rock.

Russ Dsa [00:18:47]: Yeah.

Demetrios Brinkmann [00:18:47]: And then everything else is a little bit more fungible.

Russ Dsa [00:18:50]: Yeah, yeah, yeah. And so I think, think long term is the big lesson that I've had.

Demetrios Brinkmann [00:18:54]: Is there a place where you are doubling down your bets right now?

Russ Dsa [00:19:00]: Yeah, for sure. You know, now, even if you go to our website, we pitch the platform a little bit differently because we are kind of all in on AI infrastructure. It's not that video conferencing and live streaming companies don't use us, they still do. But every product is trying to figure out how does AI fit into my product and how can it make my product better? And so even video conferencing and live streaming is going to have AI as part of it. So we're really focused on AI infrastructure. What I say now is we're kind of building the platform for voice, video and physical AI.

Russ Dsa [00:19:37]: Right.

Russ Dsa [00:19:39]: The other way that I say it is you can now give your application an ability it never had before. It can see, it can hear, it can speak, and in the robotics case, it can move. And so what's interesting about where we are today? You know, I just gave you the pitch or the one liner, but what's interesting about where we are is that we actually don't have the complete platform yet. We're not a complete product. This is, I'm like kind of anti selling our stuff.

Russ Dsa [00:20:13]: Right.

Russ Dsa [00:20:14]: Maybe I should be doing something different. But the honest truth is we don't have a complete product yet. We have the transport network, which is how we started pre-AI. Then with OpenAI, we built this agents framework which allows you to orchestrate and build voice agents. And you can think of that as like our Next.js, right, for voice agents. So we have the Next.js and we have the network, so that when you build an agent using our Next.js, it can connect to our network and stream the audio and video. That's nice. But in the middle, you have some gaps.

Russ Dsa [00:20:46]: When you build your voice agent using our framework, how do you test and evaluate that framework? How do you know what it's doing or if it's doing the right thing or the intended thing? Then once you have a statistical confidence that it's doing the right thing, how do you deploy it and scale it and load balance?

Russ Dsa [00:21:03]: Right.

Russ Dsa [00:21:04]: Because voice agents are not web applications. They are stateful. They're always running for as long as the session is running. They're constantly processing voice and generating voice. So it's not like a connect, send some data, do a database operation, disconnect kind of thing. It's stateful and always on. And so how do you deploy and load balance, something like that? It's just a different paradigm. Then of course, you have the run part of it.

Russ Dsa [00:21:28]: So you deploy it. Now how do you run it? That's our network infrastructure, that's our telephony infrastructure. And then on the end, it's like, once you're running it, how do you observe it? Where's the Datadog or the New Relic for voice AI?
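Editor's note: the point that voice agents are stateful and always on, unlike connect-and-disconnect web requests, can be illustrated with a small routing sketch. This is not LiveKit's implementation; `Worker`, `SessionRouter`, and the per-worker `capacity` limit are hypothetical names chosen for illustration. The idea is only that a session gets pinned to one worker for its whole lifetime, so balancing has to happen at session start rather than per request.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    capacity: int  # hypothetical limit on concurrent sessions per worker
    sessions: set = field(default_factory=set)

    @property
    def free_slots(self) -> int:
        return self.capacity - len(self.sessions)


class SessionRouter:
    """Pins each voice session to one worker for as long as it runs."""

    def __init__(self, workers):
        self.workers = list(workers)
        self.assignment = {}  # session_id -> Worker

    def start_session(self, session_id: str) -> Worker:
        # A running session stays where it is, because its state lives there.
        if session_id in self.assignment:
            return self.assignment[session_id]
        # New sessions go to the worker with the most free slots.
        worker = max(self.workers, key=lambda w: w.free_slots)
        if worker.free_slots <= 0:
            raise RuntimeError("no capacity: scale out before accepting more sessions")
        worker.sessions.add(session_id)
        self.assignment[session_id] = worker
        return worker

    def end_session(self, session_id: str) -> None:
        # Capacity is only released when the conversation actually ends.
        worker = self.assignment.pop(session_id, None)
        if worker is not None:
            worker.sessions.discard(session_id)


router = SessionRouter([Worker("a", capacity=2), Worker("b", capacity=2)])
first = router.start_session("call-1")
second = router.start_session("call-2")
print(first.name, second.name)
```

Because a slot is held for the full length of the call, capacity planning in a design like this tracks concurrent conversations rather than requests per second.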

Demetrios Brinkmann [00:21:42]: Shout out to Tom. That's another great member of your team.

Russ Dsa [00:21:48]: I want to say something bad about Tom and then say he's awesome afterward. But yeah, I have nothing bad to say about Tom. Tom's amazing. And so building that end-to-end platform is really what we're trying to do.

Russ Dsa [00:22:01]: Right.

Russ Dsa [00:22:02]: And it's not because we are dreaming it up and it's like, this is what everyone needs and if we build it, they will come, Field of Dreams style. It's because our community and users are literally asking for this stuff every day. They're like, I built this agent, I'm running this agent on the network. It's working great. But I'm struggling with deploying it.

Russ Dsa [00:22:21]: So we're working on a hosting product. I'm struggling with testing and evaluating. So we're looking at partnering with people who are working in the evaluation space.

Russ Dsa [00:22:28]: Right.

Russ Dsa [00:22:29]: But also we're exploring whether we should do some stuff ourselves for an evaluation that works really well within the LiveKit ecosystem. And then on the observability side, we're like, okay, well, how do we monitor this stuff? People want visibility into these systems to know what user experiences people are having with their application. And so we're kind of trying to tackle all of these things and build that kind of all-in-one platform so that you can go from zero to shipping and scaling in production and making money and building a successful business, all using LiveKit as that foundation.

Demetrios Brinkmann [00:23:04]: You just plug in and then you're golden.

Russ Dsa [00:23:06]: 100%.

Demetrios Brinkmann [00:23:06]: Yeah. I love that vision. And so now it's doubling down on the platform ecosystem.

Russ Dsa [00:23:11]: Yes. 100%. Dude.

Demetrios Brinkmann [00:23:14]: So cool.

Russ Dsa [00:23:14]: You should pitch this. I think it was better. You said it in like 10 seconds, and I took like two minutes. Wow, that's easy to do. Yeah.

Demetrios Brinkmann [00:23:23]: One thing I do think about voice, which is fascinating, is how much of a rich medium it is and how many signals you can get from voice, which is very unlike text.

Russ Dsa [00:23:34]: I mean, you're right about all those things. Right, thank you. So voice is fundamentally higher bandwidth than text.

Russ Dsa [00:23:42]: Right.

Russ Dsa [00:23:43]: It carries more data, you know, it's more dense per packet, let's say.

Russ Dsa [00:23:48]: Right.

Russ Dsa [00:23:48]: Or per byte. And the reason why is because it's carrying more information.

Russ Dsa [00:23:56]: Right.

Russ Dsa [00:23:57]: Voice data has. It has prosody and intonation and a cadence. All kinds of these aspects of sound that humans, like, automatically pick up on and can change the semantics of what you're saying.

Russ Dsa [00:24:16]: Right.

Russ Dsa [00:24:18]: The way you say something absolutely matters. Maybe not as much as what you say, but it definitely makes a difference.

Russ Dsa [00:24:26]: Right.

Russ Dsa [00:24:27]: In interpretability, or how that information is perceived by your brain, right, on the receiving end. And so the other part about this is like, so, yes, that's kind of just the law of physics of voice, and it simultaneously makes it an amazing medium for interaction and a very natural medium. I mean, this is what we've been doing for thousands and thousands of years, right, is using voice to communicate ideas to one another. But it also makes it incredibly difficult to build a synthetic, human-like experience in code that matches the fidelity of a human-like conversation. And so I think there's a lot of possibility, which is exciting, but also it requires a lot of experimentation to get right. And there's a lot of moving pieces in between, as you said.

Russ Dsa [00:25:31]: Our framework tries to reduce the amount of entropy in the system by handling a lot of things for you, like interruptions and turn taking and the low latency back and forth streaming of the data. But then you also have things like transcriptions and the quality of those. When you turn that voice into text and you have the brain piece of it, the LLM, is it going to understand those prompts in the way that you want them to and going to.

Demetrios Brinkmann [00:25:58]: Call that tool correctly?

Russ Dsa [00:26:00]: Exactly. Is it going to call that tool reliably? How long does that tool call take? Do you generate a response in the middle so that people don't think that they're waiting forever? It's like there are all these aspects, conversational dynamics and reliability aspects of a conversation, that if you don't get them right, then you fall into this like uncanny valley and then people kind of reject it. And I think what also makes it even more difficult is the thresholds for when someone rejects it or doesn't think it's useful are very use case dependent. So for example, if you're building like patient intake at a hospital, we've just been talking to a company, I'll give them a shout out. We've been talking to Assort Health, they're a user of LiveKit. They're awesome, they're growing, they're amazing. One thing that we were talking to them about was around latency, and we're like, you know, the latency, how do you feel if we got that latency even lower? I don't know, it's maybe coming in at like a second or something for full kind of turns, or maybe around 750 milliseconds, somewhere in that range.

Russ Dsa [00:27:04]: 750 milliseconds to a second for the time between the AI speaking to you and you speaking to it, the turn latency. And they said, you know, honestly, for our use case, it doesn't matter. They're like, when people are on the phone doing patient intake at a hospital, they don't really care if it takes a second or two seconds. They just want to get an appointment on the doctor's calendar and they want to talk about the issue that they're having. And like, look, if it takes five seconds or 10 seconds, yeah, it's going to be a problem. But the distance between like, yeah, 200 milliseconds versus 500 versus a second versus two seconds, they're like, what we have found is sometimes the LLMs take a while to respond. Not a LiveKit problem, just want to say, but sometimes the LLM takes a while to respond. And so it does, once in a while, hit two, two and a half seconds.

Russ Dsa [00:28:00]: They actually don't see any drop off from that.

Demetrios Brinkmann [00:28:02]: But if you think about a human doing the same thing, there's a lot of times where I wait on the... because somebody is writing something down or they're pulling something up on a computer. So it's not like we're unused to it. It's not an unnatural behavior.

Russ Dsa [00:28:18]: So that is true. But there are use cases where there is high sensitivity to latency. And I'll give an example and then I'll talk about a technique. So for language learning, Speak is another company, they build immersive language learning. So if you want to learn how to speak a different language, you can kind of have like an experience where you're talking to a native speaker of that language and it feels very conversational, like you're in their country and you're talking to them, and they use LiveKit for this. And latency is very sensitive there because, like, it's supposed to feel like a human conversation. And so if you're not responding very quickly, it starts to feel unnatural.

Russ Dsa [00:28:55]: Right.

Russ Dsa [00:28:56]: And it breaks the experience or the believability of the experience. That's the uncanny valley, I guess. And so for that use case, latency does matter a lot. What's interesting about what you said, about how in a use case like customer support or patient intake the person is looking up information or doing something on their computer, and then they respond after a while: yes, that is true, but usually you're getting some kind of nonverbal feedback.

Demetrios Brinkmann [00:29:28]: You're hearing papers in the background.

Russ Dsa [00:29:30]: You're hearing paper in the background, you're hearing keys clacking. Or they're saying, you know what? I'm going to do something. Can I put you on hold for a second? Or they're saying, oh, let me see. Huh? And so there are either non-lexical or actual lexical responses that are coming back very, very quickly. And then you're primed. Okay, I'm gonna wait. And so if you just hear silence, you go crazy.

Russ Dsa [00:29:52]: That's the part that is like, wait, this isn't.

Demetrios Brinkmann [00:29:54]: This isn't working. Did it not.

Russ Dsa [00:29:55]: This isn't right. Is it broken?

Demetrios Brinkmann [00:29:57]: That's funny.

Russ Dsa [00:29:57]: Let me ask again. So you know where this is becoming a thing that people are starting to think about is in reasoning models. Because Anthropic's model and some of the other models now, they don't have like a separate reasoning and then a separate, like, regular single shot. They have thinking tokens and a thinking budget, and you can apply a certain amount of thought to different kinds of operations or prompts. And you're like, well, if I'm going to apply a high thinking budget, how do I build voice AI around a reasoning model that can dynamically think for different periods of time? Well, my recommendation is you need to figure out how you respond, and what you respond with, immediately before you trigger, or in parallel with triggering, kind of like a thinking process.
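Editor's note: the recommendation above (say something right away, before or in parallel with kicking off the slow thinking step) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not LiveKit's API or anything described in the conversation: `slow_reasoning_model`, `speak`, `respond`, and the `filler_after` threshold are all hypothetical names, and the "model" is just a timed sleep.

```python
import asyncio

FILLER = "Let me think about that for a moment."


async def slow_reasoning_model(prompt: str, thinking_seconds: float) -> str:
    """Hypothetical stand-in for a reasoning model with a thinking budget."""
    await asyncio.sleep(thinking_seconds)  # simulated thinking time
    return f"Answer to: {prompt}"


async def respond(prompt, speak, thinking_seconds=0.2, filler_after=0.05):
    """Start the slow call, then fill the silence if it is not back quickly.

    `speak` is a hypothetical async callable that would hand text to a
    text-to-speech pipeline; `filler_after` is an assumed threshold.
    """
    # Kick off the reasoning call first so it runs while the filler plays.
    task = asyncio.create_task(slow_reasoning_model(prompt, thinking_seconds))

    # Give fast answers a short window to arrive before using a filler.
    done, _pending = await asyncio.wait({task}, timeout=filler_after)
    if not done:
        await speak(FILLER)  # immediate feedback so the caller is primed to wait

    # The task kept running during the filler; now deliver the real answer.
    await speak(await task)


async def demo():
    spoken = []

    async def speak(text):
        spoken.append(text)  # a real agent would synthesize audio here

    await respond("Can I book Tuesday?", speak)
    return spoken


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Setting `filler_after` to zero gives the simplest form of the idea, where the acknowledgement always plays while the model thinks. The short window is an added design choice so that quick answers are not preceded by an unnecessary filler.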

Demetrios Brinkmann [00:30:43]: It's so funny you mentioned that because I was talking to Elliot, who runs VOCA, and they do something similar where he.

Russ Dsa [00:30:52]: I'm actually meeting with the VOCA guys tomorrow.

Demetrios Brinkmann [00:30:53]: Really?

Russ Dsa [00:30:53]: Yeah, yeah. Oh, I'm hanging with them for coffee.

Demetrios Brinkmann [00:30:57]: He's one of the deepest thinkers on conversational design that I have talked to and he's done some really fun stuff where he said we put background noises in so that it feels more human. You know, you don't want just like a voice agent there that you can't hear anything else except for the voice. And then he also said that they will intentionally make the agent a little bit like dumber, quote, unquote. Because the human tends to enunciate more and they tend to be a lot more forgiving and say what they want in a very articulate way when they think that, like, oh, this is just some, like, shitty AI.

Russ Dsa [00:31:37]: Oh, wow. And so human empathy.

Demetrios Brinkmann [00:31:39]: Exactly.

Russ Dsa [00:31:39]: Right.

Demetrios Brinkmann [00:31:40]: So funny.

Russ Dsa [00:31:40]: It's like this trick.

Demetrios Brinkmann [00:31:41]: It's such a nice little hack that you have. And what he was saying too was, yeah, you have to mention that you're thinking or you're doing something, that action is being taken, if you're making someone wait.

Russ Dsa [00:31:55]: Yes.

Demetrios Brinkmann [00:31:56]: Because otherwise they think that it just went offline.

Russ Dsa [00:31:59]: Right. There has to be that feedback that you get, right, to kind of prime your mind as to what's happening. Even if you can't see what's happening, like hearing it over the phone, you need some signal.

Demetrios Brinkmann [00:32:12]: Dude, I think that's a perfect way to end it. Is there anything else that you wanted to hit on that we didn't talk about?

Russ Dsa [00:32:17]: No, it's great, man. Yeah, yeah. Love chatting.

Demetrios Brinkmann [00:32:19]: This is very cool. I appreciate you doing this.

