Real-Time Data Streaming Architectures for Generative AI // Emily Ekdahl // DE4AI
As a Senior Machine Learning Engineer, Emily leverages her skills and experience in AI/LLMs, machine learning, and data engineering to deliver innovative solutions for industry. Emily works with a cross-functional team of engineers, product managers, and business stakeholders to design, build, and deploy scalable and robust data pipelines, models, and reports that enable data-driven decision-making and optimize business outcomes.
Bridging the Gap Between Batch Processing and the Lakehouse for Next-Gen Customer Experience
As Generative AI (GenAI) and large language models (LLMs) evolve at an unprecedented pace, traditional machine learning architectures that rely on batch processing and static data can no longer keep up with the amount of data they need to process. To beat competitors, numerous organizations are implementing real-time data streaming solutions, leveraging technologies like Apache Kafka and Apache Flink. These tools work together to ingest and process data in real time, which, when combined with a vector database, can significantly boost the performance and reliability of GenAI applications. In this talk, we’ll dive into the benefits of the "shift-left" paradigm, which is all about moving from the old-school batch and lakehouse models to real-time data products. This shift allows companies to create GenAI applications that are more responsive and context-aware. By integrating streaming data with real-time model inference and using the Retrieval-Augmented Generation (RAG) method, companies can cut down on latency and ensure their LLMs deliver up-to-date responses. We’ll cover key architectural patterns, potential challenges, and best practices for making this transition, all while sharing real-world examples of how integrating Kafka and Flink with vector databases can lead to next-level NLP applications.
Adam Becker [00:00:07]: Next up, we have Emily. Let's see. Emily, are you around?
Emily Ekdahl [00:00:10]: Hello.
Adam Becker [00:00:11]: Hi, Emily. Okay, we're glad to have you. You're gonna be chatting with us about real-time streaming for GenAI, correct? That's right. Okay, so your screen is up. Take it away.
Emily Ekdahl [00:00:25]: All right. Good morning, everybody. I'm Emily Ekdahl, and today we're embarking on a journey with Benighted Airlines, a fictional airline struggling with outdated customer support systems. We'll explore how adopting a shift-left data architecture transformed their AI customer support system from chaos to clarity, improving both their customer support experience and their operational efficiency. Imagine a customer desperately trying to change a flight at the last minute, bouncing around various communication and support channels trying to get help. Benighted Airlines had been plagued by their disconnected systems, data silos, and batch processing that delayed the data the AI needs to provide high-quality responses. Customers might be pinging them through multiple channels, but their data wasn't up to date, and this led the AI agent to provide ineffective and incomplete support. Even the humans had to spend quite a bit of time composing the customer case across many systems.
Emily Ekdahl [00:01:35]: In order to help the customer, Benighted Airlines needed to act fast. Real-time data streaming, powered by technologies like Kafka and Flink, would allow their customer support AI agents to access the up-to-date information they needed to help their customers in near real time. Instead of relying on the batch-processed ETL system that landed their data too late, they could, with a complete customer context, serve their customer on first contact. So now imagine the same customer trying to change a flight last minute. This time, the AI agent has all the information it needs from every interaction across all channels, including that angry tweet the customer just sent. Before implementing shift-left data architecture, Benighted Airlines faced critical issues. They had long data delays. Their data was scattered across systems, and they had complex, costly ETL processes, including a reverse ETL that made it hard to determine where data was going to land and when.
Emily Ekdahl [00:02:50]: Their human and AI support agents spent a lot of time composing a customer case before they could help. And the AI agents weren't always able to transact across all those systems. So there were poor customer experiences. And when they had the batch process, even when they first tried to implement their AI customer support agent, the project wasn't as much of a success as they'd hoped. The customers didn't like the AI customer support agent. It said things that did not make sense, and it didn't seem to know things that the customers expected it to know, like that flight they just rebooked. Or the five other systems, agents, or situations through which they had contacted the business within the past couple of hours. With shift-left data architecture, Benighted Airlines moved from the batch process that they had to real-time data streaming.
Emily Ekdahl [00:03:52]: With the power of technologies like Kafka and Flink, they could stream data in, then transform it, embed it, and land it in the vector store, and then the AI agent had access to all the information it needed to create a unified, up-to-date customer context. And this allowed the AI agent to deliver fast, high-quality customer service and resolve issues on first customer contact. Benighted Airlines no longer has their AI agents struggling to stitch together information, or providing outdated responses, or hallucinating. With the real-time data, their AI customer support agent provides a seamless, unified customer experience that is similar to that of the humans, but at a lower operational cost. So there are no more repeated questions or fragmented answers. And they were so inspired by their improved customer experience, they rebranded as Enlightened Airlines. Now, with shift-left data architecture, Enlightened Airlines is poised for the future. Soon their AI systems will handle more complex queries with larger context windows.
Emily Ekdahl [00:05:21]: They're also expanding into multimodal models so their customers can send pictures or other forms of communication rather than just text. Also, they're able to expand into multi-agent systems, which will allow the customer to take actions in their system simply by chatting with their AI support bot. Enlightened Airlines' journey from disjointed, inefficient customer support to a real-time, data-powered AI agent is a lesson in transformation. By implementing shift-left data architecture, they've not only improved their customer satisfaction, but they've also reduced latency and decreased operational overhead. If your AI agent is facing similar issues, it might be time to explore shift-left data architecture for your organization. Thank you for joining me on Benighted Airlines' transformation journey today. I hope this talk has inspired you to consider how you could implement shift-left data architecture and transform your AI customer support experience. And if you want to dig in further, I'd recommend checking out the article I cited, The Shift Left Architecture by Kai Waehner.
Emily Ekdahl [00:06:43]: Thank you.
Adam Becker [00:06:45]: Thank you very much, Emily. Is that okay if I ask you a couple of questions?
Emily Ekdahl [00:06:49]: Sure.
Adam Becker [00:06:50]: Okay. Can you go back to the original? I think it was like the original shift left architecture. Yeah, actually original ETL. So. Yes, yes, that one. Okay. Before shift left. So we're saying the conventional ETL architecture with batch workloads and consistency, we have all these different data sources all funneling in to our analytical systems, and then we have.
Adam Becker [00:07:18]: I'm going to make this a little bit larger. One second. Okay, we're going to be here. The raw data dumps, I'm just trying to, like, fully make it make sense in my head. Okay, so now can you go back to the shift-left one?
Emily Ekdahl [00:07:43]: Yeah. So basically what I'm proposing is that we use Kafka and Flink to stream and transform the data and go right into the vector data store with very few delays. So of course agents can operate tools. But what I'm proposing is that if the two most common failure modes of AI are no information and hallucination, or stale information, then streaming, transforming, and embedding data right into our vector data store gives the AI agent all the access it needs to provide high-quality customer support. Because one thing I've observed across other organizations I've been a part of is that even the human customer support agents potentially have to bridge five different systems to compose the total customer context. But the AI may not be able to do that for various reasons. So that's why I'm recommending shift left.
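The flow described in this answer (stream the data in, transform it, embed it, land it in the vector store, then let the agent query it) can be sketched in miniature. This is an illustrative sketch and not code from the talk: a plain Python list stands in for a Kafka topic, the `transform` function stands in for a Flink job, the bag-of-words `embed` function stands in for a real embedding model, and `VectorStore` is a hypothetical in-memory substitute for a vector database. All names and sample events are assumptions made for the example.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One customer interaction; the field names are hypothetical."""
    customer_id: str
    channel: str
    text: str


def transform(event: Event) -> Event:
    """Stand-in for a stream transformation: lowercase and collapse whitespace."""
    cleaned = " ".join(event.text.lower().split())
    return Event(event.customer_id, event.channel, cleaned)


def embed(text: str) -> dict:
    """Toy bag-of-words embedding, L2-normalised.

    A real pipeline would call an embedding model here instead.
    """
    counts = Counter(text.split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {token: c / norm for token, c in counts.items()}


class VectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self):
        self.rows = []  # list of (Event, embedding) pairs

    def add(self, event, vector):
        self.rows.append((event, vector))

    def search(self, customer_id, query, k=3):
        """Return up to k events for one customer, ranked by cosine similarity."""
        q = embed(query.lower())
        scored = [
            (sum(w * vec.get(tok, 0.0) for tok, w in q.items()), ev)
            for ev, vec in self.rows
            if ev.customer_id == customer_id
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [ev for _, ev in scored[:k]]


def run_pipeline(stream, store):
    """Process events one at a time, as they arrive, instead of in a nightly batch."""
    for event in stream:
        clean = transform(event)
        store.add(clean, embed(clean.text))


# A list plays the role of the incoming event stream.
store = VectorStore()
run_pipeline(
    [
        Event("c1", "chat", "Rebooked  flight to Denver"),
        Event("c1", "twitter", "Angry tweet about delayed bags"),
        Event("c2", "email", "Rebooked flight to Boston"),
    ],
    store,
)

# The agent retrieves the customer's context across channels before answering.
context = store.search("c1", "rebooked flight", k=2)
for ev in context:
    print(ev.channel, "|", ev.text)
```

The point of the sketch is the shape of the flow: each event becomes searchable as soon as it is processed, so the agent's retrieval step sees the rebooking and the tweet together without waiting for a batch job.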
Adam Becker [00:08:38]: With shift left, they just take all the customer data, stream it directly, do whatever embedding they need, and fit it straight into the vector data store. And then you allow the agents to, let's say, just operate immediately on the vector store. And where do analytics fit? Is the way we think about analytics different too? Or is it mostly just operational? So if you go back, can you go back to the previous slide? We have analytics. What are the downstream implications for analytics? So I understand how the AI and the ML, let's say now you have some GenAI agents, they're able to better contextualize everything because they have access to all the different vectors. What about the analytics?
Emily Ekdahl [00:09:24]: The analytics is also improved, because there are business cases where people want to ask reporting-type questions to their AI agent, and batch processes would also delay the data, and therefore the ability of the AI agent to answer. So, for example, you might have a situation where if the customer actually used your reporting UI, they would get an answer that was based on point in time. But if they're using your AI support agent, and your AI support agent is backed by a batch data process, it'll provide an answer that may even be as old as yesterday. So it's not giving you a current reporting answer. So there are cases where AI agents are operating over reporting, and this shift-left data architecture would still be a huge resource for them as well.
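The "as old as yesterday" effect can be made concrete with a little arithmetic. The sketch below is illustrative only: the nightly batch at midnight, the 10:05 rebooking, and the two-second streaming latency are assumed numbers, not figures from the talk, and both function names are hypothetical.

```python
def batch_visible_at(event_ts: int, interval_s: int, job_duration_s: int = 0) -> int:
    """Time at which an event becomes queryable when a batch job starts at
    every multiple of interval_s. An event arriving exactly as a run starts
    is assumed to miss that run."""
    next_run = (event_ts // interval_s + 1) * interval_s
    return next_run + job_duration_s


def stream_visible_at(event_ts: int, pipeline_latency_s: int) -> int:
    """Time at which an event becomes queryable in a streaming pipeline."""
    return event_ts + pipeline_latency_s


DAY = 24 * 60 * 60
rebooked_at = 10 * 3600 + 5 * 60  # 10:05, in seconds since midnight (assumed)

batch_delay = batch_visible_at(rebooked_at, DAY) - rebooked_at
stream_delay = stream_visible_at(rebooked_at, 2) - rebooked_at  # 2 s latency (assumed)

print(f"batch:  {batch_delay // 3600} h {batch_delay % 3600 // 60} min")
print(f"stream: {stream_delay} s")
```

Under these assumptions, a rebooking made at 10:05 is invisible to a batch-backed agent for 13 hours 55 minutes, while the streaming path exposes it two seconds later.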
Adam Becker [00:10:22]: And then one more question I have, about the distinction between the curated data products and the raw data streams. Are any of them affected? Yeah, if you go back to the previous, to the current ETL. So we don't have that kind of distinction here. Right here?
Emily Ekdahl [00:10:42]: I think so. Because here you have your reverse. It's kind of hard to tell, but here you have your reverse ETL, right? You have your files and logs going over here, reverse ETL. And then transactions in your operational systems.
Adam Becker [00:10:59]: I'm going to hide us from. Okay, so basically it's the reverse ETL in this context that we would sort of look at as kind of like similar to, or at least analogous to, let's say, like, just full on curated data products in, like, the shift left world.
Emily Ekdahl [00:11:22]: I'm trying to think if I understood your question, but basically.
Adam Becker [00:11:28]: Yeah, yeah, yeah.
Emily Ekdahl [00:11:29]: So there are actually, I have worked on business cases where people reverse ETL data back into their systems to support transactions in their application. And even in that case, I'd make the argument that shift-left data architecture is appropriate because there can be, you know, significant delays, sometimes as much as a day, for this whole process to work through and feed that data back into your system. So I think, you know, I'm making the argument that this is most appropriate for AI agents, but I would argue, based on my prior experience, that this can even benefit those cases as well.
Adam Becker [00:12:07]: Yeah, yeah, I can see that. Emily, thank you very much for sharing this with us.