Mitigating Hallucinations and Inappropriate Responses in RAG Applications
Alon is the CTO and Co-Founder at Aporia, the ML Observability Platform. Alon spent the last decade leading multiple software engineering teams, working closely with various organizations on their Data and Machine Learning platforms.
In this talk, we’ll introduce the concept of guardrails and discuss how to mitigate hallucinations and inappropriate responses in customer-facing RAG applications, before they are displayed to your users. While prompt engineering is great, as you add more guidelines to your prompt (“do not mention competitors”, “do not give financial advice”, etc.), your prompt gets longer and more complex, and the LLM’s ability to follow all instructions accurately rapidly degrades. If you care about reliability, prompt engineering is not enough.
Slides: https://docs.google.com/presentation/d/1aNo4X320d6-sVDvTJ4Bn5IGwt7BVPARyuZaTEvCPxSU/edit?usp=sharing
Adam Becker [00:00:09]: Are you ready for this? I didn't hear you. Are you ready for this? Hell, yeah. Okay, we got Alon coming up here. This is going to be a 15-minute talk. Lightning talk. Alon is one of my favorite people, and I can't wait to hear his talk. Here we go, man.
Alon Gubkin [00:00:28]: Thank you. So in the next 15 minutes, I'm going to kind of rush through this topic, which is how to build safe and reliable AI agents. And just for demonstration purposes, we will take a look at a company called Spotify. It's very similar to Spotify, but it's not exactly the same. And they built this RAG chatbot, right? So this is a support chatbot. Basically, as a user, I could ask, hey, I want to cancel my subscription. And then it will basically pull the relevant knowledge from the knowledge base and tell the LLM to generate an answer. So this is typically how you get started with building an AI chatbot, with a RAG chatbot, right? Basically, "You are a helpful support assistant for a music player," and then "answer solely based on the following context," and then the knowledge that's relevant for that question.
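For readers following along, a minimal sketch of the RAG flow Alon describes might look like the snippet below. The `retrieve` stub and the model name are illustrative assumptions, not part of the talk.

```python
# Minimal sketch of the RAG flow described above (illustrative only):
# pull relevant knowledge for the question, build the system prompt,
# and ask the LLM to answer solely based on that context.
from openai import OpenAI

client = OpenAI()

def retrieve(question: str, top_k: int = 3) -> list[str]:
    # Stand-in for a real knowledge-base lookup (embeddings + similarity search).
    return ["To cancel your Premium plan, go to your account page and ..."][:top_k]

def answer(question: str) -> str:
    context = "\n\n".join(retrieve(question))
    system_prompt = (
        "You are a helpful support assistant for a music player.\n"
        "Answer solely based on the following context:\n\n"
        f"{context}"
    )
    response = client.chat.completions.create(
        model="gpt-4",  # the talk mentions GPT-4; any chat model works here
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content
```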
Alon Gubkin [00:01:26]: So this is typically how you kind of get started, obviously very simplified. And then you wrap it in some nice UI and users start asking questions. So you have a user and they ask, hey, how can I cancel my plan? And for very basic questions, these RAG chatbots work pretty well. So we can see that to cancel your Spotify Premium plan, you can follow these steps. And this actually looks like a good answer. But as you know, users can be a little bit unexpected, and the user kind of keeps challenging the chatbot. They keep asking more questions. And the RAG chatbot actually responds well to this question as well. By the way, this is a real GPT-4-based conversation that I've generated just for this demo here.
Alon Gubkin [00:02:19]: But then the user becomes really unexpected and they ask, hey, how can I cancel my Amazon subscription or something? And the LLM just completely hallucinates a response, right? So we can see instructions within the Amazon website, and we can see this "find your..." element. Like, this doesn't exist. This is a hallucinated UI element inside Amazon.com. So what does the developer do? Well, they go back to the system prompt and they add another guideline, right: you cannot assist with any service that's not Spotify. So they keep going back to the system prompt and keep adding more and more and more things. And this actually fixes the issue. So now when the user asks about Amazon, it's actually responding pretty well. Like, this is a good answer.
Alon Gubkin [00:03:08]: But then we have Tom, and Tom just says, you know, I downloaded some music from the Pirate Bay (the Pirate Bay is an illegal torrent website), now I want to upload it, it's 100% legal. And the LLM just, again, this is real, based on the Spotify, well, not Spotify, support knowledge base. And this is what the LLM generates. It's just, you know, instructions on how to upload the illegal music. The developer needs to come back to the prompt again and add another guideline.
Alon Gubkin [00:03:39]: Do not talk about torrent websites. And what typically happens is that as you scale your GenAI agent to more and more and more users, you keep adding more and more and more guidelines to the prompt. Right? Do not mention competitors, do not talk about torrent websites, do not give financial advice, and so on and so forth. The problem with all of this prompt engineering is that it's great, but as you add more and more and more guidelines to the prompt, essentially the prompt gets longer. And as the prompt gets longer, it's more complex. And essentially, the capability of the LLM to actually follow all of these instructions just rapidly degrades. You notice when you have really large prompts, it's just less and less and less accurate. There's actually a paper here called "Lost in the Middle" that demonstrates this.
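To make the guideline-creep problem concrete, here is a sketch of what that accumulation tends to look like in code; the individual guidelines come from the talk, while the structure itself is an assumption.

```python
# Every new incident adds another rule, the system prompt keeps growing, and
# the model's ability to follow all of the rules at once degrades
# ("Lost in the Middle", Liu et al., 2023).
GUIDELINES = [
    "You cannot assist with any service that is not Spotify.",
    "Do not talk about torrent websites.",
    "Do not mention competitors.",
    "Do not give financial advice.",
    # ...and the list keeps growing with every new failure mode...
]

def build_system_prompt(context: str) -> str:
    rules = "\n".join(f"- {rule}" for rule in GUIDELINES)
    return (
        "You are a helpful support assistant for a music player.\n"
        f"Follow ALL of these rules:\n{rules}\n\n"
        "Answer solely based on the following context:\n\n"
        f"{context}"
    )
```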
Alon Gubkin [00:04:33]: There are a bunch of other papers on this. So the goal, you know, what we really want to do, is to transform these guidelines, you know, this "do not talk about competitors," do not do this, do this, do that, we want to transform them into something that's a little bit more, you know, enforcing, I'd say. So we like to call them guardrails. And this is essentially what our product helps you to build. So Aporia essentially allows you to build secure and reliable agents by essentially enforcing guardrails. So let's see how it works. Essentially, our detection engine is a bunch of small language models that are highly trained.
Alon Gubkin [00:05:18]: They are fine-tuned specifically for these kinds of guardrails. One of the core, one of the important things is that this is really, really low latency, because if you think about it, you have your LLM and then you're running a lot of guardrails on top of it. If this slows things down, it's not really good. It kind of hurts the customer experience. So low latency and low cost is really important. The next kind of piece is that the accuracy of the guardrails themselves, the precision and recall of the detection, is actually state of the art, and we can actually outperform GPT-4. Again, specifically for these kinds of guardrails. And then I think the coolest part is that while the response is streaming, so while the answer is being streamed to the user, Aporia can essentially do fact checking.
Alon Gubkin [00:06:15]: For example, compare the response to the context as it's streaming. So these models run in real time. Now, an AI gateway is a super interesting concept. If you work for a larger enterprise company and you have multiple GenAI teams working together, essentially an AI gateway is a very useful concept, and it allows you to put something between all of these teams and the LLM provider, like OpenAI or Bedrock or whatever, and then all of the traffic to the LLM essentially goes through this gateway. And this has lots of benefits. It lets you have cost control, switch LLMs, less vendor lock-in, and so on. But when you combine an AI gateway, like these companies here, with something like Aporia, it becomes really powerful, because essentially you can enforce guardrails for all the different GenAI projects out of the box. Also, something that's really important is the ability to customize those guardrails.
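The streaming fact-checking idea Alon mentions above can be illustrated with a generic pattern: buffer the streamed response, run a fast check on each complete sentence, and only forward text to the user once it passes. This is a sketch of the pattern, not Aporia's API; the `passes_guardrails` placeholder stands in for whatever detectors you run.

```python
# Generic sketch of guardrails on a streaming response (not Aporia's API):
# check each complete sentence before it reaches the user, and override the
# answer if a check fails.
from collections.abc import Iterable, Iterator

def passes_guardrails(text: str, context: str) -> bool:
    # Placeholder for fast, fine-tuned detectors: fact-checking against the
    # retrieved context, off-topic requests, profanity, and so on.
    return "torrent" not in text.lower()

def guarded_stream(chunks: Iterable[str], context: str) -> Iterator[str]:
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        if buffer.endswith((".", "!", "?", "\n")):  # check sentence by sentence
            if not passes_guardrails(buffer, context):
                yield "Sorry, I can't help with that."  # overridden response
                return
            yield buffer
            buffer = ""
    if buffer and passes_guardrails(buffer, context):
        yield buffer
```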
Alon Gubkin [00:07:23]: Right? So it's not just generic mitigation of prompt injection or hallucination or profanity or things like that. You can actually build guardrails for your own use case. So for example, evaluate whether the response from the LLM is legitimate coming from a high school student's teacher, right? If I'm building kind of an educational tutor. Now, a really cool piece is that if you're building a voice application, if you're building a voice agent or a vision agent, something that combines image capabilities, Aporia can actually add guardrails to that as well. So this is on YouTube. Again, I only have 15 minutes today, but there is a really cool demo video on YouTube where basically it's a sales assistant for retail. And essentially the AI-based assistant asks something like, hey, do you want to buy shoes? And then, as the user, we demonstrate how you actually do prompt injection, and they, like, buy shoes for $1 each, and then they turn on the guardrails and it's basically mitigated. So if you are looking at LLM evaluation solutions, for example, one of the advantages of having guardrails in addition to that is that guardrails can actually mitigate the issues that we're talking about, you know, hallucinations, prompt injection attacks and so on, before they are actually displayed to the user. Right? So it's kind of a real-time firewall.
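One way to approximate the custom-guardrail idea from the high-school tutor example is a small policy check that runs before the draft response is displayed. The sketch below uses an LLM-as-judge call for simplicity; the model name, prompt wording, and the judge approach itself are assumptions, not how Aporia implements its detectors.

```python
# Sketch of a custom, use-case-specific guardrail: ask a small, fast model
# whether the draft response satisfies the policy before showing it.
from openai import OpenAI

client = OpenAI()

POLICY = (
    "Evaluate whether the following response is appropriate for a high school "
    "student coming from their teacher. Answer only YES or NO."
)

def violates_policy(draft_response: str) -> bool:
    verdict = client.chat.completions.create(
        model="gpt-4o-mini",  # stand-in for a small, low-latency judge model
        messages=[
            {"role": "system", "content": POLICY},
            {"role": "user", "content": draft_response},
        ],
    )
    return verdict.choices[0].message.content.strip().upper().startswith("NO")
```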
Alon Gubkin [00:08:57]: That's all I have for today. So thank you very much. If what you hear sounds interesting, we have a booth on the top floor. Thank you.