Guardrails for LLMs: A Practical Approach
Shreya Rajpal is the creator and maintainer of Guardrails AI, an open-source platform developed to ensure increased safety, reliability, and robustness of large language models in real-world applications. Her expertise spans a decade in the field of machine learning and AI. Most recently, she was the founding engineer at Predibase, where she led the ML infrastructure team. In earlier roles, she was part of the cross-functional ML team within Apple's Special Projects Group and developed computer vision models for autonomous driving perception systems at Drive.ai.
At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building Lego houses with his daughter.
There has been remarkable progress in harnessing the power of LLMs for complex applications. However, the development of LLMs poses several challenges, such as their inherent brittleness and the complexities of obtaining consistent and accurate outputs. In this presentation, we present Guardrails AI as a pioneering solution that empowers developers with a robust LLM development framework, enhanced control mechanisms, and improved model performance, fostering the creation of more effective and responsible applications.
So speaking of being at the forefront of this LLM movement, I've got Shreya here. Let's see if I can grab her and bring her onto the stage. Yes. What's up, Shreya? Hey, Demetrios. It's nice to see you again. Yeah, you too. So we just confirmed, right? We're gonna see each other in San Francisco when I'm there in two weeks.
Yeah, that's right. Yep. So I'm gonna drop by your office, maybe even with Diego, and we can record one of these podcasts that I'm talking about. Yeah, I think that sounds amazing. We'll have to get an office in time for that, but outside of that, sounds amazing. There you go. That's living in the hybrid world.
Yeah. So I'll go by, but I'll let you go, because as I said before, you can call me Ringo today. I am keeping time and we're already four minutes over, so I'm just gonna give it over to you and hear what you gotta say. Awesome, thank you. So I'm sharing my screen, and that's up here. Perfect. So I'm gonna get started.
Hey everyone, my name is Shreya. I am the founder of Guardrails AI, a company focused on building AI safety and reliability for large language model applications. Today I am going to be talking about how to build practical guardrails if you're working in industry or if you're putting LLMs into production.
So with that, let's get started. Why do we need guardrails? Off the bat, LLMs are awesome, but they are brittle and hard to control. As we work with them, and I am guessing a lot of people in the audience have worked with them as well, a number of issues typically pop up in practice.
So for example, the LLM application works while prototyping, but the moment you run it in production, it ends up being flaky. There are issues with hallucinations and falsehoods, often issues with lack of correct output structure, et cetera. Interestingly, the only tool available to developers is the prompt.
So if you want the LLM to behave a certain way, typically people would just put into the prompt, "do not respond with this word," or "always respond in this manner," et cetera, which just seems prehistoric in some ways, and insufficient. All of these issues combined mean that anytime you wanna deploy LLMs in an application where correctness is critical, that becomes really hard.
And we've kind of seen that in the applications that we see in practice. So what are the tools available to us to control LLMs? To start out with, first is controlling the LLM with a prompt, but we talked about why this is insufficient, primarily because LLMs are stochastic. What that means is that even for the same input, you might not see the same output repeatedly.
So the prompt just does not guarantee correctness. The second way of controlling LLMs is via the model, but often it ends up being very expensive or time consuming to train or fine-tune a model on custom data. And if you're using the LLM behind an API, there's no control over model version, et cetera.
And the LLM provider might change the LLM without notifying you. So the guardrails approach to controlling LLMs and offering guarantees is combining an LLM with output verification. What that means in practice, and I'm gonna dig deeper into this in a second, is that there are application-specific checks and verification programs.
These are small programs that you can run on the outputs of the LLM to ensure that the LLM output is correct within the context of your application. So that was a mouthful. We're gonna dig deeper into what that means in practice right over here. This is the standard way of LLM development, wherein there's some application logic that lives somewhere.
And within that application logic is the LLM unit, which essentially takes the prompt, sends that prompt over to an LLM API, generates the output, and then forwards it back to the application. The alternative way of doing LLM development that we believe in is that after you get the raw output from an LLM, you pass it to a verification suite.
This suite contains those small independent programs that we talked about earlier, which encapsulate what correctness means for your application. This might be a totally varied set of criteria. For example, not containing personally identifying information or not containing profanity might be two separate checks.
If you are building a commercial application, then making sure that there's no mention of competitors may be important to you. If you're working in code generation, then making sure that the code generated by the LLM is executable. If you're generating summaries, making sure that the summaries are correct and faithful to the source text. All of that ends up becoming important.
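To make the idea of a verification program concrete, here is a minimal sketch of what two such checks could look like as plain Python functions. The function names, the regexes, and the competitor list are illustrative assumptions, not part of the Guardrails library.

```python
import re

def contains_pii(text: str) -> bool:
    """Very rough PII check: flags strings that look like emails or US SSNs.
    A real application would use a proper PII detection model or service."""
    email = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")
    ssn = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
    return bool(email.search(text) or ssn.search(text))

def mentions_competitor(text: str, competitors: list[str]) -> bool:
    """Flags any mention of a competitor name, case-insensitively."""
    lowered = text.lower()
    return any(name.lower() in lowered for name in competitors)

# Each check is a small, independent pass/fail program over the raw LLM output.
checks = {
    "no_pii": lambda out: not contains_pii(out),
    "no_competitors": lambda out: not mentions_competitor(out, ["Acme Corp"]),
}
```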
So we get this LLM output and we pass it through a suite of verifications. If all of the verification tests pass, then we continue on with that output and send it back to the application logic. But if verification fails, then there's a number of different ways to handle it.
One of the ways to handle it, essentially, is to construct a new prompt. This prompt contains context about what went wrong, and then re-asks the large language model to correct itself and try to produce something that is more aligned with the application's context.
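Sketched as plain Python, the verify-and-re-ask loop just described might look roughly like this. This is a hand-written illustration of the workflow, not the actual Guardrails API; the `llm` callable and the check functions are assumed to be supplied by the application.

```python
from typing import Callable

def run_with_verification(
    llm: Callable[[str], str],
    checks: dict[str, Callable[[str], bool]],
    prompt: str,
    max_reasks: int = 2,
) -> str | None:
    """Generate an output, verify it, and re-ask the LLM on failure."""
    current_prompt = prompt
    for _ in range(max_reasks + 1):
        output = llm(current_prompt)
        failed = [name for name, check in checks.items() if not check(output)]
        if not failed:
            return output  # every verification test passed
        # Re-ask: give the model context about what went wrong and try again.
        current_prompt = (
            f"{prompt}\n\nYour previous answer was:\n{output}\n\n"
            f"It failed these checks: {', '.join(failed)}.\n"
            "Please return a corrected answer."
        )
    return None  # caller decides how to handle a final failure
```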
So this is where Guardrails AI comes in. Guardrails AI is a fully open source library that offers a bunch of functionality. For starters, it is a framework for creating any custom validators. For any of these tests that we talked about, there are nice hooks in the framework that make creating them very easy.
It is the orchestration of the prompting-to-verification-to-re-prompting logic that we saw earlier. Additionally, it is a library of many commonly used validators across multiple use cases, implemented for you. And it contains a specification language for how to talk to and communicate with large language models.
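As a rough illustration of what a custom-validator framework enables (this is a conceptual sketch, not the library's real classes or hooks, which are documented on the Guardrails site), a validator can be as small as an object with a single validate method that the orchestration layer calls on every output:

```python
from dataclasses import dataclass

@dataclass
class ValidationResult:
    passed: bool
    error: str | None = None

class Validator:
    """Base class for application-specific checks (illustrative only)."""
    def validate(self, output: str) -> ValidationResult:
        raise NotImplementedError

class NoProfanity(Validator):
    BANNED = {"darn", "heck"}  # placeholder word list

    def validate(self, output: str) -> ValidationResult:
        hits = [w for w in self.BANNED if w in output.lower()]
        if hits:
            return ValidationResult(False, f"banned words found: {hits}")
        return ValidationResult(True)

# A verification suite is just a collection of validators run on the raw output.
suite = [NoProfanity()]
results = [v.validate("That is a heck of a query") for v in suite]
```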
I'm going to go through this example quickly and make sure I can squeeze it within the time slot that I have. Here we're going to go through this example of getting correct SQL. The problem is we are building an application that takes natural language questions over your data and generates SQL queries that represent those natural language questions.
For this specific application, our database is a department management database that contains information about employees, et cetera. This is the same guardrails development workflow from earlier, and within this workflow, our verification logic has now changed so that it contains these three tests.
The first is that the SQL query is executable. What that means is that once we generate the SQL, we pass it through our database to make sure that the SQL query can actually work. The second verification check is that there are no private tables that I don't want to expose to my end user.
None of those private tables are part of the query. And the third one is that there are no risky predicates, so even if my end user says, "Hey, delete these, drop these tables," et cetera, those tables won't be dropped, or those queries won't be executed.
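The three checks could be sketched roughly as follows. The table names, the banned-keyword list, and the use of SQLite's EXPLAIN to test executability are all illustrative assumptions; in practice the checks run against your own database and schema.

```python
import re
import sqlite3

PRIVATE_TABLES = {"salaries", "audit_log"}  # tables we never want exposed
RISKY = re.compile(r"\b(drop|delete|truncate|alter)\b", re.IGNORECASE)

def is_executable(query: str, conn: sqlite3.Connection) -> bool:
    """Ask the database to plan the query; this fails if tables or columns
    referenced in the query don't exist in the schema."""
    try:
        conn.execute(f"EXPLAIN QUERY PLAN {query}")
        return True
    except sqlite3.Error:
        return False

def uses_private_tables(query: str) -> bool:
    tokens = set(re.findall(r"\w+", query.lower()))
    return bool(tokens & PRIVATE_TABLES)

def has_risky_predicates(query: str) -> bool:
    return bool(RISKY.search(query))

# Example: verify a generated query against a toy department database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (name TEXT, num_employees INTEGER)")
query = "SELECT name FROM departments"  # wrong table name, as in the example
print(is_executable(query, conn))       # False -> would trigger a re-ask
print(uses_private_tables(query))       # False
print(has_risky_predicates(query))      # False
```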
So let's say that this is the question that we end up receiving from the user: "Which department has the most employees?" Guardrails' text-to-SQL package has a prompt that is automatically constructed for this natural language query, which will pass in all information about the schema of the table,
other information that may be relevant for answering this query, and sends the prompt over to the large language model. Let's say for this example that this is the output that we end up getting from the LLM: "SELECT name FROM departments". And here, let's say that the departments table doesn't actually exist in our schema.
So when we run this output through verification, we find that two of our verification tests pass: no private tables and no risky predicates exist in the query that is generated. But because the departments table doesn't exist in the database, this SQL code is not actually executable.
So we now enter the validation-failure and prompt-reconstruction part of the guardrails workflow, where a re-asking prompt is automatically constructed by guardrails with all relevant context about why this particular generated query is incorrect. All of this is automatically done and then sent over to the LLM, and we end up getting a corrected response with the correct table name. We pass this through verification again, so we're back in this part of the logic, and then all verification tests pass. So we end up passing this output back to our end user. This was obviously a very simple and contrived example, but it goes through the workflow of what building an LLM application with Guardrails AI looks like.
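For intuition, the automatically constructed re-asking prompt conveys roughly the kind of information shown below. The exact template Guardrails uses is different; this is only a hand-written approximation of the context it carries.

```python
# A hand-written approximation of the context a re-asking prompt carries.
reask_prompt = """
You previously generated this SQL query:

    SELECT name FROM departments

It failed verification: the table 'departments' does not exist in the schema.
Available tables include: department(name, num_employees, ...).

Regenerate the query so that it answers the original question
("Which department has the most employees?") using only existing tables.
"""
```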
So very briefly, I'm just gonna leave this up here and not really talk through it, but these are some examples of the validators either available in the library or that you can build with the library, and they span a ton of use cases. We only talked about re-asking as a way to handle validation failures.
But guardrails comes with a bunch of other options, including filtering or refraining, raising an exception, just logging incorrect outputs if they occur, or trying to programmatically fix them wherever possible. So in summary, Guardrails AI is a fully open source library that offers a lot of functionality to make AI applications more reliable in practice,
including a framework for creating custom validators, the orchestration of prompting, verification, and re-prompting, a bunch of commonly implemented and used validators available in the library, as well as the specification language for communicating with LLMs. To follow along, you can look at the GitHub package at ShreyaR/guardrails.
The docs are at getguardrails.ai, or you can follow me on Twitter, or the Guardrails AI account on Twitter, where I continuously share the guardrails philosophy. Yeah, that's it. Oh, so good. So many great questions. Oh, sorry, wrong button. I want you to stay on for a minute. Yeah. You don't need my face anymore.
You just need, like, how to follow along with this thing. Get outta here, I gotta keep it moving. No, I wanted to show you this. I think you've seen it before, but any time that I think about this shirt, I always think, oh, this should be the guardrails mascot shirt or something, because it just plays into these LLMs so much.
We all know it's true. They hallucinate like crazy, and there are some incredible questions that are coming through the chat. So I'm gonna direct you, Shreya, over there. In case anybody wants the "I hallucinate more than ChatGPT" t-shirt, you can scan this right now and get it. And I look forward to meeting you in person in a few weeks.
Shreya. Likewise, thanks again for inviting me to be here. Yeah, we'll see you soon.