MLOps Community

The LLM Guardrails Index: Benchmarking Responsible AI Deployment // Shreya Rajpal // AI in Production 2025

Posted Mar 13, 2025 | Views 265
# LLMs
# Guardrails
# AI Risks

SPEAKER
Shreya Rajpal
Creator @ Guardrails AI

Shreya Rajpal is the creator and maintainer of Guardrails AI, an open-source platform developed to ensure increased safety, reliability, and robustness of large language models in real-world applications. Her expertise spans a decade in the field of machine learning and AI. Most recently, she was the founding engineer at Predibase, where she led the ML infrastructure team. In earlier roles, she was part of the cross-functional ML team within Apple's Special Projects Group and developed computer vision models for autonomous driving perception systems at Drive.ai.


SUMMARY

As LLMs become more common, ensuring they're reliable, secure, and compliant is more important than ever. With many guardrail solutions popping up, how do AI engineers figure out which ones work best for them? In this talk, Shreya covers the first-ever AI Guardrails Index, a thorough evaluation of top guardrail solutions. Based on a systematic assessment of six key AI risk categories, this benchmark helps AI developers, platform teams, and decision-makers better understand the landscape of LLM safety.

Key takeaways:
- The current state of AI guardrails and why they matter for responsible AI use
- How different guardrail solutions stack up in terms of precision, recall, accuracy, and speed
- Which guardrails work best for specific needs, from finance to healthcare to content filtering


TRANSCRIPT

Demetrios [00:00:04]: The first keynote of the day. Let me bring out the magnificent Shreya. Where are you at? Hey, how's it going?

Shreya Rajpal [00:00:15]: Hey. Hey, Demetrios. Good to see you. I am in San Francisco and I'm very excited to, you know, kick off with a keynote.

Demetrios [00:00:23]: Excellent. Well, thank you for waking up early and doing this for us. I know it is a little bit earlier where you're at. I will let you share your screen and we will rock and roll. You've got 20 minutes and then we are going to have Q&A. Feel free, folks. If you have questions, drop them in the chat. And there's also a Q&A button that you can hit if you want so that we can see all the different questions that are coming through.

Demetrios [00:00:57]: But in the meantime, Shreya, I see your screen is shared, so I'm going to get that rocking and rolling and I'll be back in 20 minutes.

Shreya Rajpal [00:01:05]: Awesome. Sounds great. I'm happy to kick it off. First of all, really honored to be back at the AI in Production conference and to be giving a keynote. So welcome, everybody. And I hope the talk sheds a little bit of light on where we are in terms of making a lot of the generative AI infrastructure that we're using reliable. As part of that, what I'm going to be talking about today is the Guardrails Index, which is all about how we can benchmark the efficacy of AI reliability infrastructure.

Shreya Rajpal [00:01:38]: So a little bit before we get started, so you know who we are: I'm the founder and CEO of Guardrails AI, and Guardrails is the number one open-source generative AI guardrails framework. We're probably best known for having the largest collection of AI guardrails anywhere, 65+ open-source validators that you can use, that we've built or that have been contributed by the community, and that are ready to add to your applications or AI infrastructure. Awesome. So let's get started with what this talk is all about and what we're going to be covering. This isn't really news to most of you: generative AI has a pretty substantial reliability problem.

Shreya Rajpal [00:02:24]: So these are just some screenshots, but I'm sure so many of you, especially in the year of vibe coding, as Demetrios talked about, wrestle with this idea of: okay, it kind of works, but the value, after a while as you use it, starts becoming capped because you can't reliably and repeatedly get the same behavior out of generative AI. The fascinating thing is that generative AI based applications, and agents specifically, are still useful enough that even with all of the issues around trust and reliability, we tend to see that people still go ahead and adopt a lot of agents into their work lifecycles. Right? So we come up with this interesting problem: since reliability tends to be this big frustration and can often limit the impact that generative AI can have, how can we attack reliability systematically for generative AI agents? And how do we measure and benchmark how effective the fixes are? That's what I'm going to be covering in the bulk of this talk. So let's start with the first part of that, which is essentially: if you're wrestling with issues with AI reliability, how AI guardrails are a component that helps you add a lot of that reliability back in. And this is something that, now more so than ever, I'm guessing a lot of people are familiar with, what guardrails fundamentally are. But I'm just going to give a quick refresher to lay the foundation for a lot of what's going to come later in this talk. If you think of what a typical LLM application looks like, the idea is very simple: you basically feed in a prompt to your LLM and then get an output back. That output could be a text-to-SQL query, a customer response, or a generated code output, but whatever it is, you essentially generate it via prompting and then use it at various different parts of your LLM application.

Shreya Rajpal [00:04:27]: The key difference, when you think about how guardrails add reliability, is rather than relying purely on the LLM to do the right thing, quote unquote, and behave in the exact manner that your application desires, why don't we add explicit validation and verification around the LLM that checks: okay, all of the assumptions and instructions we laid out in our system prompt, were those followed? Is the output that's generated correct and actually usable by our application, etc.? So instead of just assuming that all of those things are going to be correct, we should explicitly verify and check for them, because LLMs under the hood are very non-deterministic. Right? So that's the core idea of what guardrails are. In our terminology, you can essentially have input guards and output guards around your application, where a guard is a set of guardrails and each guardrail checks for one element of correctness for your application. So that could be: is there any PII or proprietary information being leaked? Is a jailbreak attempt detected in your prompt? And then on the output side: are there any hallucinations, not-safe-for-work responses, sensitive topics, etc., being discussed? Each individual check is a guardrail, and then you combine a bunch of them into guards and add them on the input and output side of your application. This is the core idea of what guardrails are.
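
To make the input-guard / output-guard pattern concrete, here is a minimal Python sketch. The function names are illustrative placeholders, not the Guardrails AI API or any specific library; they stand in for whatever rules, models, or LLM-based checks you wire up.

```python
# A minimal sketch of the input-guard / output-guard pattern described above.
# All function names here are illustrative placeholders, not a specific library's API.

def call_llm(prompt: str) -> str:
    # Stand-in for your actual LLM client call.
    return "dummy model response"

def detect_pii(text: str) -> bool:
    # Replace with a real PII detector (regex rules, NER model, etc.).
    return False

def detect_jailbreak(text: str) -> bool:
    # Replace with a real jailbreak classifier.
    return False

def detect_hallucination(output: str, context: str) -> bool:
    # Replace with a grounding / factuality check against retrieved context.
    return False

def guarded_completion(prompt: str, context: str = "") -> str:
    # Input guard: each guardrail checks one element of correctness on the prompt.
    if detect_pii(prompt) or detect_jailbreak(prompt):
        raise ValueError("Input guard tripped: PII or jailbreak attempt detected")

    output = call_llm(prompt)

    # Output guard: validate the response before the application uses it.
    if detect_hallucination(output, context):
        raise ValueError("Output guard tripped: response not grounded in context")
    return output

print(guarded_completion("What is the refund policy?"))
```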

Shreya Rajpal [00:05:59]: Very briefly, this looks like a complicated diagram, but I think the core idea to take away is that as you build more and more sophisticated AI applications, such as co-programming agents, multi-turn chatbots, or copilots, the idea of guardrails is pretty extensible, wherein you typically pepper them throughout your AI architecture, right? So let's say this is an AI agent for customer support. What you would typically do as you go through this pipeline is: a customer query comes in and you pass it through an input classification guardrail; as the support assistant generates an action, you go through an action validation guardrail; as you execute that action, you check whether the response that was generated is desirable, and so you go through a response validation guardrail. This entire idea tends to be pretty extensible, and how guardrails are used within the application tends to grow with the level of sophistication of the application that you have. The other thing I want to quickly highlight is that another way people typically add reliability via guardrails is by adding them directly onto your AI platform. So let's say, as you're building AI applications, you end up building a centralized AI platform to make it easy to just use AI or to build a bunch of different applications in parallel. This tends to typically be the case if you're working in larger organizations.

Shreya Rajpal [00:07:41]: So a common pattern is to add guardrails right at the source of where your LLMs are being used. If you have any sort of AI gateway, you would essentially add a guardrailing layer around that gateway so that throughout the organization, instead of using raw, unguarded LLM endpoints, you use that same functionality in chatbots, in RAG applications, and in agent workflows, whatever your way of using and building AI is. But instead of using that raw endpoint, you essentially use this guarded endpoint. Awesome. That's a little bit of the foundation of what guardrails are. One of the things we haven't really touched upon is how they really work. What I'm going to talk about is that not all guardrails are created equal. This is especially a problem because a lot of the time these guardrails are essentially verifying policies as part of the checks that they're performing, hallucinations, et cetera.

Shreya Rajpal [00:08:41]: All of these are policies that you want to apply to your AI application or your AI platform. And detecting all of these policies isn't typically a straightforward problem to verify or validate, so typically you have to build a machine learning model in order to attack that problem well. Right. So how this can look under the hood is you can basically have a rules-based or heuristic system. This tends to be only for, I would say, the most basic problems, but it tends to work really, really fast while not being as flexible in terms of what it can catch. So typically things like regular expressions, pattern matching, or keyword filters can dictate part of your guardrail.

Shreya Rajpal [00:09:22]: So let's say you have a guardrail about no PII being allowed to leak. You can detect what PII looks like, like what a phone number looks like, for example, by doing regular expression matching. Some guardrails, most guardrails in fact, tend to be small fine-tuned machine learning models, especially if you look at things like factuality, so detecting when there's a hallucination. You can do this by using a smaller fine-tuned ML model. Or classifying the toxicity or language, et cetera, used by an AI model or an input going into the prompt; that can basically be done by a traditional small fine-tuned model. Topic detection, to detect what subject matter the chatbot or the AI application is talking about, or even detecting any named entities: all of that can be done by traditional small ML models.
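
As a concrete instance of the rules-based tier just described, here is a small, hedged example of masking one PII pattern (US-style phone numbers) with a regular expression. The pattern is deliberately simple and illustrative; production guardrails combine many such rules with ML-based detectors.

```python
# Illustrative rules-based PII guardrail: mask phone-number-like strings with a regex.
# The pattern is intentionally simple and will not cover all formats or locales.
import re

PHONE_PATTERN = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")

def mask_phone_numbers(text: str) -> str:
    # Replace anything that looks like a phone number with a placeholder token.
    return PHONE_PATTERN.sub("<PHONE_NUMBER>", text)

print(mask_phone_numbers("Call me at (415) 555-0132 tomorrow."))
# -> "Call me at <PHONE_NUMBER> tomorrow."
```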

Shreya Rajpal [00:10:11]: Then finally, if the problem that you're trying to catch with guardrails tends to be much more complicated or sophisticated, you can also use a secondary LLM call. For example, are my chatbot responses coherent across multiple sentences, or coherent with respect to what the input was? That's a much harder thing to verify with just a small fine-tuned machine learning model, and it's one of the things you can do using a secondary LLM call. Similarly, reading the tone of voice used by an LLM. These tend to be much fuzzier and therefore much harder problems from a machine learning perspective. Most commonly, what you'll see is that a guardrail uses a combination of these approaches, right? So you'll have a waterfall-like system where you see whether you can build a high-confidence estimate of the undesirable behavior you're trying to catch with some of these simpler techniques, and if not, you go towards the slower, more expensive, but typically more flexible technique, and you end up building an ensemble. Awesome. So with that out of the way, as with anything machine learning, it's very important to understand how effective and how performant the machine learning system is.
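
A rough sketch of that waterfall-style ensemble, assuming hypothetical stage functions: a cheap rule fires first, a small classifier handles most cases, and only the ambiguous middle band escalates to a secondary LLM call.

```python
# Sketch of a waterfall ensemble for a guardrail check. All three stages are
# illustrative stubs standing in for a real rule set, classifier, and LLM judge.

def regex_stage(text: str):
    # Cheapest check: return True/False when a rule fires decisively, None when unsure.
    if "BEGIN JAILBREAK" in text:
        return True
    return None

def small_model_stage(text: str) -> float:
    # Placeholder for a small fine-tuned classifier; returns a violation probability.
    return 0.5

def llm_judge_stage(text: str) -> bool:
    # Placeholder for a secondary LLM call: slowest, most expensive, most flexible.
    return False

def is_violation(text: str) -> bool:
    verdict = regex_stage(text)
    if verdict is not None:          # a rule fired decisively
        return verdict
    score = small_model_stage(text)
    if score >= 0.9:                 # high-confidence positive from the small model
        return True
    if score <= 0.1:                 # high-confidence negative
        return False
    return llm_judge_stage(text)     # escalate only the ambiguous middle band

print(is_violation("Please summarize this document."))
```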

Shreya Rajpal [00:11:31]: And this is where the subject of this talk really comes in: we made a huge investment in the open source, basically building out a comprehensive AI guardrails benchmark that is intended as a useful tool to help you decide which guardrails are most relevant and what performance trade-offs you're really making when you add guardrails to your framework. That's what I want to shed light on, and I want to share some interesting findings. What we ended up doing, especially because people talk about guardrails but never really talk about how performant they are, what latency to expect, or how many of the risks guardrails can really catch, is we created six tasks representing the most common tasks we had seen on our open-source Guardrails Hub, the most popular tasks for which people use guardrails. Then for these six most common guardrails tasks, we curated datasets and picked the most common guardrails that you can use for each, either open-source models, our own guardrails, verification endpoints from the cloud providers, or prompting OpenAI, etc. And then we shared what our findings were. So that's some of the effort required here. In this talk, I'm only going to focus on two of them, just in the interest of time, the PII detection benchmarks and the jailbreak detection benchmarks, and I'm going to skip over the other four. But if you're interested, you can find the entire report publicly available at index.guardrailsai.com, so I'd encourage you to check that out.
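
For reference, the precision, recall, and F1 numbers reported throughout the index are the standard classification metrics. Here is a small sketch of how they are computed from raw counts; the counts below are hypothetical, chosen only to show the scale of a low F1 like the 0.37 discussed later for Presidio.

```python
# Illustrative only: standard precision / recall / F1 computed from raw counts.

def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # of everything flagged, how much was real
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # of everything real, how much was caught
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

# Hypothetical counts: 37 entities caught, 63 false alarms, 63 misses.
print(precision_recall_f1(tp=37, fp=63, fn=63))  # -> (0.37, 0.37, 0.37)
```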

Shreya Rajpal [00:13:25]: Awesome. So let's look at PII detection guardrails and what the interesting findings there are. The task we're trying to solve with PII detection is essentially to protect personal data if you're building any AI application in financial services, healthcare, HR, legal, or call centers, by automatically masking sensitive information. Right. And especially if you're using AI in any of these regulated industries, you'll typically find that this ends up being a huge blocker in successfully adopting generative AI applications and getting the type of reliability that you need. What we did was curate the dataset, and I'm going to talk in a second about the dataset that we created. We then benchmarked three providers here to talk about how effective the guardrails are. One of the most interesting findings for me here was looking at the performance of Microsoft Presidio.

Shreya Rajpal [00:14:31]: Microsoft Presidio tends to be one of the most commonly adopted things you can find if you want to mask PII. In fact, we have it available on our Guardrails Hub as well, and we like using it. It tends to be really popular: it's very easy to use, very fast, very cheap. So we benchmarked, alongside Microsoft Presidio, a leading OSS model that is transformer-based; in contrast, Microsoft Presidio is more rules-based, built on traditional NLP techniques. Then we also benchmarked a new PII guardrail that we ended up building, which is an ensemble of a lot of different types of techniques for catching PII. What we found is that even with Microsoft Presidio's adoption, even with everybody using it, nobody really benchmarks how effective Microsoft Presidio is.

Shreya Rajpal [00:15:22]: And then we found that, especially if you care about F1, precision, recall, accuracy, any of those more traditional ML metrics around performance, Microsoft Presidio isn't the most performant way to mask PII. And not only is it not very performant, the numbers are also objectively low. For a machine learning audience here, an F1 score of 0.37 is objectively not very high, and we're used to seeing these numbers be much higher for a lot of the ML models that we use in prod. This was actually a very surprising finding for us. We found that by just switching to more sophisticated machine learning and deep learning based systems, we were able to improve on the F1 scores of Microsoft Presidio by 75%. In a second, I'm going to show you where a lot of that lift comes from. But we also looked at latency, because Microsoft Presidio uses these really small rules-based systems, etc.

Shreya Rajpal [00:16:28]: The latency tends to be really fast and really reliable, which is one of the best things about Presidio. So this is about 15 milliseconds here. But if you're looking at what's perceptible to the human eye, even though we have a roughly 4x higher latency when running our guardrails on GPU, at around 64 milliseconds, that is still in the range where this is not really even perceptible. So even though on paper that latency is higher, you don't really get too much of a slowdown in terms of perceived performance. Let's look at the dataset design that we used. We actually use the dataset created by the Ai4Privacy organization, PII Masking 300K.

Shreya Rajpal [00:17:10]: I'm asking 300k. This data set is linked in the report that we've published on our website. Then what you can really do is break down PII by these categories and then kind of see what the performance ends up looking like. Right? So straight off the bat, one of the most fascinating things for me here was that person, which is, you know, like your name essentially if a name pops up in a statement or a data set or something, it's one of the most common things that people think about when they're thinking of pii. Like you want to mask out, you know, a username, etc. And then interestingly, this is also one of the areas where PII has one of the lowest, you know, like numbers objectively that we'd kind of seen. Right? So across the board the performance tends to be higher across, you know, some of these other categories. But for PIs specifically, like, we're basically seeing this like very low, like 0.1 score, which you know, is objectively like really, you know, not very reliable compared to that.

Shreya Rajpal [00:18:08]: Compared to that, GLiNER actually ends up getting the highest score, with a 0.45, and we're not far behind with a 0.44. And then for a lot of these other categories, around email, phone number, etc., the scores tend to be much higher, in the 90s or 80s range, which, if you're used to machine learning models and systems, is what you typically expect. Awesome. The second category we're going to look at is jailbreak prevention guardrails. Again, one of the biggest and most common concerns that people have around unreliability with AI systems. The task with jailbreak prevention is essentially mitigating the risk of jailbreaking LLMs so that they bypass safety measures or produce restricted or harmful content. Right.

Shreya Rajpal [00:18:57]: So what we did here, again, was curate the dataset, and then we benchmarked four models. Meta's Llama Prompt Guard is the first model we benchmarked. Anthropic is actually an interesting one: Anthropic doesn't officially have a jailbreak prevention model, but they released a prompting guide that talks about how you can use Anthropic models for jailbreak prevention. Then Microsoft's Prompt Shield, and finally we created our own jailbreak prevention model, which is transformer-based. What we saw here was that if you look at Llama Prompt Guard, these numbers should immediately stand out to you, and they were once again really surprising for us: both the true positive rate and the false positive rate are 1. Llama Prompt Guard tends to be very skewed towards always predicting a positive label, which, if you're looking for accuracy or precision, is maybe not the best behavior you're really looking for.

Shreya Rajpal [00:20:03]: Compared to this, if the metric you care about is the true positive rate, we interestingly found this contender, Anthropic, surprisingly having pretty good performance. So if you were just trying to detect jailbreaks and the metric you care about is the true positive rate, and you do this by prompting Anthropic's Claude model, you would see one of the highest true positive rates we've seen, one of the best trade-offs between true positive rate and, I guess, accuracy, which is a more consolidated way to talk about the true positive and false positive rates. Microsoft's Prompt Shield surprisingly didn't have as robust performance here. And then with our guardrails, what we try to maximize is the overall balanced performance, whether you're looking at the area under the curve or at F1 scores, and so we ended up coming out the winners here on these two metrics as well as on accuracy. In terms of latency, Llama Prompt Guard had the lowest latency here, with 52 milliseconds when running on a GPU, and we're not far behind with 54 milliseconds. Right.
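
To see why a true positive rate and false positive rate of 1 signal an always-positive classifier, here is a small illustration with made-up counts (not the benchmark's actual data): if a model labels every prompt as a jailbreak, it catches all real jailbreaks but also flags all benign prompts.

```python
# Illustrative only: TPR and FPR from confusion-matrix counts.

def rates(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    tpr = tp / (tp + fn)   # true positive rate: share of real jailbreaks caught
    fpr = fp / (fp + tn)   # false positive rate: share of benign prompts incorrectly blocked
    return tpr, fpr

# A classifier that predicts "jailbreak" for everything: no false negatives, no true negatives.
print(rates(tp=500, fp=500, tn=0, fn=0))  # -> (1.0, 1.0)
```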

Shreya Rajpal [00:21:10]: And a big part of the work that we've done in creating this index is making sure that we share the dataset publicly, we share the breakdowns publicly, etc., so that you can independently verify all of the metrics and results we came up with. So this dataset is also available in our report. We essentially found a big gap in all the open-source datasets that we looked at for jailbreaking: they had an over-representation of one single type of jailbreak category, which is role play, pretend, hypothetical. So if you're familiar with DAN-style jailbreak attacks, which are essentially "you're an LLM that can Do Anything Now" style attacks, where you grant permission to the LLM to follow your instructions, we found those to be the most overrepresented categories in the open-source datasets that we looked at. So a lot of the work we did here was taking these datasets and balancing them out so that the other categories are also represented in the jailbreak metrics that we're looking at. And these categories are, by the way, not mutually exclusive, so one data point can belong to multiple categories.

Shreya Rajpal [00:22:19]: And so you see some of that coming up here. Cool. So I'm going to close it out. The key takeaway from this keynote that I want you to think about: we see a lot of people using guardrails, and there are a lot of ways to add reliability, but even though a lot of this is machine learning based, none of it is really very grounded in numbers. So the main takeaway I want you to take from this keynote is that guardrails and any AI reliability framework you use are, under the hood, basically machine learning models, and you should benchmark every single ML model that you use. And if you're interested, you can check out the full report at index.guardrailsai.com. Thank you so much.

Demetrios [00:23:04]: Amazing. Shreya, thanks so much for doing this, and as always, what an incredible talk. This stuff is so important if you're trying to put AI into production, and that's why I always appreciate you coming on here and schooling us through this. In the meantime, I am going to let folks ask some questions. I know they'll start trickling in, but I can ask while we are waiting for people to ask these questions. The stuff that I'm thinking about really is: at what point in your development cycle should you be thinking about guardrails?

Shreya Rajpal [00:23:49]: Yeah, that's a really great question actually. And very candidly, I think this is also where generative AI is continuously evolving and such a new topic. And then every week there's like, you know, 10,000 updates that happen. Right. And so our opinion on this has also evolved from like when I first like launched the open source, you know, like almost two years ago now. And so what we've seen as being the most successful points at which, you know, you really should think about guardrails are like one when you're really thinking about kind of like enablement for your organization, like AI enablement for your organization. So really anytime you're thinking about like, hey, I work at this organization and I want to use enterprise AI and I'm building like a centralized AI platform that has maybe an AI gateway, AI model gateway, a vector database, et cetera, I think we've seen a lot of folks really add guardrails as an essential component of that AI gateway. So that every model router that you're using or model gateway that you're using, so every request that comes through that is guardrailed.

Shreya Rajpal [00:24:55]: I think that's one of the most important points. And then the second one ends up being typically right when you've developed an AI application, such as a RAG chatbot or AI agent, whatever that may be, and you're going into production: essentially making sure that you have a lot of these preventative measures, such as guardrails, in place before you ship to prod, as a key way of enabling prod deployments. I think that tends to be the other crucial one.

Demetrios [00:25:24]: So there's some questions coming in here now. How are you creating testing data?

Shreya Rajpal [00:25:32]: Yeah, yeah, that's a good one. I think in terms of test data, I'd actually recommend looking at the report for a lot of these. We either looked at what open-source datasets are available, and if there's a really good dataset available, we basically use that and then update it. PII is a great example of that, where there is a really fantastic PII dataset available. For other ones, like jailbreaking, for example, we found that there are open-source datasets that have some good and some bad things, but they require a lot of augmentation, curation, sampling, filtering, et cetera. So we did a lot of work there in creating the right dataset. Jailbreak detection has one of those, and we also share details about how we curated or created the dataset in the report there. And then for some other categories.

Shreya Rajpal [00:26:26]: So I didn't talk about it in this report, but for, I believe, the off-topic conversation and restricted-entities kinds of categories, the most effective thing we found was to create a new dataset from scratch. There wasn't any open-source or public dataset that we saw that we could use there, so we ended up creating our own datasets.

Demetrios [00:26:47]: All right, so I've got one more great one for you from Joe in the chat, and he's asking, what's the trade-off between cost and latency versus safety? Are there ways to observe and tune that with guardrails?

Shreya Rajpal [00:27:03]: Yeah, yeah, that's another great, great question. I think in general, I would say that there's no one universally applicable answer to that question about the trade-off between cost and latency versus safety, because safety means different things to different people. Right? So for you, for example, going back to this information about categories, safety might mean no jailbreaks and no hallucinations, whereas for someone else, it might also include content moderation. And the more categories you add there, the more your cost increases or the more your latency increases, depending on how you've orchestrated your framework. Right. What we have really tried doing with our open source is make it so that you can add as many guardrails as you want without necessarily needing to increase your cost, because we've essentially implemented a lot of the foundations you need for parallelization and latency optimization.
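
As a rough illustration of the parallelization point (not the Guardrails AI implementation), running independent guardrails concurrently keeps added checks from stacking latency linearly; total wall-clock time is roughly that of the slowest single check. The check functions below are placeholders with simulated delays.

```python
# Illustrative sketch: run independent input guardrails concurrently with asyncio.
import asyncio

async def check_pii(text: str) -> bool:
    await asyncio.sleep(0.05)   # simulate a ~50 ms detector
    return False

async def check_jailbreak(text: str) -> bool:
    await asyncio.sleep(0.05)
    return False

async def check_toxicity(text: str) -> bool:
    await asyncio.sleep(0.05)
    return False

async def run_input_guards(text: str) -> bool:
    # All three checks run concurrently, so total time is ~50 ms rather than ~150 ms.
    results = await asyncio.gather(
        check_pii(text), check_jailbreak(text), check_toxicity(text)
    )
    return any(results)   # True if any guardrail flags the input

print(asyncio.run(run_input_guards("What is the capital of France?")))
```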

Shreya Rajpal [00:28:04]: I think the cost at the end of the day really depends on what models you're using as part of your guardrails. And that's also where this index can be a helpful tool. So if I look at jailbreak prevention here, I talked about how Anthropic had surprisingly great performance. What I didn't touch on is that if you look at the number in this column here, Anthropic also has, I think, something like 40x worse latency, and it's also one of the most expensive models that we tested. So there again tends to be this trade-off around what you really care about. And we didn't include cost numbers here because a lot of these run on GPUs, for example, and it depends on what your GPU rate is with your cloud provider, and that really varies.

Shreya Rajpal [00:28:52]: But we have shared the datasets and the models, etc., to do a lot of this benchmarking. And so if you want, you can reproduce a lot of that and then, with your cloud provider, test out what this rate should be.

Demetrios [00:29:04]: Excellent. There's another great one coming in here about the best way to protect some case-specific PII that wasn't present in the dataset. And another person in the chat asked for links; I think we can drop some links to use cases or case studies that they can go and read later too. But that question was: what's the best way to protect some case-specific PII that wasn't present in the dataset?

Shreya Rajpal [00:29:37]: Ah, interesting. I would actually point to the PII taxonomy, this one. We selected the top, I think, 10 or so just to include on the slide, but I think in our report there are about 18 categories. If you're thinking about some category or case that isn't present here, I would first double-check with the report, which has the full numbers, and see if that's a category that's present there.

Shreya Rajpal [00:30:07]: And if it is, then you can essentially just check which model performs best, right, and then use that off the bat. If you check that and it's something that isn't present, what I will say is that your best bet is using something like a named entity recognition system or a regular expression detection system. Those tend to be extensible and flexible enough that, without needing to do too much, they'll give you good performance. I think with PII, the key thing is that these models often need to be local. You can't really use an LLM for this, because you're just passing the contaminated, PII-containing data on to the LLM as well. So it's not really possible to use an LLM here, but I would recommend using either a named entity recognition model or figuring out if you can solve this with regular expressions.
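
As one hedged example of the local NER route suggested here, a small spaCy pipeline can mask detected entities before any text is sent to an LLM. The model choice and entity labels are assumptions to adapt per use case, and the model must be installed first (e.g. `python -m spacy download en_core_web_sm`).

```python
# Illustrative local NER-based masking with spaCy; no data leaves your machine.
import spacy

nlp = spacy.load("en_core_web_sm")  # small local model, installed separately

def mask_entities(text: str, labels: set[str] = {"PERSON", "GPE", "ORG"}) -> str:
    doc = nlp(text)
    masked = text
    # Replace entities from the end of the string so earlier character offsets stay valid.
    for ent in reversed(doc.ents):
        if ent.label_ in labels:
            masked = masked[:ent.start_char] + f"<{ent.label_}>" + masked[ent.end_char:]
    return masked

print(mask_entities("Shreya lives in San Francisco and works at Guardrails AI."))
```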

Shreya Rajpal [00:31:05]: I didn't talk about it in this keynote, but we actually do have a pretty performant model here. Again, if you look at the report, you'd be able to find links to it and everything. I'd recommend checking that out as well.

Demetrios [00:31:15]: Very cool. Last one, just on a practical level. I get a feeling the answer is a bit of both, but you're the one who knows more about this. Do guardrails always need to be created per task/objective, or can they be at an enterprise level?

Shreya Rajpal [00:31:38]: Ah, that's a great question. I love that question. Okay, so I think there are different schools of thought on this. I think Meta, for example, has this one school of thought where you can have one guard model for everything, and that's also where you see Llama Guard, for example, as one of the most common models there. And then our approach is much more: no, they need to be application-specific, concern-specific, like one guardrail per thing that you're concerned about. I'll talk about why we believe our approach is the right way. And we're starting to see, with Anthropic's constitutional classifiers that just came out a couple of weeks ago, or even what OpenAI released yesterday, a lot of these large AI labs also aligning with our way of thinking.

Shreya Rajpal [00:32:34]: And I think the big reason for that is that you just do not have consistent things that you care about in a universal setting. Right. So that's definitely not true in a global sense. So maybe, for example, in one use case, hallucination might really be a liability if this is going to be something that's customer facing. But for creative applications or for AI companionship use cases, hallucinations aren't necessarily a bad thing. So you tend to get these trade-offs. But even within an organization, if your AI application is internal facing versus customer facing, you're going to care about different things. Or if you are in a regulated industry versus not in a regulated industry, you're going to care about different things.

Shreya Rajpal [00:33:14]: And so I think that's why it tends to become really hard to say: okay, for this organization, you just need this one guardrail, and that's kind of it. There can be this kind of, what do you call it, common set of guardrails that you apply for everybody within the organization. So for your org, you can say, okay, for every LLM request, whatever application it's used for, we should make sure that there are no content safety concerns, no jailbreaking concerns, and no PII. Right. So those three you just add to your guardrails proxy for the entire organization. But you will still end up needing some specialization per application and then per industry, etc.
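
A minimal sketch of that split, assuming hypothetical check functions: a shared, organization-wide set of guards applied to every request at the gateway, plus application-specific guards layered on top.

```python
# Illustrative sketch: org-wide "common" guardrails applied to every request, plus
# application-specific guardrails per use case. All check functions are placeholders.
from typing import Callable

Check = Callable[[str], bool]  # returns True if the text violates the policy

# Applied to every LLM request in the organization, regardless of application.
ORG_WIDE_GUARDS: list[Check] = [
    lambda text: False,  # content safety check (placeholder)
    lambda text: False,  # jailbreak detection (placeholder)
    lambda text: False,  # PII detection (placeholder)
]

def guarded_request(prompt: str, app_guards: list[Check]) -> str:
    # Org-wide guards run first, then whatever the specific application adds.
    for check in ORG_WIDE_GUARDS + app_guards:
        if check(prompt):
            raise ValueError("Request blocked by a guardrail")
    return "dummy model response"  # stand-in for the real LLM call

# A customer-facing app might add extra checks; an internal creative tool might add none.
print(guarded_request("Draft a reply to this support ticket.",
                      app_guards=[lambda text: "restricted term" in text.lower()]))
```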
