Fraud Detection in the AI Era
speakers

Entrepreneur and problem solver.

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.
SUMMARY
Rafael Sandroni shares key insights on securing AI systems, tackling fraud, and implementing robust guardrails. From prompt injection attacks to AI-driven fraud detection, we explore the challenges and best practices for building safer AI.
TRANSCRIPT
Rafael: [00:00:00] Rafael Sandroni, I'm the founder of the GuardionAI, it's a AI security company. And usually I mix Brazilian coffee with Italian mocha. And so that's the best combination, but I also, I like the espresso.
Demetrios: We'll be back for another MLOps Community Podcast. I'm your host, Demetrios. And today we're talking with Rafa all about the security vectors that you got to watch out for when it comes to building apps or just using LLMs.
Demetrios: I particularly enjoyed the moment that we touched on traditional fraud detection that he's done at places like Nubank, Nu, N-U, and you shout out to their engineering bug. And then we contrast that with these days, the kind of frog that sounds like I'm saying frog, the kind of fraud that you can do when it comes to LLMs.
Demetrios: And if you have AI systems in place, let's get into this conversation with my man. I[00:01:00]
Demetrios: know you were working on Siri and I know that you were doing stuff that was to help Siri understand other languages. But Siri's been around for a while. It's also been pretty wild how the evolution of Siri has been happening. And you posted some stuff that caught my eye on LinkedIn about like how to make assistance, AI assistance, better things that you need to like keep in mind when you're creating AI assistance.
Demetrios: What are some of these things that you've recognized as best practices that you can share from your time doing that? And I know Apple has some pretty strong NDAs, so you don't need to get into specifics, but just what are some key learnings?
Rafael: Okay, so what I understand is that with the LMs, the APIs and so on, it's much easier [00:02:00] AI systems.
Rafael: And based on my experience, I don't know, Siri is one of the top five AI users in the world. It's a huge problem to make sure that everything is going like a testing make sure that the response have a, is like a kind of a quality. But also the orchestration, like depending of the car, what AI assistant is this.
Rafael: So you need to. Orchestrate, integrate with APIs. ~The os and so you make ~you need to make sure that everything is integrated. Integrated test is integrated because, I dunno, it makes you need to make sure that you have contracts between the integrations the test as well.
Rafael: So you can see recently using a mix of AI and human to test.~ But ~yeah, I will say like the test is one of the most important part, like if you are creating some value for people, you need to make sure that AI, this AI system [00:03:00] is like working well, you don't have a kind of regression.
Rafael: ~Yeah, but ~So I work at CD on some let's say components to make sure guardrails to make sure that you don't have a kind of outsource referencing, like a kind of profanity and competitors and so on. And if you think about, I don't know, a bank using this is a regular regulatory markets that have a kind of regulatory as well, you need to make sure that the answers of this AI is you have a kind of control, right?
Rafael: Yeah so recaptain test is the integration, make sure have a kind of contract and also controlling, like I have like Roger rails or controlling the output. I feel
Demetrios: like Siri is the ideal abstraction layer that I want an agent to play at or an assistant because it can do anything on the phone and it can [00:04:00] go into the apps.
Demetrios: I wish it could go deeper into the apps, right? But I guess that's just, each app has to make things discoverable for Siri. And so there's this. Question, do I want to have my own AI assistant inside of my app? Or do I want to have Siri be the AI assistant inside of my app? And that type of thing potentially are what different apps have to grapple with all the time.
Demetrios: ~But yeah ~you were doing the guardrails before guardrails were cool. And before there was these huge blunders. So it's funny to think about if you didn't do your job well, Apple could have been the Air Canada. It could have had that brand awareness that no brand really wants. But then I know you went and after you work in an Apple, you went and worked for one of my favorite engineering blogs of all time, which was NewBank, right?
Demetrios: And NewBank is awesome [00:05:00] because of how transparent they are in their engineering blog, I think they're. I say this a lot, but I don't say it enough. The NewBank engineering blog, I've learned so much from them and they are so transparent that I'm thankful for that. What were you doing at NewBank?
Rafael: Yeah. So it's a little bit complex because I spent almost three years.
Rafael: Over there. And I had the opportunity to work together with the part of this team. But I guess the important part, the thing that I like at Nubank is like a sharing like like knowledge. So you have a lot of let's say workforces meetups internally to make sure that everyone is in the same bar.
Rafael: And my, my first experience at NewBank was interesting because I was part of let's say a new feature internally, you are building a personal [00:06:00] finance assistant for NewBank users. And it was part of my startup, my that I founded years ago. And you have the opportunity to create the basis for NewBank.
Rafael: Of this AI assistant and funny that the funny part to I was do new bank decided to, to buy one of my computer computers like a personal files management app. And it was part of the due diligence, understanding internal tools and so on. So it was an interesting part building this AI assistant.
Rafael: I guess now it is part of the new big app helping people to have savings controlling the savings, have a better planning, financial planning. And but after that, it was part of like a fraud thing fraud and AML team like was basically the same and the challenges was like, I guess maybe you, you saw this blog post about the real time machine learning for [00:07:00] machine learning models for fraud.
Rafael: ~So it's it's ~I guess everything tech and bank, you have this challenge, like understanding, blocking fraudsters in transactions online or like onboarding for new customers. Yeah, I had the opportunity to like, create the tools like doing retraining models automatically for, to understand and ate the models with like new patterns.
Rafael: And also improving this the performance of this this models. So I has, I have the, an interesting blog post about this.
Demetrios: Yeah, and that was all traditional ML, right? So you went from Yeah. With Siri you were working on traditional ML or you were doing the guardrails or you were doing software engineering.
Demetrios: Do you feel like the evolution was there between those two?
Rafael: Yeah, it's a kind of, I should say, traditional ML because, I don't know, when you say NLP, I believe it's traditional [00:08:00] ML, right? But Yeah, but I believe everything is connected, right? Probably in a level of maturity of AI you need to implement traditional ML.
Rafael: You need to go out of APIs and implement a kind of some models internally to have more control. So not only the LLMs to use open source, but also, I don't know, if you need guardrails, probably. The best way to go to traditional NLP, like a classifier using small language models and implement this, but the challenges is like around not only the AI, but the soft engineering, like how to deploy yeah, so I believe that everything is connected it's Easier right now to use, starting using AI, right?
Rafael: But I guess the challenge is the same of five four or five years ago. So if you need traditional ML, so the challenge is the same,
Demetrios: ~yeah. ~All right, now. Talk to me about what you've been learning about [00:09:00] recently.
Rafael: Yeah,
Rafael: so this is the, the interesting part of this talk.
Rafael: But I left Apple months ago to start something. And I started learning about AI security. How to make sure that. Every AI application, AI assistant is like it have, it's protected guys like fraudsters hackers and so on. So I started with the OWASP top thing for LLMs.
Rafael: So later I joined as a member to understand more the challenges. And it's a huge problem, to be honest everyone that I'm talking about, like at banks, larger fintechs. Healthcare companies, legal, startups, local, like everyone is like implementing, but it's a kind of challenging topic because you need to make sure you need to, some people like
Demetrios: So basically they want to get hacked and then they want to put a password manager in place.
Rafael: Yeah. So basically, so depends on the [00:10:00] future of the company. So I guess the banks and the financial market is more like aware of these risks. So they face the same challenges on the software side. So probably the AI, they are taking care of this.
Rafael: But that's an interesting part. I would say that you have two components of to make sure that to implement like a security standards on AI. So the first is You need to implement the AI following a zero trust approach. So make sure that you have the permissions the credentials and the integration is safe.
Rafael: But after that, you need to do like a kind of test, security test, so write it in. The same way that you have like a pain test in the software side. So to discover the vulnerabilities. Understanding these vulnerabilities is like you have the opportunity to fix on your implementation or [00:11:00] implement use guardrails.
Rafael: So the second part is a way to following the zero trust approach where you don't believe that you don't believe that the LME is 100 safe and we can implement these guardrails, not only for the security, but also let's say to prevent like a kind of hallucinations, a kind of off top and and so on.
Rafael: And the security part of the guardrails is very challenging because these guardrails. You should learn. You show this model, show know patterns. What's what's like a prompt ejection methods that have more effectiveness. So to block this this attempt, yeah, so our overview of the AI security, I would say like a starting on the AI applications.
Rafael: So have this conference.
Demetrios: So as you're looking at these different vectors that LLMs can be [00:12:00] basically corrupted on, and you're thinking about guardrails, you're also thinking about zero trust. First of all, I would love to understand what you mean by zero trust for the LLMs. What does that actually look like in practice?
Rafael: It's the same approach like when you implement a software so make sure that you don't have a, you have the access just for that function, for that access, for that resource. So in AI it's the same thing. So if you have a AI system that have access to APIs and other tools like making sure that the scope is very defined.
Rafael: The access the permission is like very defined, basically it's like an approach that the developers, the dev team can follow and, um, and and just in the implementation,
Demetrios: yep, that makes sense.
Rafael: It's
Demetrios: more like a mindset, and trying to limit the scope, which is a hard thing to do, or it's a very fine [00:13:00] line because with the LLMs, they generally perform better the more context you give them.
Demetrios: And so you want to give them as much context as possible, as long as it is the right permissions. And so trying to figure that out and trying to architect your system. In a way that is going to know who has permission to what can get a little tricky. Have you seen good ways of doing that?
Rafael: What do you mean about the permission here?
Demetrios: So I guess it would be like who has access to what data, who has access to what documents, that type of permission. And then, because you don't want an LLM to be able to, maybe it has access to every database or every channel in Slack. But if I'm asking a question, I don't have access to every channel in Slack, right?
Demetrios: And then all of a sudden, the LLM does, and so I'm able to find out [00:14:00] things that I am not privy to.
Rafael: Yeah, so this is a part of making sure defining what's the most important data to AI. So you don't have to share out of your database. With the AI and and maybe you don't need to share off out of your table data table for with AI ~and I guess ~so it's complex, but I guess is following this mindset and being more like a restrictive about the
Rafael: data.
Demetrios: Yeah. And then what? So we were talking about the different. attack vectors that you've been seeing and one of them was jailbreaking. One of them was red teaming. I think people have seen a lot of examples of those on the internet. Have you noticed different examples of this or ways what have you been thinking about when it comes to the security vulnerabilities and then making sure that guardrails work, like how do you architect the guardrails so that they are more effective?
Rafael: I guess [00:15:00] the most Challenging vulnerability of AI is the the prompt injection. So we can pass some context, do a kind of social engineering with AI. And then I don't know, some have access to some data. The system prompt will be like the first step to understand more dynamic of the AI or AI agents.
Rafael: And I believe like implementing the right guardrails to prevent that to have let's say malicious inputs that can affect your LLM or your AI agents, like the chain of LLMs, you'll You'd be like the best practices. So that's what I'm saying, seeing right now in the industry. So it's not the finally is a kind of mitigation.
Rafael: It's like a one. It's not 100%, but you can eliminate a good part of fraudsters or malicious users that can [00:16:00] try to expose your AI. So, and usually it is guardrail. So you implement this as like a real time is you are implementing the orchestration and it's working real time blocking some some patterns.
Rafael: And also if you have application that have access to the Internet. Or documents or data that you are not the owner, you should make sure that you can, you should like a filter and make sure that you don't have an indirect injection, so there's a example, a very interesting example from the conference called Black Hat of one guy trying to exploit The Microsoft Copilot that was configured to summarize emails.
Rafael: And basically, it it opened the vulnerability. It said an external person can send an email to this corporation. [00:17:00] And then, just a rough gap. So every email could be, like, summarized. And in the last step let's say replied with the AI agent, depending of the topic and this external user send an email with a prompt injection hiding bit by the human eyes, but like using coding and colors, but the I code in their stand and they starting sending phishing emails for every user.
Rafael: email in the user base. So it's an example. So if I have access to external world, you should make sure that you ~have ~don't trust on the prompt instructions. Don't put your guard rails on the system prompt. Use like independent models machine learning models to filter and detect this kind of let's say potters.
Rafael: So it is it's very challenging again, because it's not 100% you can like it's just a mitigation so you [00:18:00] can be like 100% safe. But
Demetrios: so let me see if I understood that black hat example, because that is a new one that I haven't heard before, where if someone is using copilot.
Demetrios: And it's in their email. Then you can get a phishing email that has white on white text. So white text on a white background that the human eye is not going to see. But the copilot will see and it will think, Oh cool, I can summarize this. And when it summarizes it, then that goes into the LLM and it does an indirect prompt injection which says basically, Send an email or summarize this email and send it back to XYZ.
Demetrios: And so if you don't have two things in place, One, just the human touch, that means to okay the send or return, reply email. Or [00:19:00] two, that you have some kind of an external independent critic that is judging what the copilot LLM call or agent is doing. And then it's saying, hey, look, here's all the information, here's what I did.
Demetrios: Is this correct? Here's the prompt. Here's what I've done. Here's what information I got. And then you have that judge that says yes or no, this is correct or this isn't. You might want to go back to step one. So you could get into a whole lot of trouble if that is not the case. What you're doing and how you're architecting it.
Demetrios: So I, I've seen that in a way, that type of design pattern where you have almost like this catch all critic at the end and you can throw everything into it. And you can say, looking at all of these steps, this is what I've done. What do you think? Is this the right way of doing it?
Rafael: Yeah, I have. The point is that [00:20:00] looking for the you have a judge a name is a judge.
Rafael: And the same vulnerability that you have in the LLM, you have on this LLM as a gadget. I guess the solution that I'm seeing, this is like using traditional machine learning classifiers. And that you don't have the same vulnerability of the LLM, is not LLM. And then we can mitigate don't have don't expose another channel, another another
Demetrios: space,
Demetrios: Okay, so you're saying don't even use another LLM because then you might just get this cascade of prompt injections. Yeah, don't use
Rafael: another LLM to filter malicious inputs because this, another LLM can suffrage and can follow these malicious inputs as
Demetrios: well. And what are some other ways that you've been seeing security vulnerabilities, or some real life examples of things that are happening in the wild?
Demetrios: I'm
Rafael: working with some some fintechs, understanding the challenges and like [00:21:00] validating these business tasks. So validating some ideas around the ICQs. ~And I saw some, ~there are some companies, the fintechs working building AI assistants on the WhatsApp.
Rafael: And these AI assistants can like can do transfers, bank transfers can pay bills and so on. So we have basically like AI assistants that have ~uh, ~access to the, your bank accounts. Or in other scenarios, you have your bank account managed by IEI on the WhatsApp.
Rafael: And yeah, the trick part is that ~you can you can I saw some ~I saw an interesting example recently ~some ~some hackers trying to override the amount of of the transfer, for example or override the account balance that you have in your account. So this is in real life.
Rafael: You actually don't have You don't have a lot of problems because you, this scenario that I'm explaining, [00:22:00] like these startups had ~a banking as a ~let's say infrastructure as a banking in the core. So we have another kind of validation, but that's why it's important to follow like the zero trust mindset because we can like defend and protect these malicious behaviors.
Rafael: on the software part, not the AI as well. But it's important because as well, so you can like, you have, you can have the AI trying to answer something that is following some bias and talking about competitors and generating reputational risk for the company. So
Demetrios: I'm trying to think how that looks and what the benefit would be of having.
Demetrios: me be able to do my banking through WhatsApp. And so I guess the benefit for a end user like myself, I'm using WhatsApp and I say, Oh, send 50 [00:23:00] bucks to Rafa, and then it knows which Rafa I'm going to send it to, and it will send it, or I tag Rafa as a person. And then it has your phone number and my contacts book, and it will then send it to you.
Demetrios: But. It feels like there's so many things that can go wrong there because it could just hallucinate a different Rafa or a different phone number or whatever. There's so many vulnerabilities that I wonder if the small lift. That we get from being able to interact with our bank account on WhatsApp is worth the gigantic risk that it brings.
Demetrios: Yeah, I guess that's the
Rafael: challenging of the developers, the ML engineers nowadays is so you should have, you should follow the. The tests have the [00:24:00] kind of AI quality, uh, constantly to make sure that it doesn't happen. But and I believe that you're learning how to create this kind of application, right?
Rafael: Like this mission creation application. ~But ~yeah, that's a very challenging, but I guess very interesting problem to to work on,
Demetrios: yeah, it does feel like we're still exploring where and how we are going to interact best with. AI assistants, AI agents, what the ways that the public prefers to use them.
Demetrios: Like you said, maybe for a banking app, it's not the best option, but potentially for a food app, it is the best. Because we can just text the food app and say, I'm hungry, I want something in the next 30 minutes. And it gives me five different suggestions and then I can just say, yeah, I go with suggestion two.
Demetrios: But if it's for a banking app and it's send 20 [00:25:00] and next thing you know, I sent 2, 000 because there was some kind of a mess up or a hallucination, then it's a lot more risk than if I get pizza or pita, ~I don't know. I believe that's like ~
Rafael: The customers all like to have this easier way to interact with money.
Rafael: So what I'm seeing is you have this I don't know, you have AI bank assistant, like you can talk with the, it's not an app. So, depends, I believe that there is a profile, specific customer profile for this for this application. And so it's a risk, it's riskier, but I don't know.
Rafael: I believe that the market is going from this side I don't know, looking for early stage fintechs building on this space, but later in the next few years, I don't know, like banks could have a new channel as the same of the mobile. So we have these legacy systems, banking systems, and then they.
Rafael: Try to follow the fintechs and the, and build the mobile application. Make sure that like the UX is nicer. Everything [00:26:00] is mo is motor. And maybe the next wave you'll be like having AI, helping people to like make a transfer, understanding~ this~ the expense and so on. It's a mission critical application more critical than asking for ordering food.
Rafael: But I, I believe that there is a value of the this application has, have value from the, some customer profiles,
Demetrios: yeah, and when you have worked on traditional fraud detection systems, and then you think about these new ways of fintechs having fraud vectors, how do you weigh these different areas and vulnerabilities and like, how do you think about traditional fraud and almost, I don't know what the new one would be called, but I'm going to coin it.
Demetrios: I'm going to say new age fraud. What do you look at? Is it different? Is it the same in your mind? How do you view those [00:27:00] two? I believe that with
Rafael: AI both of parts of a fraud, the fraudster and the, let's say the bank, it's like, it, they have superpowers. So it's the fraudster have like the tools to create a live NSFace and fake documents and so on.
Rafael: For them, it's It's becoming easier to to bypass a fraud system and and also internally inside a fraud a bank a fraud thing. I guess you have the tools, but it's a challenging moment, looking for this new kind of AI. But also we have the frow around the ai, like the interation, so it's a kind of frow as well.
Rafael: So basically you have a more vulnerability around, we have the Frow with superpowers. With AI superpowers trying to figure out like what is the vulnerability that they can, they could like follow and deep dive. And I don't know. It's a challenging moment. [00:28:00] If you look for the cyber security as well.
Rafael: So not only on the financial space, but the cyber security In general, so it is a challenging moment because they think you have looking, you have seen increase of incidents, security incidents, that's because the fraudsters, the hackers have more powers right now they can automate, speed up the process that would I spend one month with some minutes with
Demetrios: AI, yeah, and you're able to just test where the vulnerabilities are so much quicker.
Rafael: But I believe that I believe that the fraud, the same dynamics of a fraud in the, that you know right now, and all days. You see this dynamic in the AI as well. I guess the challenge would be this almost the same.
Rafael: So we have some fraudster trying to injection doing prompt injection and other side you have the guardrails or other reserves to protect them, learning the new patterns and so on. [00:29:00] I don't know, maybe human the look as you have in the fraud operation as well. Maybe. You should have this human in the loop in the AI operation as well.
Demetrios: Yeah, you make a great point about how many different levels there are that fraudsters can play at because it can just be falsifying documents or now these days you can clone voices with just like five seconds of voice and so you can do social fraud and get information by doing those like social hacks.
Demetrios: And then On the other hand, you have the LLMs themselves and how they can be vulnerable and you have the new systems that we're putting into place, like we talked about how, okay now we're talking to our bank through WhatsApp, there, there's a whole bunch of vulnerabilities that come along with that, or you have this superpower fraudster who can use AI to their advantage and [00:30:00] scan for security vulnerabilities or fraud vulnerabilities, are there specific Yeah.
Demetrios: design patterns that you've been seeing that are now becoming very common and something that is like table stakes for if you're trying to deal with AI fraud on any of these levels.
Rafael: So I believe that knowing by first hand, the vulnerability that a fraudster can attack, it's important. So doing the right teaming.
Rafael: Testing your system doing a pain test on software and Red Team in general. I guess it's it's a must have you should do it before so if you have a mission or critical obligation and so on, that's the best way. So you can have the first by first hand [00:31:00] knowing the vulnerabilities and then trying to protect this implementing the protection.
Rafael: So it's I believe it's it's approach that is becoming common. And also as well, like the guardrails, implementing the the correct guardrails for the obligations, like depending of the use case,~ the how ~depending of like the provision, the risks that you like
Rafael: to accept or not.
Demetrios: All right. So what other kind of things have you been seeing out there in the wild?
Rafael: ~I was reading example these days of ~AI agents actually there is a recent paper report from OWASP for about the vulnerabilities in the AI agents. And it's interesting because when you have a lot of, as many LLM is stuck with each other it's become like more complex to make sure to control.
Rafael: And you have a kind of the it's not, it's LM is not predictable and you have a lot of other working together. It's even ~not un even ~unpredictable. And and the problem is that in the AI agent's [00:32:00] obligation it can let's say with a combination of tokens let's say attack ~can ~can broke the first one, but the context, the information, then a kind of noise can follow through the next steps and also depending of the steps, if we have some documents data to access externally one simple let's say one prompt injection that was received in the first step can affect I don't know, the, the last steps, you know, but it's more like a general, it's not a example specific example, but it's interesting because ~You, uh, this ~you can have a kind of noisy going forward for each step.
Rafael: More lms, more yeah, as more LMS you have in your assistant, may more protection you should have to make sure that we don't have this information, just noise information going forward.
Demetrios: Yeah, it's almost like one prompt injection can get propagated throughout the whole chain of agent calls.
Demetrios: And you have to make [00:33:00] sure to architect your agents in a way that, like we were talking about before, potentially it's not even another LLM call critic at the end. It is something that is a bit stronger and more rules based type of guardrails that you have. I'm also looking at the OWASP. org page right now on their top 10 vulnerabilities.
Demetrios: And I find it fascinating that the first one is broken access control, because I think a lot of folks are trying to figure out how to get agents to act on their behalf. And this just feels like a wasp is calling it out early and saying, Hey, look, you can try and have the agent act on your behalf, but there's a lot of potential that you're going to have some access control issues here.
Demetrios: ~Yeah. ~
Rafael: Yeah, this is the challenge [00:34:00] because it's a balance between autonomies autonomy and The security, right? Yeah. And yeah, this is the challenging part. I don't know you have been seeing a lot of value from agents that do your steps by yourself, but I guess this is a part of the Zero Trust approach that you should follow, so make sure that, like you have the right permissions, it's restricted for that task, for that scope, and so on.
Demetrios: Yeah, these are things. You as a machine learning engineer going into this world of security, it feels like you had some experience when it comes to the fraud detection side of things, but now you're having to learn a lot about, I'm sure, the security side of things. And are there pieces that you wish you knew when you started going into this world?
Demetrios: What are some things that potentially you can just share as [00:35:00] a nugget of~ in ~Insider knowledge that took you a while to figure out, but now that you know you're very happy that you do. ~And ~I would even classify that a little bit further and say, what are some things you've learned in the last three months that you did not realize as a machine learning engineer, but now you see as like you're getting into the security world.
Rafael: Yeah. The first thing is that fraudsters and hackers, they are like They have a lot of creativity. So even as a machine learning engineer, you're building an AI LLM and so on, you don't know what's the vulnerabilities. So you know there is like a, oh, someone can use a base 64 and encoding, a kind of method A, B or C.
Rafael: But when you put it in the road, like a lot of new stuff because you didn't figure out like [00:36:00] That people can do this and~ it's it's~ it's very wild. And one of the most, the learns that I did recently is joining cyber security engineers with understanding, talk with these these guys.
Rafael: And learning their perspective about AI security. So it, because it's different as a AI machine learning engineer or AI research is different is that it's another, they have another opinions. Is, so let's say when you have, when you do random theming, they're not taking care of ing just the LLM, but the whole system, for example.
Rafael: Yeah. So it's important. And also about the protection, they graduate they're, they don't, is not looking just for the LLM, but the whole system as well. So they access the permission and so on. And yeah, I guess the insight that I have recently is they have another perspective it's more broader about the AI system [00:37:00] in general.
Rafael: ~And also, ~ as I'm looking and I'm researching about on the guardrails security guardrails. Data it's important. So as like we are building guardrails you should have data to understand what kind of patterns and methods that like hackers are using or should or could use to, to bypass AI or these guardrails.
Rafael: Data is important and if you look for just for the open source data that you have, it's not enough. Oh interesting.
Demetrios: Wait, tell me more about that. So it's like you should A, try to red team your own system, pen test your own system and thinking about the system as a whole, trying to break into it in any which way possible, not just the LLM.
Demetrios: But then if you're only doing that, you don't know what you don't know. And when you put something out live. It's going to get hit by the very creative hackers that they are. And so is it like [00:38:00] you should get a third party to try and pen test also? Or is it that you should, like, how do you get that data if you don't have it and you don't know that you don't know?
Rafael: Yeah, you should choose, I don't know, if you have an open source model that was trained on this data. Or you have, I dunno, a tool provider that work works on this, that collect the data filter this data to, to train like the gure models. ~And ~that's because this is very difficult to create the garage reels by internally entirely by itself.
Rafael: So it can use LMS as well, but it's it's vulnerable. Vulnerable as well. And also to create small language models, you should have data. So it's a challenge, it's like a cat and a hat. Yeah. But the challenge is this you should, what, the best way here to collect data is to work together with the cyber security team, the right team to understand [00:39:00] the patterns and then implement this, uh, this solution.
Demetrios: Yeah, it makes sense.
Rafael: And I believe I don't know, you, you have more. You see more open source open science and open source model around AI security, you don't have many tools and and models on this space. Let's say some models that you can, you could use as a guardrails and and make sure that everything's okay.
Rafael: So it's part of one, a path that I'm looking for, like to create more. A kind of open science around it.
Demetrios: Yeah. Almost like the trick there is that as soon as you put out a dataset that can help you understand where there are attack vulnerabilities, that dataset is no longer valuable because it needs to be behind closed doors.
Demetrios: It needs to be something that is almost [00:40:00] not shareable or not public.
Rafael: Yeah. Do mean the data, right?
Demetrios: Yeah.
Rafael: Yeah. But it's a challenging of collecting data, collecting new patterns, learning, and then updating it constantly. ~Huh. So yeah. ~
Demetrios: ~So creating that flywheel. ~
Rafael: ~Flywheel. ~
Demetrios: Yeah.
Demetrios: So like almost like this, as we did in traditional ML and as you did a ton on retraining risk models or sorry, fraud models, your. In a way, continuously collecting data and then retraining these LLMs, but you're not retraining them in the sense that you're not going in, you're not going to train or fine tune the LLM.
Demetrios: You're just updating your, maybe it's your prompt or maybe it is the LLM calls, something like that.
Rafael: Yeah. I believe that is a good point iterating around Learning and integrating and improving, right? Yeah, it's a must have process to you can inspire on the traditional [00:41:00] ML and and use it in the in the AI application, LLV applications.