
MLSecOps is Fundamental to Robust AISPM

Posted Aug 30, 2024 | Views 297
# MLSecOps
# AISPM
# Protect AI
SPEAKERS
Sean Morgan
Chief Architect @ Protect AI

Sean Morgan is the Chief Architect at Protect AI. In prior roles, he's led production AIML deployments in the semiconductor industry, evaluated adversarial machine learning defenses for DARPA research programs, and most recently scaled customers on interactive machine learning solutions at AWS. In his free time, Sean is an active open-source contributor and maintainer and is the special interest group lead for TensorFlow Addons.

Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.

SUMMARY

MLSecOps, which is the practice of integrating security practices into the AIML lifecycle (think infusing MLOps with DevSecOps practices), is a critical part of any team’s AI Security Posture Management. In this talk, we’ll discuss how to threat model realistic AIML security risks, how you can measure your organization’s AI Security Posture, and most importantly how you can improve that security posture through the use of MLSecOps.

TRANSCRIPT

Sean Morgan [00:00:00]: I'm Sean Morgan, Chief Architect at Protect AI. I take my coffee black, sometimes Americano, but black coffee works just as well.

Demetrios [00:00:09]: What is going on, MLOps community? We are back for another edition of the MLOps podcast. As usual, I am your host, Demetrios, and talking with Sean today, we're talking all things security. Not just that boring type of security, not that DevOps type of security. We're talking MLOps type of security. And Sean's coming at us. There's a whole other community out there. It's like our sister community, the MLSecOps community. So if you vibe with anything that's going on here, or you want to know how you can be more well-versed in your security when dealing with machine learning and AI, I highly recommend you go and check out the MLSecOps community.

Demetrios [00:00:54]: Feels like they may have gotten a little inspiration from somewhere familiar on the name, but I love it. I appreciate it. We tend to talk about security and security vulnerabilities in the MLOps community. These folks talk about it all day and all night, and he brought up one security vulnerability around Ray that, I'm not going to spoil anything right now. I'll let him talk about it. But if you're using Ray, just make sure you ain't got that Shadow Ray going on. Let's get into this conversation. And as usual, if you enjoy this episode, it would mean the world to me if you can share it with just one friend.

Demetrios [00:01:38]: All right, real quick, I want to tell you about our virtual conference that's coming up on September 12. This time we are going against the grain and we are doing it all about data engineering for ML and AI. You're not going to hear RAG talks, but you are going to hear very valuable talks. We've got some incredible guests and speakers lined up. You know how we do it for these virtual conferences. It's going to be a blast. Check it out. Right now you can go to home.mlops.community and register.

Demetrios [00:02:12]: Let's get back into the show. So basically, what I was just saying is I was at KubeCon in Paris a few months ago, and it felt like every other booth or every third booth was security related. And for some reason, when we get into MLOps, as opposed to DevOps, which is very KubeCon focused, but this year, as a little aside, it was everything AI, because it's the hottest thing out there right now. Right. But for some reason, when we get into MLOps, it is a little bit thrown by the wayside, I would say, and I don't know why that is. You've been talking to a ton of people and you are spreading the good word about ML security. Or there's the DevSecOps and then there's the MLSecOps, I would say. Why do you think that is? Why do you think people forget about security for the MLOps side of things?

Sean Morgan [00:03:18]: Yeah, I mean, there's probably multiple factors there. I think the pace of innovation in AI and machine learning kind of dictates that you build towards accomplishing things quickly, and security is often one of the first casualties when that is a requirement of the organization. Unfortunately, you end up paying the price later down the line. But I certainly agree with the sentiment that in MLOps security is not ingrained in the same way that it is in DevOps. And that's part of the message that we're trying to share with the MLSecOps community: how you can integrate security into every phase of model development so that at the end of the process you're not simply trying to slap security on a model that's about to get shipped into production.

Demetrios [00:04:02]: Yeah, that's so good. And what are some ways that you've seen things go haywire?

Sean Morgan [00:04:09]: Yeah, I mean, there's all different ways. So there are the kind of traditional threats that exist in DevOps, but also exist in MLSecOps. Specifically, I'm thinking about the supply chain here. There are all of the library dependencies that you're pulling in; machine learning and AI are foundationally built on the open source libraries that we all use. But there's more with machine learning. There are the datasets that you're pulling in and the foundational models that you're pulling in. And you gather these all from repositories that make it very challenging for the end user to really attest to who made them or how they were made, or to scan them for vulnerabilities, and it just increases the attack surface. So in practice it's actually more difficult to secure the machine learning lifecycle, but it can be just as important, if not more important.

Demetrios [00:05:00]: Yeah, it feels like going back to your first point, it slows things down considerably. If all of a sudden now you're thinking about, cool, I have x number of datasets and how can I ensure that all of these datasets are not going to make me regret using them? And nine times out of ten, or like 9.5 times out of ten, it's really nothing malicious at all. And you may not even know. This is another thing that I would love for you to speak to, because I know that there are people doing this right now in the LLM training world, where they're buying Common Crawl domains and then they are putting malicious stuff on there because they know that the LLMs are going to be trained with those Common Crawl websites. And so you've got to think about how do I even combat this? And how do I, like you said, produce business value, but at the same time don't commit a huge faux pas and do something that could really screw stuff up.

Sean Morgan [00:06:19]: Yeah, for sure. And I would say datasets, kind of, as you're calling out here, are some of the most challenging ones to confidently say that this dataset is secure, because as you mentioned, these are terabytes in size, often more. And to say that there's no poisoned data or no one that snuck something into Common Crawl is quite difficult to do. But at a minimum, what you should do is attest to how you're creating your models and what datasets are going into them in a way that allows you to remediate the problem if it surfaces in the literature that, hey, this dataset actually had a problem. Are you able to go find which models you created with that and take appropriate actions? That is just kind of the first step of a long, you know, number of layers of defense that you should likely do within MLSecOps.

Demetrios [00:07:05]: Yeah, that lineage. And it's so funny because it's not top of mind until something goes wrong. And it's the same thing. I remember my first job in tech, I was working for a password management company, and the only people that wanted to buy password management tools really were folks who had problems with password sharing that went wrong. It's not, you know, the normal people were like, yeah, share my passwords, or I just use 123456 and it's all good. I've never been hacked. It's the ones who have been hacked or who have had things leak that have problems, and then they recognize the value of security. And the other piece on the data, which I find fascinating, is that it could lie dormant for so long and then out of nowhere it can come up, if there's something hidden in that poisonous data or that toxic data, whatever you want to call it.

Demetrios [00:08:09]: I don't know what the technical term is for, like tainted data that you've trained your ML models or AI models on. It might come up later and you don't recognize it, and then you have to go sifting through that. That just feels like a nightmare.

Sean Morgan [00:08:26]: Yeah, it's certainly a risk for enterprises, but as well as hobbyists that are doing machine learning. So with respect to what you can really do about it, there's different avenues of defense that you can put in. I know there was a talk with Ads Dawson from Cohere. There are runtime protections that you can put at the inference endpoints and things along those lines. But the core principle behind MLSecOps is that there's no one fix to this problem. There's a lifecycle that happens as machine learning development is done. And at every step within that process, security needs to be front and center, or at least a consideration, in how you're going about obtaining datasets or moving models to production or red teaming them.

Sean Morgan [00:09:11]: Various different steps throughout that.

Demetrios [00:09:15]: Yeah, and you did mention there's this whole life cycle. How do you see it? And what are different security measures you could take for each step along the way?

Sean Morgan [00:09:29]: Yeah, so I think that there are the model builders, which often can compose a different team than the ones that actually bring it to production. But within that area, there's all types of security considerations that you should make. There is the scanning of static code, the same way you would do in DevOps, but also taking that a step farther and scanning your notebooks for API keys or anything else that may have that type of ability to leak sensitive information when you commit it to a repository. From there, there is the pulling in of foundational models, which is critical within LLMOps today. And there's a lot of different nuance involved in that. So the first step would be pulling from an organization or a person that you can attest actually built it. And that's something I'd like to talk about a little bit later, around open source tooling to sign your model to say that Meta created this model, or some other organization. Then the second aspect of that is how are you pulling in the model itself? When we look at models that are up on Hugging Face, it takes some time for the actual model's architecture to be added to, say, the transformers library.
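
For example, scanning notebooks for leaked credentials can be automated before anything lands in a repository. A minimal sketch, assuming the open-source detect-secrets CLI is installed (the conversation does not name a specific tool, and the notebooks/ path is hypothetical):

```python
import subprocess

# Scan a directory of notebooks for strings that look like API keys or other
# secrets before committing. detect-secrets is one open-source option; the
# "notebooks/" path is a placeholder.
result = subprocess.run(
    ["detect-secrets", "scan", "notebooks/"],
    capture_output=True,
    text=True,
    check=False,
)
print(result.stdout)  # JSON report of potential secrets, keyed by file
```

Hooking a scan like this into pre-commit or CI keeps the check out of the data scientist's way.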

Sean Morgan [00:10:52]: When these state of the art models come out, often, if you're using the transformers library, you'll see the trust_remote_code=True parameter when you pull a new model. And you need to understand what exactly you are doing there. You're basically saying there is a repository that exists out there, and I am going to load that Python module, whatever it is, because I trust this source. That's acceptable if you know the source is actually the one that put it out and it's not some type of attack where they're pretending to be someone else or something along those lines. We've seen this in the wild. There's good examples. We could talk about people pretending to be different organizations on Hugging Face. I think that the callout here is that the same types of attacks that exist on npm or PyPI, they exist on model repositories too.

Sean Morgan [00:11:43]: You need to be cognizant of that and aware throughout the process.

Demetrios [00:11:48]: Okay, so that's the new AI world. You've piqued my interest. I want to go down this tangent of people impersonating companies or different organizations, and then I also want to go down the road of what do people need to think about with the traditional ML route? First, let's get into the juicy stuff. What happened?

Sean Morgan [00:12:13]: Yeah, so, I mean, you'll see this actively on Hugging Face. There's published research from a number of sources here, and the majority of them are security researchers kind of poking holes into the models that people are using. And so if you go on Hugging Face right now, as of the recording today, and you look up Meta Llama 3.1, you'll see the official model that comes from Meta. But if you go and you look up Facebook Llama, the organization, instead of meta-llama, that's actually a security researcher that's put up a separate model that's been patched for a security issue to prove this type of attack. And so often we hear from users that just say, well, if you're pulling established models, what are you really concerned about from an attack vector here? It's not that simple. There are very, um, you know, sophisticated and clever ideas you can use in order to get people to, you know, copy code on Stack Overflow, pull the wrong model, and execute some bit of arbitrary code on their system.

Demetrios [00:13:13]: Dude, we're lucky that it's the nice folks that are trying to poke holes in this. I imagine if you get the black hats, they're probably out there, and you probably know stories more than I do. But it's funny that it's like we're all so nice to each other, we're just poking holes in it just to show that there are holes. And when you get the black hats coming in, that's when we know. Yeah, we really, and again, like, that's the hard part about security is that it feels like people don't take it seriously. Until they've got their company data held at ransom or until things are leaked, and then you have to go back and say, okay, we should probably invest in this. And even if they do invest in it, they probably aren't investing enough. Look at how Snowflake just got all their data leaked, right?

Demetrios [00:14:06]: That was fairly recently, and that caused a whole world of hurt for a lot of companies. And so it's one of those ones where whatever you're investing in security, it's not enough.

Sean Morgan [00:14:19]: And I strongly agree. It's one of those things where it's easy to kick the can down the road, but it's really about investing in the products and services that you're utilizing for security, but also within your team, and upskilling them and educating them about the risks and how to manage that and how to mitigate that, and implementing these processes throughout the lifecycle. And so earlier, to your question about where do you implement these steps: we talked about model development, but the next part of that is you're moving these into model registries, and then you're packaging them up into inference containers or using inference servers. And the threats continue to persist throughout the process.

Demetrios [00:14:59]: What kind of threats are there?

Sean Morgan [00:15:00]: Yeah. So for inference servers in particular, what you are exposing is API endpoints, for users that you think should be using it as well as users that shouldn't be. If you look at some of the recent CVEs around the PyTorch inference server, or MLflow, slightly different, but it also has API endpoints, there is this ability to compromise the server itself. You can do things such as writing payloads within the server, but also network traversal. I'm able to compromise the server that's feeding the model to the user, and then I can move across to, say, the user profile that's able to make the models, which by definition has access to a treasure trove of data in order to do that. So these personas that do get targeted by machine learning attack vectors have very elevated privileges generally, and it makes them a very juicy target for the black hats that you're talking about.

Demetrios [00:16:02]: So you got any good stories?

Sean Morgan [00:16:05]: So there's been several models within Hugging Face that have been reported and taken down. You can kind of look at some of the threat research. There was a pretty widespread number of organizations that were being name-squatted on, which I talked about before, everything from SpaceX to 23andMe, where they were uploading these types of models. And specifically, I believe the 23andMe one had the ability to basically scoop your environment credentials and push them to a server. So I mean, what payload these models can run is pretty much up to the creator of them at the end of the day. These are binary artifacts that are meant to execute something. There's nuance to the file formats that enable different things, but yeah, it's an active threat in the wild for sure.

Demetrios [00:16:52]: Well, what you're talking about also feels a lot like the end of this supply chain that you had mentioned. And so it's, all right, cool, we've got the model, now we're going to go and do stuff with that model and put it out there. And there's the other side, which, if you're looking at the data side of things as you were talking about: all right, where am I getting this data? How do I know that it's good? What are some other pieces around that? Because maybe you're importing frameworks. I think I heard about something where they were saying one of the frameworks that you could import was tainted, or people were downloading something off of PyPI and it turned out to be totally malicious.

Sean Morgan [00:17:39]: Yeah. So there was one, might have been a year ago or so, for PyTorch actually, where there was a commit branch where you could see that they were going to be, you know, using another import as a dependency of PyTorch, but they hadn't yet claimed the name for it. So somebody actually saw that ahead of time. They saw that they were going to be adding a new dependency, and then they actually went up onto PyPI and they claimed the name and they put up a malicious package there. Luckily, I think it only affected the nightly builds of PyTorch. But I mean, nonetheless, plenty of people install the nightlies, and it was caught before a full release of PyTorch had happened. But yeah, there's really a lot of ways to make this happen.

Sean Morgan [00:18:25]: And in a space where open source is so critical for AI and machine learning development, that's a great thing that really enables this pace of progress, but we have to make sure that we secure it. It's not just the security team's job to do that; it's about being cognizant of that throughout the lifecycle.

Demetrios [00:18:43]: Yeah. And how do you generally go about it? Because I can imagine you and all the other security folks are seen as the party poopers and you want to like just rain on our parade all the time, especially as you so aptly mentioned, you kind of got to move fast. You want to be able to prove business value with your ML or AI projects. And a lot of times your ML projects are in exploration mode. It's not super clear if you've got that ROI there, you got to put it out and you got to test like you want to get something out as quick as possible to know if it actually moves the needle on metrics. But then you have to juxtapose it or you have to fight against the security team who's saying, you can't just put that out there. What are you doing? So how do you navigate those waters of like pulling people back or letting them run freely?

Sean Morgan [00:19:45]: Yeah, it's a great question. I think it's funny that, you know, the assumption is that I'm from the security world. Really, I'm a data scientist at heart. I used to work in semiconductors, I used to do research on ML algorithms. So I know the pain, I know the pain of what you're doing is experimental work, or perhaps, you know, you finally really land that model that is driving that business value. And then, you know, for someone to come in and be like, well, what was your security process here? Like, can you give me a rundown of everything that went into this model? It is definitely cumbersome. And so what I think we need to do is meet in the middle on these types of tools that, for example, don't require much action or change from the end users. So there are different capabilities where you can have basically artifact stores that have been vetted, or a walled garden of models that have already been approved by the security team, perhaps.

Sean Morgan [00:20:43]: Or you can put proxies between the end users and Hugging Face, so that the end user's, the machine learning developer's, process doesn't need to change. And instead you can still give those kinds of guarantees with zero code changes to the general process. So that's one example. But I think meeting in the middle is really a key part of how to get these teams to collaborate well.
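
One way to implement that proxy pattern without touching training code is to point the Hugging Face client libraries at an internal mirror through the HF_ENDPOINT environment variable. A rough sketch, where the proxy URL and repository id are hypothetical:

```python
import os

# Route Hugging Face Hub traffic through an internal, security-vetted proxy
# or mirror instead of huggingface.co. The URL below is a placeholder and
# must be set before huggingface_hub is imported.
os.environ["HF_ENDPOINT"] = "https://hf-proxy.internal.example.com"

from huggingface_hub import snapshot_download

# Downloads now flow through the proxy, where scanning and allow-listing can
# happen without the machine learning developer changing their workflow.
local_dir = snapshot_download(repo_id="some-org/approved-model")
print(local_dir)
```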

Demetrios [00:21:06]: And are you seeing folks like yourself understand or be better suited for positions like this because you have the knowledge of what the life cycle looks like from a data scientist perspective, or from, maybe it's an ML platform engineer or someone who understands what it takes to productionize it, but they also understand the model creation aspect? Or is there a different persona, like a PM, that can step into this role and constantly be recognizing where there's threats, where there's vulnerabilities, or if more questions need to be asked? Because from what I understand, right, it's less about we can't do that, it's more about, hey, let's make sure that we dot our i's and cross our t's before we do that.

Sean Morgan [00:22:05]: Yeah, 100%. And so to the point of the question there, you know, what persona do I think is best to run this? I think it varies. I think you kind of aptly call out that someone with a, you know, data science or machine learning engineering background certainly has the context into how iterative this process can be, where the data sources are coming from, where the models are coming from, and that enables them to really understand what points in the MLOps lifecycle the security should be implemented in. The flip side of that is something that we called out: it's not just unique attack vectors, adversarial inputs, prompt injections. There's also the traditional supply chain and the libraries and other containers and things that you're pulling in, in which case a traditional AppSec persona may have a really good grasp there. So we've seen everything. There's a growing push to these AI security directors or team leads that own this. Ultimately I think it's about assigning ownership to someone, and that someone owning how they're going to go about, you know, facilitating their team to do this.

Demetrios [00:23:13]: Do you have an area or a piece of that supply chain that tends to give the most headache?

Sean Morgan [00:23:23]: Yeah. So I think that today the easiest issue to call out is those model artifacts. You know, when you typically work in an enterprise and you want to pull down a binary artifact, there's generally a process for it. It usually gets scanned, it gets approved by IT. But because of the pace of machine learning, you generally just go to Hugging Face or another site, who is performing a very valuable service here. But at the end of the day you need to understand what you're doing: you're grabbing a binary artifact and you're pulling it into your enterprise and running it. So that's one that I would say causes the most issues, the most apparent issues, for sure. But when it comes to processes like verifying that models are being signed and things like that, I think that that's coming on the horizon. We're starting to see this growth within open source tooling and the OpenSSF and the Linux Foundation, where people are really pushing towards making it an easier process for you to sign your models, to verify that they're coming from the source that you think they are.

Sean Morgan [00:24:26]: And that's where I see a current problem that may be alleviated soon. So very much looking forward to that.
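
The shape of that signing workflow is already visible in general-purpose tooling. A hedged sketch using the sigstore-python CLI to sign and verify a serialized model as an ordinary artifact; the filename, signer identity, and issuer are placeholders, and the OpenSSF model-signing work mentioned above may ultimately expose a different interface:

```python
import subprocess

# Sign a serialized model with sigstore (pip install sigstore). This produces
# a Sigstore bundle alongside the file recording who signed it and when.
model_file = "model.safetensors"  # placeholder artifact
subprocess.run(["sigstore", "sign", model_file], check=True)

# Consumers verify both the signature and the signer's identity before
# loading the artifact. Identity and issuer below are placeholders.
subprocess.run(
    [
        "sigstore", "verify", "identity", model_file,
        "--cert-identity", "release-bot@example.com",
        "--cert-oidc-issuer", "https://accounts.google.com",
    ],
    check=True,
)
```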

Demetrios [00:24:33]: Is there other open source tooling out there that can help scan for vulnerabilities, or help you, like we were talking about with the data, with that whole data process, to rip out different data sources if need be and clean the data sources, or any other mix of making sure that you have all those i's dotted and t's crossed like we were talking about before?

Sean Morgan [00:25:04]: Yeah, so I'll specifically call out kind of the model artifacts and then the trained models and what you can kind of scan there. There is some really nice tooling within Picklescan, which is designed to scan the default PyTorch serialization format. There is ModelScan, which also supports TensorFlow formats and Keras formats that have these other known vulnerabilities. And then as you go about creating the models, there are open source tools like Garak and LLM Guard where you can actually scan these models for vulnerabilities, look for things like hallucinations or other impacts that perhaps poisoned training data had on your models. When it comes to scanning datasets themselves, it is very much an academic problem. I used to work on a research program for DARPA around this type of aspect. It is very difficult, it is very difficult to say that one sample within a terabyte file is going to coerce false information or fake news or something along those lines. I think to your point about whether that's where the headache is, within the dataset? It could be, it could be down the line.

Sean Morgan [00:26:10]: We need to make sure that we're keeping that data lineage and pulling from trusted sources and making sure that copyrights are in compliance, and all these other issues that happen not just from poisoned data, but just from the sources that you're pulling from in order to get a model with sufficient accuracy.
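
For reference, both scanners mentioned above ship command-line interfaces that can run in CI before an artifact is promoted. A rough sketch (the paths are hypothetical, and exact flags may differ between versions):

```python
import subprocess

# Hypothetical downloaded artifacts; picklescan and modelscan install via pip.
artifacts = ["downloads/pytorch_model.bin", "downloads/model.h5"]

for path in artifacts:
    # picklescan inspects pickle-based serialization (the default PyTorch
    # format) for opcodes that can execute arbitrary code on load.
    subprocess.run(["picklescan", "--path", path], check=False)

    # modelscan covers additional formats such as TensorFlow and Keras.
    subprocess.run(["modelscan", "--path", path], check=False)
```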

Demetrios [00:26:27]: Okay, so I know you guys are doing a bunch of stuff at MLSecOps and you've created a whole community around it. I think, in fact, that's how we met. I met somebody from the community at the Databricks conference last year, and also our good friend of the pod and the MLOps Community, Diego Oppenheimer. He's very vocal about security vulnerabilities when it comes to ML, and he had mentioned you all to me. And so it's cool that you're on here. What are you doing exactly, though, in the community? What are some things that, if I went there and I wanted to poke around, what would I find?

Sean Morgan [00:27:11]: Yeah, I mean, you would find training material. So we have certificates that you can take and actually validate that your team is aware of these types of issues and understands exactly how they can be addressed. We also have blogs, publications, and community resources such as Slack channels where you can actually connect with other members of the community that are taking this kind of security-first mindset, and there's a lot to be learned there. Happy to hear the word's getting around.

Demetrios [00:27:41]: That's super cool. What are some stories? I know that with the MLOps community I get lots of stories of people asking questions or learning stuff or getting jobs, whatever it may be. What are some stories that you've got from the community? And maybe that's them sharing that they were able to stop a security vulnerability or learn about something. What does that look like?

Sean Morgan [00:28:06]: Yeah, some of my favorite sharings within the community are when people are actually creating new libraries for these types of defenses. You mentioned earlier what kind of open source tooling is out there. One of the defenses for LLMs is known as Vigil LLM. I saw a post for that maybe six months ago in the MLOps Slack channel, and since then I've been following it actively, and it's a really great program, amongst all the shared research and ideas that people are sharing within those channels. I'm a huge fan of people sharing their open source work and socializing that.

Demetrios [00:28:43]: Yeah, that's super cool to see that, especially if folks are sharing the open source work and then they get others from the community who jump on board and then you can all go and create something really cool together.

Sean Morgan [00:28:58]: Exactly. Yeah. And that's kind of the goal. So yeah, sure.

Demetrios [00:29:01]: Nice. And have you seen where someone completely drops the ball? Like, what are some potential consequences if you just throw all inhibition to the wind and you say, yeah, like, this security stuff is cool and all, but we don't need it, and then it comes back to bite you in the ass?

Sean Morgan [00:29:23]: Yeah. So I think probably the best example of that I have is within Ray. So Ray, the distributed processing machine learning framework, great framework, has a user interface, and that user interface itself has no authentication. It is very easy to spin up within your local environment and easy to think that that is simply a tool for you and nobody else. What we've seen is there were several reported vulnerabilities with Ray, and since then there are now active reports of Ray vulnerabilities where people are scanning the Internet to find these user interfaces, which they're able to attack and do privilege escalation and everything along those lines. So I'd say issues like that, where you think that the framework itself has hundreds of maintainers, everybody's paying attention to the security and you can trust it. They always have some other aspect of them that likely is getting looked at less, such as a user interface or a web interface. So it's very important that we look holistically at what these types of libraries are that you're pulling in and how you're using them.

Demetrios [00:30:27]: Dude. And Ray's got like a $100 million company behind it, for sure. And remember, the last round that Anyscale did was really big, and that is wild. I had no idea. So anybody out there that's using Ray, look at your user interface. And what can they do? What is the patch?

Sean Morgan [00:30:47]: Yeah, so if you read the documentation thoroughly, they will specifically call out how you can set up the user interface so that it can only be contacted by your localhost. The problem is not everybody reads the documentation thoroughly. And so many people use Ray that this actually manifests as an open problem, and it actually was known as the first attack on AI workloads that is being actively exploited in the wild. It's known as Shadow Ray. You can do a bit of research on your own, but there's been several publications about it. So yeah, it's not just small libraries that don't have the funding behind them. It can be the large critical infrastructure pieces as well. And again, it's important that every team member, everyone who may want to spin up a user interface, is aware of what type of security risks can exist.
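
In code, the relevant knob is the interface the Ray dashboard binds to. A minimal sketch for a single-node ray.init() setup (8265 is Ray's documented default dashboard port):

```python
import ray

# Keep the Ray dashboard bound to the loopback interface so it is reachable
# only from the machine itself. Binding it to "0.0.0.0" exposes an
# unauthenticated UI and API to anything that can reach the host.
ray.init(
    dashboard_host="127.0.0.1",
    dashboard_port=8265,
)
```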

Demetrios [00:31:39]: I will give it to him. Shadow Ray is a great name.

Sean Morgan [00:31:42]: It is.

Demetrios [00:31:43]: I truly like that. That might, that might have something to it. We might need to make a shirt or something for that Shadow Ray. So what are some other ones? Like, are there common vulnerabilities or flaws in other projects or open source tools that I may be using, besides the ones that we were talking about, with the models being impersonated or the model organizations?

Sean Morgan [00:32:07]: For sure. So MLflow is probably one of the most heavily used, yeah, heavily used, and also one of the most actively called out systems, especially the open source version, for vulnerabilities. Now, the Databricks team is fantastic about patching those vulnerabilities and getting them updated. But if you just search for CVEs for MLflow, you'll see a number of different ways in which you can, say, log a model to the registry and instead actually write a file to their disk. And then once the file is written to the disk, when they load up a bash shell, you can actually get a remote code shell for the attacker, and at that point you're completely compromised. So there's numerous MLflow vulnerabilities, specifically on the server itself. That's where we see a lot of these coming from. When you have these exposed endpoints, that is where hackers really make their money, when there's an easy means to access something.

Sean Morgan [00:33:06]: It helps the machine learning developer, but it also helps the malicious actors.

Demetrios [00:33:10]: It's that low hanging fruit we always love to talk about. Yup, for sure, these hackers are no different than us. It's like, how much can I, or how little can I work and how much can I get from that little amount of work? Right?

Sean Morgan [00:33:25]: Yeah, absolutely. I mean, they're not in the business of doing more work for the same result. So yeah, they're looking for the ways to attack it.

Demetrios [00:33:33]: So explain that a little more because it feels like it could be generalized where it's on the server and it's the endpoints on the server. So what else, or how could we generalize that just to make it so that in case maybe we're not using ML flow, but we're using another tool and we want to make sure to cover for those vulnerabilities also?

Sean Morgan [00:33:54]: Yeah, so I would say one of the primary things you need to do is continuous or frequent patching. These vulnerabilities do get found. The company that I work for actually runs a bug bounty program where we pay researchers to find these types of vulnerabilities, as well as the maintainers to fix them. And so what you'll see is that these often are identified and they do get fixed. But updating your MLflow server is not something that is a common practice throughout the week. Once these vulnerabilities are known, once the CVEs are published, everybody in the world knows about it, or can know about it. So at that point you have a vulnerable server that the entire black hat community knows is vulnerable, and getting those servers patched is critical. And so you want to make sure that you're doing that appropriately.
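
A cheap way to notice when a freshly published CVE applies to your stack is to audit installed dependencies on a schedule. A hedged sketch using pip-audit (one option; the conversation does not name a specific tool):

```python
import subprocess

# pip-audit (pip install pip-audit) checks the packages installed in the
# current environment against known-vulnerability databases, so a newly
# published CVE for something like MLflow or Ray surfaces quickly.
result = subprocess.run(["pip-audit"], capture_output=True, text=True)
print(result.stdout)

# pip-audit exits non-zero when it finds vulnerable packages.
if result.returncode != 0:
    raise SystemExit("Vulnerable dependencies found; schedule patching.")
```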

Demetrios [00:34:43]: Yeah, I guess the key is that if you find the vulnerabilities, if the payout for that is nice, then you're not going to be thinking, all right, well, let me just do something illegal, and try and get more money than I would get from exposing this vulnerability.

Sean Morgan [00:35:04]: Yeah, for sure. I mean, I think bug bounty programs have had a measurable impact on a number of people that may have gone down the wrong pathway in life and found a way to make a very sizable living, you know, reporting these as white hat hackers. So I think for sure that's been a big impact.

Demetrios [00:35:20]: Yeah, I'm glad. I have no idea how to even start with this, because then it's just like one less thing that I have to worry about. I would go down rabbit holes and it would not be good. So I'm glad you did.

Sean Morgan [00:35:35]: A temptation, for sure.

Demetrios [00:35:36]: Yeah, it is. It is definitely a temptation. We've been talking a lot about model vulnerabilities and what happens when the model's out there, and that is very machine learning. But then there's just like the data stuff that's happening before this. And I can imagine there's a ton of vulnerabilities there which may or may not play into the ML part of things, but they're also just as important, right?

Sean Morgan [00:36:04]: Yes, for sure. I mean, you know, the models are a function of the data that trains them. When you look at the data ops space and you look for distributed scanning of personally identifiable information, or bits of samples of information that may be incorrect, it absolutely leads into the rest of the process. I would say DataOps is within the MLSecOps domain. It does go end to end. It does have specific challenges with respect to scale and how you are going about protecting that. But that's something that I'm really excited to see where it goes in the next five years or so, because legal entities that are now coming after foundational model training and fine-tuned models for copyright infringement and stuff are starting to put this emphasis that, hey, the data lineage that you put into it, as well as the process that you go through in vetting it, is coming into the spotlight. And nothing makes people pay attention more than either legal challenges or security breaches.

Demetrios [00:37:10]: So good. That is so true. Oh man. So, before we go, any of your own stories on how you potentially learned the hard way? What did you learn from that, first of all, and then what happened?

Sean Morgan [00:37:32]: Yeah, so one thing that I would say, back when I used to consider the security aspect with respect to the data that I was pulling in or the models that I was using: when it came to inference time, I would typically be looking for inference servers or even packaged containers that were ready for this process. At the time, I wasn't very much interested in the distributed inference that goes on within enterprises. What I found is that those containers that you pull from cloud vendors or others may not be as secure as you think. What I would recommend is, any time that you can, to scan those containers for vulnerabilities within the libraries contained in them. You often don't need a novel system for this. There's a lot of open source tooling, like Trivy and others, that can scan containers. And you need to understand that aspect of the integration as well. And so in my past, I had worked as someone who assisted with enterprise deployments.
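
As a concrete version of that container-scanning step, Trivy can be pointed at an inference image before it is deployed. A rough sketch, where the image reference is a placeholder and the trivy binary is assumed to be installed:

```python
import subprocess

# Scan a prebuilt inference container image for known CVEs in its OS packages
# and bundled libraries before it ships.
image = "registry.example.com/inference-server:latest"  # placeholder

subprocess.run(
    ["trivy", "image", "--severity", "HIGH,CRITICAL", image],
    check=True,
)
```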

Demetrios [00:38:36]: So then, to recap, I think one of the biggest things and the easiest things that we can start doing right now if we are not, is scan everything. Yeah.

Sean Morgan [00:38:47]: Take a zero-trust standpoint.

Demetrios [00:38:48]: Yeah, that feels like if you're not doing that, and maybe you think you are, I feel like there's probably a lot of times where, because as you were saying, there's not really clear ownership on this. Sometimes you think it's been scanned, potentially, or you think that, yeah, after I hand this off, it will get scanned. That's what the security team is for or whatever. And so a lot of times I can imagine, again, this is my little fairy tale fantasyland, which we learned earlier isn't the most accurate because I thought that the data was the biggest headache. But in my little fairy tale fantasy land, it feels like it can be organizational issues just as much as the technical issues. And if you're not communicating and if you don't have that owner saying, this is what we need to do every time, or here's the process we need to go through, then you can let things slip through the cracks, certainly.

Sean Morgan [00:39:51]: Yeah. You know, it's very easy to assume that it's somebody else's job. And so when you have that clear owner, they can not only take responsibility for making sure that the process is vetted and running smoothly, but also make sure that it's not too much of a burden, that the pipelines that are being utilized for deployment or for registering models into your model registry include these scans without the need for every single user to have to go through the process of scanning it themselves. Simply when they do the defined process, it happens behind the scenes. And I think that's ultimately where you want to move your organization to.

Demetrios [00:40:28]: So number one rule, scan everything. Number two rule, well, number one rule, have somebody that owns this. Number two rule, scan everything. Create the processes to scan everything. Number three rule, if you're going to use Hugging Face, maybe think about, what were you saying? Use proxies for it.

Sean Morgan [00:40:46]: That can do the scanning for you on intercepted traffic. And Hugging Face does a great job of scanning models. They're continuously improving that. They have antivirus scans, they have the Picklescan that I mentioned earlier. So beyond just looking for solutions to solve this for you, go to the Hugging Face page itself. Look at the banners, check for how the scans were performed and what the results were, which is often a step that most people kind of forget.

Demetrios [00:41:16]: Yeah. Oh, I can't tell you how many times I've downloaded something, didn't look at anything, just like, oh, yeah, this is it. That's what I want. All right, cool.

Sean Morgan [00:41:24]: Boom. Happy place.

Demetrios [00:41:25]: Yes. In hindsight, it is incredible that nothing has happened to me. Let's knock on wood. And that's what I imagine a lot of people that are listening are thinking that same thing, too.

Sean Morgan [00:41:41]: Yeah, for sure. And, you know, security is one of those things that you have to get out ahead of. You don't want to wait until something bad happens, and you want to find a way that it isn't overly burdensome. So there are processes that we talk about in MLSecOps about how you can integrate this security without it being overly burdensome.

Demetrios [00:41:59]: Yeah. Yeah. So last key takeaway. If you're using Ray, make sure that you are not doing... What was it called? I can't remember the name.

Sean Morgan [00:42:07]: Shadow Ray. Yeah.

Demetrios [00:42:09]: What a good name. How did I forget that? That is so true. Do not do Shadow Ray. If you are using Ray, just look up Shadow Ray and make sure that you're protecting against that. Dude, this has been great, Sean. I appreciate you coming on here.

Sean Morgan [00:42:20]: Yeah, of course. Thank you very much for having me, Demetrios.

