Trust at Scale: Security and Governance for Open Source Models
SPEAKERS

Hudson Buzby is a Solution Engineer with an emphasis on MLOps, LLMOps, Big Data, and Distributed Systems, leveraging his expertise to help organizations optimize their machine learning operations and large language model deployments. His role involves providing technical solutions and guidance to enhance the efficiency and effectiveness of AI-driven projects.

At the moment, Demetrios is immersing himself in machine learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building Lego houses with his daughter.
SUMMARY
For better or for worse, machine learning has traditionally escaped the gaze of security and infrastructure teams, operating outside traditional DevOps practices and not always adhering to organizations' development or security standards. With the introduction of open source catalogs like Hugging Face and Ollama, a new standard has been established for locating, identifying, and deploying machine learning and AI models. But with this new standard comes a plethora of security, governance, and legal challenges that organizations need to address before they can comfortably allow developers to freely build and deploy ML/AI applications. In this conversation, we discuss ways that enterprise-scale organizations are addressing these challenges to safely and securely build these development environments.
TRANSCRIPT
Hudson Buzby [00:00:00]: There's a ton of value there that I think the industry is, like, just starting to explore: how they diversify the requests, more efficiently use resources. That starts with some form of a centralized gateway and some semblance of control as to how these requests are being made. And if everything is just sporadic and every team is using a different model and you have no idea, those kinds of end goals won't be possible. You can't accomplish them.
Demetrios Brinkmann [00:00:35]: What's the obsession with Point Break?
Hudson Buzby [00:00:39]: It's just like one of the greatest movies ever. I don't think I need to. I don't think I need to defend Point Break by any means. You know, great, pure surfing scenes, just peak Patrick Sway, some great philosophical elements. Katherine Bigalow's kind of debut movie leading into a fantastic career. All the above. Love that movie. And Keanu.
Hudson Buzby [00:01:05]: Just anything with Keanu Reeves is well worth.
Demetrios Brinkmann [00:01:10]: Well worth endowment. Very true. The reason I say that is because you mentioned that you've seen point break over 283 times, I think. No, but who's counting it? Yeah. Yeah. Rough estimate. I think I would probably say I'm the same way for the Big Lebowski.
Hudson Buzby [00:01:28]: Oh, okay.
Demetrios Brinkmann [00:01:31]: I can't tell you how many times I quote that movie, like, weekly or I send people memes when. Or GIFs. When I'm like, yeah, that's just your opinion, man. So I feel you on that. That is great. So, anyway, man. Well, we're going to get into everything about the enterprise and their adoption of AI. I think it's timely because we've all seen the headline of, like, 95% of AI initiatives in the enterprise are doomed to fail.
Demetrios Brinkmann [00:02:08]: And it reminds me for anybody who's been in MLOps for longer than a few years, back in the day, when we said that stat that either Gardner said or somebody said once, and every single company quoted it on their website, like, 80% of all data science initiatives are do not make it into production. And so everybody put like, but if you use our tool, you will make it into production. That was kind of the marketing speak. Now, you. You remember that one?
Hudson Buzby [00:02:44]: I think we had it in. In one of our. Our intro slides for Quack, the. The previous product that I was working for. Yeah, absolutely. And it was true. I don't know. That was in general.
Hudson Buzby [00:02:57]: I talked to a lot of customers and prospects, and when you ask them, how is your ML environment, how is it going? Maybe 10% say it's in a good state. Like, we understand what we're doing. But more often than not, it's it's generally stitched together, barely holding on for dear life and. But that's kind of the, on the traditional side I think with gender of AI it's kind of switched into now everybody's in production with it or what they consider production, but it's a different kind of type of development that's taking place.
Demetrios Brinkmann [00:03:32]: Yeah, different kind of production, right. I was talking to a buddy who works at one of these big fintech companies and he said yeah, it's super easy. We just whenever we can outsource all of the infrastructure worries to OpenAI. Yeah.
Hudson Buzby [00:03:48]: And you know, I think for certain implementations, certain services, that works.
Demetrios Brinkmann [00:03:56]: But.
Hudson Buzby [00:03:58]: Yeah, I think there's kind of a fundamental shift happening between kind of how data science and data science teams were originally defined as kind of their own little sector of an organization that like, you know, maybe C suite threw some money at them and they're kind of experimenting with projects for a few years, getting a few ideas. Maybe they get something that works but it's not necessarily taken seriously as a part of the engineering or strictly enforced from like DevOps standards, infrastructure standards. And now you're seeing that change with again now they're getting things into production and now the rest of the organization, back end teams, front end teams, are starting to adopt AI services where there's a bit of a disconnect in terms of they have not traditionally treated their services in a production like way with tests, multiple development environments, all of those things. You ask data scientists that they're doing it, they kind of jump scare a little bit. But as all of these services start getting incorporated into like more traditional engineering environments, they're going to be need to be treated with like a higher level of scrutiny than what's kind of the norm currently.
Demetrios Brinkmann [00:05:12]: You know what I was thinking about this week and it was for the swamp up talk that I'm giving, I'm thinking like, oh, what is some stuff that I can mention has changed over the last year and I was trying to figure out what the different development cycles are when it comes to predictive ML versus this new generative AI workflows. And I'm qualifying generative AI as almost like image video generation. And then the other stuff I would say is like agentic development and the life cycle there because I think you probably have rag that fits in there somewhere. But most people are trying to do things agentic more than rag I think these days because RAG kind of hit. I don't know if it hit a plateau or if it just became something that was A whole lot of squeeze for the juice that you were getting. And so maybe you've seen different life cycles or you've seen different thoughts on, on that.
Hudson Buzby [00:06:20]: Yeah, I don't know. I think with a lot of the organizations I talk to, maybe it's just they're not throwing a. I think a lot of them are still doing rag like services, not throwing as much of a rag label on it where I almost think just additional context has become kind of the norm. Maybe that just kind of coincided with like the vector store kind of like boom and bust where everybody looking and building vector stores and then PGvector comes out and every database has some form of vector search now and that kind of got like democratized and open sourced. But yeah, I think in terms of like the development life cycle it really varies and I think that's kind of indicative of like the wide degree of skill technical acumen implementation in the data science industry. You know, from kind of low end of the spectrum of the kind of automl qlik devs not writing a line of code or kind of now looking into more of like the vibe code world to the opposite end of that. The OpenAI's and the anthropics who are fine tuning massive foundational models, all those environments look drastically different even in the middle there in terms of how teams are thinking about development or thinking about the safeguards. So I mean for a lot of organizations, you know, I don't think it's deploying into production, but I don't know if there's that many steps before getting into production.
Hudson Buzby [00:07:52]: I think a lot of it is a bit of trial and error and throwing up kind of agentic services, almost like a lambda function or a serverless function and while it's working great and then we'll figure it out when it breaks, but not necessarily developing like this massive framework around fail safes and testing and what happens when this tweak is made to a model and it changes?
Demetrios Brinkmann [00:08:16]: So it's almost like in your eyes the path to production has been shortcutted in a way.
Hudson Buzby [00:08:25]: Yeah. Or I just don't think that just because it's so easy to get things into production and it's still relatively new. I still, you know, like there aren't as so much in terms of standards that exist or are being developed that teams know to follow when it comes to development and deployment. I don't know if that's necessarily for, I mean to a degree I think there's a, a sector of the developers who have never been exposed to a more sophisticated kind of style of. Not sophisticated, it's a bad word, but.
Demetrios Brinkmann [00:08:57]: More.
Hudson Buzby [00:09:00]: Legacy form of software development. But I think something will emerge in terms of patterns and development environments that will look a bit more what we're used to in kind of the SDLC of the last five, 10 years. It's still new and it's still being developed and teams need to experience the pains of failing services and not having tests written until they, you know, decide that there's a, a better path.
Demetrios Brinkmann [00:09:34]: Yeah, wait, Gemini and Claude can't just write all my tests for me? What's going on here?
Hudson Buzby [00:09:40]: Yeah, they can. They just like somebody has to run those tests and then they have to look at the results of the test and then fix them. That's the, that's the part where, yeah, it takes, you know, just a, a little bit more effort that teams have to put in. But, you know, all of these. I think it's one thing that's consistent in computer science and software engineering is that foundations and principles reemerge in the same way that event sourcing was hot five years ago. And those are patterns that were developed and written in the 90s. And as more, you know, compute scales and new patterns become possible, then old ideas reemerge. So I'm sure.
Hudson Buzby [00:10:27]: Yeah, like I said, it's a temporary thing, but we're starting to see more organizations look at it like regular software, I would say.
Demetrios Brinkmann [00:10:35]: Yeah, because that begs the question, do you see these environments becoming a whole different thing or is it just bringing the old ways to what is new?
Hudson Buzby [00:10:46]: Yeah, and I think that's, you know, the, the product and team that I'm working on, that's kind of a big thesis that we have, is that they are many organizations. Because of what I was mentioning earlier, data science teams kind of existed in a vacuum. They weren't necessarily following the same, like DevOps standards or practices. Now as those AI services go from, you know, cool little feature that created a dashboard that somebody might look at to very core fundamental services, you're just not going to be able to have one, two different pipelines for managing the rest of your organization's software and then your ML applications. Doesn't make sense functionally. And then also from a security perspective, from a governance perspective, from a legal perspective, those can't exist as, you know, weekend projects that are running in production. They need to have the same level of scrutiny or we'll start to see, you know, huge vulnerabilities and hacks in production, which I'm, I'm sure is on the horizon either way. But I think, yeah, just kind of coming to a maturity point and realizing that they will need to be treated at the same level of scrutiny and we, it's just easier to have them managed under one process.
Demetrios Brinkmann [00:12:04]: And there's another piece that I wanted to hit on with you, which is just how adopting specifically open source LLMs is at the enterprise level and what that looks like, what you need, the kind of resources that you have to throw at it and where there's challenges that you've seen.
Hudson Buzby [00:12:24]: Yeah, it's really interesting and I would say kind of a complicated problem for a lot of enterprises, large companies. You know, I would say on the whole of the Fortune 500 or Big Tech companies that I talk to, Everybody is using OpenAI or Anthropic to some degree that that wave has passed. I would say most organizations went kind of full force into adopting those managed LLM providers to some degree. And I think all of them kind of see on the horizon, you know, either costs, accelerating distrust of providers from like a competitive landscape or a data privacy landscape or just a, you know, desire to have an open source shop and we want to find an open source alternative. So I think everybody is kind of forward looking too. And also the convergence I would say of like as open source models get better and better and kind of starting to reach that plateau. They I think long term want to have a large presence with open source LLMs in their environment and they want to allow their developers and teams to freely build. But the problem is most organizations can't just, you know, allow their teams to build Deep seek the day after it gets built.
Hudson Buzby [00:13:43]: In fact, most of those organizations, their main concern when Deep SEQ comes out is how do I block it from every single computer? And it's not just from, you know, the, the perspective that you would see on Twitter or the news of like, you know, the CCP stealing your data or something like that. It's more of like a, from a licensing perspective, from a governance perspective. You know, even with licensing, it's really interesting. We speak to companies all the time that spend, you know, a great deal of effort deciding like which licenses they can adhere to or follow. But it's hard to keep track of those licenses on hugging face, especially as you start pulling and playing around with random fine tunings of any of the big models that might flip that license or switch it. Now it's kind of a larger legal question as to do we follow the original model's license, we follow Joe Schmo's New license, all this like, you know, very, very mundane stuff that is not like the sexy vibe code stuff that you see on Twitter. But they're real problems that I think prohibit development right now. So organizations are, you know, and kind of leading into like the security element as well.
Hudson Buzby [00:14:57]: Hugging Face has just gotten like any other open source package provider flooded with a lot of bad models. Yeah, I work for JFrog. We have a, our own kind of dedicated research team and we've done a lot of work. We scan Hugging Face models several times a day and our results are often put into the Hugging Face console. And basically we're seeing like, I think the rate of models growing on Hugging Face is like five times month over month, year over year or something like that. And the rate of vulnerabilities is increasing by like seven times. And just, yeah, like I said, just like any other package provider going after misspellings or even going as far as just fine tuning models and including exploits, reverse shells. It's just like any other piece of software.
Hudson Buzby [00:15:47]: But the delivery mechanism through it seems so friendly and so easy to pull into your environment and easy to deploy into production. And that's kind of the problem here is that these large enterprises recognize all of this, but they need to have some degree of certainty with these open source models with the power and flexibility that they have, that they are not exposing their teams and they can safely allow them to build a little detour.
Demetrios Brinkmann [00:16:15]: A little detour. I download a Hugging Face model and it is compromised. You're not talking just like it's data poisoned or there's something that is hiding in the latent space of the model. You're talking like the actual package that I pull from Hugging Face is a virus that gets unleashed on my computer.
Hudson Buzby [00:16:36]: Could just be opening a shell and then you know, sending out your, your keys or something like that, sending out your GitHub key. That's one example. But yeah, that, and it comes, it could be in the, the code itself of the model or it could be in any of the packages that they're using. So it's important to I think scan for both because. Yeah, but again it's just like traditional software. You want to scan your Docker images, you want to scan your, your PIP packages, your conda environment, whatever, or poetry, whatever you're using, or jdm, any of them. But yeah, both that exist and we, we see instances and examples of both. So that's, yeah, I think that's one aspect is just, you know, many organizations, teams, they see the long term benefit of these models they see the cost benefits of these models but they need to shrink the, the marketplace a little bit.
Hudson Buzby [00:17:32]: Shrink what is available to maybe instead of you know, thousands, hundreds of thousands of models on Hugging Face to like maybe five, maybe 10 that there are certain work but still because of just generative AI they have the flexibility to be able to implement so many different workflows. But shrinking that down a bit and not letting developers basically go directly to a hugging face or olama, any of those, you know, large language model providers having a proxy or a middleman at least so that they're not going directly to the source just because it's, it's that easy to pull in something malicious.
Demetrios Brinkmann [00:18:08]: That is fascinating you mentioned because I have seen a fair amount of questions pop up on the MLOps community Slack like hey, what is an alternative to hugging face? My company doesn't let me use hugging face or and the way that it's generally framed in the slack questions is oh, my company's so old school that they won't even add support to hugging face. But now seeing it from your side it's like oh, maybe the company is actually kind of forward thinking in a way or they know what's up.
Hudson Buzby [00:18:40]: Yeah, it's a bit of both. Yeah, I'm sure there's, I mean one, there isn't really a place right now to get models out of, you know, besides kind of the. More like traditional machine learning libraries the, for the tensorflows, the pytorches of the world. You can get some models from GitHub. But yeah, hugging Face is the marketplace right now for it. And I agree it's both kind of antiquated and also, yeah it's safe. It makes sense when you think about it. And also again looking at like the, the developers that are pulling those models, you know, it's not a wild statement to say data scientists don't really care about security data folk.
Hudson Buzby [00:19:20]: It's not, you know, there are definitely some that are, that are concerned with that but it is not top of mind by any means. So yeah, I think guard rails are really important especially just as more and more models keep coming out.
Demetrios Brinkmann [00:19:35]: Yeah, it's like the double edged sword. What makes hugging Face so amazing is the ability for anybody to just go and add their model to hugging face. But then what makes it absolutely crazy as far as security vulnerabilities is anybody can add anything on there.
Hudson Buzby [00:19:55]: Yeah. And again it doesn't even have to be, you know, I feel like one of the most common forms of like entry for malicious packages is just misspelling, you know, pan with a O or something like that or S P, you know, just swapping out a letter and that's, that's all it takes where you're not even, you're not even realizing that you're pulling in. Something bad happens all the time. So yeah, trying to add some kind of a proxy, some kind of a curated list of models that have been reviewed, have been allowed do follow our organization's legal and governance and then also having a somewhat of a metadata record of these events taking place, keeping track of the model cards that are being used from hugging face. We call it like an ML bomb, but it's really just a metadata stamp that has all the environments that were used, all the safe Tensor files that were used. That's really, I would say purely forward thinking. Is there a ton of benefit or value to that today? No. But if a lawsuit comes out a year down the road and you need to figure out all of the relevant information around a training or a security leak happens and you need to go back and kind of figure out what happened, it's helpful to have those things in some form of a registry, a central location that you can easily audit versus having to go, you know, dive through developer notebooks and do that hunt.
Demetrios Brinkmann [00:21:34]: Yeah, and you're saying something where it's like, oh, New York Times is suing OpenAI type of lawsuit or not necessarily.
Hudson Buzby [00:21:42]: I mean that's the most extreme case. But yeah, it's I think, you know, just being prepared for those events and having the relevant metadata just given like the flexibility and the power of these models, how extendable they are. I think that's, that's one of the more kind of important governance perspectives you need to have is kind of like we don't know how this will be used. A new trend emerges in six months that completely throws out agentic AI and has a new pattern for development. But we do want that metadata to exist somewhere. And yeah, or like I said, when the licenses actually start getting enforced and then teams have to really care about that stuff or limit the types of models they're using. Similarly, another problem that we've heard from a lot of our customers and something that we're releasing like beta in the year. JFrog stores a lot of organizations artifacts and code.
Hudson Buzby [00:22:36]: We hear people basically saying for similar purposes, we need to keep track of governance, we need to keep track of licenses. We have no idea where generative AI is being used in our organization because teams have just PIP install OpenAI, start using the Models. So that's one thing that we've been working on is basically just going through your whole organization's code base and detecting all right, here are all the instances where this is being used. Let's generate some of that metadata for, you know, especially for regulated industries, if, you know, a law comes out next month that says, you know, any financial companies need to have like a specific list of auditing requirements related to any applications using generative AI or they ban generative AI with specific human use cases, whatever it might be. It's just important to know where those applications exist, where those packages are being used. Also for vulnerability purposes like, you know, more and more applications start using OpenAI SDK, Cloud SDK and something happens. You would want to know where everything has been distributed.
Demetrios Brinkmann [00:23:43]: Well, I know I've talked to folks about this when it comes to auditing how AI is being used throughout the workforce. And one person that I spoke to said, yeah, I just got done. I think they were in a data governance role and they were expecting to find maybe like 12 to 20 uses of generative AI and they found over 90 and they were like, oh my God, I would have never have guessed. But then you can also take it a step further and say, because you're, you're really talking about the stuff that you can see in the code base, right? Or the stuff that the developers are doing. But what about the marketers that are using that new shiny marketing tool that helps you write copy and you send all of your like positioning documents to that SaaS tool and that is using generative AI on the background? Like does that count? You know, from a governance perspective now there's these third party SaaS tools that are using AI and how do you qualify them too?
Hudson Buzby [00:24:52]: Yeah, I mean that's a great point in a whole other Pandora's box if that needs to be categorized and classified as well, yeah, we're focusing on purely just the code. But I mean the example that you highlighted is exactly what we're hearing and seeing. It's already out of the box, it's everywhere and it's only going to keep showing up everywhere. Not only, I mean from a development perspective everybody is using some form of code gen tools at this point and then the actual models themselves. I think, you know, we kind of anticipate like in three to five years it'll pretty much be any, any service will have some component of, of generative development. So yeah, just keeping track of everywhere it's being used. I think that's the, the important perspective. Having Some kind of a framework there.
Hudson Buzby [00:25:50]: And yeah, treating, treating it like any other, you know, application or service. Like, that's the, It's a low bar, but it's important. And I think it's, you know, you see these like, kind of very like diverging paths in, in development. Like, it almost feels like in many ways like the larger enterprise organizations I talk to are speaking like a different language of AI than the startups of the world or anything coming out of Y Combinator. Like, it's truly like two completely different pathways, two different focuses and two different concerns that I feel like often get blended together and people are looking for AI tools, ways to improve their AI developing experience. But yeah, it's just there's, there's two very different kind of pathways emerging that do require kind of two different sets of solutions and those solutions are ultimately serving like an AI purpose. But it's very different than, I think, the, like the ML ops landscape of five years ago where I think everybody was more or less kind of like on the same page to a degree.
Demetrios Brinkmann [00:27:03]: But all suffering together in trying to get those damn jupyter notebooks into production.
Hudson Buzby [00:27:09]: Yeah. Which is still a problem, actually. I think teams still have a hard time doing that, but you just don't have the same. I mean, the startups should have the same security concerns, but we've all been in a startup where everybody has AWS console access and we'll figure that out next round or after we get the next customer to sign. So it's. That's understandable, but it's. Yeah, it is two different sets of concerns.
Demetrios Brinkmann [00:27:39]: Yeah, yeah, it's almost like PMF trumps everything. And so you're building that organizational debt because you need to find PMF when you're at the startup and if you're at the enterprise, you really have to think about all of these key concerns. Since you have a lot to lose when you're at a startup, you don't have as much to lose. And so whatever, you can kind of play fast and loose.
Hudson Buzby [00:28:06]: Yeah. And the, the cost of an exploit or vulnerability is, you know, tremendous for enterprises when it comes to not only remediating the code, but like auditing, doing full investigations, autopsies of like every, you know, path that could have been touched by it. It's. And that is how enterprises are thinking when they're looking at different tools to adopt, different platforms to adopt, because, yeah, they have to be concerned.
Demetrios Brinkmann [00:28:33]: Yeah, it just sounds like a headache even in that glimpse that you've given me.
Hudson Buzby [00:28:39]: Yeah, I think one Other aspect on the open source that's interesting or a lot of a big motivation for a lot of organizations that want to adopt open source development is just on prem development or air gapped environments. That is still. And again this isn't in like the most pressing technology or bleeding edge but there's still so many organizations that have massive on prem clusters that will have massive on prem clusters that have moved stuff out of the cloud. It's not going in a way and it's kind of this neglected portion of the industry that doesn't get a ton of attention but is massive. So trying to find solutions for defense industry that yeah, they want to use open source models as well. They also can't in a lot of instances use an OpenAI use anthropic because the. Or maybe they can but like those are large decisions that are made at the top level, like heavily negotiated. They would love to start implementing gen AI a bit easier and they're trying to find ways to do that with open source.
Demetrios Brinkmann [00:29:47]: You did say something I wanted to go back to. It was on the. Yeah. Trying to just sniff out all the places that AI is being used and if you're in that governance position just how hard that is and how cumbersome it can be and then.
Hudson Buzby [00:30:07]: No, it, it's. It's a, it is a problem that I think, you know, previously you're not necessarily concerned with where is Pandas being used in my organization? Like that's an interesting question but like what value does that really give or dictate? Whereas you know, not only from like a commercial standpoint, you're negotiating a deal with OpenAI. I'm sure they can give you the, you know, top dollar amount but you want to understand like where it's being used. It's just so easy to implement any of these services that it. There isn't always a clear auditing log. You could, you know, there's tools that can go through your entire code base and GitHub and detect these types of things. But I think that's One reason why JFrog kind of has like a unique offering here in being able to. Because we store a lot of organizations binary codes so we can in our scanning processes go through and kind of detect where those packages are being used, where those libraries are imported and then provide a mechanism of creating guardrails or some kind of a administration system also providing kind of like a gateway type feature where you can.
Hudson Buzby [00:31:25]: Yeah. Limit access to models. Limit who's calling models. Yeah, I think again going in just like the delivery mechanism of how most of these models are being pulled into products and services. It makes it just very easy to sneak in.
Demetrios Brinkmann [00:31:41]: Yeah, especially because you can almost like go around it too. And so I'm doing it in my cli, but then I can just go to the website and maybe I'm using like five different ones. And so I have, I'm paying for Claude code right now. I'm also paying for Gemini and I'm, I've got cursor set up or I've got amp and all of these different ones I'm using in some way, shape or form. But maybe it's like not always the same. It's not just like a VS code plugin that I'm using. I am going to Gemini directly on their website and talking to it or ChatGPT or whatever it may be. And maybe we have the enterprise version or maybe it's just me using the free version.
Demetrios Brinkmann [00:32:30]: Hopefully like people aren't doing that, but you know that there's going to be people that a hundred percent are.
Hudson Buzby [00:32:37]: Yeah, I mean either to test out a new model, I'm sure that's, you know, probably a challenging point for a lot of teams. Like you're not getting something to work and then O5 comes out and you're like maybe it'll work. Yeah, ChatGPT5, not 05. Yeah, but that, yeah, it's a good point of like how do you control those gateways? Again, it ultimately comes down to like you need to block off access to the ChatGPT host in your organization. There has to be one gateway or one pathway. I think all of it does kind of come down to, yeah, some kind of a gateway that each organization manages, maintains on their own, has some form of a security, you know, user model where they're determining who can have access to certain models. Also from a cost perspective, like, you know, you don't want to be giving SQL analysts access to the PhD level models. We don't need that.
Hudson Buzby [00:33:34]: Maybe some. But yeah, I mean putting the, yeah, the costs are, that's whole other thing and I think there's a lot of really cool companies and products that are focusing on that. But yeah, I think just controlling the entry point is the kind of, the one of the main kind of theses that we have of what creates a framework that will be scalable, lasting and secure moving forward and can hopefully anticipate some of the different changes that might happen in the industry.
Demetrios Brinkmann [00:34:07]: But you're, you're almost like bottlenecking via the like if they're using the WI fi of work or if it's the company computer because I can still see people inadvertently being like oh damn, I can't like reach chat GPT on this computer. But if I bring my personal computer to work I can still get to it when I need those questions answered that it won't let me answer.
Hudson Buzby [00:34:33]: Yeah, it's. I mean, yeah, that, that's just highlighting the differences between. Yeah. Startups and enterprises like enterprises. No, that's, that's a very standard thing that like specific hosts. I mean even. Yeah. Back in the day like certain content websites are blocked from your work machine for obvious purposes.
Hudson Buzby [00:34:52]: It's all cap. It's all pretty easy to do. But yeah, that's, that's how we see it. And I think a lot of other. Also some cool tools that I've seen coming out of like you know, not even also like functionally creating like LLM gateways that will allow you to really easily swap out models, direct requests to cheaper models. If there's like an intermediate model that can determine this is a low complexity use case. Let's filter this to the small language model. There's a ton of value there that I think the industry is just starting to explore and how they diversify the requests more efficiently, use resources that starts with some form of a centralized gateway and some semblance of control as to how these requests are being made.
Hudson Buzby [00:35:50]: And if everything is just sporadic and every team is using a different model and you have no idea those kind of end goals won't be. You can't accomplish them.
Demetrios Brinkmann [00:35:59]: Fascinating. And building that gateway, because I've heard this idea of a gateway before a few times. I know there's a few companies that are doing it, but it does feel like that's going to give the, the company the most control of what is going out and coming in. But yeah, you have to think on various levels. You're not just thinking on the level of everything that's going to the research labs, but what's coming off of hugging face like you were saying. And then what are we running locally and are we setting up these smaller models and setting up like a platform team that's going to babysit those smaller models and that's us. Maybe it's on prem, maybe it's not. And how that looks is.
Demetrios Brinkmann [00:36:52]: Is completely different Also now I know that we mentioned before we hit record the whole evolution of quack and how quack. You've been there for a while. It was very ML Ops focused and now it's kind of grown, it joined JFrog. It now is security focused but you're also seeing this type of stuff like on the platform engineering level, the DevOps level. These folks are tasked with the AI platforms. In a way you have almost like the conglomerate of ML AI engineers, DevOps engineers, data engineers, and that's how you get the platform stood up. So what have you been seeing since this evolution took place of like mlops to platform maybe is the way that we could call it?
Hudson Buzby [00:37:54]: Yeah, and you know, kind of the segue that you mentioned of those teams managing those small language models, that is it's still ML ops. Like it's a different type of model, it's a different type of latency and throughput but at its core it's, you know, it, it is mlaps and it can fit into what organizations have already built for mlops. Maybe they're not spending as much time on like the experimentation portion of that because unless they're really fine tuning, they might just be constructing an image or an artifact that they're then going to turn into a service. But you still want all of those core components of MLOps sophisticated A B testing. That's something that we see a lot with machine learning. The persistence of all of the result data into some easy to access, easy format where developers, data scientists can query results, do drift analysis on those results. So all of that's still relevant and I would say needed with generative AI and with large language models it can fit into that pattern. I think it's just in this kind of trailing way.
Hudson Buzby [00:39:08]: Everybody kind of forgot about it for two or three years with the excitement of what they could do from a development perspective. And now as they get into maturing these models and services over time, auditing them, needing a system of record, they're kind of realizing we do need our LLM results persisted somewhere or even regulated industries. We have a healthcare customer that every single response with generative AI needs to be persisted somewhere. So I think in terms of the ML ops industry, I think it's a, you're deep in it. There's an insane number of like point solutions that I can point solutions platforms that I could cannot keep track of and they're, you know, the ones that kind of die or fade like they're getting replaced by similar solutions that are more generative AI focused. But you know, I think at the core all of these organizations are looking for like a platform, a standard that will allow them to easily defined workflows in a safe, reliable, repeatable way. I think that's why you see so much excitement around MCP. You know, MCP's not like revolutionary from a code perspective, but it's somebody like, you know, defining and stating is a standard that we're going to follow when it comes to the access of external data sources or data that we can feed to a language model, functionality.
Hudson Buzby [00:40:41]: That type of standardization, I think like the whole market is very hungry for and there's a lot of opportunity for products, platforms to kind of define that and to increase its use.
Demetrios Brinkmann [00:40:56]: That's such a good point that it is like it didn't change, it just changed in name in a way. Like the actual practice of MLOps didn't change too much. The models got maybe a little bit bigger for a minute. And so you had to do certain things, but if you were at a certain scale, you probably were already messing around with GPUs and you were messing around with like trying to make your, whatever feature creation, feature extraction faster. And you needed GPUs and you needed to get beefy GPUs. And then we work with networking, but then that became very popular because of the training of LLMs. And so we lost our way, or not necessarily lost our way. We just got a little bit distracted for a minute and then came back and we were like, oh yeah, so we still gotta deploy these models somewhere and we still gotta make sure that they don't drift and, and that we can monitor them.
Demetrios Brinkmann [00:41:59]: And so it's a hundred percent the same problem, just almost like in different clothing.
Hudson Buzby [00:42:08]: Yeah, yeah, completely. I mean, yeah. And just, you know, expanded functionality, like the models aren't as, as focused even with, you know, agentic workflows. Like there's still way more functionality and possibility than existed with your simple linear regression models. But yeah, it's, you know, maybe not the exact same use case of how they're monitoring the data, how they're monitoring or configuring these services for testing. But a lot of it is the same or it's just a different spin on it. And yeah, organizations I think that were, that did have mature, sophisticated kind of MLOps systems in place I think are definitely kind of leading the way in terms of what they can do with generative AI and that at least it's more of like a lasting pattern versus you know, throwing up the ad hoc services without a lot of foresight or thought.
Demetrios Brinkmann [00:43:03]: Yeah, and it's also certain things got like the time that you take on them disappeared in a way. Like you were saying before, you don't necessarily do all this Experimentation, unless potentially you're fine tuning. You're not doing data curation unless you're maybe fine tuning. But what you are spending more time on is figuring out the prompting and figuring out how that that fits into things. And so you're spending more time in different areas. But they're, they're like, they rhyme in that regard.
Hudson Buzby [00:43:41]: Yeah. And that's, that's again going back to kind of like the concept of a gateway. Like, you know, this whole conversation is kind of focused on like everything that you can do, like pre deployment.
Demetrios Brinkmann [00:43:51]: Yeah.
Hudson Buzby [00:43:52]: Pre prompt interaction to make sure your code is, you know, safely securely deployed. And then you have the whole, you know, everything. You just mentioned creating guardrails around content prompt injections, making sure the service is doing what you want. That's a whole other kind of separate topic that a lot of the industry is focused on. I think that also kind of converges with we should have a centralized gateway where we can administer these different types of guardrails or we can administer these security policies that we want to enforce. Again, all of it just comes down to organization, standardization, some method, some system that needs to be in place versus Wild west ad hoc 5 code. Just throw services up. It works quickly in the short term, but to have the more mature systems there needs to be.
Hudson Buzby [00:44:48]: Yeah. Some form of a, some form of a gate in place.
Demetrios Brinkmann [00:44:52]: But you're not seeing enterprises on the services side like the infrastructure side of oh hey, we're going to spin up GPUs or we're going to go in contract a new GPU cloud because we can't get the GPUs for the price that we want. Your as JFrog, the gateways that you're talking about, you don't go into that side of the house, do you?
Hudson Buzby [00:45:16]: We are going to. Yeah. Our kind of. The intent of our kind of rebuilt a bit of our MLOps platform from the quack days to JFrogmill is the new release term. But we're available on all three clouds now and we'll be available on prem in a few months from now. But the idea would be that you could train on a GPU cluster locally and then deploy into the cloud, which is a common pattern that I hear from a lot of customers. You know, there's been enough boom cycles with GPUs where many organizations have bought GPUs and they kind of are sitting there and it is cost effective or cheaper for them to go that way rather than with on demand GPUs in the cloud. But I think most organizations at like an enterprise scale very.
Hudson Buzby [00:46:04]: It's rare to be limited to a single cloud at this point. Yeah, I think the, yeah, the default now is usually a combination of clouds, a combination of on prem and cloud and there is a desire to be able to seamlessly go through each of those environments without having to completely re engineer your machine learning philosophies through machine learning practices. Being able to yes, switch seamlessly is huge for a lot of teams.
Demetrios Brinkmann [00:46:34]: What other features and ways have you looked at expanding or evolving the what was once the quack platform into the now JFrog platform or the ML capabilities inside of JFrog or ML AI if we're going to get technical.
Hudson Buzby [00:46:55]: Yeah, so I mean a lot of the core MLOps platform still exists. We still do end to end machine learning ops. It's a really nice easy way to kind of deploy models and services and then everything that we've kind of what a lot of this conversation has been about focusing on the security aspect. When you trigger machine learning training or a build automatically scanning it, applying you know, organizational policies around, these packages are blocked or these packages are out of date. We only want to use open source packages that have 2,000 stars on GitHub or have been updated in the last six weeks or four weeks, whatever that might be. That's kind of what the larger JFrog platform and ecosystem is about is basically managing open source development at scale for enterprises and organizations, taking that and applying it to machine learning kind of automatically. And that's the idea because we know the data scientists, machine learning engineers, they don't really want to worry about security, slap it on them or you know, put the, the scans right in their face, make it easy for them to fix. And then we've kind of combining products with JFrog has a question cool product called Curation that basically kind of does what I just mentioned like creates, you can do like rule based systems as to what types of packages developers can pull in from open source languages and repositories.
Hudson Buzby [00:48:22]: Combining that with like a really easy way to deploy LLMs, you know, we've pre configured the model, it has the LLM already built into it where the, you know, the IT manager, security manager, DevOps manager at an organization can go and click I want LLAMA four, I want deepseek, I want Quen. And those models then become available to developers to either pull the artifact or deploy it as a service as well as they can control who's actually able to make requests to that model. And so we're calling that AI Catalog. And that's kind of. It also includes a gateway and some of the detection pieces that I was mentioning earlier about like, okay, here are all the services that are using OpenAI. Like do you want to block them? Is this okay? Or here's everything using llama4. Like we've determined that we can't use llama4 at an organization level. Now you can block those services from being redeployed and alert teams that they need to fix it.
Hudson Buzby [00:49:25]: That's I would say, yeah. Where the product is kind of landed. But we still do all the same mlops stuff that we've done before and have plenty of, you know, some of our older customers still using the platform the same way.
Demetrios Brinkmann [00:49:38]: Yeah. So it's like you thought about all of these different models that are the popular models and how can you just put your stamp of approval on them so that folks have the trust if like I'm going to use these. I know that it is not the Facebook Llama model which I remember back in the day that people were downloading and they were like, no, it was actually the meta Llama model that you had to get because the Facebook one was. That was the virus.
Hudson Buzby [00:50:09]: Yeah. We do some suggestions like that within the platform. Like we offer kind of like some statistics that show like the reliability or the viability of the model and then we also provide like again the kind of rules based framework where you can sort out if that model based on a security scan or based on the number of downloads we can filter it so that it only includes the models that you want your team or you can just go and whitelist select them and that works too. But yeah, creating a management system. And this is also. We spent the first year kind of after getting acquired into JFrog kind of just talking to a lot of customers and here's what we built. What do you want? And this was the kind of the biggest request that we got from so many customers that were basically saying we need a management framework.
Demetrios Brinkmann [00:51:07]: Just like a. Basically we need a governance layer.
Hudson Buzby [00:51:10]: Yeah, I mean both governance as well as just we need to have something in place. We cannot let it, let this be the wild west and teams just using whatever packages and models they want.
Demetrios Brinkmann [00:51:23]: Well, I imagine most folks didn't even have any visibility.
Hudson Buzby [00:51:27]: None. Yeah. And that's their initial reaction is when they don't have visibility, like the friend that you mentioned, they shut off hugging face access and that's a temporary fix to kind of delaying progress. Right. I think you're ultimately going to be doing a lot of generative development in your organization, you need to find a solution for it. You can't just block it off. So that's, yeah, the. And that's many organizations that we talked to that were like, we'd love to get into this type of development, but until we have something nobody's building.
Demetrios Brinkmann [00:52:02]: Wow.
Hudson Buzby [00:52:04]: Yeah. And then some more on this. More like, obviously a lot of them are using OpenAI still, but I think in more limited scope than like what their kind of envision would be.
Demetrios Brinkmann [00:52:19]: Yeah, yeah, it's the using it right now again, trying to find like the PMF for the products that they're putting out with AI. And even if they are at that enterprise level, I think most folks are trying to experiment with how they're going to fully leverage AI. And I know that I have some friends at an enterprise and they are constantly saying, like, I just want to know if it can work before I do anything. So the fastest way to know if it works is to use the best models out there. Because if I can't get it to work with the best models, then it doesn't matter if I can get like the 7B model running internally, fully like secure and everything, because it's not going to work. I pretty much guarantee that if it doesn't work on the best model, it's not going to work on the small model.
Hudson Buzby [00:53:18]: Yeah, I mean, yeah, that just highlights that you're not going to stop developers from using the best, the newest best model even. Even when as that improvement gets more and more marginal. If you're stuck on a problem and the new model might solve it. Yeah, why not?
Demetrios Brinkmann [00:53:36]: Yeah, yeah. And, and it is that though, like, to what you were saying, I try and get it working, but then once it's working and I understand that it can be done with AI, that's when I'm going to try and optimize and I'm going to make it fast and I'm going to try and make it cheap and I'm going to figure out all the ways to make this bulletproof. But let's like almost try to figure out how fast we can get it up and running. And does it work? All right, cool. It looks like I'm kind of there. And then you have that last mile problem of it kind of works. I think it could work maybe. And then you spend the last 10% of the time, which is way longer than that initial 90% of the time, trying to figure out the evals or figure out what if you need to do all these different prompt tricks or what you need to do to.
Hudson Buzby [00:54:28]: Arguing over the tone of one word. Yeah, trying just exactly. I don't know if it's 90, 10% at that point. That might be like 80% of the time. Yeah, it's, it's all that development work. And again, that's, I think there's so many tools to help with that aspect of development. There's so many, like go to Twitter and it's just flooded with examples of people and tricks. That stuff is, has been discussed, is being discussed and will continue to be.
Hudson Buzby [00:54:59]: But it is kind of like after that point, what do you do after you get something that's working? How do we turn this into a safe production service that actually is going to last? But I do think what you mentioned in terms of like portability, going back to that gateway concept, I think that is a, should be a very significant focus for any organization. Just how quickly, you know, functionally can we sub out this model from a cost perspective, from a latency perspective, all of those concerns, because we're just going to see more and more models that are suitable for these use cases, that have lower resources, that have lower parameter counts and are cheaper. Everybody's spending right now pretty wildly on AI, but that won't always be the case and there will be a time when shrinking occurs and people do have to get more optimized.
Demetrios Brinkmann [00:55:55]: So I'm fascinated by the whole product discovery that you did and you mentioned when you had that first year almost like of incorporating or acclimating into J. Frog. And so you got to go and talk with all their customers and they have a ton of enterprise customers and figure out like, what's the story? What's the biggest pain point that folks are looking for in this cross section of security, reliability, machine learning, AI. Can you walk me through a few of the stories of like, what were some ideas that you didn't go with, what were some things, or how did you eventually land on what you decided to build?
Hudson Buzby [00:56:43]: Yeah, it is a really interesting kind of place to be in. I think JFrog is almost as ubiquitous as like GitHub. So we work with over 80% of the Fortune 500 or something. So like 10,000 customers, there's access to way more accounts. So you go from the startup perspective where like you get a Fortune 500 company on the phone, you're stoked, that is your whole week and what your focus is on. And now you're able to have those like, more meaningful conversations. And I think it's helpful when you have a really. Our product was great to begin with and like we were saying, it's, you know, the mlops solution still isn't really solved and people are still struggling with it.
Hudson Buzby [00:57:28]: So those pain points still resonated with a lot of these enterprises. A lot of them had kind of solved MLOPs to a certain degree. They had built something because they had the resources over the course of a few years to spend on it to make it functional. But yeah, again it was just kind of just asking organizations like what you know, what do you need, what are you help? What would be helpful, what would allow you to build more. And yeah, this was the. It wasn't too difficult, I would say or like that much of a. A menu of options that we had. It was really get this platform out on all the cloud providers and having a flexible on prem solution.
Hudson Buzby [00:58:13]: That's definitely one of the major things we heard. And then making open source models easier deploy and ensuring more trust in them. One thing that we did play around more with was I think on kind of the concepts a little bit before the acquisition we were focused on kind of creating like, more like rag tracing elements or kind of like complex DAG style agentic pipelines. That was one thing that we started building. I think it all did kind of. We've had this idea of a gateway though for quite a while. We just weren't looking at it as a startup from a security perspective. It was more around prop management, content, guardrails, those types of.
Hudson Buzby [00:59:01]: That type of functionality which we still want to implement down the road. Now we're just starting from kind of the security layer in the enterprise layer.