Enabling Defense Missions with Local LLMs
Gerred Dillon is an open-source developer and Unicorn Engineer at Defense Unicorns. With 13 years of background in AI/ML, edge compute, distributed systems, and infrastructure engineering, he focuses on open-source generative AI development in local, sensitive, and air-gapped environments. Prior to Defense Unicorns, Dillon was the co-founder and CTO of a tech startup oriented around WebAssembly and edge AI/data solutions. An avid open-source developer, he has contributed directly to Cloud Native Computing Foundation projects including Kubernetes and KUDO.
Despite the power of best-in-class Large Language Models and Generative AI, the use of hosted model APIs in highly sensitive and regulated environments is being challenged. From fine-tuning and embedding sensitive data to running small models in edge and air-gapped environments, users in these regulated environments need production-ready ways to run and observe models. Beyond that, both the software and the models being deployed need an Authorization to Operate in every one of these environments. LeapfrogAI is an open-source, open-contribution set of tools designed to meet the challenging requirements of these environments. Come learn what makes these environments so rigorous, hear about the work going on to enable Defense missions to use Generative AI safely and successfully, and learn how LeapfrogAI enables these missions.
And next up we have Gerred, with a beard that I think can compete with one of our speakers yesterday. I don't know if you saw our speaker yesterday, oh my gosh, I'm spacing his name. Chris Bruso. I'll take the competition; that sounds good. Not only did he have a beard that could almost compete with yours, he also tied a metaphor for the beard into his talk.
We'll have to send that to you later. You're gonna be great. Okay, I don't wanna eat into your time, so without further ado, let me pull up your slides. All right. Thanks so much. Hey everyone, thanks for coming to this.
I'm talking a little bit about what local LLMs, and LLMs in production, mean for missions in defense and in public spaces. So, a quick intro: I'm a Unicorn Engineer at a company called Defense Unicorns, where we work on very critical missions for defense and other arenas.
I'm an open source developer, I've done a couple of startups, and currently I work on building generative AI for critical defense missions. So let's talk a little bit about what that means. We're gonna start off with the fact that regulated environments are incredibly challenging to work in.
The past few months and weeks have been a shell shock for these environments. Just yesterday it was revealed that a massive global cyber attack happened: data was exfiltrated from multiple states, multiple financial institutions, multiple industrial companies, as well as the US government.
And this is a story that has played out multiple times in the past couple of years as the amount of data, the gravity toward these systems, and the importance of these systems grows. That has led to a very interesting question: what do generative AI and large language models mean for federal and defense missions?
On one side, we're seeing a lot of regulation proposals coming from some of the largest creators of AI and of the current LLMs out there, but also declarations from the government side: how do we figure this out, how do we acquire this, how do we manage the risk of it? That goes down to having industry experts and others come in and talk about these issues in these spaces.
Because I think there's a recognition that solving this is critical. We'll talk a little bit about why that is, but we can't just ignore this problem in these environments. So let's start with: what is a regulated environment? We should define this term, because it could easily be a weasel word.
It could mean nothing; it could mean a lot. The way we look at them at Defense Unicorns is that they are first and foremost very mission-purposed environments. They're suited out to define and achieve a set of goals, widely or narrowly applicable to some domain. And often these environments are very restricted, regulated, or otherwise isolated.
We'll talk a little bit more about that, but in some way these environments restrict who can access them, why, how, and who can develop on them. They're often very egress- and ingress-controlled. It's easy to say "air gap," but these can be completely air-gapped, completely on the edge in various theaters, or they may just have a lot of controls and restrictions around them, and even that often has a fine meaning in and of itself.
I think one of the most important things, though, is that these are often very high-value targets, as we see with cybersecurity issues that go up to the nation-state level from both an attacker and a defender perspective. So we're often talking about areas like finance and healthcare, where access to and leaks of healthcare data are very critical, as is financial data.
We're talking about civilian government data, and we're also talking about defense data, which goes up to very sensitive information that would create very serious security issues for nation states if it were to leak. So what's the problem with going and using these APIs and these tools in these environments?
Well, first off, there's often a perceived resistance to innovation, and whether that's true or not is not part of this talk. Often I find in conversations the view that these spaces are so heavily burdened by regulation that they just don't want to innovate, that they're living 20 years behind.
I find that's often not true, but sometimes that's in the eye of the beholder. What is absolutely true is that there are very high regulatory requirements for deployment and operation in these environments. We'll talk a little bit about Authorization to Operate, about some of the data needs, and about Iron Bank. There are steps; you can't just deploy.
You have to actually go through a process for software to run, even at a very granular level, and the process you need to go through can differ depending on who you speak to. Alongside that, you need to go through a lot of artifact control requirements, and most recently people have been dealing with that around containers.
But if you look at large language models and other generative AI model weights as artifacts, those control requirements will continue to become more and more important as well. Now, a lot of this may look like, okay, we just won't innovate there. I would posit that not only can innovation not stop in this arena, it becomes a critical issue if we start to challenge whether innovation needs to happen in these spaces at all.
And I really wanna focus in on this thought: ignoring the innovation and the availability of these models does not change anything. Pushing for innovation, especially in these regulated environments and defense missions, is very critical, as part of a partnership with open source spaces and the spaces relevant to the talk I'm giving here today.
So I'm gonna refer to a piece that Marc Andreessen put out just recently, "Why AI Will Save the World." It's a very long treatise around a lot of the fear, uncertainty, and doubt around the regulation of smaller entities and smaller models, and around innovation in the large language model and generative AI space.
One point that stands out is the real effect of stifling innovation in an arena where others will continue to move forward with that innovation, and in fact have their own views on it. I won't read the quote out; the blog post is worth reading to form your own opinions.
But the reality is that AI is a very competitive environment, and right now we believe AI is at a very critical inflection point. If you look at all the innovations happening daily, it's almost a full-time job just to keep up with what's going on in open source and in industry, much less the needs being created in real time from all that demand and all that supply.
So one thing I would posit from that article is that power projection in AI is real. What does power projection mean, and how does it apply to generative AI? Power projection in the generative AI space is really about changing the relative cost of conflicts fought across domains.
The real benefit of generative AI in these missions is the opportunity to make conflict domains far less expensive for these environments to operate in, while incurring a far higher resource cost on adversary actors initiating and engaging in conflict. And if you look at that across domains, we're not just talking about kinetic domains, but cyber attacks.
We're talking about every domain in which nation states and others interact with each other, and not taking advantage of that creates an imbalance that we're working against. So how do we start to enable this sort of innovation? What are our next steps to get there? The way we're looking at it, we need to actually be in a community space.
So local generative AI solutions are critical for this. One thing I really loved from the Modular talk at LLMs in Production part one was seeing that small local LLMs can really excel at specific tasks. I'm speaking of the Stanford model, I believe it was a 3-billion-parameter model, that at the time really outperformed others on healthcare Q&A.
Right. We're now in an environment where we're looking at new LLMs or generative models every day. Right now I'm working with a Wizard 40B code model, and two days before that I was looking at StarCoder Plus. So the rate of change in LLMs is massive, and the tools need to be as accessible as the rate of release of these models themselves.
Not only that, there was a problem early on: where do I get all the chat and instruction tuning data for these models? We're starting to see big corpora of open source chat and instruction fine-tuning data, and recently, I think in the past couple of days, I saw an entire fine-tune based on effectively what OpenAI's new function-calling models are doing.
So this is continuing to grow; it's not stopping. From that perspective, then, it's critical to be part of the community. Let's not build two classes of tools that nobody else uses; let's enhance what exists while also gaining the benefits in both directions. So, okay, how do we get to that innovation?
Well, we wanna build open source tools designed for mission-oriented, highly regulated environments, and they need to be multimodal. The important part there is that it's not just about LLMs: speech-to-text, image-to-text, text-to-image, all of these provide value in different ways, and their availability is critically important to being able to create a complete solution.
I'll share some demos in a few minutes demonstrating that. And then there's having MLOps tooling around this: tooling to deploy into a variety of environments, which benefits everyone working in open source as well as these environments with restrictions.
But I believe these tools have to have parity with existing open source tooling as well as with proprietary APIs. Being able to drop in LlamaIndex, LangChain, Guardrails, or Microsoft's Guidance tool, I believe that is critically important to drive adoption with small LLMs: being able to drop that right in, ready to go, whoever's API is backing it.
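As a rough illustration of that drop-in parity, here is a minimal sketch, with an assumed endpoint URL and model name, of pointing LangChain's standard OpenAI wrapper at a self-hosted, OpenAI-compatible API instead of the hosted service:

```python
# Minimal sketch: LangChain's OpenAI wrapper aimed at a self-hosted,
# OpenAI-compatible endpoint. The URL, key, and model name below are
# placeholders, not actual LeapfrogAI values.
from langchain.llms import OpenAI

llm = OpenAI(
    openai_api_base="http://localhost:8080/openai/v1",  # assumed local endpoint
    openai_api_key="placeholder-key",  # real key handling depends on the deployment
    model_name="stablelm-tuned-alpha-3b",  # assumed model ID
)

print(llm("Summarize why air-gapped deployments need local models."))
```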
So today I am, I guess, formally introducing for the first time, though we've been open source for a little while, something Defense Unicorns has been working on called LeapfrogAI. LeapfrogAI is an open source, open contribution suite of APIs and tools for everything from deploying and running to performing day-two operations: everything you need in one package to do generative AI in these very adverse, highly regulated environments.
Out of the gate, we designed LeapfrogAI to be ready for egress-limited and ingress-limited environments, anything air-gapped or highly controlled. Regulated: these areas need complete ownership of their data; they can't put that data in other places. Sensitive: access to the APIs and the data itself is very limited, and may be different for who is developing and creating the tools versus who is using them.
And secure: we're coming out of the gate ready to go with very hardened containers, and we're looking toward Authorization to Operate so that various arenas can go pick them up and use them. That is built into the core of what LeapfrogAI is doing. So where are we at today? We're sitting with an OpenAI-compatible API.
I'll show that in a few minutes. We have LLM support, and actually building out models is really just creating new gRPC backends, for which we've built a harness and tooling; I'll show a little bit about that. We also have a community space, and we support both gRPC and HTTP backends for these models.
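To make that concrete, here is a hypothetical sketch of what a minimal gRPC completion backend can look like in Python. The proto modules (completion_pb2, completion_pb2_grpc), the service name, and the message fields are invented for illustration; LeapfrogAI's actual service definitions and backend harness live in its repo:

```python
# Hypothetical gRPC completion backend. The generated proto modules and
# message fields are invented for illustration only.
from concurrent import futures

import grpc
import completion_pb2        # hypothetical generated code
import completion_pb2_grpc   # hypothetical generated code


class EchoBackend(completion_pb2_grpc.CompletionServiceServicer):
    def Complete(self, request, context):
        # A real backend would run a local model here; echo is a stand-in.
        return completion_pb2.CompletionResponse(text=f"echo: {request.prompt}")


def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
    completion_pb2_grpc.add_CompletionServiceServicer_to_server(EchoBackend(), server)
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()


if __name__ == "__main__":
    serve()
```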
And speaking of backends, we have speech-to-text support; we're using Whisper large-v2 in that example. We have support for embeddings and embeddings models, ready to go out of the box. And it's really designed out of the gate to run in a distributed environment: every model we have up there right now supports GPU and CPU.
And actually MPS, the Metal shaders, for that matter. When I say distributed, our default deployment instructions go over Kubernetes and assume you have a cluster of machines. Often tools will expect you to have a big monolith where you're running one thing at a time, loading models in and out; here, you can actually start to scale LeapfrogAI from day one.
And right now we're shipping with Weaviate from a vector database perspective, because we wanted people to really get started with adding vector embeddings into these prompts very quickly, along the lines of the sketch below.
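For example, a minimal retrieval flow with Weaviate's v3 Python client might look like the following; the URL, the "Document" class, and the field names are assumptions for illustration, not LeapfrogAI specifics:

```python
# Sketch of retrieval-augmented prompting against Weaviate. The class
# name "Document" and its "text" property are invented for illustration.
import weaviate

client = weaviate.Client("http://localhost:8080")  # assumed Weaviate endpoint

query_vector = [0.12, -0.03, 0.44]  # placeholder; would come from an embeddings model

result = (
    client.query.get("Document", ["text"])
    .with_near_vector({"vector": query_vector})
    .with_limit(3)
    .do()
)

# Stuff the retrieved passages into the prompt as context.
passages = [d["text"] for d in result["data"]["Get"]["Document"]]
prompt = "Context:\n" + "\n".join(passages) + "\n\nQuestion: ..."
```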
So I'm gonna take a quick look at questions while I switch over to my demo environment. Okay, I don't see any for the moment, so I'm gonna switch my screen share. First off, let's do a terminal-based demo. Let me make it a little bit bigger for the purposes of the slideshow, and I'm gonna import openai. We wanted this experience right out of the gate.
We wanted to be able to do that with Hugging Face too, and as part of that, caching models and a model zoo browser; we'll talk about that. But if we just set the OpenAI API base and put in a key:
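Reconstructed, that setup looks roughly like this, using the openai 0.x Python client that was current at the time; the base URL and key are placeholders for wherever the LeapfrogAI API server is running:

```python
# Point the stock OpenAI client at a self-hosted, OpenAI-compatible API.
# Both values below are placeholders.
import openai

openai.api_base = "http://localhost:8080/openai/v1"  # assumed local endpoint
openai.api_key = "placeholder-key"                   # not a real key
```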
Right out of the gate, we can start to list the models that are available, if I choose the right API base; such is the curse of live demos. So we'll do a model list, and now we have our list of models. We have StableLM 3B tuned here, and we have text-embedding-ada-002.
Which is actually, sorry, MiniLM, all-MiniLM-L6-v2; it's in the repo, I can demonstrate that. And we have Whisper. So if I come in here, I can go create a completion, and this is federating out: it's hitting the front-end API server, and from there the completion
goes over gRPC, which is able to stream back from another backend. Having these community tools just work, ready out of the box, is very critical for us. And so we have a response; we've got the tokens. That's one way to use it. Out of the gate as well, we have an embeddings endpoint.
So for people working with our vector database, they can get vectors ready to go. Put together, those calls look roughly like the sketch below.
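A rough reconstruction of the terminal demo, again with the openai 0.x client configured above; the model IDs are assumptions based on the models listed in the demo:

```python
# List the models the API server is federating to.
models = openai.Model.list()
print([m["id"] for m in models["data"]])

# Create a completion; the front-end API streams it back from a gRPC backend.
completion = openai.Completion.create(
    model="stablelm-tuned-alpha-3b",  # assumed model ID
    prompt="Write one sentence about defense missions and local LLMs.",
    max_tokens=64,
)
print(completion["choices"][0]["text"])

# Get embeddings ready to drop into a vector database.
embedding = openai.Embedding.create(
    model="all-MiniLM-L6-v2",  # assumed model ID
    input="vector search in regulated environments",
)
print(len(embedding["data"][0]["embedding"]))
```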
I'm gonna switch over to one more set of demos here, and then get back to the slides. Let's see. We actually hosted an internal hackathon once this was starting to get released, since we were wanting to use it internally, and some of my colleagues, other unicorns, actually went out and built applications over the past couple of days, Thursday and Friday, demonstrating all the things you could do with this.
And so one of them built, Doug is the Defense Unicorns mascot, a Doug Translate tool. If we drop in a translation, and this is the speech from Yuri Gagarin at the launch of the Soyuz, we can do that translation, and it's even multimodal to where you can hit Summarize with it.
StableLM 3B is, you know, interesting in the translations, but that's why we wanted to be able to have a zoo and a propagation of model availability. It takes about ten seconds per minute of audio, so it should return back here in a few seconds.
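Behind a tool like that, the translation call itself can be as small as this hypothetical sketch, again using the openai 0.x client against the local API; the file name and model ID are assumptions based on the Whisper large-v2 backend mentioned earlier:

```python
# Translate non-English speech to English text via the audio endpoint.
# File name and model ID are placeholders.
audio_file = open("gagarin_speech.mp3", "rb")
result = openai.Audio.translate(model="whisper-large-v2", file=audio_file)
print(result["text"])  # English translation of the audio
```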
And so we have that translation. We'll skip the summarize for now, but that goes back and hits StableLM 3B. Then one more thing they did was with the code model: being able to go in and ask it to plot air quality data for June 6th, 2022.
It was actually able to generate that plot, get the data for us, and form it. And this is really representative of a lot of the real things going on in these environments and the things that they work on. The generated script was presumably along the lines of the sketch below.
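As a hypothetical reconstruction of that generated script (the CSV path and column names are invented), the code model's output may have looked something like:

```python
# Plot air quality for a single day. Data source and columns are invented.
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("air_quality.csv", parse_dates=["timestamp"])
day = df[df["timestamp"].dt.date.astype(str) == "2022-06-06"]

plt.plot(day["timestamp"], day["pm25"])
plt.xlabel("Time")
plt.ylabel("PM2.5")
plt.title("Air quality on June 6th, 2022")
plt.show()
```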
So I'm gonna go back to my slides now. Bear with me. There we go. To look at the roadmap real quick, Lily, on your end, do those slides show back up? Okay.
I'm gonna go with yes. What we're looking at from a roadmap, oops, there we go; my page was unresponsive, and I seem to have frozen for a moment.
Okay, we're back. So from a roadmap perspective, we're working on GGML and other quantized model support; we wanna be able to support more models in more places. We wanna build a model zoo and a registry; we wanna enable APIs for fine-tuning, since we get a lot of requests to be able to take a model, fine-tune it, and continue to use it; proper APIs for document management and ingest; and really just adding more models in. For the quantization item, a sketch of what GGML-based local inference looks like follows below.
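For context on that roadmap item, here is a minimal sketch of running a GGML-quantized model on CPU with llama-cpp-python; the model path is a placeholder, and this is generic library usage, not LeapfrogAI's own API:

```python
# Run a GGML-quantized model locally on CPU. Model path is a placeholder.
from llama_cpp import Llama

llm = Llama(model_path="./models/model.ggmlv3.q4_0.bin")
out = llm("Q: Why quantize models for edge deployments? A:", max_tokens=64)
print(out["choices"][0]["text"])
```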
So with that, I wanted to thank everyone for coming to this talk. I really appreciate the time everyone spent to come listen, and I guess I'll turn it over to questions and Q&A.
But let's also talk about how to come contribute. We're open source, open contribution, and we're working with various open source foundations to find a home for this. The repo is available, we have a Discord community, and come check us out at Defense Unicorns. And one thing is, all of this would not be possible without a tool called Zarf.
It's another Defense Unicorns project, and it allows us to very quickly deploy and get tools like this into production and air-gapped environments.
Woo! I feel like with virtual conferences you sometimes don't get the kind of applause you do in person, so I wanna try and replicate it. Thank you so much. Thank you. That's a thunderous applause for one person, so thank you. All yours; you don't have to share it with anybody.
Thank you so much. Let's give it a moment; there's a bit of a delay, so I wanna give people a chance to drop questions in the chat if they have them. And also, to highlight the Weaviate shout-out: the MLOps Community is actually partnering with Weaviate to do a couple of hackathons. We have one in Berlin happening tomorrow.
So it's cool to see that you've used Weaviate and have found it to be useful. Yeah, we interact with them; we've had a lot of success with Weaviate, and I can't recommend them enough. If I'm doing local work I'll use Chroma, but for us, with Weaviate, they also just announced multi-tenancy support, which is really important for us in these environments.
Being able to fully bifurcate all of the collections at a system level is critical for that type of work. Awesome, that's great to hear. All right, people, put your questions in the chat. I think what we're gonna do on this stage once we let Gerred go is have a dance break; Gerred, you should tune into that after this.
Demetrios has a dance break for everybody to move a little bit. So maybe I'll send you to the chat; you can share some links there and answer any questions for folks. And yeah, this was awesome. Thank you so much. Yeah, no problem. And actually, to follow up on that, I'm in the MLOps Community Slack.
I'm in this chat, and we have our Discord as well for longer-term questions. And again, we're an open community, so people can come open issues. So I'm available in lots of areas for follow-ups. Awesome, I love to hear that. That's great. Cool. All right, well, have a wonderful rest of the day. We appreciate it so much.
Thank you all.