LLMOps and GenAI at Enterprise Scale - Challenges and Opportunities
Andy is a Principal AI Engineer, working in the new AI Center of Excellence at Barclays Bank. Previously he was Head of MLOps for NatWest Group, where he led their MLOps Centre of Excellence and helped build out their MLOps platform and processes across the bank. Andy is also the author of Machine Learning Engineering with Python, a hands-on technical book published by Packt.
I'm a tech entrepreneur and I spent the last decade founding companies that drive societal change.
I am now building Deep Matter, a startup still in stealth mode...
I was most recently building Telepath, the world's most developer-friendly machine learning platform. Throughout my previous projects, I learned that building machine learning-powered applications is hard - especially hard when you don't have a background in data science. I believe that this is choking innovation, especially in industries that can't support large data teams.
For example, I previously co-founded Call Time AI, where we used artificial intelligence to assemble and study the largest database of political contributions. The company powered progressive campaigns from school board to the Presidency. As of October 2020, we had helped Democrats raise tens of millions of dollars. In April of 2021, we sold Call Time to Political Data Inc. Our success, in large part, was due to our ability to productionize machine learning.
I believe that knowledge is unbounded, and that everything that is not forbidden by laws of nature is achievable, given the right knowledge. This holds immense promise for the future of intelligence and therefore for the future of well-being. I believe that the process of mining knowledge should be done honestly and responsibly, and that wielding it should be done with care. I co-founded Telepath to give more tools to more people to access more knowledge.
I'm fascinated by the relationship between technology, science and history. I graduated from UC Berkeley with degrees in Astrophysics and Classics and have published several papers on those topics. I was previously a researcher at the Getty Villa where I wrote about Ancient Greek math and at the Weizmann Institute, where I researched supernovae.
I currently live in New York City. I enjoy advising startups, thinking about how they can make for an excellent vehicle for addressing the Israeli-Palestinian conflict, and hearing from random folks who stumble on my LinkedIn profile. Reach out, friend!
Generative AI is not going anywhere, but many organizations are struggling to translate very active research and development from POC to production solutions. In this brief talk, I'll highlight some of the challenges I think we need to overcome if we want to deploy GenAI solutions at scale, and I'll also talk about some of the opportunities this presents.
LLMOps and GenAI at Enterprise Scale - Challenges and Opportunities
AI in Production
Adam Becker [00:00:00]: Excellent. Thank you very much for coming on board. For those of you that don't yet know Andy: Andy, if I recall correctly, you are also the author of Machine Learning Engineering with Python.
Andy McMahon [00:00:12]: Yep.
Adam Becker [00:00:13]: Right, nice. And you're here to talk to us about challenges and opportunities with LLMs and GenAI at enterprise scale. And with that, you need to share your screen. Right, I'm going to let you in. And please, you have the mic.
Andy McMahon [00:00:29]: Yeah. Thank you so much, Adam. Hi, everyone. Really excited to be here. Glad I'm kicking you off. I have 10 minutes to get through way too many slides, so I will do my best to get through them but still be comprehensible. Adam already introduced me, so I won't do that. I'll just crack on.
Andy McMahon [00:00:46]: The first thing I want to highlight to everyone today in this really short talk is that deployment of LLMs is really hard. Deployment of GenAI is really hard. And I think it's clear that everyone, if you look around at your peers and other organizations, everyone is using generative AI, but not many people are successfully deploying it yet at scale, especially in larger organizations like the one I work in. So I'll reference here, from my book that Adam mentioned, the four stages of a typical machine learning lifecycle as I see them: the discover stage, where you get to know the problem; the play stage, where you build a POC; and then the develop and deploy stages, which are self-explanatory. I think basically a lot of people are getting stuck at the develop stage just now, and what I want to talk to you about is some of the things I think about regarding why that is hard, especially in larger organizations. So the first thing is, I think it's important to call out that.
Andy McMahon [00:01:40]: And a lot of you will know this. ML ops and LLM ops, classical machine learning and llms and generative AI, they are different. I've listed lots of reasons they're different here. There are tons more. I think some of the most important ones I want to call out are things like the fact that when you are trying to understand what business problem you can solve, you're now asking, can I solve this using a generative approach rather than a classic classification regression or unsupervised, et cetera approach. So it's a bit of a different framing of the problem. The data you're playing with is prompts, it's context, it's less tabular data, it's less features, although that can factor in as well. And then the processes we're using to lay out and build our solutions are a bit different.
Andy McMahon [00:02:22]: We're using different tools for pipelining, for orchestration. We're using different metrics. And that's important because a lot of organizations have built a lot of muscle memory around classic MLOps, so transitioning to this brave new world is difficult for them. I think another point as well is that when I, as a kind of senior leader in industry, want to select a foundation model, I'm not too interested in what's topping the Hugging Face leaderboard. I'm more interested in things like: what's the cost per query? How much will it cost me to complete the specific task I want the LLM or the agent I'm building to do? I want to think, what's the cost per user, per response, per day, et cetera? I think about speed, latency, throughput, and I ask, does this model fundamentally solve my problem? Will I see the benefits I expect and hope for?
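As a rough illustration of the kind of selection exercise Andy is describing, here is a minimal sketch comparing candidate models on cost per query, cost per user per day, and a latency target rather than leaderboard score. All model names, prices, and latency figures are hypothetical placeholders, not real vendor numbers.

```python
# Compare candidate foundation models on cost and latency, not leaderboard score.
# All names and numbers below are hypothetical.
CANDIDATES = {
    #                  $/1k input tok,  $/1k output tok, p95 latency (s)
    "vendor-large":   dict(in_price=0.01,   out_price=0.03,   p95_latency=2.5),
    "vendor-small":   dict(in_price=0.001,  out_price=0.002,  p95_latency=0.8),
    "self-hosted-7b": dict(in_price=0.0004, out_price=0.0004, p95_latency=1.2),
}

AVG_INPUT_TOKENS = 1_500        # prompt plus retrieved context
AVG_OUTPUT_TOKENS = 300
QUERIES_PER_USER_PER_DAY = 20
LATENCY_SLO_SECONDS = 2.0

def cost_per_query(spec: dict) -> float:
    return (AVG_INPUT_TOKENS / 1000) * spec["in_price"] + \
           (AVG_OUTPUT_TOKENS / 1000) * spec["out_price"]

for name, spec in CANDIDATES.items():
    per_query = cost_per_query(spec)
    per_user_day = per_query * QUERIES_PER_USER_PER_DAY
    meets_slo = spec["p95_latency"] <= LATENCY_SLO_SECONDS
    print(f"{name}: ${per_query:.4f}/query, ${per_user_day:.2f}/user/day, "
          f"meets latency SLO: {meets_slo}")
```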
Andy McMahon [00:03:07]: And that's a bit different from what's topping the leaderboard today. There's a new stack emerging, and that's a challenge for organizations, especially where you have long lifecycles for taking in budget and trying to spin up new infrastructure and new tools. So basically we have to try and be more nimble and adapt to this brave new world. This diagram from a16z, which a lot of you will have seen, highlights the emerging LLM app stack. And what I like about this is there are a lot of boxes here that are things we're used to, like orchestration, monitoring, logging, caching, et cetera. There are some new boxes, but in each case we're adapting and changing how we view this. So things like embedding models and vector databases are brand new, but how we orchestrate models has to change as well, using things like LangChain, LlamaIndex, Griptape, and other tools like that. Another thing is that when we get to enterprise scale, we're often thinking about things in the largest possible sense.
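To make the new boxes in that stack concrete, here is a minimal, self-contained retrieval sketch. In a real stack the stand-in embed function would be a proper embedding model, the in-memory array would be a vector database, and an orchestration framework would wire the pieces together; everything below is illustrative only.

```python
import numpy as np

# Toy corpus; in a real stack these embeddings would come from an embedding
# model and be stored in a vector database rather than an in-memory array.
documents = [
    "Our refund policy allows returns within 30 days.",
    "Premium accounts include priority support.",
    "Branch opening hours are 9am to 5pm on weekdays.",
]

def embed(text: str, dim: int = 64) -> np.ndarray:
    # Stand-in embedding: hashed bag of words, purely illustrative.
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-9)

doc_vectors = np.stack([embed(d) for d in documents])

def retrieve(query: str, k: int = 2) -> list[str]:
    scores = doc_vectors @ embed(query)   # cosine similarity (unit-norm vectors)
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

context = retrieve("When can I return a product?")
prompt = "Answer using only this context:\n" + "\n".join(context) + "\n\nQuestion: ..."
print(prompt)
```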
Andy McMahon [00:04:02]: So if you think about things like pretraining, if you want to pretrain your own models, that's going to be a huge effort, a huge investment. If you want to fine-tune, even that's a huge investment. Storage is exploding because the models are huge, as is the data they need, especially if you want to run really large, say, RAG systems; all of these things cost more in terms of storage. Latency, as I've mentioned, can be a bit of a challenge, and we want to optimize that. And all of this costs lots of dollars, and we need to think about that at enterprise scale because we're working to a budget and we have to justify ROI. It's a very clear-cut kind of value equation we're solving. So some of my tips, some opportunities here: don't pretrain your own model. Unless you're Bloomberg and you want to do BloombergGPT, go for it.
Andy McMahon [00:04:45]: But anyone else, I think, should generally stay away from it, unless that's your core value proposition. If you fine-tune, use a scalable framework. If you do storage and latency optimization, use off-the-shelf techniques: quantization, memoization, caching, lots of other things out there. And for cost, try to develop a portfolio of tools and architectures and build for reuse. Drilling down to some things that are going to be a particular challenge, I think, there's the new data layer or data fabric that's emerging in large organizations. So we had our data lakes and our lakehouses, and we had our experiment metadata trackers and our model registries. But now we have to augment that with our vector databases, think about things like our prompt hubs, and we have to build a lot more application databases, I think, than maybe we were used to, especially in data and analytics functions in large organizations. I think about monitoring as well.
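On the caching point above, here is a minimal sketch of prompt-level response caching, one of the off-the-shelf cost and latency optimizations mentioned. The call_llm function is a hypothetical stand-in for whatever client your stack actually uses, and in production the cache would typically live in something like Redis rather than a Python dict.

```python
import hashlib
import json

_cache: dict[str, str] = {}   # in production this would usually be Redis or similar

def call_llm(prompt: str, temperature: float = 0.0) -> str:
    # Placeholder for a real API call; hypothetical stand-in only.
    return f"<model answer for: {prompt!r}>"

def cached_completion(prompt: str, temperature: float = 0.0) -> str:
    # Only reuse deterministic (temperature 0) responses; sampled outputs
    # shouldn't be served blindly from a cache.
    key = hashlib.sha256(json.dumps([prompt, temperature]).encode()).hexdigest()
    if temperature == 0.0 and key in _cache:
        return _cache[key]
    response = call_llm(prompt, temperature)
    if temperature == 0.0:
        _cache[key] = response
    return response

print(cached_completion("Summarise our complaints policy."))   # miss -> model call
print(cached_completion("Summarise our complaints policy."))   # hit  -> from cache
```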
Andy McMahon [00:05:36]: Monitoring is super important from an MLOps and LLMOps point of view. Something I think about all the time is how we are going to build the correct workflow that brings together objective ground truth, subject matter expertise, human evaluation, and then things like LLM-based evaluation; that makes the situation a bit more complex. We then have guardrails. Again, I'm zipping through this, so it'll be good to see if there are any questions coming up. But on the guardrails front, I think, using something like NeMo Guardrails is very easy to configure, very easy to build with. What's not so clear yet is how you standardize this across very large teams. So at NatWest Group, where I work, we have 500 data scientists and engineers, and I think there's a big question there about how you make those 500 scientists and engineers work to the same script. Same thing if you're applying guardrails to the output of your LLM; it's the same question.
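One way to picture that standardization problem is a single, centrally owned output check that every team imports, rather than each of the 500 scientists and engineers writing their own filters. The sketch below is purely illustrative: the policy names and patterns are made up, and a real deployment might use a framework such as NeMo Guardrails, but the question of getting everyone onto the same script is the same.

```python
import re

# Centrally maintained output policies that all GenAI applications import.
# Categories and patterns are hypothetical examples only.
BLOCKED_PATTERNS = {
    "pii_account_number": re.compile(r"\b\d{8,12}\b"),
    "internal_codename": re.compile(r"\bproject\s+osprey\b", re.IGNORECASE),
}

def check_output(text: str) -> tuple[bool, list[str]]:
    """Return (is_allowed, list of violated policy names)."""
    violations = [name for name, pattern in BLOCKED_PATTERNS.items()
                  if pattern.search(text)]
    return (len(violations) == 0, violations)

ok, violations = check_output("Your account 123456789 is now active for Project Osprey.")
print(ok, violations)   # False ['pii_account_number', 'internal_codename']
```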
Andy McMahon [00:06:29]: More generally, at the macro scale, thinking about processes and organizational challenges, we have this very dynamic architecture that's evolving. Things are not set in stone, and it's not very clear how these things will change. I've already seen articles arguing that RAG is going out the window, even though it's just arrived on the scene, and that's very hard for large organizations to keep up with. So I think we need to adapt. We need to do things like building centers of excellence that leverage the learnings across our different teams as they build out use cases. And we also need architectural teams that are really able to adapt what has to happen at the platform level with what happens at that use case level and really learn from each other as quickly as possible, which is hard in a large organization. Moving on to another challenge I see at the kind of organizational level: the fact that team structures are going to change.
Andy McMahon [00:07:18]: So we've had our classic MLOps teams, where the engineers and scientists have kind of very clear roles. We're now going to augment that with new roles like AI engineer, and maybe not new roles, but people who are playing in a space that maybe traditionally they wouldn't in certain organizations. So I'm seeing a lot more software developers working alongside our ML engineers and our data scientists as they build GenAI applications. There's just more need for front ends to be built, since these are natural language interfaces, which are closer to the user, and more database work, et cetera, that comes with that. So that's important as well. And then, kind of wrapping up a little bit ahead of time, which is good because every time I've practiced this it's around 11 minutes, I think it's important to call out that we've been here before. GenAI is so new.
Andy McMahon [00:08:05]: Everything seems like it's just so transformational, so different. But we should remember that when data science took off, 80% to 90% of organizations couldn't put solutions in production. Organizations were confused about how to utilize these capabilities, and there was a lot of hype then as well. And then we had things like MLOps come on the scene. It didn't solve everything, but it started to mature our practice. I think the same will happen with GenAI and with LLMs. Then the final thought is that the organizations that are going to win in this space, I think, are the ones that can really industrialize that GenAI development process and leverage all the learnings they've had from the MLOps journeys they've been on.
Andy McMahon [00:08:45]: Just apply that to GenAI, with the caveats I've mentioned before; as they do that, they'll smooth the road to production. It will become more of an ingrained behavior within the organization. We'll have to adapt our teams, our structures, and our operating models, but that's a good thing; I think to take advantage of this new technology, we have to embrace it and the stack that's coming through with it. And then finally, I think what's really, really important is that we recognize we don't have to reinvent the wheel. This is additive. A lot of what we've built in MLOps and ML engineering over the past few years still stands; we're just adding new pieces to it. And I think organizations that do that will be successful.
Andy McMahon [00:09:23]: And with that, I thank you and end my snappy lightning talk.
Adam Becker [00:09:27]: Awesome, Andy, thank you very much. That was brilliant and full of insights. A couple of things that stood out for me are just the pace at which these things are changing, right, and the organizational risks that are incurred by this. Even a question like, is RAG still going to be here soon?
Andy McMahon [00:09:44]: We're going to have a couple of.
Adam Becker [00:09:45]: Talks that are going to tease this question out in a little bit more detail. We have some questions for you, and since you finished a couple of minutes before time, I feel compelled to ask you those. Okay, so a couple of things here. The first is from Vihan, who's asking in the chat: do you have a sense for how to estimate cost per query?
Andy McMahon [00:10:04]: Oh, this is a great question. So, when you're using a vendor-supplied model, the Azure OpenAI Service, et cetera, they give you quite clear pricing information, and the same using AWS Bedrock, et cetera. So I think for cost per query, in the most basic sense, you get that number per token. What we're finding difficult, or what I think is difficult, is that in many use cases you don't know how many tokens you'll need, so you often just kind of give a bit of an envelope and work with that. So I would say leverage the pricing mechanisms that are out there, don't try and reinvent the wheel, definitely use what's out there, and just try and be sensible in your estimates.
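In practice that envelope is just per-token arithmetic, along the lines of the short sketch below. The prices and token counts are placeholders rather than any vendor's actual rates.

```python
# Back-of-the-envelope cost per query from published per-token prices.
# All numbers are illustrative placeholders.
price_per_1k_input = 0.003    # USD per 1,000 input tokens
price_per_1k_output = 0.006   # USD per 1,000 output tokens

est_input_tokens = 1200       # prompt plus retrieved context (rough envelope)
est_output_tokens = 250

cost_per_query = (est_input_tokens / 1000) * price_per_1k_input \
               + (est_output_tokens / 1000) * price_per_1k_output
print(f"~${cost_per_query:.4f} per query")
print(f"~${cost_per_query * 20 * 1000:.0f} per day for 1,000 users at 20 queries each")
```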
Adam Becker [00:10:36]: Yeah, for sure. Along that same line of thinking, just to clarify for the audience: do you mean to use a foundation model instead? You don't think that anybody should go and tune one on their own, or at least train one on their own, right? It says here, don't pretrain your own model. I think that people heard you say don't.
Andy McMahon [00:11:01]: My advice was, if you're starting in this space, you shouldn't think, oh, I'll go pretrain my own model, I'll go build my own LLM. Bloomberg did it with BloombergGPT. I think it was a huge investment and I'm not sure what ROI they've got from that; they're obviously keeping their cards close to their chest. But basically, yeah, I think unless you have a very specific reason for doing it, like you are OpenAI or a company who's going to be a vendor of these LLMs, you shouldn't do it. It wouldn't make sense for some of the organizations I know about to do it. But it is always a case-by-case basis, and as costs come down it might make more sense.
Andy McMahon [00:11:37]: I think just now it's very expensive.
Adam Becker [00:11:40]: One quick one here. Where do you think organizations are with respect to the application databases that you mentioned? So how is common knowledge, metadata, et cetera, being managed for these various applications? And do you see any opportunities for service or technology providers?
Andy McMahon [00:11:55]: Yeah, it's a really good question. I think there's a lot of work we can leverage from existing software development. So just PostgreSQL databases and things like that can often be very useful in this space. Where I'm seeing a lot of activity from vendors that's quite exciting and interesting is in providing out-of-the-box RAG applications or packaged GenAI solutions that have the full end-to-end stack: not just the application database layer, but also the interaction with the LLM, the pipelining, the orchestration. I'm seeing that, and I think it's a really good offering, because a lot of organizations aren't that far along on their MLOps maturity journey, never mind their LLMOps maturity journey. So I think that's the kind of exciting space I see vendors playing in: that prepackaged end-to-end solution. Here you go, configure it.
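As a concrete picture of that application database layer, here is a minimal sketch of logging each LLM interaction so it can be audited, evaluated, and monitored. It uses sqlite3 purely to keep the example self-contained; the same schema translates directly to the PostgreSQL databases Andy mentions, and the table and column names are illustrative only.

```python
import sqlite3

# Hypothetical schema for an application database that records every LLM call.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE llm_interactions (
        id INTEGER PRIMARY KEY,
        app_name TEXT,
        prompt_template_id TEXT,
        prompt TEXT,
        response TEXT,
        model TEXT,
        input_tokens INTEGER,
        output_tokens INTEGER,
        latency_ms REAL,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")
conn.execute(
    "INSERT INTO llm_interactions (app_name, prompt_template_id, prompt, response, "
    "model, input_tokens, output_tokens, latency_ms) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    ("complaints-assistant", "summarise-v2", "Summarise this complaint...",
     "The customer reports...", "vendor-small", 1180, 240, 850.0),
)
print(conn.execute("SELECT app_name, model, latency_ms FROM llm_interactions").fetchall())
```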
Adam Becker [00:12:48]: Awesome. Andy, thank you very much. You're going to be on the chat and on Slack in case people have more questions. And thank you very much.