Kubernetes, AI Gateways, and the Future of MLOps
SPEAKERS

Alexa Griffith is a Senior Software Engineer at Bloomberg, where she builds scalable inference platforms for machine learning workflows and contributes to open-source projects like KServe. She began her career at Bluecore working in data science infrastructure and holds an honors degree in Chemistry from the University of Tennessee, Knoxville. She shares her insights through her podcast, Alexa’s Input (AI), technical blogs, and active engagement with the tech community at conferences and meetups.

At the moment, Demetrios is immersing himself in machine learning by interviewing experts from around the world in the weekly MLOps Community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building Lego houses with his daughter.
SUMMARY
Alexa shares her journey into software engineering, from early struggles with Airflow and Kubernetes to leading open-source projects like the Envoy AI Gateway. She and Demetrios discuss AI model deployment, tooling differences across tech roles, and the importance of abstraction. They highlight aligning technical work with business goals and improving cross-team communication, offering key insights into MLOps and AI infrastructure.
TRANSCRIPT
Alexa Griffith [00:00:00]: My name's Alexa Griffith. I'm a Senior Software Engineer at Bloomberg. I love whole milk, so I just do regular coffee and milk. I'm a savage. I'm weird.
Demetrios [00:00:13]: Welcome back to the MLOps Community podcast. I am your host, Demetrios. And today there was one gigantic piece of this conversation that I cannot stop thinking about, and that was when Alexa mentioned how useful it has been for her to have mentors and be on teams that could choose problems with real business value, and how that has translated in her career to working on a whole lot of open source projects, because they were able to properly champion the need to work on these open source projects and tie it back to how it would help the companies that she has been at. I love hearing that. I loved talking with Alexa. Let's get right into it.
Demetrios [00:01:34]: Let's get into this podcast with Alexa. Wishing you all a very special day. Hope you enjoy.
Demetrios [00:02:46]: But tell me about Japan because I really want to know. You were skiing there?
Alexa Griffith [00:02:50]: Yeah, it was awesome. It was crazy. There was so much snow. Over three meters of snow. I mean, it didn't stop snowing there. I went to Hakuba actually, which is right outside of Tokyo, but it was amazing. I know Niseko is pretty popular, but yeah, it was an amazing trip. I think Tokyo is really great too.
Alexa Griffith [00:03:10]: We went to Seoul as well, so I hopped around for a bit. I was gone about two weeks, so it was a long trip, but it was so amazing.
Demetrios [00:03:18]: Yeah, I know that you just got back and you are still having fun with a little bit of jet lag, which is always nice. I found a trick for when I do the trips across the pond to go back and visit family. And if you wear compression socks, then the blood doesn't like coagulate in your feet. And apparently by keeping the blood flowing, it helps in some magical way combat jet lag.
Alexa Griffith [00:03:48]: Wow, that's amazing to know. Okay, yeah, I'll keep that in mind for next time for sure. Maybe that's my problem, is that I didn't wear those socks.
Demetrios [00:03:56]: You should have worn the compression socks. That and a lot of water apparently helps a ton. And then seeing sunlight in the place that you're at as early as possible also helps.
Alexa Griffith [00:04:08]: But those are good tips.
Demetrios [00:04:10]: Yeah. So you've been in tech for a while. I think that you've got a really cool story. And actually, what I want to start off with is that you've written a lot about your tech journey. And one thing that you wrote about was using Airflow in a past job, right? And using Kubernetes and Airflow.
Demetrios [00:04:31]: And I am fascinated: what do you remember to this day as being, like, the biggest pain when it comes to Airflow and using Airflow?
Alexa Griffith [00:04:45]: It's funny you ask that, because it has been a while. We were using Airflow, like, running it ourselves, building it ourselves, deploying it ourselves, and our data scientists were building a lot of DAGs with it. I remember that we also built a few pipelines for the data scientists as well. I was on a data science infrastructure team, so that was part of something that we owned. My first task was getting logs to show in Airflow so that people could view the logs of their jobs. Basically, we just had logs going to a database, and then I was pulling from that database and showing them on a screen while the task was running.
Alexa Griffith [00:05:24]: So I think there were a lot of operational things that we had improved, and there are a lot of things that improved within Airflow as well. We had a fork of it at the time too. What I remember the most was just that it would crash a lot. It couldn't handle a lot of workloads running and spawning at once. Now, this was right when I started, like I said, so maybe there was a resource thing, or maybe tuning the resources was also an issue. But I think our scheduler would kind of crash a lot under what we were trying to do. Which also begs the question: maybe there was a better tool for the large fan-out that we were trying to do as well.
Alexa Griffith [00:06:03]: But I remember that at the time it was an issue. It was really great for me to start off writing about Airflow, because, as you mentioned, I have a, I would say, atypical background, but I feel like a lot of people now in tech are a little bit atypical too. I didn't have a CS degree; I had a chemistry degree. I actually studied computational chemistry and did research in computational chemistry. And so that's kind of how I jumped into the world of software engineering, and as a way to learn and digest that information, I started writing about it, and that's kind of how that came about.
Alexa Griffith [00:06:40]: And it was really great that my first project was Airflow, because at the time so many people were also interested in how other people were using Airflow and the problems we were having, how to best set up a DAG, how to structure it and make the best use of it. So it was really great. It was a really great opportunity for me that led into a lot of other things.
Demetrios [00:07:01]: Yeah, it's interesting that you say, like, oh, maybe this wasn't the right tool for the job, but back in the day when you were using it, I think it was kind of the only tool for the job. And there wasn't this maturity that you see now; there are all kinds of pipelining tools. Whether you're doing ML pipelines or you're doing data pipelines, you have a lot more options. But just four years ago, even, you didn't have those options. And that's what spurred a lot of people; I think people like yourself had experiences where Airflow was crashing or it wasn't working in the way that they wanted, and then they went and started their own companies because of that. And so you see now the maturity around the pipeline space.
Alexa Griffith [00:07:45]: That's so true. I mean, I remember at some point we started to discuss, okay, now should we look into Kubeflow? Because Kubeflow was becoming popular. Should we look into Argo Workflows? I mean, we used Argo CD at the time, which is a great tool too, so nice to use. I remember when they implemented Argo CD, it was like, wow, this is amazing that we can manage our deployments and see all of our resources in such a great way. So yeah, Argo was such a hit and still is. But yeah, I think we started to kind of think about other things, but we had invested. And now I don't know if they're still using it or not; maybe they have changed.
Alexa Griffith [00:08:22]: I think even I've been in tech for, I think, five years now or so. Yeah, coming up on six, five or six, who's counting? But I think it's sometimes easy to lose that context of how much has changed, because I'm thinking, now there are other workflow tools. But you're right, they weren't as mature as they are now, and how quickly that changes is crazy.
Demetrios [00:08:44]: Have you played around much with Argo Workflows?
Alexa Griffith [00:08:48]: Personally, no. We have a team that manages Argo Workflows, but we have some for jobs, and it's super useful and, from my understanding, pretty easy to use. Similar to Argo CD; it has a similar layout, from what I understand. But personally, no, I haven't used it so much.
Demetrios [00:09:06]: Yeah. Because I always wonder about the different pipelining tools that are out there. At the end of the day, you've got these DAGs. And I was just on a conversation earlier this week with a guy who was talking about how basically everything is a graph and everything is a workflow in one way, shape, or form, whether it is a technical workflow or a business procedural workflow or flowchart. And he mentioned that usually the business procedural flowcharts are much more complicated than the technical DAGs. His name's Alex, Alex Malawski. And he was saying how with a technical DAG, in general, you get like three or four steps.
Demetrios [00:09:58]: It's like, I wanted to do this and then do this and then do that. Going back to the original idea on why I was mentioning this with Argo Workflows: you have different folks with different backgrounds and needs that will come to pipelining tools with their own ways and opinions of doing things. With Argo Workflows, I think folks who are coming from a DevOps background really tend to gravitate towards that. And then you have the Airflows or the Dagsters and the Prefects and Mages of the world, and folks who are data engineers can understand those or like those a little bit better. And then you have the Kubeflows and the Metaflows and ZenMLs of the world, and folks who are, like, ML engineers kind of vibe with those better. And so you get all this space to play with the pipelining tools that you like. And this is completely forgetting about the no-code, low-code pipelining tools; let's just take those out of the picture.
Demetrios [00:11:05]: But it's fascinating to think about that and how each background lends itself well to one type of tool or one type of space that you'll get into.
Alexa Griffith [00:11:17]: Yeah, that's so true. I think you see that across a few different things, especially in the AI/ML world, where you have the infrastructure people creating things and then the AI/ML engineers using them. AI/ML engineers favor Python, while infrastructure engineers typically seem to favor Go or YAML configs. And the differences in how you interact with the APIs or the tools, I've seen that quite a bit.
Demetrios [00:11:46]: Have you found ways that the interactions between those two personas go better or worse? Because there is very much an opinionated side for the infrastructure folks, versus the opinions of the ML engineers and even data scientists, who have very removed opinions and workflows and things they need to get done. But I wonder if you've noticed things that have helped bridge the gap between these different personas.
Alexa Griffith [00:12:19]: I think you're exactly right. They are users of our platform, but we're both engineers, so it's an interesting interaction. And I think there's a lot of work now, and it's not always easy, to abstract away Kubernetes and those main things: what's a pod, what's a container, what's a virtual service? Do they really need to know that? Trying to abstract as much away as we can from them, I think, seems to be the move. Just be able to clearly tell them, based on the three different types of errors that you get in your YAML spec, if there's an error, what it actually means for the person that deployed it, who maybe doesn't really know much about Kubernetes. What action do they need to take? Because there should be some action they can take, something wrong with their service they can fix. Trying to bridge that gap is something we've been working on a lot, and I've found it quite interesting too.
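To make the abstraction point concrete, here is a minimal, hypothetical sketch of the kind of translation Alexa describes: the raw Kubernetes status condition a platform team sees, versus the message a model deployer actually needs. The condition names and messages below are invented for illustration, not taken from Bloomberg's platform.

```yaml
# Raw status on a failed inference service, as the platform team sees it
# (illustrative values only).
status:
  conditions:
    - type: PredictorReady
      status: "False"
      reason: RevisionFailed
      message: "0/8 nodes are available: 8 Insufficient nvidia.com/gpu."
# What the platform might surface to the model deployer instead:
#   "Your model could not start because no GPU is currently free.
#    Lower the GPU request, or retry when cluster load drops."
```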
Alexa Griffith [00:13:16]: And at first my reaction was, what do you mean they don't need to know about Kubernetes? You know, from our side, of course they need to; they have a pod, they're running it, they have a container. But you see it more and more, and in general, Kubernetes has so many abstractions on top of it to build all these tools. I mean, that's what a lot of KubeCon, the huge Kubernetes conference, is: a ton of different tools that people have built on top of Kubernetes to make it easier to run in all these different ways. So I think that seems to be kind of what we're working on: trying to make sure they don't need to see all that stuff.
Demetrios [00:13:51]: Speaking of KubeCon, you spoke recently, right, at KubeCon on the KubeCon AI day?
Alexa Griffith [00:13:58]: Yes.
Demetrios [00:13:58]: Can you talk to me about what your presentation was on?
Alexa Griffith [00:14:02]: Yes. Recently I gave a keynote at KubeCon for the Envoy AI Gateway. It's a new open source project that the engineers from Bloomberg and Tetrate have been working on together. Again, it's an abstraction layer, basically, on top of the Envoy Gateway project, which is a service proxy that is used in our inference services to manage things like traffic and rate-limit requests; it logs and monitors traffic flow. So it's super useful for us. We use it, and when all these GenAI models are coming out, there are a lot of different complications that come with them compared to the inference services of the past. They're way larger.
Alexa Griffith [00:14:48]: They use tokens. For those reasons, they have a different set of problems that need to be solved. So no longer do we want to rate limit a service based on requests; what would be really useful is to be able to rate limit a service based on the number of tokens, because that's the unit of work that we're actually trying to meter there. So that was a really cool experience for me, because I'd never talked in front of that many people before, and it was really cool to introduce a really interesting new open source project as well, on behalf of all the engineers that have worked together at Tetrate and Bloomberg. So I'm super excited about that.
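As a rough sketch of what rate limiting by tokens rather than by requests can look like: Envoy AI Gateway extracts token usage from LLM responses and feeds it into Envoy Gateway's rate-limit machinery. The manifests below are illustrative; the API versions and field names are approximate, so check the project documentation before relying on them.

```yaml
# 1) The AI gateway route records how many tokens each response consumed.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: llm-route                      # hypothetical name
spec:
  llmRequestCosts:
    - metadataKey: llm_total_token     # where the token count is stored
      type: TotalToken
---
# 2) A traffic policy then limits on that recorded token cost, e.g. a budget
#    of 100,000 total tokens per hour instead of N requests per hour.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: llm-token-limit                # hypothetical name
spec:
  rateLimit:
    type: Global
    global:
      rules:
        - limit:
            requests: 100000           # read as a token budget via the cost below
            unit: Hour
          cost:
            response:
              from: Metadata
              metadata:
                namespace: io.envoy.ai_gateway
                key: llm_total_token
```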
Demetrios [00:15:30]: Yeah, it's always fun. Especially when I think I heard a stat that 95% of people would rank public speaking as a greater fear than dying.
Alexa Griffith [00:15:43]: Oh my gosh. Really?
Demetrios [00:15:45]: Yeah.
Alexa Griffith [00:15:45]: Okay, that's funny that. Yeah, it is for a lot of people. That's true. And I was nervous, but in an excited way.
Demetrios [00:15:53]: So talk to me about the difference, or how KServe and the proxy that you set up. What is it called? It's Envoy.
Alexa Griffith [00:16:03]: Envoy AI Gateway.
Demetrios [00:16:05]: Yeah, Envoy AI Gateway. Which is different than Envoy, right? There's an Envoy and then there's an Envoy AI Gateway.
Alexa Griffith [00:16:12]: Yeah, Envoy AI Gateway is the new project that builds on top of Envoy Gateway. It basically just adds some more features that are specific to these LLM and GenAI models.
Demetrios [00:16:23]: Okay, and how does that interact with KServe? Like, where and how does the stack look with those two?
Alexa Griffith [00:16:31]: Yeah, for sure. I've kind of assumed that most people know what Kubernetes is here, but these are all open source projects built on top of Kubernetes, which helps you to run your services at scale and to manage them. So Envoy is a service proxy. Every request that comes into your cluster or clusters can be routed through Envoy. This helps specifically with things like traffic control, traffic management, and observability. It can do request routing. So, things like that, about managing the request and how it moves through the system.
Alexa Griffith [00:17:10]: But yeah, I'll also mention Knative; it's another tool, or a building block, that we use. Knative Serving is the main project within Knative that we use, and it's also open source. It's great for helping to run applications with this idea of being serverless, without having to worry too much about the infrastructure. So again, it's an abstraction on top of Kubernetes with the goal of just making it easier to set up and run a service that's serverless. By serverless, you basically mean something that can provision resources dynamically; you don't really care so much about which server these resources are on, and you don't have to specify that. It figures it out for you. So it makes things easier to run.
Alexa Griffith [00:17:55]: It's all to make things easier to run, because if not, you have all these YAML files with all these configs; you could be copying and pasting them, and you're setting up all these resources to make sure everything runs. I really like running kubectl tree. There's a tool called kubectl tree, and from that what you can see is a tree from the top resource you have. Because in Kubernetes you have all these different resources, like deployments and pods, and a few more depending on what you're running, it can show you all the different resources that are made, and you can go into the namespace and see what the configurations are and what they're doing. I find that super helpful for understanding these types of things. But basically, these tools help you to automatically set up all these things so you don't really have to worry about it. They're all making some assumptions that most everyone needs these configs, and usually, for most tools, if you need something more specific, you can add that or change it as well. It's not that you need to know and specify everything yourself.
Alexa Griffith [00:18:51]: But Knative, so that's the abstraction to help you just get things running in a serverless way. It helps you with autoscaling, so scaling your services up and down based on things like usage, and also scaling to zero, which is something we actually use quite a bit, because GPU resources are limited. So some services, if they can, use this feature called scale-to-zero, and you can make a setting like, oh, if I haven't gotten any requests in an hour or two, then scale it to zero and let someone else use that GPU resource instead of taking it up without being active. Some can't use this, but when you can, it's a quite useful tool to have. So yeah, all that to say, those are the building blocks of KServe.
Alexa Griffith [00:19:37]: KServe is another abstraction that simplifies actually running AI models and AI services, so inference services themselves. Basically, you can have this really short YAML. It's like this big. If you're using a model out of the box, or something like Ollama, they have support for that, and you can just easily get it running in a serverless way. So that's what's nice about it: you don't have to worry about all these configs. The goal is that you don't have to have this whole team that knows everything about how to run AI models; they can focus on other things, like building the platform as well.
Alexa Griffith [00:20:16]: But yeah, so it makes it a lot easier to run these services.
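For readers who haven't seen one: the "really short YAML" Alexa mentions is a KServe InferenceService. A minimal sketch, assuming a stock scikit-learn model; the name and storage path are placeholders.

```yaml
# Minimal KServe InferenceService; apply with `kubectl apply -f`.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris                   # placeholder name
spec:
  predictor:
    minReplicas: 0                     # scale-to-zero: free the GPU/node when idle
    model:
      modelFormat:
        name: sklearn                  # KServe picks a matching serving runtime
      storageUri: gs://my-bucket/models/sklearn/iris   # placeholder path
```

From this, KServe stands up the Knative service underneath and hands back an HTTP endpoint to hit.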
Demetrios [00:20:20]: Okay, so basically the gateway is on the front end. As the traffic comes in, it can dynamically provision, and then you have Knative, which helps provision. So it sees, like, oh, the gateway is saying that we need more resources; let's provision those. And then KServe is built on Knative, in that one of the resources that we need is to be able to ping this model. And the model might be a large language model, or it might be some kind of a random forest model; it doesn't really matter in that way. Is that what I'm understanding?
Alexa Griffith [00:21:01]: Yeah, yeah. So Knative specifically can run just serverless services no matter what, but KServe is specifically for AI models, ML models, things like that. Exactly. Just like Envoy AI Gateway is specific to AI models, so is KServe. It has a lot of out-of-the-box tooling and features that are specific to what you need for an AI model. It can support easy configs for running a lot of out-of-the-box models; it easily sets that up.
Alexa Griffith [00:21:31]: Or you can have a custom predictor, as we call it; you can write your own predictor. So we basically spin up this service, you get an endpoint, you can hit it, and we have this unified API, which is really useful, because all of these different model providers can have different access patterns for how to hit the model. So one really nice feature of KServe is that no matter which model you're using, you can always use the same unified API. And Envoy AI Gateway is an extension of this: it also, as one of the three MVP features, has a unified API. So if you're trying to reach a model in Bedrock or you have something on-prem, it shouldn't matter. You'll have one unified API going through the Envoy AI Gateway, and it will, under the hood, direct the traffic with the correct structure to wherever you're using your model.
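A hedged sketch of that unified-API idea: clients always speak one OpenAI-style API to the gateway, and a route matches on the requested model name to pick the backend, whether that's a hosted provider or an on-prem KServe service. The backend names and exact field names here are illustrative.

```yaml
# Illustrative AIGatewayRoute: one client-facing API, many backends.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: unified-llm-route              # hypothetical name
spec:
  schema:
    name: OpenAI                       # clients use the OpenAI-style request format
  rules:
    - matches:
        - headers:
            - name: x-ai-eg-model      # model name extracted from the request
              value: gpt-4o
      backendRefs:
        - name: openai-backend         # hosted provider
    - matches:
        - headers:
            - name: x-ai-eg-model
              value: my-onprem-llama
      backendRefs:
        - name: kserve-llama-backend   # on-prem KServe inference service
```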
Demetrios [00:22:25]: Oh, nice. Yeah, that was actually one of the other questions that I had: does this only allow you to use on-prem services or open source models, or can you also dynamically send some stuff to the OpenAI API, and then maybe you want to send some stuff to Anthropic or whatever? And if you do have your own services and your own open source models, you can send it to those too?
Alexa Griffith [00:22:53]: Yeah, so Envoy AI Gateway allows you to run cross-cloud, like hybrid cloud, and that's one of the big features. And one of the reasons that that problem arose is because all of these different cloud providers have different ways of accessing their systems. But yeah, KServe itself is, you know, similar to something like SageMaker or Google Vertex in the cloud. You can definitely run KServe as well if you want to manage your own inference services. It's definitely helpful if you don't want to use one of these cloud providers, or maybe if you're worried about cost savings and you want to manage something yourself. A lot of people can use it on cloud as well. But also, there are these products for running inference services that are very similar and have the same goal.
Demetrios [00:23:41]: So the Envoy AI Gateway. And just stick with me here, because I know I'm a little slow on it, but. (Alexa: No, no, you're good.) It's the first time I'm really digging into it, and I really like these AI gateways. It's not the first time that I've heard of them, but I do find that it is a problem folks have, especially when you get rate limited so easily. If you're using external services, you want to have some kind of a fallback plan. And so it's almost like Envoy AI Gateway is an abstraction out, and you can throw whatever endpoint you need underneath that.
Demetrios [00:24:25]: So whether you're using a SageMaker endpoint or a Vertex endpoint or a KServe endpoint, you can link those all up to the Envoy AI Gateway, and it will figure out where the request needs to go, depending on what it is.
Alexa Griffith [00:24:41]: Yeah, yeah, exactly. It has a unified API where you can easily specify what you need to do, and it'll auto-route for you, which is great. It's all about just making it easier to run and easier to manage. And yeah, you start to see patterns. Like I said, everything's kind of starting to be built on top of Kubernetes and on top of these other tools, always with the goal of making it easier to run and not having to worry about the infrastructure so much. I mean, there are also a lot of really interesting problems brought on by these large language models and GenAI systems. The model sizes are so large that downloading the model takes a long time as well.
Alexa Griffith [00:25:20]: So one interesting problem that KServe is starting to work on is the model cache: being able to cache models and not have to download them every time a pod starts up, for every pod. So that's one of the little things that will also be super helpful. And working on GPU utilization as well, of course, because GPUs right now are resource-limited. So a lot of these issues around getting enterprise AI up and running, being useful and being optimized, are things that we're definitely working on.
Demetrios [00:25:53]: Yeah, that cold start is. It's like you can't.
Alexa Griffith [00:25:56]: It's a while sometimes.
Demetrios [00:25:58]: Yeah. And you can't really base services off of that if you're like, oh well, yeah, just come back tomorrow and maybe it'll be running. We'll see.
Alexa Griffith [00:26:08]: Yeah. I mean, it's wild how large models have gotten in such a short amount of time, and how much more they're able to do as well. I mean, what was it, Llama 3.1 just came out, and it can take over 128,000 tokens. It also has a crazy storage size.
Alexa Griffith [00:26:28]: Right. Like, it can't fit on one node. So yeah, it needs at least two nodes to run, which also makes it more distributed, and you have to tackle those problems as well. So I mean, I think it's great. There are just new problems, and things are changing a lot. But it's a very fun space to work in as well.
Demetrios [00:26:47]: Yeah. Well, now talk to me about Celery. What is Celery?
Alexa Griffith [00:26:50]: Celery. Yeah, it's been a while. That was my first open source contribution. It happened because there was a little silly bug that we needed to fix, and it came from working on Airflow.
Alexa Griffith [00:27:09]: I don't know if I might be saying the wrong thing, but basically Celery is a component that Airflow uses. I was trying to set up a connection, like a private connection with Google Cloud, and it was something that just wasn't fully supported yet. The code said something like, options A through C aren't allowed, and my option started with the letter C or something. So I got my first open source contribution by saying, allow this, you know. Which, I mean, that's how it is. But it was a cool feeling to have my first open source contribution. Now I'm involved in open source way more, but it's really cool to make an impact and have other people from other companies view your code.
Alexa Griffith [00:27:53]: That's really cool too. So that was my first foray into that.
Demetrios [00:27:57]: Now that you've had so many issues accepted or PRs merged in the open source realm, and you've done a lot of work as a core contributor, right? What are some things that you would tell yourself back then, as you were earlier on in the game, about contributing to open source?
Alexa Griffith [00:28:20]: I think what's really nice about all of the open source work I've done is that it had a business value, a very clear business value, and that makes it a lot easier. And I think I got a bit lucky with that, because going to Bloomberg, my goal wasn't to work on open source, but it's a great part of it. It's amazing that we can do that, and we use it every day here; everything we're doing is features that we actually need. So I think that makes it easier. I'd say, if you really want to work on open source: I love learning on the job. I think hobbies and side projects are great for learning, but when you learn on the job and you're using it in production and you're deploying it, that's the best way to do it, if you can. But if you really just want to get involved in open source, and that's a goal of yours, and you can't find a job that does that now, or for some reason you won't.
Alexa Griffith [00:29:15]: Yeah, I think most communities are super accepting, and there are a lot of labels for good first issues, if it's something you're really interested in. And a lot of them also have community meetings, you know, once a month or biweekly. So if you want someone to help you on your PR, it's probably good to go to those and get to know people as well. But I mean, if you're in open source, you're also typically someone who is really a supporter of open source, where everyone's coming together and contributing. So I've found that my experience with people in the open source community is very positive.
Demetrios [00:29:54]: Going back to what you said, when it comes to having a clear business value: what are ways that you've found you can demonstrate that or champion for that, to make it very clear to whoever you need to that this is the right problem to be working on, that this problem has business value, and that you should be spending your time on it?
Alexa Griffith [00:30:22]: Again, I feel a bit lucky in the space I'm in, because usually it's pretty clear. Because GenAI has come, because LLMs are so large, we need this stuff; it's very apparent that we need it. So from a resource perspective it's very apparent, and from a usability perspective there's a lot of room for growth in this area. So I feel like that's been a little bit easier. But it's something I think about quite a lot, especially if I think about my first job, because it was a smaller company and there were just a few principal and staff engineers, but they were like rock stars. I don't mean, like, a coder rock star; they were just great. They were really good, and they dug in, and they were really good at finding business-value problems. I think that is a key skill, and I've thought a lot about it. Some of it is luck. Like, I think, for example, in my first company, I fell into the best team; I had the best mentor that I could have possibly gotten.
Alexa Griffith [00:31:21]: I mean, I feel extremely lucky and grateful for the opportunity that I was given, because I didn't really know coding that well. I did some Python stuff in school, but I got that experience because they were so willing to help me, and because they were so good at finding business-value problems and then giving them to me to help me grow. I mean, it made a world of difference. I think sometimes, if I had been on another team or at a different place, would it have been the same? But seeing them work made me think a lot about this topic. And I don't know exactly what the answer is, but I would like to.
Demetrios [00:31:59]: Well, the idea of being able to sniff out key business-value problems and having a good eye or nose for that is so enlightening as you say it. It's like, yeah, that is a skill right there.
Alexa Griffith [00:32:16]: Yeah. And I do think something I try to be clear on when we're designing something is, what are the requirements? And if you don't know, then you should find out, to make sure it's worth your time. What I have tried to do, and again, I still would like to get better at this, of course, is to at least try to figure out what's not good business value. I think there are some clear signs that maybe it's not. So: how many people are going to use this? Is this generating any money for us, or what purpose does it serve? It depends on what the product is. But, like, do you need that new button? Have we asked them? Is it actually useful? Is that something they would really use? I think getting customer or client or user feedback is super helpful as well, because the worst thing you can do is make something that no one cares about. Yeah, it's a waste of time, it's a waste of money, and it's super demotivating.
Demetrios [00:33:12]: Yeah. And then you go and try and like justify what you've been doing for the past three or six months.
Alexa Griffith [00:33:18]: Yeah, yeah. And the worst thing, too, is working really hard on something no one cares about. So I think knowing how to manage your time and your energy is super important, and just making sure, as much as you can, that it's going to be something really useful.
Demetrios [00:33:35]: I also wonder, because there are certain things that maybe, you know, are very important, but if you don't properly evangelize them with your stakeholders, then it could still fall flat. We had this guy Stefano on here a few months ago, and he talked about how he built the most incredible ML platform ever, with everything that you could ever want, all of the top features that he read about in every blog. And then he said, we released it to crickets. None of the data scientists wanted to use it, because we didn't properly go out there and evangelize how good it was for them to start using it. So they all kept just doing their own thing in their own ways, and not coming onto the platform that he spent all this time making absolutely incredible, with all the features that you would want and would think are almost table stakes, like everybody must need them. And he fell flat on his face even after doing all that.
Alexa Griffith [00:34:46]: Yeah. And another quote I really like, which was the motto of the engineering department at the time when I was there on those teams, was: as simple as possible, as powerful as necessary. So one thing to speak to that is to get something out and then iterate on it, so it doesn't have to be perfect. Not saying that this was the case, but one point it made me think of is that it doesn't need to be so perfect. Just get something out that's an MVP, start iterating on it, get people using it, and get feedback. Because I do think a lot of times we think we know best about what people want, but sometimes it's just not true, especially as engineers making a platform for other engineers. I think we think even more, oh, we know what they would want, but sometimes it's just not true. And like you said, sometimes people need to adopt it, or have time, or maybe it wasn't exactly what they were hoping for. But another point, which I think you bring up a lot too, is that not only did I have, and still have, really great mentors, but also having someone who champions you and makes sure that your work is out there and known and publicized.
Alexa Griffith [00:35:55]: I mean, I do a lot of self-publicizing, as probably you do as well. We're very public people; we're posting a lot. And does that help me? Yeah, I think it does a lot, actually. I hate to say the loudest person in the room gets heard, and that's not always true, but your name needs to be known, or the project needs to be known and put out there. I'm sure there are some amazing apps, but if they don't have any marketing, maybe they don't have any users. And if you don't have any users, you can have the best app in the world, but what does it really matter, you know?
Demetrios [00:36:30]: Yeah, like first-time founders focus on product and second-time founders focus on go-to-market, basically.
Alexa Griffith [00:36:39]: Yeah. So I mean, it's super important, the whole marketing part of it; marketing yourself is super important as well. For example, I create a brag document for myself. I got this idea, I'm forgetting the tech influencer's name, she was big on Twitter a while ago, but she created these zines, like these comic book magazines.
Alexa Griffith [00:37:03]: And it's an idea from her; I have to look up who she is. But the idea is to write a brag doc about yourself. So every quarter I do it. I have certain sections, about, like, presentations and general work, and I summarize all my work every quarter and then give it to my manager or whoever's doing my review.
Alexa Griffith [00:37:24]: And the feedback I've gotten is that that's also super helpful. Just on the topic of evangelizing yourself and your products as well, I think it's really important in tech.
Demetrios [00:37:32]: I need to do that, just so that on the days that I feel down, I can open that up and be like, look, I'm not as much of a failure as I'm making myself out to be.
Alexa Griffith [00:37:43]: Yours would be pretty awesome too. You do a lot of stuff. So yeah, it's good to keep track, you know, because sometimes you forget, what did I do? But having a record of it is super nice.
Demetrios [00:37:55]: Yeah. I think there is a part of human nature where you get into these slumps, and having something like that, and going through something like that, is very useful. I heard a story that John Lennon and the Beatles were in the recording studio, and Lennon was down and didn't want to record anything. This was after a whole, I think, week-long bender, and as it happens after a week bender, you're kind of depleted, and he wasn't wanting to record anything. And then people started playing him some songs back, or reading him lyrics back that he wrote, and he got like, oh yeah, I wrote that. Oh yeah, maybe I do have.
Demetrios [00:38:37]: Maybe I am an okay songwriter, you know. So it can happen to even the best of us, and having something like this is quite useful, just to get your spirits high again and then get you back on track.
Alexa Griffith [00:38:51]: Yeah, for sure.
Demetrios [00:38:52]: But going back to this motto. What was it? Only as powerful as necessary and as simple as possible?
Alexa Griffith [00:38:59]: Yeah, as simple as possible, as powerful as necessary.
Demetrios [00:39:03]: That is such a great motto to live by, because it's so easy to over-engineer stuff, it's so easy to grab the shiniest tools, and before you know it, you're way in over your head.
Alexa Griffith [00:39:18]: Yeah.
Demetrios [00:39:18]: And you're like, oh, maybe this is going to take a little longer than expected, because the scope has crept.
Alexa Griffith [00:39:25]: Exactly. Yeah, that's super difficult. But at least starting with that principle, I think, helps a lot. Like I said, get your requirements, understand what you need, and then try to grow from there, instead of trying to solve this huge problem at the beginning. I think that's applicable to a few different scenarios.
Demetrios [00:39:44]: Yeah. And I'm thinking about the requirements, and also how you were mentioning things that are red flags when you're starting a project, that can be almost like early warning signs that this might not have as much business value as you think. And I'm wondering if you have any more. Like, is the team that's creating it bigger than the actual number of people that will benefit from it? That is a huge red flag, right?
Alexa Griffith [00:40:12]: Yeah. So that's also, like: how many users are wanting this feature? How many people have we asked? Where did it come from? Who actually wants it? What's the cost of running it? And that's what I also mean by requirements: what's the criteria for why we need this? What purpose does it serve? So yeah, if one person says, oh hey, it'll be really nice to have this, I think I have some questions about whether it's worth my time. But sometimes it's obvious that it is worth your time. You know, sometimes it's obvious that we definitely need model caching.
Alexa Griffith [00:40:46]: That's obvious to everyone, you know. But other times, do we need this button? I don't know. Maybe, maybe not.
Demetrios [00:40:53]: Yeah. I wonder about anything related to GPUs, like saving GPU time or saturating the GPUs, or more like getting the most out of the GPUs. That feels like something where it's, yeah, we should go and work on it. But again, you should come at it with, almost, a discerning eye, to be able to ask questions and recognize, from what you're getting, whether it's actually as valuable as folks are making it out to be.
Alexa Griffith [00:41:29]: Yeah, exactly. And there are a lot of different ways to do things. A good example of this in KServe: we started adding some of the OpenAI protocols, like chat completion. There are more, but we just started with that, and we started supporting a few different types of tasks, with auto-routing for what kind of task you want. And then, for the GPU thing, there are a few different ways that you can manage GPUs working together, and I know that right now they're working on a solution that uses two of those, because those are maybe the two most popular, the two that are most needed. So I think that approach is also pretty good. You don't need to do everything at once; at least just get something that people can start using.
Alexa Griffith [00:42:06]: And then it's, I really like using the chat endpoint here; now I would really like to use this one as well. I think that's super helpful.
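For concreteness, "supporting the chat completion protocol" means accepting the OpenAI-style request shape below, sent as JSON to a path like /v1/chat/completions. It is rendered here as YAML for consistency with the other sketches, and the model name is hypothetical.

```yaml
# POST /v1/chat/completions -- OpenAI-style chat request body (shown as YAML)
model: my-hosted-llama               # hypothetical model registered on the platform
messages:
  - role: system
    content: You are a helpful assistant.
  - role: user
    content: Summarize this earnings call in two sentences.
max_tokens: 256
temperature: 0.2
```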
Demetrios [00:42:14]: How much are you going to the non-technical stakeholders on the business side of the house and having conversations with them about what they may want? Since, with the advent of LLMs, it opens the aperture of who's using the AI solutions.
Alexa Griffith [00:42:35]: Yeah, I mean, that's a good question, because I feel like I'm pretty low in the stack. As a platform engineer, we just discuss a lot with the AI/ML engineers, but we do have product people that we talk with regularly, and I know they are really good about getting all of that different feedback and kind of funneling it to us in a way to create products and features. So for us, I believe that's the way it works. At Bloomberg, it's really cool, because even a lot of the management and product people are super technical. They grow a lot at Bloomberg. Bloomberg is really great at keeping people. I know in tech, a lot of people move around quickly. It's just so common.
Alexa Griffith [00:43:15]: Right. What I'm super impressed with at Bloomberg is that there are a ton of people who've been here for 10-plus years, so they've moved up and around within Bloomberg. That's one interesting part about Bloomberg. But as far as how we get feedback, it's usually through product, and I do like that streamlined way of doing things. But I do agree, and there's a whole philosophy around this, about going to the users and asking them what they want. I think that's really helpful, and I think that's the way to go. You should always be doing that. If you're not doing that, it's kind of worrisome, I think.
Demetrios [00:43:50]: Yeah, again, red flags. That is something you might want to look out for, because you're potentially not finding the biggest levers that you can pull.
Alexa Griffith [00:44:02]: Yeah. I think you can also create a personal relationship with, say, the AI or ML engineers that are using your platform. Go sit with them. If you have someone that you consider a super user, someone who's really using it a lot, always in the support chat, just be like, show me how you're using it. And then maybe they'll be like, oh yeah, by the way, I don't know what this button is, or, oh yeah, by the way, I always have to do this every time. Maybe you would pick up on things. I think that is also a really good way to figure out what people's patterns are in using your tool, and what the pain points are. Because they might not say it in a survey, and they might forget it, or they might not say it in the chat when they're doing something, but by sitting with them and observing, you could figure it out.
Demetrios [00:44:42]: Yeah. I would say, like, 90% of the tools that I use, I am never blown away by the experience. But that doesn't mean that I go and explain what could be better about the tool to the people that are creating it. And I am assuming that that's the majority of folks, right? Because it takes so long to explain why this is a pain and how big of a pain it is for me. And so usually I'm like, yeah, whatever, I'll do something else with my time.
Alexa Griffith [00:45:19]: Yeah, you can pick and choose your battles.
Demetrios [00:45:22]: Yeah. And so you going and sitting with someone, you almost force them to show you what is painful. Because if I had somebody that was right next to me and watching me do my thing, I'm sure I would be very vocal.
Alexa Griffith [00:45:34]: Yeah, same. It's a bit more intimate in a way; it makes you feel more comfortable, you know, than just entering something on a survey. It takes more time, but it's more personable.
Demetrios [00:45:48]: Yeah, higher leverage, 100%. The other thing that I was going to ask you about is, when you look at the stack, there are the traditional ML use cases, almost like tabular data, low latency, that type of stuff, that you are creating a platform for, or helping folks serve those use cases, versus the new LLM AI world that you're creating a platform for or helping usher in those use cases. Where do you see them diverging? Like we talked about the Envoy AI Gateway, and that's specifically for an LLM use case, right? How do you see those two worlds playing together in a platform? Does the platform support all of it and almost have this sprawl? Or does the platform have pillars, where depending on what the use case is, you kind of plug into the platform over here or over there? How do you visualize that or look at it?
Alexa Griffith [00:47:02]: So, a few things. I think we're trying to make the platform as easy to use as possible, and at the end of the day, you're just deploying a model somewhere, and we try to make sure the user doesn't have to care too much. Like I was saying, we try to abstract away a lot of those Kubernetes concepts as much as we can, and try to make sure the user doesn't care too much about what or where. Like, why should they care if it's on-prem or in a cloud? It's just running where there are resources, you know. So for AI models across hybrid environments, we try to create a platform that's very seamless: you don't really know or care where it's running. And that's the whole point of serverless, and of being able to run hybrid cloud.
Alexa Griffith [00:47:45]: Just be able to use resources for different things. As far as different products running on Kubernetes, I would say training would be, like, a different tab in our UI. And where does it differ? I think it differs in that inference services are long-running jobs, and things like training jobs usually are not. It depends on what you're training and what you're doing. But the way these jobs run, you're still deploying something to Kubernetes, so the basics are still the same. I think some of the smaller features are different.
Alexa Griffith [00:48:14]: Like, we have a model registry, where you can save your models, pull versions of them with the click of a button, change configs slightly, and have different artifacts. So I think that's something that is needed in inference that maybe isn't as needed in training, or some different features like that. That's kind of where I see them diverging. Did that answer your question?
Demetrios [00:48:36]: Yeah, and I like this idea that the things that are the same are that, whether you have a gigantic model, or a very small pruned model, or a fine-tuned model, or a distilled model, or a random forest model, it's a model, and it's going to be sitting on some kind of Kubernetes service. You just want the model out there; you don't really need to think about how much traffic that model can handle, where it sits, which cloud it's on, whether it's on-prem, all of that stuff. I really like the idea of, let's abstract away everything that we can, so that if you have a data scientist putting a model out there, they know they can just get that model out there, and their job is to create the best model, and the rest is taken care of for them.
Alexa Griffith [00:49:35]: Yeah, exactly. And that's why the unified API is so nice too, because we give you an endpoint, and the endpoint will look very similar no matter where you're running it. And what you will use to predict or chat will be in the same format no matter where it's running. So that's a really good plus of these tools, I think.
Demetrios [00:49:58]: Do you distinguish at all between the models that are for LLM worlds? You can distill models, or maybe you prune them, or you do things to make them smaller, or maybe you have an ensemble of models, and you have the gateway and whatnot. And so there are things that you potentially want to do to LLMs that you wouldn't necessarily do to traditional ML models, if you're training those models, like you said, and you have the model registry there, and maybe there's feature engineering happening in the traditional ML world. So do you distinguish there too? Or is it just, like you were mentioning, where these are different tabs on the platform depending on your use case?
Alexa Griffith [00:50:45]: So for that, what your inference service is running, whether it's running this model or that model, that's just a config, one little part of the small YAML that you're putting into the inference service. And then KServe will make a lot of assumptions underneath the hood, so you don't have to write all these big configs that are specific to that model and how it runs, even things like, what's the port that exposes the metrics? We know TensorFlow is on this port and another model server is on this port, so we'll automatically open that port, or be able to pull from it if you're getting metrics. A lot of things like that. You just need to tell us the keyword, the key config, one thing, and then we do the rest. But that's in the YAML itself, so that's within the tab, the inference product itself.
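Tying this back to the earlier InferenceService sketch: the "keyword" Alexa describes is essentially KServe's modelFormat field. A hedged illustration; the name and storage path are placeholders.

```yaml
# The one key config: modelFormat. From this keyword KServe infers the
# serving runtime, container image, serving and metrics ports, probes, etc.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: flowers-classifier            # placeholder name
spec:
  predictor:
    model:
      modelFormat:
        name: tensorflow              # swap for sklearn, pytorch, huggingface, ...
      storageUri: s3://my-bucket/models/flowers    # placeholder path
```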