The GPU Uptime Battle
SPEAKERS

Andrew Pernsteiner is Field Chief Technology Officer at VAST Data, a software company founded in 2016 with an estimated 700 employees. He is part of the executive team and is currently based in Portland, United States. He previously worked at MapR and Dell EMC XtremIO.

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.
SUMMARY
Most AI projects don’t fail because of bad models; they fail because of bad data plumbing. Andy Pernsteiner joins the podcast to talk about what it actually takes to build production-grade AI systems that aren’t held together by brittle ETL scripts and data copies. He unpacks why unifying data - rather than moving it - is key to real-time, secure inference, and how event-driven, Kubernetes-native pipelines are reshaping the way developers build AI applications. It’s a conversation about cutting out the complexity, keeping data live, and building systems smart enough to keep up with your models.
TRANSCRIPT
Andy Pernsteiner [00:00:00]: A lot of times people start on a laptop and then the farthest they get before they go and ask infrastructure people for help is they put it on a desktop machine underneath their desk. That's as far as they get. And then they're like, but how come I can buy this at Best Buy for $500 and you're telling me it's going to take $10,000 out of my budget to manage and support this?
Demetrios Brinkmann [00:00:18]: Yeah.
Andy Pernsteiner [00:00:21]: Because the average person now has a sense on what AI could possibly do. The expectations are so much higher now.
Demetrios Brinkmann [00:00:30]: Yeah.
Andy Pernsteiner [00:00:31]: Why does it take so long to do this thing that, if I would have asked you three years ago, it would have taken months? I expect it immediately now. Look at what's possible with generative AI now, just because everyone with a phone in their pocket, an average person, can experience it. Right. It sets the bar a lot higher for the people who are offering services to the higher layers, and the app developers now, they expect to be able to vibe code something in a weekend.
Andy Pernsteiner [00:01:03]: What might have taken them weeks and weeks and weeks just to prototype and get to a place where they can show somebody something. The expectation now is that you could do it overnight and iterate on it quickly enough that you could have it to where it needs to be in a very short time span. In some cases it's true. But you know, even, even just pet projects that I work on, I might get it to a demo stage in a weekend. But there's so many edge cases, corner cases, data is very messy. And I think that's like one of the challenges that we see people struggling with in like the space of customers and clients that my organization deals with. A lot of times they move a lot more slowly. We have actually a couple different breeds, I would say, of client.
Andy Pernsteiner [00:01:50]: There's some who are sort of leading edge. And a lot of times the model builders are thought to be leading edge. They are in some ways. But they also have to manage billions of dollars of infrastructure, and there aren't that many people on the planet who know how to do that. And most of them are older now. It's not like the kids coming out of college are figuring out how to go and manage a ten-thousand or a hundred-thousand GPU cluster.
Demetrios Brinkmann [00:02:13]: It's a lot of responsibility. You can mess up really bad.
Andy Pernsteiner [00:02:18]: Well, and mess-ups are so expensive. One of the people that we talk with manages a large GPU-based farm, and they measure everything in terms of minutes of GPU downtime, and it's always millions of minutes because they have lots of GPUs. So if they have an issue, whether it's during a training run or whether they're doing pre-training processing or doing anything, if there's a hiccup, like if somebody fat-fingered a file and they deployed it and it takes everything down for five minutes, 10 minutes, you multiply that by the number of GPUs and that's the number of GPU minutes. And there's an actual cost associated with that.
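(A rough back-of-the-envelope version of the GPU-minutes math Andy describes. The cluster size, outage length, and hourly rate below are illustrative assumptions, not figures from the conversation.)

```python
# Back-of-the-envelope cost of a short outage on a large GPU fleet.
# Every number here is an illustrative assumption.
gpus_in_cluster = 100_000      # GPUs idled by the hiccup
outage_minutes = 10            # wall-clock length of the outage
cost_per_gpu_hour = 2.50       # assumed blended $/GPU-hour (rented or amortized)

gpu_minutes_lost = gpus_in_cluster * outage_minutes
cost = gpu_minutes_lost * (cost_per_gpu_hour / 60)

print(f"GPU-minutes lost: {gpu_minutes_lost:,}")   # 1,000,000
print(f"Approximate cost: ${cost:,.0f}")           # ~$41,667
```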
Demetrios Brinkmann [00:03:00]: Very big cost. If you're talking hundreds of thousands of GPUs, you're giving me anxiety just thinking about that.
Andy Pernsteiner [00:03:05]: Then you multiply it by depending on the cards you're buying, it's 30 grand a card and you pack a bunch of those into a machine and now all of a sudden it's like, oh God.
Demetrios Brinkmann [00:03:14]: But I want to go back to something that you mentioned, the expectation of me being able to vibe code something over a weekend and then show that to someone. It parallels a little bit what you were talking about earlier with me being able to develop something locally. Maybe in the predictive ML days, it's me developing a model locally and then going and bringing it to the team and saying, let's deploy this. And the platform team is sitting there like, wait, what? And the person who developed it is like, yeah, this cost me, you know, a few hours and a couple hundred bucks on my laptop.
Andy Pernsteiner [00:03:57]: Yeah.
Demetrios Brinkmann [00:03:58]: Why are you telling me it's about to be ten grand if I need to deploy it?
Andy Pernsteiner [00:04:01]: Yeah, I think it's actually not so dissimilar from researchers who were building scientific experiments and coming up with hypotheses in a small setting with a small amount of data. When you expand the amount of data, it's not just a matter of having to process more, it's that the amount of variety in that data is going to get larger and larger, and it isn't so easy to manage. The other one that oftentimes surprises me is people don't think about parallelism when they start. They're not thinking about how this is going to scale in terms of the processing they're doing. Can I shard it across multiple instances of a database, multiple instances of storage, multiple instances of GPU? Am I thinking about how much memory I have to move back and forth between machines? Our engineering team builds parallel systems, but it's funny, every time we get a new client, the first thing that happens is they complain about how slow it is, because if you buy something new, that's what you do. Like, I wanted it to be faster.
Andy Pernsteiner [00:05:16]: And usually it's because what you're doing is a little different or you're not used to how to use it or whatever. But if we find issues on our side, it's oftentimes because there's a bottleneck in either a piece of code or how we're sharding something. We think, oh, we never thought about sharding it in this way, at this granularity, because it's hard to think in terms of parallel programming. It's hard to implement something that can scale as the program scales. But oftentimes, whether it's the platform services team or the data architecture team at a lot of these research institutions, part of their job, besides managing infrastructure and making sure that services are available, is to help the end user, developers, scientists, researchers, whoever it is, understand, okay, how do I get the most out of what I'm using? Because if you run a job and you have 10,000 GPUs at your disposal, but everything you're running ends up bottlenecking on one, then what's the point?
Demetrios Brinkmann [00:06:18]: Yeah, why have 10,000?
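(A minimal sketch of the sharding decision Andy is pointing at: spread work across independent workers by key so no single database, storage, or GPU instance becomes the bottleneck. The shard count and key scheme are hypothetical.)

```python
import hashlib

NUM_SHARDS = 8  # hypothetical number of worker instances (DB, storage, or GPU)

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Deterministically map a record key to a shard, so the same key
    always lands on the same worker."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def partition(keys: list[str]) -> dict[int, list[str]]:
    """Group a batch of keys by shard so each bucket can be processed in parallel."""
    buckets: dict[int, list[str]] = {i: [] for i in range(NUM_SHARDS)}
    for key in keys:
        buckets[shard_for(key)].append(key)
    return buckets

if __name__ == "__main__":
    work = [f"document-{i}" for i in range(1000)]
    for shard, items in partition(work).items():
        print(f"shard {shard}: {len(items)} items")
```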
Andy Pernsteiner [00:06:20]: And I don't think you see that until you have scale. Like if I can fit it on my laptop, or even on a few relatively large servers, I'm not spending my energy thinking about latency in that way. I'm thinking about time to first token, or time to something, on a much smaller scale. Maybe I'm not thinking about how many thousand users I can support simultaneously. Maybe I'm not thinking about how I'm routing, if you're building something with agents, how you're routing things to make sure that you're spreading your load appropriately.
Andy Pernsteiner [00:06:55]: And then we have to start thinking about costs. I think a lot about model routing in terms of choosing the right cost-based model for what needs to be done. At first it's kind of a thought experiment. Let's answer the simple questions with a cheap model, and if it doesn't pass a test, or if the re-ranker says it's not good enough, waterfall it, pass it somewhere else. You think about that sort of hypothetically, but then when it actually starts happening, and this is where, when people deploy in, let's say, a cloud provider where they're renting by the hour, that's where you start noticing the cost escalations happen in very unpredictable ways, because now you're using these more expensive resources much more than you thought you were.
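(A sketch of the waterfall routing Andy describes: try the cheapest model first, score the answer, and only escalate when the check says it is not good enough. The model names, prices, and scoring function are placeholders, not any particular vendor's API.)

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ModelTier:
    name: str
    cost_per_1k_tokens: float          # assumed pricing, for illustration only
    generate: Callable[[str], str]     # whatever client call you actually use

def route(prompt: str,
          tiers: list[ModelTier],
          good_enough: Callable[[str, str], float],
          threshold: float = 0.7) -> tuple[str, str]:
    """Waterfall: cheapest tier first, escalate while the score is below threshold."""
    answer, used = "", "none"
    for tier in sorted(tiers, key=lambda t: t.cost_per_1k_tokens):
        answer, used = tier.generate(prompt), tier.name
        if good_enough(prompt, answer) >= threshold:
            break
    return used, answer

# Toy usage with stubbed models and a trivial length-based "re-ranker".
cheap = ModelTier("small-model", 0.1, lambda p: "short answer")
big = ModelTier("large-model", 5.0, lambda p: "a much more thorough answer " * 3)
score = lambda prompt, answer: min(len(answer) / 50, 1.0)
print(route("Explain our refund policy", [cheap, big], score))
```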
Demetrios Brinkmann [00:07:46]: So it's like the unknown unknown of technical debt. You don't recognize it yet because you haven't even realized that you're going to hit this crossroads at some point, when you get to a certain scale, where you now need to be thinking about these things. You're so oblivious to it that you're like, yeah, okay, we're just going to do it. You may know or have an inkling that you're bringing on technical debt, but then when you really go to deploy at scale, your platform team comes to you and says, you know, maybe we should be sharding this.
Andy Pernsteiner [00:08:24]: Yeah, yeah, yeah, technical debt's an interesting one. Like I'd love to say that we haven't accumulated any. Can't even say that with a half straight face. I think that, you know, what makes me excited about more development teams using AI for their day to day work isn't necessarily that they'll become more productive. It's that like the tasks that usually are really boring are ones that we could probably find a way for them not to have to do. Right. I can take a senior developer and instead of having them go and spend their time refactoring and going in and documenting their code, they can spend their time talking either to end users or to someone upstream to understand the requirements better. Because ultimately it doesn't matter if you're building agents or you're just building like a basic application, there's never a fit at first for the user community.
Andy Pernsteiner [00:09:16]: There's a fit for you and your mind. Right. Like, oh, this does exactly what I think it should do. But whether you're trying to sell something or whether you're trying to provide a service, the requirements gathering is such a huge part of it. And I try to get my team and the rest of the people on tangential teams to focus more energy up front on getting as many requirements as possible. We have a lot of engineers who want to go show off the thing they made. I'm like, that's great. You can show it off.
Andy Pernsteiner [00:09:44]: But if you're going to spend 30 or 45 minutes showing something off, why don't you shave off like 20 minutes of that and just sit there and say, what would you want it to do? But get into the actual details. Don't spend time up here, because they'll just tell you what you want to hear. If you're up here, you have to get down to the level where you ask them what-ifs. You think about worst-case scenarios. You think about all these choices, because once you hit the go button, if it doesn't work, it's hard to bring it back.
Demetrios Brinkmann [00:10:12]: Yeah, you got to Mom Test them. Have you heard of that book, or have you read it?
Andy Pernsteiner [00:10:18]: I've heard of it. I haven't read it. No.
Demetrios Brinkmann [00:10:20]: You got to give all your engineers that. That is the quintessential, like when you're building something, you want to ask in a way that isn't giving these leading questions like, hey, would it be cool if we had X, Y, Z?
Andy Pernsteiner [00:10:35]: Yeah.
Demetrios Brinkmann [00:10:36]: Instead it's like, have you ever felt any pain around doing this? Interesting. Okay. And there's a signal there if they say that. A lot of people might say, yeah, I felt that pain, but they haven't. So you can ask questions like, have you googled how to fix that pain? When you really hit a nerve is when people are like, oh, I'm in this online forum, I'm in this community that's all about that, or I googled it and I found this or that. Whatever it may be, there's a very clear signal when somebody really has a pain around something that hasn't been solved.
Andy Pernsteiner [00:11:12]: Yeah, that's a good point. I probably do something similar. I just haven't thought it through in that way. I oftentimes have a skeptic or pessimist side of me. I don't try to expose it in that way, but I never like it when I show something to someone and they like it.
Demetrios Brinkmann [00:11:32]: Yeah.
Andy Pernsteiner [00:11:32]: I want them to say what's wrong with it.
Demetrios Brinkmann [00:11:34]: It's so hard though, because when we're looking at it, for someone to be able to be honest with you like that, it's almost like, I feel like I'm gonna hurt your feelings if I tell you the truth. Right?
Andy Pernsteiner [00:11:48]: Well, I think that's a vulnerability piece there. Like, I feel like if. If we ourselves become vulnerable in someone's presence, it. It's not like it gives them permission to become vulnerable, but it gives them the sense that they could probably also start sharing more about what. What they actually feel.
Demetrios Brinkmann [00:12:04]: This is a safe space. We can.
Andy Pernsteiner [00:12:06]: Yeah.
Demetrios Brinkmann [00:12:07]: Let you. Yeah, I. I could see that. I do think you inadvertently are putting a lot of pressure on someone to give you that if you show them what you built, because they know that you've put in time and hours. And so it's almost like, yeah, cool, this is great.
Andy Pernsteiner [00:12:23]: I get it. Yeah. Yeah.
Demetrios Brinkmann [00:12:25]: And so that I don't want to tear down what you've built. Usually the way that it manifests in me is I'll say something like, oh, yeah, and could you do this or could you do that? What about if it was like this? And yeah.
Andy Pernsteiner [00:12:41]: It's fun. I actually enjoy having conversations with people that have different personalities because it helps me understand a bit more. Because ultimately, people who are building applications, agents, anything that's being built, it's because somebody wants it, and somebody is still a person so far. Right. And so because of that, there's not one kind of person. We're not all machines.
Demetrios Brinkmann [00:13:09]: Yeah. Or I was gonna say, you can just have only people giving you feedback that are Dutch. They'll give it to you straight. They don't care. They're very direct with their feedback, and I think you'll be fine then, if you can get a lot of...
Andy Pernsteiner [00:13:24]: To get Dutch feedback, go to Holland and ask people what they think.
Demetrios Brinkmann [00:13:29]: Yeah. Let me give you this quick demo. But I also want to touch on the idea that you glossed over a little bit of the boring stuff and how that's really where a lot of value can be brought, because nobody likes doing the boring stuff.
Andy Pernsteiner [00:13:48]: I also think it's boring because it's kind of hard. I even think about it in personal life, the stuff that I set aside because I say, oh, it's boring, or it's not interesting, or it doesn't solve a big enough problem for me right now. You could think of it even as technical debt in some way. And I think if you're working on a problem, or if you have a new idea and you want to create something, at least for me, I like to think about, okay, what do I want to make? And I'm going to focus all of my energy around what I want. If something gets in my way a little bit along the way, I'm going to just find a way to move around it and keep moving until I get to the place that I want to go.
Andy Pernsteiner [00:14:32]: But that doesn't really help when it's time for other people to use it or other people to interface with it, because all of those things that I went through, they're probably going to have to go through too. And so when I think about the boring stuff, I think about how you start on your laptop building something and then it's time to go deploy it for real. So then your platform team has to go and think through all the things that you didn't want to think about. You didn't want to think about any of those things. I don't want to think about sharding or scaling. I don't want to think about the server crashing. When does that happen? That happens, right? Especially if you come from a mindset where you've been deploying into the cloud, unless you're cheap and you're only using spot instances and you had to think about that. If you were deploying somewhere where all that stuff was taken care of for you at a small scale, then when you want to go and do it for real, someone has to think through those things. What happens when a server fails? What happens when a link fails between two different locations? What happens if your user has a slow connection and you can't send back to them quickly enough? Have you thought about timeouts? Have you thought about all of those boring things that end up killing an application and killing a user experience?
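(One of the "boring" pieces Andy lists, sketched out: a call to a dependency with an explicit timeout and bounded, jittered retries, so a failed server or slow link doesn't hang the application. The endpoint and timings here are placeholders.)

```python
import random
import time
import urllib.error
import urllib.request

def fetch_with_retries(url: str,
                       timeout_s: float = 2.0,
                       max_attempts: int = 4,
                       base_backoff_s: float = 0.5) -> bytes:
    """Call a flaky dependency with a timeout and exponential backoff
    instead of waiting forever when a server or link fails."""
    last_error: Exception | None = None
    for attempt in range(max_attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout_s) as resp:
                return resp.read()
        except (urllib.error.URLError, TimeoutError) as err:
            last_error = err
            # Jittered exponential backoff so many retrying clients
            # don't all hammer the server at the same instant.
            time.sleep(base_backoff_s * (2 ** attempt) * random.uniform(0.5, 1.5))
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```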
Demetrios Brinkmann [00:15:41]: Have you done much work on like Chaos Engineering?
Andy Pernsteiner [00:15:46]: So, you know, you're like the third person in a week who's brought that up, in a way.
Demetrios Brinkmann [00:15:50]: Because it sounds very similar, this idea of how can we think about the ways that it fails. Well, it's just like Chaos Engineering.
Andy Pernsteiner [00:15:57]: So yeah, there was a client that we're talking with who wants to go deploy large scale systems and they asked me this question. If we do that regularly, and we do, but not on purpose, we have to build large systems to test, right? I don't have millions of GPUs, but I have lots of stuff that can simulate it. I don't have millions of servers, but I can have potentially hundreds or tens of thousands of servers for a period of time to go and do things with. And because hardware, because data center space, because bandwidth, because all of the things that are physical are finite. We end up having to shuffle things around a lot because one team wants to go and build an app and we say, well, you're going to have to wait until Monday at midnight UK time, and then you have access for four days. Then you got to give it back. And that process of turning it over and giving it back means that I'm going to call you up and say, okay, it's been four days. You're like, oh, I'm almost done.
Andy Pernsteiner [00:17:02]: Now we're taking it. And so then we have to shuffle everything around, and then everybody's scrambling to figure out, how do I make sure that, think about it in terms of checkpoint restores, right? How do I make sure everything that I'm doing is ready for that last-minute notice that I wasn't planning on?
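(A sketch of the checkpoint/restore habit Andy mentions: persist progress often and atomically, so a job can resume cleanly when the hardware is reclaimed on short notice. The path and cadence are assumptions.)

```python
import json
import os

CHECKPOINT_PATH = "checkpoint.json"   # hypothetical location

def save_checkpoint(step: int, state: dict) -> None:
    # Write to a temp file and rename so an abrupt shutdown mid-write
    # never leaves a half-written checkpoint behind.
    tmp = CHECKPOINT_PATH + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step, "state": state}, f)
    os.replace(tmp, CHECKPOINT_PATH)

def load_checkpoint() -> tuple[int, dict]:
    if not os.path.exists(CHECKPOINT_PATH):
        return 0, {}
    with open(CHECKPOINT_PATH) as f:
        data = json.load(f)
    return data["step"], data["state"]

start_step, state = load_checkpoint()        # resume where we left off
for step in range(start_step, 1_000):
    state["last_item"] = step                # stand-in for the real work
    if step % 50 == 0:                       # checkpoint cadence is a tunable assumption
        save_checkpoint(step, state)
```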
Demetrios Brinkmann [00:17:21]: Yeah, you want to squeeze as much juice as possible out of it.
Andy Pernsteiner [00:17:24]: Yeah. And I think so. So Chaos Engineering, there are QA teams within our organizations who do things like that. We have some people, we can call them black box testers, but they're people who are less familiar with how our systems work that do dumb things, and that's just their job.
Demetrios Brinkmann [00:17:41]: They are the chaos monkeys, Right.
Andy Pernsteiner [00:17:43]: They're like, well, how come I can't do that? Well, because you just don't do that. No one would ever do that. But I did it, right?
Demetrios Brinkmann [00:17:48]: What a great job.
Andy Pernsteiner [00:17:51]: Yeah. But then they gotta file the bug reports. Like, what did you do? Oh, I don't remember what I did. I don't write it all down. And the thing is, I'm so resistant to documenting bugs fully. I just like to write, I clicked this button, it did this thing.
Andy Pernsteiner [00:18:06]: I didn't like it, or it failed, or give me the stupid error. And then, okay, so where are the logs? What are your reproduction steps? How many times did you reproduce it? I'm like, oh, come on, guys. Can't you just figure that out?
Demetrios Brinkmann [00:18:19]: It's like a perfect use of AI.
Andy Pernsteiner [00:18:21]: It actually is. And I think at first a lot of the development community on our side was a little resistant, and there's probably more than one reason for that. Some of them are a little bit older, and so there was like a trust issue. Although it's funny. I have 12 or 13 people on my team, and if they end up watching this, they're probably going to get mad at me or whatever, because I forgot how many there were.
Andy Pernsteiner [00:18:47]: Yeah, but there are some who are super forward-looking in terms of using AI for as much as they possibly can all the time. They're super eager and they're out there with it. Sometimes I have to look at them with a skeptical eye and say, maybe you're leaning a little too heavy. It's okay, I'm not going to stop you, but I'm just letting you know that there is a reality that we're also trying to work on. And then there's others who are so skeptical. So I have a thing. I'm kind of a jerk of a manager sometimes.
Andy Pernsteiner [00:19:16]: Every time I have a one-on-one, I'm like, so show me something new you did with AI this week that you didn't think of doing last week. Right? Whatever it is, I don't even care. It could be personal. Like, I took a picture of something and asked ChatGPT to figure out what it was. Right. Whatever. It doesn't even matter what it is. I think because I'm just trying to get people, not necessarily to go all in and use it for everything they do, but just to think about it outside the context of what they hear about.
Andy Pernsteiner [00:19:42]: Every time I have some kind of thing that I don't feel like doing, I'm like, let me just try this. Right. But I do think it does take dedicated time, because if you just try stuff and it doesn't work the first time, that doesn't mean that it's bad. In my opinion, it means you just didn't construct what you were doing and develop your requirements well enough before you asked it to do something.
Demetrios Brinkmann [00:20:04]: I actually gave a talk on this recently, on how that is such a bad UX, when we see that something doesn't work but we have no feedback on whether it was because our prompt wasn't good enough or because it's actually not possible.
Andy Pernsteiner [00:20:18]: Yeah.
Demetrios Brinkmann [00:20:18]: And we don't have any clarity on that.
Andy Pernsteiner [00:20:22]: I flew over here from Portland a couple days ago and they decided not to have Wi-Fi on the plane. They didn't tell me until after the plane took off, so I couldn't text anybody and let them know. And sometimes I like to just zone out, not work on the plane. But sometimes I kind of need to. Yeah, right. But what I thought was funny, I don't know what your opinion on this is.
Andy Pernsteiner [00:20:43]: If I have Internet access. I see you have slack on your screen. Slack is probably where I spend, like, 80% of my time. Now. If I don't have Internet for the first bit of time, I get a little bit nervous and frustrated. But then I spent the rest of the plane besides, I watched some silly movie or whatever. But then the rest of the time I spent just really thinking through something that, like, we as a team need to get done and creating a prompt to try and create a flow to do it. And if I would have had Internet access when I was trying to do it, I probably would have gotten really lazy about it.
Andy Pernsteiner [00:21:16]: Yeah, right. I wouldn't have thought all the way through as much as I possibly could, because I'd have instant access to, you know, one thing or another. And I'd say, oh, I'll just ask an LLM to tell me what I should do. Right.
Demetrios Brinkmann [00:21:27]: 100%. And actually, that was one of the product things that Ben Young, who I interviewed from Sourcecraft, said they took a stance on with their coding agent: when you hit Enter, it doesn't submit. He was like, we want to subtly tell people to keep going on their prompt. So you have to do Shift+Enter for it to actually get submitted.
Andy Pernsteiner [00:21:56]: That's interesting, because there are oftentimes where I'll hit the enter button, whether it's in Cursor or even just a chatbot, and I immediately go, oh man, maybe I should have changed what I said. And then I'm too lazy to hit the stop button sometimes.
Demetrios Brinkmann [00:22:13]: Not every time. Yeah, I. I am the same way. It's like, well, let's just. Maybe it works.
Andy Pernsteiner [00:22:17]: Maybe it's close enough.
Demetrios Brinkmann [00:22:18]: Yeah, maybe I don't have to try that hard.
Andy Pernsteiner [00:22:20]: It's.
Demetrios Brinkmann [00:22:20]: Yeah, that's a funny one on how much you got to work at it. But again, like, I think there's another thing with the. What did I want to say around the boring data.
Andy Pernsteiner [00:22:35]: Yeah.
Demetrios Brinkmann [00:22:35]: And the messy data. I think before, we were talking about how you really can't understand, until you've lived it, the messiness of data in enterprises and just big companies.
Andy Pernsteiner [00:22:54]: There's whole industries built around cleaning data, but ultimately cleaning data just puts it into a form, and some person's deciding what that form is. That doesn't mean it's clean for the next person who comes along. Right. And now a lot of times we're telling people, we're being trained, I guess you could say, to keep as much as possible because you don't know when you're going to need it next. There's this feeling like if you don't have it... Especially when I talk with organizations that aren't accustomed to being data-driven organizations, ones in manufacturing or retail or even finance, where the product they sold wasn't data. But now that there's an increasing realization that data itself has value, there's this increasing desire to keep more and more of it.
Andy Pernsteiner [00:23:52]: But it doesn't mean they know how to store it effectively. It doesn't mean they know what formats to put it in. Again, there's this expectation that, oh, if we dump it somewhere, there's going to be some magic, easy way to just extract value out of it later. I don't have to think about it that much, right? Yeah.
Demetrios Brinkmann [00:24:10]: What a disaster.
Andy Pernsteiner [00:24:11]: Well, yeah, but I still have this hope in the back of my mind that there's going to be this amazing thing that comes along. Somebody the other day, well, the other day was probably months ago, we were talking about how a lot of organizations can only extract value out of a very small percentage of the data that they have, right? They could have hundreds of millions of documents going back decades, and they're really only extracting value from a very small percentage of it. And the counterpoint to saying, oh, well, they need to extract more, is, look, it's like drilling for oil. There's oil that you can drill for that's really easy, and until oil gets so expensive that you're going to spend the time and energy to go after the hard stuff, you're not going to bother. Even the model builders are kind of thinking the same way, because they think, well, our techniques will get better over time, so why bother going too deep, right? A lot of the big models basically trained on Common Crawl data, well, I mean they still do, but that was one of their primary sources of data for the first bit of time. And that's easy to get.
Andy Pernsteiner [00:25:24]: And depending on which export of it you used, it's relatively clean. The news feeds are relatively clean and deduplicated and all this other stuff. So it's easier to deal with the easy stuff first and the hard stuff later. And I think about enterprise organizations, like when we talk about financial services companies. I'm not talking about the ones who are leading edge in terms of how they're using AI or creating AI-first practices. I'm thinking about a bank that's been around for 100 years. They have a challenge with regulation also, but extracting value out of that data is going to be a huge effort for them. The thing is, they have so much of it, and especially in the retail banking world, they have so much about people's transactional history and their buying patterns.
Andy Pernsteiner [00:26:13]: They have something of value. They just, you know, a lot of work. It's a lot of work. And the techniques haven't necessarily caught up to the, to the difficulty that it takes. So I think it's about finding like a middle ground in terms of saying, you know, we want to get. We want to be able to understand our clients better. We want to be able to have a data product we could sell to somebody. But let's not just over.
Andy Pernsteiner [00:26:37]: Let's not overhype ourselves in thinking that we could take our, whatever hundred petabytes of data sitting in a cold archive somewhere and be able to offer some service on top of that. That's. That's ridiculous.
Demetrios Brinkmann [00:26:50]: It reminds me of the whole "big data is dead" movement. I think it was the MotherDuck guys that made that big.
Andy Pernsteiner [00:26:59]: Right.
Demetrios Brinkmann [00:27:00]: Because the majority of this stuff that we're using is not big data. We may have big data, but what we're using is very small data. You're building on that by saying small data is pretty good for what we need right now, and figuring out how to use that big data in a way that's going to be more valuable than what we're getting from the small data, that's easy and quick. Yeah, it's a lot of squeezing for the juice.
Andy Pernsteiner [00:27:33]: Yeah, I think there is. I shifted companies a while ago and switched over to one of the Hadoop distributions, back when Hadoop was the big great thing. Right. And I think that most of the people that I interacted with were DBAs who were trying to pad their resume with some new skill set. There was a lot of hope for this ecosystem of tools that were going to basically change the way that data warehouses operated and change the way that organizations looked at data. I wouldn't say that it didn't live up at all to its promises. I think there was a lot of transformational change that happened, but it definitely wasn't the thing that solved data problems.
Andy Pernsteiner [00:28:18]: And that's why I don't really see Hadoop as it was as something that organizations are continuing to want to foster. They're looking at other ways of, of managing data. I think that like databricks, for example, did a really good job kind of coming in at the right time and coming up with a way to basically take the ideas behind the big data movement and then apply them in a more practical way to things that people are actually using. Right. And I'm it. You know, they, I, I interacted with them a lot back when they were, they were just a training organization. If you go back what like 10 years, they trained people on how to use Spark. And at the time I thought, how is that a business like they.
Andy Pernsteiner [00:28:58]: I don't know if you ever went to any of their sessions, but their high-level 101 sessions were like deep Scala and Java development as the 101 intro to Spark. So I'm like, okay, there's a really high bar to getting into this, and I thought, I don't know how this is a business. But as they got more and more developers into the camp of Spark, they realized they could offer, well, they probably already knew this all along, but they started offering a service, and I think in a lot of ways they developed a really good ecosystem around it. We came at things from a slightly different angle, which was we started with problems that were hard to solve for the research community. When I say research, I think about a lot of the government-funded labs, institutions whose job is to basically process large amounts of data, either for experimentation purposes or for drug discovery or something. Right. And so they had challenges around processing large amounts of data.
Andy Pernsteiner [00:30:00]: So we didn't start by trying to create frameworks for them. We started by being the backend lower layer. And then as we start to move up the stack more and more, now we look at what the frameworks are. I don't want to tie myself completely to Spark, because there will be transition there. I don't want to tie myself completely to SQL even, because while that might have been the way that everybody interfaces with structured data, I don't think it's necessarily the way that people will in the future. So I want to make sure that the lower layers are solid enough that anything we put on top is going to be easy to do. But I also want to keep looking for what the next way of looking at large-scale data processing is. And my mind always goes to, what are the parallel frameworks that are emerging?
Demetrios Brinkmann [00:30:52]: Right, well, it goes back to that. The platform engineer is now going to have to have the data scientist or the researcher as their customer.
Andy Pernsteiner [00:31:01]: Yeah.
Demetrios Brinkmann [00:31:02]: And they are going to come with requests and requirements continuously. And so the platform engineer is going to want to provision those requests and be able to offer them quickly to whoever their customer is. So how can you make sure that you have a plethora of different ways to offer whatever it is they need? Oh, all of a sudden it's not Pandas, it's Polars. Cool, we got that too.
Andy Pernsteiner [00:31:29]: Yeah, yeah. To me it feels like the skill sets of all the people who are focused on infrastructure, platform engineering, data services, it's kind of like the same analogy I made earlier, where I want developers to get closer to their end users to understand the requirements better. And the only way to do that is to make it so that the things that they used to spend most of their time on, they don't have to really spend their time on. A platform engineer shouldn't have to figure out how to provision a Kubernetes cluster. There should just be a playbook for that. They shouldn't have to figure out how to manage the networking components in between everything. Because really what they want to offer is a service.
Andy Pernsteiner [00:32:13]: They don't want to offer metal or containers or any of that. They want to offer a service and but to offer the right service they need to make sure that the layers that they are responsible for are either, I wouldn't say self managed, are managed in a way that makes it so they can spend more of their time interfacing with their customer. Whether that's a data scientist, whether that's a data engineer, it depends on where they sort of see themselves in that world. And it's a good time for everybody to think about. Well, you know, because I've dealt with like classic IT people for forever and a lot of times there's the grumpy people, right? The people who, they get a ticket and they just go, oh crap, this person wants something again. Now what do I have to do? But really to kind of get ourselves further in our careers or in like you know, what we're doing and fulfilling ourselves with. It's really more about figuring out, well, what could I do better for them? So that when I get a request from them, like, it's not that it's easy for me to fulfill, but I understand them well enough now. I understand what their mindset is.
Andy Pernsteiner [00:33:23]: So it isn't like I'm on the other side of a fence. And really what I try to do whenever I meet with people who sit on one side of the fence or the other is, well, how can we get both sides a little bit closer to each other? How can I talk to the data scientists and data engineering teams and help them understand that you should think about what the infrastructure and the platform look like, because it'll help you do your job better? And then on the platform side, you should understand why they're doing what they're doing, not just what they're asking you for. Why do they want that? Because sometimes the platform engineer might be able to say, well, wait a minute, your real goal is this. You're asking me for that, but I think I could probably give you something else that's going to help you do it better. Because this person's not an expert in that realm. If you bring people closer together, they don't have to leave their worlds, but you should try to get them as close together as possible, so that when they have a conversation, it's not only productive, it gets them all helping each other to get to the same place.
Andy Pernsteiner [00:34:26]: Right.
Demetrios Brinkmann [00:34:26]: I've heard it put, and I really like the visual description: you can look at it like there is a slide with a clear distinction of two colors, like two boxes on the slide, or you can look at it as a gradient.
Andy Pernsteiner [00:34:44]: Yeah.
Demetrios Brinkmann [00:34:45]: And the two colors are very clearly. There's a line where one color starts and one color stops. Or there's a gradient of that. And I think what you're describing is, let's get to that gradient state.
Andy Pernsteiner [00:34:56]: I think we are in it. We just have to recognize it and say that that's what's happening. It's just like people talk about, like the spectrum, however you want to think about what that means. I don't think there is anybody who's not on a spectrum of some kinds. Like there's. Like we're all there in some way or another. Right. Like different parts of ourselves align with different sort of values, align with or are.
Andy Pernsteiner [00:35:19]: Are sort of more. Or have more propensity to behaving a Certain way to thinking a certain way. And so it's like if we all go into it, individuals go into it, realizing that, okay, you came to me with a problem. Part of me felt like that was, you know, you being lazy when you ask for this. You know, you asked me to provision you a database and you asked me to provision you a place to store data. You asked for me to provision you some GPU hours, whatever it is. There's a part of me that feels like, well, that was lazy. You didn't explain to me all the other specifics that you need.
Andy Pernsteiner [00:35:53]: You didn't do that. But then there's another part of me that thinks, well, that's because you don't know. And if I help, you know, then when you come to me, I have a much higher confidence that you thought through the things that I'm going to have to think about if people start to think about what the other person has to go through. Because, like, it's just like if someone makes a request, they had to sit there and write it down. It's not like they just said, oh, I want this. And like, there's a process they have.
Demetrios Brinkmann [00:36:20]: To follow and just think it, and then it magically.
Andy Pernsteiner [00:36:23]: We would all love that. Maybe, I don't know, it sounds a little frightening, but like, someone had to think through it. So it's like, okay, well, that person. And the other thing is that when you're a platform services team or an infrastructure team or another team, where people have to ask you for things, like, you as a platform person have to realize that if the way that you interact with your user community is negative, they don't really want to ask you for anything. And so it's like, you know, and, and, and that's why making the life better of the people in the middle, if you want to call it that, helps the people up above get what they want, not just faster, but they just get like a. They get a better experience. And it's. And a lot of times people focus on being able to iterate faster and sort of being able to fail fast and to get MVPs out the door as quickly as possible.
Andy Pernsteiner [00:37:17]: But it also is good when, if I understand holistically what you're asking me for, I can give you something that's probably more comprehensive. So you're not going to have to come back to me and ask for another thing, right? Oh, you want a database and you want a bucket and you want a container. But you haven't really talked to me about networking. You haven't really talked to me about clustering this. You say you want a database and you said that you don't care how big it gets. But you just told me that this app is supposed to do X, Y and Z. Probably we need to move you on to something else. Right.
Andy Pernsteiner [00:37:49]: And so that way the pain is lessened for this person because they don't have to go redo everything and maybe they're responsible for migrating data back and forth. And the person up here feels like I got what I wanted, even though I didn't really realize it had to scale and stuff they like. And as they scale it, they don't have to make more requests. Right. So I think about things in terms that's like that to me is an optimization.
Demetrios Brinkmann [00:38:12]: But what you're talking about basically is being empathetic to the other's needs.
Andy Pernsteiner [00:38:18]: Yeah, that's all I want to go for.
Demetrios Brinkmann [00:38:20]: But also one thing that's very clear is there's no technology involved in that. Right.
Andy Pernsteiner [00:38:25]: That's not, not at first. Yeah.
Demetrios Brinkmann [00:38:27]: It's just a human-to-human type of thing, or potentially cultural in the organization. Hey, we go about things like this.
Andy Pernsteiner [00:38:36]: Yeah.
Demetrios Brinkmann [00:38:36]: If you are so lucky to have it be a cultural piece. But I remember a lot of times in the beginning of the MLOps community where it was almost like MLOps was synonymous with solving problems with tech. We forgot about, hey, there was this whole DevOps movement, and a lot of the DevOps movement was cultural change as much as it was bringing tech into the game.
Andy Pernsteiner [00:39:02]: Yeah, yeah.
Demetrios Brinkmann [00:39:03]: And this feels a lot like, yeah, let's remember that no matter what we're doing, there should still be this cultural or human piece to it, where we're trying to be empathetic with the other person and recognize what they're looking for. Because, as you said so eloquently, I'm an expert in one thing, you're an expert in the other thing. Help me help you get what you need.
Andy Pernsteiner [00:39:30]: Yeah. And it's hard to pull this off properly. Where I work, it's easy because everybody can talk to everybody. There's not really walls built between the organizations. I mean, as you scale, when I started there were two people in the States and like 30 people in Israel. That's it. Now we're at like 1,100 or 1,200 people spread out everywhere, and silos have developed in their way, but they aren't walls, because from a cultural standpoint we've always kept a wide open, you could call it, reporting structure. Now it's all about, well, how do we build ourselves for scale? How do we get to a place where we can be more effective? Because when you grow really fast as an organization, a lot of times you're moving quickly and you're not thinking about the debt you're accumulating, not from a software standpoint, but from an organizational standpoint.
Demetrios Brinkmann [00:40:27]: It's like organizational debt.
Andy Pernsteiner [00:40:29]: It's the same. Yeah, it's the same basic thing. And I like, I. But as long as there's like, at the core of all the people's belief that they don't have to create walls between each other as long as they just know that in some way. But it has to be for a lot of people who come from other organizations where that wasn't how it is. It takes a bit of unlearning to be able to get there. And it, and it isn't always easy. It's not like, oh, everything's great.
Andy Pernsteiner [00:40:58]: I can just ask anybody for anything. It means that everybody's asking you stuff all the time, right?
Demetrios Brinkmann [00:41:04]: The implications of this type of way.
Andy Pernsteiner [00:41:06]: But it means that everybody's also like trying to help each other all the time. And so it's, it, you know, for me, it's the only. I can't operate at places where I have to go through all, all of the process, right? But then when I go in, interface with customers who have very well defined process, it's, you know, you can't, you can't break their walls down. You have to find ways where they can punch holes in it at least to look through the other side and make sure that they can get, you know, that they can fit this whole sort of dynamic nature of working together into a process that's still regulated and conforms to some kind of practices, right? And so like, like, you know, and then if I talk about like what our software products do, a lot of times they're just meant to make it easier for the infrastructure and platform engineering teams. It makes it easier by making it so they don't have to manage as much at the lower layers so they can offer more and more services up on top of. We don't iterate as quickly necessarily as some of the, you know, some of the people at your event last night, they're iterating very quickly. They'll do builds every day or even maybe more frequently than that. Right? That they're either deploying to prod or putting staging.
Andy Pernsteiner [00:42:20]: We're more conservative in some ways because the platform can't be unstable. You have an app that impacts 10,000 users, that's a big deal. But you have a platform that impacts 10,000 apps, that's another thing, right?
Demetrios Brinkmann [00:42:34]: Yeah, there's a big multiple on that.
Andy Pernsteiner [00:42:37]: And, and so we're trying to make it so that the people who have to manage the platform, because those are the people who sleep in data centers, those are the people who like, they live, they live in maintenance windows. Like that's their. Their life is like the next maintenance window. Like I like, I spend a lot of time like traveling around and meeting with clients, different places, and it's invariable. Or it always happens that we'll be out for dinner, we'll be out for something, they'll get a seven page, they gotta go, they're in the car on a laptop, we're on a train on a laptop. They're doing stuff. I mean, I sometimes am in that place. Right.
Demetrios Brinkmann [00:43:14]: Last night after you did the release.
Andy Pernsteiner [00:43:16]: Well, it wasn't like that. I mean, mostly I try to get it to the place where to my team. My role oftentimes is interfacing between sort of the customers, engineering, the sales organization, sometimes product management. Like I kind of go within all of these groups, but oftentimes I'm the black box tester before we release. Right. QA does all their unit tests, they do the regression smoke tests. People on my team go focus on functional areas and go and try and do it the way a customer would do it. And then like usually the last week before release, I say, okay, now it's like, stable enough, let me go play with it.
Andy Pernsteiner [00:43:58]: Let me pretend like I'm dumb and set it up for the first time. I'll go through the docs and make a comment every line. I don't know what this means. It's just like one of those things. And so the last week has been a bit of that. Because I just don't. Because what ends up happening is if a client tries to use something that we've just built, the first place they're going to come is to my group to try and figure out how to make it work. So it behooves me to know how it works.
Andy Pernsteiner [00:44:24]: I'm like, A, how it works and B, how can we make the docs better? How can we make the UX better?
Demetrios Brinkmann [00:44:29]: So if they don't come to you.
Andy Pernsteiner [00:44:30]: Yeah. So they can get what they want out of it. It's just. It's the same basic phenomenon that happens between the platform and the app team. It's like, okay, when the platform team is ready to offer a service, let's say they go, oh, we finally have container as a service or whatever they offer. Like, when they roll that out, it behooves them to be able to make sure that they thought through, what is Joe over in finance going to do with this thing? So that, that way when the service is there and he hits the go button on his app, that I don't get a SEV1 call on Saturday night. Right. So our world is definitely in the world of people who are supporting the sort of people who are making the applications.
Andy Pernsteiner [00:45:13]: And the best thing that we can do is make sure that they don't have to wake up at midnight. Right. That they don't have to worry about it as much, that we don't have such a thing as a maintenance window to fall into. If we have to update software, do some kind of expansion of systems, that we can do so without it being disruptive to their users. Because ultimately their customers who are relying on these services, they. They end up complaining to this team and that team, like they're buffering the complaints that would just come to us. Right. So we do much better to make sure that they have what they need.
Demetrios Brinkmann [00:45:48]: Yeah. I feel like I distracted you a little bit on how you travel around and you see the customers and they're the ones that are sleeping in the data centers and they're the ones that are getting seven calls and they're the ones that are having to do this. So how can you make their lives easier?
Andy Pernsteiner [00:46:03]: Well, I mean, for us, I think the way we make their lives easier is to understand their lives as well as we can, to really. And I mentioned earlier, like, requirements gathering. I think that one of the things that we have done in the past and it worked well for us at first, is we took a sample set of people who said they needed something. We built for what they said they needed, and we tried to apply it to as many people as broadly as possible. And at first, I think that works, especially if you're solving problems that are really hard to solve. Then the people who start to use it, at first, they're fine with some blemishes, they're fine with some inconsistencies of how things work. But as the number of clients, users, customers, whoever, as that scales, it's the challenge that we're at right now is like, well, how do we make sure. That we're making really the right bet in a direction.
Andy Pernsteiner [00:47:02]: Right. If you think about like one of the metrics that we measure in terms of our success, besides revenue, everyone measures revenue, right? That's an easy one, is uptime of our systems, right? Like that's like we're, we're very, we focus on that very, you know, very closely. And so, you know, we're always trying to strive for six, nine, seven nines, whatever, right. And I think that the goal, and you know, if you look at how those numbers are calculated, obviously, like it's not like the simplest thing in the world, but ultimately the way that we determine if we are meeting that is, did the customer ever have some kind of an outage that was related to us? Right. And it's not only the platform that we look at, it's like, like if a customer, let's say, they upgrade. And the way that you do something in the new version is slightly different, right? Like the API, you know, the APIs are the same, but maybe the behavior is slightly different because we've added some functionality or something like that. If that causes them to have an issue, that's still our issue, right? But the only way to know that is to really know what they're doing. It's not enough to know, oh, they need the API call to return a 200.
Andy Pernsteiner [00:48:15]: Like that's not enough, right? You need to know not just what's coming back to it, but then what are they doing with it afterwards, who are their consumers? So when I talk to clients who are platform engineering teams, who are, who are infrastructure engineering teams, of course I talk to them about what they say they need, right. But I always try to get a little bit further and like, who's asking you for stuff and what are they asking you for? And do you know why they're asking you for it? Because if I can know that, then it makes it much more likely that if I give you something that works, that it's not only going to work for you, it's going to work for them. And then that means that you have less reason to worry about what you're doing day to day, right?
Demetrios Brinkmann [00:48:58]: Less reason for them to come to you and then you to go to.
Andy Pernsteiner [00:49:03]: Yes, basically. Basically I'm trying to, I mean, it's not just protectionism. A little bit of it is. Right. I'm trying to protect ourselves from escalations, right. Like my, my, I, I've got, I would say that my role has evolved over probably 20 something years, but it's always Been in some escalation kind of role. Like, I don't know why.
Demetrios Brinkmann [00:49:25]: So you've gotten good at trying to minimize those.
Andy Pernsteiner [00:49:28]: But I try to minimize them before they happen.
Demetrios Brinkmann [00:49:30]: Exactly right. You're preemptive.
Andy Pernsteiner [00:49:32]: Like, and, and like that, that is a hard thing because I deal with a lot of support engineers, a lot of escalations. Engineers, like on our side and on the customer side. Right. Everybody's an escalation engineer if you're on a call with them at midnight. Right. Like, that's not, that isn't like the regular app developer who's doing that. Like, it's someone who's providing a service. Like, I worked at Amazon before us, but back.
Andy Pernsteiner [00:49:54]: But we were on corporate systems, right. So we'd manage like tens of thousands of servers and everybody had pager rotation and like, even, even app. App developers had pager rotation. But from my standpoint, I only had like a certain sphere of influence that I had to worry about. But I had friends. And if they had to get on call because something failed, that was a problem for me too. So I try to, like, help make sure that whatever they were using wasn't going to give them a page. Right.
Demetrios Brinkmann [00:50:23]: Like, it would, it would be a problem for you too, because you couldn't finish your night or because they would.
Andy Pernsteiner [00:50:27]: Because I couldn't hang out. Like, we want to get a beer. Right. Like I thought.
Demetrios Brinkmann [00:50:30]: And they're probably pinging you like, hey, do you know what?
Andy Pernsteiner [00:50:33]: Well, they might do that too. That, like, from, from that standpoint. But I even think about it in terms of, like, look, when I first started working there, I was like, what, like 22, 23, something like that? Like, I didn't. It didn't matter as much to me, the business. What mattered to me was like, I had enough money so we could go out and have fun.
Demetrios Brinkmann [00:50:49]: And now we can't have fun with.
Andy Pernsteiner [00:50:51]: Laptops sitting in a bar. I mean, I'm not saying we didn't do that. But usually fixing Sev 1s with a beer next to you isn't the best practice.
Demetrios Brinkmann [00:51:01]: Not a fun Friday night. I wanted to hit on the, the gigantic GPU clusters and how rolling updates happen.
Andy Pernsteiner [00:51:13]: Well, I can tell you our experience.
Demetrios Brinkmann [00:51:14]: Yeah.
Andy Pernsteiner [00:51:15]: And I'm not going to name names, because some of them would let me name names, but I don't like to call people out that way. Right. So some of the data centers being built now are being built very quickly, and power is not always so reliable. In fact, it's the opposite of reliable, right. I think that it's too easy to think... well, I don't know how closely you follow these things, like how many gigawatts are getting deployed in Texas or in Norway or the Stargate projects or any of that. And the reason I pay attention to it in part is because it affects me personally, but it's also just because it's an excruciatingly large amount of capital being spent very quickly. Like, one of the data centers that we worked with a larger model builder on, they built it in like 120 days, and there was a building, but they had to fit it out with all the normal things you put in a data center.
Andy Pernsteiner [00:52:12]: And it was a gigantic data center. I don't know, hundreds of megawatts, right? Hundreds of thousands of GPUs. And they want to make it, I don't know, five times as big or whatever. Crazy thing, right? Of course. But when people are running fast in that world, the effects of something going wrong are magnified immensely. And so power has been one of the biggest challenges. And where we've been very successful is that unlike other platforms that offer the data services we provide, we don't worry so much about what happens when the entire data center goes dark. We don't worry that we've lost any data in the process because of how we've created an architecture to deal with that.
Demetrios Brinkmann [00:52:55]: Why is that? I'm not sure I fully understand.
Andy Pernsteiner [00:52:57]: So, basically, okay, let's just talk about basic stuff. Let's say you're a client and you copy a file to a file server; that goes over the network. The file server receives it and parks it somewhere for performance. It usually doesn't park it straight on the media it's eventually going to land on, because that takes too long. So maybe it puts it into a buffer, maybe into a card that has some battery backup in case there's some kind of a power outage. A lot of times, most file systems would use a journal of some kind. Journals, unfortunately, have to be replayed, which means when you turn everything back on, you don't get to do anything while the replay happens.
Andy Pernsteiner [00:53:34]: And if there's any inconsistency found in any part of the metadata structures when you're mounting the system, then you have to not only replay the journals, you have to do a file system check. And the only safe way to do that file system check is when nobody's writing to it.
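To make the journal-replay point concrete, here is a minimal sketch, in Python, of a write-ahead journal and the replay a server has to do on mount before it can serve I/O again. The file names and record format are invented for illustration; this is not how any particular file system implements its log.

```python
import json
import os

JOURNAL = "journal.log"   # hypothetical on-disk journal
DATA = "data.json"        # hypothetical backing store

def write(key, value, state):
    # Append the intent to the journal and force it to stable media first;
    # only then apply the change to the live state.
    with open(JOURNAL, "a") as j:
        j.write(json.dumps({"key": key, "value": value}) + "\n")
        j.flush()
        os.fsync(j.fileno())
    state[key] = value

def mount():
    # Load the last durable copy of the data...
    state = {}
    if os.path.exists(DATA):
        with open(DATA) as f:
            state = json.load(f)
    # ...then replay the journal before serving any I/O. This is the pause
    # being described: nothing can read or write until the replay (and any
    # follow-up consistency check) has finished.
    if os.path.exists(JOURNAL):
        with open(JOURNAL) as j:
            for line in j:
                entry = json.loads(line)
                state[entry["key"]] = entry["value"]
    return state
```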
Demetrios Brinkmann [00:53:46]: So now you're just waiting. Everything's offline.
Andy Pernsteiner [00:53:48]: And that could be hours on big systems. Hours times all the GPUs. Millions and millions of GPU minutes. Someone's getting fired. That's all that means, right?
Demetrios Brinkmann [00:53:59]: Or a team.
Andy Pernsteiner [00:54:00]: Yeah, well, hopefully not, because then what are they going to do the next day? I don't know. But from our standpoint, we've always looked at it... it's kind of like when we were talking earlier about the person who's developing on their laptop and they're like, nothing's ever going to fail. What do you mean, failure? Why do I have to think about failure? But we had to, because at first the places that we went into, if I think about the DOE labs or NASA or NIH or any of these big HPC-based research organizations, they're used to dealing with scalable file systems that have just big problems, right? Like, if there's a massive power outage, they know there's going to be some kind of day-long wait for stuff to be okay again. Which, you know, it's not like it was good when they had 10,000 compute nodes that needed to access data, but it's much worse now that the GPUs cost that much more. And especially there's this whole race, everyone's racing to build the next model, to get one step ahead. It's dizzying. And because they're always in that race and we're kind of there with them, it's very important that if there is some kind of unpredictable thing that happens, we can recover from it very quickly. So one of the data centers that we were working in, they would have power outages randomly almost every day.
Andy Pernsteiner [00:55:21]: Not the whole data center; a whole data hall would go. People usually split things up into data halls where they'll have like 10,000 GPUs over there and 10,000 over there and 10,000 over there, and there's different power feeds coming into all of them. You lose one of them, it sucks, right? But it's not the end of the world. You still have most of your capacity and most of your GPUs still online and running. But when you bring that one back up, you don't want to wait that long for your GPUs to be able to do something with it, right? And so we of course don't like it when the power goes out all the time.
Andy Pernsteiner [00:55:54]: We get tons of alarms, we get all the support engineers trying to figure out what's going on. But we've gotten used to it now, because we've seen this is becoming more of a regularity. Because again, a lot of times when these places are built out, they're not necessarily building them out with full redundancy for everything, because all the money's going to GPUs. None of these people want to spend the extra money on the other stuff, because they want to buy that extra little slice of GPUs to get them to a usable state. And so sometimes they cut corners in those ways. And so we provide a platform that can deal with that kind of issue. The other part of it is that when we started out, it's funny, I was at a user conference last year, and one of our customers from NYU told me that basically they were used to, with their systems, if they had some kind of upgrade they wanted to do, like a software upgrade to get new features or to fix some bug or whatever, they would have to take downtime for their systems to do it.
Andy Pernsteiner [00:56:53]: And I had never heard of that. Nowhere I'd worked before had ever had a problem just doing software updates in a non-disruptive fashion. So it was a very foreign concept to me. But then I talked to all my colleagues who used to work at other places: oh yeah, we would take outages when we did upgrades, we'd tell the customer to plan for an hour, two hours, ten hours' worth of downtime just to do a software upgrade. So they never wanted to upgrade. And I'm like, but if they never upgrade, how do they get the bug fixed? How do they get this thing? To me it's a foreign concept, but it isn't for a lot of people who've been dealing with the low-level infrastructure for a long time. And so being able to offer new features and services to people without requiring that they take some level of downtime, or lose a large percentage of their performance for a period of time, is a big deal, right? Huge.
Andy Pernsteiner [00:57:40]: And again, we created an architecture that disaggregates the state layer and the logic layer to make sure that when we're making software updates in the logic layer, we don't necessarily have to do anything on the state side of things, and we do it in a rolling fashion. So there's always a large percentage of performance available and the entire system is available during any of those operations. It doesn't matter if an upgrade happens in the middle of the day; we don't have to take a maintenance window to do it, you can do it anytime. And so we've created that reality for the high performance computing and now the large-scale AI community, and we try to bring it to more and more people. Because it feels like it was a hard problem for us to solve at first, but we had the benefit of spending a lot of time up front thinking about the way that other systems and platforms have been built, and we didn't actually implement any of our code for a long time. Like, when our original CEO and founder, and our first developer, who spoke last night, he was the first code-writing developer at VAST, when he first started.
Andy Pernsteiner [00:58:53]: They spent a solid, I don't know how long, a whole year or some percentage of a year, just writing down what they needed the architecture to be, just thinking through the architecture before they wrote any code. Then they wrote code for one-plus years before we deployed anything to even alpha customers. Because they just wanted to get the foundational layer as solid as possible before anybody was going to rely on it. And because of that, it's actually easier for us to do new features. It's easier for us to build new things, because the foundation is solid enough that we don't have to go and refactor anything down here anymore. We can do it all up at the higher layers.
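A minimal sketch of the rolling-upgrade idea that falls out of that split: if the logic layer holds no persistent state, you can drain, upgrade, and restart a small batch of nodes at a time while the rest keep serving. The node names, version string, and health probe below are hypothetical placeholders, not VAST's actual tooling.

```python
import time

LOGIC_NODES = ["cnode-01", "cnode-02", "cnode-03", "cnode-04"]  # hypothetical node names

def drain(node):
    # Stop routing new requests to this node (placeholder).
    print(f"draining {node}")

def install(node, version):
    print(f"installing {version} on {node}")

def restart(node):
    print(f"restarting {node}")

def healthy(node):
    # Placeholder health probe; a real system would hit the service endpoint.
    return True

def rolling_upgrade(nodes, new_version, batch_size=1):
    # Upgrade one small batch of stateless logic nodes at a time, so the
    # rest of the layer keeps serving and no maintenance window is needed.
    for i in range(0, len(nodes), batch_size):
        batch = nodes[i:i + batch_size]
        for node in batch:
            drain(node)
            install(node, new_version)
            restart(node)
        # Wait for the batch to come back before touching the next one.
        while not all(healthy(n) for n in batch):
            time.sleep(1)

rolling_upgrade(LOGIC_NODES, "v5.4")
```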
Demetrios Brinkmann [00:59:35]: Dude. So the thing that sticks out to me the most on this is that as if GPUs weren't unreliable enough.
Andy Pernsteiner [00:59:46]: Oh, on their own, yeah.
Demetrios Brinkmann [00:59:48]: Now you're talking about how unreliable the power is.
Andy Pernsteiner [00:59:51]: Well, I mean, if you want to deploy 100,000 GPUs, you need at least 100 megawatts' worth of power to do it. Where are you going to get that from? It doesn't exist. That's why you see all these gas turbines being rented or bought. That's why you see micronuclear coming up. These aren't reliable, proven ways of doing power. And how many electrical engineers are even out there? You know how during COVID the traveling nurse thing was a profession, and they could make good income doing it? Now it's traveling electricians; they relocate to some crappy place for three to six months.
Demetrios Brinkmann [01:00:24]: Gigantic data center.
Andy Pernsteiner [01:00:26]: Yeah. And then they end up basically working around the clock and making a ton of money or whatever. But the thing is that they're not necessarily doing things according to the standard practice. They're trying to find the fastest way to get the most power online as quickly as possible. And that's why it's kind of interesting. So we sponsor and attend a large supercomputing conference every year. It's called Supercomputing. Really original.
Demetrios Brinkmann [01:00:54]: Original, yeah.
Andy Pernsteiner [01:00:56]: But historically it's mostly been what I would consider to be classical, old-school HPC administrators and infrastructure managers. Right? People who had to build out HPC systems, people who understand everything down to the cables being used between all of the racks of servers. And that isn't a skill set that's being cultivated for new college grads. But the irony is that all of the big large-scale AI projects require that skill. And so there's a smaller and smaller number of them, because most of them are getting older, they're in their 50s or more, because a lot of the HPC environments are classically lower-paying jobs with the DOE or other places.
Demetrios Brinkmann [01:01:41]: It's so wild. I mean, is it because it's hardware too?
Andy Pernsteiner [01:01:45]: Part of it, part of it's hardware. No hardware is that interesting to most people, right? Unless you want to build some cool gaming rig or whatever and have little neon lights on it.
Demetrios Brinkmann [01:01:55]: Yeah, nobody wants to be plugging in. Is it this port to that?
Andy Pernsteiner [01:01:58]: No one wants to think about that. They don't want to think about polarity of optics. They don't want to think about what the signal rate is on a transceiver and whether it's compatible. Nvidia is great about releasing new cards every year that people have to figure out, okay, now how much power do I need in a rack? It used to be that 10 kilowatts in a rack was enough, then 20 was enough. Now 100 kilowatts in a rack is pretty normal. 250 is now what we're seeing at some of the places. And a lot of people won't understand what that means. But it hasn't escalated gradually.
Demetrios Brinkmann [01:02:30]: It's just jacked straight up.
Andy Pernsteiner [01:02:33]: And then you think like, how do you get that much into a rack full of servers? And then you have to think about the cost of what one of those racks are, right? You could have easily multimillion dollar racks of servers that take up this chair width of floor space.
Demetrios Brinkmann [01:02:51]: And it's just sucking energy.
Andy Pernsteiner [01:02:53]: Sucking energy, processing data, doing all of these things. And then now you got people who are managing like hundreds of those or thousands of those in a single building.
Demetrios Brinkmann [01:03:00]: And it's going offline intermittently.
Andy Pernsteiner [01:03:03]: It can. Right. And they do their best to make sure that it doesn't. But they need to have a platform that's reliable enough that if something goes wrong, it's not like starting over. Right.
Demetrios Brinkmann [01:03:14]: And so, what I didn't quite catch: is the whole thing there, with what you're doing, that the value prop is it's not just starting over?
Andy Pernsteiner [01:03:25]: So basically, look at it like this. Let's say you're in the middle of a training job, hopefully you're checkpointing. That's like a thing, right?
Demetrios Brinkmann [01:03:32]: Yeah. My buddy Todd was telling me again not to derail this one, but he was like, checkpointing is a fascinating art because you don't necessarily know, should I be checkpointing every second, should I checkpoint every day?
Andy Pernsteiner [01:03:46]: It's a cost analysis. Honestly, you have to sit there and do a cost analysis. But basically you do a checkpoint where you're dumping GPU memory and state, and it takes some time. Nowadays, most people are doing what's called asynchronous checkpointing, so it doesn't necessarily impede the job. But when you checkpoint, you're basically saying, I want to take a little bit of a performance hit and a time hit on my overall job to make sure that if everything goes to crap, I can recover. And everything going to crap could be a few GPUs failing, right, the wrong GPU failing at the wrong time, and then there's a certain amount of restart that has to happen.
Andy Pernsteiner [01:04:22]: Let's imagine the whole data center went out, right? And let's say you're checkpointing once an hour or whatever. That means you probably have to go back an hour. But what happens when the system that holds your checkpoints isn't online for another couple hours? Now you're screwed. And then once it does come online, you still have to read back the checkpoints, and that takes time. So it all ends up coming down to time. And I think the way that people have dealt with this in the past, when they had less reliable systems, is they would say, okay, well, a percentage of our GPUs are going to write their checkpoints to this, and a percentage is going to write to that. That way I don't lose everything.
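A rough sketch of the checkpoint-and-resume pattern being described: pick an interval (one common rule of thumb is the Young/Daly estimate, roughly the square root of twice the checkpoint cost times the mean time between failures), write checkpoints off the critical path, and on restart resume from the newest one instead of starting over. The directory layout and plain-pickle format are stand-ins; real training jobs would use their framework's own checkpoint APIs.

```python
import math
import os
import pickle
import threading

CKPT_DIR = "checkpoints"   # hypothetical checkpoint location on shared storage

def young_daly_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    # Common rule of thumb for how often to checkpoint: sqrt(2 * cost * MTBF).
    return math.sqrt(2 * checkpoint_cost_s * mtbf_s)

def save_async(step: int, state: dict) -> threading.Thread:
    # Snapshot the state, then write it out off the critical path, so the
    # training loop pays for the copy but not the I/O (asynchronous checkpointing).
    snapshot = dict(state)

    def _write():
        os.makedirs(CKPT_DIR, exist_ok=True)
        with open(os.path.join(CKPT_DIR, f"step_{step:08d}.pkl"), "wb") as f:
            pickle.dump(snapshot, f)

    t = threading.Thread(target=_write)
    t.start()
    return t

def load_latest():
    # On restart after a failure or outage, resume from the newest checkpoint
    # instead of starting over; you only lose the work done since that point.
    if not os.path.isdir(CKPT_DIR) or not os.listdir(CKPT_DIR):
        return 0, {}
    latest = sorted(os.listdir(CKPT_DIR))[-1]
    step = int(latest.split("_")[1].split(".")[0])
    with open(os.path.join(CKPT_DIR, latest), "rb") as f:
        return step, pickle.load(f)

print(f"suggested interval: {young_daly_interval(60, 4 * 3600):.0f} s")
start_step, state = load_latest()
for step in range(start_step, start_step + 1000):
    state["loss"] = 1.0 / (step + 1)          # stand-in for a real training step
    if step % 100 == 0:
        save_async(step, state)
```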
Andy Pernsteiner [01:04:57]: They come up with all these different sharded checkpoints. Yeah, they come up with more complex strategies because they can't rely on a single platform. They have to split it up into pieces. But when you split things up into pieces, then you have to orchestrate your jobs around having all of these different pieces you have to manage; it becomes much more difficult to operationalize. Everybody wants one place. You know, it's like even in your personal life, you don't want to have 10 different places you have to look every time you want to find something. Right.
Demetrios Brinkmann [01:05:29]: Could be in one of these ten places.
Andy Pernsteiner [01:05:31]: Yeah, it's much easier if you know where everything is going to be, and you don't have to think about it in the wrong terms. And I also think you have to think about it in terms of scale, too, because as you were mentioning earlier, your friend in the FinOps world, they don't even necessarily know what their forecasts are going to look like. Well, having platforms that can scale in different dimensions, sometimes you need more performance in one sense and sometimes you need more capacity in another sense. If you're fixed in how you can scale, that becomes challenging. That was one challenge that people often had with Hadoop: they would have jobs that would run slowly, but because of the architecture, they had to also add significantly more expensive storage along with it, because everything was tightly coupled. Every node had some amount of storage and compute in it, and the only way to get performance is to shard across all the nodes very evenly.
Andy Pernsteiner [01:06:22]: And so if you needed to add more performance, you had to add more capacity and reshard. And there's all these issues and layers. And so we again looked at that and said that's not the way to do it. That's why we separated what we call logic from state and disaggregated things so that you could scale the performance layer without having to scale the data layer. You could scale the data layer and not have to pay the cost of scaling the performance layer if your needs changed. So part of building out a platform that's going to meet unpredictable needs is having the ability to scale in multiple dimensions without having to go tell your user community they have to go point somewhere else now. Right.
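A toy illustration of the coupling being contrasted here, with made-up node sizes: in the coupled design, meeting a throughput target forces you to buy (and reshard across) capacity you may not need, while in the disaggregated design the two dimensions are sized independently.

```python
import math

# Coupled (Hadoop-style): every node adds both throughput and capacity, so
# fixing a performance shortfall also buys storage you may not need, and
# adding nodes means resharding data across all of them.
def coupled_nodes(throughput_gbps: float, capacity_tb: float,
                  node_gbps: float = 2, node_tb: float = 100) -> int:
    return max(math.ceil(throughput_gbps / node_gbps),
               math.ceil(capacity_tb / node_tb))

# Disaggregated: stateless logic nodes supply throughput, storage enclosures
# supply capacity, and each dimension is sized on its own.
def disaggregated_nodes(throughput_gbps: float, capacity_tb: float,
                        logic_gbps: float = 2, box_tb: float = 500):
    return (math.ceil(throughput_gbps / logic_gbps),
            math.ceil(capacity_tb / box_tb))

print(coupled_nodes(100, 300))        # 50 nodes, driven entirely by throughput
print(disaggregated_nodes(100, 300))  # (50 logic nodes, 1 storage enclosure)
```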
Demetrios Brinkmann [01:07:00]: But are you also thinking about it as far as like, we've got the GPUs, but what if folks want to use the AMD GPUs or the TPUs?
Andy Pernsteiner [01:07:10]: So we're not so picky, right. And I'm not going to say we're not playing any favorites; obviously Nvidia is big and we like them. And they actually have taken quite a large proportion of not just the GPU estate, right.
Andy Pernsteiner [01:07:30]: They bought a company called Mellanox, which makes the high-speed networking interfaces that most of the GPU farms are using. There are other vendors in that space, but if you're buying the GPUs from them, then they can toss in the networking along with it. Or what they would do is build out a reference architecture that says use these networking switches, use these networking cables, all this other stuff. And then if I'm basically a cloud service provider that's trying to build a GPU cloud so I can rent out GPUs to a customer, if I buy the GPUs from Nvidia, I'm going to get much better support from them if I'm also buying other parts of the stack from them. And so in that way they control a lot more estate within the data center, and they're influencing the decisions of a lot of these people who are building out these scalable systems. So there's that side of it. But in terms of a client wanting to use other GPUs, I mean, even Intel makes GPUs, right. We haven't seen them as much in terms of popularity, but we try to take a more agnostic approach to looking at it.
Andy Pernsteiner [01:08:42]: I would say that for a lot of inference use cases, I don't necessarily know how important it will be to have certain GPUs in the future. I think for training, Nvidia definitely has a leading edge in terms of basically creating CUDA and creating the world that people live in when they're trying to come up with how to do training. But in terms of inference, they have a project called Dynamo, which is basically their inference-focused suite of software stacks, but I think there's a little bit more flexibility in the inference space. And I'm looking forward to it, because from our standpoint we see people doing a lot of training and we see some inference. I think the transition is shifting, yeah. And we don't know what the power draw or the workloads look like as much as we do for training. They're probably much less predictable.
Andy Pernsteiner [01:09:39]: Yeah, they probably scale in different ways and you know, it'll be interesting to see.
Demetrios Brinkmann [01:09:45]: It's funny, I remember reading a Facebook engineering post back in the day talking about how the traditional predictive ML stuff that they were doing was lots of bursts. It was short, fast bursts. And now this generative AI stuff is really long, intense, heavy training. And so the workloads were very different. And if you think about inference, it's more on that bursty side, it feels like to me, especially since it's not like a 40-day training job.
Andy Pernsteiner [01:10:19]: Yeah. I think the challenge for people who are offering GPUs as a service, or the people who are buying GPUs and trying to maximize their utilization, is figuring out how to keep them as busy as possible but not take the risk. Because the other thing is that we talk to clients a lot about GPU failure rates, because they're not negligible. Right. And especially because of how hard things are getting pushed, you see failures happen much more frequently. For people who are building systems on top of that, whether they're agentic or not, thinking about failure before you go build a whole application stack is probably a good practice. Yeah, right. Things are going to fail.
Andy Pernsteiner [01:11:08]: And I gave a talk in London, maybe two years ago now, at an AI-centric conference, and I asked everybody, there were probably 200 people there, if they knew what an Nvidia DGX was, which is a GPU server. Nobody knew. Zero. Like, most people knew.
Demetrios Brinkmann [01:11:28]: What a GPU was, but the specific number or the specific.
Andy Pernsteiner [01:11:33]: Well, they didn't know what a DGX was. They kind of knew what a GPU was. The thing is that most people were, at least at that conference, they were consuming based on having a service available to them. They didn't think about infrastructure, which is good. I'm glad that they get to live in that world. They live in the world where they don't have to think about those pieces. But now that we're at a place where the scale is much larger, those are the pieces that cost all the money. It's not.
Andy Pernsteiner [01:11:55]: And that's unfortunately the way that this world operates. Right. So we have to make sure that we're using this very sort of expensive resource the best we can and knowing ahead of time that you're going to deal with failures because infrastructure will fail. Yes, of course software is going to have bugs too. Right. But you know, in the world of a developer, there's no bugs. There's no bugs. Right.
Demetrios Brinkmann [01:12:18]: But even hardware... Where is this magical fantasy land? I don't know.
Andy Pernsteiner [01:12:21]: I don't know. I think it's there somewhere. One of the architects said, well, you know, if I don't write any code, there's no bugs. I'm like, yeah, but I need you to write the code. He's like, well, stop complaining about the bugs. I'm like, I don't know what to do.
Demetrios Brinkmann [01:12:32]: Dude, I like this guy already.
Andy Pernsteiner [01:12:34]: I'll, I'll bring him by. He's fun.
Demetrios Brinkmann [01:12:38]: It's making sense. Yeah, but, but anyway, yeah, going back to the, this idea of like the magical fantasy land of no bugs.
Andy Pernsteiner [01:12:47]: Yeah.
Demetrios Brinkmann [01:12:48]: Infrastructure is going to fail.
Andy Pernsteiner [01:12:49]: It's still going to have problems. And, and even if you don't have bugs, you didn't necessarily think about what it's like to have variable levels of latency between different things. Especially now that you can't fit enough GPUs in the one place to train some of these models now, I mean, they want to, they'll build gigawatt level data centers, but that takes time. So in the meantime, while they're in the race to make the next model or they're in the race to build the next application, they have to find ways of using stuff across multiple geo distributed places, whether that be within a major cloud service provider or in a data center they're renting or whatever they're doing. They need to be more flexible. Developers need to be more flexible. Application architects need to be more flexible. And thinking about how do I make sure that when I'm running, even if it's not training, if it's any kind of application that it can tolerate what happens when I need to request data that's somewhere else.
Andy Pernsteiner [01:13:45]: And so one of the things that we've also done is try to find a way. I can't defeat the speed of light; it's just not possible. But we can cheat a little bit, right? We can get more predictive in how we prefetch data that's going to be loaded in for an application or job. We can allow the scheduler to give us hints so that we know what data it's going to need, so we can move it from physical place to physical place in a way that's asynchronous, so that by the time the job needs it or the application needs it, it's already there. Right. We spent a lot of energy coming up with a way for people to create a global mesh, so the platform can extend beyond the four walls of a data center into a place where you could have some stuff deployed in cloud service provider A, and in a data center, and in cloud service provider B, in multiple geographical places.
Andy Pernsteiner [01:14:33]: And it will still function, and you can start to help them see the right way to interface with it, so that the application doesn't just tolerate it but takes advantage of it. Right.
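A minimal sketch of the hint-driven prefetch pattern described here: the scheduler declares a job's inputs before it starts, and an asynchronous worker pulls them to a local tier so the transfer overlaps with whatever the GPUs are still doing. The paths and the copy call are placeholders for a real cross-site transfer.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor

LOCAL_CACHE = "/local/cache"            # hypothetical close-to-the-GPUs tier
prefetcher = ThreadPoolExecutor(max_workers=4)

def fetch_to_local(remote_path: str) -> str:
    # Placeholder for the real cross-site transfer.
    os.makedirs(LOCAL_CACHE, exist_ok=True)
    dest = os.path.join(LOCAL_CACHE, os.path.basename(remote_path))
    shutil.copy(remote_path, dest)
    return dest

def on_job_scheduled(job: dict) -> list:
    # The scheduler hint: the job declares its inputs before it starts, so the
    # data can be moving while the GPUs are still busy with the previous job.
    return [prefetcher.submit(fetch_to_local, p) for p in job["inputs"]]

def on_job_start(futures: list) -> list:
    # By start time most of the data is already local; we only wait for any
    # stragglers instead of streaming everything on demand over the WAN.
    return [f.result() for f in futures]
```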
Demetrios Brinkmann [01:14:44]: Like when you use Instagram and you upload a photo and it's already uploading as you're writing the caption.
Andy Pernsteiner [01:14:50]: Yeah. It's funny, I interviewed for a job forever ago, 25 years ago maybe now. And my buddy, because it was his dad who was going to give me the job, was trying to get me prepared for the interview. And he asked me, what's the most important part of a network? Or no, what's the most important part of a computing environment or a computing network or whatever. And I was like, well, it's the switches and the this and the that, whatever I was trying to think of. He's like, no. I'm like, well, what is it? He's like, it's the user. That's the most important part.
Andy Pernsteiner [01:15:26]: And so when you think about whether it's app design for a phone, whether it's app design for a large scale sort of enterprise app, like, none of it matters if the user doesn't get what they're asking for. And I think we, regardless of whether we're at the lowest layers or closer to the top, where they're interfacing directly with the user, have to remember that because ultimately that's success or not success. Right. Like that. Like you could say, yeah, we gave you seven nines of uptime and the performance is the fastest that you can get anywhere. And it's like, yeah. But the user couldn't get what they wanted. And so then they complained.
Andy Pernsteiner [01:16:01]: Right? Like, that's ultimately what it comes down to.
Demetrios Brinkmann [01:16:03]: Yeah. Oh, man, what a. And that is for anything. It's not just for software. Just when you're interfacing with folks or you're working with folks, you want to make sure that they understand the value that you're bringing. And if they don't understand that as.
Andy Pernsteiner [01:16:19]: The user, then there isn't any value.
Demetrios Brinkmann [01:16:22]: Exactly.
Andy Pernsteiner [01:16:23]: So then you haven't accomplished your goal. And I think, whether I'm plugging in cables in a data center or whether I'm designing an application to transform data from one format to another, no matter what I'm doing, there is always someone on the other side of it somewhere. And really, whether it trickles down to me or not, if I think about that when I'm doing something, if I think about that when I'm educating someone on my team or one of our engineering teams to build something, if I think about that and I make sure they're thinking about that, then there's a much better chance for it to actually be successful. Right. Like if you get a product requirement doc, and you look at it and you're like, okay, I'll check this box and that box and that box and that box.
Demetrios Brinkmann [01:17:10]: Done.
Andy Pernsteiner [01:17:10]: If they're, if they're not thinking about the mission of the user, then it doesn't really accomplish the goal, even if they check all the boxes. Right? Yeah. And so I don't know, like, to me, it feels like most of my time is spent interfacing with clients and trying to figure out what they actually want. And then sometimes trying to figure out what they want that they're not asking for. Because oftentimes when people say, yeah, my leg hurts, like, well, maybe, maybe you should stop hitting it with a hammer all day long. I don't know.
Demetrios Brinkmann [01:17:43]: What they're not asking for is for you to take that hammer away from them. That makes sense. Oh, that's classic. Is there anything else that you want to hit on?
Andy Pernsteiner [01:17:55]: Well, I'm curious in terms of your thoughts. Like, you know, you started a. Basically you have a community here in the city. Well, I mean, that's. The event that I attended was here in the city. Like, what's interesting for you? Like, what's, what's new for you that you're actually excited about?
Demetrios Brinkmann [01:18:13]: Oh, there's so many, there's so many levels to that. Right. And I think I mentioned before we started this conversation with the full stack AI and all the different ways and areas that I'm seeing people interact with AI now, because obviously it's on everyone's agenda, it's on everyone's roadmap, and everybody's thinking about it or, or doing stuff in it.
Andy Pernsteiner [01:18:34]: And.
Demetrios Brinkmann [01:18:36]: I would say some of the coolest stuff that I've seen recently. And this is high recency bias.
Andy Pernsteiner [01:18:44]: Yeah, like yesterday. Yeah, this morning.
Demetrios Brinkmann [01:18:47]: Exactly, exactly. I really like what my friends Soham and Neil are doing with prompt optimization and prompt compression in that regard. Because what they're telling me is, yeah, you've got these agents, they go out and they do stuff on the Internet. They come back and they've scraped a webpage. 90% of that data that they come back with is absolutely useless. But it all just by default gets thrown into the context. Yeah. And then the LLM call has to figure out, was this useful or was it not?
Andy Pernsteiner [01:19:21]: And you're paying for that.
Demetrios Brinkmann [01:19:23]: And that's their whole thing. Yeah, they're like, hey, we. You want to save some money? Yeah, check this out. Here's what we created. And, and so I think that's, that's a cool one to look at. I think there's also. I like this. Give me a second.
Andy Pernsteiner [01:19:35]: When you say prompt compression, like, I'm just trying to think of what that means. Is it. Go ahead.
Demetrios Brinkmann [01:19:43]: What I understand from it, from the 10 minutes that I've spent with it, is, is there's a lot that you can do. You can take out words or you can take out whole sentences in a prompt that are not really that informative or they're not very effective. Like you can take that out, take out tokens. Let's just call them tokens because that's what they are. So you can take out a lot of tokens and still get the same result.
Andy Pernsteiner [01:20:10]: Yeah, yeah, no, that's good to think.
Demetrios Brinkmann [01:20:12]: About, but they're doing it with like another model is of course, stripping out the tokens.
Andy Pernsteiner [01:20:17]: A cheaper model, hopefully. Exactly.
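A rough sketch of the prompt-compression pattern being described (not their product): a cheap model strips the scraped page down to just the passages relevant to the question before the expensive call ever sees it. The model functions here are stand-ins that just return dummy strings.

```python
def cheap_llm(prompt: str) -> str:
    # Stand-in for a small, inexpensive model call.
    return f"[cheap-model output for prompt of {len(prompt)} chars]"

def expensive_llm(prompt: str) -> str:
    # Stand-in for the large, expensive model call.
    return f"[big-model output for prompt of {len(prompt)} chars]"

def compress_context(question: str, scraped_page: str) -> str:
    # Ask the cheap model to keep only the passages that bear on the question,
    # so the expensive model isn't billed for the other ~90% of the page.
    return cheap_llm(
        "Keep only the sentences from the text below that help answer the "
        "question, verbatim, and drop everything else.\n"
        f"Question: {question}\nText:\n{scraped_page}"
    )

def answer(question: str, scraped_page: str) -> str:
    context = compress_context(question, scraped_page)
    return expensive_llm(
        f"Answer the question using this context:\n{context}\n\nQuestion: {question}"
    )

print(answer("What changed in the pricing?", "…long scraped page…"))
```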
Demetrios Brinkmann [01:20:19]: A very cheap model. And actually, yeah, on, on the cheap one that you were talking about earlier when you're thinking about how you get surprised with cloud bills, because you're thinking, yeah, we'll just have this policy where we try to answer the question with a cheap model. And if it doesn't, then it goes to a bigger model and a bigger model and a bigger model until it gets answered.
Andy Pernsteiner [01:20:39]: Right? Yeah.
Demetrios Brinkmann [01:20:39]: And it reminds me of back when ChatGPT came out. I interviewed this guy who was a researcher at Stanford, I think, and he created FrugalGPT. And it was basically a way to try and concatenate prompts, or if you have the same prompt, you can ask it in a different way and get the same answer so that it's much cheaper. One of the strategies they introduced in that paper was, hey, let's try with an open source model first: let's create a router, and maybe a classifier model, that will say, oh yeah, this can go to a cheap open source model instance, or, oh, this is a little more complex, let's send it to the bigger model because we don't think the cheap one can answer it. And now is when I'm seeing that idea really getting put into practice. Like, that's kind of mainstream. Mainstream in quotation marks, because we're still super niche on this.
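And a minimal sketch of the cascade/routing idea from that FrugalGPT-style approach: try the cheap model first and escalate only when a scorer isn't confident in the answer. The models, costs, and scorer below are stand-ins, not anything from the paper itself.

```python
from typing import Callable, List, Tuple

def small_model(q: str) -> str:
    return f"[small-model answer to: {q}]"     # stand-in for an open source model

def large_model(q: str) -> str:
    return f"[large-model answer to: {q}]"     # stand-in for the expensive model

def confidence(question: str, answer: str) -> float:
    # Stand-in scorer; in a FrugalGPT-style setup this is a small learned
    # model that predicts whether the answer is good enough to stop here.
    return 0.9 if len(answer) > 20 else 0.1

# Tiers ordered cheapest first: (model, rough cost per call in dollars).
TIERS: List[Tuple[Callable[[str], str], float]] = [
    (small_model, 0.0002),
    (large_model, 0.02),
]

def cascade(question: str, threshold: float = 0.8) -> str:
    answer = ""
    for model, _cost in TIERS:
        answer = model(question)
        if confidence(question, answer) >= threshold:
            return answer          # good enough: skip the pricier tiers
    return answer                  # fell through: keep the biggest model's answer

print(cascade("Summarize our refund policy in one sentence."))
```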
Demetrios Brinkmann [01:21:39]: Right. But it's, people are thinking about that idea much more than when it first came out. And so I like seeing that, Yeah, I like seeing, I mean all the GPU stuff. That's why I love nerding out with you about it because I feel very new in that area. But I do recognize that like you're saying there's a lot of folks who have been working on hardware and building data centers and their job is the most important thing ever right now. And there's not a lot of new graduates that are going into that area.
Andy Pernsteiner [01:22:16]: Not as much. Like my son, he went to school for welding, and because he probably saw too many late nights of me being in front of a computer freaking out, he's like, that's not my game, I'm not going that route. And it's much more well suited for his personality and what he wants to do. But it's a very hot commodity to be a welder or someone in the trades now, because it isn't what people are going to school for. It's not looked upon in the same way that it used to be in terms of being a career. But, you know, there's a world of supply and demand.
Andy Pernsteiner [01:22:54]: Like they don't have robots doing all that stuff right now. It's going to take a long period of time. It's just like the drilling for oil thing. It's going to be too expensive, expensive to automate most of that stuff for a long time. And so like I think there might be a, you know, sort of a resurgence of people kind of looking into that world. I don't know that there's going to be a resurgence of people going in to be infrastructure engineers and people who are doing those. So we have to get smarter about making it easier for everybody to do it.
Demetrios Brinkmann [01:23:21]: Yeah, yeah, that. Or just pay them a boatload and then there will be a resurgence.
Andy Pernsteiner [01:23:28]: Yeah. If you, there, I mean there are, I will say that the, the, the, the well funded model builders are definitely paying pretty well when it comes to hiring people who are used to managing large fleets of servers. You know, but that, that isn't like a never ending road, right? It's not.
Demetrios Brinkmann [01:23:46]: You can't count on that.
Andy Pernsteiner [01:23:47]: Yeah.
Demetrios Brinkmann [01:23:48]: Especially if you're just going into it right now. It's like, is it gonna still be.
Andy Pernsteiner [01:23:52]: Yeah. Is that really gonna be a thing? And it still isn't interesting to a lot of people.
Demetrios Brinkmann [01:23:56]: Yeah.
Andy Pernsteiner [01:23:57]: Right. Also, I think, I think there's like a breed of mind, I would guess you could say, where you want to fix problems that nobody else wants to fix. Right. And that isn't like that. Like a lot, a lot of people want to think that they're going to come up with some new idea that's going to solve all these amazing problems, but they usually think about it in the higher level of, we'll call it the stack. They don't usually think about it down here. Right, yeah.
Demetrios Brinkmann [01:24:21]: Plugging in again and the whole act of plugging things in isn't as satisfying sometimes. But actually, you know what I was going to say that I'm really stoked about these days, or that I've been thinking about these days really quite a bit because I had to write a talk. And so originally the talk was going to be on what have we seen in the community over the past year and how have things developed? What are the kind of questions that are being asked? That kind of thing. Like a trends type of talk. And then as I was doing it, I was like, wait a minute, there's this thing that kind of bugs me that is a lot more close to my heart.
Andy Pernsteiner [01:25:03]: Yeah.
Demetrios Brinkmann [01:25:03]: And that. That's this idea of the UX of agents. And I kind of proposed this idea in the talk that everybody wants to be an agent and nobody wants to be a tool.
Andy Pernsteiner [01:25:16]: Yeah.
Demetrios Brinkmann [01:25:17]: And so that is creating this horrible user experience for us because we have to go and interface with many different agents and it's just like a little better software 2.0.
Andy Pernsteiner [01:25:29]: Yeah, yeah, yeah.
Demetrios Brinkmann [01:25:30]: It's like, okay, I'm going to go to your website and then I have some chatbot that is an agent in the background.
Andy Pernsteiner [01:25:34]: Cool.
Demetrios Brinkmann [01:25:35]: That's horrible experience for me, people.
Andy Pernsteiner [01:25:36]: I mean, I think it's like this idea that an agent would offer more value than a tool in terms of differentiating from something. Well.
Demetrios Brinkmann [01:25:44]: And you don't want to give up that relationship with your customer.
Andy Pernsteiner [01:25:47]: Yeah.
Demetrios Brinkmann [01:25:48]: Because now if you're just a tool, like who is the gateway to the Internet? Because they're going to be the ones that capture all the value.
Andy Pernsteiner [01:25:55]: Right.
Demetrios Brinkmann [01:25:55]: And now you're going to have to figure out, how can I get my tool, which is really my product, to be consumed by your agent? And that's where I think there's this huge tension that's happening. And so for us as consumers, the better user experience, I argue, is that we interface with one agent that can go off and do anything on the Internet. Because I don't want to have to go and find the place. I want to just tell my agent, go file me a claim for this or whatever it may be. I want to learn from the agent and then have the agent act and do things and be able to have it check all of my stuff and do things on external services. But it's my agent. But that means that Amazon now has to like, maybe it's just a.
Andy Pernsteiner [01:26:47]: It's just an MCP tool. Yeah, it's just an API interface. And, like, you get to see their website. Doesn't look amazing though, I gotta say.
Demetrios Brinkmann [01:26:57]: It's okay if that happens, but they don't want to do that. You know, maybe Amazon's a bad example, because they're going to throw billions of dollars at the gateway so that they're the number one provider, agent or tool, whatever it may be.
Andy Pernsteiner [01:27:11]: Well, you open up your phone, you're going to have like the top five apps on your phone, whatever that is. Right. For a lot of us, it's some kind of communication app. And you know, for others it might be, you know, something to either buy things or potentially, you know, like in, in OpenAI's world, you would just. Only just have chat GPT and you just do everything.
Demetrios Brinkmann [01:27:30]: Like.
Andy Pernsteiner [01:27:30]: Right, it was like that. Right. And. And I think I can see value from myself in having that life where I just open up my phone and I only have one thing I have to think about. But there's also a trust issue in terms of wanting to have a little bit more control over my experience.
Demetrios Brinkmann [01:27:46]: Well, that. And now all of a sudden I have to trust that the prompts that I'm putting in and everything that's going into that context window is A, being executed properly and B, like, what if some of that stuff that I want or that I put in the context window is sensitive? How do we know that that's not just like now floating around the Internet and we've got data brokers on my context windows?
Andy Pernsteiner [01:28:15]: I mean, you probably will, which is scary, right? So, I mean, the EULAS on these things are probably. I don't read them all.
Demetrios Brinkmann [01:28:24]: Nobody, let's be honest, nobody reads.
Andy Pernsteiner [01:28:27]: The longer you make them, the less they'll read them. The longer you make them, the more you're trying to hide in them.
Demetrios Brinkmann [01:28:30]: Like, it's just. Yeah.
Andy Pernsteiner [01:28:34]: Yeah. That's an interesting thought process though, because even we, like, like we offer sort of differentiating software and services, but ultimately the people who are consuming would just like to have a relatively simple API interface where they get the thing they want. Right?
Demetrios Brinkmann [01:28:51]: Yeah.
Andy Pernsteiner [01:28:52]: And how do we differentiate in that world? I think that having reliability and consistency, like, because that's like the definition of a tool to some extent. Right. It's hard to make a tool. Maybe it doesn't seem like it differentiates, but in some ways it kind of does. If you have a tool that you could always rely on, you're always going to use it.
Demetrios Brinkmann [01:29:11]: Yeah.
Andy Pernsteiner [01:29:12]: So I don't know, like, it's not as sexy sounding because, like, an agent seems like it's smarter and can do more. But ultimately, if you have a tool, if you do something really well and you can expose it in a way that's very consistent and reliable, that has value. Yeah.
Demetrios Brinkmann [01:29:29]: Well, then the. The thing is, is like, are the agents going to be able to place a value on that and go back to it more and more? Which maybe they are. I'm not arguing they aren't. Yeah, it was a little bit of a trip when I went to San Francisco and. And I told these folks who were creating a mcp, like, glossary or MCP registry is what it was. I was saying, hey, you guys should have user reviews of the different MCP servers. And they were like, we're going to have agents reviewing the MCP servers and saying how useful the documentation is from the agent angle. And I was like, you're kind of cosmic here on this agent reviewing the agent thing.
Demetrios Brinkmann [01:30:18]: But I could see it. And then you're like, well, then I'm gonna have to have my agent or my tool market to the agents. And I start going down that road and I'm like, this is now crazy.
Andy Pernsteiner [01:30:30]: Yeah, like, the. It's not even my head hurts. It just, like, gets in a place where I lose track of where I am in the universe. Like, it's just. How can you even comprehend it? You actually brought something up about privacy, though, which is really interesting in some ways, because I think a thing that people forget about when they think about scale is security and privacy. Like a lot of the clients that we talk with, it's again, you start with your laptop. You make some cool little graph rag thing on your laptop, and it works great for all your stuff. And you're like, now I want to roll it out to my whole team.
Andy Pernsteiner [01:31:04]: And then they're like, yeah, but I don't want you seeing my stuff. And then it gets stuck. And I think that that's A place where people have been stuck. And that's another area where we try to focus on, okay, how do we create an interface at every layer of infrastructure that ensures that you can have sort of lineage governance and authorization, not just for source data, but also for the derivatives. And then ultimately, if you're performing some kind of a rag search or the vectors lab so that it knows who you're supposed to be, and all of these other pieces, like, those are things that there's lots of different solutions for that are disparate from each other, but they're focused on one area or another. We're trying to kind of be. We're at the ground layer, let's say. And if we can enforce it all the way down so that individuals don't get access to other data regardless of how it came in, then we've gotten to a place where now you can scale this thing that you built.
Andy Pernsteiner [01:32:02]: Right. And have some level of security. Like, everybody's like, people like to talk to me about security, and I hate talking about security. I'll be honest. Here's the thing. If security is easy to implement, then more people do it. And so the requirements may be very difficult, but if you can. This is why I like to focus on the foundational layer and make sure that you don't have some other way in.
Andy Pernsteiner [01:32:28]: Because if you implement security to policy layer up here, but there's another way in to where the data is, then you haven't actually solved the problem. You only solve the problem for everyone who's going up here. Right. And so that's why I think that scale isn't just like, am I making or productionizing applications? Isn't just, can I make sure it operates if there's failure? Can it operate if it gets bigger and bigger with more users? Can it operate when there's more user communities where I can ensure that they're all safe? Right. That's like a hard thing to do.
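A small sketch of that enforce-it-at-the-ground-layer idea: every chunk carries the ACL of its source document, and the retrieval step filters on the caller's groups before anything can reach the model, so derivatives inherit the same authorization as the source. The data structures and scoring below are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Chunk:
    text: str
    source: str
    allowed_groups: Set[str]                     # ACL inherited from the source document
    embedding: List[float] = field(default_factory=list)

def similarity(a: List[float], b: List[float]) -> float:
    # Stand-in for a real vector-similarity computation (e.g. cosine).
    return sum(x * y for x, y in zip(a, b))

def retrieve(query_embedding: List[float], chunks: List[Chunk],
             user_groups: Set[str], k: int = 5) -> List[Chunk]:
    # Authorization is applied inside retrieval, not bolted on above it:
    # chunks the caller can't read never become candidates, so they can't
    # leak into the prompt no matter what the layers above do.
    visible = [c for c in chunks if c.allowed_groups & user_groups]
    visible.sort(key=lambda c: similarity(query_embedding, c.embedding), reverse=True)
    return visible[:k]

docs = [
    Chunk("Q3 revenue draft", "finance/q3.docx", {"finance"}, [0.9, 0.1]),
    Chunk("Public product FAQ", "web/faq.md", {"everyone"}, [0.8, 0.2]),
]
print(retrieve([1.0, 0.0], docs, user_groups={"everyone"}))  # only the FAQ comes back
```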
Demetrios Brinkmann [01:33:00]: It's like when Germany invaded France in World War II, you know, and France was like, we've got you. We're never letting you come an inch into our territory ever again. And they built all these barricades. And then Germany's like, well, I guess we could just go through Belgium.
Andy Pernsteiner [01:33:19]: Yeah. I mean, wow, okay. Down the war path. It.

