How Universal Resource Management Transforms AI Infrastructure Economics
Speakers

Wilder Lopes is a second-time founder, developer, and research engineer focused on building practical infrastructure for developers. He is currently building Ogre.run, an AI agent designed to solve code reproducibility.
Ogre enables developers to package source code into fully reproducible environments in seconds. Unlike traditional tools that require extensive manual setup, Ogre uses AI to analyze codebases and automatically generate the artifacts needed to make code run reliably on any machine. The result is faster development workflows and applications that work out of the box, anywhere.

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.
SUMMARY
Enterprise organizations face a critical paradox in AI deployment: while 52% struggle to access needed GPU resources with 6-12 month waitlists, 83% of existing CPU capacity sits idle. This talk introduces an approach to AI infrastructure optimization through universal resource management that reshapes applications to run efficiently on any available hardware—CPUs, GPUs, or accelerators.
We explore how code reshaping technology can unlock the untapped potential of enterprise computing infrastructure, enabling organizations to serve 2-3x more workloads while dramatically reducing dependency on scarce GPU resources. The presentation demonstrates why CPUs often outperform GPUs for memory-intensive AI workloads, offering superior cost-effectiveness and immediate availability without architectural complexity.
TRANSCRIPT
Wilder Lopes [00:00:00]: People in Africa, for example, they are trying to get second hand data centers from banks in order to build AI data centers. So you can only imagine, like what is a second hand data center that was used in a bank, by a bank for over 10 years. What does it look like? You know, can you even program that thing?
Demetrios Brinkmann [00:00:27]: I'm sure you get this all the time being in Oklahoma. People mispronounce your name, I guess, left and right. They. Yeah, and it's spelled so perfectly, it's like Wilder. You are the wilder one, but you pronounce it differently, right?
Wilder Lopes [00:00:44]: Yes. We say Welder in Brazil, but you can call me Will.
Demetrios Brinkmann [00:00:49]: Yeah, all right. Will, or Will the Wild. I'm excited to talk with you about this whole generation of hardware and accelerators that is being almost forgotten about, because we think that we need the newest H100, B100, insert whatever the top-notch hardware is from Nvidia that just came out. And you're taking a bit of a different approach. Can you explain this approach to me?
Wilder Lopes [00:01:28]: Sure. So the basic idea here is that we already have enough compute power outside of Nvidia GPUs to run a lot of these AI workloads. So what is this compute power? This is basically CPUs and other accelerators that people are not using. And we need to understand that three or four years ago, the situation changed from prioritizing training to prioritizing inference. And this is a huge mentality change. In my first company, which was founded in 2018, we were just focusing on training. How can we make training better? How can we make that more efficient? And the answer was usually either making better algorithms for the best hardware or, no, just buy more hardware. And this usually meant Nvidia GPUs. Great.
Wilder Lopes [00:02:31]: And then comes 2021, 2022, and we see inference starting to become the main workload. And then GPUs, the way they were made before, were not the silver bullet anymore. So now you see a lot of companies run a lot of AI workloads, inference workloads, using just a machine with CPUs. And how is this possible? Well, there are many things in there that are maybe too technical for this conversation, but that's the kind of understanding that I think a lot of people forgot, because hardware became so cheap up to 2021 that, okay, let's just buy more hardware. But now we don't even have the resources to make the hardware that we need. You know, we have to wait for six months. So are we going to wait six months to get the best hardware in the world, or are we going to start doing work right now? And that's the mentality that I think we need to readopt, because we forgot about that.
Demetrios Brinkmann [00:03:40]: Well, so I know that there's been a ton of work that happens around things like llama.cpp, or CCP, I can't remember which one it is, I always get it confused. And there are ways that you can obviously distill models or make them smaller. But our first instinct is, let's try and get the biggest model on the biggest machine possible. Can you just talk to me about what the shift is like if we're going to be running it on CPUs or on a bit older hardware? How do you think about that?
Wilder Lopes [00:04:24]: So with the GPU approach, you have to consider that the models are very big, but the data is even bigger. So what you're trying to do is get a bunch of GPUs together to work as one, and then they're going to have this pool of memory where you're going to be able to fit the model and the data. Great. But doing this is very, very expensive, especially because those GPUs have a lot of computing power embedded in them. When you go to inference, the main roadblock is not the compute, it's memory. So if you're thinking about creating or designing a system for inference only, then you want to prioritize your resources to get as much memory as possible while still keeping good compute power. But you don't need to have the best in the world. With this, if you understand this, you can see that systems today that are equipped with CPUs only can benefit from memory expansion in order to run inference.
Wilder Lopes [00:05:41]: And there is a technology that has been in the market for a while now called CXL, Compute Express Link, which is basically RAM on steroids. And a few key companies around the world sell that, like for example Micron here in the US. These are chips that basically only people who deal with data centers know about; if you're just an AI guy, or if you're working in other areas, you've probably never even heard of it. But this can be really helpful in order to expand the RAM, the memory capabilities, of a server. So there are companies, as I already mentioned, that are selling just an expansion card, and then you can get a server that already has a good amount of RAM and expand that even more, enabling you to load models and load data on the go. This way you don't need to spend hundreds of thousands of dollars getting a GPU system that will have a lot more compute power than you need just because you are after the RAM that it carries. So that's the idea.
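To make the memory-bound point above concrete, here is a minimal back-of-the-envelope sketch in Python. The bandwidth and model-size numbers are illustrative assumptions, not figures from the conversation: for single-stream decoding, every generated token has to stream roughly all of the model's weights through memory, so throughput is bounded by memory bandwidth divided by model size rather than by raw compute.

# Rough, illustrative estimate of why LLM decoding is memory-bound.
# All numbers are assumptions chosen for the example.
def max_tokens_per_sec(params_billion: float,
                       bytes_per_param: float,
                       mem_bandwidth_gb_s: float) -> float:
    """Upper bound on single-stream decode speed: each new token must
    read (roughly) all weights from memory once."""
    model_bytes = params_billion * 1e9 * bytes_per_param
    return (mem_bandwidth_gb_s * 1e9) / model_bytes

# A 70B-parameter model at ~1 byte per weight (8-bit quantization), on:
# - a dual-socket CPU server with ~400 GB/s DRAM bandwidth (assumed)
# - a single GPU with ~2000 GB/s HBM bandwidth (assumed)
for name, bw in [("CPU server, DDR5", 400), ("GPU, HBM", 2000)]:
    print(f"{name}: ~{max_tokens_per_sec(70, 1.0, bw):.1f} tokens/s ceiling")

The ceiling scales with memory bandwidth and capacity, not FLOPS, which is why adding memory to a CPU server (for example with CXL expansion modules) can be enough to host a model that would not otherwise fit, even if the per-token speed stays below what HBM delivers.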
Wilder Lopes [00:06:56]: So there are technologies that are getting more and more popularized, such as CXL, and you will see a lot more now because the price of RAM has been increasing a lot since December 2025. So you will see a lot more people trying to leverage CXL.
Demetrios Brinkmann [00:07:16]: What are the trade-offs that you have to be thinking about when you are doing something like that, like leveraging CXL?
Wilder Lopes [00:07:25]: I wouldn't say CXL per se. I mean, there is an issue with compatibility here, because not all servers are CXL compatible, but this is going to change quickly. I think the main trade-off here is going to be tooling. As I said, people are so focused on GPUs, and the tools work so well, that they don't even want to, or can't, take the time to do the transition: get all the tooling that works out of the box with GPUs to work on other chips or CPUs.
Demetrios Brinkmann [00:08:07]: By tooling you mean like Slurm and Kubernetes or what kind of tooling are you talking about?
Wilder Lopes [00:08:12]: Slurm, Kubernetes, and mainly the lower-level tools such as CUDA. For example, how do you program a GPU? How do you program all those advanced chips? If you compare CUDA side by side with another tool, I don't know, HIP for example, from AMD, CUDA seems to be way more developer-centric, or developer-friendly at least, for the AI crowd. If you are a data center guy, you will see, okay, I can handle that. But this is the main trade-off. And that's exactly one of the points where I've been working with a few leads and clients.
Demetrios Brinkmann [00:08:59]: Yeah, because a lot of times, especially for the folks that are doing AI workloads, all of the CUDA stuff is abstracted away. And sometimes you're looking at managed Slurm and managed Kubernetes; you don't have to be thinking about that layer either. You're just doing your stuff and it works. And so it feels like what you're saying is there is this issue: we've got all this hardware, we can leverage this hardware that we haven't been thinking about, but we need to make it more developer friendly, so that you don't have to be thinking, oh, I was proficient in CUDA, or maybe I didn't even think about CUDA, and now I've got to learn some other random thing because this doesn't even run with CUDA.
Wilder Lopes [00:10:01]: Right. And you know, I'm all for abstraction. However, abstractions are great if you live in an ideal world. If we were able to get all the compute power we needed from Nvidia, and if we were able to fabricate that, yeah, sure. But that's not how the world works. And on top of that, there are different applications, and companies have already invested hundreds of millions of dollars, even billions of dollars, buying infrastructure that wasn't AI friendly. And now they are asking people like me, hey, how can I make this thing that I already have work for AI? And I mean, there are many ways you can answer this question. And that's a real need.
Wilder Lopes [00:10:59]: You're just, you're not going to just throw that hardware away. I mean, that's a waste.
Demetrios Brinkmann [00:11:04]: Yeah. So you're almost like dragging forward this expense that companies have had. You're bringing it into the new age of AI.
Wilder Lopes [00:11:16]: Right, right. One very common use case, and you will see it more and more in 2026, is what we call the conversion of data centers that were previously dedicated to bitcoin mining into AI powerhouses. There's a lot of money in that, from private equity and venture capitalists who are investors in data centers that are not profitable anymore today, because bitcoin mining is not profitable. But they do have the compute power sitting there, and they have the contracts for power, for electricity. So with those two things combined, if you have the tools and knowledge to convert that to be AI friendly, then you already have a data center ready to go to run AI, at least inference.
Demetrios Brinkmann [00:12:14]: And coming back to the developer who is just trying to run an LLM on a machine that isn't necessarily a GPU: where and how should they be thinking through the best ways to go about that? I still keep running through my head, should we be thinking about smaller models, should we be thinking about a certain shape of AI workload, or is it that you can kind of retrofit any type of hardware to the needs that we have?
Wilder Lopes [00:12:59]: Right, so let's start with the model size. There is a lot of interest in smaller models, and I think that's the way to go for most use cases. I mean, many people wrote papers proving that small models perform even better than bigger ones for a very self-contained use case. So I think that's the way to go. I see a lot of developers going in this direction and yeah, that's great. The thing is, when you start to scale, even if it's a small model, you still have that data sitting there. You still want to process that data, right? And you want to process it fast.
Wilder Lopes [00:13:40]: If you continue to use the same tools, or if you continue to adopt the same hardware that we use today for bigger models, you're not going to get that, because first, you're working with a small use case, so you probably don't have the money to spend on the latest and greatest hardware. Second, even if you had that thing, it would be overkill. So how can you solve this equation of the relevance and the size of your problem being compatible with the compute power that you have? And the compute power that we have as a society is already great. I mean, think about it: when was the last time that you considered changing your personal computer, getting a new one? The computer I'm talking to you from right now is three years old, and I'm a computer guy. I could easily be buying the latest and greatest and I just don't need to; I can extract maximum performance from an older machine. This philosophy, I think, will be necessary for developers to continue to move forward. Because again, the supply chain is a problem. Even if you had the money, you couldn't buy the latest and greatest today.
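As a concrete illustration of the use-what-you-have idea, a small quantized model can be served on a plain CPU machine with llama.cpp, which Demetrios mentioned earlier. This is only a sketch: it assumes the llama-cpp-python bindings are installed and that a quantized GGUF file has already been downloaded (the path below is a placeholder).

# Sketch: run a small quantized model on CPU only, via llama-cpp-python.
# Assumes: pip install llama-cpp-python, plus a local GGUF file (placeholder path).
from llama_cpp import Llama

llm = Llama(
    model_path="./models/small-model-q4.gguf",  # placeholder: any quantized GGUF
    n_ctx=4096,   # context window
    n_threads=8,  # match to the physical cores actually available
)

out = llm(
    "Explain in two sentences why LLM inference is often memory-bound.",
    max_tokens=128,
)
print(out["choices"][0]["text"])

Nothing here requires a GPU; the trade is latency and context size against hardware cost and availability.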
Demetrios Brinkmann [00:15:13]: So there's the small model revolution that we're seeing. And I also wonder about the idea of coding agents and the shape of that workload. And if you have any thoughts there.
Wilder Lopes [00:15:31]: I've been working a lot with agents from a user standpoint. I use tens of agents to do several pieces of work in parallel for me. And what is great, I think, about the way agents work is that, from a developer perspective, I give them very well-defined tasks, meaning that they don't diverge, they don't get out of their lane. This reduces the computational need. So you can do the work of one agent in a very small, containerized, so to speak, hardware and software space. And if we scale our work to use hundreds of thousands of agents, then we can have smaller machines, or hundreds of thousands of smaller machines, providing the infrastructure for those little guys. So again, I see that you don't need the latest and greatest to run agents. That's the way I think.
Wilder Lopes [00:16:39]: This is very important to emphasize, because when I talk to people who are not technical, they are CFOs, they are CEOs who are just trying to go into the AI era, they always ask me, hey, do I need to buy a GPU server? No, you can start with this regular Dell server that you already have in your office. And this is going to be more and more evident as people start to embed AI agents within their software, SaaS software using a lot of this, and also, indeed, developer workloads.
Demetrios Brinkmann [00:17:23]: Yeah, it's almost like, how far can you get with just the consumer hardware that you have? And once you've maxed that out, then you can upgrade to the pro version and really see what is needed. But I appreciate your vision a lot, because it's kind of saying, hey, let's take a step back and let's leverage what we already have. And for a lot of the workloads where we instantly just default to, oh, let me go grab some GPUs off of wherever it may be, like the newest neocloud, we probably don't need that in the way we think we do. And so getting creative on how we use the hardware that we have is fascinating. And it makes me think: up until now we've been talking about how to leverage hardware that you've potentially already bought, but then you can retrofit it, you can add to it, almost like Pimp My Ride for my old hardware.
Wilder Lopes [00:18:35]: Right.
Demetrios Brinkmann [00:18:37]: I feel like there's also a world where we're going to start seeing these offerings happening, just like we have the neoclouds that are offering GPU time. Why not? It probably is already happening and I just haven't seen it as much. But why not have the, hey, this is a really cheap option for you and it gets the job done?
Wilder Lopes [00:19:05]: I think this will happen. People are already starting to change that, because in a sense, cloud providers have done this for over a decade now. If you go to AWS and you try to get a machine there, what you're getting is a virtual CPU. It's basically a virtual thing; you're not getting the whole hardware. So what they're doing is basically proving that you can run a lot of those tasks using less hardware. And we've done that for over a decade. Obviously they charge a lot for that, and I believe it can be way cheaper. So people are waking up to this problem; they've perceived that they're not going to be able to buy the latest and greatest from Nvidia.
Wilder Lopes [00:19:56]: So how can they better use the machines that are in the cloud? The way to do this is to change the way you deploy stuff. With the Nvidia tools, with the AMD tools, it is somewhat easy; everything is abstracted, as you said. You are looking at Python code and everything is done in the back end. For the other chips, and here I'm including as well the new chips that are arriving on the market, we can talk about that later on. One of the biggest attractions for developers when they see a new chip is, oh, I can do the same task, it's much cheaper and it's going to be faster.
Wilder Lopes [00:20:44]: However, I need to learn an entirely new language, and no professional developer, or at least very few, will do that. They don't have the time. They just don't have the time. They're excited about it, but they need to get work done. So at this point, I see that there is a huge gap in terms of infrastructure to help those guys. How can we help them do this transition as easily as possible, as smoothly as possible? There aren't enough tools out there.
Demetrios Brinkmann [00:21:22]: Yeah, the idea of learning a new language is great in theory, but then I guess what is probably happening now is that you'll just send Claude Code over to learn the language for you and hope everything goes well.
Wilder Lopes [00:21:40]: Right? Yeah, Claude Code is a very good window into this world, because it might already have some knowledge about this, either a language per se or a framework that's so obscure that almost nobody knows it, and it can give you a nice little platform to build your application on. As things get better or faster, then we're going to be able to just put some agents in charge of doing this and just get the results. But again, when you're talking about enterprise and really production-grade software, we are not at the point right now where we can leave agents by themselves to do that. So we still need to give the tools to the human developer so they can learn and assess whether what the agents are doing is correct.
Demetrios Brinkmann [00:22:49]: Yeah. The phrase that comes to my mind is, hope is not a strategy. If you're just throwing a bunch of coding agents at it, that is literally the definition of hoping everything is going to work out, right? The other thing that I was thinking about too is, talk to me more about the other chips, because I think we hear a lot and we see a lot, especially now. Recently Groq got semi-bought by Nvidia. And then you have, what is it, Cerebras is another new chip. There are all of these chips that are coming out and saying, for inference, we're faster and we are going to be a better bet.
Demetrios Brinkmann [00:23:38]: Cheaper, faster, whatever. Better.
Wilder Lopes [00:23:42]: Right. There were multiple bets in the last, I would say, five years, seven years. And let's go through their perspectives. A lot of them were about saving energy: let's make a very efficient chip that can do the same job for a fraction of the power consumption. To my knowledge, most of those companies have failed, because unfortunately in this area, nobody's thinking about saving energy. That's not a first priority. A lot of those companies had amazing chips, and they just weren't picked up because people don't care about that feature specifically.
Wilder Lopes [00:24:30]: And again, the developers needed to learn something completely new. The technology is still there, can still be useful, but it didn't become mainstream. The other perspective was about how can you do the math inside a chip in a completely different way? And I was involved with that in my very first company. We were doing some stuff. We were starting to make a chip in order to do the mathematical operations in a different way. Not based on linear algebra, based on something else. And this kind of stuff is also very, very, very interesting. And people get excited about it.
Wilder Lopes [00:25:14]: But then you hit the wall of fabrication. There are so many problems that need to be solved at the hardware level that it becomes impossible. You can't even model that using a regular FPGA, because the benefits that come from those new mathematical strategies, you cannot even prove them by physically implementing an FPGA; you would have to do a new FPGA or a new chip completely from scratch just to prove that it works. And as you know, this is very expensive, so you need to have hundreds of millions of dollars.
Demetrios Brinkmann [00:25:53]: And this is on the level of, like, the silicon or the wafers that you're creating.
Wilder Lopes [00:26:00]: Not only the silicon, the actual architecture and how you build it, but also the microcode, which is, think of it like a firmware for the chip. So the operations are done differently there. The technology still exists, it's still exciting, but it didn't find a killer application. What I see today, especially in the last two years, becoming more and more mainstream, is still coming from this idea that by changing the math you can do something better. There are companies like, I think, Tensordyne, they're actually from Germany, headquartered in Munich, and what they're doing is great: basically they're using logarithmic arithmetic in order to do AI computation. Because in AI, I'm sure you know this, you don't need huge precision in order to calculate the weights. So you can do that with much lower precision, and when you adopt logarithmic arithmetic, you can decrease it even further.
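The logarithmic idea can be shown in a few lines. In a log-number system, a multiplication in the linear domain becomes an addition of log values, which is far cheaper in hardware. The sketch below is a toy illustration of that trade only; it is not a description of Tensordyne's actual design.

# Toy illustration of log-domain arithmetic (not any vendor's real scheme):
# store positive values as log2, so multiplication turns into addition.
import math

def to_log(x: float) -> float:
    return math.log2(x)

def log_mul(lx: float, ly: float) -> float:
    # Multiplication in the linear domain is addition in the log domain.
    return lx + ly

a, b = 3.7, 0.42
approx = 2 ** log_mul(to_log(a), to_log(b))
print(f"exact {a * b:.6f} vs log-domain {approx:.6f}")

# Real log-number-system hardware quantizes the log values to a few bits,
# trading a little precision (usually fine for neural-net weights) for
# multipliers that are just adders.
q = round(to_log(a) * 16) / 16 + round(to_log(b) * 16) / 16
print(f"with 4 fractional bits: {2 ** q:.6f}")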
Wilder Lopes [00:27:15]: So I recommend you have a look at those guys. I think it's great what they're doing. I don't think they have a chip yet, but that's something; I've been in touch with them, I wanted to test their chip. And those are basically the approaches. And I think, again, just to summarize: chips that have prioritized saving energy have failed because nobody cares about it, unfortunately. The ones who are probably going to win are the ones who really focus on one thing only, which is, in our case, AI inference. How can we make the best inference chip in the world? And that's what Groq was trying to do. Groq is amazing, what they were doing.
Wilder Lopes [00:28:01]: Every chip has a ridiculously small amount of memory, so they needed to put thousands of chips together just to run one. I don't know, okay, don't quote me on that, I don't know if it's hundreds or thousands, but just to run one LLM. But it was fast. Very, very fast. And that's why even Nvidia had to go out and shop for them; they had to admit to the world, yeah, our technology cannot do that.
Demetrios Brinkmann [00:28:33]: And do you see fundamentally different approaches coming forward? On top of this, do you feel like there's still room for innovation in that regard?
Wilder Lopes [00:28:48]: Yeah, but it's not in silicon. I think in photonics; there are a few photonic chip startups coming out right now and they have some impressive benchmarks. With photonics, you're doing computation straight with light, and when you do that, you can get rid of a lot of the limitations that exist in the silicon world. But what's probably going to happen is to have a photonic chip working together with a CPU, like it always is, right? There is an accelerator that communicates with a CPU. I don't have hands-on experience with photonic chips, but from the conferences I've been attending and the people I've been talking to, it is very exciting. Another thing that I've been seeing as well is a revival in HPC. AI has become so big that it started to incorporate HPC into it. And it's great.
Wilder Lopes [00:30:02]: AI can solve a lot of the traditional HPC problems much, much faster. But there are still problems that cannot be solved by AI, at least the AI that we have today, and we still need this high precision, like 64-bit. So I see companies coming out with new CPUs and GPUs that are focused on high precision, like 64-bit, and I see that there is a lot of space for this right now as well. Because Nvidia has, I don't want to say abandoned it, but they focus so much on AI that their chips became way too expensive for HPC. So if you are a regular HPC shop, you probably wouldn't be doing this with Nvidia anymore. You can do this with other chips that will give you the precision that you need and are going to be much cheaper.
Demetrios Brinkmann [00:31:00]: And can you talk to me a bit about how you've seen the best hardware combinations as far as maybe it's GPUs working with CPUs and what that looks like?
Wilder Lopes [00:31:14]: Well, in terms of architecture, I think each company or each team is different, because they have different problems. I have a ton of experience with the traditional CPU plus some accelerator, and this could be an Nvidia GPU, an AMD GPU, or even FPGAs. To make those guys communicate with each other, traditionally you have to use PCIe, and there have been, throughout the years, a bunch of upgrades to the PCIe protocol to be able to exchange as much data as possible. And this is great for this vision of, let's use the compute power that you already have, because if you can update PCIe to exchange as much data as possible, then you're going to have a fairly fast machine to perform your AI workloads. Now, if you're building something like Stargate from OpenAI, you need a completely different protocol to communicate data. How can you make those hundreds of thousands of GPUs perform as one? That's why Nvidia has NVLink and other things that are so specific, because those are the extreme use cases.
Wilder Lopes [00:32:50]: And it's great. I think that's the way to go. But that's one company; 99% of other companies don't need that thing for their use case. So how can we still use or reuse the protocols that we are used to, like PCIe, for example? And I see that this continues to be very strong. I went to Supercomputing last year, was it November, December? I don't remember. And people continue to push the limits of this thing, and there is so much coming out from this area.
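To put the PCIe-generation point in rough numbers, here is a small sketch of how long it takes just to stage a model's weights from host memory to an accelerator over different PCIe x16 generations. The peak per-direction bandwidths are approximate and ignore protocol overhead; the model size is an assumption chosen for the example.

# Rough time to copy model weights host -> accelerator over PCIe x16.
# Peak per-direction bandwidths are approximate; overhead is ignored.
PCIE_X16_GB_S = {"PCIe 3.0 x16": 16, "PCIe 4.0 x16": 32, "PCIe 5.0 x16": 64}

model_gb = 14.0  # e.g. a ~7B-parameter model in 16-bit weights (assumption)

for gen, bw in PCIE_X16_GB_S.items():
    print(f"{gen}: ~{model_gb / bw:.2f} s to stage {model_gb:.0f} GB of weights")

For a single box serving inference, that one-time staging cost is tolerable, which is part of why commodity PCIe systems stay viable; it is only when hundreds of thousands of GPUs have to act as one that fabrics like NVLink become unavoidable.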
Demetrios Brinkmann [00:33:33]: And with PCIe, is it possible to mix and match different GPUs?
Wilder Lopes [00:33:41]: Yes, yes, you can do that. I used to have a machine where I was running different GPUs from different vendors and it works great. It's always about the work you're willing to put in in order to make those tools work. We've done this so much. I personally have done this so much that we started to create a framework on how to do this. And that's what we are trying to deliver to our clients today. So how can you mix and match, have a heterogeneous system that works for you?
Demetrios Brinkmann [00:34:20]: Yeah. Talk to me more about what you're working on and what you're building, because I find that fascinating too.
Wilder Lopes [00:34:27]: We are a DevTools company, so we are developer first. We are trying to help the developer have an easier life when deploying AI. What we build is a tool that's based on, we call it, a transpiler. We assume that the world speaks CUDA, the world speaks Nvidia, but we still need to speak other languages because we have different applications. So how can we have an instantaneous translator from the CUDA world to another world? That's what we're trying to do here: a plug-and-play solution, a transpiler that is command line, very low level. But for the developer who is doing this every day, it's not difficult to use.
Wilder Lopes [00:35:11]: So you just need to install the transpiler, and then you go inside the folder you're working in, the folder that contains your CUDA code, and this will convert that CUDA code to the computing power that you have on a given computer. Let's say that on your laptop you developed everything to run on an Nvidia-capable server, but then the server that's available to you, which will be much cheaper, only has AMD GPUs. How do you do that? You won't be able to just rewrite everything from scratch; those are different animals. So how can you quickly move from one world to another? That's what we are working on today. We have our first MVP, and we are working on a use case, specifically the use case I already mentioned, which is helping the previously bitcoin-centric data centers to transition into AI. But yeah, there is so much more to be done, because it's very low-level work.
Wilder Lopes [00:36:19]: We are not even touching Python. We are doing a lot of stuff with C and using the LLVM framework to break things down and see how the intermediate representation works for several different types of workloads. It's not easy, and it's not supposed to be easy, otherwise everybody could do it. We are trying to make it easier for people to use, because we believe in a world where CUDA will be a lingua franca, but there are going to be huge communities that don't speak CUDA, and we want to enable those developers to tap into those communities. And by community I mean computing power. Hardware.
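To give a feel for what source-to-source translation looks like at the very simplest level, here is a toy sketch. It is not Ogre's transpiler and not AMD's hipify tooling; it only rewrites a handful of CUDA runtime calls to their HIP equivalents as strings, whereas the real work described above happens down at the LLVM intermediate-representation level.

# Toy CUDA -> HIP source rewriter (illustrative only; real tools operate on
# the compiler IR and handle far more than a name mapping).
import re

CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
}

def naive_transpile(cuda_source: str) -> str:
    # Longest names first, so e.g. cudaMemcpyHostToDevice is not split.
    keys = sorted(CUDA_TO_HIP, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: CUDA_TO_HIP[m.group(0)], cuda_source)

snippet = """
float *d_x;
cudaMalloc(&d_x, n * sizeof(float));
cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);
saxpy<<<blocks, threads>>>(n, 2.0f, d_x, d_y);  // launch syntax is unchanged in HIP
cudaDeviceSynchronize();
cudaFree(d_x);
"""
print(naive_transpile(snippet))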
Demetrios Brinkmann [00:37:06]: Yeah, hardware of course. And you want to stick to the software layer? Because I feel like there's an easy play for someone to say, look, I'm just going to go grab a lot of these abandoned hardware accelerators and create a misfit cloud, a neocloud, where you could have that as your value prop too.
Wilder Lopes [00:37:34]: Oh yeah, that's, that's our very next step because.
Demetrios Brinkmann [00:37:38]: Nice.
Wilder Lopes [00:37:38]: And this all depends on capital. We're at the very beginning of the company, but if things work out, one idea, and I dream about this every day, is to have, I don't know the best way to put this, a heterogeneous cloud. So we can offer you every chip that exists under the sun, and you as a developer have two options. You can give us the code or the binary and let us handle that for you, so we're going to optimize for the best hardware to run your code, or you can have the power to choose from all of those options. The second option already kind of exists in the traditional hyperscalers, but it's so difficult to use. If you're trying to use not CPUs but, for example, you're at the leading edge trying new ideas on hardware, you want to use an FPGA.
Wilder Lopes [00:38:44]: AWS offers FPGAs in the cloud. It is so complicated, even for somebody like me who is experienced with FPGAs, the whole tooling and everything, it is so difficult to make it work that I would love to have just a translator: hey, grab my code and just put it in there. Actually, this is something I'm working on; hopefully I'll be able to publish it in a couple of months. This is a small FPGA from this company, Rocketry. It's a very small FPGA, and I'm creating an example here to help AI developers see how they can make a very minimal accelerator.
Wilder Lopes [00:39:30]: So they plug this into their computer and then they can just play with moving operations, let's say a matrix multiplication, to this little guy here. The focus here is not to be fast; it's just to teach them, or give them more visibility on, what it takes to actually make an accelerator. And once you understand that, you can see the possibilities: oh, wow, I can do a lot if I have the chance to spend more time on the hardware side. And that's, I think, the main thing that we lost in the last 20 to 30 years as a community. Hardware has become so good that people abstracted everything; they just assumed it would be there and that they could just code something and it would work. And for the most part it did. Now we hit another wall and we need to go back to hardware. And I'm an electrical engineer by training.
Wilder Lopes [00:40:35]: I'm not a computer scientist, so I didn't start with coding. I started with actually designing these sorts of little boards. So I see the problems that we're facing today in the AI community through a different lens. I see this as really a hardware problem: understanding what you can do with as little hardware as possible.
Demetrios Brinkmann [00:41:01]: I do love the vision of bring your code to the neocloud, and I don't have to think about what kind of hardware I'm going to get. I just know I'm going to get a screaming deal, and I write it in whatever language I'm trying to write it in, and then you do all the hard work of figuring out how to make it work on all these random redheaded stepchildren of hardware.
Wilder Lopes [00:41:35]: Right? Exactly, exactly. And I think chip designers are not going to stop. Every week I see a new idea coming out, either as a spin-off from a lab or from somebody who is a veteran in the industry, who spent, I don't know, 10, 15 years at Intel and is now coming out with a new company that promises a new kind of chip. So this is not going to stop, but they are only going to be successful if the tools that connect those chips to the developers are there.
Demetrios Brinkmann [00:42:19]: A hundred percent. Otherwise you have this fragmented ecosystem, and folks aren't going to put in the time to learn how to leverage your accelerator. If it's like, yeah, I may encounter that in a random job, but let's be honest, I'm going to get way more bang for my buck if I learn CUDA, right?
Wilder Lopes [00:42:45]: And another aspect of that is, if you think about geopolitics, we are, I mean, you are in Germany, I'm in the US, relatively privileged countries when it comes to access to compute power. If you start talking to people in South America, in Africa, and I do, I'm from Brazil and I know people doing similar work in other parts of the world, you will see that people in Africa, for example, are trying to get second-hand data centers from banks in order to build AI data centers. So you can only imagine, what is a second-hand data center that was used by a bank for over 10 years? What does it look like? Can you even program that thing? Most programmers today can't. But that's the kind of material they have, and they're not stopping because of that. They know they can't get the Nvidia GPUs.
Wilder Lopes [00:43:51]: So these people have to outsmart their competitors in order to be able to do the same work with less hardware.
Demetrios Brinkmann [00:44:04]: So changing gears for a second: I know that you are in Oklahoma, and around there you've got a lot of oil men. I'm currently watching this series called Landman, and I feel like I am living vicariously through the oil men through that series. But that's your day-to-day, and I know that you have some thoughts on the middle of America and the AI revolution as they perceive it, versus the coastal cities and how they perceive it.
Wilder Lopes [00:44:45]: Right. So we, you and I, as I said, are used to the Silicon Valley mentality: very open, let's build the future. Here, I would say, it's not that they are not excited about the future. The problem is they are very risk averse, and that's most of the country, actually. I think the people in the US who are actually excited and building the future are in only two places: New York, and San Francisco and Silicon Valley. When you go everywhere else and you talk to people, they are excited, they see the opportunities, they just don't take the first step because there are many roadblocks. And this, in my opinion, comes from education and exposure. We simply don't have enough people with experience leaving those areas.
Wilder Lopes [00:45:46]: I came here because of family, but it's hard to find people to even talk about this on the street. Obviously after three years I found a little community here, but it's hard to find those people who have this experience in big tech or big enterprise and were in charge of making these decisions in the past. So you just don't have the culture. How do you change this? Because there is money here, there are use cases. It's a perfect place. We have energy, which is the commodity that you need for AI today. You have tons of energy. How do you change that? Well, you need first more people with the right background to be here.
Wilder Lopes [00:46:39]: We need to create more opportunities for people to leave the coast, and this is hard to do. You need to create more opportunities here for those people to meet; it's even hard to meet those people. We don't know we exist. I have to go out hunting here, like on LinkedIn: hey, what are you working on? And once you make these connections, then people get excited and start working on side projects, like what I'm doing today. But I don't have a perfect answer for that. I think if we as technologists are able to show them that we have a killer solution for the problems that they have here today, then they're going to be excited about it.
Wilder Lopes [00:47:24]: It's not about selling the technology; that doesn't help here. I can go to Silicon Valley tomorrow and start talking about this and everybody's going to be excited. Even as I get off the airplane and start talking to people: oh yeah, let's do something here. You cannot sell the technology. You can only sell a real, closed solution. And since AI is very experimental right now, we're probably not in the right phase to do this, but this is going to change, and I think 2026 is the year for that, because data centers are being built and this region is very attractive for that.
Demetrios Brinkmann [00:48:06]: Yeah, you got the energy.
