MLOps Community

The Missing Data Stack for Physical AI

Posted Jul 01, 2025
# Physical AI
# Robotics
# Rerun

SPEAKERS

Nikolaus West
CEO @ Rerun

Niko is a second-time founder and software engineer with a computer vision background from Stanford. He’s a fanatic about bringing great computer vision and robotics products to the physical world.

Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building Lego houses with his daughter.


SUMMARY

Nikolaus West, CEO of Rerun, breaks down the challenges and opportunities of physical AI—AI that interacts with the real world. He explains why traditional software falls short in dynamic environments and how visualization, adaptability, and better tooling are key to making robotics and spatial computing more practical.


TRANSCRIPT

Nikolaus West [00:00:00]: And it's not time as in, oh, this is annoying to wait for. It's that the world continues to do things independent of how fast or slow you are. If you're a slow person, that doesn't change the speed of the rest of the world. Right. And that's the same with a robot. Okay. So I'm Niko West, the CEO of Rerun, and as for my go-to coffee, it's pretty.

Nikolaus West [00:00:28]: It's filter coffee with milk. But I'm an enjoyer of many kinds of coffee.

Demetrios [00:00:35]: I think we should kick this off with an overview of your opinions on physical AI versus robotics. Because I've heard about robotics, and I've also heard from people who are doing robotics that the big letdown is there's not a lot of AI inside of robotics, at least at most robotics companies that you hear about.

Nikolaus West [00:00:58]: Sure. So I think physical AI as a term, I at least saw it popularized by Jensen quite recently, so it's obviously subject to a lot of the problems of hype. Right. But I was really happy when that happened, because we've been looking for a term that kind of encompasses products that apply AI, in the broadest sense, from classic algorithms that do intelligent stuff to really deep models, to the physical world, either to just analyze it or to do something in it. So I include intelligent, maybe somewhat autonomous robotics in that, but also spatial computing, and what we thought of as long-tail physical intelligence, like, I don't know, security applications.

Nikolaus West [00:02:00]: There's just a very large amount of stuff, I don't know, sports analytics, that you might want to do with intelligent software that's somehow interacting with the world. So I put, personally, or we put, all of those things into the bucket of physical AI.

Demetrios [00:02:16]: Like video games too, when you're grabbing all those sensors.

Nikolaus West [00:02:20]: Yeah. Well, for what Rerun is doing, I think it's very relevant, but maybe I don't put video games into physical AI, even though it's so close that you could talk about it. The other adjacent thing you could totally put in there is generative media. There are so many similar patterns in the software and the data and how you build these products that some aspects are like physical generative media, particularly if it's video trying to be more realistic or so on. That's close enough that you could, in some cases, talk about that too, but I think most people don't mean that when they say physical AI.

Nikolaus West [00:03:04]: So I think there's a span, from people meaning the stuff that I said, the broad sense, to some people just meaning robotics with a cool name. I think both of those happen. Yeah.

Demetrios [00:03:16]: Basically, if I'm grasping it correctly, it's AI that's out in the world, in physical space, almost in a way that we can touch it. It's tangible.

Nikolaus West [00:03:27]: Yeah. Like interacting with the real world.

Demetrios [00:03:31]: Mm.

Nikolaus West [00:03:31]: It's easy for us tech people to forget, but most of the world's GDP takes place in the physical world. And historically, software has really not participated in that, other than maybe administering stuff. I mean, you have software for maybe managing a doctor's appointment, but it's not a robot doctor. Right. You have maybe software for keeping track of a building, like the schedule of construction or something. But.

Demetrios [00:04:06]: Yeah. Or sending the invoices.

Nikolaus West [00:04:07]: Exactly. But you don't have software tech that is just doing all the construction. So I think that's what we have in front of us, that's really happening right now and becoming possible. In that broad sense, I think physical AI is set up to transform huge, huge parts of the economy. So I at least believe it has the potential, and it looks like it's going to do it, to be one of the biggest changes to the world economy in history.

Demetrios [00:04:40]: Now, I happen to agree with you, but I also want to raise a point, which is we've been hearing that same thing from those IoT folks for the past two decades, and I still have not seen IoT totally transform the way we live. Right. Maybe there's cool stuff that you have with smart homes, if you're really into that. Or, I noticed that certain parking garages have sensors for whether there are free parking spots. None of those I would bucket into life-transformational.

Nikolaus West [00:05:16]: Got it. Yeah. I'm not going to defend IoT hype. I never understood it personally. But certainly, with anything new that hasn't happened yet: it hasn't happened yet, so it could still not happen.

Nikolaus West [00:05:27]: Right.

Demetrios [00:05:28]: Yeah.

Nikolaus West [00:05:28]: So it comes down to belief, I guess. The thing I maybe never understood with IoT was that it sounded like a lot of people who like tech talking about, what is it? It's like, yeah, you're connecting everything. But that's not in itself solving a problem. It may be used to solve one, but it's not actually describing something that you could do. Whereas performing work in the real world, or automatically understanding what's going on in the real world, that's work; that's how you unlock value for real people. I think that's pretty unambiguous. So in that sense it is pretty different. But certainly it depends on whether the technology works and whether it can be brought to market in an effective way. But I think it's pretty.

Nikolaus West [00:06:18]: These categories are very different in those ways.

Demetrios [00:06:21]: Why now? Why do you think that we are on the precipice of things changing?

Nikolaus West [00:06:30]: I think it's mainly the AI part of physical AI, right. And that's not to say that in all the great solutions, AI would be the most important thing. But the real world is super complex, with this sort of unending complexity. There's a long tail of things that can go wrong; the world is super messy. It's very hard to build super general products that serve very large markets when the software is not intelligent in that fuzzy way, not able to handle ambiguous, fuzzy situations where things change. Because with classic tech for the physical world, the way to make an effective product is to constrain the use case a lot,

Nikolaus West [00:07:30]: so that a person can write an algorithm that handles each situation.

Demetrios [00:07:36]: And this would be like what we see these days with the coffee making robots. I think that's the.

Nikolaus West [00:07:41]: Yeah, you just constrain it. Maybe it's a coffee-making robot, or something like one cell in a manufacturing line. It's super repeatable, but you constrain it really, really heavily. And if that thing is valuable enough, you can put all the effort into just making something for that. But a lot of the physical-world tasks out there are much messier than that. So basically you need something that's more flexible, more able to handle ambiguity, and that's really what the technology of modern ML and AI is about enabling. So I think that's one.

Nikolaus West [00:08:24]: And what that enables, then, is that if you can address a larger market, you can invest more into the hardware. The hardware is also important; it's not only AI. You need to be able to invest in the hardware as well to make it good, high quality but also low cost. That comes from scale. Hardware is a scale game. And when you have scale, you get lots of side benefits. Think of what happened with mobile phones: that generated a huge ecosystem that produces components, which you can then use to create reasonably priced, more niche products. So the mobile phone ecosystem drove the ability to make drones.

Nikolaus West [00:09:17]: Right. Then you can make a good, cheap drone because of that ecosystem.

Demetrios [00:09:26]: Basically, almost like you have this innovation and there are those secondary and tertiary effects.

Nikolaus West [00:09:34]: So I think that's really the big thing, that a couple of things need to come together. First, you need to have a technology that can handle this messiness, which enables you to build hardware products that serve much bigger markets, which enables you to invest heavily into those products, which gets you scale. And that kind of gets the scale flywheel going. And particularly for AI, you actually need that flywheel for data collection as well. For really good AI, you need lots of hardware to collect data to improve the models, which then allows you to deploy again and get better data, because now they're doing more advanced things. So you need that flywheel going, and you also need the scale flywheel, which is what leads to good hardware, effective hardware products at a good price. And to get that ball rolling, you also need hype.

Nikolaus West [00:10:32]: That's actually a really important component: you need to be able to believe in the future and invest deeply into it. I think the ChatGPT, LLM side of AI has provided that. That started it out and generated a lot of interest. And then, within the field of robotics, there have been some big breakthroughs in methods, like scalable robot learning methods, which has really been a dream for a while, is my understanding at least, but not the reality. Scalable here just means scalable in the same way you tend to talk about AI, right: you can throw more data and more compute at it and it gets better. We take that for granted now with LLMs, but that had not been the case in robotics forever.

Nikolaus West [00:11:23]: Right. But a couple of years ago there was the first line of papers that showed those properties, the RT-1, RT-2, RT-X line of papers. I don't know if you know them. There may have been something else that really started everything off, but that's what I saw from my perspective. And that kicked off: okay, we're seeing scalable methods here too now. So from my perspective, the combination of having seen that with LLMs and then seeing it also in robotics methods is what started to really, really get the ball rolling. And now the hype is very, very real in the space, and there's a huge, huge amount of investment.

Nikolaus West [00:12:06]: And particularly when hardware is involved, that's actually necessary. A long-winded answer to that question, but I think those are the reasons.

Demetrios [00:12:16]: For why it's now. Can you break down the life cycle of how physical AI is trained? Like, what models are we using? What are the ways we're collecting data? Is it all through cameras, or through other sensors? And how do the platforms look? What do you need to enable if you want to be putting these models out into the world? Because I think it has a lot of extra complexities, since you are deploying to the edge in a way. But I don't know how much of an edge deployment it is; can you then also offload certain tasks to the cloud? What does that whole thing look like? I feel like I am not clear, and it's of course case dependent. Maybe we can just take one specific case and talk through that.

Nikolaus West [00:13:11]: Sure. Oh yeah, it's super case dependent. These systems are so complex, and if you can imagine some setup, someone's doing it. But maybe, super high level, I like to think of the two major systems that you need to think about as the online systems and the offline systems. With online systems, we mean the things that are running, I'm going to say on a robot now, but it could be some non-robot thing, while the robot is doing stuff in the world.

Nikolaus West [00:13:47]: Technically it doesn't matter where it runs; it could hit an API that runs a model somewhere and comes back, so I include that. But mentally, you can think of what's running on the robot that's understanding the world, planning, making decisions, picking stuff up, acting, or whatever. Those are the online systems. And then you have your offline systems, where you're basically running stuff on your laptop or workbench or in some data center somewhere. That's going to be about observability, like what's happening right now with my fleet of robots.

Nikolaus West [00:14:28]: Maybe it's where you prototype new algorithms and new ideas, where you run analytics to understand performance or just dig into things, trying to understand the data that you're collecting, and where you collect, curate, and transform data through data pipelines into data that's ready for training, and then train and deploy, and all those things. So I put all of that in the bucket of offline systems.

Demetrios [00:14:59]: And how many, sorry, just in this fictional scenario, how many models typically would be running on device or online?

Nikolaus West [00:15:11]: Yeah, so that's a hard question to answer, but maybe we can take a bit of a historical perspective. Running on device, we're talking about the online systems. Classically, everything that was running online had no machine learning, or maybe you learned some classifier that did something or whatnot, but it's mostly handwritten stuff, acting with 3D planning algorithms and so on. It's all algorithms written in C by a robotics engineer or something, optimizing the state, doing SLAM, where is the robot, all that kind of stuff. That's how things were done before. And then deep learning happened, and you might start switching out small modules: oh, actually, our computer vision sort of works a little bit now, so maybe we just detect objects. But then it's just running at some frequency.

Nikolaus West [00:16:08]: You run one object detector and everything else is handwritten. Just a little object detector runs on, whatever, every fifth camera frame, and we're detecting things. But the rest of the pipeline treats that as nothing special; it's just some data, and we write algorithms to fuse it over time and reason about what to do and so on. So that's maybe the next step. And I put that into the.

Nikolaus West [00:16:37]: Let's see, when was AlexNet, 2012? Was it '14? I don't remember. But like that. Yeah, '12. So maybe in the 2018 kind of era.

Demetrios [00:16:47]: I like how I just said it with a ton of confidence. I'm going to fact check that right now. I was like, no, no, 12.

Nikolaus West [00:16:54]: Yeah, it's definitely 12.

Demetrios [00:16:55]: Yeah, no, I haven't.

Nikolaus West [00:16:56]: That feels right to me, but that's the range. So that works, and you add some more models in there, but it's still modular. You have one model, or you have many models, and maybe they each do single things. Maybe you have another model that looks at some other input signals or images and outputs an estimate of the motion or something like that. So you just have these small modules. It's more like a library.

Nikolaus West [00:17:27]: You can think of each one as just a function that does stuff. And there's been a trend of more and more of those things. Right? That has a lot of problems, because in fact you can't treat them as black boxes; there's a lot of uncertainty. ML models, even really high-performing ones, only work well when they're operating on roughly the same kind of data they were trained on. And that's hard, right? How do you know when you're outside of that data, and so on. So you get a lot of these models, stitch them together with handwritten algorithms, and it becomes a mess, and it's pretty hard to build complicated systems.
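
To make that modular pattern concrete, here is a minimal sketch (invented for illustration; every name in it is hypothetical): a learned detector runs on every fifth camera frame, while handwritten code fuses the detections over time and plans on every frame.

```python
import time
from collections import Counter, deque

DETECTOR_STRIDE = 5  # run the learned detector on every 5th camera frame


def run_detector(frame):
    """Stand-in for a learned object detector; a real one returns boxes too."""
    return ["cup"] if frame["has_cup"] else []


class DetectionFuser:
    """Handwritten temporal fusion: trust labels seen in most recent results."""

    def __init__(self, horizon=4):
        self.history = deque(maxlen=horizon)

    def update(self, labels, stamped_at):
        self.history.append((stamped_at, labels))

    def current_estimate(self):
        counts = Counter(l for _, labels in self.history for l in labels)
        return [label for label, n in counts.items() if n > len(self.history) / 2]


def plan_and_act(visible_objects):
    """Stand-in for the handwritten planner, which runs on every frame."""
    print("planner sees:", visible_objects)


fuser = DetectionFuser()
frames = [{"has_cup": i % 7 != 0} for i in range(30)]  # fake camera feed
for i, frame in enumerate(frames):
    if i % DETECTOR_STRIDE == 0:  # the learned module runs at a lower rate
        fuser.update(run_detector(frame), stamped_at=time.monotonic())
    plan_and_act(fuser.current_estimate())  # classic code runs at full rate
```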

Nikolaus West [00:18:11]: And I think this is how people tried to build self-driving cars with this approach, and, well, it didn't really work, right? So the trend from there, I mean, the idea of deep learning is to do things end to end, and that has been happening more and more. Over time you say, okay, now we have four modules; can we swap them all out and have one neural net do more things end to end? That's generally been the trend. And it can go quite extreme.

Nikolaus West [00:18:46]: In some of the very end-to-end-focused humanoid projects, you could have two neural nets, or maybe one; they call it one, but it's really two. You might have one lower-level net, faster and smaller, focused on fast, low-level whole-body control. It's taking IMU signals, and maybe pressure and some other sensors like that, plus some target of what the body pose should be, and it's basically doing the control that you might previously have done with more classic optimization-based methods. Then you have some larger neural net at a higher level that maybe knows skills, like go reach for this thing, and that one can be slower. Maybe you even have a third level above that, one that takes text input and plans and things like that. That's if you're very, very AI-first. But you could swap out pieces of that with handwritten systems and so on.

Nikolaus West [00:19:52]: But yeah, I haven't seen a lot of single-neural-net things that do everything. I've seen that marketed, but I don't know if it happens in practice.
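
As a rough sketch of that layered setup, here is a toy two-rate control loop. The rates, the sine-wave target, and both controller stubs are invented stand-ins for the real neural nets he describes.

```python
import math

FAST_HZ = 500   # low-level whole-body controller rate (illustrative)
SLOW_HZ = 10    # higher-level skill policy rate (illustrative)
STEPS = 2000    # simulate four seconds of control


def skill_policy(t):
    """Stand-in for the larger, slower net: emits a body-pose target."""
    return math.sin(t)  # pretend target, e.g. a reach height


def whole_body_controller(target, imu_sample, state):
    """Stand-in for the small, fast net (or a classic optimization-based
    controller): tracks the current target using the latest IMU sample."""
    return state + 0.05 * (target - state) - 0.01 * imu_sample


state, target = 0.0, 0.0
for step in range(STEPS):
    t = step / FAST_HZ
    if step % (FAST_HZ // SLOW_HZ) == 0:
        target = skill_policy(t)  # slow loop: a new target every 100 ms
    imu_sample = 0.0              # placeholder for a 500 Hz IMU reading
    state = whole_body_controller(target, imu_sample, state)  # fast loop

print(f"final state: {state:.3f}")
```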

Demetrios [00:20:05]: You bring up a great point, which is that in these systems, especially the online ones, you are so constrained by different things because you're out in the world, whether that's having to be hyper-focused on battery or on speed. Nobody wants a robot that you tell to do something, and then 20 minutes later it comes back and says: actually, I can't do that. I went through and planned it out, I researched the topic, and I can't get to it. Right? So what are some other constraints or things you need to be cognizant of when you're doing stuff in that realm?

Nikolaus West [00:20:49]: I think the most important difference is time. And it's not time as in, oh, this is annoying to wait for. It's that the world continues to do things independent of how fast or slow you are. If you're a slow person, that doesn't change the speed of the rest of the world. Right. And that's the same with a robot. If it's doing something, like, oh, let me grab this thing, and then that thing has moved, it doesn't matter whether what you computed was correct.

Nikolaus West [00:21:21]: Right. It's not there anymore. Right. And so on. And that's very different from even your ChatGPT-style interaction. You would love it to be fast because that feels better, but it's still a single stream: you take the inputs, process them all, and give some output.

Nikolaus West [00:21:39]: There's not really a concept of a world evolving around it. So time just changes everything, really. You need to be much more sophisticated about how you think about it in everything, how you understand what your software just did. You need to keep track of how everything evolves over time, and you maybe have multiple notions of time: the compute time, and the real-world time, whatever's happening in the real world. Maybe you have an algorithm that takes a certain amount of CPU time or a certain number of iterations. You want to keep track of, oh, what time was this sampled at? And when am I making this decision? The decision comes a little later in time, because you had to compute stuff, but it's relating to old information. So just dealing with time is the really, really big thing that you get into.
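
A tiny sketch of the bookkeeping this implies, assuming a monotonic clock and a made-up measurement type: the sample time travels with the data, so the eventual decision can account for how stale its inputs are.

```python
import time
from dataclasses import dataclass


@dataclass
class StampedMeasurement:
    value: float
    sampled_at: float  # when the sensor actually observed the world


def act_on(measurement: StampedMeasurement):
    decided_at = time.monotonic()  # when the decision is finally made
    staleness = decided_at - measurement.sampled_at
    # The action happens *now*, but it is based on a world state that is
    # `staleness` seconds old; downstream logic may need to compensate.
    print(f"acting on data that is {staleness * 1000:.1f} ms stale")


m = StampedMeasurement(value=1.0, sampled_at=time.monotonic())
time.sleep(0.02)  # stand-in for perception and planning compute time
act_on(m)
```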

Demetrios [00:22:35]: Sounds really messy on the back end too, when you're trying to create systems and you need to look at all the different ways time can be interpreted.

Nikolaus West [00:22:48]: Yeah, it gets messy. It increases the complexity of the data tools that you need. Right. It's pretty different from, oh, I train one image classifier. That's a shockingly simple problem in comparison.

Nikolaus West [00:23:04]: And these robotics models, or whatever other things, operate on sequences over time, so even that is more complex. Then they have some internal notion of steps or something, and that is overlaid on the real time of the real system they're operating in. So time is the really, really big thing, I'd say.

Demetrios [00:23:26]: Yeah.

Nikolaus West [00:23:27]: Then there are obviously resource constraints, battery, and things like that, which are really difficult, but similar to other settings: you have constraints, and maybe they're more difficult on the edge, but it's the same idea.

Demetrios [00:23:42]: Yeah. Well, talk to me about the data side of this, because again, it feels like it would be very hard to deal with all of this different data that you're getting in different formats.

Nikolaus West [00:23:55]: Yeah.

Demetrios [00:23:56]: And specifically all of the video data has got to be very heavy.

Nikolaus West [00:24:02]: Yep.

Demetrios [00:24:03]: And then there's how you're training models with the video data. You might also have some time-series sensor data, so more tabular-style.

Nikolaus West [00:24:15]: Exactly. So we had this idea about online and offline systems. On the robot, on the online systems, what you'll do is try to record what happened.

Demetrios [00:24:28]: Yeah.

Nikolaus West [00:24:28]: And in the real world, things happen at different rates. Maybe your videos are coming in at 30fps, but your motion sensors are going at 1000Hz, so very different rates. Sometimes a robot can even be a distributed system with different clocks and stuff, all this data changing at different rates. And you're recording what happened, so you don't really know the exact shape of the data set beforehand. So the data you're recording there is super messy. It's basically logs. Right.

Nikolaus West [00:25:08]: But it's logs of multimodal data streams, so lots of different types. It could be 3D information; the data is often structured in deeply nested structures and so on. And you have maybe audio and video and 3D sensors, motion of different kinds, internal metrics. So it's really messy, really complex, and difficult to handle effectively from a data perspective, because you have this problem of combining really fast, small signals with large, heavy tensors and images and point clouds that may be slower. And storing those together is actually pretty hard. So in classic robotics, or in general, on the system you tend to store data in very specialized file formats that are very write-optimized.

Nikolaus West [00:26:02]: They're just good at recording exactly what happened, with minimal operations to get it onto disk really fast, without disrupting anything that's running on board. So that's step one. Then you want to get the data off the robot, upload it. Depending on the volumes, maybe you upload all of it, or you're selective, like only uploading when something happened, that kind of thing. But somehow you've got to get it to a more centralized place where you can use it, and that's where you're.

Demetrios [00:26:31]: Throwing it into like an S3 bucket.

Nikolaus West [00:26:33]: I would say, just to make it super simple, the absolute simplest thing would be: you periodically write these logs to files, then you have a little job that uploads them to some S3 bucket, and then you have them there. So that would be part one.
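
In code, that simple path might look something like the sketch below. Real on-robot formats are binary and write-optimized (rosbag or MCAP, for example); the JSONL format, directory, and bucket name here are illustrative stand-ins, and boto3 credentials are assumed to be configured.

```python
import json
import pathlib
import time

import boto3  # assumes AWS credentials are already configured

LOG_DIR = pathlib.Path("/var/robot/logs")  # hypothetical on-robot log dir
BUCKET = "my-robot-recordings"             # hypothetical bucket name


def write_session_log(records):
    """One append-only file per session; real systems use binary,
    write-optimized formats, JSONL is just for show."""
    LOG_DIR.mkdir(parents=True, exist_ok=True)
    path = LOG_DIR / f"session-{int(time.time())}.jsonl"
    with path.open("w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    return path


def upload_pending():
    """The 'little job' that ships finished log files to S3."""
    s3 = boto3.client("s3")
    for path in LOG_DIR.glob("*.jsonl"):
        s3.upload_file(str(path), BUCKET, f"raw/{path.name}")
        path.unlink()  # drop the local copy once it is safely uploaded


# Example: write one session, then ship it.
write_session_log([{"t": 0.0, "stream": "imu", "accel": [0.0, 0.0, 9.8]}])
upload_pending()
```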

Demetrios [00:26:53]: Wait, so that's simple. What is advanced?

Nikolaus West [00:26:58]: Well, advanced, just for getting the data off, is: okay, we're collecting so much data that it doesn't even make sense to upload it. If you think about a self-driving car, they will collect the data, dock back somewhere, and then you just swap out the SSDs, right? Put in some new SSDs, and you may never upload it. Or if you do, you need to send trucks of SSDs to AWS, right? Or you have your own local setup. So there you can make choices, right? You only upload when it's needed. You have some kind of storage architecture where you keep everything in a local data center right where the data is collected, you upload only the metadata, and you go fetch the data when it's needed. It can get really complex at large scale. But let's keep it simple.

Nikolaus West [00:28:00]: Write these files and upload them; let's assume that's possible. Actually, even before that, you have another problem: you want to be able to look at the current state of the robot. Visualization is super, super important. If you're working on a robot, you want to be able to live-visualize all these streams of data. If it has a 3D understanding of the world, you want to see that 3D map, and you want to see the robot walking around in that map, see what it sees, see the internal state of different algorithms, and see all the camera feeds.

Nikolaus West [00:28:40]: And you want to be able to scroll back and forth in time: oh wait, something went wrong, I want to scroll back, hey, what happened? So that's something you need when you build these kinds of systems, just live visualization. And then you want to look at the files that you recorded after the fact and analyze them. That's a per-session kind of observability, and it's a super, super core aspect. So even before getting to offline systems, this set of things, recording what happened to some write-optimized file and having some visualizer to look at the files or look at things live: you cannot build these products without those things.

Nikolaus West [00:29:29]: Even in classic robotics, you need those things. In classic robotics, there's ROS, the Robot Operating System, which would be the most commonly used setup; that gives you this data recording and some visualization capabilities. And there are slightly more modern visualizers built for that scenario. They're great, they work well for that: RViz, Webviz, XVIZ, Foxglove. There are a bunch of tools in that space. They're kind of like robotics log visualizers.

Nikolaus West [00:30:04]: Really important. But that was designed for the pre-ML world, where what ran on the robot was the main complexity of your product. You need that, but then you can think about what happens on the offline side. You now want to improve your models, so you've uploaded this data, and the current state of the world, at least, is that you then need to make that data usable by the kinds of systems you use for MLOps or ML data pipelines before training. You want to have it in, I don't know, TFRecords or HDF5 files, whatever is optimized and ready to train from, and all of those tend to be very structured; they're not good at storing this messy, log-style data. On top of that, you also want to run analytics, run some statistical job, compute metrics, all that kind of stuff. So basically, all the offline data tooling that's out there, whether it's Databricks or Datadog or whatever: these tools do not understand this kind of physical AI, robotics-style data.

Nikolaus West [00:31:31]: The storage systems do not know how to read these log-structured, messy file formats. They want everything to be a table with columns and so on, and they don't know how to handle huge, unaligned data. And there's no built-in visualization, which is crucial for debugging. So teams end up building these very complex data pipelines to try to transform and clean the data and do dataset curation. And because those offline systems don't really understand the source structure of the data, these things get super complex.
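
Here is a condensed sketch of the kind of pipeline step being described: turning mixed-rate log records into a training-ready table. The records and stream names are made up, and pandas merge_asof is just one way to do the time alignment. Note how the flattening silently discards the semantics (reference frames, units) that these tools are said to lose.

```python
import pandas as pd

# Mixed-rate, nested log records, as they might come off a robot (made up).
raw = [
    {"t": 0.00, "stream": "imu",    "data": {"accel": [0.0, 0.0, 9.8]}},
    {"t": 0.01, "stream": "imu",    "data": {"accel": [0.1, 0.0, 9.7]}},
    {"t": 0.03, "stream": "camera", "data": {"frame_id": 17}},
    {"t": 0.04, "stream": "imu",    "data": {"accel": [0.2, 0.1, 9.8]}},
]

# Split per stream and flatten the nested payloads into flat columns.
# This is where semantics quietly get lost: nothing records that 'accel'
# is a 3-vector in the body frame, or what frame_id refers to.
cam = pd.json_normalize([r for r in raw if r["stream"] == "camera"])
imu = pd.json_normalize([r for r in raw if r["stream"] == "imu"])

# Attach the latest IMU sample at or before each camera frame (time alignment).
aligned = pd.merge_asof(cam.sort_values("t"), imu.sort_values("t"),
                        on="t", suffixes=("_cam", "_imu"))

aligned.to_parquet("train_chunk.parquet")  # training-ready table; needs pyarrow
```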

Demetrios [00:32:13]: Sounds miserable.

Nikolaus West [00:32:14]: So that's super miserable. It's super complex and really brittle, and because you don't have any built-in visualization, you don't really have the ability to debug. If, right before training, suddenly all the data is showing up upside down: where did that happen? You don't even have built-in visualization to catch it. I've talked to self-driving companies who started using Rerun and found bugs like, oh, we were training on something and the orientation of something was flipped for two years during training.

Nikolaus West [00:32:51]: It was giving bad performance, and no one saw it, because it was too hard to debug the data pipeline, to inspect the state after each step. It was just too hard to do. That kind of stuff gets really complicated. So these robotics companies end up in a tough spot.

Nikolaus West [00:33:12]: Right. They end up with two stacks. Classically, you have your online data systems, which were built for classical robotics but do understand the kind of data, and you have your offline systems, which are built for large-scale learning and stuff like that but don't understand physical AI data. These don't talk to each other, and yeah, it's a mess. That's the base state of the world.

Demetrios [00:33:37]: You created some tools to help with visualization, so you can see where and how these physical AI systems are understanding the world, and you decided to open source them. Can we talk a bit about everything you've been open sourcing until now and the inspiration behind that?

Nikolaus West [00:34:04]: Sure. Just to frame it first: what Rerun as a company is doing is basically trying to solve the problem I just talked about. We want to build a new, unified kind of data stack that handles both the online and offline scenarios for physical AI, so that you get a consistent, easy-to-use experience with built-in visualization, and much more efficient and easier-to-use querying and things like that, because the data stack understands both of these types of data. So we started out, what is it, roughly three years ago, and spent most of the first two and a half years, I'd say, on an open source project that's called Rerun, like the company. That project is focused on logging and visualizing multimodal data that changes over time.

Nikolaus West [00:35:03]: So it's a broader application than robotics. We actually started out focused more on computer vision outside of robotics, and it's expanded to be much broader. The project has SDKs in Python, Rust, and C++. You can think of it like logging text or logging a metric, but you can log sort of anything: a tensor, a 3D point cloud. You can build up a full 3D scene of things happening, or normal metrics and video, and have everything connected, like cameras moving around, and when you hover over an image it will highlight where that ray shoots out in 3D. Those kinds of things. And it also allows you to scroll back and forth in time.
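
For flavor, a minimal sketch in the spirit of Rerun's documented Python examples. Exact call names vary between SDK versions, so treat the specifics as approximate rather than as the definitive API.

```python
import numpy as np
import rerun as rr  # pip install rerun-sdk

rr.init("robot_session", spawn=True)  # spawn a viewer next to this script

for step in range(100):
    # Everything logged below is indexed on this timeline, which is what
    # lets the viewer scrub back and forth in time across all streams.
    rr.set_time_seconds("sensor_time", step / 30.0)

    points = np.random.randn(200, 3)  # stand-in for a 3D sensor return
    rr.log("world/points", rr.Points3D(points))
    rr.log("robot/battery", rr.Scalar(1.0 - step / 200.0))
```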

Demetrios [00:35:59]: I just gotta say, this is kind of some Star Wars shit right here. This is what, you know, when they plug into the droids and stuff, this is what I imagine they're seeing on their little computer.

Nikolaus West [00:36:08]: I hope so. Yeah, I hope so. It is pretty cool, if I can say so myself, a pretty cool application, or framework. So we've been building that open source project, and it's a pretty extreme thing. We basically said, okay, none of the old things work, and we rebuilt this whole stack, a data logging and visualization stack, from scratch.

Nikolaus West [00:36:32]: In Rust. We took a lot of inspiration from how modern game engines are built, so the data model is built around an entity component system.

Demetrios [00:36:41]: Nice.

Nikolaus West [00:36:42]: It's basically a more composable data model. And we had this goal: we talked about the online and offline systems, and we wanted the open source project to unify the visualization side of that. You should be able to use the same visualization framework for everything, from your dirty little Python script, where you might otherwise use matplotlib or something, I have a little algorithm and I want to just pop in some data, dump it in and it should just show it, and then you can analyze it, go back and forth in time and that kind of thing, all the way to your centralized visualization dashboard.

Nikolaus West [00:37:18]: I don't know if you've seen a marketing video from Waymo or something, where they show all the lidars and things updating on a map and all that kind of stuff. Teams use Rerun to build those centralized things. And actually, our last release also allows you to build data annotation apps, so you can have interactivity: you click on things, and it responds back with the data that you clicked on. So you can build data annotators with it.

Demetrios [00:37:45]: And that's good for these anomalies that sometimes you'll hit, or the edge.

Nikolaus West [00:37:50]: Cases you need to work on, yeah. You tend to annotate data, label data. There are many different reasons; it could be whatever. Right.

Demetrios [00:37:59]: You always need to be doing that. Yeah, that's true.

Nikolaus West [00:38:00]: There's a lot of that. Right. It could be, oh, here's something weird that happened, but it could also be, this is just how we annotate data. It's basically: wherever you want to look at your data, which you should want to do a lot, you want to have a consistent view. You ideally want that to look the same wherever it is, whether it's in production or a script, or people using it to visualize things.

Nikolaus West [00:38:27]: There are evaluation runs during training, training pipelines, just a lot of different things. So we knew the goal was to unify all of that, to be able to do it in the same framework, and that required extreme flexibility and performance and so on. That's been the goal there. It's a never-ending job, but I think we've come quite far. Pretty good adoption, in both spatial computing and robotics, from two-person startups up to, I think now, Meta and Apple and Unitree and Hugging Face, and I'm forgetting companies, but they use Rerun in open source projects at least. Yeah.

Demetrios [00:39:10]: Damn.

Nikolaus West [00:39:11]: So it's used from the smallest to the largest, and that's been really cool to see. We really focused on extreme ease of use, on flexibility to do whatever you as a researcher need to do, and on performance. So that's been that project, and it's open source. It's always going to be open source.

Demetrios [00:39:36]: And it's almost like you went with the visualization aspect on the open source side with Rerun. Then, when you thought about building an actual product, how did you think about it? All right, we're just going to complete the cycle and incorporate Rerun into a greater platform?

Nikolaus West [00:39:55]: We thought we kind of needed to reinvent the whole data stack, right. And the open source project forced us to do a couple of things. One of them was to develop a really good data model, because you need a data model that's expressive but also fit-for-purpose enough that it's easy to use, and also composable and flexible and extendable and all that stuff. And that's really difficult. And you need something like that that can also be performant together with the right query engine. So those are the two parts.

Nikolaus West [00:40:30]: We've been forced to build a query engine, basically. To build a visualizer like this, fast and flexible, that allows you to scroll back in time, with these unsynchronized streams of data I talked about, you basically need to build a small query engine, a small in-memory database, to make that work well. So we had to develop that as well. Those are the core pieces. And the query engine is really focused on time alignment and those kinds of robotics operations.

Demetrios [00:40:57]: I was going to say, that's probably one of the biggest boons you can give someone: making sure that all these disparate sources of data line up. So if there's some kind of event that I want to look into, I can say: what happened there, with everything? How do I get a 360 view of what's going on, as opposed to: okay, I see that something happened with this sensor; did anything happen with the other sensor? And now I've got to go sift through the data and try to figure out where in time that is for that other data source.

Nikolaus West [00:41:33]: Yeah, it's both hard and really important, and we were kind of forced to work on those technical challenges for the open source project. For our commercial product that we're working on right now, you can think of it as a database that has visualization built in. It's a storage and indexing engine plus a query engine, built for the constraints we have here. You have source data in varied forms of these write-optimized, robotics-style file formats, so you need a plugin system to support many different formats. You need to be able to handle these unstructured, recording-style data sets, like the 100,000 recordings that would happen on a robot.

Nikolaus West [00:42:28]: And you also need to understand normal tabular data; you need to understand both. So: a storage and indexing engine that can make working with all of that fast and unified, and a data model that gives you a way to consistently interface with the data. You want to be able to visualize any data that you have stored, but you also want a query engine that can operate on top of it, and you need a consistent data model to do that. The next step above that is the query engine, and there what you really want is for the query engine to understand the physical AI data model. One simple example of what that can mean: okay, you have a data pipeline.

Nikolaus West [00:43:19]: You have your raw data, and then you run some transformations on it and produce some nice, more structured, easy-to-work-with table. You'd like to not lose all the semantic information, like, what does the first column mean? This is a column of 3D point clouds, and you want to know that. Or a column of 3D positions that are part of a point cloud and change over time. That may be one column; another might be a video; another might be some sensor reading. You want to keep track of what everything means. And if you do, then you can visualize and debug a table that's five steps into your data pipeline. So that's one of the things: you want your query engine to maintain that data model.

Nikolaus West [00:44:04]: And the other is that you want to be able to do robotics-oriented operations in the query engine. Imagine writing a SQL expression where one part of it is doing time alignment. And you might want to do things like 3D transforms in the SQL expression, because you want all your data to come out in the reference frame of your robot's gripper, something like that. Maybe that's abstract if you haven't worked with this kind of data, but the ability to push these kinds of operations into the query engine can make working with the data a lot easier. So that's the next part, and then there's kind of a data catalog that can understand this kind of data set. So it's really the full data stack. That's what our commercial product is, what we're building towards.
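
To make the gripper-frame example less abstract, here is a standalone NumPy sketch of that operation; a query engine that understands the data model could expose the same thing inside a query. The function and the toy transform are invented for illustration.

```python
import numpy as np


def to_gripper_frame(points_world, T_world_from_gripper):
    """Re-express world-frame points in the gripper's reference frame.

    points_world: (N, 3) array. T_world_from_gripper: 4x4 rigid transform
    mapping gripper coordinates to world coordinates.
    """
    T_gripper_from_world = np.linalg.inv(T_world_from_gripper)
    homogeneous = np.hstack([points_world, np.ones((len(points_world), 1))])
    return (homogeneous @ T_gripper_from_world.T)[:, :3]


# Toy example: gripper sits at (0, 0, 1), axes aligned with the world frame.
T = np.eye(4)
T[:3, 3] = [0.0, 0.0, 1.0]
print(to_gripper_frame(np.array([[0.0, 0.0, 1.5]]), T))  # -> [[0. 0. 0.5]]
```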

Demetrios [00:45:02]: You talked about these janky pipelines before. Does this eliminate the need to create pipelines, or are you still seeing folks create pipelines, just with much higher quality data?

Nikolaus West [00:45:20]: I think so. First off, this is still in development. We have a couple of paying design partners working with what we have now, but it's still pretty early. So those are the prefaces, a disclaimer, so I don't say that we have something we don't have yet. But I think our goal is for you not to have to have any steps in your pipeline.

Demetrios [00:45:45]: Wow.

Nikolaus West [00:45:47]: I think that's not achievable in all cases, but you should be able to just record, then build up a series of queries and train off of that directly. I want that to be possible. Right. That's not going to be the most efficient or the best way to do things, because you'll want to save intermediate results, be able to inspect them and do quality control, and not redo all the computation during training; that doesn't really make sense efficiency-wise or structure-wise. But I want it to be possible, such that when you want to iterate really fast, you have very, very few materialized intermediate steps.

Nikolaus West [00:46:34]: And then, as you figure out what you want to do, you say, okay, these parts of the pipeline should be stored, so you can flexibly choose that. In reality, I think any company is going to have multiple steps in their pipelines, but hopefully they're a lot easier to manage, a lot more efficient to run and build, and a lot less complex than they have to be now. That's the goal.

Demetrios [00:46:57]: Is there anything you want to talk about that we haven't hit on?

Nikolaus West [00:47:00]: Yeah, there are things. If you think about Rerun, the thing that's most widely deployed is our open source project, and there are massive Mag7-style companies that have switched over, such that at this point, for instance, all the computer vision they do uses Rerun for debugging. And that goes from the little systems that researchers build, to how they debug their data, really all the way through. So, very wide deployment like that. It's hard to give a specific example there, but it just reduces friction at every point, right, and increases productivity. How do you even put a value on looking at your data? It's the core, core thing that oils the wheels for everything you do.

Nikolaus West [00:47:53]: So in a broader sense, it's that. But maybe more specifically, there are things like very well funded self-driving companies finding long, multi-year bugs in their data pipelines that were leading to bad model performance, after adopting Rerun to debug those pipelines. That's another kind of example of that nature.

Demetrios [00:48:16]: Meta, how are they using it with the Ray-Ban glasses?

Nikolaus West [00:48:21]: So what's public from Meta is the.

Demetrios [00:48:24]: The Aria glasses? That's the new ones that are coming out.

Nikolaus West [00:48:27]: No, these are the research glasses. They're pure data capture devices for open research; basically a pair of glasses with a lot of sensors on them. Rerun is the kind of official visualizer for the data sets that come from there, like the Ego4D dataset, for instance, where they record a bunch of data in the home and things like that. It's also built into the Aria dev toolkit as the main visualizer. That product is used for spatial computing things, and now also quite commonly in robotics, to collect robot-style data, but collected by a human, and then try to retarget that to robotics applications. So it's used like that.

Demetrios [00:49:19]: You know what I would actually love to ask you about: you mentioned before that there are different schools of thought, or ways that folks are trying to implement successful robotics or physical AI. One is as many sensors as possible; the other is the fewest sensors possible. Are there other vectors that folks play on that you've seen, interesting or surprising to you?

Nikolaus West [00:49:51]: Interesting or surprising? I don't know. The major vectors I think about are things like: how deterministic is something? How okay are you with just saying, ah, we trained a model and it seems to perform well, versus, no, I need to have mathematical guarantees about some behaviors. That'd be one. Another one is modularity, the extreme being, oh, we have one neural net that does everything; we don't have any code, it's just a neural net.

Nikolaus West [00:50:25]: That's one extreme you can think about, the other being, no, it's very important to have modules and to test all the modules separately. The extent to which you go after and value modularity versus performance is, I think, another really big one. And then, in general, some teams don't even believe in training at all; they think you shouldn't use much machine learning at all. There's certainly a bunch of folks like that. They're like, ah, it's unreliable, it doesn't work.

Nikolaus West [00:50:58]: Maybe they use it for detection and things like that, but nothing more complex. So you use SLAM to build 3D maps of the world, and you write classic planners that decide how to move. They feel all these technologies are proven and you should use them, and for certain applications that's totally the right way to go. Some people are a bit more purist about it, but that stuff is certainly still around, and in a lot of more structured environments, like a warehouse or something, it's probably the right approach and you can make it work really well.

Demetrios [00:51:35]: Yeah, exactly. If you know what environment you're playing in, then there's probably a much stronger case to make it the least stochastic, or what's the word, as deterministic as possible.

Nikolaus West [00:51:51]: It's nice to know what's going on, right? And you can make things faster and cheaper and so on; there are a lot of benefits. Of course, taking training seriously comes with a lot of costs. It really increases the complexity of your offline systems. And the more end-to-end you go, the more you can simplify your online systems, but then you get really complex offline systems instead. So yeah, it's engineering, right? There are trade-offs.

Nikolaus West [00:52:24]: It's not magic.

Demetrios [00:52:25]: Those are fascinating trade-offs. That's really cool to think about.

