Eighty-Thousand Pound Robots: AI Development & Deployment at Kodiak Speed
Collin Otis is the Director of Autonomy at Kodiak, a leader in self-driving commercial vehicles, and a Founding Engineer. Prior to Kodiak, Collin held positions as Chief of Staff at Uber Advanced Technologies Group and Director & Principal Scientist at Target Corporation. He was the Founder and CEO of Powered Analytics, a provider of adtech for large retailers, which was acquired by Target. As an aerospace engineer, Collin previously held Research Scientist positions at NASA and the Air Force Research Laboratory. He earned his BS, MS, and PhD in Mechanical Engineering from the University of Pittsburgh in the area of computational fluid dynamics for hypersonic vehicle applications.
Kodiak is on a mission to automate the driving of every commercial vehicle in the world. Today, Kodiak operates a nationwide autonomous trucking network 24x7x365, on the highway, in the dirt, and everywhere in between. We also release and deploy software about 30 times per day across this fleet that is not just mission critical, but also safety critical. Our AI development process must match this criticality and speed, providing fast engineering iteration while guaranteeing the high level of quality that safety requires. In this talk, we'll share the details of that process, from how the system is architected, trained, and evaluated, to the validation CI/CD pipeline, which is the lifeblood of the development flywheel. We'll talk about how we collect cases, how we iterate models, and how we do quality assurance, data, and release management - all in a way that seamlessly keeps our robots truckin' across the US.
Collin Otis [00:00:09]: Okay, so while we're waiting to get started, quick intro. I'm Collin Otis, Director of Autonomy at Kodiak Robotics. I was also a founding engineer of the company six years ago. And a couple questions. Who all has worked in self-driving? Maybe a show of hands. All right, good. And who has heard of Kodiak? Oh, great.
Collin Otis [00:00:29]: More people than just... Okay, good, good. Okay, great. So, yeah, I'll just jump in here. We're a self-driving truck company, so we basically deploy trucks of all sizes that drive themselves, all the way from over-80,000-pound trucks, which is the semi in the middle, down to an F-150, which you see on the left side. And today, what I'm going to talk to you about is how we deploy AI systems in safety-critical applications, which is a little bit different than a lot of the other AI systems that exist in the world. Our mission as a company is to automate every single commercial vehicle in the world. So our focus is on the dull, dirty, and dangerous jobs of driving vehicles and completely automating those vehicles.
Collin Otis [00:01:16]: Just to give you a sense of the scale of our company: at the moment, we're the broadest-scale B2B autonomous vehicle company in the world. We have about 3 million autonomous miles so far. We've been around for about six years, 28 customers, and about $200 million into the company. Today, we're at about a million-mile run rate of autonomous miles across three business segments: commercial driving, so over-the-road trucking on interstate highways; defense applications; and industrial applications. We have 36 vehicles and about 200 people. And what I'm going to talk to you about today is how we release software: how we evaluate, acquire data, and then verify software before it goes on a vehicle. And we have an incredibly high release velocity. So we release about 500 disparate releases per year, somewhere around 30 releases a day during weekdays.
Collin Otis [00:02:11]: During weekdays, a weekly release, we have daily releases. And then sort of the cornerstone of our business is the driverless release, the release that's actually able to be operated without a safety driver in the vehicle. Just to give you a sense of our business segment. So this is what highway driving. Our highway driving looks like. So this is about 02:00 a.m. central time somewhere in Texas. So this is on a run from Jackson, Mississippi, to Dallas area, taking a commercial load.
Collin Otis [00:02:41]: We operate 24/7/365, taking commercial loads all over the United States. Like I said before, we have the largest domain. So if you look at the map there on the top right, we go coast to coast over most of the southern United States at this point, all the way from California over to the East Coast, down to Florida and so on. On the defense side, this is a super exciting part of our business that we've built over the last 24 months. And here we drive with the same software stack that we drive on the highway. Same system, no changes to the architecture at all. And here we actually operate an F-150. We do that across a very broad array of terrain, all the way from freeways, rural highways, and unmarked roads at the top, down to the really dirty stuff: dirt roads, dirt trails, and then completely off-road.
Collin Otis [00:03:31]: And this has been, for me personally, a really fun part of our business, because it's sort of a return to the DARPA Grand Challenge. If you remember, the reason for the existence of the self-driving industry is essentially this application. And it's been about 15-20 years working on this in industry, and now going back to the Department of Defense. And at this time, Kodiak also leads in this space as well. So I just came off of a ten-day trial at one of the military training areas in the United States. And it was an amazing thing to really watch our vehicle be operated solely from a command center. So we have basically an over-the-air connection to the vehicle, and we point and shoot on a map in bird's-eye view, and we tell it to go here, there, and everywhere. And the vehicle just drives itself around and around and around.
Collin Otis [00:04:12]: So really cool to see that come to life. And then the last part of our business is industrial. Industrial is typically low-speed environments out in the middle of nowhere, oftentimes private roads, hauling things like raw materials. And this has been, again, a really interesting part of our business that's grown a lot over the last year. We don't usually talk too much about our driverless operations, but we'll have some exciting announcements over the next couple of months. We've signed some really big deals, and I think this is a part of the business where we're very excited to be scaling out. Okay, so let me just give you a background on how our system works.
Collin Otis [00:04:53]: I won't spend a ton of time on this, but the context is important for understanding how we actually deploy these systems. Essentially, we have sensors on the left there: camera, lidar, and radar. We can do both real and photorealistic data in sim. We're modality-neutral, so you can take any mix and match of those sensor combinations. And we run what's called a modular cognitive architecture, or parallel MCA. It's basically the brain of our system, and there are a number of independent but connected neural paths in the system. And essentially what each of these neural paths does is take either all of the data or some subset of the data and express: what do you see, and how confidently do you see it? And they also output things like embeddings, so you still have preserved some of the information from the neural pathway through the system.
Collin Otis [00:05:43]: And the separability here is really important when you're trying to isolate changes and not have to redo validation, as well as sort of build up the system in a piece by piece way without having to train this monster brain over and over again. The output of that, which is a sort of vision and planning output, goes into an optimizer, which is differentiable, and it also has an optional, what we call a behavior expression prior. So we allow our product managers and our system engineers to essentially express specific things in a domain. So if you think about some things that you'd want a defense vehicle to do that you wouldn't necessarily want an 80,000 pound truck to do, there's a place where folks can sort of express that in a language that our system can use as a prior. One other thing to note is we use our, what we call our trail map as a sensor of the system as well. So it's essentially Google maps. We have polylines in Bird's eye view, and that's all the map that the system has as well. And it uses that as a prior to sort of understand routing in the world, but it doesn't use it for immediate horizon planning.
Collin Otis [00:06:46]: That is all based on perception that you see in front of you. And then after that, you go to a controller and you execute the vehicle actuation. This whole thing is encased in a traceable safety monitor. So this is a system that essentially makes a bunch of safety guarantees, and it's a really important part of the deployment process is to guarantee that on every release that you've also, you're still meeting those safety guarantees. And what that allows you to do is it allows you to iterate on all the parts of this parallel mcae without having to have fear around safety or fear that you might have degraded something. On the safety side. Just to go through some of the principles that we have as a autonomy team, first of all is learn everything. So we want to learn everything that we can.
Collin Otis [00:07:31]: Rather than having a heuristic system or one that requires a lot of tuning, our approach is to try to learn every part of it, but to not require any part of the system to be perfect. Just like was talked about in the LLM talk before, you can never expect perfection out of one of these nets, and if you do, you're asking for a nasty surprise. Second, resiliency reigns: we want no single point of failure in the system. Multimodality first, meaning we don't rely on one sensor or another sensor. We rely on all the sensors, and if you ablate a bunch of them, the system still works just fine. So that allows us to have that resiliency. Next is interpretability and partitioning, which is what I mentioned before.
Collin Otis [00:08:11]: These systems can be separated to ease the training and also to minimize the validation. And we also want the outputs of them to be heavily interpretable. And this just makes it for a system that can be iterated and developed over time without having a single black box. The fifth one is that the system should be highly generalized. So this system, for example, you can plug and play different, completely different sensors. It's very common that we add a completely new lidar or a completely new camera, or we change positions to a very different setup, and the system just works. That's a lot of, it's a real challenge to train a system that has that level of generalization and can still drive safely. But it's really important if you want to show that you actually have a robust system.
Collin Otis [00:08:53]: And the last one is bounded validation costs. If you need to validate the entire system every time you make a change, it's really prohibitive to actually deploy changes to the system. Okay, so this is what I'm going to talk about for the rest of the time, which is essentially how our active learning flywheel works. And what you see from here to the end of the talk is available on any release of our software, from the smallest pull request, which is a point release. Every pull request you put into our system today is actually released: it has first-class deployment, it's available fleet-wide, it's available nationwide, and it has all the validation done, all the way up to our driverless release, which is typically on a three-to-four-per-year cycle with much more rigorous validation. But all the pieces that you're going to see here are either fully automated on every one of those releases, from small to large, or at least available to be run on any release. So what we'll talk about here is how you go from a release to the release evaluation to the training and then back to the release, and how you run the cycle over and over, all the while building up the data. One of the other things you're going to see here is that every part of this cycle increases our data.
Collin Otis [00:10:09]: There's actually a lot of sort of bootstrapping in the training that gets put back into the data when you actually go deploy a release that spits off a bunch of data from your validation process. And then on the evaluation side, once you go put this release into the wild, you start to see new things, and that again enhances your data. And you keep running and running and running. This cycle and the evaluation, the training and the release all get better as the data gets better. Okay, so quick brief on our data acquisition. There's a bunch of classes of things that we do to acquire data, and this is a general theme you're going to see. As I talk about our systems, there's not one single thing you can do to sort of build these systems. There is a lot of things you have to implement.
Collin Otis [00:10:51]: You're going to see it in data acquisition, you're going to see it in evaluation. There's just a lot of moving pieces on the infrastructure side to allow us to actually evaluate and deploy these systems. So the first set is machine triggers. These are essentially done either onboard the vehicle as it's driving down the highway, driving itself, or on ingest. So after the vehicle comes home and gets plugged into a dock, they're the combination of nets and heuristics. So we run a bunch of nets onboard, we run a bunch of nets off board to do specific triggering. And we'll talk about what kind of triggers in a little bit. And then we also run a bunch of heuristics.
Collin Otis [00:11:24]: So, for example, if you see a crowd of people, we typically want to capture that and put it in our data. See a crowd of people on the highway, for example. These are easy to add to our system. So all you have to do is write a single function and then your trigger gets executed. So any engineer in the company can write a new trigger, and you can also do these post hocs, so you can write them afterwards. And then it'll go crawl the rest of our data, and then it would sort of generate all the data as if it had been running for six years. We do some interesting things here in the machine, triggers. So one example is VLM bounties.
Collin Otis [00:11:55]: So we basically have a vision language model that you can talk to, and you can ask it to go acquire data. And that data that will run, parts of it will run onboard to sort of trigger the event, and then parts of it will run off board to go slurp up the data. And it's a really interesting way to sort of go get your system to get data where, you know, you have a blind spot. Embeddings, I think everybody knows about that. Out of distribution is also a really important part. So our system, as we build that data set, we always know what is out of distribution. And events are triggered for that as well. Last one is call for help.
Collin Otis [00:12:28]: A big part of our system is understanding when the AI doesn't know how to drive in a certain scenario. When that happens, we do what's called a fallback, where we stop the vehicle, pull over to the side of the road, and actually call for help, for humans to come intervene. And those are part of our machine triggers. We do a number of things on the human triggering side as well. These are not super unique: humans have easy ways to go look at our data and collect events, and we do random sampling and deep dives of our data. But that's not a huge part of the work that we do; it's about one person or less on a weekly basis.
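To make the "write a single function" idea concrete, here is a minimal sketch of what such a data-collection trigger might look like. The decorator-based registry, the Frame fields, and the crowd-on-highway condition are illustrative assumptions, not Kodiak's actual API; the point is that one callable can be run onboard, on ingest, or post hoc over historical logs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical trigger registry: one decorator, one function per trigger.
TRIGGERS: Dict[str, Callable[["Frame"], bool]] = {}

def trigger(name: str):
    """Register a data-collection trigger; the same callable can run onboard,
    on ingest, or post hoc over historical logs."""
    def register(fn: Callable[["Frame"], bool]):
        TRIGGERS[name] = fn
        return fn
    return register

@dataclass
class Frame:
    """Illustrative stand-in for one frame of perception output."""
    timestamp: float
    road_type: str = "highway"
    detections: List[dict] = field(default_factory=list)  # e.g. {"cls": "pedestrian", "conf": 0.9}

@trigger("crowd_on_highway")
def crowd_on_highway(frame: Frame) -> bool:
    """Fire when several pedestrians show up in a highway scene."""
    peds = [d for d in frame.detections if d["cls"] == "pedestrian" and d["conf"] > 0.5]
    return frame.road_type == "highway" and len(peds) >= 3

def scan(frames: List[Frame]) -> List[Tuple[float, str]]:
    """Run every registered trigger over a stream of frames (live or replayed)."""
    return [(f.timestamp, name) for f in frames
            for name, fn in TRIGGERS.items() if fn(f)]

if __name__ == "__main__":
    frames = [Frame(0.0, detections=[{"cls": "car", "conf": 0.9}]),
              Frame(0.1, detections=[{"cls": "pedestrian", "conf": 0.8}] * 4)]
    print(scan(frames))  # -> [(0.1, 'crowd_on_highway')]
```

Running the same function over archived logs is what makes the "as if it had been running for six years" backfill possible.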
Collin Otis [00:13:07]: In terms of what I think is the most interesting, we do a lot of nets, teaching nets, so foundation models, I think that's pretty broadly understood. People know how to use those, but we do some other interesting things. So one is temporal back propagation, auto labeling, where you basically play forward in time, and then you take all that data that you saw in the future and you bring it to the past, and you use that to basically auto label the past by looking into the future. Cross modality omniscience is where we basically use, let's say a lidar to automatically label camera data, or a radar to label lidar data. And you can essentially use these modalities to have sort of like superhuman vision in the other modality. Let's see if there's any ones. I wanted to call out here as well. Cross training.
Collin Otis [00:13:50]: This is a super interesting one as well. So we have all these neural paths, and they all critique each other. So if you see something, let's say in one neural path that you should see in another neural path, that path critiques the other one. And then that's used to train the next revision of that model. We use a lot of unsupervised data. We have about 50,000 hours of proprietary data. This has sort of taken over our training. Labels has become the smallest part of our training, and we rely a lot on self supervised or unsupervised data.
Collin Otis [00:14:19]: And we also use a lot of non driving data. It's really useful for your models to have abstract sort of vision, understanding, even in a self driving task, just to give you a sense of how the data distribution looks, to give you a very clear example. So there's a bunch of partitions of the data. Data with available labels is a very small part, and you can see that at the top. It's sort of just vanilla scenes that we might have gotten labeled at some point. Then we have rare samples that can be enumerated. And this is where something like the VLM I mentioned is useful. We actually tell the system, hey, I want you to collect data of a person very close to the camera or a police car on the highway, or a whole set of other things that we know are interesting.
Collin Otis [00:14:59]: And then our system goes and collects that data as a byproduct of its running. But then the last part is the out of distribution samples. And this is where you really don't know what you don't know, and where running in the real world really matters a lot. And so you actually ask the system, hey, go get me things that are out of distribution. It can be out of distribution either in our data set or on one of the neural paths, because all those neural paths are sort of, the training of them is not exactly the same. And you can sort of see what that looks like. So that one on the left is condensation on the camera lens, and then on the right is basically a little bit of water that's causing there to be scattering of reflections in that bottom right image. Okay.
Collin Otis [00:15:41]: On the data quality side, we actually don't do a lot here. The way I say it is, if you need perfect sensor data or labels in 2024, you're doing it wrong. You really shouldn't need perfect labels or perfect data. You just need them good enough. We do three things to maintain data quality. One is neural and labeler disagreement. I think label or disagreement has been done for a long time. The unique thing we have is neural disagreement, where we basically have nets that disagree on labels.
Collin Otis [00:16:04]: If they disagree sufficiently, that's a sign to us that we probably have a data quality problem there, data that degrades the metrics. Again, this one is really obvious. When you run all of our validation systems, you get these metrics for free. And if you see a regression of a certain magnitude, especially in a given segment. Let's say you regress bicyclists that are crossing an interstate highway, then you might want to actually go look at the data there. And then the last one is defect trace back. So we have methods using tagging and embeddings to actually find where a data that has regressed to go find all the other data that is similar to that in order to really pinpoint the exact problem with the data. Because oftentimes it's not a global problem, but it's very local.
Collin Otis [00:16:45]: It's in a certain type of set of scenes which may be scattered throughout the data set. I won't talk too much about how they train these systems, I already alluded to it. But essentially we have a way to pinpoint the training based on the defects that we find. This is a really important part, and it's really important that you correlate the defects in your system with the KPI's. So for us, that's around the safety metrics and the comfort metrics, which I'll talk about a little later. This is really important. So we don't spend all of our money with Nvidia. You'd be shocked at actually how little money we spend training the system.
Collin Otis [00:17:17]: Even though it's a fairly sophisticated system, it also makes iterations extremely fast, and it makes it much more difficult to regress the system because you're not doing these massive trainings that have all kinds of challenges. I won't talk any more about it. If you want to know more, you're welcome to email me. See me after the talk, and happy to share more. Okay, evaluation methods. So on the left are the metrics that we have. So KPI's I'll hit on in a little bit. We have training metrics, we have a huge number of these, and they're all segmented as well.
Collin Otis [00:17:47]: So segmentation is really, really important if you want to be able to really pinpoint where your system's doing well and not doing well, so you can focus the training. And then the last bit, which is actually the most important, is PRA. So this is probabilistic risk assessment. In self driving vehicles, the most important thing is the estimate of how safe the vehicle is. And on every release we have an estimate of how safe the vehicle is. And that's what actually allows us to know whether you can actually launch that software or not in a given domain. In the PRA, there are indicators that this PRa goes down into every little bit of the system. Everything from a bolt on the system to a vision model.
Collin Otis [00:18:25]: And in every one of those parts of parts of the PRA, there's indicators that basically tell you how well you're doing or what your risk exposure due to a single part of the system is. And so increasing those indicators is really the goal of a given engineer on the team, because once the vehicle is safe enough, performance on a comfort point of view is fairly easy. It's safety, especially on edge cases, that is the big challenge. And so pushing up those indicators is one of the most important things when you're evaluating everything from a pull request all the way to a driverless release. We run these things in sim. We run them in a fallback sim, which is basically where you look for self identified defects to the system. Oftentimes these are hardware failures, they're rare failures, things like that. We run that in hill, but we also run that in an off hill benchmark.
Collin Otis [00:19:09]: So hill stands for hardware in the loop. You actually have a mimic of our truck, and you can run the whole system on it. Again, that's just a pr. You press a button and you can run the system on a truck, or you can run the whole system off of a hill setup. And we have a set of benchmarks that allow you to sort of look for failures that aren't necessarily algorithmic, but more performance related. And then the last part and the smallest part is real world. This is where we make our money, and this is where it really matters that it actually works. So we would like the system to be fully validated before we actually ship it to the real world.
Collin Otis [00:19:40]: Now we'll talk about the release. So there's five principles for our release. Safety, efficiency, focus, speed, and the human having the final call. Safety is guaranteed by our validation system. So whenever you put code out into the world, all the system to guarantee safety is fully automated. And that allows engineers to have a lot of freedom to go innovate, try things, build things very quickly, because they know the safety part is taken care of. Efficiency is very important. This has to be fully automated.
Collin Otis [00:20:09]: Focus. I mentioned before, you want to be able to pinpoint whenever there's a failure, whether that's a failure in the validation or failure in the real world. Speed is super important. So we have a tenant in the autonomy team, which is for a new person joining the team. They should be able to ship code nationwide within 24 hours. So to any truck in the country within 24 hours of joining the company. And the last part is a human is the final call for whether a release is good enough. At the end of the day, we have a human look over all the data that gets spit out and make a decision at the end.
Collin Otis [00:20:38]: And so there is a human in the loop to push the button. I'll skip this slide for the sake of time. I'll just mention one thing quickly. So as you go from pull requests to daily releases, candidates stable and driverless releases, we essentially scale up the number of trucks, the number of states you're operating in, the number of simulation miles and the number of real miles. It's that whole same system, but it basically scales with how important or sort of how big the release is. Like I said, a pull request is also a release. And here's an example of what you get in GitHub. So on the left is sort of the automated results.
Collin Otis [00:21:14]: All these need to pass. These are linting and unit tests, but they're also safety tests. There's our full build. So this is all of our compute build. It's also all of our safety computer build. So we have a redundant safety computer. It builds the code for that. It runs simulation metrics.
Collin Otis [00:21:29]: There is one human component. You can trigger a dev test from here. So once all that passes, you can actually ask someone to go drive your truck, either on a test track or on a road if it passes all the safety bits. Also, inference model, build and deploy is part of this. So we run. So deploying to the fleet is one thing, but we also deploy to an internal cluster, which is a GPU cluster that basically all of our simulation hits. And so whenever you put up a pull request, if you have a new model that gets deployed out to all those places. So any vehicle, anyone running our stack, whether that's in SIM or in the real world, has access to your new model daily release.
Collin Otis [00:22:06]: So this happens every day at 01:00 a.m. it passes CI again, because there are actually cases where you have a merge conflict and you want to really make sure that daily release is good before you ship it to the fleet. After that, there's a smoke test at 07:00 a.m. so that's basically it gets put on a couple of vehicles driving in at least two different states for about an hour just to make sure that it boots up and everything. Usually the thing that can get you if you don't do this is that you've implemented something that actually doesn't, the truck doesn't start up properly. So that's actually pretty hard to catch in a validation system. Let's say the truck doesn't have the right dependency and it doesn't have an over the air connection where you try to run it. So that's the reason for the smoke test.
Collin Otis [00:22:49]: We do a deep triage of those every day, both sim and real world, and then that gets sent out to the team. So on the bottom right is an example of that. It just goes out on slack. And so you have this constant feedback of, hey, give me the anecdotal, give me the root cause of any issues that you're seeing in the daily release. And all the engineering team sees that candidate and stable. These are our main releases. So candidate gets cut at Friday at 01:00 a.m. and basically asks to compete with our stable release.
Collin Otis [00:23:17]: We run after Friday at 01:00 a.m. we run seven days of assessment of candidate, both SIM and in production. And then on Friday afternoon we have a presentation of all the automated metrics to the entire autonomy team. And we decide as a team whether we want to actually promote this new Canada to stable. So it's actually a pretty fun meeting because you've accrued five to 10,000 miles, you've run million miles of SIM on it, and you really get to see a full bodied assessment of the latest of the software. The last part is the most important, which is driverless. So the assessment is easy and continuous. So a lot of this stuff is automated, but it's very hard to pass this because it's much more rigorous.
Collin Otis [00:23:51]: And so we only do it about three to four times a year. Do we promote a new software for driverless? We start with stable, and then we actually execute a driverless test plan, which is a very long document. It includes our largest SIM set. It has a lot of things in the PRA that I mentioned, structured tests and so on. And then after we do all the testing, we assemble a plan and we put it in front of our safety review board. And they actually have to sign off whether they're willing to release this for our new drivers release. Okay, so in closing, infrastructure enables the iteration. So this is a really important part.
Collin Otis [00:24:24]: I didn't talk much about how does the actual brain work, or how do we iterate that. There's a lot of interesting things there. But the most important part of building that system with speed and quality is actually all the infrastructure that goes around it. We automate everything in the AI development process that we can. The right data and metrics, as well as the right architecture are much more important than perfect data quality. And then the last piece is there are many pieces that are needed to actually do this? Well, I probably talked about 50 pieces in this talk and there's another 50 that I didn't talk about. And so this is really important. If you want to build these safety critical systems is you have to do all these things.
Collin Otis [00:25:01]: Well. There's no sort of cure all or fast easy fix. You have to actually build the infrastructure. And that's all I have. And I'll take any questions.
AIQCON Male Host [00:25:13]: All right, let's give it up for Collin. Thank you. Thank you, sir. Great job. All right, before we conclude for a 30-minute break on the main stage, just want to see if anyone has any quick questions for Collin. All right, you ready for it?
Collin Otis [00:25:32]: What was that? Awesome. Here you go.
Q1 [00:25:38]: Great talk. I really enjoyed it. I would like to know your thoughts on the alternatives, such as deploying the model on edge devices on the truck itself instead of the OTA updates, due to Internet connectivity and all those issues. And also, considering if the model is deployed on the truck itself at the edge, then it might save on BOM costs as well.
Collin Otis [00:26:06]: Yes, I should have been clear there. So our system is self contained. So the OTA is for sending a vehicle from here to there and for a new release deployment. But all the models actually run on the vehicle. So all those paths, they're all running locally. We don't rely on any kind of connectivity for the truck to drive itself. Does that make sense?
Q1 [00:26:32]: Got it.
Collin Otis [00:26:33]: Yeah.
Q1 [00:26:33]: I would curious to know the size of the hardware that the model is running on.
Collin Otis [00:26:38]: Yeah, all of our trucks. So it depends on the truck a little bit different. But four or five a 4000s is a good. Could give you a good idea of how much compute we use.
Q1 [00:26:50]: Got it. Thank you.
Q2 [00:26:57]: Sorry. Are you modified levels really high? Are these trucks still running off can bus? And have you had your issues with the AI trying to like work with the can bus?
Collin Otis [00:27:11]: Can bus, yeah. Yeah. So every, all the actuation runs over can bus. The way that the system actually works is we have a big computer in the back that has gpu and cpu and then it sends a control, it essentially sends a plan which includes all that is needed for controlling to a tiny little computer. And there's two of them. They're called safety computers. We call them actuation control engine. So Ace.
Collin Otis [00:27:37]: And that's a board that we build. There's two of them. And those are the only things that interface with can. So you can actually sever all of our models, all of the main compute from the safety computers and the vehicle will come to a stop and still be able to do that safely. But all of that messaging is coming from the safety computer onto the can bus, and that interface is fixed. So the AI interface sort of changes over time, but eventually it gets translated into fixed can messages.
Q2 [00:28:05]: Okay, I have a small follow up question. So you mentioned all these a computers. Is that taking the space of what used to be the living quarters of the rigs, or is that taking the.
Collin Otis [00:28:17]: Space of what used to be, say.
Q2 [00:28:18]: Again, the living quarters in the back of the rig?
Collin Otis [00:28:21]: Yes. Yeah, exactly. So it's a box in the semi. It's a box about this big, about 300 pounds, and that sits in the living quarters. We actually. Well, if you ever come see our trucks, we also have two seats still there so that we can take people. And in some of our vehicles, we actually still have the bunk above it, because you might want to take a human that actually sleeps, and there's some operating things there. They sleep, and they can drive later if they need to.
Collin Otis [00:28:51]: So, yeah, in the f 150, it's in the back of the bed. So it takes up about half the bed. Yeah.
AIQCON Male Host [00:28:57]: Great questions. Any more questions? All right, I see you.
Q3 [00:29:05]: Hi. I have a question about the test that you're performing. I know that trucks are behaving really differently under different weather conditions. So if it's really hot, the tires will connect differently. If it's raining, it will drive really differently. How do you test for that? Are you modeling for that? How does it work?
Collin Otis [00:29:24]: Yeah, that's a great question. So the first part is that you can have a lot of that data in your simulation environment. So you can basically test for that, both synthetically as well as from real data that you've collected, and you can sort of replay those, and that allows you to test whether that's rain or extreme sun. When you're talking about temperature there, you actually have to go do it. And so this is why real world deployment is actually important. For example, Texas, where we run a lot of our heat soak tests, are in Texas. So, for example, we will go on the hottest day of the year, set the truck out in the sun, and just leave it, sit there, and then operate the vehicle. We also do, there's a lot of lab testing you can do to heat up components and make sure that they still run.
Collin Otis [00:30:10]: The most important bit, though, is that you actually detect when you're getting too hot and you execute a fallback. So it's more important that you're able to detect it rather than you're able to handle it. Rain is a little bit more straightforward. There's not too many fallback cases with the rain side, but at the end of the day for all those edge cases, you need to test against those and typically test those in simulation by collecting or synthetically generating those offline.
AIQCON Male Host [00:30:39]: All right, any final questions for Colin? All right, coming to you.
Collin Otis [00:30:49]: Hi.
Q4 [00:30:51]: Hi. Great talk. I guess the system you described really does sound like the idealized version of a self driving test system. I'm really curious about the path to get there. So I used to work in self driving and then there's always different rois of like these really expensive tests that you run. So what were the previous generations of the system that maybe didn't work as well and you felt like you needed to add like more, just more robust testing too?
Collin Otis [00:31:20]: Yeah. Meaning before we were at the point where we are today when the system wasn't as robust as it was today, kind of. What were some of the challenges? Is that essentially your question? Yeah. So I think edge cases that we did not expect are the biggest challenge. And maybe I can give you an example even on the defense side because that's a newer business for us. So over the last twelve months we've had to, one example is dust. So extreme levels of dust is really, really challenging. You see that from time to time in the commercial business.
Collin Otis [00:31:52]: So when there's a dust storm or you're driving on a dirt road, but in the defense product that's almost a constant. One thing that's really cool is when you run these multiple domains, edge cases in one domain may not be edge cases another. And so you can use that domain to sort of bootstrap the other. So the defense space for example, we see a lot more dirty, dusty, rugged environments. Shock and vibe is a big problem there. And so we essentially see defects there that then make their way back into the software and then also improve the commercial and industrial businesses as well. So I think dust has been a big one on the defense side. Also getting stuck is, is and has been a big one.
Collin Otis [00:32:28]: So how do you basically assess whether the vehicle is going to get itself stuck? And that is a little different. That's trial and error. So that is really hard to reproduce in sim. And so we essentially get the vehicle stuck a lot and then try to understand how do you get unstuck? How do you get it out of a huge swamp bog and that sort of thing? Let me think if there's any other. Yeah, I would say the last one. A lot of the work that we've done. So we started the company with a lot of experience. And so we did architect the system.
Collin Otis [00:32:57]: We knew how to architect and build these systems. I think the thing that you can't get away from is the hardening of the system. So, for example, every one of the processes has a fixed time budget, and if you go over, you actually have to fall back. There's a bunch of safety reasons for that. And so that has been a lot of our work, is how do you get the system to be so robust? Like when you play your iPhone, it's very, very complicated system, but it doesn't crash. And that's actually a lot of work just to make the system have extremely high uptime. I would say, in my experience in self driving and especially running production systems 24 hours a day, that is the big challenge and one that we've overcome over time.