AI Is Fast. AI Projects Are Slow. Let's Fix That.
Speakers

Joe Maionchi is Co-Founder and COO of RocketRide, an open-source AI pipeline platform that helps developers build, debug, and deploy AI applications from inside their IDE. He has 25+ years of experience in infrastructure and platform engineering, with leadership roles at Symantec/VERITAS, EMC, Syncplicity, and EverPure, spanning file systems, storage infrastructure, cloud platforms, and developer tooling. Before co-founding RocketRide, Joe was VP of R&D at Aparavi, where he led the engineering team building data intelligence and automation systems. He holds a BS in Computer Science from UC Berkeley and 3 patents.

Rod Christensen is Co-Founder and Chief Architect of RocketRide, where he designed the platform's C++ runtime and pipeline execution engine. He has 20+ years of experience in infrastructure software, with leadership roles at CA Technologies, NovaStor, Yosemite Technologies, and Artisan Infrastructure, spanning storage systems, data management, and cloud platforms. Rod previously co-founded Aparavi as CTO, where he built the core platform for intelligent unstructured data management. He is the technical architect behind RocketRide's IDE-native approach, including the 17K LOC VS Code extension and production-grade MCP client.

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.
SUMMARY
Joe Maionchi (Co-founder & COO) and Rod Christensen (Co-founder & Chief Architect) of RocketRide join the MLOps Community to walk through AIDE — the AI Integrated Development Environment. RocketRide is an open-source AI pipeline platform that lets developers build, debug, and run production-grade agentic AI workflows directly from their IDE, with support for 13+ LLM providers, 8+ vector databases, and full multi-agent orchestration.
TRANSCRIPT
Joe Maionchi: [00:00:00] We care a lot about saving people money.
Rod Christensen: Over the year that this pipeline's gonna be up, if I can, you know, cut my LLM costs in half...
Demetrios: The cost savings thing is huge, man.
Rod Christensen: Oh, that was cool. You missed that.
Demetrios: That was a good one, huh?
Rod Christensen: That was a good one.
Joe Maionchi: Yeah. Very cool.
Joe Maionchi: So I think it's really important to get the overall context of what we're talking about here. So, you know, you could say software engineering is dead, right? And while it's not really dead, certainly the old definition of software engineering is quickly evolving away.
Demetrios: Yeah.
Joe Maionchi: And what we see is the need for developers to transition to this new way of working, and I like to just give a very simple example.
Joe Maionchi: You know, company A adopts efficiency AI, which is doing the same with less. So what happens there? They lay off 50% of their developers and maintain their current revenue, their current growth path. Company [00:01:00] B adopts opportunity AI, which is doing more with the same. So what do they need to do? They need to keep their developers and quickly retrain them in this new way of working so that they can produce 2X, 4X, 10X.
Joe Maionchi: And that 10X advantage in top-line revenue for company B ends up compounding over time, and you can see where this goes between those two companies. So while right now things are a little more shortsighted, with a lot of efficiency AI going on in the industry and a lot of either real AI layoffs or AI washing (it doesn't really matter which), it's imperative that developers take those steps to build their AI expertise. This is really where RocketRide sees an opportunity to help those developers get up to speed and get productive building AI solutions, AI workflows, and AI automation, uh, you know, fast.
Demetrios: Hmm. So yeah, tell me about how you feel the world of a software developer is changing. I imagine now we're using [00:02:00] a lot of coding agents, and a lot of the code isn't being written by actual humans. It's being written by agents, and we're kicking off sub-agents, or we're kicking off parallel agents.
Demetrios: There's long-running tasks, all of that. What does that entail now, and what have you been seeing out there?
Joe Maionchi: Well, I think in the context of RocketRide, we really acknowledge that coding is no longer the bottleneck. Code generation is no longer the bottleneck, which it used to be. So the bottleneck just moves somewhere else, right?
Joe Maionchi: Whether that's intentionality, tool discovery and selection, or quality and post-production, the whole idea here is that there are new roles to be filled by the developer, and those are painful steps. Yeah. They can be painful, certainly for the inexperienced.
Demetrios: I like this intentionality piece. Can you talk to me more about that?
Rod Christensen: What I've noticed about, you know, the [00:03:00] development process, my development process in the last couple of years using coding agents and all that kind of stuff, is that it's really quick to actually develop code. Mm-hmm. But it's also really quick to develop crappy code, right?
Rod Christensen: And, you know, a lot of times, even... I love Claude AI. I mean, it's a really, really good coding agent from my point of view. But sometimes it gets stuck in these loops, and sometimes it doesn't take good engineering practices into consideration. Mm-hmm. It doesn't write the best algorithms in the world, and then you spend, you know, an hour trying to figure out, oh, you made a mistake on step A, and now you're on step N, right?
Rod Christensen: And then it forgets everything that you've done, and guess what? You end up with spaghetti code that's not worth actually shipping. And to try and get it into a shipping state is very, very difficult- Yeah ... at that point in time.
Demetrios: Yeah, we were just talking about how compaction is not the [00:04:00] best when you fill up the context and need to compact it.
Demetrios: And so there's a lot of hacks right now that folks are doing. One being just put it into a document so that you have your whole planning session or whatever it is that you're trying to do, the to-do list, et cetera, et cetera, is on a document, and then you don't have to lose the context when you compact it.
Demetrios: And if you start a new session, you can reference the document always.
Rod Christensen: That does help a lot. You know, one of the keys to actually making Claude successful is a good plan. Yeah. You know, giving it access to the full plan upfront. And when you do that, it actually comes out with better code. But that implies that it's not great at iterative, you know, engineering- Mm-hmm
Rod Christensen: because it doesn't necessarily remember what it did three days ago, or that this module over here is reusable. You know, I told it to go out and build a UI, and I ended up with two or three different CSS style sheets- Mm ... which all duplicated the same common styles. And it's like, you [00:05:00] know, good engineering practice is to create a common CSS- Mm-hmm
Rod Christensen: and move forward and use that for theming and all that kind of stuff. Claude's not really good at that. Claude is really good at taking a particular task or plan and implementing that plan, regardless of whether you have 500 files in there.
Rod Christensen: It's not looking for all the CSS files and all the common styles for what the background paper looks like.
Demetrios: Yeah. You find that Claude is a little bit lazy?
Rod Christensen: Yes. Well, I fight with Claude quite a bit, actually. You know, it's just like, "No, I don't want it done that way," you know? "Okay, look at common.css and figure out, you know, what are the common styles here?
Rod Christensen: How do you apply themes? Why are you doing it this way and not this way when you did it this way yesterday?"
Joe Maionchi: I mean, it's simulating a developer, so of course it's lazy.
Demetrios: Yeah. That is true. You want... The more lazy the better if you're a developer. Uh, but it is relentless, so a lot of times it can get there. [00:06:00] It just means that you gotta burn a bunch of tokens.
Rod Christensen: Oh. Yeah, I'm in trouble for that.
Demetrios: Yeah. Yeah, I can believe it.
Rod Christensen: Yeah. It's... I... So, you know, from a programmer standpoint, I started in Assembly language. Mm-hmm. Z80. Believe it or not, Z80 Assembly code. If anybody remembers the Z80, this was an 8-bit microprocessor way back when. Then I went to BASIC, did some dabbling in COBOL and Fortran, and then all the way up.
Rod Christensen: Finally ended up in JavaScript and TypeScript about three, four, five years ago. And this AI evolution with Claude, you know, and Claude Code, I just view as another, you know, language for me to actually interact with. Yeah. I don't care about, you know, what kind of code it produces underneath, what language it is.
Rod Christensen: I wanna interact at that top layer, but I also want it to write quality code. Mm. If you're gonna pick TypeScript, fine with me, no problem, pick TypeScript, but write good TypeScript.
Demetrios: So what have you [00:07:00] been seeing out there as far as the difficulties?
Rod Christensen: From my point of view, it's the ambiguity in the bridge between the English language (I program in English, obviously) and the technical jargon about structure and good engineering principles underneath.
Rod Christensen: You know, it's like a good architecture. How do I actually express that to Claude? What is a good architecture? Because that's a very subjective, you know- Yeah ... point. What is a good architecture? What is a good structure definition? And, you know, reusability and component reusability, all that kind of stuff.
Rod Christensen: So if you look at my memory.md, it's probably, you know, like that long, because I specifically go in and say, "I like this, I like this, I like this." Uh-huh. And that does help.
Demetrios: Is that where you're primarily letting it live to help you create that solid architecture?
Rod Christensen: Consistency.
Demetrios: Yeah.
Rod Christensen: Yeah. So I'll say...
Rod Christensen: You know, in my memory file, I'll say, "Go out and look at common CSS. Look for styles. You know, look for this particular architecture." I like, [00:08:00] you know, uh, snake case in TypeScript versus... Uh, I'm sorry, camel case in TypeScript, snake case in Python, and so on and so forth. So I have all these rules to give Claude some kind of semblance of organization.
Rod Christensen: Mm-hmm. And it's like... In RocketRide, we have two SDKs: one in TypeScript and one in Python, right? Right. And so I'll say, "Okay, make this change in the TypeScript SDK." Okay? And it does it. In my memory I've told it that when you update one of the SDKs, we try to keep them in sync, so update the Python one too.
Rod Christensen: So then it will go into the Python SDK and, you know, use camel case because that's what it did in TypeScript, but that's not standard Python. Mm. So it's like you always have to check every line of code to make sure that it's actually based on good engineering principles.
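The camel-case slip Rod describes is mechanical enough to check after an agent ports a change from the TypeScript SDK to the Python one. The sketch below is a hypothetical helper, not part of either RocketRide SDK: it assumes you have the ported Python source as a string and flags any function names that are not already snake_case.

```python
import ast
import re


def camel_to_snake(name: str) -> str:
    """Convert a camelCase or PascalCase identifier to snake_case."""
    # "getPipelineStatus" -> "get_Pipeline_Status": split lower/digit followed by upper.
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    # "HTTPRequest" -> "HTTP_Request": split an acronym run from the next capitalized word.
    s = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", s)
    return s.lower()


def find_camel_case_defs(source: str) -> list[str]:
    """Return the names of functions in Python source that are not already snake_case."""
    tree = ast.parse(source)
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and camel_to_snake(node.name) != node.name
    ]


# Hypothetical ported code: one name kept its TypeScript spelling.
ported = "def getStatus(run_id):\n    pass\n\ndef get_name():\n    pass\n"
print(find_camel_case_defs(ported))  # ['getStatus']
```

A check like this complements the memory-file rule rather than replacing it: the rule asks the agent to follow the convention, and the check catches the cases where it doesn't.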
Demetrios: Yeah, I guess the question that I am constantly [00:09:00] asking myself is, if it runs, even if it's not pretty, does it matter?
Rod Christensen: Mm. That's a great question.
Demetrios: Yeah. So- I triggered you both there a little bit.
Joe Maionchi: Yeah, you know, if you're a software developer and you care about your craft, then you want to know that it's architected for maintainability, readability, reusability. If you're a vibe coder, then your judgment is, "Does it do what I want it to do? Great."
Demetrios: Mm.
Joe Maionchi: So it really depends on whether you're a citizen developer or a software developer, right? But at the same time, think about all the infrastructure code that you have to go through and build each time you're building an AI application. Do you really want to reinvent that wheel over and over and over again, relying on the particular version of the particular coding agent that you're using?
Joe Maionchi: And, you know, [00:10:00] God forbid you change coding agents and they do something different. You know, these coding agents are trained on publicly available source code. Not all of that code is the best-architected code in the world. Mm, mm. So you're basically risking a different implementation of that infrastructure code each time you build an AI application.
Joe Maionchi: That's really one of the foundations for why we decided to build RocketRide- Mm ... let's go ahead and design a framework that standardizes that infrastructure tier and that glue code, which is almost a waste of time anyway because it's not really tied to the value of what you're trying to build, the solution.
Joe Maionchi: It just is necessary. Mm. So we've abstracted that away and, and done it in such a way that it's robust, scalable, and standardized, and that's one of the aspects of, you know, what motivated us to build this thing in the first place.
Demetrios: Yeah. Actually, we had a lunch and learn last week, and it was all about skills, and one of the key learnings from a paper that one [00:11:00] of the members of the lunch and learn read was that you have to build the skill for the model that you're using, because different models prefer skills in different ways, and that was new to me.
Demetrios: I didn't realize that you'd want that kind of specificity. I probably should have known, for the exact reasons that you're saying: each model is trained in its own way and has its own particularities. But just doing that gives you so much lift. If you think about, okay, I'm using Codex or I'm using Claude, and this is how Claude skills work, this is how Codex skills work, whatever it may be, then you can get a little more lift. Are you thinking in those terms?
Joe Maionchi: The way I see it is we've taken away some of that open judgment that the particular agent will use to kind of follow a certain pattern, and I'll give you a practical example of [00:12:00] why that's relevant. You as a software developer have built a solution for the finance department that does XYZ, and now the legal department wants a very similar solution, but it's a slightly different use case and a different data set.
Joe Maionchi: And now you go and tell your coding agent, without our framework, "Okay, take that application that I just built as a base and now modify it for this, this, and this." It ends up changing things that you didn't want it to change. Mm. By keeping it following certain patterns for the stuff that isn't relevant to the differences between the two, you get more reliability.
Joe Maionchi: "Hey, I've already got that thing running in production. It's already working. Why do I need to take risk in areas I don't need to, for things that don't matter for that solution?" So this is an example of why that's valuable from my perspective. So I don't think it's so much agent-specific; whichever agent is coming in, it's not really a [00:13:00] skill so much as: these are the rules.
Joe Maionchi: These are the patterns that you need to follow to build solutions in our tool.
Demetrios: So you're defining in the framework: these are the important things, these are the rules, and this is how I want it to act, and that can't be changed as you are changing the agent for...
Joe Maionchi: Well, there's a lot of flexibility to customize, to build new nodes, to compose solutions and pipelines the way that you need to.
Joe Maionchi: But there are some fundamental primitives that are there for a reason.
Rod Christensen: As we all know, agents work differently, right? A CrewAI agent works much differently than Deep Agent, which works much differently than LangChain agents, right? How they interpret the context that you give them, and how they interpret the question and the ultimate goal, is very different.
Rod Christensen: Not only that, you can take a Deep Agent, for example, and give it two different LLMs. You know- Yeah ... [00:14:00] Claude AI and ChatGPT. It works differently. And not only does it not work the same way, but a lot of times you get different results every time you call it, because the LLMs are free to go out and do whatever they wanna do, and you have to actually work with what they give you.
Rod Christensen: So what we did is, we really wanted to be able to compare agents and their performance, and each agent and all of its tools can use different LLMs. So bring the correct LLM to the job. Let's say a tool requires an LLM, and that tool is very, very good at reading tables out of PDFs, for example, right?
Rod Christensen: Bring the right tool to the job. The main orchestrator may actually require more intelligence, so it goes with Claude or ChatGPT, where somebody else may wanna go with Gemini [00:15:00] something, right? So bring the right tool to the job. And one of the things that I really struggled with as a programmer when I first started doing this is, okay, do I use LangChain, or do I use Haystack, or do I use Llama, um, what?
Rod Christensen: LlamaIndex. It's like, okay, which one do I use? Which one is better? I don't know. It's that trial and error that is so time-consuming, right? So what I wanted the tool to do is let you plug in anything you want, with all of them given the same inputs.
Rod Christensen: Plug in what you want, compare the performance, and you can literally do that in two minutes and see how Deep Agent actually performs on something versus CrewAI. Oh, wow. Now, once you do that- You pick your best agent. Well, does this work okay with Sonnet, or does it work okay with GPT?
Rod Christensen: Does it work with GPT-3.5 Turbo, because that's cheaper for me to run? So not only can you tune the agents, but you can [00:16:00] tune the LLMs and do direct comparisons in your pipelines to see exactly which is best for your solution.
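Rod's "is the cheaper model good enough?" question can be framed as a simple selection rule over comparison results. This is a minimal sketch under stated assumptions: the model names, prices, and scores below are placeholders you would replace with your own provider pricing and evaluation runs, and `Candidate` / `cheapest_passing` are hypothetical names, not RocketRide's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    model: str                 # hypothetical model identifier
    cost_per_1k_tokens: float  # placeholder price; use your provider's real pricing
    score: float               # quality score (0..1) from your own evaluation run


def cheapest_passing(candidates: list[Candidate], min_score: float) -> Optional[Candidate]:
    """Return the lowest-cost candidate whose evaluated score meets min_score, or None."""
    passing = [c for c in candidates if c.score >= min_score]
    return min(passing, key=lambda c: c.cost_per_1k_tokens, default=None)


# Placeholder numbers for illustration only.
candidates = [
    Candidate("model-large", cost_per_1k_tokens=10.0, score=0.92),
    Candidate("model-medium", cost_per_1k_tokens=1.0, score=0.88),
    Candidate("model-small", cost_per_1k_tokens=0.5, score=0.70),
]
print(cheapest_passing(candidates, min_score=0.85).model)  # model-medium
```

The same rule can be applied per step, so an orchestrator and a table-reading tool in one pipeline may end up on different models.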
Demetrios: And are you running these pipelines various times to be able to say, "Okay, this one-"
Rod Christensen: They're all run simultaneously.
Demetrios: Uh-huh.
Rod Christensen: So you can give it an instruction, and it can send it over to CrewAI and let CrewAI do its work, and then you get the results from CrewAI. Then it'll send it to LangChain, then to Deep Agent, give you all the results, and say, "Oh, this one sucked at it, so we're not gonna use that one."
Rod Christensen: Yeah. "Let's replace that one. Try a different LLM in it. Run it again." Boom, you've got your results all on the screen.
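The fan-out Rod describes (one instruction, several agents, all results side by side) can be sketched with Python's standard thread pool. The agents here are stand-in callables and `run_side_by_side` is a hypothetical name; in a real setup each callable would wrap a CrewAI, LangChain, or Deep Agents run, and nothing here reflects RocketRide's internals.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Agent = Callable[[str], str]  # stand-in: takes an instruction, returns an answer


def run_side_by_side(instruction: str, agents: dict[str, Agent]) -> dict[str, str]:
    """Send the same instruction to every agent concurrently and collect each answer.

    A failing agent is recorded as an error string instead of aborting the comparison.
    """

    def call(agent: Agent) -> str:
        try:
            return agent(instruction)
        except Exception as exc:  # keep the other results if one agent fails
            return f"ERROR: {exc!r}"

    with ThreadPoolExecutor(max_workers=max(1, len(agents))) as pool:
        futures = {name: pool.submit(call, agent) for name, agent in agents.items()}
        return {name: fut.result() for name, fut in futures.items()}


# Toy agents standing in for real frameworks.
agents = {
    "upper_agent": str.upper,
    "reverse_agent": lambda s: s[::-1],
    "broken_agent": lambda s: 1 / 0,
}
print(run_side_by_side("summarize q3", agents))
```

Recording failures per agent, rather than letting one exception cancel the run, is what makes "this one sucked at it, replace it" a quick loop.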
Demetrios: Wow, that gives you a lot more insight into: is it me, or is it the pipeline? Is it the agent? Is it the LLM? Yes. What is it exactly that is not working here when something isn't working?
Rod Christensen: Well then, if you really wanna dig down into it, there's a full trace of exactly which agent did what, what the agent was asked, the results from the agent, what the tool was passed, and what [00:17:00] tool results came back.
Rod Christensen: So you can trace through each logical step that it went through- Uh-huh ... to end up at your final answer.
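A nested trace like the one Rod describes (which agent ran, what it was asked, what each tool was passed and returned) can be modeled as a tree of spans. This is a minimal sketch, assuming hypothetical `Span`/`Tracer` classes; it is not RocketRide's trace format.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Iterator


@dataclass
class Span:
    name: str                 # e.g. "agent:planner" or "tool:ocr"
    inputs: Any               # what this step was asked or passed
    outputs: Any = None       # what it returned
    duration_s: float = 0.0
    children: list["Span"] = field(default_factory=list)


class Tracer:
    """Records nested spans so every logical step can be inspected afterwards."""

    def __init__(self) -> None:
        self.root = Span("run", inputs=None)
        self._stack = [self.root]

    @contextmanager
    def span(self, name: str, inputs: Any) -> Iterator[Span]:
        s = Span(name, inputs)
        self._stack[-1].children.append(s)  # attach to whichever step is running now
        self._stack.append(s)
        start = time.perf_counter()
        try:
            yield s
        finally:
            s.duration_s = time.perf_counter() - start
            self._stack.pop()


def dump(span: Span, depth: int = 0) -> list[str]:
    """Render the trace tree as indented 'name: inputs -> outputs' lines."""
    lines = [f"{'  ' * depth}{span.name}: {span.inputs!r} -> {span.outputs!r}"]
    for child in span.children:
        lines.extend(dump(child, depth + 1))
    return lines


tracer = Tracer()
with tracer.span("agent:planner", "find the invoice total") as agent:
    with tracer.span("tool:ocr", "invoice.png") as tool:
        tool.outputs = "TOTAL 42.00"
    agent.outputs = "42.00"
print("\n".join(dump(tracer.root)))
```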
Demetrios: It's that full observability to recognize where things are falling over. Yes. What is the best agent for the job?
Rod Christensen: Yeah.
Demetrios: Okay. Talk to me more about the primitives and how you break this down.
Rod Christensen: I mean, we have many, many different things we call nodes, okay? Nodes are like, you know, taking a piece of text and chunking it into documents. And you can do it many different ways. There's a code chunker that actually recognizes C++ and JavaScript functions and all that kind of stuff.
Rod Christensen: There's a regular text parser that chunks text by paragraph or sentence or however you wanna do it. Then there's things like vector stores. We have, I think, support for eight different vector stores right now, the most popular ones. We have an HTTP requester that you can [00:18:00] use to go out and get stuff from the web.
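A paragraph-or-sentence text chunker of the kind Rod mentions can be sketched in a few lines. This is an illustrative stand-in, not RocketRide's parser: the sentence split is a naive regex on `.`, `!`, `?`, and `split_units` / `pack_chunks` are hypothetical names.

```python
import re


def split_units(text: str, by: str = "paragraph") -> list[str]:
    """Split text into paragraphs (blank-line separated) or sentences (naive . ! ? split)."""
    if by == "paragraph":
        parts = re.split(r"\n\s*\n", text)
    elif by == "sentence":
        parts = re.split(r"(?<=[.!?])\s+", text)
    else:
        raise ValueError(f"unknown split mode: {by}")
    return [p.strip() for p in parts if p.strip()]


def pack_chunks(units: list[str], max_chars: int) -> list[str]:
    """Greedily pack consecutive units into chunks of at most max_chars.

    A single unit longer than max_chars is kept whole as its own chunk.
    """
    chunks: list[str] = []
    current = ""
    for unit in units:
        candidate = f"{current} {unit}" if current else unit
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = unit
    if current:
        chunks.append(current)
    return chunks


print(pack_chunks(split_units("One. Two. Three.", by="sentence"), max_chars=9))
# ['One. Two.', 'Three.']
```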
Rod Christensen: We support Firecrawl, and, what, 12 different LLMs at this point in time. Mm. So with each one, you build up these chains to do exactly what you want. I have data I want sent here. I want it sent over here to do parsing on it. I want the text. I want to take that text and run it through a summarization engine.
Rod Christensen: But before I do that, I wanna anonymize it so it doesn't have any PII in it. Then I want to chunk it up, and then I want to put it into a vector store. So you get to build up these pipelines. We basically look at it as, you know, a box of Legos. A little tiny red Lego is not interesting, right?
Rod Christensen: Mm. A white Lego, you know, a four by four Lego is not interesting. But when you start building stuff out of these Legos, that's when it becomes interesting, and that's what we're really focused on.
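The "box of Legos" idea, small nodes chained so each one's output feeds the next, can be sketched as plain function composition. Everything here is a toy stand-in under stated assumptions: the PII scrubber only masks emails and US-style phone numbers, the summarization step is left out, and `InMemoryStore` keeps raw text instead of embeddings.

```python
import re
from functools import reduce
from typing import Any, Callable

Node = Callable[[Any], Any]


def pipeline(*nodes: Node) -> Node:
    """Compose nodes left to right: each node's output becomes the next node's input."""
    return lambda data: reduce(lambda acc, node: node(acc), nodes, data)


EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def anonymize(text: str) -> str:
    """Toy PII scrubber: masks email addresses and US-style phone numbers only."""
    return PHONE.sub("[PHONE]", EMAIL.sub("[EMAIL]", text))


def chunk_sentences(text: str) -> list[str]:
    """Naive sentence chunker: split after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


class InMemoryStore:
    """Stand-in for a vector store: keeps raw chunks in a list, no embeddings."""

    def __init__(self) -> None:
        self.docs: list[str] = []

    def __call__(self, chunks: list[str]) -> int:
        self.docs.extend(chunks)
        return len(chunks)


store = InMemoryStore()
ingest = pipeline(anonymize, chunk_sentences, store)
ingest("Contact jane@example.com. Call 555-123-4567 today.")
print(store.docs)  # ['Contact [EMAIL].', 'Call [PHONE] today.']
```

Swapping a node (a stronger anonymizer, a real vector store) changes one argument to `pipeline` and leaves the rest of the chain untouched.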
Demetrios: You know, I have been constantly thinking about how much of the pipelines do I need to [00:19:00] define versus letting one of these agents just figure out what the best way of going about it is, if I give it the boundary conditions, right?
Demetrios: I don't know if you've played around with that at all, as to I wanna clearly say this is the workflow because I know what the workflow is. Yep. And I plug in the different parsers. I give it that LLM brain on certain steps. But at the end of the day, it's a bit of a DAG.
Rod Christensen: That's what s- that's exactly what we do.
Rod Christensen: Yeah. Okay? We standardize on what we call lanes. For lanes, we have a write text lane, a write table lane, a write question lane, a write video lane, audio, and image. Based on the kind of data that's going through, it flows through these nodes, and you can do different things. You can do OCR, which takes an image and outputs to the text lane.
Rod Christensen: Then from the text lane, you can take that and plug it into an anonymization node on the text lane, which produces more text, but it's [00:20:00] anonymized text- Yeah ... that you can then chunk into documents if you want. So that plumbing between the two things is the boring stuff, right? Yeah. And it's the stuff that LLMs, Claude in particular, typically get wrong.
Rod Christensen: It's the plumbing. It's not the actual, you know, AI calls. It's the plumbing and getting everything to work together that, you know, these coding agents usually screw up.
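Standardizing on lanes means each node declares what kind of data it reads and writes, so a bad connection can be rejected before anything runs. This is a minimal sketch of that idea, not RocketRide's node schema: the `Lane` enum mirrors the lanes Rod lists plus a `DOCUMENTS` lane that is an assumption for the chunker's output, and `NodeSpec` / `check_chain` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Lane(Enum):
    TEXT = "text"
    TABLE = "table"
    QUESTIONS = "questions"
    IMAGE = "image"
    AUDIO = "audio"
    VIDEO = "video"
    DOCUMENTS = "documents"  # assumption: output lane for chunked documents


@dataclass(frozen=True)
class NodeSpec:
    name: str
    consumes: Lane
    produces: Lane


def check_chain(nodes: list[NodeSpec]) -> None:
    """Raise TypeError if any node's output lane doesn't match the next node's input lane."""
    for upstream, downstream in zip(nodes, nodes[1:]):
        if upstream.produces is not downstream.consumes:
            raise TypeError(
                f"{upstream.name} writes {upstream.produces.value} "
                f"but {downstream.name} reads {downstream.consumes.value}"
            )


ocr = NodeSpec("ocr", Lane.IMAGE, Lane.TEXT)
anonymizer = NodeSpec("anonymizer", Lane.TEXT, Lane.TEXT)
chunker = NodeSpec("chunker", Lane.TEXT, Lane.DOCUMENTS)

check_chain([ocr, anonymizer, chunker])  # image -> text -> text -> documents: OK
try:
    check_chain([chunker, ocr])
except TypeError as err:
    print(err)  # chunker writes documents but ocr reads image
```

A small, explicit contract like this is also what makes the plumbing easy for a coding agent to assemble correctly.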
Demetrios: Yeah. Talk to me more about that because I wanna just say give it to Claude, let Claude figure it out on its own.
Rod Christensen: Yes. So we've created this complete set of documentation that you can do two things with.
Rod Christensen: Claude can actually build custom nodes for you to accept the, you know, text input and do whatever you want and send it out. That's, that's one little Lego, right? But the more interesting case is I wanna parse a PDF, anonymize it, and, you know, output it to my text lane. Okay, return [00:21:00] the answer. Claude will build that for you.
Rod Christensen: It'll put all the Legos together because it completely understands exactly what each one of these nodes do, so you don't even have to go out and think about I need to connect this and this and this. Claude knows how to do it- Uh-huh ... because we've instructed it all about these different nodes and how they work and interoperate together, and it puts it together.
Demetrios: Uh-huh. Okay. So the different nodes are very understandable by Claude Code or by whatever coding agent you're using, and you're giving it a simple framework to leverage when it wants to create these workflows.
Rod Christensen: Right. And the reason why it's easy is because we standardized on these lanes, right? Yes. It's very easy for Claude to understand.
Rod Christensen: This is the text lane. When you have text, run it through here. This is the image lane. You wanna do OCR on it, run it through this, right? Yeah. So those are simple terms that Claude usually gets pretty darn accurate on.
Joe Maionchi: [00:22:00] Yeah, and we keep saying Claude, but any coding agent, right?
Rod Christensen: I, okay. Yeah. Oh, so I use Claude.
Rod Christensen: I'm sorry.
Joe Maionchi: Yeah, so these, these three elements, right? The agents.md, which kinda defines the rules and the instructions, the documentation, and then the source code itself, right? And with that, like you said, you know, we've actually given natural language instruction for the agent and watched it compose these relatively complex, I think, 45-node pipelines- Whoa
Joe Maionchi: to accomplish the, the objective, right? So, I mean, that's where we're going anyway- Yeah ... right?
Demetrios: Yeah. And how detailed are you in explaining each step in English versus just explaining what you want at the end, and then Claude figures out, "I need to create these 45 nodes to get to that end state"?
Rod Christensen: Claude is good enough, and our instructions are good enough, that you can just say: I have a PDF, give me the text back.
Demetrios: Mm-hmm.
Rod Christensen: Okay? I have an invoice that I want to stick into a MySQL table. Here's the [00:23:00] table name. And it'll figure out exactly what to do- Yeah ... because we have a MySQL connector. Yeah. We have a table recognizer and extractor, and it puts this whole pipeline together to take a printed invoice, you know, that you get on a piece of paper, send it through OCR, and then actually put it out into a database.
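The last two steps of that invoice pipeline (pull fields out of OCR text, write a row to a named table) can be sketched with the standard library. Assumptions to note: Python's built-in `sqlite3` stands in for MySQL so the example is self-contained (a MySQL driver would use its own connection object and placeholder style), the regexes only handle a simple "Invoice #" / "Total" layout, and `extract_invoice` / `store_invoice` are hypothetical helpers. Because a table name can't be passed as a bound parameter, it is validated before being interpolated.

```python
import re
import sqlite3

# Patterns for a simple invoice layout; real invoices need a proper table extractor.
INVOICE_PATTERNS = {
    "invoice_no": re.compile(r"Invoice\s*#?\s*:?\s*(\S+)", re.IGNORECASE),
    "total": re.compile(r"Total\s*:?\s*\$?([\d,]+\.\d{2})", re.IGNORECASE),
}


def extract_invoice(text: str) -> dict[str, str]:
    """Pull invoice fields out of OCR text; raise if a required field is missing."""
    fields = {}
    for key, pattern in INVOICE_PATTERNS.items():
        match = pattern.search(text)
        if match is None:
            raise ValueError(f"could not find {key}")
        fields[key] = match.group(1)
    return fields


def store_invoice(conn: sqlite3.Connection, table: str, fields: dict[str, str]) -> None:
    """Insert one invoice row into `table`, creating the table if needed."""
    # Identifiers can't be bound as parameters, so only allow plain identifiers.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"unsafe table name: {table!r}")
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (invoice_no TEXT, total REAL)")
    conn.execute(
        f"INSERT INTO {table} (invoice_no, total) VALUES (?, ?)",
        (fields["invoice_no"], float(fields["total"].replace(",", ""))),
    )
    conn.commit()


ocr_text = "ACME Corp\nInvoice #: INV-1001\nTotal: $1,250.00"
conn = sqlite3.connect(":memory:")
store_invoice(conn, "invoices", extract_invoice(ocr_text))
print(conn.execute("SELECT invoice_no, total FROM invoices").fetchall())  # [('INV-1001', 1250.0)]
```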
Demetrios: And am I plugging in... I can bring my own OCR model?
Rod Christensen: Yep.
Demetrios: And then you're giving me that observability on top, so if the OCR model, I can test them- Yes ... different ones. I can see this one actually gives me better results.
Rod Christensen: Yes.
Demetrios: Yeah. And then how are the evals being done?
Rod Christensen: Well, one of the things that I did, a really interesting pipeline that I wrote and used for testing OCR: I had an image, right? And then, on what we call the dropper (you basically just drop an image file on it, and it sends it through the pipe), I had all my different OCR models, right? Those OCR models all have text output. I took those text [00:24:00] outputs, put them into an LLM, and said, "Which is the best one?"
Joe Maionchi: Mm.
Rod Christensen: So it ranked all the different OCR models that I had based on my inputs, so that I could get an impartial judgment: "Oh, TrOCR is better on this particular data," or "EasyOCR is better on this," or, you know, handwriting is a completely different thing, right?
Rod Christensen: Yeah. But that's what you can actually test in, like, two minutes.
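Rod's OCR bake-off ranks several engines' outputs with a judge. The sketch below keeps the judge pluggable: in his setup it is an LLM call, but here a deterministic stand-in scores each output by `difflib` similarity to a known-good transcription, which is a swapped-in technique that only works when you have a reference. Engine names and texts are made up for illustration.

```python
from difflib import SequenceMatcher
from typing import Callable

Judge = Callable[[str], float]  # maps one OCR output to a score in [0, 1]


def rank_outputs(outputs: dict[str, str], judge: Judge) -> list[tuple[str, float]]:
    """Score each engine's output with the judge and return (engine, score), best first."""
    scored = [(engine, judge(text)) for engine, text in outputs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def reference_judge(reference: str) -> Judge:
    """Deterministic stand-in for an LLM judge: similarity to a known-good transcription."""
    return lambda text: SequenceMatcher(None, reference, text).ratio()


# Hypothetical outputs from two OCR engines on the same image.
outputs = {
    "engine_a": "Invoice total: 42.00",
    "engine_b": "Inv0ice t0tal: 4Z.00",
}
ranking = rank_outputs(outputs, reference_judge("Invoice total: 42.00"))
print(ranking[0])  # ('engine_a', 1.0)
```

Replacing `reference_judge` with a function that prompts an LLM to score each output keeps `rank_outputs` unchanged.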
Demetrios: And you can do it for every use case, which I hadn't thought about because there's certain use cases that are going to be... They're gonna skew towards one model versus the other.
Rod Christensen: Yes, especially on OCR. OCR- Mm-hmm ... has broad variance in how good it is depending on the model.
Demetrios: But then you can extrapolate that out for any type of use case. So if it's the PII anonymizer model, you can- Yeah.
Joe Maionchi: Sentence transformer, embedder, chunker- Mm-hmm ... parser, you name it.
Demetrios: And at the end of the day, if I wanna bring something, let's say that I'm using a paid [00:25:00] tool like Extend, I know they have a great OCR model.
Demetrios: I can just bring that, plug it in as one of those nodes. Because the nodes, if I'm understanding correctly, you're basically not opinionated about?
Rod Christensen: No. No, we don't care what node it is. As a matter of fact, we have nodes that we send data to. We send data to Google to do object recognition, or you can run object recognition on site.
Rod Christensen: We're not opinionated. The whole concept of RocketRide is really to pick the best tool for the job, right? Mm-hmm. And the way you prove that it's the best tool for the job is to run it.
Joe Maionchi: Yeah. Yep.
Rod Christensen: Yeah. We don't care if it goes up to a website or whether you're running it locally or on-prem or wherever.
Rod Christensen: Just, you know, be able to prove which is the best solution.
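Being unopinionated about nodes mostly comes down to a shared interface: anything that takes the same input and produces the same output can be dropped in, whether it runs on-prem or calls a hosted service. This is a minimal sketch using `typing.Protocol`; `OcrBackend`, the stub classes, and `run_all` are hypothetical, and the hosted stub's `send` callable stands in for a real HTTP request.

```python
from typing import Callable, Protocol


class OcrBackend(Protocol):
    """Anything with a name and a recognize(image) -> text method can be plugged in."""

    name: str

    def recognize(self, image: bytes) -> str: ...


class LocalStubOcr:
    """Placeholder for an on-prem engine: pretends the image bytes are already text."""

    name = "local-stub"

    def recognize(self, image: bytes) -> str:
        return image.decode("utf-8", errors="replace")


class HostedStubOcr:
    """Placeholder for a hosted service; `send` stands in for the real HTTP call."""

    name = "hosted-stub"

    def __init__(self, send: Callable[[bytes], str]) -> None:
        self._send = send

    def recognize(self, image: bytes) -> str:
        return self._send(image)


def run_all(backends: list[OcrBackend], image: bytes) -> dict[str, str]:
    """Run the same image through every backend so the results can be compared."""
    return {backend.name: backend.recognize(image) for backend in backends}


backends = [LocalStubOcr(), HostedStubOcr(lambda img: f"{len(img)} bytes")]
print(run_all(backends, b"hello"))  # {'local-stub': 'hello', 'hosted-stub': '5 bytes'}
```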
Demetrios: Yeah, okay. Well, this makes sense on what you're doing and how you're doing it. And then, uh, are there any other primitives that we should know about?
Rod Christensen: Oh, boy. One of my personal favorites is what we call the frame grabber, [00:26:00] and a frame grabber's really cool because what you do is you feed video into it, and we can detect, uh, scene transitions within a video.
Rod Christensen: Okay? And then we can tell you exactly when each transition happened. We can pull out, you know, every third frame, or a frame every 15 seconds or five seconds or whatever, and then we can send those through object recognition. From object recognition, we can say yay or nay on whether this matches, whether there's a blue car in there or not, and then we can actually pull highlight reels out of videos based on what the content is, send those frames to our assembler at the back end, and get a full highlight-reel video out of a big video.
Rod Christensen: So I like the frame grabber. Yeah. That's my favorite one. That and transcription, actually.
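The frame grabber's bookkeeping (when to grab frames, where scenes change, which matched moments form one highlight) can be sketched without any video library. Assumptions: each sampled frame is reduced to a single numeric "signature" such as mean brightness (real scene detection usually compares histograms or pixel differences), the object-recognition step is not shown, and `hits` are whatever timestamps that detector flagged. All function names are hypothetical.

```python
def sample_times(duration_s: float, every_s: float) -> list[float]:
    """Timestamps at which to grab a frame: 0, every_s, 2*every_s, ... below duration_s."""
    if every_s <= 0:
        raise ValueError("every_s must be positive")
    times = []
    i = 0
    while i * every_s < duration_s:
        times.append(i * every_s)
        i += 1
    return times


def scene_cuts(frames: list[tuple[float, float]], threshold: float) -> list[float]:
    """Timestamps where a frame's signature jumps by more than threshold vs. the previous frame."""
    return [t for (_, prev), (t, cur) in zip(frames, frames[1:]) if abs(cur - prev) > threshold]


def highlight_ranges(hits: list[float], max_gap: float) -> list[tuple[float, float]]:
    """Merge detector hit timestamps into (start, end) ranges; hits within max_gap join one range."""
    ranges: list[tuple[float, float]] = []
    for t in sorted(hits):
        if ranges and t - ranges[-1][1] <= max_gap:
            ranges[-1] = (ranges[-1][0], t)
        else:
            ranges.append((t, t))
    return ranges


print(sample_times(60, 15))  # [0, 15, 30, 45]
print(scene_cuts([(0, 10.0), (5, 11.0), (10, 80.0), (15, 82.0)], threshold=20.0))  # [10]
print(highlight_ranges([5, 10, 15, 60, 65], max_gap=5))  # [(5, 15), (60, 65)]
```

The resulting ranges are what an assembler step would cut and concatenate into the highlight reel.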
Demetrios: Oh, nice. Yeah. Transcription is a huge one, especially- Yeah ... I, I, like many other people, am addicted to the [00:27:00] speak-to-text on your computer. I know a bunch of friends who are using local models to make that happen. I just pay for a service out there.
Demetrios: But this is-
Rod Christensen: So one of the nodes that we have right now is speech-to-text. All right? Yeah. That's one of the nodes: transcription. So you put in audio or video. Audio is faster- Mm-hmm ... right? A lot more compressed. You can put audio in, and it converts it to text, which goes on the text lane.
Rod Christensen: From the text lane, you can convert that into a question and issue that question to an LLM. The LLM gives an answer, then we can send back the text of the answer, and what we're about ready to release right now is our text-to-speech. Okay? Nice. So then we can convert it back into speech- A full voice agent
Rod Christensen: and then, and then answer you, so you can actually talk to your LLM, and it'll talk back. It's not just a one-way street.
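The voice-agent loop is a chain of lanes: audio in, text out of the transcriber, an LLM answer on the text lane, and speech back out. A minimal sketch, assuming hypothetical stub functions in place of real speech-to-text, LLM, and text-to-speech models:

```python
from typing import Callable


# Hypothetical stand-ins for real nodes; in practice each would call a model.
def speech_to_text(audio: bytes) -> str:
    return audio.decode("utf-8")       # pretend the "audio" is already its transcript


def ask_llm(question: str) -> str:
    return f"You asked: {question}"    # placeholder for an LLM call


def text_to_speech(text: str) -> bytes:
    return text.encode("utf-8")        # placeholder for a TTS model


def make_voice_agent(stt: Callable[[bytes], str],
                     llm: Callable[[str], str],
                     tts: Callable[[str], bytes]) -> Callable[[bytes], bytes]:
    """Chain audio lane -> text lane -> LLM answer on the text lane -> audio lane."""
    def run(audio: bytes) -> bytes:
        text = stt(audio)      # audio lane in, text lane out
        answer = llm(text)     # text lane in, text lane out
        return tts(answer)     # text lane in, audio lane out
    return run


agent = make_voice_agent(speech_to_text, ask_llm, text_to_speech)
print(agent(b"what time is it"))  # b'You asked: what time is it'
```

Because each stage is passed in as a function, swapping a model later changes one argument rather than the pipeline's wiring.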
Demetrios: Yeah. Why use RocketRide versus just using the native coding agent?
Rod Christensen: I think we may [00:28:00] be asking too much of, of our, you know, codegen AI to actually do because not only are we having to ask them to write our application for us or, you know, the user interface and then the engine, we're also asking them to figure out how to call and put all these different models together- Right?
Rod Christensen: And then what if I have to change a model six months from now? Do I have to go back into Claude? And we'd already talked about how bad Claude is at incremental stuff, right? Mm. So when you start making changes and maintaining that, it becomes much, much more difficult. It opens you up to huge issues going forward.
Rod Christensen: Not only that, when you ask Claude, you have to be very specific. Have you taken care of all the async problems, the multithreading problems, the multiprocessing problems, all these other nitty-gritty details? Yeah, it works fine as a proof of concept on my laptop.
Rod Christensen: Deploy that, and guess what? It [00:29:00] falls all over itself.
Demetrios: Yeah. Talk to me a bit more about those problems that you've seen happening and, and then how you solve them.
Rod Christensen: The solution to the problem is really a good architecture up front, okay? It's like deciding these are how the lanes are gonna fit together.
Rod Christensen: This is how the components are gonna fit together. This is their little piece, and then we control the outside of the piece. We give them everything that they need inside in order to do the task, but then we control all the outside so that we can run 256 threads concurrently without all the models stepping all over themselves, right?
Rod Christensen: So you can send 60 chats. 60 people can actually be chatting with this thing at the same time without it falling over. For those kinds of issues, a good architecture and actually disciplined engineering, like I mentioned earlier, are absolutely critical for scalability, durability, and maintainability up front.
Rod Christensen: Because unless you design that up front, you've got problems. [00:30:00] When you start going into production, you're gonna have multithreading issues. Then with Python, oh, my goodness, sometimes I love Python, but I hate Python because it has the synchronous model and the async model, and the two don't get along very well, right?
Rod Christensen: So try explaining to Claude, okay, this model, like a CrewAI agent, requires asynchronous, but this model over here requires synchronous. So now you actually have to use asyncio.run or asyncio.to_thread in order to get them to interoperate with each other. Claude misses that a lot, I'll tell you that.
Rod Christensen: And then six months from now, you're gonna go in and change, you know, your OCR to an asynchronous model. Good luck.
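Rod's sync/async point in Python terms: `asyncio.to_thread` lets async code call a blocking library without stalling the event loop, and `asyncio.run` is the entry point from synchronous code into async code. The sketch below also bounds concurrency with an `asyncio.Semaphore`, one simple way to "control the outside" of each component. `blocking_ocr` and `async_agent` are hypothetical stand-ins, not real OCR or CrewAI calls.

```python
import asyncio
import time


def blocking_ocr(doc: str) -> str:
    """Hypothetical synchronous library call: it blocks whatever thread runs it."""
    time.sleep(0.1)
    return doc.upper()


async def async_agent(prompt: str) -> str:
    """Hypothetical natively async client call."""
    await asyncio.sleep(0.1)
    return f"answer to {prompt}"


async def handle_request(doc: str, limit: asyncio.Semaphore) -> str:
    async with limit:  # at most `max_concurrency` requests in flight
        # Run the blocking call in a worker thread so the event loop keeps serving others.
        text = await asyncio.to_thread(blocking_ocr, doc)
        return await async_agent(text)


async def main(docs: list[str], max_concurrency: int = 4) -> list[str]:
    limit = asyncio.Semaphore(max_concurrency)
    return await asyncio.gather(*(handle_request(d, limit) for d in docs))


# asyncio.run() is the bridge the other way: from synchronous code into async code.
results = asyncio.run(main(["dog", "cat", "elephant"]))
print(results)  # ['answer to DOG', 'answer to CAT', 'answer to ELEPHANT']
```

Note that `asyncio.run` raises `RuntimeError` if it is called while an event loop is already running in the same thread, one of the easy-to-miss details that bites when sync and async components are mixed.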
Demetrios: Sounds like you've had that pain. I've been there. Yeah. I can feel it in your voice. Yeah. It's true. Let's talk a little bit about runtime observability and how that plays into it.
Joe Maionchi: There's different aspects to runtime observability, so for [00:31:00] example, you've got a running pipeline.
Joe Maionchi: It's working, but you're not satisfied with the quality of the answers from the LLM. Why? What's going on? We enable introspection at the inputs and outputs of each node in that pipeline: what the LLM is doing, how the nodes are talking to each other, what kind of answers they're getting. You might be asking why the latency on this particular agent is so bad, right?
Joe Maionchi: Mm. You can dig in and understand where the latency's coming from. You can understand errors. So there are different ways in which we provide that level of introspection to enable you to quickly get to the root cause of your problem, if there is a problem.
Rod Christensen: When you talk about, you know, 10 or 15 components in a pipeline, the only thing that has to fail is one, and your result sucks, right?
Rod Christensen: That's the bottom line. A lot of times it's hard to determine which one of those components actually failed or didn't produce the [00:32:00] results. Is it step two or is it step nine? And you need to be able to look at everything that was done within the pipeline, not only during development, but also in production too.
Rod Christensen: You can keep all this, this logging information in production, and somebody has, you know, a problem in, uh, you know, in production and says, "Well, this really screwed up." Well, let me go take a look at the log and see what the LLM did here and, you know, what the transcriber said here. Oh, you gave it a bad image or something like that.
Rod Christensen: So it's not only in development. Observability is actually even more key in production, because that's when your failures are actually gonna happen.
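Per-node observability boils down to recording what went into and came out of each node, how long it took, and whether it raised. A minimal sketch, assuming a hypothetical `traced` decorator and an in-memory `TRACE` list standing in for real log storage; it shows how a trace pinpoints which of several components produced the bad result:

```python
import functools
import time
from typing import Any, Callable

TRACE: list[dict[str, Any]] = []  # in production this would go to durable log storage


def traced(node_name: str) -> Callable:
    """Record each call's input, output or error, and latency for one pipeline node."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(value: Any) -> Any:
            start = time.perf_counter()
            record: dict[str, Any] = {"node": node_name, "input": value}
            try:
                result = fn(value)
                record["output"] = result
                return result
            except Exception as exc:
                record["error"] = repr(exc)
                raise
            finally:
                record["latency_s"] = time.perf_counter() - start
                TRACE.append(record)
        return wrapper
    return decorator


@traced("transcriber")
def transcribe(audio: str) -> str:
    return audio.strip()


@traced("llm")
def answer(question: str) -> str:
    if not question:
        raise ValueError("empty question")
    return f"answer: {question}"


answer(transcribe("  hello  "))
try:
    answer(transcribe("   "))  # the transcriber returns "", so the LLM node fails
except ValueError:
    pass

# Find the first failing node and what it was given.
failed = next(r for r in TRACE if "error" in r)
print(failed["node"], repr(failed["input"]))  # llm ''
```

The trace shows the LLM node failed, but its recorded input also shows the real culprit was upstream: the transcriber handed it an empty string.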
Joe Maionchi: Mm. You know, another aspect is cost observability, right? So as you run these AI-enabled applications at scale- How do you know what's costing you the most tokens, right?
Joe Maionchi: Um, you can, you can dig into that. You can then fine-tune and say, "Hey, you know, can we find a [00:33:00] cheaper model? Um, am I running out of quota, and why am I getting this error now?" It's 'cause, you know, I, I've got my auto top-off, you know, capped.
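Cost observability is the same idea applied to token counts: log input and output tokens per node and model, then multiply by price. The model names and per-million-token prices below are made-up placeholders, not any provider's real pricing:

```python
from collections import defaultdict

# Placeholder USD prices per million tokens; real prices differ by provider and change often.
PRICES = {
    "big-model": {"input": 10.00, "output": 30.00},
    "small-model": {"input": 0.50, "output": 1.50},
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


def cost_by_node(usage_log: list[dict]) -> dict[str, float]:
    """Total spend per pipeline node, most expensive first."""
    totals = defaultdict(float)
    for u in usage_log:
        totals[u["node"]] += call_cost(u["model"], u["input_tokens"], u["output_tokens"])
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


log = [
    {"node": "summarizer", "model": "big-model", "input_tokens": 200_000, "output_tokens": 20_000},
    {"node": "classifier", "model": "small-model", "input_tokens": 400_000, "output_tokens": 4_000},
    {"node": "summarizer", "model": "big-model", "input_tokens": 100_000, "output_tokens": 10_000},
]
totals = cost_by_node(log)
print({node: round(usd, 3) for node, usd in totals.items()})  # {'summarizer': 3.9, 'classifier': 0.206}
```

Once spend is broken down per node, the "can we find a cheaper model?" question has an obvious first target.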
Demetrios: And that I, uh, I thought about a bunch because you can quickly get out of your depth, especially when you're with a team and you're thinking like, "Why did all of a sudden we are...
Demetrios: Just, like, our spend went up 20 grand last week." Yeah. And you're thinking, "I hope it was for something good."
Joe Maionchi: Yeah.
Demetrios: Yeah. I hope it means we're being way more productive or something like that. But if you don't have that visibility into it, you can kind of be, you know, flying blind.
Rod Christensen: Absolutely. You know, the, the observability and being able to look and see all the transactions, not only what the node did, but between nodes.
Rod Christensen: Mm. Why did this node go over here instead of over here? All that is really, really important, not just in, in, you know, development but in production. And how many production systems actually have [00:34:00] VS Code debugger connected to 'em- To see ... to find out exactly what happened?
Demetrios: I can imagine a world where some folks are using RocketRide, they're getting that observability, and from the logs they're just going back and they're updating autonomously.
Rod Christensen: Well, the interesting part about that is you can actually take those logs, feed them back, and see if you can- That's what I was thinking ... go with a cheaper LLM.
Demetrios: That's what I was thinking, yeah.
Rod Christensen: Take a look at the inputs that the LLMs are getting, gather those all up and say, "Issue these and check the results against what my new proposed LLM is doing with what I got before on the really hugely expensive LLM.
Rod Christensen: Maybe I can save a few bucks."
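Replaying logged production inputs against a cheaper model, as Rod suggests, can be sketched as a small harness. Assumptions: the log stores each LLM call's input and output, `cheap_model` is a stand-in for a real client, and `exact_match` is a deliberately crude acceptance check; in practice you would likely use task-specific tests or a grader model.

```python
from typing import Callable


def replay_compare(logged_calls: list[dict[str, str]],
                   candidate: Callable[[str], str],
                   accept: Callable[[str, str], bool]) -> float:
    """Re-issue logged LLM inputs to a candidate model and return the fraction of answers
    judged acceptable against what the original (expensive) model produced."""
    if not logged_calls:
        raise ValueError("no logged calls to replay")
    passed = sum(accept(call["output"], candidate(call["input"])) for call in logged_calls)
    return passed / len(logged_calls)


def exact_match(reference: str, answer: str) -> bool:
    return reference.strip().lower() == answer.strip().lower()


# Logged production traffic (illustrative) and a stand-in for a cheaper model.
logged = [
    {"input": "2+2", "output": "4"},
    {"input": "3+3", "output": "6"},
    {"input": "capital of France", "output": "Paris"},
]


def cheap_model(prompt: str) -> str:
    return {"2+2": "4", "3+3": "6"}.get(prompt, "not sure")


rate = replay_compare(logged, cheap_model, exact_match)
print(f"{rate:.2f}")  # 0.67
```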
Demetrios: Yeah, I was thinking, I wonder if anybody has run auto research in that regard, because it feels like something that you could potentially optimize with. Just testing different LLMs. Maybe it's testing prompts, then trying to still get the same output, or at least the same... you're passing a certain number of tests and- Yep
Demetrios: [00:35:00] that-
Rod Christensen: That'd be really interesting- Uh ... to actually automate that
Demetrios: and- That's what I was thinking ... yeah. 'Cause you know what the end state is, right? So it's easy to try and figure out. And with auto research, I think that's the key. As long as you know what that end goal is- Yeah ... it will... Now, you may spend some money going for it, and so it totally defeats the purpose of the whole, like, "Oh, we're gonna save money by running auto research for the next 24 hours-"
Demetrios: and burning a bunch of cash," but it would still be fun.
Rod Christensen: But you know what? If it saves you in the end over the year that this pipeline's gonna be up, if I can, you know, cut my LLM costs- Yeah ... in half, it's actually worth it, right?
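Whether an automated model search is worth the cash it burns comes down to simple arithmetic. The figures below are illustrative, not from the conversation:

```python
def payback_months(monthly_llm_spend: float, savings_fraction: float,
                   experiment_cost: float) -> float:
    """Months until a one-off model-search run pays for itself."""
    monthly_savings = monthly_llm_spend * savings_fraction
    if monthly_savings <= 0:
        return float("inf")  # no savings, never pays back
    return experiment_cost / monthly_savings


# Illustrative: $10,000/month LLM spend, cost cut in half, $2,000 spent searching.
print(payback_months(10_000, 0.5, 2_000))  # 0.4
print(12 * 10_000 * 0.5 - 2_000)           # 58000.0 net saved over a year
```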
Joe Maionchi: Well, and things are changing on a weekly basis, right? And some of these newer model versions that are coming out are actually more token efficient.
Demetrios: Yeah.
Joe Maionchi: So what better way to find that out than as soon as these new models come out, actually run that through your pipeline and see if you're actually saving, and you're, you know the results are gonna be there. Yeah. Right? They're not gonna regress in terms of [00:36:00] capability, but they're doing it at a cheaper token cost.
Demetrios: Yeah, it's almost like something where you'd wanna just kick off a long-running agent that does that every couple weeks or every month. It's constantly trying to optimize: "All right, let's scan..." Maybe scanning Hugging Face is a little bit- ... Wild Wild West, 'cause you never quite know what those models are doing.
Demetrios: But yeah, it's probably worth looking into if you really are trying to optimize that. I think the majority of folks are just trying to make something work- Yeah ... and they're not at the let's-make-it-cheap phase yet. That might come in six months or a year, or it might come in five years. Who knows?
Rod Christensen: Well, it depends on who you talk to.
Rod Christensen: The engineers just wanna get it to work. The CFO says, "I'm not gonna pay the bill." Yeah. So the key is really designing it up front, making sure what you have works, and you can do that [00:37:00] not just from guessing. You can actually prove that this model works for you and that it's cheaper, and go with that up front.
Demetrios: Yeah. Uh, I think there was something interesting that you were talking about earlier on the inspiration behind building RocketRide. Can we go into that and what the pains were that you were seeing out in the market, and why you felt like, "We need something that can give these LLMs a little bit more determinism when we're kicking off all these agents"?
Rod Christensen: You were- I can go through my AI journey.
Demetrios: Yeah. Give us the whole breakdown on-
Rod Christensen: Okay, so we started in a company called Aparavi. Okay, and at Aparavi... I've been in the backup and data synchronization world forever, right? And so when we first started Aparavi, we were doing data intelligence and automation.
Rod Christensen: And we actually [00:38:00] started reading files and not just blobs and sending them over, you know, from, to AWS or, you know, S3 and that. We actually started trying to be intelligent on it, like reading the text out of it and, you know, trying to figure out, you know, the classifications of data and all that kind of stuff.
Rod Christensen: So- Hmm ... we started, I started my journey on AI, this was probably about three, two years ago, three years ago. And, and it was actually very simple. It was actually taking, you know, a file, parsing it, chunking it up, embedding it, and putting it in a RAG system so that a user could very easily, you know, chat with their data over, you know- I don't know, 1.5 billion files or something like that
Demetrios: Yeah.
Demetrios: The classic 2024 system
Rod Christensen: That didn't work very, that didn't work very well Yeah. Or it was so expensive to actually embed all that data that it was just not... But, you know, it was a learning experience for me.
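The early Aparavi flow (parse, chunk, embed, store, then chat over the results) is the standard RAG ingestion path. A toy sketch, assuming word-count vectors in place of a real embedding model and an in-memory list in place of a vector database; `chunk`, `embed`, and `TinyIndex` are illustrative names only:

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. A real pipeline calls an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TinyIndex:
    """In-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, Counter]] = []

    def add(self, document: str) -> None:
        for piece in chunk(document, size=8, overlap=2):
            self.rows.append((piece, embed(piece)))

    def search(self, query: str, k: int = 1) -> list[str]:
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


index = TinyIndex()
index.add("The quarterly report covers revenue growth in the storage division. "
          "Separately, the HR policy on remote work was updated in March.")
print(index.search("remote work policy"))  # ['HR policy on remote work was updated in']
```

Chunk overlap keeps a sentence that straddles a boundary retrievable from either side. At the scale Rod mentions, the embedding call runs once per chunk across every file, which is where the cost he describes piles up.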
Demetrios: Yeah.
Rod Christensen: And so we had this engine [00:39:00] that was doing this data processing and actually pushing data through the pipelines, and then we got the idea, "Oh, well, you know what?
Rod Christensen: We could actually add this over here and this over here, and what if we add this kinda node," and, you know, all this kinda stuff. And then it just really started gelling into a more general product than just pushing files through it and doing RAG on it. So that's where RocketRide was actually born: from technology that we've been working on for the last six or seven years with- Oh, wow
Rod Christensen: with Aparavi.
Joe Maionchi: Yeah, and the reason why we decided to split this off and start a new venture is that Aparavi's core business was really enterprise, selling to business units and IT personnel to help them discover, clean up, and prepare their unstructured data to be leveraged.
Joe Maionchi: Mm. So it's really around [00:40:00] that data discovery and preparation, whereas this new tool was really around taking that data, those curated data sets, and actually doing something with them. And that market is really developers. Yeah. So it's a completely different market, a different sales motion, and that's the reason why we split it off.
Joe Maionchi: But as Rod says, this engine's already been proven at petabyte scale, running in an existing company, you know, at enterprise.
Demetrios: And it's probably worth noting here, too, that RocketRide's fully open source, right?
Rod Christensen: Yeah. Correct.
Demetrios: So that's awesome.
Rod Christensen: Yeah, we donated it.
Demetrios: Oh, nice.
Rod Christensen: Yeah, we donated it all to a foundation.
Rod Christensen: So we're part of the Linux Foundation- Nice ... and what is it? The AAIF, right?
Demetrios: Yeah.
Rod Christensen: Yeah. Great. So we're part of the AAIF foundation. It's completely open source. Every bit of it, the C++, the T- the Tikka code, and Java, and all the, the Python is all open source.
Demetrios: Do you wanna go down the C++ route, or [00:41:00] do you wanna give me a recent bug?
Rod Christensen: I'll go with the concrete bug. All right. Let's talk about it. This just, this just- This is crazy ... this just happened to me Thursday. Yeah, let's do it. Oh, my goodness. It was like a nightmare. I spent, like, I don't know- 12 hours on Thursday, and then I got up at 6:00 in the morning and spent another nine hours on it.
Rod Christensen: The problem was with CrewAI, okay? CrewAI agents. It really is a good technology. I love CrewAI, actually. But their API, okay, is synchronous.
Demetrios: And what were you trying to do? What was the-
Rod Christensen: I was trying to get two agents running at the same time. Oh. Okay? Two chats, okay? Actually, I had three chats. I said, "Write me a 500-word story about a dog."
Rod Christensen: And in another one I said, "Write me a 500-word story about a cat." And then the third one was, "Write a 500-word story on an elephant." Okay? Very simple chat, right? Yeah, you would [00:42:00] think. But it runs through these things, and since we were using the synchronous call, it's not multi-threaded.
Rod Christensen: The problem was as soon as you issued the second, you know, "Write me a 500-word story about a cat," all of a sudden the dog was thinking, the dog story was talking about the cat, too, or the elephant story just hung. Uh, so it was all stepping all over itself, and it turned out that it was, you know, CrewAI on the synchronous mode.
Rod Christensen: So then we actually had to go back and figure out, okay, how do we do this? Because it has three different API calls to actually do it. I don't remember the exact names... There's a synchronous one, then there's an asynchronous one that is actually using the synchronous loop, and then there's a truly asynchronous one that they introduced about 60 days ago in their latest release.
Demetrios: And you saw this just by looking at the logs- Yeah ... in RocketRide?
Rod Christensen: Yeah.
Demetrios: And you recognized, "Wait a minute, why is this-"
Rod Christensen: "Wait, why is this over here, and why am I getting dog talk [00:43:00] in the cat chat?" And it's like, okay, something's wrong here, right? Yeah. It didn't work out well. So we had to figure out exactly what to change on the CrewAI node, the CrewAI sub-agent, and the orchestrator node so that they use the async version.
Rod Christensen: The new async version allows us to run 30 different chats concurrently without them stepping all over each other. Yeah. That's what RocketRide actually does, because you know what? We'll find a problem, we'll fix it, and now you can just drop in these CrewAI chats and they just work.
Rod Christensen: You don't have to worry about the three different APIs and how the three different APIs work differently and break and have global variables. That's not the only place it happens, either. OCR, oh, my goodness. OCR is all over the place. Oh, very confusing. Whisper's great, but it's all these different little details about all these things that your coding agents are gonna miss.
Rod Christensen: It seems to [00:44:00] work just fine because all the developers have been working on the CrewAI agent. We haven't released it yet, but all the developers say, "Yeah, it works, and I've got a chat up, and it does this and this." Have you tried a second chat?
Joe Maionchi: Yeah.
Rod Christensen: Have you tried 10 chats-
Rod Christensen: throwing it at it? So those are the problems that, you know, you, you run into when actually developing an AI solution.
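The dog-story-leaking-into-the-cat-chat symptom is classic shared state across concurrent requests. This sketch does not reproduce CrewAI's internals; it shows the general failure mode Rod mentions (global variables) and one standard Python fix, `contextvars`, using asyncio tasks. Function names are hypothetical:

```python
import asyncio
import contextvars

# Anti-pattern: one module-level "current topic" shared by every concurrent chat.
current_topic_global = None


async def chat_with_global(topic: str) -> str:
    global current_topic_global
    current_topic_global = topic
    await asyncio.sleep(0)  # other chats run here and overwrite the shared global
    return f"story about the {current_topic_global}"


# Fix: a ContextVar holds a separate value in each asyncio task's context.
current_topic = contextvars.ContextVar("current_topic")


async def chat_with_contextvar(topic: str) -> str:
    current_topic.set(topic)
    await asyncio.sleep(0)  # other chats run here, but each sees only its own value
    return f"story about the {current_topic.get()}"


async def run_three(chat) -> list[str]:
    return await asyncio.gather(*(chat(t) for t in ["dog", "cat", "elephant"]))


global_results = asyncio.run(run_three(chat_with_global))
context_results = asyncio.run(run_three(chat_with_contextvar))
print(global_results)   # ['story about the elephant', 'story about the elephant', 'story about the elephant']
print(context_results)  # ['story about the dog', 'story about the cat', 'story about the elephant']
```

Each asyncio task runs in a copy of the context captured when it is created, so a `ContextVar` set inside one chat is invisible to the others, while the module-level global is overwritten by whichever chat set it last. A single chat never exposes the bug; only the second and third concurrent chats do.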
Joe Maionchi: And this is actually an example of why a tool and a framework like RocketRide is valuable, because it's also community-driven. So as the community discovers these kinds of general issues and they get fixed within the framework, everybody else benefits, and you no longer see that problem.
Demetrios: Oh, man, I'm just thinking about all the different APIs, the different nuances that the coding agents aren't really catching. I've also gone through this a few times where I'm just yelling at the coding agent, like, "Read the docs. Just read the docs and look, it says it right [00:45:00] there." And then I have to copy and paste, like, "Do this.
Demetrios: No, you're looking at it wrong. You're thinking about it wrong. Do this. Implement it this way." But by that time, it's like, "Well, why don't I just write it myself? Why am I having to go and-" Yeah ... copy-paste the docs? And sometimes you get ... I think nowadays it's a lot better. Most documentation sites have, like, an llms.txt.
Demetrios: But even so, we had a talk recently at one of the coding agent conferences where they did a bunch of research on how coding agents use different products, and how well they're able to understand how to use one product versus another. And with some products, the coding agents do really well because the documentation is very well written.
Demetrios: It's machine-understandable, or coding-agent-understandable, documentation versus, "Oh, this documentation is written for a human," and so it's not quite the same. [00:46:00] A human understands, and they're probably looking at it and thinking, "Yeah, of course. It's fine." But then the agent has trouble when it tries to iterate over that, and then you get into those situations where the agent doesn't quite understand what it's doing wrong.
Demetrios: It's just doing something wrong, and you wanna be able to debug it.
Rod Christensen: It may be odd, but one of the things that I do is go through the documentation. I then ask Claude and GPT-5, okay, "Rank this prompt on a scale of one to 10." And, you know, GPT usually comes back with nine or 10 or something like that.
Rod Christensen: Claude will say it's a six. And it's like, "Okay, how can I improve that? What can I do? Rewrite the prompt for me so that you better understand it the next time." And that's an iterative process that we go through, not only on prompts, but on documentation and stuff like that. Because, you know, at the end of the day, we want Claude and GPT to actually understand what we're trying to talk about, so the best thing you can do is ask them.
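Rod's prompt-review loop (ask several models to score a prompt, have it rewritten, repeat) can be sketched as below. The raters and the rewriter are hypothetical stubs; in real use each would be an LLM call, and the target score and round limit are arbitrary choices:

```python
from typing import Callable


def refine_prompt(prompt: str,
                  raters: dict[str, Callable[[str], int]],
                  rewrite: Callable[[str], str],
                  target: int = 8,
                  max_rounds: int = 5) -> tuple[str, dict[str, int]]:
    """Rewrite `prompt` until every rater scores it at least `target` (1-10 scale)."""
    scores = {name: rate(prompt) for name, rate in raters.items()}
    rounds = 0
    while min(scores.values()) < target and rounds < max_rounds:
        prompt = rewrite(prompt)  # e.g. "rewrite this so you'd understand it better"
        scores = {name: rate(prompt) for name, rate in raters.items()}
        rounds += 1
    return prompt, scores


# Hypothetical stubs standing in for LLM calls.
HINTS = ["State the audience.", "Give a word limit.", "Specify the output format."]
remaining_hints = iter(HINTS)


def lenient_rater(p: str) -> int:
    return 9  # always generous


def strict_rater(p: str) -> int:
    return min(10, 5 + sum(hint in p for hint in HINTS))  # rewards explicit detail


def add_next_hint(p: str) -> str:
    return f"{p} {next(remaining_hints)}"


final, scores = refine_prompt("Summarize this document.",
                              {"lenient_model": lenient_rater, "strict_model": strict_rater},
                              rewrite=add_next_hint)
print(final)   # Summarize this document. State the audience. Give a word limit. Specify the output format.
print(scores)  # {'lenient_model': 9, 'strict_model': 8}
```

Gating on the minimum score means the strictest reviewer decides when the prompt is done, which matches the point of asking more than one model.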
Demetrios: Yeah. Yeah, I've heard of folks [00:47:00] doing this by way of skills, and it's a little bit like a quick-and-dirty fine-tune of a model on a certain product or a certain tool. So you can say, "Okay, now you're gonna be using CrewAI. We've created a specific CrewAI tool so you have all of these different understandings-" Yeah
Demetrios: if the documentation isn't good enough.
Rod Christensen: It's an art. It really is, and there are so many moving pieces with it that, you know, and, and the more that you can actually base things and, you know, make the pieces so that they stick together correctly- Yeah ... the easier your job becomes because then there are not so many moving pieces anymore, right?
Demetrios: Yeah. You have your own cloud. Can you talk to me a bit more about what that is and how it benefits folks?
Joe Maionchi: So RocketRide launched on March 4th with this open source tool that we've been talking about. Later this week, we launch a preview of our cloud service, because what we've basically made available [00:48:00] for free through open source is the way to build AI applications.
Joe Maionchi: Once you actually have an AI application and you want a runtime for it, assuming this is not a personal project and you wanna have many users using it, you need somewhere to host it, and you obviously wanna be able to scale, and scale in a cost-efficient way. So the RocketRide cloud is going to be, for us at least, the preferred option to run those AI applications at a reasonable cost, right?
Joe Maionchi: That's, that's the, the premise.
Rod Christensen: So, some of the options that you have to actually deploy these pipelines: once you draw all the pretty boxes and connect everything up and it's working on your laptop, what are the three options you have to deploy? You can deploy as a service on an on-prem server somewhere, so you can supply your own GPUs, all your own stuff, your own hardware, and run it yourself, completely self-managed.
Rod Christensen: We're not involved at all, right? You get the source code, you can put it wherever you want. [00:49:00] It's the same kind of deal as a lot of AI companies like Milvus and Qdrant: they have the open source software, and then they have a hosted offering, right? If you don't wanna worry about your GPU utilization or getting the appropriate GPU, that's where our cloud really comes in.
Rod Christensen: You can also deploy via Docker too. So- Nice ... if you want Docker, you can put it in your own on-prem. The advantage of actually going with the RocketRide cloud is, number one, economies of scale, okay? Because if you're spending X on OpenAI for X number of requests, and then you have Y number of requests with LlamaParse, and all this kind of stuff, we aggregate everything, so we get economies of scale.
Rod Christensen: Everybody who's using OpenAI, you're bundled in with them as far as that pricing goes, so we can actually run it a lot cheaper- In your own company ... for your external resources.
Demetrios: Right? Or is it everybody that's using [00:50:00] OpenAI in RocketRide cloud? Is it my own company that's using it, or-
Rod Christensen: It's, it's your own company, but- Yeah
Rod Christensen: it's, it's our own company, but we're aggregating the inference request to OpenAI, so we get cheaper pricing. We pass that on to you, much better pricing- Ah, so I'm hitting- ... than you can buy it yourself ...
Demetrios: OpenAI through you.
Rod Christensen: Yes.
Demetrios: Oh, okay. Okay. Through your cloud.
Rod Christensen: Yes.
Demetrios: All right.
Rod Christensen: Great. So when you drag over the OpenAI LLM, okay?
Rod Christensen: That whole thing is being run in our cloud, so you're pooled with everybody else's OpenAI connections. Then we aggregate all that OpenAI usage into a much better negotiated price- Yeah ... so we can deliver OpenAI services at a much reduced rate. And the same with LlamaParse and all the other paid services.
Rod Christensen: Not only that, you only have one API key to, to manage, okay? Because now we [00:51:00] charge you for the OpenAI usage, so all you need is the RocketRide key, and it gets you all these different services based on your usage.
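A gateway in this sense holds the upstream provider credentials server-side, authenticates callers with a single key of its own, and meters usage per customer for billing. A hypothetical sketch; the provider names, key format, and token-count convention are assumptions, not RocketRide's cloud API:

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Gateway:
    """Hypothetical sketch: one customer-facing key in front of many upstream providers."""
    customer_keys: dict[str, str]                             # gateway key -> customer id
    providers: dict[str, Callable[[str], tuple[str, int]]]    # name -> call returning (text, tokens)
    usage: dict = field(default_factory=lambda: defaultdict(int))  # (customer, provider) -> tokens

    def call(self, api_key: str, provider: str, prompt: str) -> str:
        customer = self.customer_keys.get(api_key)
        if customer is None:
            raise PermissionError("unknown gateway key")
        text, tokens = self.providers[provider](prompt)  # upstream credentials stay server-side
        self.usage[(customer, provider)] += tokens        # metered for per-customer billing
        return text


def fake_llm(prompt: str) -> tuple[str, int]:
    """Stand-in for an upstream LLM provider; 'tokens' here are just words."""
    return f"echo: {prompt}", len(prompt.split())


def fake_parser(doc: str) -> tuple[str, int]:
    """Stand-in for a paid document-parsing service billed per call."""
    return doc.lower(), 1


gw = Gateway(customer_keys={"rr-key-123": "acme"},
             providers={"llm": fake_llm, "parser": fake_parser})
gw.call("rr-key-123", "llm", "hello there world")
gw.call("rr-key-123", "parser", "INVOICE.PDF")
print(dict(gw.usage))  # {('acme', 'llm'): 3, ('acme', 'parser'): 1}
```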
Demetrios: Mm-hmm. So you're, in a way, you're also being a gateway.
Rod Christensen: Yeah.
Joe Maionchi: Yes.
Demetrios: Yeah. Okay. Tell me more.
Joe Maionchi: One of the challenges, uh, in this new era of AI is you see a lot of custom tool stacks for each developer, and how does that work in a collaborative team setting?
Joe Maionchi: So by standardizing on something like the RocketRide framework, you can then upgrade to the cloud service to run those pipelines, and we can provide collaborative features that enable people to work together on bigger projects. We're also going to be offering governance, auditing, and compliance, as well as SOC 2.
Joe Maionchi: And as we probably mentioned, we've come up with ways to maximize compute resource utilization [00:52:00] to lower our COGS, and we can then pass those savings on to users as well, so that we can offer this cloud service at a cheaper rate than, say, you'd find on the open market.
Rod Christensen: You know, let's, let's go into that for just a second.
Rod Christensen: What's really interesting is the RocketRide Cloud and how we deal with GPUs. We have something called the model server, which allows aggregation of inference. So let's say we have 100 different customers running EasyOCR, right? Normally, what would happen is you have 100 different EasyOCRs loaded all over these GPUs all over the place, right?
Rod Christensen: So that means that you've got a lot of GPU resources that you're essentially wasting. So with our model server that we run within RocketRide and within the RocketRide Cloud, we only have one EasyOCR, and we do inference on a really powerful GPU, so all those 100 customers are sharing that one GPU with [00:53:00] the EasyOCR.
Rod Christensen: Now, that sounds terrible, you know, some kind of bottleneck or something like that, but we have big-ass GPUs, first of all. Second of all, we scale. If the queue starts getting too full and the latency is actually, you know, going up, we'll actually throw in another EasyOCR on it. You know, so we do two inferences concurrently.
Rod Christensen: If it continues to grow, then we'll add a third EasyOCR, so it scales dynamically, but most importantly, it scales down as well. Now, what would you do for inference on-prem, right? Using EasyOCR or Whisper or something like that, you're actually paying 24/7 for a GPU that may only be used five minutes a day by a Whisper model, or eight hours a day.
Rod Christensen: You're still spending 16 hours hosting this Whisper model that you're not utilizing. So in order to get the performance, you have a choice. Either buy a really expensive GPU that's only gonna be used eight to 10 hours a day, [00:54:00] or use RocketRide Cloud, where you only get charged for usage.
Rod Christensen: So if it's only used for 10 hours a day, we'll kick that model out when you're not using it and bring in another model and reduce costs.
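The scale-up, scale-down, and evict-when-idle behavior Rod describes for the model server can be expressed as a small policy function. This is a sketch under assumptions: it scales on queue depth only (Rod also mentions latency), and the thresholds `per_replica_queue`, `max_replicas`, and `idle_evict_after_s` are invented for illustration:

```python
from dataclasses import dataclass


@dataclass
class ModelPool:
    """Hypothetical autoscaling policy for replicas of one shared model (e.g. an OCR model)."""
    replicas: int = 0
    max_replicas: int = 4
    per_replica_queue: int = 32        # queued requests one replica should absorb
    idle_evict_after_s: float = 600.0  # unload the model after this much idle time

    def desired_replicas(self, queue_depth: int, idle_seconds: float) -> int:
        if queue_depth == 0 and idle_seconds >= self.idle_evict_after_s:
            return 0  # evict: free the GPU for other models
        needed = -(-queue_depth // self.per_replica_queue)  # ceiling division
        return max(1, min(self.max_replicas, needed))       # keep one warm while in use

    def step(self, queue_depth: int, idle_seconds: float) -> int:
        self.replicas = self.desired_replicas(queue_depth, idle_seconds)
        return self.replicas


pool = ModelPool()
trajectory = [pool.step(q, idle) for q, idle in [(10, 0), (70, 0), (200, 0), (5, 0), (0, 900)]]
print(trajectory)  # [1, 3, 4, 1, 0]
```

Returning 0 after a long idle period is what frees the GPU for another model instead of paying for it around the clock.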
Joe Maionchi: And one of the things we're able to do, because we understand what's in each of these pipelines using our framework, is judiciously load these models in a smart way-
Demetrios: Mm-hmm
Joe Maionchi: to, again, maximize the availability of GPU for other usage.
Demetrios: I'm sure there's somebody out there thinking, like, "Okay, that sounds great, but it also sounds dangerous, because if I'm sharing GPUs with other people's companies, like, my data is private and sensitive, and how do you make sure that it's isolated for just me?"
Rod Christensen: So when we do the inference and put models onto a server, we typically batch things into batches of 64 or 128. [00:55:00] They go into the GPUs. Now, unless NVIDIA has some really serious bugs that cause crosstalk amongst the different channels, they're each individual channels. So when we feed 128 channels into the GPU, we tell it, "Okay, go do the inference, do the operation now," and then on the back end, we pull everybody's stuff off of it, so we split it back up.
Rod Christensen: We're only sharing it while it's on the GPU doing the inference. So unless there's a real serious problem with the GPU, there's absolutely no crosstalk between the, between different, different inference loads.
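The batching Rod describes (pack many customers' requests into one GPU batch, then split the outputs back up) is mostly bookkeeping by position. A minimal sketch, assuming a hypothetical `infer_batch` function that returns one output per input, in order; isolation inside the GPU itself is left to the hardware and runtime, as Rod notes:

```python
from typing import Callable


def run_batched(requests: list[tuple[str, str, str]],
                infer_batch: Callable[[list[str]], list[str]],
                batch_size: int = 128) -> dict[tuple[str, str], str]:
    """Pack (customer, request_id, input) triples into batches, run them, and route each
    output back to its owner by position."""
    results: dict[tuple[str, str], str] = {}
    for start in range(0, len(requests), batch_size):
        batch = requests[start:start + batch_size]
        outputs = infer_batch([inp for _, _, inp in batch])  # one model call per batch
        if len(outputs) != len(batch):
            raise RuntimeError("model returned a different number of outputs than inputs")
        for (customer, req_id, _), out in zip(batch, outputs):
            results[(customer, req_id)] = out  # split back out per customer
    return results


reqs = [("acme", "r1", "hello"), ("globex", "r1", "world"), ("acme", "r2", "ocr me")]
out = run_batched(reqs, lambda xs: [x.upper() for x in xs], batch_size=2)
print(out[("globex", "r1")])  # WORLD
```

Keying results by (customer, request id) rather than by input text means two customers sending identical inputs still get their own, separately routed outputs.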
Demetrios: So the lanes are isolated.
Rod Christensen: The lanes are isolated.
Demetrios: The cost savings thing is huge, man, and I think that more and more as time goes on, folks are gonna just start asking for that.
Demetrios: You're probably already seeing it, that it's like, yeah, all right, we're doing the math here, and if we wanna scale, or if we really roll out this product to [00:56:00] our whole user base, it's gonna start being expensive.
Rod Christensen: We've seen this pattern before, right? With cloud.
Demetrios: Mm.
Rod Christensen: Okay? And, and how everybody wanted to go to the cloud, and then the CFO said, "What the hell's going on," right?
Rod Christensen: It's like... And even my CEO says, "We spent $25,000 a month on AWS," and it's like, why?
Demetrios: Where did that...
Rod Christensen: Yeah, where did that go? Where did it go? Yeah. It's like, I don't know. Yes. So it's like this is, this is just the beginning. We're just trying to get it to work right now, but in the next year or two years, you're gonna see a lot of pressure- Yeah, for optimization
Rod Christensen: you know, coming on, on the op- you know, the operational costs.
