MLOps Community

Small Data, Big Impact: The Story Behind DuckDB

Posted Jan 09, 2024 | Views 2.2K
# Data Management
# MotherDuck
# DuckDB
speakers
Hannes Mühleisen
Co-Founder & CEO @ DuckDB Labs

Prof. Dr. Hannes Mühleisen is a creator of the DuckDB database management system and Co-founder and CEO of DuckDB Labs, a consulting company providing services around DuckDB. Hannes is also Professor of Data Engineering at Radboud Universiteit Nijmegen. His main interest is analytical data management systems.

Jordan Tigani
Chief Duck-Herder @ MotherDuck

Jordan is co-founder and chief duck-herder at MotherDuck, a startup building a serverless analytics platform based on DuckDB. He spent a decade working on Google BigQuery, as a founding engineer, book author, engineering leader, and product leader. More recently, as SingleStore’s Chief Product Officer, Jordan helped them build a cloud-native SaaS business. Jordan has also worked at Microsoft Research, the Windows Kernel team, and at a handful of star-crossed startups. His biggest claim to fame is predicting World Cup matches using machine learning with a better record than Paul the Octopus.

Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.

SUMMARY

Navigate the intricacies of data management with Jordan Tigani and Hannes Mühleisen, the creative minds behind MotherDuck and DuckDB. This deep dive unravels the principles behind DuckDB's creation, which challenged the prevailing wisdom in order to fill the gap in tooling for smaller data sets. It also covers MotherDuck's focus on developer experience and its approach to visualization and data delivery. The episode includes discussions about managing community feedback, funding, and future possibilities that tech enthusiasts and data management practitioners should not miss.

TRANSCRIPT

Hannes Mühleisen [00:00:00]: My name is Hannes Mühleisen. I'm the co-creator of DuckDB and I'm also the co-founder and CEO of DuckDB Labs. And I take my coffee black.

Jordan Tigani [00:00:09]: I'm Jordan Tigani. I'm the co-founder and chief duck-herder at MotherDuck. And I destroyed my taste buds drinking really terrible coffee when I worked at Microsoft. So I will basically drink any kind of coffee, even if it's been sitting around for hours.

Demetrios [00:00:28]: What's up, folks? My name is Demetrios and you're listening to the MLOps community podcast. And before we jump into the podcast today, I have a few musings of my own that I want to get into. I love mangoes. I just absolutely devour mangoes. And whenever I get a good ripe mango or whenever I'm in a place that has mangoes, I am like a kid in a candy store. Oof. They are delicious. But I have one thing that I'm realizing as I get older, I do not like about mangoes.

Demetrios [00:00:59]: And maybe you all can relate, or maybe you're just not old enough and you can't relate. What's the deal when you're eating the mangoes and the shit gets stuck in your teeth? Man, I feel like I'm going to be one of those old men that has a toothpick and is constantly picking at my teeth because I eat too many damn mangoes and the fibers get stuck in my teeth. Ah. But I'll be damned if I ever stop eating mangoes because of these fibers. So today we talk to the founder of DuckDB, the creator of DuckDB, Hannes, and the founder of MotherDuck. Wow, what a conversation. I absolutely enjoyed every second of this because we broke down what the relationship is between MotherDuck and DuckDB, and Jordan got real technical on why he feels like this is a potential model that others can take and what the advantages of having a setup like this are. And he also broke down what his main goals are when it comes to creating MotherDuck.

Demetrios [00:02:19]: And then Hannes went into what his focuses are in creating the open source project DuckDB. And then, of course, we explained what DuckDB is and why it's been garnering so much attention recently. And if you have not heard about the project, I think you're really going to like this because the TL;DR of DuckDB is it's an open source, very fast, very developer friendly database. And that's about all you need to know. Go get your hands on it. Play with it. Hannes talked to us about creating a database that gives a developer a magical experience. And he talked about how he tries to focus on that end developer experience and what he felt made him be able to do it much more than other projects.

Demetrios [00:03:13]: And I loved his rationale. And really it comes down to this. Just care. Care about those experiences. Care about your end user. Have that empathy and make it not suck. Paraphrasing his words. So I hope you enjoy this conversation with Jordan and Hannes.

Demetrios [00:03:31]: And, wow. I know, I loved it. See you on the other side. All right, fellas, so here's the truth. I took a mushroom blend of like, seven different kinds of mushrooms that are supposed to make me smarter. And I did it just so I can keep up with you all in this conversation today. It's great to have you both here.

Hannes Mühleisen [00:04:00]: There's no need for mushrooms, but thank you very much. Yes.

Jordan Tigani [00:04:04]: Yeah, that sounds like something you have to try.

Demetrios [00:04:08]: Yeah, it is definitely not.

Hannes Mühleisen [00:04:11]: Yeah, well, I live in Holland, so.

Demetrios [00:04:15]: Exactly. Not for Hannes, huh? No. Yeah. Everyone that has been listening to this show for a while probably has heard me talk about the swag that we created for all of our LLMs in Production conferences, which is just a shirt that says I hallucinate more than ChatGPT. And I created that and then I didn't ever wear it. I was just sending it to people and saying, like, yeah, use it, it's great, blah, blah, blah. And then I got myself one and I started wearing it. And then I started realizing it definitely attracts attention.

Hannes Mühleisen [00:04:53]: What kind of attention?

Demetrios [00:04:55]: Yeah, for the people who know, they laugh. For other people, it's like, why are you, like, advertising that you hallucinate?

Jordan Tigani [00:05:02]: Yeah.

Hannes Mühleisen [00:05:02]: Did you get some calls from law enforcement or something like that? No.

Demetrios [00:05:06]: Yeah. I just tell them that I am part of the medical studies that are happening these days. So it's all sanctioned. Right.

Jordan Tigani [00:05:14]: And that's what being a founder is all about, is you hallucinate the future and, you know, you help, you know, usher it into existence.

Demetrios [00:05:23]: That's it. That is it. That's my. That's the story that I am going to stick with from now on. It's just I'm hallucinating the better future for us all to live and play in. And you will thank me in a few years.

Hannes Mühleisen [00:05:37]: I mean, the Greeks had this thing, right, where in Delphi, they, you know, they got, they got this quite high and then they were hallucinating about the future. So I mean, it, as in there is like a lot of sort of track record on that, and that's where.

Jordan Tigani [00:05:50]: A famous, famous database company got their name.

Demetrios [00:05:53]: Oh, yeah, the Oracle. No, the Oracle of Delphi. Oh, that's, that's so true. I'm putting two and two together now. Yeah, it makes complete sense. So anyway, fellas, I'm insanely happy that we are getting to do this. I feel honored that you're both on a call at the same time. And I know there are a ton of people that are die hard fans of DuckDB and MotherDuck and I have to show I can't.

Demetrios [00:06:24]: So for those that are not watching the video, I am lifting up my shirt right now and showing I have my DuckDB swag on from the launch party. And I think I told you I've.

Jordan Tigani [00:06:36]: Got a DuckDB shirt on as well. Maybe give the original, original DuckDB logo.

Demetrios [00:06:41]: Yeah, that's the, yeah, that's the collector's item. Hannes feels.

Jordan Tigani [00:06:45]: Hannes, you don't have to strip.

Hannes Mühleisen [00:06:46]: Yeah, hang on. I have, I have some stuff here. It's like, I have a MotherDuck Hawaii shirt here.

Demetrios [00:06:57]: So that's another good one that I saw people rocking and I would love those. I do use my DuckDB mug that I got from that same party every morning for my coffee because it holds the most coffee out of all my mugs. And I imagine the people that are listening are probably like, what's all the fuss about this DuckDB, MotherDuck thing? If they are not familiar, I would love to go through basically the story of first maybe DuckDB, the creation of DuckDB and the why behind it, and then MotherDuck, the creation and the insane amount of traction that has come off the back of that and what the relation looks like between the two and how that works before we dig into the actual tech of it all. And so, Hannes, I guess this is where you get to jump in and tell us.

Hannes Mühleisen [00:07:57]: Yes. Well, yes, thank you. I like to compare this to this story in the Matrix where somebody tells Neo that he's always felt there was something wrong with the world, but he couldn't really put his finger on it. And that's a bit where, you know, where this whole thing started. We were teaching courses for people to use like big data systems and it was a very, very bad experience. It didn't really work. And at first you don't really actively question this, but for a couple of years it was like maybe there's something, like, wrong and maybe we can do something differently. And that's kind of how we got the first ideas to make DuckDB, which was a totally ridiculous departure from the common prevailing wisdom at the time, where everybody was saying, like, oh, you need to be scale out.

Hannes Mühleisen [00:08:43]: Otherwise, when we were doing research, people were laughing at us, like, if it doesn't scale out, it's worthless. Right? And then DuckDB came around and, you know, we thought, okay, this is a pretty wild story. But it was the hill we were kind of willing to die on. And so we were kind of building our system that was like really, like, proudly single node and proudly, like, in process, so we can talk about what that means later. But we thought, okay, we are just going to be happy with being the weirdos for the rest of time. And then this crazy thing happened because this very respectable gentleman from America, Jordan, turned up and said, no, I think you're actually right. There was something there. Yeah, you may be onto something with this sort of, like, going small.

Hannes Mühleisen [00:09:33]: And that, for me, this was a total shock when I, when I first heard this. So that was really fun and respectable.

Demetrios [00:09:39]: I think it's probably a good point to mention your background, Jordan, and why you are respectable.

Jordan Tigani [00:09:48]: Yeah, respectable isn't necessarily something I aim for. But alas, I've ended up here. But no, I worked on databases for a while. I snuck into them, I guess, as a big data person and we could maybe talk about big data a little bit later, but I helped start Google BigQuery and worked on that for ten years. Then I worked at SingleStore for a couple years. I was very into databases, databases as a service, distributed databases. And I remember somebody saying like, oh, if you're not distributed, people will laugh at you.

Jordan Tigani [00:10:29]: And I'm like, but why is that? Because there actually were at SingleStore a lot of users who were scaling them up and scaling them up and doing really well. And then we also had a bunch of customers that wanted to scale down. And scaling down with a lot of these big distributed systems is really hard because there are just so many components and so many moving pieces. And then I encountered DuckDB. I think somebody was doing some performance comparisons. I'm like, wow, where did this come from? And I started doing some research into it and realized that these people have something really interesting, and that you can scale up and they have actual research behind what they were doing and they could do a lot of things that customers that I'd seen before wanted to do. And there's a lot of databases out there, a lot of database projects, and I think the one thing that made me realize that this was real was actually, they had a blog post on time zones and I remember in BigQuery it took us like five years to add sort of proper time zone support because, like, it's just so hard. Like there's just so many ugly, nasty details and, like, it just breaks your brain.

Jordan Tigani [00:11:55]: It's like, if you have a limited size team, why would you spend time on this? But the fact that they had spent the time on this to do it actually right meant that, like, they're not just sort of fooling around with this sort of academic research prototype. They actually want to build a real database. And that's sort of one of the triggers that let me know that, like, hey, this is something that would be good to sort of hitch my wagon to.

Demetrios [00:12:22]: Yeah, the time zones made you realize it was legit. That is awesome. That's such a good story. So then, Hannes, going back to the inception, was your why of creating it something like: you're teaching classes, you're realizing that the experience, the developer experience is quite painful and overkill, I guess is kind of the word. And you said we don't necessarily need. Was it trying to counter the narrative, like, we don't need this big data all the time, we can go smaller? Or did that come later? How did that play out?

Hannes Mühleisen [00:13:03]: I think at the time we didn't really have a really, I would say, straight story despite being researchers and supposedly being very sort of brainy all the time. As I said, it was more like a feeling that maybe it wasn't really necessary. But I think also just from talking to practitioners we kind of realized, oh, maybe these petabyte-sized datasets are actually quite rare and people's real problems are like, you know, some couple of hundred megabytes of CSV files and there's absolutely nothing out there in terms of software that can support that. So, yeah, it was really like based on a need, which is very uncommon for research. Right. Like normally people love to do research into thin air and just dream up a challenge and it's making me very angry. But I don't want to live like that, especially not if you're funded by taxpayers. I feel like you somehow are sort of indebted to them and need to solve some of their problems at least.

Hannes Mühleisen [00:14:09]: So we thought, okay, let's try to fix the data problems for the people that we have spoken to. It's as simple as that. Right. It wasn't trying to, you know, as we say in German, open a big barrel. You know, we didn't try to open a big barrel, just to try to, like, fix the problem that we perceived in the world. Yeah. And I think another thing that was different was that, being from the research world, we were kind of sick and tired of people making these throwaway research prototypes when they were making software. Right.

Hannes Mühleisen [00:14:44]: So when we started DuckDB, we were like, okay, you know what, let's try to do this properly, of course, to the best of our knowledge. But it's also a bit of a departure. I think it's one of the reasons why we don't have to constantly fix horrible issues from the past, because we said back then, no, we want this to be actually used by people.

Demetrios [00:15:04]: Yeah. And what were some of these architecture design decisions that you made early on that you felt like were against the grain and you're still happy about making them to this day?

Hannes Mühleisen [00:15:18]: Yeah. So I think the whole single node thing is still something I'm very happy with. That was not obvious in 2018. Today, I think it's clearer than it was back then.

Demetrios [00:15:30]: Can you explain what the single node is?

Hannes Mühleisen [00:15:32]: Single node means single node. You have a data system that runs on one computer. Okay, it can be a big computer, but it's one computer. It's not a collection of many computers, which was the prevailing wisdom at the time. So it's single node, it's unapologetically single node. And the other thing is the in-process architecture of DuckDB, right, where instead of you talking to your data management system through some sort of socket protocol, which is how everyone else does this, we said, no, we're going to make our lives even harder and try to run inside other people's processes, because that has giant advantages for data transfer, because you can have super efficient interaction with application programs, third-party libraries, these kind of things. I think those are two things that we picked, and that I'm still quite glad about. Other things may have changed in the last five years, but I think the fundamental idea is still the same.
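To make the "in process" idea concrete, here is a minimal sketch. It deliberately uses Python's standard-library `sqlite3` module as a stand-in, since SQLite is also an in-process engine; it is not DuckDB and says nothing about DuckDB's API or performance. The table name `trips` and its rows are made up for illustration. The point is only that there is no server to start and no socket to connect to: the database engine is a library running inside the application's own process.

```python
import sqlite3

# Stand-in for an in-process engine: the database is a library loaded into
# this Python process. No server, no port, no network protocol involved.
con = sqlite3.connect(":memory:")

# Hypothetical example data, invented for this sketch.
con.execute("CREATE TABLE trips (city TEXT, distance_m INTEGER)")
con.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [("Amsterdam", 3200), ("Amsterdam", 5100), ("Nijmegen", 2000)],
)

# The query runs inside the same process and the results come back
# directly as Python objects, with no wire protocol in between.
rows = con.execute(
    "SELECT city, COUNT(*), SUM(distance_m) "
    "FROM trips GROUP BY city ORDER BY city"
).fetchall()
print(rows)

con.close()
```

In a client-server system, the same query would be serialized, sent over a socket, executed elsewhere, and the result serialized back, which is the data-transfer cost Hannes refers to.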

Demetrios [00:16:30]: Yeah. And I guess to put a time frame on this too, that would be useful. The project started in what year it feels like what, 2018.

Hannes Mühleisen [00:16:41]: That 2018 is the first commit? Yeah, obviously that's, that's, uh.

Demetrios [00:16:45]: And then Jordan came and rang you up in what, 2020 or 2019 later?

Jordan Tigani [00:16:51]: No, just last year. It was April. April of last year.

Demetrios [00:16:54]: Wow. Okay.

Jordan Tigani [00:16:55]: A lot has happened since then.

Demetrios [00:16:58]: Yeah, I'll say you guys have been busy, man.

Jordan Tigani [00:17:00]: Yeah. At that point, I think people were just starting to hear of DuckDB and it was still a little bit under the radar, kind of. If you follow, like, the DB-Engines ranking, which is sort of like the, I don't know, the, like, Premier League tables, you know, league tables for database nerds. I think they were down at like 270 and they sort of have been rocketing up the list since then. It was a little bit under the radar, but just coming into its own. People were starting to talk about, hey, there's this great, amazing database, and talking about it with the kind of effusive praise that people don't usually use for data management systems. They don't usually talk about technical things that way at all. You know, you could tell that they were onto something.

Demetrios [00:17:53]: Yeah, you know, there's something special. That's true. I think that's how DuckDB came onto my radar. Is that fanatical or. Yeah, how can I say this? It's not fanatical. It's more, uh, the absolute love with which people talk about DuckDB is incredible because just as you said, Jordan, it's not that often you hear people talk about technology that way. If anything, technology has to live up to a very high standard. And as soon as it does one thing wrong, it's like, ah, this is a piece of shit. But I started seeing a lot of people posting things and then I think one of the, I guess we could call it marketing ploys that I latched onto and I was like, whoa, this is a really easy way to understand what you're trying to do, is the idea of big data is dead.

Demetrios [00:18:57]: So I don't know which one of you, which camp did this, if it was the DuckDB camp or the MotherDuck camp, but that made everything so clear for me in whatever, four words, you know, so that was.

Jordan Tigani [00:19:12]: I mean, you know, I wouldn't necessarily call it a marketing ploy, but it was certainly, like, you know, kind of the culmination of a lot of experience with databases, and maybe it was a little bit, you know, overly done, but, you know, basically to make a point, because I think that we had just been in this big data era for a long time, with everyone just sort of assuming that everyone has big data, and there's just a lot of people that we talked to who, you know, have, you know, reasonably sized data that, you know, is a few gigabytes and maybe tens of gigabytes or even hundreds of gigabytes, but that's not really big data anymore. And people kind of felt almost like they weren't real data engineers if they weren't, you know, operating on these huge systems. And I think that "big data is dead" at least gave them, like, a justification of, like, yeah, you're valid. Your experiences are valid, too. And it sort of comes from working on BigQuery for a long time.

Jordan Tigani [00:20:18]: BigQuery. You'd think people had big data and big workloads. But really, looking at what people were doing, most people had actually very small amounts of data, even really big customers and really big names. Most of the work that they would be doing isn't over their giant logs tables. It's over some sort of summarized, cleaned up version of the data. When you clean up the data, you make it smaller. And virtually all of our queries, almost all of them, probably 99.999%, were sub-terabyte. 90% were sub-100 megabytes.

Jordan Tigani [00:21:06]: I was trying to get the idea out there that, hey, most people don't have big data. And this is something that I'd seen in the real world. And I think since then, people have been sort of backing that up and validating that the people that do have big data, they tend to only use smaller portions of it at a time. You have sort of the hot data. You might have like, ten years worth of logs, but you're really looking at the last seven days or the last one day of data, or you are working on some sort of cleaned up version of that. And so people would always talk about the size of their data warehouse. And when they would talk about the size, they would generally talk about the terabytes or the ten terabytes of logs that they had. But that's actually not all that useful when you have separation of storage and compute.

Jordan Tigani [00:21:56]: Because basically everything's being stored on S3. Okay, you can dump a whole ton of crap into S3. But the really important part is how much of that are you actually using? And so systems have been designed for the last decade, A, to deal with that whole multi-terabyte size, which is no longer relevant, and B, they were designed in an era where a machine that had a few gigs of RAM was considered really huge. And nowadays, like, if you look at your EC2 instances, very often your VM, maybe it's only 16 gigs, but those are running on physical machines that are actually quite large. Many of them have terabytes of RAM and hundreds of processors. So there's really not a whole lot of workloads that won't fit on those machines. And so just the design that we've been building systems around I think is not as relevant to the sizes and shapes of data and data workloads.
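Jordan's "hot data" argument can be put into a quick back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that log volume is uniform per day; the function name `hot_data_gb` is hypothetical, and the inputs (ten terabytes of logs, ten years retained, seven days queried) are taken loosely from his examples rather than from any measured workload.

```python
def hot_data_gb(total_gb: float, retained_days: int, queried_days: int) -> float:
    """Estimate the size of the actively queried slice of a log store.

    Assumes data arrives at a uniform rate per day (an illustrative
    simplification, not something stated in the conversation).
    """
    if retained_days <= 0 or not 0 <= queried_days <= retained_days:
        raise ValueError("need 0 <= queried_days <= retained_days and retained_days > 0")
    return total_gb * queried_days / retained_days


total_gb = 10_000  # ten terabytes of logs, in decimal gigabytes
hot = hot_data_gb(total_gb, retained_days=3650, queried_days=7)
print(f"{hot:.1f} GB of {total_gb} GB is in the queried window")
```

Under these assumptions, the queried window is roughly 19 GB out of 10 TB, which is the kind of working set that fits comfortably on a single large machine.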

Jordan Tigani [00:23:09]: And sort of getting back to Hannes's point about single node and scaling up, it's like, yeah, single node can work and can be just so much simpler and so much faster. Because just remembering from the BigQuery days, like, a lot of the work that we did, a lot of the difficulty was in dealing with this complex distributed system. And SingleStore was the same way. Sort of like, how do we do this distributed two-phase commit? Like, that's a really hard problem. In a single node, a lot of these problems just get so much easier. And that means you can move fast. And I think we've seen kind of DuckDB, which was awesome in April of last year, versus DuckDB now, which is, like, incredible. Like the fact that they keep getting better on all these benchmarks and adding more features and innovating in the SQL dialect and all of these things that they wouldn't have been able to do if they had been chasing this sort of complex distributed system problem.

Demetrios [00:24:12]: I like this because it really shows that ease of use is both for the developers and the user. So the user developers and the developers of the system. And so it has that basically the double whammy. So you can enjoy it from both sides. Whether you're building the actual database or you're a consumer of that database, you get that ease of use. And I also want to highlight something else that you said, Jordan, which is around the big data is dead type thing and how many people probably felt that they needed to over engineer, or at least that was the consensus, right? Like you're saying, if you're not doing this, if you're not setting up for scale, then what are you doing? Do you even call yourself an actual engineer? And then at the end of the day you realize, you know what, nine times out of ten, I don't need it. Even if I have the data, I'm not using it and I don't need it. And nobody else does either, by the way.

Demetrios [00:25:17]: Like my data scientists, they don't need that data either. They only care about the freshest features that are in the last hour, let alone the last, like, month. And so speaking about that, again, it just goes to show you were able to hit a nerve and it almost feels like there were a lot of people that were in the dark about something or they felt a certain way and there was a lot of pent up demand there. And then you gave a voice to that and everybody's like, yeah, this is what I've been feeling for this whole time. Of course, like, that makes so much sense. And so kudos to you for actually, like, naming it and letting the whole movement rally around it. I do want to talk a little bit, like, staying on this, Jordan, on the creation of MotherDuck and then what that story was like because I understand you reached out to Hannes. You're like, hey, this is cool.

Demetrios [00:26:15]: I want to do stuff with it. How does that work? Or did you just start committing and then after a few commits you were like, wait a minute, I need to like talk to Hannes and do more.

Jordan Tigani [00:26:25]: So my plan was to, like, you know, hack on something. I'm like, somebody really should build a serverless, you know, DuckDB, a cloud DuckDB. And then I'm like, hey, well, I've, you know, helped build two cloud databases as a service. You know, maybe that should be me. And I'm like, so maybe I'll just, you know, hack on this for a little while. I get about three days into it and I'm like, you know, I should really kind of reach out to the DuckDB folks and see if they'll hire me. And so I got an introduction to Hannes and Mark, and I remember I talked to them on, like, a Monday morning. And, like, it's like trying to feel them out, like, hey, you guys thinking about doing something in the cloud? I'd love to work on that.

Jordan Tigani [00:27:11]: And they were like, no, we really don't want to do that, but we'd love to partner with somebody who would do that and it seems like you have a reasonable background and this might work. That same afternoon I talked to, because the person who had introduced us was Lloyd Tabb, the founder of Looker. And he'd been a bit of a mentor to me and I kind of had known him really well because he was one of the few people that really got BigQuery and what was possible with BigQuery from an early, early stage. And so he introduced me to Hannes and he said, you should also talk to my friend Tom who invested in Looker, and Tom was Tom Tunguz. Back then he was at Redpoint. Now he has hung out his own shingle as a VC. And so I talked to him in the afternoon and I really had nothing written down, just sort of vague ideas and vague plans that I talked to him about for about 15 minutes. And he's like, yeah, this is a good idea.

Jordan Tigani [00:28:11]: I want to fund it, come to my partner meeting next week. And so I, like, went from, like, hey, maybe I should hack on this thing to, you know, funding, Hannes and Mark wanting to partner and, like, a VC wanting to back this. And it just sort of all of a sudden accelerated extraordinarily rapidly from there. A bunch of people kind of came out of the woodwork and said, hey, I'd love to work on this with you, some really incredible people. And then I think we came up with an interesting way of partnering with the DuckDB team, which I think has also been a superpower for us in getting started.

Demetrios [00:28:52]: So before we talk about the partnership and how that looks, because I know for me it was very confusing in the beginning. I thought it was just the managed DuckDB. Hannes, do you remember that story?

Hannes Mühleisen [00:29:04]: Yeah.

Demetrios [00:29:04]: What was it like for you and for when that happened?

Hannes Mühleisen [00:29:06]: I mean, I know the Americans are crazy, but I didn't think they would be that crazy. So, like, as Jordan said, you know, we talk on the phone and we are like, yeah, no, this sounds great. Like, we totally can talk about that more. And then he gets, like, on a plane the next day or something like that and comes out here to Amsterdam where we're at, just to talk things through. And at the time we were like, what's going on here? And we had already been a bit burned by VCs, essentially, like, a lot of interest from VCs that wanted to fund this. And they were all like, yeah, you need to make a service. But we didn't really want to because we know some people that did databases as a service really well. And they said, look, this is really a separate thing and you really need to know what you're getting into.

Hannes Mühleisen [00:29:53]: So we thought maybe we can find some competent person, respectable person, not at all crazy person that wants to do this. And then Jordan just shows up. And it was really wild, I have to say. But we got along and so it was a quick success. I'm still very grateful for that.

Demetrios [00:30:20]: Matched perfectly, man. Wow. And what a story, Jordan going from trying to get a job to having funding secured all on the same day and then feeling like, all right, tomorrow I'm going to Amsterdam.

Jordan Tigani [00:30:33]: It was pretty much like that. It was actually crazier than that. But like, well, I need to probably write a, write a book or something at some point.

Demetrios [00:30:40]: Yeah.

Jordan Tigani [00:30:41]: Like, I think if this works out, it'll be an interesting book. If it doesn't work out, it'll be sort of like, you know, what can go wrong?

Demetrios [00:30:49]: Yeah, the memoirs, man. This is all. Yeah. Make sure you're taking notes and keeping it for later because that is an awesome book that I'm sure many would love to read. Now let's get into the partnership and how it looks because I was under the impression, basically, I heard of DuckDB and then next thing I know it's like, oh yeah, MotherDuck. They came out of stealth and they've got this cool product and then they've got funding. And so that was what I heard. And I immediately assumed, okay, DuckDB is the open source and then MotherDuck is the managed service on top of that.

Demetrios [00:31:23]: Like the managed database. But I think there's a lot of nuances and differences and also inspiration in the ways that you both work together and the partnership works.

Hannes Mühleisen [00:31:34]: Yeah, I mean, it's true. The impression is true that DuckDB is the open source part and MotherDuck is a managed service. That is fair. That's correct. But I think what is different is that it's two different companies that are basically doing these two things, with very clear sort of goals and different staff, different leadership, all that stuff. So.

Demetrios [00:31:57]: Yeah, and it does. So there's so many questions that come to mind. It is for you, Hannes, right? Like you're probably just trying to figure out how to make a better database and you don't care about shareholder value or the VC money that got dumped into Jordan's startup. None of that really matters to you. So it feels like at some points there could potentially be misaligned incentives or you could want to go two different ways. Have you hit that yet or is it still too early and you haven't had that problem?

Hannes Mühleisen [00:32:30]: I don't think we have hit this problem yet. But I think to call making a database something like "just making a database" is a big insult to me. Making databases is the end-all of everything, optimizing especially.

Demetrios [00:32:44]: I knew you were German when you were like, I just want to optimize the shit out of this database.

Hannes Mühleisen [00:32:49]: Well, we haven't optimized the shit out of it yet. In a big departure from being German, we have actually looked at making the user experience good. So this is a very non-German sort of activity. But no, I think the relationship is absolutely special. I haven't heard of it a lot. We didn't want to go for VCs because we really want to focus on making something that as many people as possible can use. And then sometimes there's a bit of a conflict there with the product strategy of a company, let's say. I mean, we've seen this time and again: there's an open source sort of project, they add a feature, and they later have to drop it again because their investors say you can't, you have to increase revenue, so you'd better take this feature out.

Hannes Mühleisen [00:33:36]: And then everybody screams at them. I mean, this is very unfortunate. And by having two different companies, we kind of, I think, elegantly sidestepped this problem.

Jordan Tigani [00:33:47]: Yeah, there seems to be this pattern of people building companies based on open source products where really the way they get funding is adoption of the open source product. And so they focus only on the development of whatever that product is and getting it into as many people's hands as possible. And then people ask, well, how are you going to monetize? And they say, we're going to monetize via a service. We'll have SaaS. And then they have two people working on that kind of in the background, but it's not really the focus. Their goal is to build something amazing, because that's really how they get the funding, that's how they get the product-led growth, adoption, et cetera.

Jordan Tigani [00:34:52]: But the problem with that is that it really underfeeds the service part, whether it's a database or something else. And so they really don't focus on innovating in terms of delivery of the service, how that service works, making something amazing and differentiated. And then that also ends up leading to something that AWS can just clone, because you have this great open source thing and then, okay, you have this sort of half-assed service, and Amazon can be like, okay, well, I can run this service better than you can. And meanwhile they're eating your lunch. And so we wanted to take it from the other direction and be like, okay, the DuckDB team, the DuckDB Labs team, is going to build an extraordinary database and that's going to be their focus. That's what they want to do. And that's wonderful. We're going to build an awesome, differentiated, innovative database as a service.

Jordan Tigani [00:35:50]: And we think we're doing a bunch of really cool things with the hybrid execution and how we're doing auto scaling and individual DuckDB backends. I'm happy to talk through some of those things. And then the key thing is you've got to have a way to tie yourselves together, at least loosely. And so when we started, one of the things that we really wanted to make sure of was that we did not just say, hey, you have awesome open source technology. Thank you. We're going to take that and try to make a bunch of money out of it. We did want to include them. We gave them a co-founder share.

Jordan Tigani [00:36:28]: We have a development relationship with them that makes it so that if things work out for us, it works out for them, but they're also free to go and build things how they believe is the right way to build the database. And I think if this works out, this relationship will be seen as, hey, this is a good way to do open source development. And if it doesn't work out, it'll be a cautionary tale. Because people have warned, as you mentioned, that the relationship is going to be hard as incentives diverge. If people who are building competitive things pick up on DuckDB, the DuckDB team is going to be super excited about that. And meanwhile, we're going to be like, hey, they're eating our lunch, and not nearly as excited about it. And I think that's a bridge we'll have to cross when we get to it.

Jordan Tigani [00:37:35]: And we want to just focus on building a high trust relationship that we hope will take us as far as possible.

Demetrios [00:37:45]: So I do want to get into all the features and the differentiators, and when you see people using DuckDB versus when you see people using MotherDuck, and what the use cases and benefits are for each of those. But before we get into that, I'd love to just talk about building an incredible developer experience, because it's going back to this idea of, there's been a bit of a movement, and people don't usually talk about technology the way they talk about DuckDB. And Hannes, you're like, we haven't even started with the optimization. We've just been focused on the developer experience. Both of you, I imagine, have that in the, like, I don't know if you have it written on your wall or if it's framed. What does this look like and how do you optimize for that? How do you go about building that? Is it that you're in 90% user meetings versus 10% actually-developing-the-product meetings? Break it down so that the rest of us, when we are building out, maybe, our internal machine learning platform, can have these experiences where people come to us and say, whoa, this was a cool developer experience.

Hannes Mühleisen [00:39:09]: So I do not sit in a lot of meetings with...

Demetrios [00:39:14]: You didn't strike me as the type, but...

Hannes Mühleisen [00:39:17]: No. I think maybe there's an aspect that we actually care. I mean, we actually care about data management systems, and we want to not make it difficult to use these things, because we feel like this would really, really hinder their adoption and mean that our impact is not as big as it could be with the work that we do. I think it harks back a bit to sort of first principles about what you want to do as a scientist, which is what we kind of started out as. It's like, you want to have your ideas be adopted in the world. And how do you do that? Well, you know, make it not suck. I don't know.

Demetrios [00:40:03]: It's very easy to grasp.

Hannes Mühleisen [00:40:06]: It's a simple concept, Jim. You know, you ask yourself, is this really the best it can be? Can we change this to be better? For example, DuckDB has no external dependencies, right? You can just install it if you want. You can compile it with just a compiler. Okay. And that is annoying for us, you know, because we can't just pull in some library. When we do something, we have to be like, no. Okay, can we maybe inline some part of this code? Can we maybe do it ourselves? Maybe we can trick the operating system into doing something. But the upside is that it sucks less to work with DuckDB, because it means that you can just install it, you don't have to ask your admin, and you can just download a binary from the Internet.

Hannes Mühleisen [00:40:53]: By the way, ask your parents about downloading binaries from the Internet. But it's just simple. And I think we really love data management systems as a piece of technology, and it's really hurting my feelings to see things like Oracle or other big database companies, I don't want to single them out, but to see these things being incredibly hard to use, being incredibly clunky, being like, the mummies of the seventies are looking at you when you stare into their command line interface or SQL dialects. If you care about this, it's really hard to see. I think there's also this step where data management systems have been around for a while. They're an incredible piece of technology, but they have aged a bit, and nobody has really been able to reimagine them without throwing away a bunch of other stuff in the process, which may have gotten a bit over the edge. And in some sense, DuckDB is an incredibly traditional system.

Hannes Mühleisen [00:41:54]: It has a SQL interface.

Demetrios [00:41:55]: Wow.

Hannes Mühleisen [00:41:56]: It deals with tables, like, wow. It doesn't do graph or stream or whatever, it's bulk processing. You run the query on the data. That's it. But at the same time, it's also the case that this is still the biggest use case for data management systems in analytics. So targeting that was a really great idea, but then just take the pain away. It's hard to describe for me, but I care deeply about how people perceive the things I love. You know what I mean?

Jordan Tigani [00:42:25]: There's a certain interesting thing about the perfectionism that I find pretty amazing. DuckDB is super fast, and on all these benchmarks DuckDB has been kicking butt. But the performance is almost the exhaust of the engine that is just trying to do the right thing. It's like, hey, we build this right, we do it the right way. I love the story of the new changes to aggregation, where they're trying to make it scale better so it doesn't run out of memory when you're aggregating a lot of things. And the mechanism that they were using to do that has a side effect of making these benchmarks much faster and making a lot of these types of queries much faster. But that was just sort of a side effect. It wasn't like, hey, we're trying to go crush these competitors on TPC-H, query nine needs to be faster and so we need to put in some special optimization rule for this.

Jordan Tigani [00:43:29]: It was really just, hey, this is the right thing to do. They do it and it has an incredible outcome.

Demetrios [00:43:38]: Yeah. And I like this principle of doing the right thing, not the easy thing. It's like, ah, man, no external dependencies is not going to make our life easier, but it is going to make the experience better. So we might as well just get in there and do it. How much are you having students dogfood the product, per se, and then you are learning from them? Is it just basically a research lab and you've got a bunch of guinea pigs going on in there?

Hannes Mühleisen [00:44:13]: I think we waited a very long time until we used DuckDB in courses, actually, because nothing is more embarrassing than the professor's half-baked thing that he forces on you. I've been that student, and it doesn't make anyone look better. Right. So we have actually waited quite a long time. In the beginning we didn't have a ton of feedback, but we did have some interactions with practitioners to see whether we were going in the right direction. And as you mentioned, there's all this excitement. There was a ton of excitement about DuckDB version 0.1 that we couldn't comprehend, because it was quite half-baked. But people got excited and I think they got excited because.

Hannes Mühleisen [00:45:07]: Yeah, because we cared. I don't know. It's very mean to say we cared, because everybody else also cares, but they have very, very competing goals. Like if you're in a company, you care about paying your people. That's an important sort of thing. We were at a research institute when we came up with the thing, where your incentives are aligned quite differently, where you're there to do the right thing, I don't know. But there you have, like, simply abstracted away a lot of concerns that other people usually have. So we could say, okay, this is going to take us a year longer to do it like this, but it's the right thing to do.

Hannes Mühleisen [00:45:45]: So there we go. Which you could never do in a startup. Right? Your seed funding round will not allow these kind of things.

Jordan Tigani [00:45:53]: Yeah, I think also, you know, most database people and databases focus on just the database part. Once you give me the query and I compute the results, then my job is done. But the places where people have problems end up being, okay, what happens before then? How do you install it? How do you get the query there? How do you get the data there? How do you make it easy? How do you get the results out? And DuckDB focused on the whole end-to-end experience, whereas for most other database companies, that's somebody else's problem. A lot of database companies and people tend to not be receptive. It's, oh, well, you weren't able to install that, that must be a user error. Or, oh, you couldn't import this CSV file, that must be a user error. That's not my problem.

Jordan Tigani [00:47:01]: And I think DuckDB just, you know, focused on those problems as well, which are hard problems, to make those easy. And I think that also is one of the reasons why people were so excited about it.

Demetrios [00:47:17]: And now I think the thing about it is that after we have that experience, it's like it's opened up our mind to that it's possible. And so now we hold everything else to such a higher standard.

Hannes Mühleisen [00:47:29]: Oh, I'm so sorry. But, you know, the CSV thing is actually an excellent example, because we observe this, right? You want to do something with data. You have a CSV file, because of course you have a CSV file. And so, after installation, it's actually the second thing you do with your shiny new data system. Right. And if that is a horrible experience, then you are sort of scarred for life and you would rather go home and, I don't know, punch the wall or something. So we actually wrote a research paper about CSV reading, the results of which are now part of DuckDB. And this is why we have, I think, the planet's best CSV parser at this point.

Hannes Mühleisen [00:48:15]: We have, like, one PhD in computer science who does nothing else than work on the CSV parser, because we realize it's such a critical piece. If people can't get their data into your system, it doesn't matter how good your join operator is, right? It's absolutely irrelevant. So, yeah, as Jordan has said, it's this entire package. But it's also just sometimes amazing, like, have these people actually ever tried to use these things themselves? And it's actually something we encourage our team to do and we do ourselves. Just use this thing, just try it, you know, just process some data, see what happens. And then if you run into something, it's like, hey, maybe other people run into this as well, right? It's pretty obvious.

Demetrios [00:49:01]: But then you throw a PhD on it. You say, hey, that intern, that research assistant, that's your next PhD. Good luck.

Hannes Mühleisen [00:49:13]: I mean, no, we're fortunate that, let's say, the degree density at the company is quite insane. So it's like, well.

Demetrios [00:49:22]: I love this idea, too, because in events, I've heard it explained as the moments in between the moments. So, like, when you have an event, you have the big moment of the singer going out onto stage, but there's a lot of moments that happen before you get to where the singer is on stage and that incredible song where your heart is jumping out of your body. There's the parking, there's getting through the ticket line, there's waiting, watching the opening act. All of those are the moments before the moment, right. And so if you can optimize those, the whole experience as a memory in your mind is going to be incredible. And it feels like that's kind of how you are looking at it. You're saying, we're not going to just focus on how incredible these joins are, because that's great and everything, but what we're really going to focus on are these moments before the moments, or in between the moments, I guess, is what you could call it. And so it's cool that you also have the ability to think long term and say, you know, what's the biggest pain here? And it's probably going to take us a year, but we're going to figure out the CSV headache, and so we go after it.

Hannes Mühleisen [00:50:40]: Yeah, absolutely. I think that's a very good way of explaining it. I'm actually going to steal that metaphor with the concert because I like it. I think airports are also a great example for this. The flying is exciting, but the process of getting there does impact the overall experience quite drastically, let's say.

Jordan Tigani [00:51:00]: But are you saying you're going to be the Taylor Swift of databases?

Hannes Mühleisen [00:51:05]: No, I don't think I'm attractive enough for that. No, but I think it's also that we talk to people, and we are not the reference. If I struggle with something, it's probably really bad, because if an expert in data management systems struggles with something, it's not a great look. I have struggled immensely with all the database systems out there, which is not a great experience. But I mean, we also really love to hear from people. So if you're listening and something really makes you angry about DuckDB, we really want to hear from you. Right. Because we are not the absolute reference on what is good or bad and what is difficult or not difficult.

Hannes Mühleisen [00:51:43]: In fact, we are a terrible reference, but at least we try. And we really need to hear from people where they're struggling. Yesterday, or two days ago, I was in Berlin at a DuckDB and MotherDuck meetup and somebody just walked up to me and said, listen, here's this thing that really bothers me about DuckDB. Can you do something about it? And actually this afternoon I was, you know, working on this, because it matters. This person had a valid point and there's something we can do about it. So let's do it, right? It's incredible.

Demetrios [00:52:17]: Yeah. And that is the love for the experience. And it's very clear that you understand the value in that, and it's almost like now that you have it, you can't let it go to waste. Right. You've got to keep it up. And so that is one of the other questions I wanted to ask: how do you go out there and gather feedback and talk to the community? It feels like the community is gigantic now and they can give you this feedback. It's almost like the opposite problem of a company or an open source project that is starting, where you're just trying to get anybody to give you feedback.

Demetrios [00:53:00]: Now, I imagine you're trying to figure out what the biggest signals are or what the best feedback is or all of that. It seems like it's a bit of a difficult position to be in.

Hannes Mühleisen [00:53:14]: It's a big data problem. No, but in all seriousness, we have people in charge of this. There's DevRel people at DuckDB Labs, there's Alex and Gabor. And part of their job is to collect these signals from the community. And obviously sometimes it reaches me. I can't say I read every single issue report or every single post on Discord anymore, but if things come up often enough, it does leave sort of a mark, like, oh, yeah, now this is a common problem. These 20 people have all run into the same problem. Okay, interesting.

Demetrios [00:53:49]: Awesome. Yeah.

Hannes Mühleisen [00:53:50]: So I think we're still at this point where people do report issues back. Sometimes it's difficult to deal with, you know, yet another issue report, because our worldview is deeply skewed. We only see the problems. We don't really hear about the success stories so often. But it is a good sort of channel into the users, to see where it goes. And of course you have to understand at the same time that every time somebody says it didn't work, there have been ten or 100 people that ran into the exact same issue and haven't said anything. So it's an interesting sort of dynamic, I would say.

Demetrios [00:54:29]: And so Jordan, getting back to MotherDuck, because thank you for your patience while I went on that gigantic tangent with Hannes, but I do want to know: talk to us about MotherDuck and what basically you saw as, all right, this is MotherDuck. We feel like this managed service can have extra abilities, and how you go about building those features and why you think those are useful to have. Is it also going out there and talking to people? I imagine it is what the users are seeing. And you had to start from somewhere, right? And you came out of stealth not too long ago. I was at the party, I can't remember when it was. And yeah, last June.

Jordan Tigani [00:55:16]: Well, so first, you know, DuckDB has set a very high bar of making the end-to-end process work. And that's why people love DuckDB. And if we don't continue that to the service-oriented parts of the product, then we're going to be letting people down and letting down their expectations, and things aren't going to work. So we also have to make sure that we pay very great attention to detail on how do you connect, how do you visualize, what are the ways that MotherDuck can work? I think we're spending a lot more time, energy and just focus on those parts than a lot of other companies might. I think the nice part is we can take the awesome database for granted, which is hard, making sure we don't get wrong answers and all those kinds of things. We don't have to focus on those the way other database-as-a-service companies might. But I think we're also trying to innovate in the delivery of the database and of the service. I think when Snowflake and BigQuery came out, they had a lot of things that they were doing differently, in terms of separation of storage and compute, and simplicity, with very few knobs, and how you interact with them, that have started to become standard.

Jordan Tigani [00:56:53]: Separation of storage and compute is now table stakes. But I feel like it's been a long time since people have actually innovated in what are the things you can only do in the cloud. Those are the kinds of questions that we're asking ourselves. For example, one of the things that we're doing, and it's something that's also special with DuckDB, is that DuckDB is so lightweight, it can scale down, it has no dependencies. The JDBC driver, which is the connector that most code uses to actually talk to the database, has the database code inside of it. DuckDB can run inside the web UI as WebAssembly, and it can do so very, very fast. And so one of the decisions we made is that our client is always going to be DuckDB. So every time you're talking to MotherDuck, there's a DuckDB locally.

Jordan Tigani [00:57:52]: And so all the people that are building connectors to DuckDB and all the ways that DuckDB is being used, those will just work with MotherDuck. And the other interesting question is, if you have a full-blown analytics database on your client as well as in the cloud, what are the things that become enabled by it? Because when I was working on BigQuery, there was a bunch of times where we wanted to do complex manipulations or interesting things client side, and we basically just said, no, no, no, don't do that, because clients can't be trusted to do hard things. But now we have a real database there, and it's the same database that we're running in the cloud. That enables things like hybrid execution, where you can join local data against remote data. So if you're running in a Jupyter notebook and you're running Python and pandas code and you have a data frame, you can join that data frame against data that lives in the cloud, and we'll make that join optimal. There's a bunch of interesting things we can do on the security side, and also just building incredibly reactive user interfaces and visualizations, where if you're running the database in your client, you can basically do 60 frames per second data visualizations, video game style, against datasets that actually are quite large, that won't fit in your browser. And you can do some really clever things.

Jordan Tigani [00:59:30]: And those are some of the things that we're working on to just take DuckDB and supercharge it. Because I think DuckDB is not a data warehouse. It's an amazing query engine, an amazing data management system. And we are adding the pieces around it, from user management and working on a team to larger data sets and durability and time travel and some of those kinds of things that people expect now out of a data warehouse, and layering that on top of this core data engine, while at the same time giving people the same feel, because they have a local DuckDB, that it's just like using DuckDB, but now, boom, I get access to the cloud.

Demetrios [01:00:19]: And this idea that you're saying, you started from saying, how can we rethink what is possible now that we have the cloud? It again goes back to basically what you were saying, Hannes, on, man, the databases have collected cobwebs over the years and nobody's rethought how we're going to make this experience better. And so, Jordan, when you're thinking ahead in the future for MotherDuck and what is possible because we have the cloud, what are some things that get you excited?

Jordan Tigani [01:00:58]: I mean, I think innovating in the delivery, innovating in visualization, innovating in this sort of tight coupling where people don't have to know where data is, where people don't have to know where something gets run. You can basically move data around. Sometimes people care where the data is. You know, there are laws that say you have to process German data in Germany and Australian data in Australia. But people that don't care just want it to be fast. And so we can actually make sure that things are close to where the users are running. That is one example. And I think there's a bunch of other things that we'll hopefully start to see in MotherDuck in the upcoming months.

Demetrios [01:01:46]: Yeah, and, but just talking about that, not caring where the data is, you could still put constraints on it for saying, hey, I don't care where my data is as long as it's in Germany. Right. Or as long as it's in Australia. And I don't, I don't need to think about that, but I put my constraints on it in the beginning. I said it once and then forget it.

Jordan Tigani [01:02:07]: Exactly. Rather than saying, okay, this needs to run in EU west four, which happens to be a data center in Berlin, because that just doesn't give you the flexibility of, okay, maybe there's other data centers in Germany, and some of your other data is in that other data center in Germany, and all of a sudden you have to start paying egress fees because somebody else is using that other data center, et cetera. We want to just make it so you set the constraints that you care about, but otherwise you shouldn't have to take up mental energy with having to think about those things.

Demetrios [01:02:41]: So being that the majority of people here that are listening are probably dealing with machine learning and machine learning problems, DuckDB and MotherDuck obviously have thought about the way that machine learning engineers play with the data and the database. What are some things, and I'll just loft this up to either of you, what are some things that you see when it comes to machine learning? And the newest buzzword of them all, AI and all that fun stuff, even, dare I say, large language models or vectors or any of those big old buzzwords. What are the places that you're seeing machine learning engineers interact with DuckDB or MotherDuck?

Hannes Mühleisen [01:03:33]: Yeah, I'm going to skip my usual rant and go straight to it. We do actually interact with people that use DuckDB in machine learning pipelines, and there are several things we do to make their lives better. Part of it comes from the architecture of DuckDB itself, because it's in-process. It can be in a Python process, and then you're going to run your model in the same Python process and, hey, your data is already in that process, which means we can very cheaply ship data back and forth from, you know, any kind of models, and then do things that machine learning frameworks are usually terrible at, like reading data or, I don't know, handling updates, persistence, consistency, that whole stuff. So that's something, I think, where we have clear strengths that can be great benefits to classical machine learning workflows. You mentioned vectors. People actually build vector databases on top of DuckDB, which we love. And we've recently added a new data type to DuckDB, called a fixed-size list data type, because it turns out these vectors always have the same length. And if they have the same length, it means we don't have to store the length for every individual vector, which means we can store it more efficiently.

Hannes Mühleisen [01:04:50]: But that's kind of the level we think about is like, what can we do from a basic infrastructure perspective to be more useful to that community? That's our take on it. We don't have to integrate a large language model ourselves. I don't think we want to, but we can be a building block for machine learning communities. So that's my take on it.

Jordan Tigani [01:05:17]: People in machine learning want to integrate and interact more closely with their data. And I think we've seen things like BQML and other attempts to do machine learning on data that's stored in your data management system. And I think a lot of people use data lakes and lakehouses to be able to do that. But I think there's an advantage to something like DuckDB in the fact that it is scale-up: a lot of these machine learning algorithms scale up better than they scale out, and they can be very hard to split across multiple machines. So having a local scale-up database is very useful. And I guess the other thing is that the lingua franca of AI and ML is Python. DuckDB has extremely good Python integration and some special things that you can do in Python with the DuckDB relational API, which can turn into either pandas operations or Spark operations. They make it easier to work with your data that's stored in a database.

Jordan Tigani [01:06:40]: And finally, I'll add that at MotherDuck we are adding some LLM features, text-to-SQL stuff. Data analysts are changing how they interact with data with the advent of LLMs, and so I think we would be remiss if we weren't at least investing in that area. We're also working on a joint DuckDB-specific, fine-tuned LLM with a company called Numbers Station AI. We're going to open source at least one version of that, which we've been hard at work on. And so we do want to also make sure that we're giving back to the community.

Demetrios [01:07:24]: Incredible. Yeah. Diego, who is, I think, the head of product there, has been in the community for ages, since the beginning. And also Ines has come on here a bunch, and I love Numbers Station, love what they're doing. And I'm very happy to hear that you are creating your own text-to-SQL model. That was one of the questions I was going to ask. I was like, oh, are you using the open source Numbers Station model? What is it called? Q SQL or something. I can't remember what their open source foundational model is called, but it's very nice to see that.

Demetrios [01:08:00]: And fellas, this has been awesome and I'm so happy that you did this. What do you think? Did the mushrooms work? Was I asking good questions?

Hannes Mühleisen [01:08:12]: Those are great questions. You had great questions. Thanks for having us.

Demetrios [01:08:16]: Excellent.

Jordan Tigani [01:08:17]: Yeah, thanks much.

Aparna: Hey, everyone. My name is Aparna, founder of Arize, and the best way to stay up to date with MLOps is by subscribing to this podcast.
