
Going Beyond Two Tier Data Architectures with DuckDB // Hannes Mühleisen // DE4AI

Posted Sep 18, 2024 | Views 479
Hannes Mühleisen
Co-Founder & CEO @ DuckDB Labs

Prof. Dr. Hannes Mühleisen is a creator of the DuckDB database management system and Co-founder and CEO of DuckDB Labs, a consulting company providing services around DuckDB. Hannes is also Professor of Data Engineering at Radboud Universiteit Nijmegen. His main interest is analytical data management systems.

SUMMARY

DuckDB is an in-process analytical data management system. DuckDB is lightweight yet fast and available under the permissive MIT license. DuckDB can be deployed everywhere, from a smartwatch to a big iron server. This flexibility has led to a plethora of new and exciting data architectures, for example on-device processing, SQL lambdas, efficient large-scale pipelines, in-browser SQL, and more. In this talk, Hannes will give an overview of architectures observed in the wild and some ideas on what would be possible.

TRANSCRIPT

Jono Vondono [00:00:22]: Hey, my name is Jono Vondono, and I've been birding for the past 35 years. Some people say it's in my DNA. I don't know. You know, my favorite thing to do on the weekend is just go out with my trusted notebook and spend some time in nature listening to the birds, trying to catch a glimpse of some of these beauties. But for the last 15 years, I've been looking for the ever-elusive mother duck, and she's been evading me. It's been hard. So today we're gonna try and go look for her. Come.

Jono Vondono [00:01:01]: Come with me.

Demetrios [00:01:02]: Let's go do it. And we're back. Where is that mother duck? I really want to know. I really want to know. So, sir, we're a little behind on time, because I wasn't planning on sharing that, but I'm gonna leave it to you. I really appreciate you coming on here and doing this. As you know, I'm a huge fan of the DuckDB movement and especially MotherDuck. I'll share your screen, and then I'll be back in just a little bit.

Hannes Mühleisen [00:02:03]: Very good. All right, let me just click the button here so I can get this going. This looks good to you? Yes.

Demetrios [00:02:12]: Oh, yeah.

Hannes Mühleisen [00:02:13]: Very good. Very minimal. Yes. Hello, everybody. Welcome to this virtual thing. My name is Hannes Mühleisen, and I'm going to talk about going beyond the two-tier data architecture with DuckDB. And I want to thank you again for the invitation. It's always a pleasure to talk to you.

Hannes Mühleisen [00:02:34]: So a little bit about me. I am the co-founder and CEO of DuckDB Labs, which is the company behind the DuckDB system. It's kind of where most of the contributors are employed. But in my sort of side job, I'm also a professor of data engineering at the university here in the Netherlands, and I teach data engineering to students. So I think it's very fitting. But let me start with DuckDB.

Hannes Mühleisen [00:03:03]: And in case you haven't heard about DuckDB, it is a relational analytical data management system. It's fast, free, open source, all these wonderful things. I'm going to talk a bit more about it in a bit, but in case you just don't know what it does: it's a SQL system, so you talk SQL to it, whether you like it or not. But DuckDB actually comes out of a mission that we are on, and the mission is to work on people's generally difficult relationship with data. People generally fear data, that's my observation, especially when it stops fitting in Excel. We are sort of in fear.

Hannes Mühleisen [00:03:45]: We are gripped by fear that the tools will run out of steam. We are in fear that we have to rewrite our data platform from, say, something that's running pandas to something that's running Spark. It's all not very great. We are living in fear of the DBAs that are the gatekeepers of the data. And I think it's somewhat true: the tools people commonly use can be very unwieldy, very clumsy, and sometimes, I feel, downright hostile. I don't know how many times I've stared at a Java stack trace in utter anger. So this is something we want to change.

Hannes Mühleisen [00:04:24]: We want to create tooling so people can confidently wrangle very large datasets. And we want to build this confidence that dealing with data isn't necessarily something you have to fear, that datasets are not necessarily hostile. Don't fear that CSV file. And it's interesting because it's also something that I've noticed myself, but I'm getting ahead of myself, so I'll continue. In order to instill confidence in people, what we realized is that data management systems had been focusing very much on sort of the meat part of this hamburger here, where really all that data management system builders were thinking about was how to make the join algorithm better or the distribution better or things like that. And we really ignored the end-to-end user experience. We have built systems to write papers, maybe, or to look good in benchmarks, but we have not necessarily built systems to spark joy: to be easy to use, easy to ingest data into, easy to get data out of, and so on and so forth. And that's the design philosophy behind DuckDB.

Hannes Mühleisen [00:05:42]: So let me talk a little bit more about what that leads to in terms of system design, if you start with these prior assumptions. The first thing is that DuckDB is in-process. What does this mean? DuckDB doesn't run on a separate database server. It runs in whatever process you want it to run in. Very often, for example, people will be running DuckDB in something like Python; I'll talk a bit about the various clients later. So typically you have a Python process, and you run DuckDB inside that process. That is very cool, because not only does it mean that you can move data back and forth with that process very easily, it also means that you don't have to set up a server, you don't have to have a Docker container, you don't have to upgrade the database, you don't have to ask for permission to install it. For most people it's just a pip install or a brew install duckdb, or install.packages, or npm install, whatever you want to call it.
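To make the in-process model concrete, here is a minimal sketch from Python; the query and the file name are hypothetical illustrations, not from the talk:

```python
# pip install duckdb -- no server, no Docker container, no permissions.
import duckdb

# connect() with no argument gives an in-memory database that lives
# and dies with this process; a file path would give persistence.
con = duckdb.connect()  # or duckdb.connect("birds.db") for a file
print(con.sql("SELECT 42 AS answer").fetchall())  # [(42,)]
```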

Hannes Mühleisen [00:06:42]: It really doesn't matter. So this in-process architecture is something that we didn't pick at random. It's something that we picked because we wanted to build something that's easy to use. And it also has other benefits, more on that in a second. Now, people think: hey, this is a small in-process thing, maybe like SQLite, so it's nice to play with, but generally shitty once you want to do something more serious with it. But this is not true. DuckDB is batteries included.

Hannes Mühleisen [00:07:14]: We have a crazy amount of support for complex SQL. We have transactions, we have persistence. We can read CSV, Parquet, JSON out of the box, no problem. We can connect to cloud storage like S3 on AWS, we can talk to Azure, we can talk to Google Cloud Storage, you name it. We can read Iceberg, Delta, all these things. We can talk to Postgres, MySQL. We are really well integrated with Arrow. And there are clients for all sorts of languages, like Python, R, Java.
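A hedged sketch of that batteries-included file and cloud access from the Python client; all file paths and the bucket name are hypothetical placeholders:

```python
import duckdb

con = duckdb.connect()

# Local files are queryable directly, with format auto-detection.
con.sql("SELECT * FROM read_csv_auto('sightings.csv') LIMIT 5")
con.sql("SELECT count(*) FROM 'data/*.parquet'")  # Parquet glob
con.sql("SELECT * FROM read_json_auto('feed.json')")

# Cloud storage goes through the httpfs extension.
con.sql("INSTALL httpfs")
con.sql("LOAD httpfs")
# (assumes S3 credentials are configured in the environment)
con.sql("SELECT * FROM 's3://my-bucket/events/2024/*.parquet' LIMIT 10")
```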

Hannes Mühleisen [00:07:49]: Really a whole zoo, a long tail of languages, which are wonderful to maintain, I can tell you that. So it is really serious in terms of feature set. It's not just a toy, despite its really trivial deployment model. And DuckDB is also fast. I know everybody says their system is fast, but it is really fast in an important way, which is that it has very good single-core efficiency and it can automatically parallelize. DuckDB will automatically take whatever data transformations you're doing and parallelize them across the available CPU cores. But blind parallelization alone isn't good enough, right? You can parallelize a slow Java implementation across a thousand cores and it won't matter, because it's still slow on a single core. But we managed to do this with very efficient C++ code for all operators.

Hannes Mühleisen [00:08:41]: We can even use the hard disk, should we run out of memory, to complete an operator. And again, all of this happens completely automagically. As a result, you get these wonderful tweets where somebody says DuckDB is probably the most magical piece of technology in recent years, and we are flattered, but we all know that sufficiently advanced technology is just indistinguishable from magic. And we're not kidding when we say this is the culmination of decades of research in analytical database systems design; there have been many PhDs given out on this topic. So DuckDB sounds wonderful. Then your next question is: what do you have to pay for it? And the answer is nothing. It's completely free and open source under the MIT license. That means you can just take it, you can put it in your applications, you can build a company on top of it, we don't care.

Hannes Mühleisen [00:09:36]: So that's really nice. And it has actually been getting quite popular. An interesting insight is that building this kind of technology, a query engine, is hardly feasible even for the richest organizations. It's super expensive to build your own query processor. And while you might think "I just need projections" or "I just need a bit of aggregation," doing it really end to end, with all the operators you might need, is quite difficult, as I said, even for the richest organizations. With DuckDB you get the state of the art of query engines for free. So what do people use DuckDB for? Well, there are three main use cases that we can see. The first is what we built DuckDB for, which is: you just load up DuckDB in your Python shell.

Hannes Mühleisen [00:10:25]: You start analyzing some data, you run queries, maybe you use the persistence, you read some Parquet files, you write some Parquet files. This really interactive, sort of single-player, great data experience on a laptop is really important for us as a use case, because it's what builds the confidence that I mentioned. But we were also really surprised to see people using DuckDB as a component in some really huge enterprise pipeline to transform data; I'll give some examples later. Then, lastly, there are the creative architectures that people came up with for DuckDB. For example, people have put DuckDB into lambdas, people have put DuckDB into Wasm. I'll go into more examples of that later. But since this is MLOps, I want to stress the Python integration of DuckDB a bit more.

Hannes Mühleisen [00:11:22]: So, DuckDB loves Python. It is the biggest API by download count; we are currently at around 6 million downloads per month, which is wild. And because it's one of our main APIs, most features will come to the Python API first. Well, they come to C first and then come to the Python API. It's really something that we are trying to get right. And one of the advantages of being in-process is that you can do things that other people cannot do. For example, DuckDB runs within the Python process, so if you have something like a pandas DataFrame in the same process, we can directly read it, because the format isn't rocket science and it's just a pointer in the end. So there are no sockets, no serialization or nonsense like that.

Hannes Mühleisen [00:12:07]: We just understand the in-memory layout of these data frames. The same is also true for DuckDB results: we can convert those very efficiently back into pandas data frames. I think this is the only database management system that can do this, where we directly read these data frames. And we also have some nice tricks, like: if you use a table name in a query and the table doesn't exist, but there is a data frame with that name, we will automatically read that. It really works, you should try it out. The same is also true for Arrow. Both in Python and R we can directly read Arrow and transfer query results back and forth: we can read Arrow as a table, and we can write results back out as Arrow.
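A hedged sketch of the pandas round trip he describes; the DataFrame contents and column names are made up for illustration:

```python
import duckdb
import pandas as pd

# An ordinary pandas DataFrame living in the same process as DuckDB.
df = pd.DataFrame({"species": ["mallard", "teal", "mallard"],
                   "n": [3, 1, 5]})

# No table called "df" exists in the database, but a DataFrame with
# that name does, so DuckDB reads it in place: no copies, no sockets,
# no serialization.
res = duckdb.sql("SELECT species, sum(n) AS total FROM df GROUP BY species")

out_df = res.df()      # result back as a pandas DataFrame
out_tbl = res.arrow()  # or as an Arrow table
```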

Hannes Mühleisen [00:12:51]: And people use this, for example, when they stick DuckDB into these pipelines, right? A single-node analyst maybe doesn't really care about Arrow integration, but somebody who wants to shove the things that come out of DuckDB into something else might care. Okay, so much for a higher-level overview of what DuckDB is and what it can do. Now I want to talk a bit more about the two-tier architecture, as promised. The two-tier architecture is something that I think everybody knows really well: we have a database server somewhere, and then there is a bunch of clients. In this case we have three clients, two desktops or browsers and one mobile device, and they talk to this database. Really boring.

Hannes Mühleisen [00:13:36]: An interesting fun fact here is that this actually wasn't always the case. Before we had reliable networks, it was totally normal to just work directly on the database server. But that's just an aside; we've had client-server since 1984 or so. And sometimes people also have a three-tier architecture, which is: okay, it's not really great to have the clients talk to the database server directly, so we throw in an app server to run Node or whatever, and now we have a three-tier architecture. It's a bit boring, and I know this is something you have seen on every single architecture slide you've ever looked at. Well, of course, the data has to come from somewhere.

Hannes Mühleisen [00:14:15]: So what we also typically see is that we have these two little operational databases there on the left that traditionally would write to a data warehouse, and that would run the analytics, which goes out to the various clients and so on, so forth. I've tried to simplify this picture, so I've not shown the two or three Kafkas that people tend to plug in between these things, but you get the idea. More recently we have the data lake, or data swamp as I like to call it, which really didn't change a whole lot. Instead of the data warehouse we have a data lake now, and we've also gained an ugly Spark cluster to deal with the data. The rest is pretty much unchanged, right? We still have the operational databases that somehow have to dump their stuff into the data lake. We still have the app servers that basically read whatever data product you create with Spark from the data lake, and the app server still, as in the long, long ago, distributes this down to the clients. So let's add DuckDB into this mix, step by step and with examples.

Hannes Mühleisen [00:15:19]: That's the whole idea. So first, let's put DuckDB into the operational servers. And you say: why? Hang on, these are operational database servers. They shouldn't be running DuckDB. They should be running, I don't know, Postgres or something like that. And I agree. I am not saying you shouldn't be running Postgres on these servers. I'm saying you should also run DuckDB.

Hannes Mühleisen [00:15:40]: And just for Postgres, we recently released pg_duckdb, which is a Postgres plugin, so you can run DuckDB directly inside the Postgres server, which is kind of cool. But why would you do that in the first place? Well, you could now, for example, pre-aggregate, pre-filter, pre-enrich, and pre-encode your data to Parquet files directly, before you upload it to the data lake. In the past, it would be common for these operational servers to spit out, I don't know, a change stream, or to regularly dump to something, and that then gets transformed by yet another server and uploaded to the data lake. But why don't you already do the pre-aggregation and pre-filtering and all that sort of stuff on that machine, with DuckDB? DuckDB can write to data lakes, DuckDB can write files, no problem. We are very good at aggregation, so we could already do that, thereby really reducing the complexity and also the cost of this whole architecture. Here's an example from Fivetran, which was just blasted in the previous presentation for being expensive, but they actually use DuckDB in their data lake writer. So if you're using Fivetran and you're writing to Iceberg, this will actually use DuckDB behind the scenes, because it turns out these kinds of transformations can be greatly improved in terms of efficiency by just using DuckDB in the process. In this case, they're using the JDBC driver that we of course also have, because the world still runs Java app servers.
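A hedged sketch of that pattern, shown from the Python client rather than pg_duckdb itself: attach the operational Postgres database, pre-aggregate, and write Parquet straight to the lake. The connection string, schema, table, and bucket are all hypothetical:

```python
import duckdb

con = duckdb.connect()
con.sql("INSTALL httpfs")
con.sql("LOAD httpfs")
con.sql("INSTALL postgres")
con.sql("LOAD postgres")

# Attach the operational database via DuckDB's postgres extension.
con.sql("ATTACH 'dbname=shop host=localhost' AS pg (TYPE postgres)")

# Pre-aggregate on the operational side and ship only the result.
# (assumes S3 credentials are configured in the environment)
con.sql("""
    COPY (
        SELECT order_date, sum(amount) AS revenue
        FROM pg.public.orders
        GROUP BY order_date
    ) TO 's3://my-lake/orders_daily.parquet' (FORMAT parquet)
""")
```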

Hannes Mühleisen [00:17:18]: Next, instead of the Spark cluster: we've seen people saying, hey, DuckDB actually doesn't really need any sort of state or startup time or anything like that. DuckDB takes a few milliseconds to start up. So instead of having a Spark cluster sitting there and eating money, or maybe a Redshift cluster, also eating money, we can just use lambdas, or a VM that runs for a while with DuckDB on it. And again, it's a millisecond startup time, it doesn't really cost anything to deploy, it's like 20 megabytes or so of binary, and it's a highly efficient implementation.

Hannes Mühleisen [00:17:58]: So you'll actually use far less hardware than you would have previously. There's an interesting startup called Boiling Data that actually uses a fleet of lambdas to run crazy SQL queries in parallel, but that's a different discussion. Here's also an example from a company called Okta. They are doing defensive cyber operations, and they're using something like this here; I've stolen their architecture diagrams too. They're ingesting a bunch of data on the left, from various sources, in this case GitHub or whatever, writing it to S3, running a bunch of lambdas with DuckDB, and writing that back to their S3 buckets so they can do more interesting things later on.
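A hedged sketch of that lambda pattern, not Okta's actual code: a Python handler triggered when a file lands in S3, which cleans it with DuckDB and writes Parquet back. The bucket names, the filter column, and the trigger wiring are assumptions; the event shape follows the standard S3-trigger convention:

```python
import duckdb

def handler(event, context):
    # Standard S3-trigger event shape: one record per dropped file.
    s3 = event["Records"][0]["s3"]
    src = f"s3://{s3['bucket']['name']}/{s3['object']['key']}"

    con = duckdb.connect()  # millisecond startup, no state to keep warm
    con.sql("INSTALL httpfs")
    con.sql("LOAD httpfs")

    # Clean the raw file and write it back as Parquet.
    # (assumes the Lambda role grants S3 access; 'id' is a made-up column)
    con.sql(f"""
        COPY (SELECT * FROM read_csv_auto('{src}') WHERE id IS NOT NULL)
        TO 's3://my-clean-bucket/{s3['object']['key']}.parquet'
        (FORMAT parquet)
    """)
    return {"status": "ok", "source": src}
```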

Hannes Mühleisen [00:18:43]: Like, as I said, this whole preprocessing, cleaning-up, general data engineering tasks. Super interesting. Okay, so we have that now. By the way, the DuckDB logo here is pretty small, but I think you can spot it. Next, we can put DuckDB in the app server. In the past, we would have this database server on the left here, and often that was actually a cluster. And in the past we had this wonderful problem that we had to somehow scale the app servers and the database servers together, and it never quite worked, because the load characteristics were so different.

Hannes Mühleisen [00:19:20]: But now nothing keeps you from putting a DuckDB in your app server. And that means your data transformations on behalf of clients run within the app server request itself; there is no separate database server to talk to. So you can do really cool things, like live websockets that push data to the clients directly as it's being retrieved from the database, or reacting to clients changing their demands in real time, without having to talk to a database server, which, by the way, you don't have at all anymore. So you get better scaling, and you don't get database protocol bullshit, which is a whole different source of pain. Really cool. Here's an example that I found on Reddit, I think.

Hannes Mühleisen [00:20:01]: These are the kinds of plots that you have to love, right? Somebody said: if your effect is big enough, you don't need statistics to prove that it's a good idea. And I think you would agree with me that this is one of those cases. Somebody migrated a Node.js app to use DuckDB instead of SQLite directly in the app server, and obviously they got a huge performance boost and a huge reduction in database size. So that's really cool. Finally, we can also put DuckDB in desktop apps. We can put DuckDB even in the browser, with WebAssembly. What that gives you is extremely exciting: it gives you video game latency for data transformations in your dashboard.
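A minimal sketch of the app-server idea in Python (the cited example was Node.js; Flask, the route, the table, and the file name here are illustrative choices, not from the talk):

```python
import duckdb
from flask import Flask, jsonify

app = Flask(__name__)
# One persistent DuckDB file lives inside the app server process.
con = duckdb.connect("app.db")

@app.route("/sightings/<species>")
def sightings(species):
    # A cursor per request keeps concurrent handlers isolated.
    cur = con.cursor()
    rows = cur.execute(
        "SELECT day, count(*) AS n FROM sightings "
        "WHERE species = ? GROUP BY day ORDER BY day",
        [species],
    ).fetchall()
    return jsonify(rows)
```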

Hannes Mühleisen [00:20:45]: Because instead of a multi-second round trip between your dashboard and a Spark backend, that round trip just doesn't happen, because the database runs directly in your browser. So you get millisecond results. You can put DuckDB into iOS or Android apps, and people do; it's on-device processing. If the network is down, it's not an issue, you can still look at your data. And since a few weeks we actually have Dart support for DuckDB, so you can even build multi-platform apps using Dart with DuckDB. And people do.

Hannes Mühleisen [00:21:17]: I'm going to show some examples as well. Here is our web shell, shell.duckdb.org. We have DuckDB running in the browser, fully, as a database system. The shell is just kind of showing off, but people really integrate this into interesting apps; I'll show more examples in a bit. Here are the people from TigerEye, a marketing analytics company. They've integrated DuckDB into their native app front ends, as I mentioned, using Dart. Here's Hugging Face. They use DuckDB-Wasm, again the WebAssembly in-browser version, to run this dataset viewer here.

Hannes Mühleisen [00:21:53]: So this dataset viewer that they have actually runs in your browser. They also auto-convert all the datasets to DuckDB files, so you can just grab the DuckDB files and happily query away. And just yesterday they actually released a DuckDB-Wasm-powered SQL console right on the website. So basically, on every Hugging Face dataset there's now a SQL button; you click it and you can query all the data. It's right there, the yellow button, in case you're looking for it. And here is my first attempt at a video in a presentation.

Hannes Mühleisen [00:22:24]: Here you can see what I meant with millisecond response time. This is a demo from Observable where basically the slider recomputes a whole bunch of SQL and renders it, and it goes so fast you don't even notice. So DuckDB can be everywhere in your stack: same SQL, same storage format, same capabilities. It's a state-of-the-art processing engine that's now kind of unshackled from DB servers. It's actually a massive paradigm shift. I don't think people have fully gotten this yet, and I also don't think I have fully gotten this yet: instead of the database living on a centralized server or a cluster, it can now live everywhere.

Hannes Mühleisen [00:22:58]: Only your creativity limits where you can put it. You can put it on a watch, you can put it in a car, you can put it in planes, you can put it in satellites. And NASA has already gotten in touch, so maybe that's happening. It's really exciting, this paradigm shift from "the database has to be in a single place" to "a state-of-the-art query processing engine can just be everywhere." Briefly mentioning MotherDuck, whom we've shouted out: they're actually doing some cool work in hybrid query processing and coordination between DuckDB instances, which normal DuckDB doesn't really do. So go check out MotherDuck if you're interested. And what I've shown is just the beginning.

Hannes Mühleisen [00:23:39]: It's just what people, including myself, could already imagine doing. Personally, I would like to see much more usage in mobile apps, because phones are so powerful, and it also has great privacy benefits. Or maybe DuckDB being put in cars; we have not seen that yet, but I really would love to. Why not put DuckDB in hard disk firmware? I mean, we'll see. But with that, I think I'm done talking about going beyond the two-tier data architecture. And I really want to say that we can liberate data processing: you don't have to have a centralized data engine anymore, you can just put it anywhere.

Hannes Mühleisen [00:24:15]: Thank you. Yes.

Demetrios [00:24:19]: Okay. I thoroughly enjoyed this because I specifically asked you for something like this, and I appreciate that you gave it to me. Even so, that was very creative, and I was not expecting it to be so creative. I think the big question that I have, while people are writing stuff in the chat and getting all these questions queued up, is about the lambda architecture: basically, was the reason for replacing Spark with DuckDB the simplicity?

Hannes Mühleisen [00:24:52]: Yeah, I mean, certainly simplicity, but also just: you don't have to have anything running, right? You don't have to wait for VMs to come up or a cluster to be at the right size. You just run a lambda. You can trigger a lambda whenever a new file is dropped on your data lake or something like that. That's what I see people doing sometimes.

Demetrios [00:25:12]: Oh, wow. Okay. All right. And then that last piece that you said, I'm not sure I fully understood it, where it was around taking the query engine and basically putting it anywhere you want.

Hannes Mühleisen [00:25:24]: Yeah, yeah. You can, in the past database, imagine teradata or spark, where do they live? They live on a fixed server or a cluster. They never leave that fixed thing. But now you can just put the query engine wherever you want. You can put it on your watch if you want. Right. And that's, I think, a paradigm shift that I find very exciting.

Demetrios [00:25:46]: Okay, wow. So can you use DuckDB with pandas, allowing it to process data bigger than your available memory?

Hannes Mühleisen [00:25:54]: Yes. Well, pandas... you can't have it in pandas if it doesn't fit in your memory, because pandas, by design, has to be in memory. But yes, you can process data that's way, way larger than your available memory, no problem. We can stream stuff out of files and into files. All the operators, the joins, the aggregates, the sort, all that stuff is capable of out-of-core operation, meaning that we can go far beyond the available memory with the input sizes, output sizes, and intermediate sizes in data pipelines.
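A hedged sketch of that out-of-core behavior; the memory limit, spill directory, and file names are made up for demonstration:

```python
import duckdb

con = duckdb.connect()

# Cap memory and give DuckDB somewhere to spill; operators like
# sort, join, and aggregate stream through disk past this limit.
con.sql("SET memory_limit = '1GB'")
con.sql("SET temp_directory = '/tmp/duckdb_spill'")

# Streams from Parquet to Parquet even if the data dwarfs RAM.
con.sql("""
    COPY (SELECT * FROM 'huge_input/*.parquet' ORDER BY user_id)
    TO 'sorted.parquet' (FORMAT parquet)
""")
```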

Demetrios [00:26:27]: So apparently the chat doesn't care about questions for your talk. No, no, they're asking... there's a big debate going on right now about whether you are actually Saul Goodman or not, if you've changed your identity and career and put on a German accent.

Hannes Mühleisen [00:26:46]: I don't know who that is, so.

Demetrios [00:26:48]: Exactly. That's what somebody who was hiding their identity would say, I like that answer. So how does this compare to polars?

Hannes Mühleisen [00:26:57]: Yeah, so Polars is of course a better pandas, but it can't do the things DuckDB can: it can't do transactions, it can't do persistence, it can't do Wasm. So there's a big difference in what Polars can do. In a way, DuckDB is like a superset, because we can run queries, like Polars can also run queries, but we can also deal with transactions, we can deal with persistence, we can deal with all sorts of other things. And yeah, we have SQL. I think our SQL support is very mature, whereas in Polars I think they're just getting started.

Demetrios [00:27:40]: Huh. So the other question, I didn't even realize I'm all small in the corner there.

Hannes Mühleisen [00:27:46]: I mean, I like being big.

Demetrios [00:27:48]: Oh, you're the spotlight. This is it. So let me make myself bigger to just show off a little bit. And there's a really good question that came through here that I think a lot of people have when they first start hearing about or messing around with DuckDB. My biggest concern is indeed that you would need to download the whole dataset over the Internet. As a data engineer, I deal with huge datasets. Would you agree that DuckDB would not be suited to run big datasets on your local machine, since the download speed would be a bottleneck?

Hannes Mühleisen [00:28:22]: There are two answers to this. One is that with formats like Parquet, DuckDB can actually do partial downloads, so we only fetch whatever you're querying. That can sometimes really make a huge difference. The other thing is that nobody says you have to have it on your local machine. Lots of people run DuckDB on beefy VMs next to S3 and have very good throughput to it. And we can run directly on a local network, which also works really well.
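A hedged sketch of those partial reads: querying a remote Parquet file over HTTP, where DuckDB fetches only the needed columns and row groups via range requests. The URL and columns are placeholders:

```python
import duckdb

con = duckdb.connect()
con.sql("INSTALL httpfs")
con.sql("LOAD httpfs")

# Only the 'station', 'temp', and 'year' columns (and the matching
# row groups) are downloaded, not the whole file.
con.sql("""
    SELECT station, avg(temp) AS avg_temp
    FROM 'https://example.com/weather.parquet'
    WHERE year = 2024
    GROUP BY station
""")
```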

Demetrios [00:28:48]: Yeah. All right, so now we've got the real DuckDB users coming through in the chat. Do you think there will be more updates for autocomplete to work on Windows natively?

Hannes Mühleisen [00:28:57]: That's a detailed question.

Demetrios [00:28:59]: Exactly. Then wait, wait, wait. Here's another one for you. Before you even think about answering that: is there a timeline for spatial support for pg_duckdb?

Hannes Mühleisen [00:29:09]: pg_duckdb should already be able to use spatial, because DuckDB can just install the packages. There might be some detailed binary problems there, but in principle there's no reason why it shouldn't work. If it doesn't work, it's probably a bug, so maybe file it. With regards to the autocomplete question: the problem is that the shell on Windows is atrocious, and the autocomplete requires some gymnastics with escape codes, so I'm not sure that's going to come super soon. You might get better mileage by just using Windows Subsystem for Linux or something like that.

Demetrios [00:29:56]: Excellent. That's the honest answer. Might not be the one you wanted to hear, but it is the honest one. I like it. All right, so the last question that I've got for you: is DuckDB more efficient than, say, Spark for a data lakehouse like Iceberg?

Hannes Mühleisen [00:30:14]: Yeah, I mean, Spark itself is problematic in terms of single-node efficiency. I think we typically outperform Spark by a factor of ten or so on a single node, and Spark takes a lot of parallel cluster nodes to catch up. So yeah, it is definitely feasible. I mean, the Iceberg support is in a bit of an early stage; we're working on it, but it's not entirely there yet. The Delta Lake format is a bit further along, because we're working with Databricks to improve it.

Demetrios [00:30:47]: Nice. Can DuckDB be integrated into legacy programs like SPSS?

Hannes Mühleisen [00:30:53]: Yes, we have an ODBC driver, we have a JDBC driver. So if your thing can speak old-school stuff, you can just load in DuckDB. The cool thing is that the actual engine code is in the driver, so you can just load the driver and you have the whole thing.

Demetrios [00:31:09]: Oh, exciting. All right, well, Mister Hannes, I appreciate you coming on here and doing this. I saw you before... I don't want to throw you under the bus or anything, but I could see you before you could see me, and what people don't know is you were getting all amped up. You were doing some stretching before this talk. I thought, were you doing some of this?

Hannes Mühleisen [00:31:31]: Yeah, yeah, yeah. I mean, I usually do warm up because it helps me.

Demetrios [00:31:36]: Yeah. For all these aggressive questions you've got coming through in the chat, you gotta be able to kung fu the shit out of them. Exactly.

Hannes Mühleisen [00:31:45]: Very good.

Demetrios [00:31:46]: Sweet, man. This was awesome. Thank you for coming.

Hannes Mühleisen [00:31:49]: Have a good day. Bye bye.

Demetrios [00:31:50]: See you.
