Jovan and Maarten showcase Vaex, an open-source Python DataFrame library built for fast, interactive work with datasets too large to fit in the RAM of a single machine. Vaex makes this possible through lazy evaluation, efficient out-of-core algorithms, memory mapping, and computational graphs, most of which operate behind the scenes and out of the user's way.
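To make lazy evaluation, memory mapping, and out-of-core aggregation a little more concrete, here is a toy sketch using only the Python standard library. It is not Vaex's implementation or API. `LazyColumn` and `open_column` are hypothetical names, and the on-disk format (a raw array of native-endian float64 values) is an assumption chosen for simplicity. The column is memory-mapped rather than read into RAM, a transformation only records a function, and the values are streamed in chunks when `mean()` is finally called.

```python
import mmap
import tempfile
from array import array


class LazyColumn:
    """Toy lazy expression over a memory-mapped column of float64 values.

    Building an expression does no work; values are read from the mapped
    file and transformed only when an aggregation such as mean() runs.
    """

    def __init__(self, buffer, transform=None):
        self.buffer = buffer  # memoryview over the mapped bytes, not a copy
        self.transform = transform or (lambda x: x)

    def map(self, func):
        # Compose a new lazy expression; no data is touched here.
        prev = self.transform
        return LazyColumn(self.buffer, lambda x: func(prev(x)))

    def mean(self, chunk_size=4):
        # Stream over the column in fixed-size chunks so that only one
        # chunk of values is materialized as Python floats at a time.
        total, count = 0.0, 0
        for start in range(0, len(self.buffer), chunk_size):
            chunk = self.buffer[start:start + chunk_size].tolist()
            total += sum(self.transform(x) for x in chunk)
            count += len(chunk)
        return total / count if count else float("nan")


def open_column(path):
    # Map the file read-only and reinterpret its bytes as float64 values
    # (native byte order, matching array.tofile) without copying them.
    with open(path, "rb") as f:
        mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    return LazyColumn(memoryview(mapped).cast("d"))


# Usage: write a small column to disk, then aggregate it lazily.
with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as tmp:
    array("d", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]).tofile(tmp)
    path = tmp.name

fares = open_column(path)
doubled = fares.map(lambda x: 2 * x)  # nothing is computed yet
print(doubled.mean())  # 7.0
```

A real out-of-core engine would process each chunk with vectorized native code and could run chunks in parallel. The point here is only the shape of the idea: the operating system pages in the parts of the file that are touched, and no intermediate column is ever fully materialized.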
Using the New York City YellowCab taxi dataset, which comprises 1.1 billion samples and takes up over 100 GB on disk, they show how to conduct an exploratory data analysis, complete with filtering, grouping, statistical calculations, and interactive visualizations, on a single laptop in real time. They also demonstrate how the computational graphs in Vaex make it possible to build a machine learning pipeline automatically, as a by-product of the exploratory analysis.
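The idea of a pipeline emerging "as a by-product" of exploration can be illustrated with a second toy sketch, again standard-library only and not Vaex's API. `ToyFrame`, `where`, `with_column`, and `apply_to` are hypothetical names. Each filter or derived column is recorded as a step instead of being executed immediately; the recorded steps form a small computational graph that can be replayed on unseen data. Vaex itself, as far as we are aware, exposes a comparable mechanism through `DataFrame.state_get()` and `DataFrame.state_set()`.

```python
from statistics import mean


class ToyFrame:
    """A toy column store that records every transformation applied to it.

    The recorded steps play the role of a computational graph: they can be
    replayed on new data, so the exploration doubles as a pipeline.
    """

    def __init__(self, columns, steps=()):
        self.columns = {name: list(values) for name, values in columns.items()}
        self.steps = list(steps)

    def with_column(self, name, func, *inputs):
        # Record a derived ("virtual") column computed from existing columns.
        return ToyFrame(self.columns, self.steps + [("column", name, func, inputs)])

    def where(self, predicate, *inputs):
        # Record a row filter; rows are dropped only when the frame is evaluated.
        return ToyFrame(self.columns, self.steps + [("filter", predicate, inputs)])

    def evaluate(self):
        # Replay the recorded steps, in order, against the stored columns.
        cols = {k: list(v) for k, v in self.columns.items()}
        for step in self.steps:
            if step[0] == "column":
                _, name, func, inputs = step
                cols[name] = [func(*row) for row in zip(*(cols[c] for c in inputs))]
            else:
                _, predicate, inputs = step
                keep = [bool(predicate(*row)) for row in zip(*(cols[c] for c in inputs))]
                cols = {k: [v for v, ok in zip(vals, keep) if ok] for k, vals in cols.items()}
        return cols

    def apply_to(self, new_columns):
        # Reuse the recorded steps on unseen data: the "pipeline" by-product.
        return ToyFrame(new_columns, self.steps)


# Exploration: filter out bad rows, then derive a feature.
trips = ToyFrame({"fare": [10.0, 20.0, 0.0, 30.0], "distance": [2.0, 5.0, 0.0, 5.0]})
explored = (
    trips.where(lambda d: d > 0, "distance")  # drop zero-distance trips
    .with_column("fare_per_km", lambda f, d: f / d, "fare", "distance")
)
print(mean(explored.evaluate()["fare_per_km"]))  # 5.0

# Replaying the recorded steps on unseen rows acts as the pipeline.
new_trips = explored.apply_to({"fare": [8.0, 9.0], "distance": [4.0, 0.0]})
print(new_trips.evaluate())
# {'fare': [8.0], 'distance': [4.0], 'fare_per_km': [2.0]}
```

Because the steps are replayed in the order they were recorded, the filter runs before the division, so zero-distance rows never reach `fare_per_km`. The same ordering is preserved when the steps are applied to new data.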