As data volumes grow, many data practitioners need to migrate existing Python or pandas code to distributed computing frameworks such as Spark and Dask. In this tutorial, we discuss the available solutions and how each one behaves. Pandas-like frameworks such as Modin (for Dask) and Koalas (for Spark) promise a drop-in replacement for pandas.
Fugue, on the other hand, chooses to depart from the pandas interface. Fugue users instead write a small amount of additional code to port existing Python and pandas code. To weigh the tradeoffs between these approaches, we will cover the underlying distributed computing concepts. Attendees will deepen their understanding of distributed computing and learn the pros and cons to consider when evaluating these options.
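
As a rough illustration of the second approach, the sketch below keeps the business logic as a plain Python function written for a single partition of data, and leaves the choice of how partitions are executed to a separate piece of code. This is a toy using only the standard library; `run_partitioned`, `demean`, and the sample rows are hypothetical names invented for this example and are not part of Fugue, Spark, or Dask.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def demean(rows: List[Row]) -> List[Row]:
    """Plain Python logic for ONE partition; it knows nothing about engines."""
    mean = sum(r["value"] for r in rows) / len(rows)
    return [{**r, "demeaned": r["value"] - mean} for r in rows]


def run_partitioned(
    rows: List[Row],
    key: str,
    fn: Callable[[List[Row]], List[Row]],
    parallel: bool = False,
) -> List[Row]:
    """Hypothetical toy engine: split rows by `key`, apply `fn` per partition."""
    partitions: Dict[Any, List[Row]] = defaultdict(list)
    for r in rows:
        partitions[r[key]].append(r)
    groups = list(partitions.values())
    if parallel:
        # Stand-in for a distributed backend: same function, different executor.
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(fn, groups))
    else:
        results = [fn(g) for g in groups]
    # Flatten the per-partition outputs back into one list of rows.
    return [r for part in results for r in part]


data = [
    {"group": "a", "value": 1.0},
    {"group": "a", "value": 3.0},
    {"group": "b", "value": 10.0},
]

local = run_partitioned(data, "group", demean)
threaded = run_partitioned(data, "group", demean, parallel=True)
```

Here `demean` is unchanged whether it runs sequentially or on a thread pool; only the `parallel` flag differs. A real distributed engine adds concerns this toy ignores, such as data shuffling across machines and output schemas.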