Different Ways to Scale Python & Pandas

Uploaded: 2022-11-05T12:17:29.314Z

Posted Nov 05, 2022 | Views 443

Tags: Python, Pandas, Fugue, Prefect.io

Kevin Kho

Open Source Community Engineer @ Prefect

Kevin Kho is an Open Source Community Engineer at Prefect, an open-source workflow orchestration management system. Previously, he spent four years as a data scientist in the energy and HR spaces. Outside of work, he contributes to Fugue, an abstraction layer for pandas, Spark, and Dask. He also organizes the Orlando Machine Learning and Data Science Meetup.


SUMMARY

As data volumes grow, many data practitioners need to migrate existing Python or pandas code to distributed computing frameworks such as Spark and Dask. In this tutorial, we discuss the available solutions and how each of them behaves. Frameworks that mimic pandas, such as Modin (for Dask) and Koalas (for Spark), promise a drop-in replacement for pandas.

Fugue, on the other hand, deliberately does not replicate the pandas interface. Instead, Fugue users write minimal additional code to port their existing Python and pandas code. To understand the tradeoffs between these approaches, we will cover the underlying distributed computing concepts. Attendees will deepen their understanding of distributed computing and be able to weigh the pros and cons of each option.
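The core idea behind porting existing code rather than rewriting it, namely writing ordinary single-machine logic and letting a separate layer decide how to partition and execute it, can be sketched in plain Python. This is a conceptual sketch only and not Fugue's actual API: the `run_partitioned` helper, its `partition_by` and `parallel` parameters, and the `add_mean_price` function are all hypothetical, and a thread pool stands in for the workers of a Spark or Dask cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


def add_mean_price(rows: List[Row]) -> List[Row]:
    """Hypothetical single-machine logic with no knowledge of distribution:
    adds the average price of the rows it receives to every row."""
    mean = sum(r["price"] for r in rows) / len(rows)
    return [{**r, "mean_price": mean} for r in rows]


def run_partitioned(
    rows: Iterable[Row],
    fn: Callable[[List[Row]], List[Row]],
    partition_by: str,
    parallel: bool = False,
) -> List[Row]:
    """Hypothetical execution layer: group rows by a key, apply fn to each
    partition independently, then concatenate the per-partition results."""
    partitions: Dict[Any, List[Row]] = defaultdict(list)
    for row in rows:
        partitions[row[partition_by]].append(row)

    groups = list(partitions.values())
    if parallel:
        # Threads stand in for cluster workers; each sees only its own partition.
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(fn, groups))  # map preserves input order
    else:
        results = [fn(group) for group in groups]
    return [row for part in results for row in part]


data = [
    {"store": "A", "price": 10.0},
    {"store": "A", "price": 20.0},
    {"store": "B", "price": 5.0},
]

local = run_partitioned(data, add_mean_price, partition_by="store")
distributed = run_partitioned(data, add_mean_price, partition_by="store", parallel=True)
print(local == distributed)  # True
```

The business logic stays unchanged; only the execution layer differs. Note that this works because `add_mean_price` needs only the rows in its own partition. A function that depended on a global statistic (for example, the mean price across all stores) would silently give different answers once the data is split, which is exactly the kind of distributed computing concept that matters when evaluating these options.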
