MLOps Community

Different Ways to Scale Python & Pandas

Posted Nov 05, 2022 | Views 168
# Python
# Pandas
# Fugue
SPEAKER
Kevin Kho
Open Source Community Engineer @ Prefect

Kevin Kho is an Open Source Community Engineer at Prefect, an open-source workflow orchestration system. Previously, he spent four years as a data scientist in the energy and HR spaces. Outside of work, he is a contributor to Fugue, an abstraction layer for Pandas, Spark, and Dask. He also organizes the Orlando Machine Learning and Data Science Meetup.

SUMMARY

With data volumes increasing, many data practitioners need to migrate existing Python or pandas code to distributed computing frameworks such as Spark and Dask. In this tutorial, we discuss the possible solutions and their specific behaviors. Pandas-like frameworks such as Modin (for Dask) and Koalas (for Spark) promise a drop-in replacement for pandas.

Fugue, on the other hand, deviates from the pandas interface. Fugue users instead write minimal additional code to port existing Python and pandas code. To weigh the tradeoffs of these approaches, we will cover the underlying distributed computing concepts. Attendees will deepen their understanding of distributed computing and learn the pros and cons to consider when evaluating these options.
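As a rough illustration of the distributed computing idea behind these options, the sketch below uses only pandas to imitate partitioned execution: the business logic is written once as an ordinary pandas function and then applied chunk by chunk. The helper `map_partitions`, the function `add_total`, and the sample columns are hypothetical names made up for this example. This is not the API of Fugue, Modin, or Koalas, only an assumption-laden picture of the general pattern.

```python
import pandas as pd


def add_total(df: pd.DataFrame) -> pd.DataFrame:
    """Plain pandas logic with no knowledge of any distributed engine."""
    return df.assign(total=df["price"] * df["quantity"])


def map_partitions(df: pd.DataFrame, fn, n_partitions: int) -> pd.DataFrame:
    """Hypothetical helper: split rows into chunks, apply fn to each, recombine."""
    if df.empty:
        return fn(df)
    size = max(1, -(-len(df) // n_partitions))  # ceiling division
    chunks = [df.iloc[i:i + size] for i in range(0, len(df), size)]
    return pd.concat([fn(chunk) for chunk in chunks])


# Made-up sample data for the illustration.
orders = pd.DataFrame({"price": [2.0, 3.0, 5.0, 1.5], "quantity": [1, 4, 2, 10]})

local_result = add_total(orders)
partitioned_result = map_partitions(orders, add_total, n_partitions=2)
```

Row-wise logic like `add_total` gives the same answer whether it runs on the whole table or on each partition. Operations that need a global view of the data, such as a median or a sort, do not split this cleanly, which is one place where the behavior of a drop-in replacement can differ from local pandas.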

