MLOps Community

Different Ways to Scale Python & Pandas

Posted Nov 05, 2022 | Views 406
# Python
# Pandas
# Fugue
# Prefect.io
speaker
Kevin Kho
Open Source Community Engineer @ Prefect

Kevin Kho is an Open Source Community Engineer at Prefect, an open-source workflow orchestration system. Previously, he worked for four years as a data scientist in the energy and HR sectors. Outside of work, he contributes to Fugue, an abstraction layer for pandas, Spark, and Dask. He also organizes the Orlando Machine Learning and Data Science Meetup.

SUMMARY

As data volumes grow, many data practitioners need to migrate existing Python or pandas code to distributed computing frameworks such as Spark and Dask. In this tutorial, we discuss the possible solutions and their specific behaviors. Frameworks with a pandas-like API, such as Modin (for Dask) and Koalas (for Spark), promise a drop-in replacement for pandas.

Fugue, on the other hand, deliberately deviates from the pandas interface. Instead, Fugue users write minimal additional code to port existing Python and pandas code. To weigh the tradeoffs of these approaches, we will cover the underlying distributed computing concepts. Attendees will deepen their understanding of distributed computing and learn the pros and cons to consider when evaluating these options.
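
As a rough illustration of the idea of porting existing pandas logic with little extra code, the sketch below keeps the business logic in a plain pandas function and hands it to a separate helper that decides how the data is split. The helper `apply_per_partition`, the function `add_zscore`, and the sample columns are all hypothetical names made up for this example. The helper is only a local, single-machine stand-in for what a distributed engine would do, and it is not Fugue's API.

```python
import pandas as pd


def add_zscore(df: pd.DataFrame) -> pd.DataFrame:
    """Plain pandas logic: standardize `value` within whatever rows it receives."""
    out = df.copy()
    std = out["value"].std(ddof=0)
    # Guard against a constant partition, where the standard deviation is zero.
    out["zscore"] = 0.0 if std == 0 else (out["value"] - out["value"].mean()) / std
    return out


def apply_per_partition(df: pd.DataFrame, fn, by: str) -> pd.DataFrame:
    """Hypothetical local stand-in for a distributed engine.

    Splits the data by key, runs `fn` on each piece independently,
    and concatenates the results back in the original row order.
    """
    parts = [fn(part) for _, part in df.groupby(by, sort=False)]
    return pd.concat(parts).sort_index()


# Made-up sample data for illustration only.
data = pd.DataFrame(
    {"store": ["a", "a", "b", "b"], "value": [1.0, 3.0, 10.0, 30.0]}
)
result = apply_per_partition(data, add_zscore, by="store")
print(result)
```

Because `add_zscore` only ever sees one partition at a time, the same function could in principle run on each partition in parallel on a cluster. That separation between per-partition logic and the execution engine is the tradeoff the talk contrasts with drop-in pandas replacements.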

