MLOps Community

Building for Small Data Science Teams

Posted Dec 19, 2021 | Views 841
# Spothero.com
# SpotHero
# ML
speakers
James Lamb
Sr. Machine Learning Engineer II @ SpotHero

James Lamb is a machine learning engineer at SpotHero, a Chicago-based parking marketplace company. He is a maintainer of LightGBM, a popular machine learning framework from Microsoft Research, and has made many contributions to other open-source data science projects, including XGBoost and Prefect. Prior to joining SpotHero, he worked on a managed Dask + Jupyter + Prefect service at Saturn Cloud and as an Industrial IoT Data Scientist at AWS and Uptake. Outside of work, he enjoys going to hip hop shows, watching the Celtics / Red Sox, and watching reality TV (he wouldn’t object to being called “Bravo Trash”).

Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

At the moment, Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building LEGO houses with his daughter.

SUMMARY

In this conversation, James shares some hard-won lessons on how to effectively use technology to create applications powered by machine learning models. James also talks about how making the "right" architecture decisions is as much about org structure and hiring plans as it is about technological features.

TRANSCRIPT

Quotes

"A split is a combination of a feature and a threshold in some way to divide the feature into two different buckets."

"Each time you choose a split, you have to evaluate all of these different combinations of features and thresholds."

"The big innovation in LightGBM was to take continuous features and bucket them into histograms."

"LightGBM can be a very intimidating project to work on."

"Right now, where we're at, the data science team has so much opportunity to work on problems, and they're limited by their time and by the physical resources they have access to for trying experiments."

"What we're really focusing on for the next year with our ML platform is enabling experimentation."

"I wanted to be careful about us getting hung up on trying to define how mature our practice is. I feel that exercise can lead to maybe misunderstanding. People feel like they are being criticized."

"SpotHero is a marketplace. It's a consumer-facing company. We make money by people booking parking spots and we get a little cut out of those transactions."

"People are not necessarily using SpotHero because machine learning is a core part of the product. We are being judicious about where we apply machine learning to make some of our applications better."

"I don't think that there's pressure right now for us to have every button that you click in the SpotHero app be powered somewhere down the chain by a machine learning model."

"We're not trying to fly through building the tunnel as fast as we can; we're trying to build it in a way that the parts behind us don't fall down as we get further along."

"Airflow is not ideal in the way that it lives for experiment orchestration. Airflow's not made for that. It's made for batch processing."

"I think there can be value in companies writing that sort of middleware, which lets data scientists work with the company's data using the domains that the company's applications manage."

"With a small enough team, there's a limit to the proliferation of code that gets written that someone else has to maintain if someone left."

"Another way that we can build empathy and work together is in the way that we've designed the definition of our teams."

"We're an enablement team. Our job is to build self-service technologies whether that's libraries, services, or infrastructure."

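The quotes above about splits and histogram bucketing can be made concrete with a small sketch. This is a simplified illustration in plain Python, not LightGBM's actual implementation: the function names (`exact_candidates`, `bin_index`, `best_histogram_split`), the equal-width binning, and the squared-error scoring are all assumptions chosen for readability. It shows why bucketing helps: without bins, every gap between distinct feature values is a candidate threshold, while with bins only the bucket boundaries need to be evaluated.

```python
from typing import List, Tuple


def exact_candidates(values: List[float]) -> List[float]:
    """Without bucketing, every midpoint between adjacent distinct values
    is a candidate threshold that has to be evaluated."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def bin_index(value: float, lo: float, hi: float, n_bins: int) -> int:
    """Map a continuous value to one of n_bins equal-width buckets
    (an assumed binning scheme, chosen only for simplicity)."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)  # the maximum value falls in the last bucket


def best_histogram_split(
    values: List[float], targets: List[float], n_bins: int
) -> Tuple[int, float]:
    """Return (bin, score) for the best split 'bucket <= bin goes left'.

    The score is sum_left^2 / n_left + sum_right^2 / n_right; maximizing it
    is equivalent to minimizing squared error when each side predicts its mean.
    Returns (-1, -inf) if no split separates the data.
    """
    lo, hi = min(values), max(values)
    counts = [0] * n_bins
    sums = [0.0] * n_bins
    # One pass over the data builds the histogram.
    for v, t in zip(values, targets):
        b = bin_index(v, lo, hi, n_bins)
        counts[b] += 1
        sums[b] += t

    total_n, total_sum = len(values), sum(targets)
    best_bin, best_score = -1, float("-inf")
    left_n, left_sum = 0, 0.0
    # Only n_bins - 1 boundaries are evaluated, regardless of data size.
    for b in range(n_bins - 1):
        left_n += counts[b]
        left_sum += sums[b]
        right_n = total_n - left_n
        if left_n == 0 or right_n == 0:
            continue
        right_sum = total_sum - left_sum
        score = left_sum ** 2 / left_n + right_sum ** 2 / right_n
        if score > best_score:
            best_bin, best_score = b, score
    return best_bin, best_score


if __name__ == "__main__":
    feature = [0.1, 0.2, 0.3, 0.4, 5.1, 5.2, 5.3, 5.4]
    target = [0.0, 0.0, 0.0, 0.0, 10.0, 10.0, 10.0, 10.0]
    print(len(exact_candidates(feature)), "exact candidate thresholds")
    print(best_histogram_split(feature, target, n_bins=4))
```

With eight distinct feature values, the exact approach has seven candidate thresholds, while four buckets leave at most three boundaries to check. On real datasets with millions of distinct values and a fixed number of bins, that gap is what makes the histogram approach attractive.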