MLOps Community

Building for Small Data Science Teams

Posted Dec 19
SPEAKER
James Lamb
Sr. Machine Learning Engineer II @ SpotHero

James Lamb is a machine learning engineer at SpotHero, a Chicago-based parking marketplace company. He is a maintainer of LightGBM, a popular machine learning framework from Microsoft Research, and has made many contributions to other open-source data science projects, including XGBoost and Prefect. Prior to joining SpotHero, he worked on a managed Dask + Jupyter + Prefect service at Saturn Cloud and as an Industrial IoT Data Scientist at AWS and Uptake. Outside of work, he enjoys going to hip hop shows, watching the Celtics / Red Sox, and watching reality TV (he wouldn’t object to being called “Bravo Trash”).

SUMMARY

In this conversation, James shares some hard-won lessons on how to effectively use technology to create applications powered by machine learning models. James also talks about how making the "right" architecture decisions is as much about org structure and hiring plans as it is about technological features.

TRANSCRIPT

Quotes

"A split is a combination of a feature and a threshold that divides the feature's values into two different buckets."

"Each time you choose a split, you have to evaluate all of these different combinations of features and thresholds."

"The big innovation in LightGBM was to take continuous features and bucket them into histograms."

"LightGBM can be a very intimidating project to work on."

"Right now, the data science team has so much opportunity to work on problems, and they're limited by their time and by the physical resources they have access to for trying experiments."

"What we're really focusing on for the next year with our ML platform is enabling experimentation."

"I wanted to be careful about us getting hung up on trying to define how mature our practice is. I feel that exercise can lead to misunderstanding, where people feel like they are being criticized."

"SpotHero is a marketplace. It's a consumer-facing company. We make money by people booking parking spots, and we get a little cut of those transactions."

"People are not necessarily using SpotHero because machine learning is a core part of the product. We are being judicious about where we apply machine learning to make some of our applications better."

"I don't think that there's pressure right now for us to have every button that you click in the SpotHero app be powered somewhere down the chain by a machine learning model."

"We're not trying to fly through building the tunnel as fast as we can, we're trying to build it in a way that the parts behind us don't fall down as we get further along."

"Airflow, in its current form, is not ideal for experiment orchestration. Airflow's not made for that. It's made for batch processing."

"I think there can be value in companies writing the sort of middleware that lets data scientists work with the company's data using the domains that the company's applications manage."

"With a small enough team, there's a limit to how much code can be written that someone else would have to maintain if its author left."

"Another way that we can build empathy and work together is in the way that we've designed the definition of our teams."

"We're an enablement team. Our job is to build self-service technologies whether that's libraries, services, or infrastructure."
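The quotes above about splits and histograms describe how LightGBM searches for split points. The following is a minimal sketch of that idea in Python with NumPy, not LightGBM's actual C++ implementation. It assumes a single feature, squared-error loss, and quantile-based bins (a simplification of LightGBM's real binning rules). The function names, `n_bins=16`, and `lam=1.0` are illustrative choices, and the gain omits the usual 1/2 factor and complexity penalty. The exact search scores every threshold between distinct sorted values; the histogram search first buckets the feature and then scores only the bin boundaries.

```python
import numpy as np


def _leaf_score(g, h, lam):
    # Second-order leaf score used in gradient boosting: G^2 / (H + lambda)
    return g ** 2 / (h + lam)


def exact_best_split(x, grad, hess, lam=1.0):
    """Score every threshold between consecutive distinct sorted values."""
    order = np.argsort(x)
    xs, g, h = x[order], grad[order], hess[order]
    G, H = g.sum(), h.sum()
    gl, hl = np.cumsum(g)[:-1], np.cumsum(h)[:-1]  # left-side sums per cut
    gains = (_leaf_score(gl, hl, lam) + _leaf_score(G - gl, H - hl, lam)
             - _leaf_score(G, H, lam))
    valid = xs[:-1] < xs[1:]  # cannot cut between tied values
    gains = np.where(valid, gains, -np.inf)
    best = int(np.argmax(gains))
    threshold = (xs[best] + xs[best + 1]) / 2  # rows with x < threshold go left
    return threshold, gains[best], int(valid.sum())


def histogram_best_split(x, grad, hess, n_bins=16, lam=1.0):
    """Bucket x into bins first, then score only the bin boundaries."""
    # Quantile bin edges (illustrative; not LightGBM's exact binning rules)
    edges = np.unique(np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1]))
    bins = np.searchsorted(edges, x, side="right")  # bin ids 0..len(edges)
    g_hist = np.bincount(bins, weights=grad, minlength=len(edges) + 1)
    h_hist = np.bincount(bins, weights=hess, minlength=len(edges) + 1)
    G, H = g_hist.sum(), h_hist.sum()
    gl, hl = np.cumsum(g_hist)[:-1], np.cumsum(h_hist)[:-1]
    gains = (_leaf_score(gl, hl, lam) + _leaf_score(G - gl, H - hl, lam)
             - _leaf_score(G, H, lam))
    best = int(np.argmax(gains))
    return edges[best], gains[best], len(edges)  # rows with x < edge go left


# Synthetic data with a step at x = 6
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=10_000)
y = (x > 6.0).astype(float) + rng.normal(0, 0.1, size=x.size)
grad = -y  # squared-error gradient at a constant prediction of 0
hess = np.ones_like(y)

t_exact, _, n_exact = exact_best_split(x, grad, hess)
t_hist, _, n_hist = histogram_best_split(x, grad, hess)
print(f"exact: threshold={t_exact:.3f} after {n_exact} candidates")
print(f"histogram: threshold={t_hist:.3f} after {n_hist} candidates")
```

On data like this, both searches should land near the true cut at 6, but the histogram version scores at most 15 candidate thresholds instead of roughly 10,000. That reduction in work, at the cost of some precision in where the threshold lands, is why bucketing continuous features into histograms makes each split much cheaper to evaluate.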
