💡 This is a project for the MLOps Community to fully understand what different people do at their jobs. We want to find out what your day-to-day looks like.
From the most granular to the most mundane, please tell us everything! This is our chance to bring clarity around the different parts of MLOps ranging from big companies to small start-ups.
Today we shine the spotlight on Mr. James Lamb, Machine Learning Extraordinaire and LightGBM maintainer. You can find the first one of the series with Frata here.
Name: James Lamb (https://github.com/jameslamb)
Official Title: Staff Machine Learning Engineer
Company: SpotHero
Years in the game: 17 years since my first paycheck, 10 years working on computers with data, 8 years full-time
Years specifically working on ML: 6
Direct reports: 0
Luck, privilege, and geography contributed to me learning about time-series data, which set me up to get a job on an amazing data science team at an IIoT startup, which gave me the core skills and self-confidence to pursue a career in data science and software engineering.
The Whole Story
Ok it was a long journey….stay with me here.
When I went to college, I majored in marketing in a business school. I wanted to work in marketing for a record label. At the school I was at, you could earn a double major by just taking one additional class. I was very fortunate to go to a high school that offered Advanced Placement (AP) courses, so I had 3 Economics credits already. Out of pure laziness, I decided to use those 3 credits to pick up a second major in economics.
Going into senior year of college, I thought “I like school, I’m gonna do grad school. I’m a busy businessperson (or at least want to be), so guess I’ll do an MBA”. One of my advisors told me “don’t do an MBA, it’s not really for you…you don’t have enough professional experience”. She then told me that I was luckily at one of a small handful of schools that offered a terminal master’s degree in Economics. So I did that!
In that Master’s in Applied Economics, I learned enough math and statistics to be dangerous, and got experience applying the theory to real-world datasets. My thesis was a time-series forecasting project, applying some flavors of the ARCH/GARCH models popular in finance to grocery store sales. This was a super repetitive process… fit, predict, store predictions, add another week of data, re-fit, predict, store predictions… etc. I suspected that I could go a lot faster and use a lot more data if I could just write code, but I was afraid to devote the time to learning. I thought “at least with this button-clicking and Excel formulas, I know I’ll finish in time”.
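The repetitive fit / predict / store / add-a-week loop described above is often called an expanding-window evaluation. Here is a minimal Python sketch of that loop. It is only an illustration: the `naive_forecast` function is a hypothetical stand-in (a 4-week average, not an ARCH/GARCH model), and the sales numbers are made up.

```python
from statistics import mean


def naive_forecast(history):
    """Hypothetical placeholder model: average of the last 4 weeks of sales."""
    return mean(history[-4:])


def rolling_refit(sales, min_history=4):
    """Expanding-window loop: fit, predict, store, add another week, repeat."""
    predictions = []
    for end in range(min_history, len(sales)):
        history = sales[:end]  # every week observed so far
        forecast = naive_forecast(history)  # stand-in for "re-fit, then predict"
        # Store the prediction next to the value that was actually observed.
        predictions.append({"week": end, "forecast": forecast, "actual": sales[end]})
    return predictions


weekly_sales = [100, 110, 105, 120, 130, 125, 140]  # made-up example data
results = rolling_refit(weekly_sales)
for row in results:
    print(row)
```

Swapping `naive_forecast` for a real model changes nothing about the loop itself, which is the part that is tedious to repeat by hand in a spreadsheet.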
As soon as I finished that degree, I started taking online courses to learn how to code. I took every language tutorial on Codecademy (link) and a handful of R courses on edX. I think the thing that permanently put me on a different career path was the Data Science Specialization from Johns Hopkins University, on Coursera (link).
I did those online courses on nights and weekends for about a year. I started them with a narrow interest in learning how to code in R, because I wanted to do more ambitious economics work like the “Atlas of Economic Complexity” (link), “Santa Fe Artificial Stock Exchange” (link), and Raj Chetty’s various projects on economic mobility (link). But besides R, those courses introduced me to core skills that I’d need to do professional applied statistics (“data science”). In those courses, I created my first git repo, wrote my first function, built my first interactive dashboard, and trained my first tree-based model.
By 2016, I had a few years of professional experience doing data work in economics jobs, and knew enough machine learning and R to have a shot at entry-level data science jobs. Luckily for me, a startup in my hometown of Chicago got a big round of funding and started hiring data scientists with experience in time-series data (see this profile in Inc.).
I started at Uptake in July 2016 and over my 3+ years there, I worked with literally dozens of world-class engineers and data scientists. I learned SO MUCH in my time there. That job gave me my first experiences putting machine learning models into production, writing code and services intended to be used by other people, and communicating data science concepts to non-data-scientist audiences.
During that time, I also published my first open source project, uptasticsearch (co-authored with Austin Dickey and Nick Paras), was invited to join as a maintainer on a high-profile machine learning project (LightGBM), and earned a master’s degree in Data Science. I left that job with the raw skills to do machine learning engineering work, and a focus on getting better at it.
Since then, I’ve worked as a data science consultant, backend engineer, and machine learning engineer. I’ve worked in the world’s largest companies and at a startup with fewer than 15 employees. Throughout that time, I’ve loved contributing to open source projects, speaking about data science at conferences/meetups, and helping other people start their machine learning careers.
There’s a high level of trust and autonomy in the Engineering organization I’m part of…I can see every line of code in the company and provision any infrastructure I want. It’s awesome to be able to go from “hey I think we should do {X}” to working on {X} in weeks, not years. And to be clear, I don’t mean “we’re a small overworked group of engineers YOLO-ing things up into k8s and praying it works”. There’s a strong culture of code review, testing, and monitoring guiding everything we do.
As weird as it sounds, I also really love the simplicity of SpotHero’s business model. We’re a marketplace where people buy and sell parking. That’s it. Whether I’m working on data pipelines to power marketing automation, upgrading to a new major version of a backend Python library, or trying to cut 30 seconds out of a CI pipeline, I can always tie what I’m doing back to how it helps our business. That hasn’t been true at every job I’ve worked.
In my role as an open source maintainer, I love getting to learn about random technology via issues, feature requests, and the research process involved in debugging. Like did you know shared libraries (e.g. a .so) have an optional property called RPATH that can be used to tell library-loading code to look in an alternative directory for dynamically-linked libraries? (click here for more details than you could ever want) Or did you know that conda patches many popular Python libraries to make them work differently when installed via conda? If you have conda in your development environment, try looking at some of the files returned by this:
I don’t know if I would have learned and internalized all this random stuff on my own, but learning them along the way while working on a specific task like investigating a bug has been really fun.
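To make the RPATH point above a bit more concrete, here is a small Python sketch that reads that property from a shared library on Linux. It is an illustration under stated assumptions, not a command from the original post: it assumes the `readelf` tool from binutils is installed, the helper names `parse_rpath` and `rpath_of` are hypothetical, and the sample text (including `libgomp.so.1` and the paths) is made up to mimic `readelf -d` output. It also picks up `RUNPATH`, the newer related tag.

```python
import re
import subprocess

# Matches dynamic-section lines such as:
#   0x000000000000001d (RUNPATH)  Library runpath: [$ORIGIN/../lib]
_RPATH_LINE = re.compile(r"\((RPATH|RUNPATH)\)\s+Library (?:rpath|runpath): \[(.*)\]")


def parse_rpath(readelf_output):
    """Extract RPATH / RUNPATH directories from `readelf -d` text."""
    found = {"RPATH": [], "RUNPATH": []}
    for line in readelf_output.splitlines():
        match = _RPATH_LINE.search(line)
        if match:
            tag, paths = match.groups()
            # Multiple directories are separated by ":".
            found[tag].extend(p for p in paths.split(":") if p)
    return found


def rpath_of(library_path):
    """Run `readelf -d` on a shared library and parse the result (Linux only)."""
    output = subprocess.run(
        ["readelf", "-d", library_path],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return parse_rpath(output)


# Made-up sample text in the shape `readelf -d` prints.
sample = """
 0x0000000000000001 (NEEDED)             Shared library: [libgomp.so.1]
 0x000000000000001d (RUNPATH)            Library runpath: [$ORIGIN/../lib:/opt/lib]
"""
print(parse_rpath(sample))
```

On a real system, something like `rpath_of("/path/to/some_library.so")` would report the directories the dynamic loader is told to search for that library's dependencies.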
There’s a solid amount of stuff in my current company’s environment that’s like “thing someone created manually 3 years ago and everyone involved with it is gone and no one left knows whether or not we can delete it”. I love doing maintenance work (seriously!), but the investigations involved in trying to understand those old things can be draining. And it’s definitely frustrating when your focus is broken by an on-call alert related to something you’ve never heard of, which you eventually learn is old and unnecessary.
In my role as an open source maintainer, it’s frustrating to deal with people who aren’t respectful of my time. For example, I once received a “bug report” on a project I maintain that was just titled "Doesn't work on AWS" with no other information. Having to say over and over again “what version are you using? can you provide a reproducible example showing what you tried? what operating system are you on? can you share any logs?” is exhausting. I actually almost quit open source recently after a few particularly bad months dealing with unhelpful and rude people, but “Working in Public” by Nadia Eghbal (link) helped me develop a healthier, more sustainable approach to that work.
SpotHero is a parking marketplace. People who operate parking facilities (which can be as small as a single spot behind an apartment and as large as a downtown parking garage) list parking spots with us. Drivers can reserve them ahead of time on our website, mobile app, or over the phone. We deal with all the integrations work, like generating a QR code in the app that drivers can scan to make the gate at a garage open.
At SpotHero, the roles “data engineer” and “machine learning engineer” coexist on the Data Engineering team.
We own a mix of things:
*“low-code” = YAML 😛
My team also has a consulting role within the company. We answer other teams’ questions like “how should we make this application data available to analysts?” and “how should we store predictions from this model?”.
As of this writing, I’m not sure what I’ll be working on at SpotHero for the next 6 months.
In the open source world, I’m planning to focus on:
hamilton (https://github.com/stitchfix/hamilton)
As a Machine Learning Engineer at SpotHero:
Google → trying to figure out how to make software work
Python → for many things (click here to see my talk “Every Way SpotHero Uses Python”)
    numpy, pandas, psycopg2, requests
make → gluing together development commands (e.g. running tests, building images, etc.)
docker → for distributing software as self-contained environments
helm (link) → modifying Kubernetes resources
k3d (link) → running Kubernetes on my laptop to test things
asdf-vm (link) → installing and using multiple different versions of CLIs
git + GitHub → collaborating on code changes
Amazon Redshift → storing small-to-medium-sized tabular datasets and querying them with SQL
Trino (link) → reading large datasets stored in Apache Parquet files in cloud object storage
Databricks Container Services (link) → running JupyterLab in containers on arbitrary-sized Amazon EC2 instances
sops (link) → storing secrets in source control
Prometheus → collecting operational metrics + generating alerts based on those metrics
Grafana → creating plots of metrics, and dashboards of those plots
Apache Airflow → executing scheduled workloads and providing operational control over them (e.g. aggregating logs, visualizing historical runs, triggering alerts on job failures)
As an open source contributor / maintainer:
Google → trying to figure out how to make software work
CMake → compiling a C++ library with many combinations of operating system, architecture, compiler, and library features
R → I contribute to a couple of libraries written in this language
    {data.table}, {jsonlite}, {lintr}, {Matrix}, {testthat}
Python → I contribute to a couple of libraries written in this language
    dask, numpy, pandas, pytest, scipy
GitHub Actions + Appveyor + Azure DevOps → continuous integration
readthedocs → deploying and hosting documentation
docker → for distributing software as self-contained environments
As a machine learning engineer at SpotHero, my responsibilities are:
😬 If this all sounds kind of vague, see the “What do your days consist of?” section below for specific examples.
As a staff-level engineer, I have some additional responsibilities, like:
As an open source maintainer, my responsibilities are:
As a machine learning engineer at SpotHero, my time roughly breaks down as follows.
20% – recurring meetings
10% – non-recurring meetings
5% – free-use time for learning and experimenting
Once every two weeks, every engineer at SpotHero participates in something called “Discovery Day”.
This is a half day dedicated to working on whatever you want, and it doesn’t have to be SpotHero-related. I’ve used it for things like:
30% – writing
Writing is a really important part of my job, and something I put a lot of energy into. My primary value to SpotHero is my knowledge and ideas… given clear descriptions of those things, anyone could turn those ideas into software.
Some types of writing I produce in this job:
15% – reviewing others’ work
My team uses pull requests on GitHub as a way to get asynchronous feedback on code. We also use asynchronous comments on written proposals similar to architecture decision records (link) as a way to make larger design decisions.
As a result, a significant portion of my time is spent reviewing these outputs from other engineers and providing suggestions. I’d love to write a whole blog post or give a lightning talk some day about how to be effective in this important type of work…but I’ll spare you all for now.
10% – support
Support takes many forms on the Data Engineering team at SpotHero.
It includes the following activities:
10% – writing code
“Writing code” takes many forms in my current role.
Some representative examples:
… git-sync (link) to read job configuration files from a git repository
… Dockerfile to distribute that code as a container image.
My time as an open source maintainer roughly breaks down as follows.
The only metric that I ever manually review reports on or make quarterly goals about improving is cloud cost.
Otherwise, my team currently just focuses on providing capabilities and ensuring that all the things we say work continue to work.
My team basically owns two types of software:
For those systems, we capture the metrics necessary to detect problems (e.g. CPU utilization, memory usage, disk usage, lag on Kafka topics), and use those metrics to generate alerts that automatically create incidents for on-call engineers on the team. We react to those incidents and alerts as they happen, but don’t typically review the metrics proactively or try to tie them to business value.
For pipelines, the business value of having the pipelines and the requirements for them to be considered “working” are presented to my team by other teams asking us to implement and maintain them. From the moment we agree to build and maintain them, the only metrics we follow are those necessary to meet the requirements.
I’m not saying that this is an ideal state to be in, but it is where we’re at right now.
Here are the clickbait titles to real stories from my ML past. I won’t write out the full stories here, but they could be fun in a future blog post!
“… pip install pandas no longer downloads any Powerpoint files”
“… Paste as Values …”
“… % …”
“… if statement was through two layers of transpilation used to produce a custom Kafka stream-processing app and dedicated output topic for model results. And it made sense!”
In the areas of machine learning and computers, I really admire the people who take time to make complicated topics accessible to wide audiences, who are genuinely knowledgeable and talented practitioners, and who demonstrate patience and empathy in their in-person and online interactions with the people using their software.
Here’s a short list of those people that I follow:
In general, like just in life, I tend to admire people who:
Five years from now, I want to be doing less implementation work and more architecture / design work, and I’d like to be doing that on systems that include machine learning workloads.
I’m not the most talented programmer or statistician, but I do think I have the qualities required to design large systems and the interactions between different systems, like:
No one should pay me to re-write some Java services into Rust or design a large-scale experiment, but I do feel confident proposing a design for something like “how should batch re-training of machine learning models on sensitive data be performed?” or “how should the company store and provide access to container images?”.
I really like doing that work. I enjoy the challenge of breaking a large, important, ill-defined problem down into more manageable pieces and some criteria for choosing between different options.