When PyTorch meets MLflow
Written by Artem, Dimi, Laszlo and Paulo
June 25, 2021
Premise
This article is part of the Engineering Labs series, a collection of write-ups in which each team reviews its work on the initiative. This one covers the Yelp Review Classification solution built by Team 3.
If you are interested in learning more about the initiative and how to join, you can find all the information you need here.
HAPPY READING!
Introduction
The MLOps Community is an open, free and transparent place for MLOps practitioners to collaborate on experiences and best practices around MLOps (DevOps for ML). The Engineering Labs Initiative is an educational project whose first lab had the goal of creating an MLOps example combining PyTorch and MLflow. We gave it a shot and were one of the two teams (out of four) to finish the project. Now we want to share our experience with you!
Team
- Artem Yushkovsky (@artemlops): MLOps Engineer @ Neu.ro
- Paulo Maia (@paulomaia20): DS @ NILG.AI
- Dimitrios Mangonakis (@dmangonakis): MLE @ Big 4
- Laszlo Sragner (@xLaszlo): Founder @ Hypergolic
Project: Summarising What We Did
The initial task definition was quite open: every team needed to develop an ML solution using PyTorch for model training and MLflow for model tracking. Between us, we had knowledge of varying depth in different areas of Machine Learning, from Data Science and the underlying maths to infrastructure and ML tooling, and from DS project management to enterprise system architecture. So the most difficult problem for us was choosing a dataset. In the end, we picked the Yelp Review dataset and trained an NLP model to classify a given text as a positive or negative review. The data includes reviews of restaurants, museums, hospitals, etc., together with the number of stars associated with each review (0–5). We modelled this task as a binary classification problem: determining whether a review is positive (has >=3 stars) or negative (otherwise).
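To make the labelling rule concrete, here is a tiny sketch (the function name is ours, purely for illustration):

```python
# Illustrative only: map a Yelp star rating to the binary label described above.
def to_binary_label(stars: int) -> int:
    """Return 1 for a positive review (>= 3 stars), 0 otherwise."""
    return 1 if stars >= 3 else 0

assert to_binary_label(5) == 1
assert to_binary_label(2) == 0
```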
From the MLOps perspective, the project evolved through several "stages". First, we came up with a way of deploying the MLflow server on GCP and exposing it publicly. We also developed a nice Web UI where the user can write a review text, specify whether he or she considers this review to be positive or not, and then get the model's response along with statistics over all past requests. Having a Web UI talk to the model via a REST API allowed us to decouple the front end from the back end and parallelise the development. Also, in order to decouple the logic of collecting model inference statistics into a database from the inference itself, we decided to implement a Model Proxy service with database access and a Model Server exposing the model via a REST API. Thus, the Model Server could be seamlessly upgraded and replicated if necessary. For automatic model upgrades, we implemented another service called the Model Operator, which constantly polls the state of the model registry in MLflow and, if the released model version has changed, automatically re-deploys the Model Server.
Pipeline
So in the end, we managed to build a pipeline with the following properties:
- partial reproducibility: manually triggered model training pipeline running in a remote environment,
- model tracking: all model training metadata and artifacts are stored in MLflow model registry deployed in GCP and exposed to the outside world,
- model serving: horizontally scalable REST API microservice for model inference balanced by a REST API proxy microservice that stores and serves some inference metadata,
- automatic model deployment: the model server gets automatically re-deployed once the user changes the model's tag in the MLflow model registry.
Unfortunately, we didn't have time to close the model development cycle. Namely, we didn't implement:
- an immutable training environment: the training Docker image would be built once and reused everywhere,
- code versioning: we used a snapshot of the code, without involving a VCS,
- data versioning: we used a dataset snapshot,
- model lineage: only possible once code and data versioning are in place,
- GitOps: automatically re-training the model once an input has changed (code, data or parameters),
- model testing before deployment,
- model monitoring and alerts (hardware metrics, health checks, data drift detection),
- fancy ML tooling (hyperparameter tuning, model explainability tools, etc.),
- business-logic features required for production (HTTPS, authentication & authorization, etc.).
Our Engineering Labâs Solution
MLOps Architecture
Here's a bird's-eye view of the architecture we came up with:
Note: For a complete walkthrough, check out our 13-minute presentation at Pie & AI Meetup.
As you can see, in our MLOps system MLflow plays the central role, linking all other components (training, tracking, deploying) together. In the diagram above, the green rectangles represent services implemented by our team, and the orange rectangles represent third-party services. Rectangles with yellow borders depict services that are exposed publicly. The blue area indicates Google Cloud and the Kubernetes cluster deployed in it. The light grey area indicates the outside world, and the dark grey areas indicate the Streamlit Sharing hosting environment (left) and the model training environment (right). Below, we briefly discuss each component of this system.
Infrastructure
[Github]
Initially, we created a Kubernetes cluster on GCP. There, we deployed the MLflow server via publicly available Helm charts, backed by a managed PostgreSQL database as the backend store and a GCS bucket as the artifact store. The MLflow service was exposed via a LoadBalancer service that provided a public IP.
In this part, we spent some time trying to pass bucket credentials to the MLflow server, and when we finally succeeded, it turned out that the server doesn't need them: when you train a model and your Python code calls mlflow.log_model(…) to upload the model binary and its metadata, your code accesses the artifact store (the GCS bucket) directly, so it's your code that must hold the credentials, not the MLflow server.
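A minimal sketch of what this looks like on the client side (the tracking URI, experiment name, credentials path and the stand-in model are placeholders, not our exact setup):

```python
import os
import mlflow
import mlflow.pytorch
import torch

# The training code talks to the GCS artifact store directly, so the service
# account credentials must live where this code runs, not on the MLflow server.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account.json"

mlflow.set_tracking_uri("http://<mlflow-server-ip>:5000")
mlflow.set_experiment("yelp-review-classification")

model = torch.nn.Linear(64, 1)  # stand-in for the trained model

with mlflow.start_run():
    mlflow.log_param("lr", 1e-3)
    # Uploads the model binary straight to the GCS bucket configured as the
    # artifact store; only the metadata goes through the MLflow server.
    mlflow.pytorch.log_model(model, artifact_path="model")
```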
Developing the model
[Github]
Following the Torch text tutorial, we implemented a model consisting of two layers, an EmbeddingBag and a linear layer, followed by a sigmoid activation function (using PyTorch Lightning). Here, as usually happens in Data Science, we ran into some problems with PyTorch and pickle while trying to make the model-saving process smoother.
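For illustration, here is roughly what such a model looks like as a PyTorch Lightning module (hyperparameter values and names are ours, not the exact ones we used):

```python
import torch
import torch.nn as nn
import pytorch_lightning as pl

class ReviewClassifier(pl.LightningModule):
    def __init__(self, vocab_size: int = 50_000, embed_dim: int = 64, lr: float = 1e-3):
        super().__init__()
        self.save_hyperparameters()
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim)
        self.fc = nn.Linear(embed_dim, 1)
        self.loss_fn = nn.BCEWithLogitsLoss()

    def forward(self, text, offsets):
        # `text` is a 1-D tensor of token ids; `offsets` marks where each review starts.
        embedded = self.embedding(text, offsets)
        return torch.sigmoid(self.fc(embedded)).squeeze(-1)

    def training_step(self, batch, batch_idx):
        text, offsets, labels = batch
        # Train on raw logits; the sigmoid is applied only at inference time.
        logits = self.fc(self.embedding(text, offsets)).squeeze(-1)
        loss = self.loss_fn(logits, labels.float())
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams.lr)
```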
Model Training and Experiment Tracking
[Github: jupyter, training scripts]
We used MLflow for model and experiment tracking. The model artifacts are stored in the GCS bucket, while the experiment metadata (parameters, metrics, etc.) are stored in PostgreSQL. With MLflow, you can save your training artifacts and access experiment parameters via its web interface:
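Besides the web interface, the same metadata can also be retrieved programmatically through the tracking client; a rough sketch (tracking URI, experiment and metric names are placeholders):

```python
import mlflow
from mlflow.tracking import MlflowClient

mlflow.set_tracking_uri("http://<mlflow-server-ip>:5000")
client = MlflowClient()

# Look up the experiment and list its best runs by a (hypothetical) metric.
experiment = client.get_experiment_by_name("yelp-review-classification")
runs = client.search_runs(
    experiment_ids=[experiment.experiment_id],
    order_by=["metrics.val_accuracy DESC"],
    max_results=5,
)
for run in runs:
    print(run.info.run_id, run.data.params, run.data.metrics)
```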
Model Serving
Following a common deployment pattern, we decided to deploy the model as a REST API endpoint. Indeed, a standalone deployment can easily be scaled up horizontally under high demand, and even scaled down to zero to save GPU resources.
So the first thing we tried was MLflow Model Serving, which, unfortunately, we failed to get working (we found the documentation rather vague and difficult to understand, and we discovered only one example on the Internet). We also considered Seldon for this task, but we found that the initial setup, which involves configuring a service mesh, was too complicated for our POC, so we decided to implement our own REST service; at the end of the day, it's not that difficult.
This service is based on FastAPI (basically, a Flask or Django alternative for REST APIs with many cool perks, such as enforced REST principles and a Swagger UI out of the box). The service loads the pickled PyTorch model from the GCS bucket and serves it via a simple REST API. It runs in Kubernetes as a single-replica deployment, with a Service providing load balancing on a static internal IP. The deployment has init containers that load the code from Git and the model artifacts from MLflow via the mlflow CLI.
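A stripped-down sketch of such a service (the model URI, request schema and the naive tokenizer are placeholders for illustration, not our exact implementation):

```python
import mlflow.pytorch
import torch
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load the model registered in MLflow (URI is a placeholder); in our setup the
# artifacts were actually fetched by an init container instead.
model = mlflow.pytorch.load_model("models:/yelp-review-classifier/Production")
model.eval()

class ReviewRequest(BaseModel):
    text: str

def tokenize(text: str):
    # Placeholder tokenizer: a real service reuses the training vocabulary.
    ids = torch.tensor([hash(tok) % 50_000 for tok in text.lower().split()], dtype=torch.long)
    offsets = torch.tensor([0], dtype=torch.long)
    return ids, offsets

@app.post("/predict")
def predict(request: ReviewRequest):
    token_ids, offsets = tokenize(request.text)
    with torch.no_grad():
        score = model(token_ids, offsets).item()
    return {"positive": score >= 0.5, "score": score}
```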
This model server is backed by a model proxy, which implements some business logic, such as storing prediction results in a PostgreSQL database and calculating the model's correctness rate:
Though this service does not implement any kind of authentication, and though its statistics calculation is rather simplistic (we would also consider a distributed logging system based on the ELK stack a better solution for adding business-level metadata to the model server), it serves the demo purpose well.
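For illustration, the proxy's logic boils down to something like this sketch (connection settings, table schema and URLs are assumptions of ours):

```python
import psycopg2
import requests
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
MODEL_SERVER_URL = "http://model-server:8000/predict"  # internal service name (placeholder)
conn = psycopg2.connect(host="postgres", dbname="mlops", user="mlops", password="changeme")

class Review(BaseModel):
    text: str
    user_says_positive: bool

@app.post("/predict")
def predict(review: Review):
    # Forward the request to the model server and persist both labels.
    prediction = requests.post(MODEL_SERVER_URL, json={"text": review.text}, timeout=10).json()
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO predictions (user_label, model_label) VALUES (%s, %s)",
            (review.user_says_positive, prediction["positive"]),
        )
    return prediction

@app.get("/stats")
def stats():
    # Share of past requests where the model agreed with the user's own label.
    with conn, conn.cursor() as cur:
        cur.execute("SELECT AVG((user_label = model_label)::int) FROM predictions")
        (rate,) = cur.fetchone()
    return {"correctness_rate": float(rate) if rate is not None else None}
```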
Model Operator
[Github]
The idea was simple: we wanted to synchronise the state of the Model Server deployed in Kubernetes with the state of the MLflow Model Registry. In other words, we wanted a new model to be rolled out once the user moves a model to the "Production" stage. Meet the Model Operator service! It follows the Kubernetes Operator pattern and constantly polls the MLflow server to see which model version is in the Production stage. Once that changes, it updates the deployment in Kubernetes.
For example, say we want to deploy the recently trained model version 10. In the MLflow UI, we change its Stage to Production:
Within a few seconds, the Model Operator notices the change and modifies the Model Server's Kubernetes deployment resource, and within a few minutes the Model Server gets re-deployed.
The current solution has a big drawback: it takes the served model down for a few minutes during the redeployment process. This could be solved by rolling out the deployment instead of recreating it, but we faced a problem where sometimes the deployment wasn't actually recreated, so for this POC project we decided to delete and create it explicitly. A Kubernetes rolling update would have worked better.
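For illustration, here is a simplified sketch of such a polling loop (model, deployment and namespace names are placeholders). Instead of deleting and recreating the deployment, it patches a pod-template annotation, which triggers the rolling update we would prefer in hindsight:

```python
import time
from kubernetes import client, config
from mlflow.tracking import MlflowClient

MODEL_NAME = "yelp-review-classifier"   # placeholder registry name
DEPLOYMENT, NAMESPACE = "model-server", "default"

config.load_incluster_config()          # the operator runs inside the cluster
apps = client.AppsV1Api()
mlflow_client = MlflowClient(tracking_uri="http://mlflow:5000")

current_version = None
while True:
    # Ask MLflow which model version is currently in the Production stage.
    versions = mlflow_client.get_latest_versions(MODEL_NAME, stages=["Production"])
    if versions and versions[0].version != current_version:
        current_version = versions[0].version
        # Changing a pod-template annotation makes Kubernetes roll out new pods.
        patch = {"spec": {"template": {"metadata": {
            "annotations": {"model-version": str(current_version)}}}}}
        apps.patch_namespaced_deployment(DEPLOYMENT, NAMESPACE, patch)
    time.sleep(10)
```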
Web UI
[Github]
The main user-facing entry point is a small Web UI that talks to the model proxy only. The user writes a review text and specifies whether he or she thinks this is a positive or a negative review. The Web UI then makes an inference request to the Model Proxy and also asks it for overall statistics on the model's correctness.
This Web UI is written with the cool tool Streamlit in about 50 lines of Python code! It is deployed on the Streamlit Sharing hosting service and works pretty well.
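The whole UI flow fits in a sketch like this (the proxy URL and JSON fields mirror the proxy sketch above and are assumptions):

```python
import requests
import streamlit as st

PROXY_URL = "http://<model-proxy-ip>"  # placeholder

st.title("Yelp Review Classifier")
text = st.text_area("Write your review")
user_label = st.radio("Do you consider this review positive?", ["Yes", "No"])

if st.button("Classify") and text:
    # Ask the Model Proxy for a prediction, then for the aggregate statistics.
    response = requests.post(
        f"{PROXY_URL}/predict",
        json={"text": text, "user_says_positive": user_label == "Yes"},
    ).json()
    st.write("Model says:", "positive" if response["positive"] else "negative")

    stats = requests.get(f"{PROXY_URL}/stats").json()
    if stats["correctness_rate"] is not None:
        st.write(f'So far the model agreed with users {stats["correctness_rate"]:.0%} of the time')
```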
Final Considerations
Participating in MLOps Engineering Lab 1 gave us an awesome opportunity to form a team with people from all over the world, all with different backgrounds and experience, and to work together on a new project! Below, we list the things we learned while working on the project.
Technical takeaways
Though PyTorch (and PyTorch Lightning) is great and has tons of tutorials and examples, pickle for Deep Learning is still a pain. You need to dance around it for a while to save and load the model. We hope that the world will eventually arrive at a standardised solution with an easy UX for this process.
MLflow is an awesome tool for tracking your model development progress and storing model artifacts.
- It can be easily deployed in Kubernetes and has a nice minimalistic and intuitive interface.
- We couldn't find any good solution for authentication and role-based access control, but this was outside the project's scope.
- We also found MLflow Model Serving too difficult to run in a few hours, mostly because of the lack of clear documentation.
- In addition, we were surprised that we couldn't find an existing solution for automatically deploying the model that gets the "Production" stage in the MLflow UI. Is this a viable pattern, deploying models directly from the MLflow dashboard? Could such a microservice be a good addition to MLflow's core functionality?
Kubernetes is amazing! It's terrifying at first, but terrific after a while. It lets you deploy, scale, connect and persist your apps easily, in a very clear and transparent way. However, we found it difficult to parametrize bare Kubernetes resource definitions (without using Helm charts). We needed to pass one or a few parameters into the YAML definition before applying it, and here are the ways we know to tackle this problem:
- Pack the set of k8s configuration files into a Helm chart (or use an alternative to Helm, like kubegen). This is the Jedi way to manage complex deployments, as it gives you full flexibility, but it takes time to implement.
- Use the k8s ConfigMap resource to configure other resources. This approach is very easy to implement (just add one more resource configuration), but it is not flexible enough (for example, you can't parametrize the container's image). Still, we used it for parametrizing the Model Server configuration.
- Another, the "dirtiest" way to solve this problem is to use the envsubst utility. Briefly, you process your configuration YAML with a tool that syntactically replaces all occurrences of the specified environment variables with their actual values (see the example for the Model Operator); any other sed-like tool would work here as well. A minimal Python equivalent is sketched below.
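As an illustration, a Python stand-in for the same envsubst trick might look like this (the file name and variable are made up):

```python
import os
import subprocess

# Expand $VARIABLES in a raw Kubernetes manifest, then pipe it to kubectl,
# mimicking `envsubst < manifest.yaml | kubectl apply -f -`.
os.environ.setdefault("MODEL_SERVER_IMAGE", "gcr.io/my-project/model-server:latest")

with open("model-server-deployment.yaml") as f:
    manifest = os.path.expandvars(f.read())

subprocess.run(["kubectl", "apply", "-f", "-"], input=manifest.encode(), check=True)
```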
Self-management takeaways
Looking back, we can say that our team suffered from a lack of communication: we started discussing the system design without having a single call to meet each other and understand each other's feedback and wishes, and we didn't define a clear MVP or develop a common understanding of the final goal. Nevertheless, we learned many important lessons about collaboration and project planning, namely:
- Do not try to over-plan the project from the beginning (each step in the initial project plan should cover a large area of responsibility rather than be overly specific),
- Use an iterative approach (define a clear MVP and the steps to achieve it, and then distribute tasks among the team members),
- Respect the project's timeline (avoid situations where you have to write code the night before the deadline). This is especially hard for teams working in their free time, after work!
Ciao
We would like to thank the MLOps community for the awesome atmosphere and cool insights every day! In particular, we would like to thank the organisers of the MLOps Engineering Labs initiative (Ivan Nardini and Demetrios Brinkmann) for this cool opportunity to work together!
We're looking forward to joining the second round of Labs and applying the knowledge we acquired during the first one. Thanks to all, and see you again!