To be successful with machine learning, you need to do more than just monitor your models at prediction time. You also need to monitor your features and prevent a “garbage in, garbage out” situation. However, it’s extremely hard to detect problems with the data being served to your models. This is especially true for real-time production ML applications like recommender systems or fraud detection systems. In this post, we’ll explore what feature monitoring for real-time machine learning entails and the common obstacles you will face. (Stay tuned for Part 2 where we will dive into how Tecton can help you solve some of these challenges.)
In machine learning, a feature is an input signal to a predictive model. Typically, a feature is a transformation on raw data. While it is important to monitor the raw data that is used to create features, it is even more critical to monitor the feature values after they have been transformed, as this is the data that the model will actually use.
Raw event data is transformed into features. To monitor features, you want to be able to observe and track the feature values post-transformation.
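As a concrete illustration, here is a minimal sketch (using pandas, with made-up event data and a hypothetical feature name, `user_txn_count_30m`) of raw events being transformed into the feature values a model would actually consume:

```python
import pandas as pd

# Raw transaction events as they might land in a warehouse.
events = pd.DataFrame({
    "user_id": [1, 1, 2, 1],
    "amount": [9.99, 55.00, 12.50, 3.25],
    "timestamp": pd.to_datetime([
        "2023-01-01 09:58", "2023-01-01 10:05",
        "2023-01-01 10:10", "2023-01-01 10:20",
    ]),
})

# A feature: each user's transaction count over a trailing 30 minutes.
features = (
    events.set_index("timestamp")
          .groupby("user_id")["amount"]
          .rolling("30min")
          .count()
          .rename("user_txn_count_30m")
          .reset_index()
)
print(features)  # these post-transformation values are what the model sees
```

It is these computed values, not the raw `events` table, that feature monitoring needs to observe.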
Feature monitoring can be grouped into two classes: monitoring features at the value/row level, and monitoring aggregations of feature values over many rows, referred to as metrics or statistics.
The following are examples of monitoring that can be performed at the value or row level for machine learning features:

- Checking for missing or null feature values
- Validating that values have the expected type and schema
- Checking that values fall within an expected range or set of categories
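A minimal sketch of what such row-level checks might look like in Python (the feature name and the expected range are assumptions made for illustration):

```python
# Row-level checks applied to a single feature vector, e.g., at serving time.
def validate_row(feature_row: dict) -> list[str]:
    problems = []
    value = feature_row.get("user_txn_count_30m")
    if value is None:
        problems.append("missing value")
    elif not isinstance(value, (int, float)):
        problems.append(f"wrong type: {type(value).__name__}")
    elif not (0 <= value <= 10_000):
        problems.append(f"out of expected range: {value}")
    return problems

print(validate_row({"user_txn_count_30m": -3}))  # ['out of expected range: -3']
```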
The following are examples of monitoring that require aggregations, by looking at many feature values over time:

- Distribution statistics such as mean, standard deviation, and quantiles
- The rate of null or default values over time
- Row counts and data freshness
- Shifts in a feature's distribution relative to the training data
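As a sketch, such metrics might be computed over a window of recent serving traffic (again using pandas and the hypothetical feature from earlier):

```python
import pandas as pd

# A window of recent feature values (e.g., the last hour of serving traffic).
window = pd.Series([3.0, 5.0, None, 4.0, 250.0, 6.0],
                   name="user_txn_count_30m")

# Aggregate metrics computed over many rows rather than one.
metrics = {
    "count": int(window.size),
    "null_rate": float(window.isna().mean()),
    "mean": float(window.mean()),
    "p99": float(window.quantile(0.99)),
}
print(metrics)  # compare against the same metrics from the training window
```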
In addition to monitoring the proper functioning of your production infrastructure, it is crucial to regularly assess your data for potential problems, especially when business decisions are being driven by models.
With platforms like Snowflake and BigQuery, machine learning (ML) teams naturally want to start feature development based on data that already exists within the organization. This data is often produced by the organization’s analysts and business intelligence (BI) teams and can provide valuable insights for ML, especially during development.
However, these upstream teams often have their own goals and aren’t focused on creating reliable data for production ML, leading to problems such as:

- Schemas and table definitions change without notice, silently breaking feature pipelines
- Data lands late or is backfilled, so features are computed on stale or incomplete inputs
- Data quality expectations are designed for dashboards and reports, not for low-latency production models
Computing reliable feature metrics for real-time machine learning is complex because data must be collected at multiple points within the feature computation pipeline.
In order to accurately monitor features, it is necessary to define and track feature metrics in at least four different areas:

- At the raw data sources that feed feature pipelines
- Within the batch and streaming jobs that transform raw data into features
- In the offline store, where historical feature values are used for training
- In the online store and feature server, where features are served at prediction time
Each of these areas presents its own challenges and considerations, such as the need to:

- Compute the same metric consistently across different engines and environments
- Keep the overhead of metric collection low enough for real-time serving
- Distinguish event time from processing time when computing metrics
- Persist metrics so they can be compared over time, and between training and serving
Furthermore, defining and applying validation rules in these four contexts can also be challenging. Rules must be able to handle different data sources and execution contexts, such as a Spark-based batch job or a Go-based feature server. They must be performant, since real-time ML systems require low latency to function effectively. And they must be able to take action when a metric value spikes, such as logging an incident or sending alerts.
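As a sketch of the idea (this is not Tecton's or any particular library's API; every name here is hypothetical), such a rule might be defined once and then evaluated wherever the metric happens to be computed:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class MetricRule:
    """A validation rule that any engine (a Spark job, a feature
    server, a backfill) could evaluate against a computed metric."""
    metric: str                          # e.g., "null_rate"
    feature: str                         # e.g., "user_txn_count_30m"
    predicate: Callable[[float], bool]   # returns True if the value is healthy
    on_violation: Callable[[str], None]  # action to take when the rule fires

    def evaluate(self, value: float) -> None:
        if not self.predicate(value):
            self.on_violation(f"{self.metric}({self.feature}) = {value:.4f}")

# Example: alert if more than 2% of served values are null.
rule = MetricRule(
    metric="null_rate",
    feature="user_txn_count_30m",
    predicate=lambda v: v <= 0.02,
    on_violation=lambda msg: print(f"ALERT: {msg}"),  # swap in paging/logging
)
rule.evaluate(0.05)  # -> ALERT: null_rate(user_txn_count_30m) = 0.0500
```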
Operational monitoring solutions are not suitable for monitoring feature data in real-time machine learning. These tools, such as Prometheus, are designed for monitoring infrastructure and do not work well with the short-lived, finite jobs used to process batch data. Additionally, these tools are purpose-built to monitor systems at processing time and do not allow control over event time, which is necessary for computing and persisting metrics on historical features.
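To make the event-time point concrete, here is a small pandas sketch (the log data and feature name are made up) that keys a null-rate metric by event time, so that late-arriving records are counted in the hour they belong to rather than the hour they were processed:

```python
import pandas as pd

# Hypothetical feature log: one row per served feature value, with the
# time the underlying event occurred (event_time) and the time our
# pipeline processed it (processing_time).
log = pd.DataFrame({
    "event_time": pd.to_datetime([
        "2023-01-01 09:05", "2023-01-01 09:40", "2023-01-01 10:15",
    ]),
    "processing_time": pd.to_datetime([
        "2023-01-01 09:06", "2023-01-01 11:02", "2023-01-01 10:16",
    ]),
    "user_txn_count_30m": [3.0, None, 7.0],
})

# Keying the null-rate metric by *event* time puts the late-arriving
# second row in the 09:00 bucket, where it belongs; a processing-time
# system like Prometheus would count it in the 11:00 bucket instead.
null_rate = (
    log.set_index("event_time")["user_txn_count_30m"]
       .isna()
       .resample("1h")
       .mean()
)
print(null_rate)
```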
Other tools like Great Expectations are optimized for data science or notebook-based use cases and are not designed to run in a production setting. As a result, they add unacceptable latency to real-time ML systems and are cumbersome or impossible to integrate into a production stack.
Features can experience drift, both over time and in the form of skew between training and serving. This can lead to inaccurate predictions and suboptimal performance. Detecting drift is difficult because surface-level metrics and statistics may not reveal the underlying changes in the data distribution. Furthermore, real-world events or shifts in data patterns can trigger false alarms, leading to unnecessary investigations.
Addressing drift is a complex and time-consuming process. Visualizations and statistical algorithms can help identify the factors and data points that are contributing to the shift in feature distribution. Furthermore, it’s often necessary to implement advanced algorithms, such as concept drift or feature-level drift detection, to detect drift reliably.
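One widely used statistic for feature-level drift is the Population Stability Index (PSI), offered here as a sketch of the kind of algorithm involved rather than a prescribed solution (the threshold in the final comment is a common rule of thumb, not a universal constant):

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """Measure distribution shift between a reference sample
    (e.g., training data) and a live sample of a feature."""
    # Bin edges from the reference distribution's quantiles
    edges = np.unique(np.quantile(expected, np.linspace(0, 1, bins + 1)))
    edges[0], edges[-1] = -np.inf, np.inf  # catch values outside the range

    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)

    # Floor the proportions to avoid log(0) and division by zero
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)

    return float(np.sum((actual_pct - expected_pct)
                        * np.log(actual_pct / expected_pct)))

rng = np.random.default_rng(0)
train = rng.normal(loc=0.0, scale=1.0, size=10_000)  # training distribution
serve = rng.normal(loc=0.5, scale=1.0, size=10_000)  # shifted serving data
psi = population_stability_index(train, serve)
print(f"PSI = {psi:.3f}")  # > 0.25 is a common rule of thumb for major shift
```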
Feature monitoring for real-time machine learning is a crucial and exciting challenge—and we are investing heavily in solving it at Tecton. Stay tuned for the next post in this series, where I’ll write about how you can overcome the common challenges outlined in this post when monitoring features for real-time ML.
If you are interested in learning more about how Tecton is tackling this important problem, please feel free to reach out to us at [email protected]! Or if you’d like to learn more about Tecton’s capabilities, check out the full Tecton demo and Q&A.