
Model Observability

Surveys have found that ML teams spend less than 1% of their time in the production workflow, while the majority is spent in the data preparation, model development, and model deployment phases. This is largely due to the challenges teams face in diagnosing and resolving issues once models are in production.

Model Performance

The most common problems that plague all teams are model and data drift, performance degradation, and data quality issues.

Taking a model from research to production is hard. These models fail silently. The process can be painstaking and no matter how much you work to make sure your models perform well pre-production, you don't know how they're going to perform in the real world. Utilizing every available tool during production to make sure your models are performing optimally is key.

ML performance tracing is the methodology for pinpointing the source of a model performance problem and mapping it back to the underlying data issue causing that problem.

Here are the three most common reasons model performance can drop:

- One or more of the features has a data quality issue;
- One or more of the features has drifted, or is seeing unexpected values in production; or
- There are labeling issues.


What is drift?

Drift is a change in distribution over time. It can be measured for model inputs, outputs, and actuals. Drift can occur because your models have grown stale, bad data is flowing into your model, or even because of adversarial inputs.

Models are not static. They are highly dependent on the data they are trained on. Especially in hyper-growth businesses where data is constantly evolving, accounting for drift is important to ensure your models stay relevant.

Data/Feature drift

Change in the input to the model is almost inevitable, and your model can’t always handle this change gracefully. Some models are resilient to minor changes in input distributions; however, as these distributions stray far from what the model saw in training, performance on the task at hand will suffer. This kind of drift is known as feature drift or data drift.

Data drift (aka feature drift, covariate drift, and input drift) refers to a shift in the statistical properties of the independent variable(s), i.e. a distribution change associated with the inputs of a model.
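To make this concrete, one common way to quantify a distribution change between training and production is the Population Stability Index (PSI). The sketch below is a minimal numpy implementation; the bin count and the conventional 0.1/0.25 alert thresholds mentioned in the comments are rules of thumb, not fixed standards.

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline and a production sample.

    As a rule of thumb, PSI < 0.1 is often read as negligible drift and
    PSI > 0.25 as significant drift.
    """
    # Bin edges come from the baseline (training) distribution.
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor the frequencies to avoid log(0) and division by zero.
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

# Simulated feature drift: the production mean shifts by half a standard deviation.
rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 10_000)
shifted = rng.normal(0.5, 1.0, 10_000)
drift_score = psi(baseline, shifted)
```

Comparing a window of production values against the training baseline on a schedule, per feature, is the basic monitoring loop this metric supports.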

Concept drift

What would happen if the distribution of the correct answers, the actuals, changed? Even if your model is making the same predictions as yesterday, it can be making mistakes today! This drift in actuals can cause a regression in your model’s performance and is commonly referred to as concept drift or model drift.

Concept drift is the shift in the statistical properties of the target/dependent variable(s), i.e. a change in the actuals.

- It signifies a change in the relationship between current actuals and actuals from a previous time period.
- Concept drift can be:
  - A gradual change over time
  - A recurring or cyclical change
  - A sudden or abrupt change
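A simple way to watch for this, as a sketch rather than a full detector, is to track a statistic of the actuals, such as the positive-class rate, across time windows and alert when it moves beyond a threshold. The window sizes, rates, and threshold below are illustrative.

```python
import numpy as np

def actuals_shift(previous, current, threshold=0.05):
    """Flag a shift in the positive-class rate of ground-truth labels.

    previous, current : arrays of 0/1 actuals from two time windows.
    Returns (absolute shift, alert flag). A large shift in the actuals can
    signal concept drift even when the model's predictions look stable.
    """
    shift = abs(float(np.mean(current)) - float(np.mean(previous)))
    return shift, shift > threshold

# A sudden change: the true fraud rate jumps from ~2% to ~10% between windows.
rng = np.random.default_rng(1)
last_week = (rng.random(5_000) < 0.02).astype(int)
this_week = (rng.random(5_000) < 0.10).astype(int)
shift, alert = actuals_shift(last_week, this_week)
```

The same comparison can be run on a rolling basis to distinguish a sudden jump from a gradual or cyclical change.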

Upstream Drift

Upstream drift (also called operational data drift) refers to drift caused by changes in the model's data pipeline.


Global explainability

Global explainability tells you which features contributed most to the model’s decisions, averaged across all predictions. In other words, global explainability lets the model owner determine to what extent each feature contributes to how the model makes its predictions over all of the data.
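One way to estimate this kind of averaged, dataset-wide importance (a sketch of one approach among several, not the only definition of global explainability) is permutation importance: shuffle one feature at a time and measure how much a performance metric drops. The linear model and R² metric below are stand-ins.

```python
import numpy as np

def permutation_importance(predict, X, y, metric, rng):
    """Global importance of each feature: the drop in the metric when that
    feature's column is shuffled, computed over the whole dataset."""
    base = metric(y, predict(X))
    drops = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        rng.shuffle(Xp[:, j])          # break this feature's link to y
        drops.append(base - metric(y, predict(Xp)))
    return np.array(drops)

# Stand-in model where only the first of three features matters.
rng = np.random.default_rng(0)
X = rng.normal(size=(2_000, 3))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=2_000)
predict = lambda X: 3.0 * X[:, 0]
r2 = lambda y, p: 1.0 - np.sum((y - p) ** 2) / np.sum((y - np.mean(y)) ** 2)
importance = permutation_importance(predict, X, y, r2, rng)
```

Features the model never uses show a drop of zero, while shuffling a load-bearing feature destroys the metric, which is exactly the "over all of the data" ranking global explainability provides.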

Local explainability

Local explainability helps answer the question, “for this event, why did the model make this particular decision?” This level of specificity is an incredibly useful tool in the toolbox for an ML engineer, but it’s important to note that having local explainability in your system does not imply that you have access to global and cohort explainability.

Local explainability is indispensable for getting to the root cause of a particular issue in production. Imagine you just saw that your model has rejected an applicant for a loan and you need to know why this decision was made. Local explainability would help you get to the bottom of which features were most impactful in making this loan rejection.

Cohort explainability

Sometimes you need to understand how a model is making its decisions for a particular subset of your data, also known as a cohort. Cohort explainability is the process of understanding to what degree your model’s features contribute to its predictions over a subset of your data.

Cohort explainability can serve as a helpful tool in this model validation process by helping to explain the differences in how a model is predicting between a cohort where the model is performing well versus a cohort where the model is performing poorly.
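Given per-prediction attributions (e.g. Shapley values) as a matrix, cohort explainability reduces to aggregating those attributions over a subset of rows. The sketch below compares mean absolute attributions inside and outside a cohort; the attribution matrix and cohort mask are illustrative.

```python
import numpy as np

def cohort_importance(attributions, mask):
    """Mean absolute attribution per feature, inside vs. outside a cohort.

    attributions : (n_predictions, n_features) matrix, e.g. SHAP values.
    mask         : boolean array selecting the cohort's rows.
    """
    inside = np.abs(attributions[mask]).mean(axis=0)
    outside = np.abs(attributions[~mask]).mean(axis=0)
    return inside, outside

# Illustrative attributions: feature 1 drives predictions only for rows
# in the cohort (say, applicants from one region).
attributions = np.array([
    [0.2, 1.5, 0.1],   # cohort
    [0.3, 1.2, 0.0],   # cohort
    [0.2, 0.1, 0.4],
    [0.4, 0.0, 0.3],
])
mask = np.array([True, True, False, False])
inside, outside = cohort_importance(attributions, mask)
```

A feature whose importance diverges sharply between a well-performing and a poorly-performing cohort is a natural starting point for the validation work described above.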

Shapley Additive exPlanations

SHAP (Shapley Additive exPlanations) is a method used to break down individual predictions of a complex model. The purpose of SHAP is to compute the contribution of each feature to the prediction in order to identify the impact of each input. The SHAP explanation technique uses principles rooted in cooperative game theory to compute Shapley values. Much like how cooperative game theory originally looked to identify how the cooperation between groups of players (“coalitions”) contributed to the gain of the coalition overall, the same technique is used in ML to calculate how features contribute to the model’s outcome. In game theory, certain players contribute more to the outcome; in machine learning, certain features contribute more to the model’s prediction and therefore have a higher feature importance.
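To make the coalition idea concrete, here is a minimal, brute-force computation of exact Shapley values for a single prediction. This is feasible only for a handful of features (it is exponential in the feature count); real SHAP implementations use efficient approximations. The linear model and baseline are stand-ins, with "absent" features filled in from the baseline.

```python
import itertools
import math
import numpy as np

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction of a small model.

    predict  : f(X) -> array of predictions
    x        : the instance to explain (1-D array)
    baseline : reference values standing in for "absent" features

    For each feature i, average the marginal contribution
    f(S ∪ {i}) - f(S) over all coalitions S of the other features,
    weighted as in cooperative game theory.
    """
    n = len(x)
    phi = np.zeros(n)

    def value(coalition):
        # Features outside the coalition are set to their baseline value.
        z = baseline.copy()
        z[list(coalition)] = x[list(coalition)]
        return predict(z[None, :])[0]

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for S in itertools.combinations(others, size):
                w = (math.factorial(size) * math.factorial(n - size - 1)
                     / math.factorial(n))
                phi[i] += w * (value(S + (i,)) - value(S))
    return phi

# Stand-in linear model: for a linear f, the Shapley value of feature i
# works out to weight_i * (x_i - baseline_i).
weights = np.array([2.0, -1.0, 0.5])
predict = lambda X: X @ weights
x = np.array([1.0, 3.0, 2.0])
baseline = np.array([0.0, 0.0, 0.0])
phi = shapley_values(predict, x, baseline)  # → [2.0, -3.0, 1.0]
```

Note the additivity property that gives SHAP its name: the per-feature contributions sum to the difference between the model's prediction for this instance and its prediction for the baseline.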