Monitoring¶
Data Drift¶
Feature-drift and prediction-drift detection using Evidently.
Pure compute module — no IO, no MLflow, no Prometheus. Callers are responsible for loading data and persisting outputs.
Public API¶
compute_drift(reference_df, current_df) -> DriftResult
compute_prediction_drift(reference_df, current_df) -> PredictionDriftResult
Note: this module imports from evidently.legacy.*, the compatibility layer that Evidently 0.7.x provides for the older API.
DriftResult dataclass¶
Results of a single feature-drift analysis run.
Attributes¶
drift_score:
Share of features that have drifted
(Evidently share_of_drifted_columns).
Range [0, 1]. Values > 0.2 are considered significant.
n_features:
Total number of features evaluated.
n_drifted:
Number of features for which drift was detected.
html_report:
Full Evidently HTML report as bytes. Suitable for writing to a file.
Source code in src/monitoring/drift.py
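The relationship between these attributes can be sketched as below. This is an illustration only, not the project's source: `DriftSummary`, `summarize`, `is_significant`, and `SIGNIFICANT_DRIFT` are hypothetical names, `html_report` is omitted, and the sketch assumes `drift_score` is simply `n_drifted / n_features`, which is how a share of drifted columns is normally defined.

```python
from dataclasses import dataclass

# Threshold quoted in the attribute description above (values > 0.2).
SIGNIFICANT_DRIFT = 0.2


@dataclass(frozen=True)
class DriftSummary:
    """Hypothetical stand-in for the numeric fields of DriftResult."""

    drift_score: float
    n_features: int
    n_drifted: int


def summarize(n_drifted: int, n_features: int) -> DriftSummary:
    """Build a summary, assuming drift_score = n_drifted / n_features."""
    if n_features <= 0:
        raise ValueError("n_features must be positive")
    if not 0 <= n_drifted <= n_features:
        raise ValueError("n_drifted must lie between 0 and n_features")
    return DriftSummary(
        drift_score=n_drifted / n_features,
        n_features=n_features,
        n_drifted=n_drifted,
    )


def is_significant(summary: DriftSummary) -> bool:
    """Apply the documented rule: strictly greater than 0.2."""
    return summary.drift_score > SIGNIFICANT_DRIFT
```

With 3 drifted features out of 12 the score is 0.25, which exceeds the threshold; 2 out of 10 gives exactly 0.2, which does not.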
PredictionDriftResult dataclass¶
Results of a prediction-distribution drift analysis.
Attributes¶
prediction_drift_score:
Mean drift score across the three probability columns.
n_drifted_cols:
Number of probability columns (0–3) where drift was detected.
html_report:
Full Evidently HTML report as bytes.
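One way to picture how the two numeric attributes are derived from per-column results is the sketch below. It is not the module's implementation: `aggregate_prediction_drift` and the `(score, detected)` tuple layout are assumptions made for illustration, and the per-column score is taken to be whatever value the statistical test reports for that column.

```python
from statistics import fmean

# Column names taken from the compute_prediction_drift documentation.
PROBA_COLUMNS = ("proba_home", "proba_draw", "proba_away")


def aggregate_prediction_drift(
    per_column: dict[str, tuple[float, bool]],
) -> tuple[float, int]:
    """Return (mean drift score, number of drifted columns).

    per_column maps each probability column to a hypothetical
    (drift_score, drift_detected) pair.
    """
    missing = [c for c in PROBA_COLUMNS if c not in per_column]
    if missing:
        raise KeyError(f"missing results for columns: {missing}")
    scores = [per_column[c][0] for c in PROBA_COLUMNS]
    flags = [per_column[c][1] for c in PROBA_COLUMNS]
    # sum() over booleans counts the True entries, giving a value in 0..3.
    return fmean(scores), sum(flags)
```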
compute_drift(reference_df, current_df, *, stattest_threshold=0.05)¶
Run Evidently dataset drift analysis on input features.
Parameters¶
reference_df:
Baseline feature DataFrame (e.g. a sample from training).
current_df:
Recent production feature DataFrame to compare against the baseline.
stattest_threshold:
p-value threshold for per-feature drift tests (default: 0.05).
Returns¶
DriftResult
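To make the role of `stattest_threshold` concrete, the following sketch flags a feature as drifted when a two-sample test returns a p-value below the threshold. It is a stand-in, not what the module does: Evidently chooses its own statistical test per column, whereas this example always uses a two-sample Kolmogorov-Smirnov test with the standard asymptotic p-value approximation, and it takes plain dicts of numeric lists instead of DataFrames. The names `ks_statistic`, `ks_pvalue`, and `drifted_features` are hypothetical.

```python
import math


def ks_statistic(a: list[float], b: list[float]) -> float:
    """Largest gap between the two empirical CDFs."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step both CDFs past every value equal to x (handles ties).
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_pvalue(d: float, n: int, m: int) -> float:
    """Asymptotic two-sided p-value for the KS statistic d."""
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.3:
        # The series converges slowly here and the true value is ~1.
        return 1.0
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, 101)
    )
    return max(0.0, min(1.0, 2.0 * total))


def drifted_features(
    reference: dict[str, list[float]],
    current: dict[str, list[float]],
    stattest_threshold: float = 0.05,
) -> list[str]:
    """Names of features whose KS p-value falls below the threshold."""
    drifted = []
    for name, ref_values in reference.items():
        cur_values = current[name]
        d = ks_statistic(ref_values, cur_values)
        p = ks_pvalue(d, len(ref_values), len(cur_values))
        if p < stattest_threshold:
            drifted.append(name)
    return drifted
```

Lowering the threshold makes the test stricter, so fewer features are flagged; raising it flags more.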
compute_prediction_drift(reference_df, current_df, *, stattest_threshold=0.05)¶
Detect distribution shift in model output probabilities.
Compares the distributions of proba_home, proba_draw, and
proba_away between a reference period and the current window using
Evidently ColumnDriftMetric.
Parameters¶
reference_df:
Predictions DataFrame for the reference period. Must contain
proba_home, proba_draw, proba_away columns.
current_df:
Predictions DataFrame for the current window.
stattest_threshold:
p-value threshold for per-column drift tests (default: 0.05).
Returns¶
PredictionDriftResult
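Because the function expects specific column names, a caller may want to check its inputs before running the analysis. The helper below is a hypothetical convenience, not part of the documented API. It accepts any iterable of column names, so it can be passed `reference_df.columns`. The source states the column requirement only for `reference_df`; applying the same check to `current_df` is an assumption.

```python
from collections.abc import Iterable

# Column names taken from the parameter description above.
REQUIRED_PROBA_COLUMNS = ("proba_home", "proba_draw", "proba_away")


def missing_proba_columns(columns: Iterable[str]) -> list[str]:
    """Return the required probability columns that are absent."""
    present = set(columns)
    return [c for c in REQUIRED_PROBA_COLUMNS if c not in present]
```

An empty list means all three probability columns are present.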
ML Quality¶
ML quality metrics: log-loss, ECE, and hit-rate.
Pure compute module — no IO, no MLflow, no Prometheus, no side effects. Callers are responsible for loading data and persisting outputs.
Public API¶
compute_ml_quality(y_true, y_proba, label_order) -> MLQualityResult
MLQualityResult dataclass¶
Quality metrics for a batch of finished-match predictions.
Attributes¶
n_matches:
Number of finished matches in the evaluated window.
logloss:
Multi-class log-loss (lower is better; a uniform three-class guess scores ln 3 ≈ 1.099).
ece:
Expected Calibration Error on the predicted (argmax) outcome.
Range [0, 1]. Values > 0.05 indicate significant miscalibration.
hit_rate:
Fraction of matches where the predicted class equals the outcome.
hit_rate_home, hit_rate_draw, hit_rate_away:
Per-outcome hit rate (correct predictions / total predictions for that class).
mean_confidence:
Mean maximum probability across all predictions (the model's average certainty).
Source code in src/monitoring/ml_quality.py
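For readers unfamiliar with ECE on the argmax outcome, here is a minimal pure-Python sketch. It assumes equal-width confidence bins and a default of 10 bins; neither detail is stated in this page, so the module may bin differently. The function name `expected_calibration_error` is hypothetical.

```python
def expected_calibration_error(y_true, y_proba, label_order, n_bins=10):
    """ECE of the top-1 prediction using equal-width confidence bins."""
    n = len(y_true)
    if n == 0 or n != len(y_proba):
        raise ValueError("y_true and y_proba must be non-empty and aligned")

    conf_sum = [0.0] * n_bins  # summed confidence per bin
    hits = [0] * n_bins        # correct predictions per bin
    counts = [0] * n_bins      # predictions per bin

    for label, row in zip(y_true, y_proba):
        row = list(row)
        confidence = max(row)
        predicted = label_order[row.index(confidence)]
        # A confidence of exactly 1.0 falls into the last bin.
        b = min(int(confidence * n_bins), n_bins - 1)
        conf_sum[b] += confidence
        hits[b] += int(predicted == label)
        counts[b] += 1

    ece = 0.0
    for b in range(n_bins):
        if counts[b]:
            accuracy = hits[b] / counts[b]
            avg_conf = conf_sum[b] / counts[b]
            # Weight each bin's gap by its share of the predictions.
            ece += counts[b] / n * abs(accuracy - avg_conf)
    return ece
```

A model that predicts with 0.9 confidence but is right only half the time has a gap of about 0.4 in that bin, well above the 0.05 level mentioned above.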
compute_ml_quality(y_true, y_proba, label_order)¶
Compute classification quality metrics.
Parameters¶
y_true:
1-D array of ground-truth class labels (integers matching label_order).
y_proba:
2-D array of predicted probabilities, shape (n_samples, n_classes).
Column order must match label_order.
label_order:
List of class integers in the same order as y_proba columns.
For this project: [0, 1, 2] (0=home_win, 1=draw, 2=away_win).
Returns¶
MLQualityResult
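The sketch below illustrates how `label_order` ties `y_true` to the columns of `y_proba` when computing log-loss and the overall hit rate. It is a simplified pure-Python illustration under stated assumptions, not the module's code: the helper names and the clipping constant `eps` are hypothetical, and inputs are plain lists, not NumPy arrays.

```python
import math


def log_loss(y_true, y_proba, label_order, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    if not y_true or len(y_true) != len(y_proba):
        raise ValueError("y_true and y_proba must be non-empty and aligned")
    column = {label: i for i, label in enumerate(label_order)}
    total = 0.0
    for label, row in zip(y_true, y_proba):
        # Clip to avoid log(0) when the model assigns zero probability.
        p = min(max(row[column[label]], eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(y_true)


def hit_rate(y_true, y_proba, label_order):
    """Fraction of rows whose argmax class equals the true label."""
    if not y_true or len(y_true) != len(y_proba):
        raise ValueError("y_true and y_proba must be non-empty and aligned")
    hits = 0
    for label, row in zip(y_true, y_proba):
        row = list(row)
        predicted = label_order[row.index(max(row))]
        hits += int(predicted == label)
    return hits / len(y_true)
```

Feeding uniform probabilities of 1/3 per class into `log_loss` reproduces the ln 3 baseline for a three-class problem.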