Implementation Status
This is the canonical source of truth for implementation readiness. All claims in other pages must be consistent with this page.
Last updated: May 24, 2026
v1.0 completed May 24, 2026. See Requirements — Definition of Done and Roadmap — v1.0 Demo Track for scope details.
Legend
- ✅ Operational — Implemented, tested, and working in practice
- 🚧 Partial — Partially implemented or requires manual steps
- 📋 Planned — Designed but not yet implemented
Implementation Matrix
| Component | Status | Notes |
|---|---|---|
| Data Engineering | | |
| Airflow ETL | ✅ Operational | Scraping + PostgreSQL ingestion |
| MinIO Storage | ✅ Operational | S3-compatible object storage |
| Fonbet raw snapshot collection | ✅ Operational | airflow/dags/etl_odds_fonbet_01_raw.py — Selenoid CDP scrape every 4 h → odds_fonbet/ in MinIO |
| Fonbet match linking | ✅ Operational | airflow/dags/etl_odds_fonbet_02_link.py → src/pipelines/link_fonbet_odds.py — fuzzy 3-layer match → match_links/fonbet_links.parquet |
| Fonbet odds extraction | ✅ Operational | airflow/dags/etl_odds_fonbet_03_odds.py → src/pipelines/fetch_fonbet_odds.py — 1X2 extraction → match_links/fonbet_odds.parquet |
| DVC Versioning | ✅ Operational | Data + model artifacts tracked |
| PostgreSQL | ✅ Operational | Canonical data store |
| ML Pipeline | | |
| Feature Engineering | ✅ Operational | stats_matches.py — time-windowed stats |
| DVC Pipeline | ✅ Operational | dvc.yaml orchestration working |
| MLflow Tracking | ✅ Operational | Experiment logging functional |
| Train/Test Splitting | ✅ Operational | Time-based + CV folds |
| Model Training | ✅ Operational | XGBoost + HGB classifiers active; LogReg is in the DAG but disabled (tuning_logreg.enabled: false); production-scale parameters active (classification.frac=0.01, tuning.n_trials=20); three-stage tuning → select_model → final_train |
| Model Selection | ✅ Operational | select_model stage compares CV log-loss across XGB / HGB tuners; writes data/models/best_model.json |
| Model Registry | 🚧 Partial | Registration automated; promote_model stage gates candidate alias automatically; champion alias requires manual sign-off |
| Model Promotion | 🚧 Partial | promote_model stage promotes to the candidate alias if the new model's log-loss is within tolerance; champion promotion is manual-only |
| Serving | | |
| FastAPI App | ✅ Operational | Routers, middleware, lifespan; CORS allow-list driven by CORS_ALLOWED_ORIGINS (default empty = no cross-origin) |
| GET /predict/predictions/ | ✅ Operational | Bulk read from predictions.parquet in-memory cache; display cols only |
| GET /predict/precomputed/{match_id} | ✅ Operational | Single-match parquet lookup; no Celery, no MLflow call at request time |
| GET /predict/cards/ | ✅ Operational | Predictions merged with Fonbet 1X2 odds; primary Streamlit UI data source |
| GET /predict/region-roi/ | ✅ Operational | Flat-stake ROI per region from live-betting simulation |
| GET /predict/odds/ | ✅ Operational | Fonbet 1X2 odds from fonbet_odds.parquet |
| GET /predict/{match_id} | ✅ Operational | Sync Celery dispatch to ml queue; features from match_features.parquet; 30 s timeout |
| GET /predict/model/info | ✅ Operational | MLflow model metadata via sync Celery dispatch |
| POST /predict/async/ | 🚧 Partial | Schemas + Celery task + polling endpoint operational; HTTP route not yet registered in src/app/routers/predict.py |
| Request Validation | ✅ Operational | Pydantic schemas in src/app/schemas/predict.py |
| Model Loading | ✅ Operational | Lazy-loaded once per worker process via PredictionService |
| Batch Predictions API | ❌ Out of scope | DVC batch_inference stage exists; HTTP batch endpoint is out of scope by design — see ADR-0006 |
| Streamlit UI | ✅ Operational | src/ui/app/main.py — match list with Fonbet odds, Value bets signal (>5 pp edge), prediction accuracy per match (Pred), dynamic ROI panel (Accuracy / ROI all picks / ROI value bets), Min region ROI slider, filters: Region / Status / Period. Demo disclaimer on every page. |
| Monitoring | | |
| Prometheus Metrics | ✅ Operational | GET /metrics, _PrometheusMiddleware, 9 counters/histograms/gauges; prediction_requests_total label is source="sync" for on-demand requests |
| Service Health | ✅ Operational | GET /healthcheck/, liveness probes, DB connectivity check |
| Celery Queue Stats | ✅ Operational | GET /monitoring/celery/queues, /celery/workers |
| Task Status Polling | ✅ Operational | GET /monitoring/task_status/{task_id} |
| Evidently Drift Detection | ✅ Operational | GET /monitoring/drift — reads reports/drift/latest.json; refreshes model_feature_drift_score Prometheus gauge. Report written by monitor_drift DVC stage / Airflow DAG soccer_ml_monitor_drift_01. |
| ML Quality Monitor | ✅ Operational | soccer_ml_monitor_quality_01 — computes log-loss, ECE, hit-rate on finished matches; writes Prometheus textfile + Evidently HTML report; triggered by soccer_etl_inference_01. |
| Grafana Dashboards | ✅ Operational | Two dashboards deployed: "Soccer — ML Quality & Betting" (tags: ml, monitoring, soccer), "SoccerPredictAI" (tags: mlops, prediction, soccer). |
| Alerting Rules | 📋 Planned | Runbooks documented; rules not deployed in Alertmanager |
| Infrastructure | | |
| Docker Images | ✅ Operational | Multi-stage builds for API + workers |
| K8s Manifests | ✅ Operational | Deployments + Services + ConfigMaps |
| Helm Charts | ✅ Operational | Values + templates parameterized; nginx-ingress rate-limit (rps/burst/connections) configurable via ingress.rateLimit |
| GitLab CI | ✅ Operational | Build + test + deploy pipeline |
| Secrets (SOPS) | ✅ Operational | age encryption for sensitive data |
| Quality | | |
| pytest Framework | ✅ Operational | 564 tests collected (unit, property, service, contract, load); see reports/planning/20260522_test.md for coverage gap matrix |
| Unit Tests | ✅ Operational | tests/unit/ — splitting, schemas, preprocess |
| Property Tests | ✅ Operational | tests/property/ — Hypothesis: features, splitting, metrics |
| Service Tests | ✅ Operational | tests/service/ — prediction service, Celery tasks |
| Contract Tests | ✅ Operational | tests/contract/test_pipeline_contracts.py |
| Load Tests | ✅ Operational | tests/load/locustfile.py — Locust load scenarios |
| Integration Tests | 🚧 Partial | API mock tests; no live MLflow/Celery required in CI |
| Pre-commit Hooks | ✅ Operational | ruff + basic linting |
| Data Validation | ✅ Operational | Great Expectations suites at raw / finished / future / features stages |
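
The matrix mentions a value-bet signal (more than 5 pp edge) in the Streamlit UI and a flat-stake ROI per region. This page does not define either calculation, so the following is only a minimal sketch under stated assumptions: edge is taken as the model probability minus the implied probability `1 / decimal_odds` (bookmaker margin is not removed), and ROI is net profit divided by total stake with a unit stake per bet. The function names are hypothetical and are not taken from the project code.

```python
from typing import Iterable, Tuple


def implied_probability(decimal_odds: float) -> float:
    """Implied probability of a decimal price (assumption: margin is not removed)."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1.0")
    return 1.0 / decimal_odds


def is_value_bet(model_prob: float, decimal_odds: float, min_edge: float = 0.05) -> bool:
    """Flag a pick whose assumed edge exceeds min_edge (0.05 = 5 percentage points)."""
    return model_prob - implied_probability(decimal_odds) > min_edge


def flat_stake_roi(bets: Iterable[Tuple[float, bool]], stake: float = 1.0) -> float:
    """Flat-stake ROI: (total returned - total staked) / total staked.

    Each bet is a (decimal_odds, won) pair; every bet risks the same stake.
    Returns 0.0 when there are no bets.
    """
    staked = 0.0
    returned = 0.0
    for decimal_odds, won in bets:
        staked += stake
        if won:
            # A winning bet returns the stake multiplied by the decimal odds.
            returned += stake * decimal_odds
    if staked == 0.0:
        return 0.0
    return (returned - staked) / staked
```

Under these assumptions, a pick with model probability 0.60 at odds 2.0 has an edge of 10 pp and would be flagged, while one win at 2.5 and one loss over two unit stakes gives an ROI of 0.25.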
Known Limitations
- Model promotion — `candidate` alias is automated via the `promote_model` stage; `champion` alias requires manual sign-off. No alerting on promotion failures.
- Batch HTTP endpoint — out of scope by design (ADR-0006). Batch inference runs via the DVC `batch_inference` stage only.
- Alerting — Alerting rules documented in runbooks; not deployed in Alertmanager. No `on_failure_callback` on critical Airflow DAGs.
- Integration tests — All CI tests use mocks. No live Celery/MLflow dependency in CI.
- Feature store — Features are file-based Parquet; no dedicated online store.
- API authentication — `/predict/*` endpoints require an `X-API-Key` header. `/monitoring/*` endpoints are currently unauthenticated; authentication is planned.
See Architecture Limitations for deployment-level constraints.
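
The candidate gate in the model promotion limitation can be pictured roughly as follows. This is a sketch, not the actual `promote_model` stage: it assumes "within tolerance" means the new log-loss is no worse than a reference log-loss plus an absolute tolerance, and the names `PromotionDecision`, `decide_candidate`, and the default tolerance are hypothetical. The champion alias is never set here, which mirrors the manual sign-off requirement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PromotionDecision:
    promote_to_candidate: bool
    reason: str


def decide_candidate(
    new_logloss: float,
    reference_logloss: Optional[float],
    tolerance: float = 0.01,  # hypothetical default, not the project's value
) -> PromotionDecision:
    """Decide whether a new model may receive the candidate alias.

    Lower log-loss is better. Assumption: the new model passes when its
    log-loss does not exceed the reference log-loss by more than tolerance.
    """
    if reference_logloss is None:
        # Nothing to compare against, so the first model passes the gate.
        return PromotionDecision(True, "no reference model")
    if new_logloss <= reference_logloss + tolerance:
        return PromotionDecision(True, "log-loss within tolerance")
    return PromotionDecision(False, "log-loss worse than reference beyond tolerance")
```

A caller would act only on `promote_to_candidate`; promotion to champion stays a separate, human step.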
How to verify
```bash
# Reproduce the ML pipeline
dvc pull && dvc repro
# Inspect experiments
mlflow ui --port 5001
# Run the test suite
pytest tests/ -q
# Check API health
curl http://localhost:8000/healthcheck/
# Check Prometheus metrics
curl http://localhost:8000/metrics
```
See Quickstart for the full reproducibility path.
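
Because `/predict/*` endpoints require an `X-API-Key` header (see Known Limitations), a verification call against them needs that header. The sketch below uses only the Python standard library. The base URL, the match ID, and the key value are placeholders, the helper names are hypothetical, and the JSON response body is an assumption about the endpoint.

```python
import json
import urllib.request
from typing import Any


def build_predict_request(base_url: str, match_id: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for /predict/precomputed/{match_id} with the API key header."""
    url = f"{base_url.rstrip('/')}/predict/precomputed/{match_id}"
    return urllib.request.Request(url, headers={"X-API-Key": api_key}, method="GET")


def fetch_json(request: urllib.request.Request, timeout: float = 30.0) -> Any:
    """Send the request and decode the body as JSON (assumes a JSON response)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the API running locally, `fetch_json(build_predict_request("http://localhost:8000", "<match_id>", "<your key>"))` would return the decoded body. Building the request separately keeps the header logic testable without a live server.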
Related
- Architecture Overview — system design and layer contracts
- Architecture Roadmap — planned improvements with justification
- Quickstart — reproducible golden path