Service & Infrastructure Metrics (Prometheus)
Status: ✅ Operational — GET /metrics endpoint live, 9 metrics exported
Metrics are collected from the FastAPI inference service via src/app/metrics.py
and a _PrometheusMiddleware applied to all requests.
Available at: GET /metrics (Prometheus exposition format)
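To make the exposition format concrete, the sketch below parses a small payload of the kind GET /metrics returns. It is a simplified reader, not the service's code: the sample text, the label names (method, path, status), the /predict path, and all values are illustrative assumptions, and escape sequences inside label values are left as-is.

```python
import re

# One sample line: metric name, optional {labels}, then the numeric value.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = _SAMPLE.match(line)
        if match is None:  # ignore anything that is not a sample line
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Illustrative payload; label names, paths, and values are made up.
SAMPLE_TEXT = """\
# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",path="/metrics",status="200"} 12
http_requests_total{method="POST",path="/predict",status="200"} 3
model_feature_drift_score 0.25
"""

samples = parse_exposition(SAMPLE_TEXT)
total_requests = sum(v for n, _, v in samples if n == "http_requests_total")
```

In practice the same function could be fed the body of a real GET /metrics response. For anything beyond a quick check, a maintained parser is the safer choice.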
Exported metrics
HTTP layer (✅ live)
| Metric | Type | Description |
|---|---|---|
| http_requests_total | Counter | Total HTTP requests by method, path, and status code |
| http_request_duration_seconds | Histogram | End-to-end HTTP request latency by method and path |
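Histogram metrics such as http_request_duration_seconds are exported as cumulative buckets, so latency percentiles are estimated rather than measured exactly. The sketch below shows roughly the linear interpolation that PromQL's histogram_quantile applies. The bucket bounds and counts are invented for illustration and do not reflect the service's actual bucket configuration.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    `buckets` must be sorted by upper bound and end with the +Inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0  # lowest bucket is assumed to start at 0
    for bound, count in buckets:
        # Pick the first non-empty bucket whose cumulative count reaches the rank.
        if count >= rank and count > prev_count:
            if math.isinf(bound):
                # Rank falls in the overflow bucket: report the highest finite bound.
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical cumulative buckets: (upper bound in seconds, cumulative count).
example_buckets = [(0.1, 50), (0.5, 90), (1.0, 98), (math.inf, 100)]
p95 = histogram_quantile(0.95, example_buckets)
p99 = histogram_quantile(0.99, example_buckets)
```

Because the estimate interpolates within a bucket, its accuracy depends on how closely the bucket bounds bracket the latencies of interest. A quantile that lands in the +Inf bucket can only be reported as the highest finite bound.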
Prediction API (✅ live)
| Metric | Type | Description |
|---|---|---|
| prediction_requests_total | Counter | On-demand prediction tasks dispatched to the Celery ml queue (source="sync") |
| prediction_duration_seconds | Histogram | End-to-end prediction latency including Celery queue roundtrip (sync path) |
ML worker / model (✅ live)
| Metric | Type | Description |
|---|---|---|
| inference_duration_seconds | Histogram | Pure ML inference time inside the Celery worker (excluding queue wait) |
| prediction_confidence | Histogram | Model predicted probability per outcome class (outcome="home_win\|draw\|away_win") |
| model_info | Gauge | Metadata of the currently loaded model; value=1 when loaded (model_name, version, stage labels) |
| model_registered_at_seconds | Gauge | Unix timestamp when the currently loaded model version was last loaded by the worker |
| model_feature_drift_score | Gauge | Evidently dataset drift score (share of drifted features); updated by GET /monitoring/drift |
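As a reading aid for model_feature_drift_score, the arithmetic behind a "share of drifted features" can be sketched as below. Evidently computes the per-feature drift decisions itself; this snippet only illustrates the ratio, and the feature names and flags are hypothetical.

```python
def drift_share(drift_flags):
    """Return the fraction of features flagged as drifted, in [0, 1]."""
    if not drift_flags:
        return 0.0  # no features evaluated, so nothing has drifted
    drifted = sum(1 for is_drifted in drift_flags.values() if is_drifted)
    return drifted / len(drift_flags)


# Hypothetical per-feature drift decisions (names are illustrative only).
flags = {
    "home_form": True,
    "away_form": False,
    "elo_diff": False,
    "rest_days": False,
}
score = drift_share(flags)  # one of four features drifted
```

A gauge value of 0 therefore means no feature was flagged, and 1 means every evaluated feature was flagged.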
Celery runtime status is available via REST (not Prometheus-scraped):
- GET /monitoring/celery/queues — per-queue message count
- GET /monitoring/celery/workers — active worker ping status
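Since these two endpoints are not scraped, a script has to poll them directly. The following is a minimal sketch using only the standard library. The base URL is a placeholder, and the code assumes the endpoints return JSON, which the page does not state; the response is returned unmodified because its schema is not documented here.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Paths as listed above; the dictionary keys are names chosen for this sketch.
CELERY_STATUS_PATHS = {
    "queues": "/monitoring/celery/queues",
    "workers": "/monitoring/celery/workers",
}


def status_url(base_url, kind):
    """Build the full URL for one of the Celery status endpoints."""
    return urljoin(base_url, CELERY_STATUS_PATHS[kind])


def fetch_status(base_url, kind, timeout=5.0):
    """GET the endpoint and decode the body, assuming it is JSON."""
    with urlopen(status_url(base_url, kind), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `fetch_status("http://localhost:8000", "queues")` would request the per-queue message counts from a service assumed to be listening on that address.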
Not yet implemented
- RabbitMQ queue metrics via dedicated exporter
- Kubernetes CPU / memory / pod restarts
- PostgreSQL query latency via pg_exporter
- Log aggregation (stdout only today)
Dashboards
Grafana dashboards for these metrics are planned; see Dashboards. The full coverage matrix is in Monitoring Status.