Inference API Contract
This page is the canonical reference for the inference API surface: all implemented endpoints, their request/response schemas, and error semantics.
For concrete examples, see Examples. For the model input/output contract, see ML: Model Contract.
Authentication
All /predict/* endpoints require a valid X-API-Key header (configured via API_KEY env var).
/healthcheck/, /metrics, /monitoring/*, and /livescores/ are unauthenticated.
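
The rule above can be mirrored in a small client-side helper. This is a minimal sketch, not the server's implementation: the function names are hypothetical, and only the X-API-Key header name and the /predict/ prefix come from this page.

```python
from typing import Dict, Optional


def needs_api_key(path: str) -> bool:
    """Return True for paths under /predict/, which require X-API-Key."""
    return path.startswith("/predict/")


def build_headers(path: str, api_key: Optional[str]) -> Dict[str, str]:
    """Build request headers for a path (hypothetical client helper)."""
    if not needs_api_key(path):
        # /healthcheck/, /metrics, /monitoring/* and /livescores/ need no key.
        return {}
    if not api_key:
        raise ValueError(f"{path} requires an API key")
    return {"X-API-Key": api_key}
```

A client would typically read the key from its own configuration and pass the returned dict to whichever HTTP library it uses.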
Implemented endpoints
GET /predict/predictions/
Bulk read of all precomputed predictions from predictions.parquet (in-memory cache).
Returns both future and historical matches. Feature vectors are omitted — display columns only.
Intended for the Streamlit UI data explorer and diagnostic tooling.
Response — 200 OK — array of objects with fields:
| Field | Type | Description |
|---|---|---|
| match_id | int | Match identifier |
| homeTeamName | str \| null | Home team name |
| awayTeamName | str \| null | Away team name |
| startTimeUtc | str (ISO 8601) | Match start time |
| proba_home | float | Home win probability |
| proba_draw | float | Draw probability |
| proba_away | float | Away win probability |
| predicted_class | int | Argmax: 0=Home Win, 1=Draw, 2=Away Win |
| predicted_label | str | Human-readable label |
| is_future | bool \| null | True if match has not started yet |
| model_stage | str \| null | MLflow alias used |
| model_run_id | str \| null | MLflow run ID for traceability |
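
A consumer can sanity-check each row against the field descriptions above. The sketch below assumes that the three probabilities sum to 1 (not stated on this page) and that predicted_class is the argmax in the order home, draw, away. The sample row and function names are illustrative only.

```python
def argmax_class(row: dict) -> int:
    """Index of the largest probability: 0=Home Win, 1=Draw, 2=Away Win."""
    probs = [row["proba_home"], row["proba_draw"], row["proba_away"]]
    return max(range(3), key=lambda i: probs[i])


def is_consistent(row: dict, tol: float = 1e-6) -> bool:
    """Check the assumed sum-to-one property and the argmax rule."""
    total = row["proba_home"] + row["proba_draw"] + row["proba_away"]
    return abs(total - 1.0) <= tol and row["predicted_class"] == argmax_class(row)


# Illustrative row using the field names from the table above.
sample_row = {
    "match_id": 99,
    "proba_home": 0.58,
    "proba_draw": 0.27,
    "proba_away": 0.15,
    "predicted_class": 0,
    "predicted_label": "home_win",
}
```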
GET /predict/precomputed/{match_id}
Returns the precomputed prediction for a single match from predictions.parquet.
No Celery task, no MLflow model call at request time — reads from in-memory cache only.
Response — 200 OK
{
"match_id": 99,
"proba_home": 0.58,
"proba_draw": 0.27,
"proba_away": 0.15,
"predicted_class": 0,
"predicted_label": "home_win",
"is_future": false,
"start_time_utc": "2025-05-10T18:00:00+00:00",
"home_team_name": "Arsenal",
"away_team_name": "Chelsea",
"model_run_id": "3f7a1c9d2e4b",
"model_stage": "champion",
"predictions_computed_at": "2025-05-10T12:00:00+00:00"
}
Error responses
| Code | Condition |
|---|---|
| 404 Not Found | match_id not in current predictions.parquet |
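
As a hedged illustration of consuming this response, the sketch below treats 404 as "no prediction available" and computes how long before kickoff the prediction was computed. The helper names are hypothetical; the field names and sample values are taken from the example response above.

```python
import json
from datetime import datetime, timedelta
from typing import Optional

SAMPLE_BODY = """{
  "match_id": 99,
  "start_time_utc": "2025-05-10T18:00:00+00:00",
  "predictions_computed_at": "2025-05-10T12:00:00+00:00"
}"""


def parse_precomputed(status_code: int, body: str) -> Optional[dict]:
    """Return the decoded payload, or None when the match is unknown (404)."""
    if status_code == 404:
        return None
    if status_code != 200:
        raise RuntimeError(f"unexpected status {status_code}")
    return json.loads(body)


def prediction_lead_time(payload: dict) -> timedelta:
    """Time between prediction computation and match start."""
    start = datetime.fromisoformat(payload["start_time_utc"])
    computed = datetime.fromisoformat(payload["predictions_computed_at"])
    return start - computed
```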
GET /predict/cards/
Returns all precomputed predictions merged with Fonbet 1X2 odds in a single response.
Combines predictions.parquet and fonbet_odds.parquet on match_id.
Each entry contains probabilities, predicted class, 1X2 odds, outcome (if finished), and Fonbet URL.
Served from in-memory cache — no MinIO call at request time. Used by the Streamlit UI.
Response — 200 OK — array of merged match card objects.
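
The merge on match_id can be pictured with plain dictionaries. This is only a sketch: it assumes a left join that keeps every prediction and fills missing odds with None, which this page does not confirm. The function name is hypothetical; the odds field names come from the /predict/odds/ description.

```python
from typing import List


def merge_cards(predictions: List[dict], odds: List[dict]) -> List[dict]:
    """Attach 1X2 odds to each prediction by match_id (assumed left join)."""
    odds_by_id = {row["match_id"]: row for row in odds}
    cards = []
    for pred in predictions:
        card = dict(pred)  # copy so the input rows are not mutated
        match_odds = odds_by_id.get(pred["match_id"], {})
        for key in ("odd_home", "odd_draw", "odd_away"):
            card[key] = match_odds.get(key)  # None when no odds are known
        cards.append(card)
    return cards
```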
GET /predict/region-roi/
Returns flat-stake ROI statistics per region from the live-betting simulation.
Data is produced by the live-betting DVC pipeline stage and cached in memory; the cache re-checks MinIO every 60 s.
Returns an empty list when roi_by_region.csv has not been produced yet.
Response — 200 OK
| Field | Type | Description |
|---|---|---|
| region_name | str \| null | Region label |
| roi_pct | float \| null | Flat-stake ROI % |
| n_bets | int | Number of simulated bets |
| hit_rate | float \| null | Fraction of winning bets |
| region_id | int \| null | Internal region identifier |
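
For readers unfamiliar with flat-stake ROI, the sketch below uses the common definition: one unit staked per bet, profit of (odds - 1) on a win and -1 on a loss, ROI as total profit divided by total stake. The pipeline's exact formula is not documented on this page, so treat this as an assumption; the function name is hypothetical.

```python
from typing import Iterable, Tuple


def flat_stake_summary(bets: Iterable[Tuple[float, bool]]) -> dict:
    """Summarise (decimal_odds, won) pairs with one unit staked per bet."""
    bets = list(bets)
    n_bets = len(bets)
    if n_bets == 0:
        # Mirrors the nullable roi_pct / hit_rate fields in the table above.
        return {"n_bets": 0, "roi_pct": None, "hit_rate": None}
    profit = sum((odds - 1.0) if won else -1.0 for odds, won in bets)
    wins = sum(1 for _, won in bets if won)
    return {
        "n_bets": n_bets,
        "roi_pct": 100.0 * profit / n_bets,
        "hit_rate": wins / n_bets,
    }
```

With four bets at odds 2.0 (won), 3.0 (lost), 1.5 (won), and 4.0 (lost), the profit is -0.5 units on 4 units staked, so roi_pct is -12.5 and hit_rate is 0.5.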
GET /predict/odds/
Returns Fonbet 1X2 odds (odd_home, odd_draw, odd_away) for all matches.
Reads from fonbet_odds.parquet in the data-raw MinIO bucket.
Returns an empty list if the file has not been produced yet.
GET /predict/{match_id}
Runs a prediction for a single match synchronously via the Celery ml queue.
Features are read from match_features.parquet (in-memory cache).
Blocks until the Celery predict_match task completes (up to 30 s).
Use ?stage=challenger to target the challenger model.
Response — 200 OK — full prediction result dict (same shape as PredictResponse).
Error responses
| Code | Condition |
|---|---|
| 404 Not Found | match_id not in current feature parquet |
| 400 Bad Request | Requested stage is not in the loaded set |
| 504 Gateway Timeout | Celery worker did not respond within 30 s |
| 500 Internal Server Error | Task failed on worker |
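
A client for this endpoint mainly needs to build the URL with the optional stage parameter and map the documented status codes to outcomes. The sketch below does both. The base URL, helper names, and the 35 s client timeout are assumptions; the timeout is simply chosen to exceed the 30 s server-side wait.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: a client timeout slightly above the 30 s server-side wait.
CLIENT_TIMEOUT_S = 35


def predict_url(base_url: str, match_id: int, stage: Optional[str] = None) -> str:
    """Build the on-demand prediction URL, optionally targeting a stage."""
    url = f"{base_url.rstrip('/')}/predict/{match_id}"
    if stage is not None:
        url += "?" + urlencode({"stage": stage})
    return url


def describe_status(status_code: int) -> str:
    """Map the documented status codes to a short outcome label."""
    outcomes = {
        200: "ok",
        400: "requested stage is not loaded",
        404: "match_id not in feature parquet",
        500: "task failed on worker",
        504: "worker did not respond within 30 s",
    }
    return outcomes.get(status_code, "unexpected status")
```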
GET /predict/model/info
Retrieves MLflow model metadata synchronously via the Celery ml queue.
Use ?stage=challenger to query a specific model stage.
Response — 200 OK
{
"model_name": "soccer-match-outcome",
"stage": "champion",
"version": "7",
"run_id": "3f7a1c9d2e4b",
"metrics": {"log_loss": 1.006, "roc_auc_ovr": 0.643},
"params": {"n_estimators": "300"},
"feature_names": ["diff_win_5_mean", "diff_goals_for_5_mean"],
"created_at": "2026-03-15T10:00:00"
}
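
Note that version and the params values arrive as strings in the example above, so numeric parameters need explicit conversion on the client. A minimal sketch using the sample payload (the helper name is hypothetical):

```python
import json

SAMPLE_INFO = """{
  "model_name": "soccer-match-outcome",
  "stage": "champion",
  "version": "7",
  "run_id": "3f7a1c9d2e4b",
  "metrics": {"log_loss": 1.006, "roc_auc_ovr": 0.643},
  "params": {"n_estimators": "300"},
  "feature_names": ["diff_win_5_mean", "diff_goals_for_5_mean"],
  "created_at": "2026-03-15T10:00:00"
}"""


def summarize_model(info: dict) -> str:
    """One-line description of the model behind a stage."""
    return f"{info['model_name']} v{info['version']} ({info['stage']})"


info = json.loads(SAMPLE_INFO)
# Params are strings in the payload, so convert before doing arithmetic.
n_estimators = int(info["params"]["n_estimators"])
```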
GET /livescores/
Returns matches from PostgreSQL filtered by year/month, ordered by startTimeUtc descending.
Query parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| year | int | current year | Calendar year |
| month | int | — | Month (1–12); omit for full year |
| limit | int | — | Max rows (1–10000) |
| offset | int | 0 | Pagination offset |
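
The documented ranges can be checked client-side before sending a request. The sketch below is a hypothetical helper: the month and limit bounds come from the table above, while rejecting a negative offset is an assumption.

```python
from typing import Optional
from urllib.parse import urlencode


def livescores_query(
    year: Optional[int] = None,
    month: Optional[int] = None,
    limit: Optional[int] = None,
    offset: int = 0,
) -> str:
    """Build the /livescores/ query string, validating documented ranges."""
    if month is not None and not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    if limit is not None and not 1 <= limit <= 10000:
        raise ValueError("limit must be in 1..10000")
    if offset < 0:  # assumption: negative offsets are not meaningful
        raise ValueError("offset must be >= 0")
    params = {}
    if year is not None:
        params["year"] = year
    if month is not None:
        params["month"] = month
    if limit is not None:
        params["limit"] = limit
    params["offset"] = offset
    return urlencode(params)
```

Omitting year leaves the server default (current year) in effect, and omitting month requests the full year.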
GET /monitoring/drift
Returns the latest feature drift summary from reports/drift/latest.json (written by the monitor_drift DVC stage / Airflow DAG). Returns {"drift_score": null} if the report does not exist yet.
Also refreshes the drift_score Prometheus gauge.
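
Consumers should handle the null case explicitly rather than treating it as zero drift. In the sketch below, the alert threshold of 0.3 and the function name are purely hypothetical; only the drift_score key and its null fallback come from this page.

```python
def drift_state(payload: dict, threshold: float = 0.3) -> str:
    """Classify a drift summary; the threshold is a hypothetical value."""
    score = payload.get("drift_score")
    if score is None:
        # Report not produced yet: distinct from "no drift".
        return "no-report"
    return "drifted" if score > threshold else "ok"
```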
GET /monitoring/celery/queues
Returns active, scheduled, and reserved task counts across all Celery workers.
GET /monitoring/celery/workers
Returns active queues and ping status for all connected workers.
GET /monitoring/task_status/{task_id}
Returns current status and result for any Celery task by ID.
Response
result is null while the task is pending.
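
A typical use of this endpoint is polling until a task finishes. The sketch below assumes the response carries a status field holding a standard Celery state name (SUCCESS, FAILURE, REVOKED as terminal states); the response shape is not shown on this page, so that field name is an assumption. The fetch_status callable stands in for the actual HTTP call.

```python
import time
from typing import Callable

# Standard Celery states that indicate a task will not change further.
TERMINAL_STATES = {"SUCCESS", "FAILURE", "REVOKED"}


def wait_for_task(
    fetch_status: Callable[[], dict],
    max_attempts: int = 30,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch_status() until the task reaches a terminal state."""
    for attempt in range(max_attempts):
        payload = fetch_status()
        if payload.get("status") in TERMINAL_STATES:
            return payload
        if attempt < max_attempts - 1:
            sleep(interval_s)  # back off before the next poll
    raise TimeoutError("task did not finish within the polling budget")
```

Injecting sleep keeps the helper easy to test without real delays.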
GET /healthcheck/
Liveness probe, used by Kubernetes liveness checks. Also checks database connectivity (the database field in the response).
Response — 200 OK
{
"status": "healthy",
"version": "...",
"worker_pid": 42,
"memory_usage_mb": 210.4,
"database": true
}
GET /metrics
Prometheus-compatible metrics endpoint. Scraped by the in-cluster Prometheus instance.
Returns plain-text exposition format with 9 counters, histograms, and gauges:
- http_requests_total{method, path, status_code}
- http_request_duration_seconds (histogram)
- prediction_requests_total{source="sync|async"}
- prediction_duration_seconds (histogram)
- inference_duration_seconds (histogram)
- prediction_confidence{outcome} (histogram)
- model_info{model_name, version, stage} (gauge)
- model_registered_at_seconds{model_name} (gauge)
- model_feature_drift_score (gauge)
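
For ad-hoc inspection outside Prometheus, the exposition text can be read with a few lines of Python. This is a deliberately simplified sketch: it ignores optional timestamps and label values containing braces, and the sample text and values are illustrative rather than real output from this service.

```python
import re
from typing import List, Tuple

# metric name, optional {labels}, then the sample value
SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)"
)

SAMPLE_TEXT = """\
# HELP model_feature_drift_score Illustrative help text
# TYPE model_feature_drift_score gauge
model_feature_drift_score 0.12
http_requests_total{method="GET",path="/metrics",status_code="200"} 3.0
"""


def parse_samples(text: str) -> List[Tuple[str, str, float]]:
    """Return (name, raw_labels, value) for each sample line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        samples.append(
            (match.group("name"), match.group("labels") or "", float(match.group("value")))
        )
    return samples
```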
Planned endpoints
| Endpoint | Status | Notes |
|---|---|---|
| POST /predict/batch | 📋 Planned | HTTP batch endpoint; batch parquet exists but no HTTP API yet |
Validation semantics
- All requests are validated against Pydantic schemas (src/app/schemas/predict.py) before any inference logic runs.
- Path and query parameters are validated by FastAPI/Pydantic; invalid types return 422 Unprocessable Entity.
- Input validation failures are client errors and are not retried.
Schema boundary
Features for on-demand inference (GET /predict/{match_id}) are read server-side from match_features.parquet.
The serving layer does not accept caller-supplied feature dicts.
See ML: Model Contract for the full input/output contract.