ADR-0006: No HTTP Batch Prediction Endpoint
- Status: Accepted
- Date: 2025-01-01
- Deciders: MLOps team
Context
During Stage 2 of the portfolio build-out, the question arose whether to expose an
HTTP batch prediction endpoint (e.g. POST /predict/batch) that accepts a list of
match IDs or raw feature payloads and returns a list of predictions in a single
request.
The DVC pipeline already has an offline batch_inference stage that produces a
data/predictions/predictions.parquet file for all scheduled matches.
Results from that stage are accessible per-match via GET /predict/{match_id}.
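The per-match retrieval path described above can be sketched as a lookup over precomputed rows. This is a minimal illustration, not the project's actual handler: `PredictionStore`, `Prediction`, and the `home_win_prob` column are hypothetical names, and the real schema of predictions.parquet is not stated in this ADR.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Mapping, Optional


@dataclass(frozen=True)
class Prediction:
    match_id: str
    home_win_prob: float  # hypothetical column; the real schema is not given here


class PredictionStore:
    """In-memory index over rows produced by the offline batch stage."""

    def __init__(self, rows: Iterable[Mapping[str, Any]]) -> None:
        # Build the index once at startup so each request is a dict lookup.
        self._by_id = {
            str(row["match_id"]): Prediction(
                str(row["match_id"]), float(row["home_win_prob"])
            )
            for row in rows
        }

    def get(self, match_id: str) -> Optional[Prediction]:
        # None means "no precomputed prediction"; an HTTP layer would map it to 404.
        return self._by_id.get(match_id)


# Stand-in rows for illustration only. In practice they could be loaded with
# pandas.read_parquet("data/predictions/predictions.parquet").to_dict("records").
rows = [
    {"match_id": "m-001", "home_win_prob": 0.62},
    {"match_id": "m-002", "home_win_prob": 0.35},
]
store = PredictionStore(rows)
```

Because all inference happens offline, the request path does no model work; `store.get("m-001")` is a constant-time lookup, which is what keeps single-match queries inside an interactive latency budget.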
Decision
We will not expose an HTTP batch prediction endpoint.
Rationale
| Concern | Detail |
|---|---|
| Latency contract | The serving SLA targets interactive, single-match queries (p95 < 500 ms). Batch processing of dozens of matches would violate this contract and complicate timeout handling. |
| Architectural boundary | The serving layer (src/app/) orchestrates and serves; it does not run bulk computations. Batch inference belongs in the src/pipelines/ layer, orchestrated by DVC/Airflow. |
| Operational simplicity | A dedicated HTTP batch endpoint would require its own rate-limiting, queue management, status polling, and result storage — all of which are already solved by Celery + DVC. |
| Reproducibility | Batch predictions run through DVC are versioned, logged to MLflow, and traceable. An ad-hoc HTTP batch endpoint would bypass this lifecycle. |
| Existing alternative | dvc repro batch_inference (or the Airflow DAG) already produces all scheduled-match predictions. Consumers query individual results via GET /predict/{match_id}. |
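
The latency-contract row can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: the 500 ms budget is the p95 target from the table, while the 40 ms per-match cost and 20 ms request overhead are made-up inputs, not measurements from this system. `max_sequential_batch` is a hypothetical helper.

```python
import math


def max_sequential_batch(
    budget_ms: float, per_match_ms: float, overhead_ms: float = 0.0
) -> int:
    """Largest number of matches that can be scored one after another in the budget."""
    if per_match_ms <= 0:
        raise ValueError("per_match_ms must be positive")
    available = budget_ms - overhead_ms
    # Clamp at zero: the overhead alone may already exceed the budget.
    return max(0, math.floor(available / per_match_ms))


# 500 ms is the p95 target from the table; 40 ms and 20 ms are assumed values.
limit = max_sequential_batch(budget_ms=500, per_match_ms=40, overhead_ms=20)
```

Under these assumed numbers only 12 matches fit in one request, so a batch of dozens would exceed the budget unless the work were parallelized or moved offline, which is the role the batch_inference stage already plays.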
Consequences
- GET /predict/{match_id} remains the only prediction retrieval endpoint.
- Bulk access to predictions is achieved by querying multiple IDs in parallel or reading the parquet artifact directly from the data layer.
- If a genuine bulk-serving use case emerges (e.g. odds aggregator integration), it should be evaluated as a separate ADR with explicit latency and reproducibility trade-offs.
- In docs/status.md, Known Limitation #2 ("No batch endpoint") is updated from a limitation to a deliberate design decision; see ADR-0006.
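For consumers who need several predictions at once, the "query multiple IDs in parallel" option might look like the following client-side sketch. `fetch_many` and `fake_fetch` are hypothetical names; the per-ID fetch function is injected so the example runs without a network, and a real client would replace the stub with an HTTP call to GET /predict/{match_id}.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, TypeVar

T = TypeVar("T")


def fetch_many(
    match_ids: Iterable[str],
    fetch_one: Callable[[str], T],
    max_workers: int = 8,
) -> Dict[str, T]:
    """Call fetch_one for each ID concurrently and return results keyed by ID."""
    ids = list(dict.fromkeys(match_ids))  # de-duplicate, keep first-seen order
    if not ids:
        return {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each ID with its result.
        return dict(zip(ids, pool.map(fetch_one, ids)))


def fake_fetch(match_id: str) -> dict:
    # Stand-in for an HTTP GET /predict/{match_id}; swap in a real client call.
    return {"match_id": match_id, "source": "stub"}


results = fetch_many(["m-001", "m-002", "m-001"], fake_fetch)
```

Each underlying request stays a single-match query, so the serving SLA is unchanged. For large offline workloads, reading the parquet artifact directly (for example with pandas.read_parquet) avoids the HTTP layer entirely.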