ADR-0006: No HTTP Batch Prediction Endpoint

Status: Accepted
Date: 2025-01-01
Deciders: MLOps team

Context

During Stage 2 of the portfolio build-out, the question arose whether to expose an HTTP batch prediction endpoint (e.g. `POST /predict/batch`) that accepts a list of match IDs or raw feature payloads and returns a list of predictions in a single request.

The DVC pipeline already has an offline `batch_inference` stage that writes `data/predictions/predictions.parquet` for all scheduled matches. Each result from that stage is available per match via `GET /predict/{match_id}`.
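As an illustration of the per-match retrieval path described above, the following is a minimal client-side sketch. Only the `GET /predict/{match_id}` route comes from this ADR; the base URL, the two-second client timeout, the JSON response body, and the helper names `prediction_url` and `fetch_prediction` are assumptions made for the example.

```python
import json
import urllib.request
from urllib.parse import quote

# Hypothetical base URL: the ADR does not state where the service is hosted.
BASE_URL = "http://localhost:8000"


def prediction_url(match_id, base_url=BASE_URL):
    """Build the per-match retrieval URL, escaping the ID for use in a path."""
    return f"{base_url.rstrip('/')}/predict/{quote(str(match_id), safe='')}"


def fetch_prediction(match_id, base_url=BASE_URL,
                     opener=urllib.request.urlopen, timeout=2.0):
    """GET one precomputed prediction and decode it as JSON.

    The JSON body and the timeout value are assumptions, not part of the ADR.
    `opener` is injectable so the function can be exercised without a server.
    """
    with opener(prediction_url(match_id, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Because the prediction was already computed by the offline stage, a call such as `fetch_prediction("some-match-id")` is only a lookup, which is what keeps it inside an interactive latency budget.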

Decision

We will not expose an HTTP batch prediction endpoint.

Rationale

| Concern | Detail |
| --- | --- |
| Latency contract | The serving SLA targets interactive, single-match queries (p95 < 500 ms). Batch processing of dozens of matches would violate this contract and complicate timeout handling. |
| Architectural boundary | The serving layer (`src/app/`) orchestrates and serves; it does not run bulk computations. Batch inference belongs in the `src/pipelines/` layer, orchestrated by DVC/Airflow. |
| Operational simplicity | A dedicated HTTP batch endpoint would require its own rate-limiting, queue management, status polling, and result storage, all of which are already solved by Celery + DVC. |
| Reproducibility | Batch predictions run through DVC are versioned, logged to MLflow, and traceable. An ad-hoc HTTP batch endpoint would bypass this lifecycle. |
| Existing alternative | `dvc repro batch_inference` (or the Airflow DAG) already produces all scheduled-match predictions. Consumers query individual results via `GET /predict/{match_id}`. |

Consequences

- `GET /predict/{match_id}` remains the only prediction retrieval endpoint.
- Bulk access to predictions is achieved by querying multiple IDs in parallel or reading the Parquet artifact directly from the data layer.
- If a genuine bulk-serving use case emerges (e.g. odds aggregator integration), it should be evaluated as a separate ADR with explicit latency and reproducibility trade-offs.
- `docs/status.md` Known Limitation #2 ("No batch endpoint") is updated from a limitation to a deliberate design decision (see ADR-0006).
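The "multiple IDs in parallel" option above could look roughly like the sketch below. It is not part of the project: `fetch_many`, its `fetch_one` callable (any function that performs one `GET /predict/{match_id}` request and returns the decoded result), and the default of eight workers are hypothetical choices for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def fetch_many(match_ids: Iterable[str],
               fetch_one: Callable[[str], dict],
               max_workers: int = 8) -> Dict[str, dict]:
    """Issue one single-match request per ID and collect the results by ID.

    `fetch_one` is a caller-supplied function (an assumption of this sketch)
    that retrieves the prediction for a single match ID.
    """
    ids = list(dict.fromkeys(match_ids))  # drop duplicates, keep first-seen order
    if not ids:
        return {}
    with ThreadPoolExecutor(max_workers=min(max_workers, len(ids))) as pool:
        # pool.map yields results in input order; an exception raised by a
        # worker is re-raised here when its result is consumed.
        return dict(zip(ids, pool.map(fetch_one, ids)))
```

Each request in the fan-out is still an ordinary single-match query, so the serving layer's latency contract applies to every call individually. Consumers that need every scheduled match at once may find it simpler to read `data/predictions/predictions.parquet` directly, for example with `pandas.read_parquet`.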