ADR-0006: No HTTP Batch Prediction Endpoint

Status: Accepted
Date: 2025-01-01
Deciders: MLOps team


Context

During Stage 2 of the portfolio build-out, the question arose whether to expose an HTTP batch prediction endpoint (e.g. POST /predict/batch) that accepts a list of match IDs or raw feature payloads and returns a list of predictions in a single request.

The DVC pipeline already has an offline batch_inference stage that produces a data/predictions/predictions.parquet file for all scheduled matches. Results from that stage are accessible per-match via GET /predict/{match_id}.
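To make the relationship between the offline stage and the per-match endpoint concrete, the following is a minimal sketch of a lookup over precomputed rows, not the project's actual handler. It assumes each row carries a `match_id` key and a `prediction` value; those column names, along with `build_index` and `lookup`, are hypothetical, since the real schema of the parquet file is not stated here. Rows are plain dicts so the example runs without a parquet reader.

```python
from __future__ import annotations

from typing import Any, Iterable, Mapping, Optional

Row = Mapping[str, Any]


def build_index(rows: Iterable[Row], key: str = "match_id") -> dict[str, Row]:
    """Index precomputed prediction rows by match ID (last row wins on duplicates)."""
    return {str(row[key]): row for row in rows}


def lookup(index: Mapping[str, Row], match_id: str) -> Optional[Row]:
    """Return the stored row for one match, or None if no row exists for it."""
    return index.get(str(match_id))


# Hypothetical rows standing in for the contents of predictions.parquet.
rows = [
    {"match_id": "m-001", "prediction": "home_win"},
    {"match_id": "m-002", "prediction": "draw"},
]
index = build_index(rows)
```

Under this sketch, a request for an ID that the batch stage has not produced returns `None`, which a serving layer would typically translate into a 404 rather than computing a prediction on demand.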


Decision

We will not expose an HTTP batch prediction endpoint.


Rationale

| Concern | Detail |
| --- | --- |
| Latency contract | The serving SLA targets interactive, single-match queries (p95 < 500 ms). Batch processing of dozens of matches would violate this contract and complicate timeout handling. |
| Architectural boundary | The serving layer (src/app/) orchestrates and serves; it does not run bulk computations. Batch inference belongs in the src/pipelines/ layer, orchestrated by DVC/Airflow. |
| Operational simplicity | A dedicated HTTP batch endpoint would require its own rate-limiting, queue management, status polling, and result storage, all of which are already solved by Celery + DVC. |
| Reproducibility | Batch predictions run through DVC are versioned, logged to MLflow, and traceable. An ad-hoc HTTP batch endpoint would bypass this lifecycle. |
| Existing alternative | dvc repro batch_inference (or the Airflow DAG) already produces all scheduled-match predictions. Consumers query individual results via GET /predict/{match_id}. |

Consequences

  • GET /predict/{match_id} remains the only prediction retrieval endpoint.
  • Consumers that need predictions in bulk either issue several GET /predict/{match_id} requests in parallel or read the parquet artifact (data/predictions/predictions.parquet) directly from the data layer.
  • If a genuine bulk-serving use case emerges (e.g. odds aggregator integration), it should be evaluated as a separate ADR with explicit latency and reproducibility trade-offs.
  • docs/status.md Known Limitation #2 ("No batch endpoint") is reclassified from a limitation to a deliberate design decision, with a pointer to ADR-0006.
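The parallel-query option in the consequences above could look roughly like the client-side sketch below. It is an illustration under stated assumptions, not a supported client: `BASE_URL`, `fetch_prediction`, and `fetch_many` are hypothetical names, and the endpoint is assumed to return a JSON body. Only the standard library is used, and the per-request function is injectable so the fan-out logic can be exercised without a running server.

```python
from __future__ import annotations

import json
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumption: the real host is deployment-specific


def fetch_prediction(match_id: str, timeout: float = 2.0) -> Any:
    """GET /predict/{match_id} and decode the body, assuming a JSON response."""
    url = f"{BASE_URL}/predict/{quote(str(match_id), safe='')}"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def fetch_many(
    match_ids: Iterable[str],
    fetch_one: Callable[[str], Any] = fetch_prediction,
    max_workers: int = 8,
) -> dict[str, Any]:
    """Fetch several single-match predictions concurrently, keyed by match ID."""
    # De-duplicate while keeping first-seen order.
    ids = list(dict.fromkeys(str(m) for m in match_ids))
    if not ids:
        return {}
    with ThreadPoolExecutor(max_workers=min(max_workers, len(ids))) as pool:
        # pool.map yields results in input order and re-raises the first
        # exception a worker hit when that result is consumed.
        return dict(zip(ids, pool.map(fetch_one, ids)))
```

Keeping `max_workers` modest matters here: each request is still held to the single-match latency contract, and an unbounded fan-out would recreate on the client side the bulk load this decision keeps out of the serving layer.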