Serving¶
Application Entry-point¶
lifespan(app)
async
¶
FastAPI lifespan: create DB tables and warm up the connection pool.
Source code in src/app/main.py
prometheus_metrics()
¶
Prometheus scrape endpoint.
Uses MultiProcessCollector when PROMETHEUS_MULTIPROC_DIR is set
(required for Gunicorn multi-worker deployments).
Source code in src/app/main.py
root()
async
¶
Return service name and links to the docs and health endpoints.
Database¶
Dependencies¶
get_token_header(x_api_key=None)
async
¶
Validate the X-API-Key request header.
Uses hmac.compare_digest for constant-time comparison to prevent
timing-based key enumeration attacks. Returns 401 when the header is
absent or does not match the configured secret (FASTAPI_HEADER_TOKEN).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x_api_key | Annotated[str \| None, Header(alias="x-api-key")] | Value of the | None |
Source code in src/app/dependencies.py
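The constant-time check described above can be sketched with the standard library alone. This is a minimal illustration, not the project's code: the helper name `api_key_is_valid` is hypothetical, and the real dependency additionally maps a failed check to a 401 response.

```python
import hmac
from typing import Optional


def api_key_is_valid(provided: Optional[str], expected: str) -> bool:
    """Return True only when the provided key equals the expected secret."""
    if provided is None:
        # Missing header: reject without comparing anything.
        return False
    # compare_digest runs in time that does not depend on where the two
    # inputs first differ. Encoding to bytes avoids the ASCII-only
    # restriction that applies to str arguments.
    return hmac.compare_digest(provided.encode("utf-8"), expected.encode("utf-8"))
```

A plain `provided == expected` would short-circuit on the first differing character, which is the timing signal the constant-time comparison is meant to hide.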
get_query_token(token=None)
async
¶
Validate the token query parameter.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| token | str \| None | Value of the | None |
Source code in src/app/dependencies.py
Routers¶
list_predictions(pred_lookup, future_only=Query(default=False, description='Return only future matches'), limit=Query(default=None, ge=1, le=5000, description='Max rows to return'), offset=Query(default=0, ge=0, description='Rows to skip'))
¶
Return rows from the in-memory predictions.parquet cache.
Includes both future and historical matches unless future_only is set. Only a fixed set of display columns is returned — feature vectors are omitted. Intended for the Data Explorer UI and diagnostic tooling.
Source code in src/app/routers/predict.py
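As a rough sketch of how `future_only`, `limit`, and `offset` could combine, the helper below filters and slices a list of row dicts. It assumes each row carries the optional `is_future` flag mentioned for predictions.parquet; the function name `select_rows` is hypothetical and the real handler may work on a DataFrame instead.

```python
from typing import Optional


def select_rows(rows: list, future_only: bool = False,
                limit: Optional[int] = None, offset: int = 0) -> list:
    """Filter to future matches if requested, then apply offset and limit."""
    if future_only:
        # Rows without the flag are treated as historical (assumption).
        rows = [row for row in rows if row.get("is_future")]
    # A limit of None means "no upper bound", matching default=None.
    end = None if limit is None else offset + limit
    return rows[offset:end]
```

Filtering before slicing keeps `offset` and `limit` relative to the filtered set, so paging through future matches does not skip rows.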
predict_precomputed(match_id, pred_lookup)
async
¶
Return the precomputed prediction for a match from predictions.parquet.
Predictions are produced by the batch_inference DVC stage which runs
model.predict() over all matches and saves the result to MinIO.
This endpoint reads directly from the in-memory cache — no Celery task,
no MLflow model call at request time.
Returns 404 if the match is not found in the latest batch output.
Source code in src/app/routers/predict.py
list_cards(cards_svc)
¶
Return all precomputed predictions merged with Fonbet odds in one response.
Combines predictions.parquet and fonbet_odds.parquet on match_id.
Each entry contains probabilities, predicted class, 1X2 odds, and a direct
Fonbet URL (populated once the linking pipeline runs with fonbet_sport_id).
Served from the in-memory cache — no MinIO call at request time unless the underlying files have changed. Declared as a sync handler so FastAPI runs it in a thread pool and does not block the uvicorn event loop during MinIO I/O.
Source code in src/app/routers/predict.py
list_region_roi(roi_svc)
¶
Return flat-stake ROI statistics per region.
Data is produced by the live-betting pipeline stage (triggered daily
by the soccer_ml_live_betting_01 Airflow DAG) and cached in memory
with a 60-second MinIO re-check interval. Returns an empty list when
roi_by_region.csv has not been produced yet.
Source code in src/app/routers/predict.py
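For readers unfamiliar with the term, flat-stake ROI is usually computed as total profit divided by total amount staked when every bet uses the same stake. The sketch below follows that common definition; the input keys (`region`, `odds`, `won`) and output keys are assumptions for illustration and are not taken from roi_by_region.csv.

```python
from collections import defaultdict


def roi_by_region(bets: list, stake: float = 1.0) -> list:
    """Aggregate flat-stake profit per region and convert it to ROI."""
    totals = defaultdict(lambda: {"n_bets": 0, "profit": 0.0})
    for bet in bets:
        entry = totals[bet["region"]]
        entry["n_bets"] += 1
        if bet["won"]:
            # Decimal odds: a winning bet returns stake * odds, so the
            # profit is stake * (odds - 1).
            entry["profit"] += stake * (bet["odds"] - 1.0)
        else:
            entry["profit"] -= stake
    return [
        {"region": region, "n_bets": t["n_bets"],
         "roi": t["profit"] / (stake * t["n_bets"])}
        for region, t in sorted(totals.items())
    ]
```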
list_odds(odds_svc)
async
¶
Return Fonbet 1X2 odds (odd_home, odd_draw, odd_away) for all matches.
Reads from fonbet_odds.parquet in the data-raw MinIO bucket.
Returns an empty list if the file has not been produced yet.
Source code in src/app/routers/predict.py
predict_by_match_id(match_id, lookup, stage, request)
¶
Submit a prediction task and wait synchronously for the result.
Features are read from match_features.parquet (in-memory cache).
Returns 200 OK with the full prediction result once the Celery
predict_match task completes (up to _SYNC_TIMEOUT seconds).
Returns 404 if match_id is not in the current batch feature output.
Use ?stage=challenger to target the challenger model.
Source code in src/app/routers/predict.py
model_info(stage, request)
¶
Submit a model-info task and wait synchronously for the result.
Returns 200 OK with MLflow model metadata once the Celery
get_model_info task completes (up to _SYNC_TIMEOUT seconds).
The resolved result matches the ModelInfoResponse schema.
Results are cached in-process for _MODEL_INFO_TTL seconds per stage
to avoid repeatedly dispatching Celery tasks for static metadata.
Source code in src/app/routers/predict.py
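The per-stage caching described above could look roughly like the following. This is a sketch under assumptions: the class name `StageTTLCache` is hypothetical, and the `compute` callable stands in for dispatching the Celery task and waiting for its result.

```python
import time
from typing import Any, Callable


class StageTTLCache:
    """Cache one value per stage and recompute it after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries = {}   # stage -> (stored_at, value)

    def get_or_compute(self, stage: str, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        cached = self._entries.get(stage)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        # Cache miss or expired entry: run the expensive call once.
        value = compute()
        self._entries[stage] = (now, value)
        return value
```

Keying by stage keeps the champion and challenger metadata independent, so a refresh of one does not evict the other.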
get_drift_status()
¶
Return the latest feature drift summary and refresh the Prometheus gauge.
Reads reports/drift/latest.json from the local filesystem first
(DVC-managed runs), then falls back to MinIO (Kubernetes / Airflow runs).
Source code in src/app/routers/monitoring.py
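The local-first lookup with a remote fallback can be sketched as below. The function name `load_drift_summary` and the `fetch_remote` callable are hypothetical; in the service the fallback would presumably wrap a MinIO read, which is abstracted away here so the example stays self-contained.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def load_drift_summary(local_path: Path,
                       fetch_remote: Callable[[], Optional[str]]) -> Optional[dict]:
    """Prefer the local JSON file; otherwise ask the remote source."""
    if local_path.is_file():
        return json.loads(local_path.read_text(encoding="utf-8"))
    # The fallback returns the raw JSON text, or None when no report exists.
    raw = fetch_remote()
    return json.loads(raw) if raw is not None else None
```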
get_queue_stats()
¶
Return active/scheduled/reserved task counts and worker stats.
Source code in src/app/routers/monitoring.py
get_workers()
¶
Return active queues and ping status for all connected workers.
Source code in src/app/routers/monitoring.py
get_task_status(task_id)
¶
Return the current status and result for a Celery task.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| task_id | str | Celery task UUID returned when the task was submitted. | required |
Returns:
| Type | Description |
|---|---|
dict
|
Dict with |
dict
|
|
Source code in src/app/routers/monitoring.py
list_evidently_reports()
¶
List all Evidently HTML reports stored in MinIO, newest first.
Returns a list of dicts with filename, last_modified, and
url (presigned, valid for 1 hour).
Source code in src/app/routers/monitoring.py
open_evidently_report(filename)
¶
Redirect to a presigned URL for an Evidently HTML report.
Opens the report directly in the browser when accessed as a link.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filename | str | Report filename, e.g. | required |
Source code in src/app/routers/monitoring.py
get_livescores(year=Query(default=None, ge=1998, le=2100), month=Query(default=None, ge=1, le=12), limit=Query(default=None, ge=1, le=10000), offset=Query(default=0, ge=0))
¶
Return all matches filtered by year and optionally month.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| year | int | Calendar year to filter by. Defaults to the current year. | Query(default=None, ge=1998, le=2100) |
| month | int | Month (1–12) to filter within year. When omitted the full year is returned. | Query(default=None, ge=1, le=12) |
| limit | int | Maximum number of rows to return. | Query(default=None, ge=1, le=10000) |
| offset | int | Number of rows to skip (for pagination). | Query(default=0, ge=0) |
Returns:
| Type | Description |
|---|---|
| list[MatchRawLive] | List of MatchRawLive objects. |
Source code in src/app/routers/livescores.py
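One way to turn the `year` and `month` parameters into a filter is to derive a half-open date range, as sketched below. The helper name `date_range` and the half-open convention are assumptions; the actual query may be expressed differently.

```python
from datetime import date
from typing import Optional, Tuple


def date_range(year: Optional[int] = None, month: Optional[int] = None,
               today: Optional[date] = None) -> Tuple[date, date]:
    """Return (start, end) such that start <= match_date < end."""
    if year is None:
        # Mirrors the documented default: fall back to the current year.
        year = (today or date.today()).year
    if month is None:
        return date(year, 1, 1), date(year + 1, 1, 1)
    if month == 12:
        # December rolls over into January of the following year.
        return date(year, 12, 1), date(year + 1, 1, 1)
    return date(year, month, 1), date(year, month + 1, 1)
```

A half-open range avoids having to know how many days each month has.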
healthcheck()
async
¶
Healthcheck endpoint for Kubernetes probes. Returns information about the application state.
Returns:
| Type | Description |
|---|---|
| | HealthCheckResponse with status, version, worker PID, and current process memory usage in MB. |
Source code in src/app/routers/healthcheck.py
Services¶
FeatureLookupService
¶
Load precomputed features for all matches from the batch inference output.
The parquet file is produced by the batch_inference DVC stage and has
the match id as its index. It contains both upcoming matches and
finished matches (with outcome_1x2, homeScore, awayScore).
The service is loaded lazily on the first call and cached in-process.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| features_path | Path \| None | Absolute path to | None |
Source code in src/app/services/predict.py
features_computed_at
property
¶
Return the UTC datetime when the feature file was last written.
Corresponds to the last batch_inference DVC stage run.
Returns:
| Type | Description |
|---|---|
datetime | None
|
UTC-aware |
datetime | None
|
or |
get_features(match_id)
¶
Return the feature dict for match_id, or None if not found.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| match_id | int | Integer match identifier (index of the parquet file). | required |
Returns:
| Type | Description |
|---|---|
dict | None
|
Dict of feature name → value with NaN entries removed, |
dict | None
|
or |
Source code in src/app/services/predict.py
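Dropping NaN entries from a row dict, as the return description mentions, can be done with a small comprehension. The helper name `drop_nan` is hypothetical and the sketch assumes missing values show up as float NaN.

```python
import math
from typing import Any


def drop_nan(row: dict) -> dict:
    """Return a copy of row without entries whose value is a float NaN."""
    def is_nan(value: Any) -> bool:
        # Only floats can be NaN; the isinstance check keeps strings and
        # None from being passed to math.isnan, which would raise TypeError.
        return isinstance(value, float) and math.isnan(value)

    return {key: value for key, value in row.items() if not is_nan(value)}
```

Removing NaN values also matters for serialisation, since NaN is not valid in strict JSON.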
list_matches()
¶
Return a lightweight list of upcoming matches for UI display.
Returns:
| Type | Description |
|---|---|
list[dict]
|
List of dicts with |
list[dict]
|
|
list[dict]
|
|
Source code in src/app/services/predict.py
PredictionLookupService
¶
Serve precomputed batch predictions from the batch_inference DVC stage output.
Mirrors the FeatureLookupService caching pattern:
- Checks local file first (dev / CI).
- Falls back to MinIO with a configurable re-check interval.
The parquet file is indexed by match id and must contain columns:
proba_home, proba_draw, proba_away, predicted_class,
predicted_label, optionally is_future, startTimeUtc,
homeTeamName, awayTeamName, model_run_id, model_stage.
Source code in src/app/services/predict.py
predictions_computed_at
property
¶
Return the UTC datetime when predictions.parquet was last written.
Returns:
| Type | Description |
|---|---|
datetime | None
|
UTC-aware |
datetime | None
|
or |
get_prediction(match_id)
¶
Return the prediction dict for match_id, or None if not found.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| match_id | int | Integer match identifier (index of the parquet file). | required |
Returns:
| Type | Description |
|---|---|
dict | None
|
Dict of column name → value with NaN entries removed, |
dict | None
|
or |
Source code in src/app/services/predict.py
list_matches()
¶
Return all prediction rows as a list of dicts for diagnostics.
Returns:
| Type | Description |
|---|---|
list[dict]
|
List of dicts with |
list[dict]
|
Empty list when no predictions file has been loaded. |
Source code in src/app/services/predict.py
FonbetOddsService
¶
Serve Fonbet 1X2 odds from fonbet_odds.parquet
in the data-raw MinIO bucket.
Produced by the fetch_fonbet_odds pipeline stage. Indexed by
match_id. Follows the same lazy-load and interval-based refresh
pattern as PredictionLookupService.
Source code in src/app/services/predict.py
list_odds()
¶
Return 1X2 odds for all matches.
Returns:
| Type | Description |
|---|---|
list[dict]
|
List of dicts with |
list[dict]
|
and |
list[dict]
|
with |
Source code in src/app/services/predict.py
MatchCardService
¶
Merged match cards: precomputed predictions + Fonbet odds in one place.
Merges predictions.parquet (via PredictionLookupService) with
fonbet_odds.parquet (via FonbetOddsService) on match_id
and holds the result in memory. Rebuilds when either source reports a new
_mtime value, so it reuses their existing cache logic without
adding extra MinIO calls.
Source code in src/app/services/predict.py
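The merge-and-rebuild behaviour can be illustrated with plain dicts keyed by match_id. This is a sketch under assumptions: `build_cards` and `CardCache` are hypothetical names, the join is treated as a left join from predictions to odds, and the two mtime values are passed in explicitly rather than read from the source services.

```python
from typing import Any, Optional


def build_cards(predictions: dict, odds: dict) -> list:
    """Left-join odds onto predictions by match_id."""
    cards = []
    for match_id, prediction in predictions.items():
        card = {"match_id": match_id, **prediction}
        # Matches without odds keep only their prediction fields.
        card.update(odds.get(match_id, {}))
        cards.append(card)
    return cards


class CardCache:
    """Rebuild the merged list only when a source mtime changes."""

    def __init__(self) -> None:
        self._signature: Optional[tuple] = None
        self._cards: list = []

    def get(self, predictions: dict, pred_mtime: Any,
            odds: dict, odds_mtime: Any) -> list:
        signature = (pred_mtime, odds_mtime)
        if signature != self._signature:
            self._cards = build_cards(predictions, odds)
            self._signature = signature
        return self._cards
```

Comparing the pair of mtimes is cheap, so the check can run on every request while the merge itself runs only after a source file changes.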
list_cards()
¶
Return merged match cards as a JSON-safe list of dicts.
Each entry contains prediction probabilities, predicted class, Fonbet
1X2 odds, and a direct Fonbet URL (when linking pipeline has run with
the fonbet_sport_id column).
Source code in src/app/services/predict.py
RegionRoiService
¶
Serve regional ROI data from roi_by_region.csv in the predictions bucket.
Written by the live-betting pipeline stage and uploaded to MinIO under
analysis/live_betting/roi_by_region.csv. Follows the same lazy-load
and interval-based refresh pattern as PredictionLookupService.
Returns an empty list when the file has not been produced yet.
Source code in src/app/services/predict.py
list_region_roi()
¶
Return ROI per region as a JSON-safe list of dicts.
Returns:
| Type | Description |
|---|---|
list[dict]
|
List of dicts with |
list[dict]
|
optionally |
list[dict]
|
file has not been produced yet. |
Source code in src/app/services/predict.py
PredictionService
¶
Loads and serves a model from the MLflow Model Registry.
The model is loaded lazily on the first call to predict and then
cached in-process for the lifetime of the worker.
Source code in src/app/services/predict.py
load()
¶
Eagerly load the model. Safe to call multiple times (idempotent).
Call this during application startup (e.g. FastAPI lifespan or
Celery worker_process_init) to avoid paying the cold-start
penalty on the first user request.
Returns:
| Type | Description |
|---|---|
Any
|
The loaded |
Source code in src/app/services/predict.py
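An idempotent eager load can be sketched with a lock and a double check, as below. The class name `LazyModel` and the `loader` callable are hypothetical stand-ins; the real service loads from the MLflow Model Registry, which is not reproduced here. The sketch also assumes the loader never returns None.

```python
import threading
from typing import Any, Callable


class LazyModel:
    """Load a model at most once, whether eagerly or on first use."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader
        self._model: Any = None
        self._lock = threading.Lock()

    def load(self) -> Any:
        # Fast path: already loaded, no locking needed.
        if self._model is None:
            with self._lock:
                # Re-check inside the lock so two threads that both saw
                # None do not both run the expensive loader.
                if self._model is None:
                    self._model = self._loader()
        return self._model
```

Calling `load()` from a startup hook pays the cost up front; later calls simply return the cached object.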
get_model_info()
¶
Return model metadata from the MLflow Model Registry.
Queries the registered model for the configured stage, then fetches run metrics and params. Does NOT require the model to be loaded.
Returns:
| Type | Description |
|---|---|
dict
|
Dict with |
dict
|
|
Source code in src/app/services/predict.py
predict(features, match_id=None, features_computed_at=None)
¶
Run inference for a single match.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| features | dict | Feature dict matching model input schema. | required |
| match_id | int \| None | Optional identifier for downstream tracing. | None |
| features_computed_at | datetime \| None | UTC timestamp when features were produced (batch_inference stage). Stored in the response for traceability. | None |
Returns:
| Type | Description |
|---|---|
dict
|
Dict compatible with |
Source code in src/app/services/predict.py
Schemas¶
PredictRequest
¶
Bases: BaseModel
Input features for a single match prediction.
Features must match the model's training schema exactly. The model expects rolling-window difference features (side='diff', window=5) plus a categorical 'sex' column (0=men, 1=women).
Source code in src/app/schemas/predict.py
PredictionResult
¶
Bases: BaseModel
Outcome probabilities and metadata for a single match prediction.
Source code in src/app/schemas/predict.py
PredictResponse
¶
Bases: BaseModel
Full prediction response including match identity and inference metadata.
Source code in src/app/schemas/predict.py
AsyncPredictRequest
¶
AsyncPredictResponse
¶
Bases: BaseModel
Returned immediately after submitting an async prediction task.
Source code in src/app/schemas/predict.py
ModelInfoResponse
¶
Bases: BaseModel
MLflow model metadata returned by GET /predict/model/info.
Source code in src/app/schemas/predict.py
RegionRoiEntry
¶
Bases: BaseModel
ROI statistics for a single region, served by GET /predict/region-roi/.
Source code in src/app/schemas/predict.py
PrecomputedPredictResponse
¶
Bases: BaseModel
Response for GET /predict/precomputed/{match_id}.
Served directly from predictions.parquet produced by the batch_inference DVC stage — no Celery task, no MLflow model call at request time.
Source code in src/app/schemas/predict.py
LivescoresUpdateData
¶
Bases: BaseModel
Parameters for triggering a WhoScored livescores database update.
Source code in src/app/schemas/models.py
MatchRawLive
¶
Bases: BaseModel
Projected subset of MatchRaw for live-scores display.
Source code in src/app/schemas/models.py
HealthCheckResponse
¶
Bases: BaseModel
Schema for healthcheck endpoint response.
Source code in src/app/schemas/healthcheck.py
Celery Tasks¶
Celery task: asynchronous match outcome prediction.
Submitted by POST /predict/async/ and executed by the ml Celery worker.
The result is stored in the Celery result backend and can be retrieved via
GET /monitoring/task_status/{task_id}.
The task result has the same shape as PredictResponse so the Streamlit
polling page can display it directly.
Architecture note¶
The PredictionService is initialised once per worker process via the
worker_process_init Celery signal. This avoids loading the MLflow model
(potentially hundreds of MB from MinIO) on every task invocation.
predict_match(self, match_id, features, features_computed_at=None, model_stage=None)
¶
Run 1×2 inference for match_id using pre-computed features.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| match_id | int | Identifier used for logging and response tracing. | required |
| features | dict | Feature dict matching the model's input schema, produced by the | required |
| features_computed_at | str \| None | ISO-8601 UTC string of when the features were computed (batch_inference mtime). Stored in the response for end-to-end traceability. | None |
| model_stage | str \| None | MLflow alias/stage to use (e.g. "champion", "challenger"). Defaults to | None |
Returns:
| Type | Description |
|---|---|
dict
|
Dict compatible with |
Source code in src/app/tasks/predict.py
get_model_info(self, model_stage=None)
¶
Retrieve MLflow model metadata from the registry.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model_stage | str \| None | Stage/alias to query. Defaults to | None |
Returns:
| Type | Description |
|---|---|
dict
|
Dict compatible with |
Source code in src/app/tasks/predict.py
Prometheus Metrics¶
Prometheus metrics registry for the SoccerPredictAI service.
All metric objects are defined here as module-level singletons so they are shared across the FastAPI app and Celery worker within the same process.
Gunicorn multiprocess note¶
When running under Gunicorn with multiple workers, prometheus-client
requires PROMETHEUS_MULTIPROC_DIR to be set to a writable directory.
The /metrics endpoint in main.py uses MultiProcessCollector
automatically when that variable is present.
Worker¶
Celery worker entrypoint for the ML worker.
Handles ML inference tasks. Loads the MLflow model into memory on startup
via worker_process_init signal defined in tasks.predict.
Start command
celery -A app.worker_ml:celery_app worker -Q ml
App-layer Storage¶
create_client_s3()
¶
Return a boto3 S3 client configured for the MinIO endpoint.
Source code in src/app/data/storage.py
save_file_to_minio(file_path, bucket_name, object_name=None)
¶
Upload a local file to a MinIO bucket.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| file_path | | Path to the local file to upload. | required |
| bucket_name | | Target MinIO bucket name. | required |
| object_name | | Object key in the bucket. Defaults to the basename of file_path. | None |
Returns:
| Type | Description |
|---|---|
|
|
Raises:
| Type | Description |
|---|---|
| Exception | Re-raised after logging on upload failure. |
Source code in src/app/data/storage.py
save_json_to_minio(data, bucket_name, object_name)
¶
Serialise data to JSON and upload it to MinIO.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | | JSON-serialisable object. | required |
| bucket_name | | Target MinIO bucket name. | required |
| object_name | | Object key in the bucket. | required |
Returns:
| Type | Description |
|---|---|
|
|
Source code in src/app/data/storage.py
save_binary_to_minio(data, bucket_name, object_name)
¶
Upload a binary file-like object to MinIO.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | | File-like object opened in binary mode (must support | required |
| bucket_name | | Target MinIO bucket name. | required |
| object_name | | Object key in the bucket. | required |
Returns:
| Type | Description |
|---|---|
|
|
Source code in src/app/data/storage.py
save_text_to_minio(data, bucket_name, object_name)
¶
Upload a plain-text string to MinIO.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | | String content to upload. | required |
| bucket_name | | Target MinIO bucket name. | required |
| object_name | | Object key in the bucket. | required |
Returns:
| Type | Description |
|---|---|
|
|
Source code in src/app/data/storage.py
save_dataframe_to_minio(df, bucket_name, object_name)
¶
Serialise a DataFrame to CSV and upload it to MinIO.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | | pandas DataFrame to upload. | required |
| bucket_name | | Target MinIO bucket name. | required |
| object_name | | Object key in the bucket. | required |
Returns:
| Type | Description |
|---|---|
|
|
Source code in src/app/data/storage.py
get_file_from_minio(bucket_name, object_name, file_path=None)
¶
Download an object from MinIO to a local file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| bucket_name | | Source MinIO bucket name. | required |
| object_name | | Object key to download. | required |
| file_path | | Local destination path. Defaults to object_name. | None |
Returns:
| Type | Description |
|---|---|
| | The resolved local file_path. |
Raises:
| Type | Description |
|---|---|
| Exception | Re-raised after logging on download failure. |
Source code in src/app/data/storage.py
get_binary_from_minio(bucket_name, object_name)
¶
Retrieve binary data from MinIO and return it as bytes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| bucket_name | | Source MinIO bucket name. | required |
| object_name | | Object key to retrieve. | required |
Returns:
| Type | Description |
|---|---|
| | Raw bytes content of the object. |
Source code in src/app/data/storage.py
get_text_from_minio(bucket_name, object_name)
¶
Retrieve a text object from MinIO and return it as a string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| bucket_name | | Source MinIO bucket name. | required |
| object_name | | Object key to retrieve. | required |
Returns:
| Type | Description |
|---|---|
| | String content of the object. |
Source code in src/app/data/storage.py
get_json_from_minio(bucket_name, object_name)
¶
Download a JSON object from MinIO and parse it.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| bucket_name | | Source MinIO bucket name. | required |
| object_name | | Object key to retrieve. | required |
Returns:
| Type | Description |
|---|---|
| | Parsed Python object (dict or list). |
Source code in src/app/data/storage.py
get_dataframe_from_minio(bucket_name, object_name)
¶
Download a CSV object from MinIO and parse it as a DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| bucket_name | | Source MinIO bucket name. | required |
| object_name | | Object key to retrieve. | required |
Returns:
| Type | Description |
|---|---|
| | pandas DataFrame parsed from the CSV content. |
Source code in src/app/data/storage.py
file_exists_in_minio(bucket_name, object_name)
¶
Check whether an object exists in a MinIO bucket.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| bucket_name | | MinIO bucket to query. | required |
| object_name | | Object key to check. | required |
Returns:
| Type | Description |
|---|---|
|
|
Source code in src/app/data/storage.py
list_files_in_minio(bucket_name, prefix='')
¶
List all object keys in a MinIO bucket under the given prefix.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| bucket_name | | MinIO bucket to list. | required |
| prefix | | Key prefix to filter results. Defaults to | '' |
Returns:
| Type | Description |
|---|---|
| | List of object key strings. |
Source code in src/app/data/storage.py
Configuration¶
Whoscored
¶
Bases: BaseModel
Root container for WhoScored livescores API response schemas.
Source code in src/app/config/validate.py
Livescores
¶
Bases: BaseModel
WhoScored livescores feed payload.
Source code in src/app/config/validate.py
MatchArgs
¶
Bases: BaseModel
Arguments payload for a WhoScored match centre request.
Source code in src/app/config/validate.py
MatchHeader
¶
Bases: BaseModel
Header metadata for a WhoScored match entry.
Source code in src/app/config/validate.py
parse_datetime(v)
classmethod
¶
Parse a datetime string in dd/mm/YYYY HH:MM:SS format.
Source code in src/app/config/validate.py
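The dd/mm/YYYY HH:MM:SS format maps directly onto a `strptime` pattern. The sketch below is a standalone function rather than the Pydantic classmethod validator documented above, and the constant name `WHOSCORED_FORMAT` is an assumption made for the example.

```python
from datetime import datetime

# strptime directives for day/month/four-digit-year and 24-hour time.
WHOSCORED_FORMAT = "%d/%m/%Y %H:%M:%S"


def parse_datetime(value):
    """Parse a dd/mm/YYYY HH:MM:SS string; pass datetimes through unchanged."""
    if isinstance(value, datetime):
        return value
    # Raises ValueError when the string does not match the format.
    return datetime.strptime(value, WHOSCORED_FORMAT)
```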
Incident
¶
Bases: BaseModel
A single match incident (goal, card, or substitution).
Source code in src/app/config/validate.py
Match
¶
Bases: BaseModel
WhoScored livescores summary entry for a single match.
Source code in src/app/config/validate.py
handle_empty_datetime(v)
classmethod
¶
Coerce empty string to None for optional datetime fields.
Tournament
¶
Bases: BaseModel
A tournament stage grouping a set of matches.
Source code in src/app/config/validate.py
Localization
¶
Bases: BaseModel
UI label strings returned by the WhoScored livescores feed.
Source code in src/app/config/validate.py
Meta
¶
MatchCentreData
¶
Bases: BaseModel
Full match centre payload including events, stats, and formations.
Source code in src/app/config/validate.py
parse_multi_format_datetime(v)
¶
Parse datetime strings in multiple WhoScored date formats.
Source code in src/app/config/validate.py
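Trying several formats in turn is a common way to handle feeds with inconsistent timestamps. In the sketch below the candidate formats are illustrative guesses, not the formats the project actually accepts, and the function is named `parse_any_datetime` to make clear it is not the documented implementation. It also folds in the empty-string-to-None coercion that `handle_empty_datetime` describes.

```python
from datetime import datetime
from typing import Optional

# Hypothetical candidates; the first format that parses wins.
CANDIDATE_FORMATS = (
    "%Y-%m-%dT%H:%M:%S",
    "%d/%m/%Y %H:%M:%S",
    "%Y-%m-%d",
)


def parse_any_datetime(value: Optional[str]) -> Optional[datetime]:
    """Return a datetime, or None for missing or empty input."""
    if value is None or value == "":
        return None
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue  # try the next candidate
    raise ValueError(f"Unrecognised datetime format: {value!r}")
```

The candidates are chosen so that no string can match two of them, which keeps the result independent of the order in which they are tried.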
Referee
¶
Bases: BaseModel
Match referee information from the WhoScored match centre.
Source code in src/app/config/validate.py
Team
¶
Bases: BaseModel
Team data payload from the WhoScored match centre.
Source code in src/app/config/validate.py
Formation
¶
Bases: BaseModel
Tactical formation snapshot for a given match period.
Source code in src/app/config/validate.py
Position
¶
TeamStats
¶
Bases: BaseModel
Aggregate per-minute statistics for a team in a match.
Source code in src/app/config/validate.py
IncidentEvents
¶
Bases: BaseModel
A single event entry from the WhoScored match centre event stream.
Source code in src/app/config/validate.py
ValueDisplayName
¶
Bases: BaseModel
Integer code paired with a human-readable display name.
Source code in src/app/config/validate.py
Qualifier
¶
Bases: BaseModel
Event qualifier providing additional context for an incident.
Source code in src/app/config/validate.py
ShotStat
¶
ZoneStats
¶
ShotZones
¶
Bases: BaseModel
Shot outcome counts broken down by pitch zone for a team.
Source code in src/app/config/validate.py
PlayerStats
¶
Bases: BaseModel
Per-minute player statistics from the WhoScored match centre.
Source code in src/app/config/validate.py
Player
¶
Bases: BaseModel
Player entry in a team's squad for a specific match.
Source code in src/app/config/validate.py
Scores
¶
Bases: BaseModel
Score totals at each period boundary for a team.
Source code in src/app/config/validate.py
ExpandedMinutes
¶
Bases: BaseModel
Mapping from real match minute to expanded minute index per period.
Source code in src/app/config/validate.py
Offer
¶
Bases: BaseModel
A bookmaker odds offer for a single bet outcome.
Source code in src/app/config/validate.py
BetOption
¶
Bases: BaseModel
A named bet option with one or more bookmaker offers.
Source code in src/app/config/validate.py
Bets
¶
Bases: BaseModel
1X2 odds container with home, draw, and away bet options.
Source code in src/app/config/validate.py
TopParameters
¶
Bases: BaseModel
Fonbet API parameters controlling top-events and sports filtering.
Source code in src/app/config/validate_bets.py
TournamentInfo
¶
Bases: BaseModel
Fonbet tournament metadata entry.
Source code in src/app/config/validate_bets.py
Sport
¶
Bases: BaseModel
Fonbet sport or competition node from the events catalogue.
Source code in src/app/config/validate_bets.py
Event
¶
Bases: BaseModel
Fonbet betting event entry (match or tournament node).
Source code in src/app/config/validate_bets.py
CustomFactor
¶
Bases: BaseModel
Custom Fonbet betting factors for a specific event.
Source code in src/app/config/validate_bets.py
PariEvents
¶
Bases: BaseModel
Root Fonbet events catalogue packet from the live odds feed.
Source code in src/app/config/validate_bets.py
ColumnType
¶
Bases: str, Enum
SQL column types
Source code in src/app/config/database.py
ColumnConstraint
¶
Column
¶
Bases: BaseModel
Database column definition
Source code in src/app/config/database.py
to_sql()
¶
Convert column definition to SQL syntax
Source code in src/app/config/database.py
Table
¶
Bases: BaseModel
Database table definition.
Source code in src/app/config/database.py
create_statement()
¶
DatabaseSettings
¶
Bases: BaseSettings
PostgreSQL connection settings for all project databases.
Reads credentials and connection parameters from SOCCER_POSTGRES_*
env vars. Toggle use_internal to switch between external
(docker-compose dev) and in-cluster (K8s) host/port.
Source code in src/app/config/database.py
get_connection_string(db_name=None)
¶
Build a PostgreSQL connection URL for db_name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| db_name | Optional[str] | Target database name. Defaults to | None |
Returns:
| Type | Description |
|---|---|
| str | SQLAlchemy-compatible ``postgresql://user:pass@host:port/db`` connection string. |
Source code in src/app/config/database.py
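The URL shape described above can be sketched with the standard library alone. This is an illustration, not the project's implementation: `build_connection_string` and its parameters (including `default_db`, which stands in for whatever default the settings class uses) are hypothetical, and percent-encoding the credentials is an assumption about good practice, not something the source confirms.

```python
from typing import Optional
from urllib.parse import quote


def build_connection_string(
    user: str,
    password: str,
    host: str,
    port: int,
    default_db: str,
    db_name: Optional[str] = None,
) -> str:
    """Return a postgresql://user:pass@host:port/db URL."""
    # Fall back to the configured default when no database name is given.
    target = db_name or default_db
    # Percent-encode credentials so characters such as "@" or "/" stay valid in a URL.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{target}"
    )


print(build_connection_string("soccer", "p@ss/word", "localhost", 5432, "main"))
```

In this sketch, a `use_internal`-style toggle would only change which `host` and `port` values are passed in; the URL format itself stays the same.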
MLFlowSettings
¶
Bases: BaseSettings
MLflow tracking server and model registry settings.
Credentials and server URL are read from MLFLOW_* env vars.
model_name and model_stage default to values from the
inference block in params.yaml and can be overridden via
env vars for CI/CD or hotfixes.
Source code in src/app/config/mlflow.py
effective_stages
property
¶
Stages to pre-load in the ML worker.
Priority:
1. MLFLOW_MODEL_STAGES env var (CSV), for CI/CD overrides.
2. inference.model_stages list from params.yaml.
3. Fallback: [model_stage].
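The priority order can be expressed as a small pure function. This is a minimal sketch, not the property's actual code: `resolve_stages` and its arguments are hypothetical, with `params_stages` standing in for the inference.model_stages list after params.yaml has been loaded.

```python
import os
from typing import List, Mapping, Optional, Sequence


def resolve_stages(
    env: Mapping[str, str],
    params_stages: Optional[Sequence[str]],
    model_stage: str,
) -> List[str]:
    """Pick the stages to pre-load, following the documented priority order."""
    # 1. A non-empty CSV in MLFLOW_MODEL_STAGES wins.
    raw = env.get("MLFLOW_MODEL_STAGES", "")
    from_env = [stage.strip() for stage in raw.split(",") if stage.strip()]
    if from_env:
        return from_env
    # 2. Otherwise use the list from params.yaml, if there is one.
    if params_stages:
        return list(params_stages)
    # 3. Fall back to the single configured stage.
    return [model_stage]


# Passing the environment in explicitly keeps the function easy to test.
print(resolve_stages(os.environ, ["Staging"], "Production"))
```

Treating a blank or whitespace-only env var the same as an unset one is a choice made for this sketch; the real property may behave differently.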
ScraperSettings
¶
Bases: BaseSettings
Selenoid scraper infrastructure settings.
Reads the Selenoid server IP from the SCRAPER_IP env var.
Source code in src/app/config/scraper.py
SecuritySettings
¶
Bases: BaseSettings
API authentication and CORS configuration.
Reads the header token (FASTAPI_HEADER_TOKEN), query token
(FASTAPI_QUERY_TOKEN), and the CORS allow-list CSV
(CORS_ALLOWED_ORIGINS) from env vars.
Source code in src/app/config/security.py
cors_allowed_origins
property
¶
Parse the CORS allow-list from the CORS_ALLOWED_ORIGINS CSV env var.
Returns:
| Type | Description |
|---|---|
| list[str] | Empty list if unset (blocks all cross-origin requests), … strings for explicit allow-lists. |
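Parsing a CSV allow-list of this kind is straightforward. The helper below is a hedged sketch covering only the two cases the description spells out (unset, and an explicit list); `parse_allowed_origins` is a hypothetical name, and any other values the real property accepts are not modelled.

```python
from typing import List


def parse_allowed_origins(raw: str) -> List[str]:
    """Split a comma-separated origin list, dropping blanks and stray whitespace."""
    return [origin.strip() for origin in raw.split(",") if origin.strip()]


# An unset or empty variable yields an empty allow-list.
print(parse_allowed_origins(""))
# An explicit list yields one entry per origin.
print(parse_allowed_origins("https://app.example.com, https://admin.example.com"))
```

The resulting list is the shape expected by the `allow_origins` argument of FastAPI's `CORSMiddleware`.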
MinioSettings
¶
Bases: BaseSettings
MinIO / S3-compatible object storage settings.
Reads credentials and endpoint URL from MINIO_* env vars.
All project buckets are declared explicitly so that references
are discoverable and auditable in one place.
Source code in src/app/config/storage.py
storage_options
property
¶
Return fsspec/s3fs storage options for pandas/pyarrow I/O.
Returns:
| Type | Description |
|---|---|
| dict | Dict with … (containing … |
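For orientation, this is the kind of dictionary that s3fs accepts when pointed at a MinIO endpoint. It is a sketch under assumptions: the key names (`key`, `secret`, `client_kwargs` with `endpoint_url`) are the ones s3fs understands, but the source does not show which keys this property actually returns, and `build_storage_options` is a hypothetical helper.

```python
from typing import Any, Dict


def build_storage_options(
    access_key: str, secret_key: str, endpoint_url: str
) -> Dict[str, Any]:
    """Build fsspec/s3fs options for an S3-compatible endpoint such as MinIO."""
    return {
        "key": access_key,
        "secret": secret_key,
        # client_kwargs is forwarded to the underlying S3 client.
        "client_kwargs": {"endpoint_url": endpoint_url},
    }


opts = build_storage_options("minio-user", "minio-pass", "http://localhost:9000")
print(opts["client_kwargs"]["endpoint_url"])

# Typical use with pandas (needs pandas and s3fs; the bucket name is made up):
# import pandas as pd
# df = pd.read_parquet("s3://some-bucket/predictions.parquet", storage_options=opts)
```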