Continuous System Audits

This page documents the audit-cycle practice used to keep the system honest. The procedure itself is encoded as a skill (audit-system); the artifacts it produces live under reports/validation/.

Status: Operational. Five cycles have been executed to date: four full and one delta.

What an audit cycle is

An audit cycle is a numbered, read-only inspection of every layer of the system at a point in time. It does not change code — it only observes and reports mismatches between intended design (docs / contracts) and implemented behaviour (code).

The cycle is reproducible because the procedure is captured as a skill, not as a freeform chat. Re-running the same skill on the same code produces structurally identical reports — so cycle-over-cycle deltas are meaningful.

Audit catalog (00 – 14)

Each audit number has a fixed scope. A full cycle runs all 15 in order; a delta cycle runs only the audits relevant to recent changes.

#	Audit	Scope
00	System	High-level architecture, component map, key risks
01	Data	Ingestion, storage contracts, versioning, freshness
02	Features	Feature pipeline, leakage, train/inference consistency
03	Training & Evaluation	CV/split, model selection, metrics
04	Pipeline (DVC + Hydra)	DVC DAG, params, reproducibility
05	MLflow Registry	Experiments, lineage, registry, promotion
06	Train ↔ Serve	Train/serve skew, preprocessing consistency
07	Serving	FastAPI endpoints, model loading, Celery async
08	Orchestration	Airflow DAGs, scheduling, retrain loop
09	UI	Streamlit UI, API integration, error handling
10	Ops / Security / Observability	Infra, secrets, metrics, model monitoring
11	Docs & Tests	Docs vs code, test coverage, contract tests
12	Docs Validation	Full claim-by-claim validation against codebase
13	Reproducibility	DVC lock state, seed discipline, env pinning, MLflow lineage
14	Monitoring & Drift	Drift artifacts, freshness SLAs, monitoring stack readiness

The catalog itself lives in audit-system/SKILL.md; the per-audit procedures live in audit-system/audits/.
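As a rough illustration of the full/delta distinction, the catalog can be modelled as a plain mapping from audit number to audit name. This is only a sketch: the `AUDIT_CATALOG` dict and the `audits_for_cycle` helper are hypothetical names, not part of the skill, and the real selection logic lives in audit-system/SKILL.md.

```python
# Audit numbers and names copied from the catalog table above.
# The dict layout and the helper below are illustrative assumptions.
AUDIT_CATALOG = {
    0: "System",
    1: "Data",
    2: "Features",
    3: "Training & Evaluation",
    4: "Pipeline (DVC + Hydra)",
    5: "MLflow Registry",
    6: "Train ↔ Serve",
    7: "Serving",
    8: "Orchestration",
    9: "UI",
    10: "Ops / Security / Observability",
    11: "Docs & Tests",
    12: "Docs Validation",
    13: "Reproducibility",
    14: "Monitoring & Drift",
}


def audits_for_cycle(kind, requested=None):
    """Return the ordered audit numbers a cycle of the given kind would run."""
    if kind == "full":
        # A full cycle runs every audit in catalog order.
        return sorted(AUDIT_CATALOG)
    if kind == "delta":
        if not requested:
            raise ValueError("a delta cycle needs at least one audit number")
        unknown = sorted(set(requested) - set(AUDIT_CATALOG))
        if unknown:
            raise ValueError(f"unknown audit numbers: {unknown}")
        # Deduplicate and keep catalog order.
        return sorted(set(requested))
    raise ValueError(f"unknown cycle kind: {kind!r}")
```

For example, `audits_for_cycle("delta", [0])` corresponds to a delta cycle that re-runs only the System audit.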

Conventions

One subdirectory per cycle, named YYYYMMDD.
One file per audit, named <NN>_<audit-name>.md.
Same-day re-runs add a version suffix: _v2, _v3, and so on.
A full cycle covers 00–14 and ends with a consolidated SUMMARY.md (for example, reports/validation/20260428/SUMMARY.md).
A delta cycle covers only the audits relevant to recent changes (e.g. only 00).

The full file-naming and re-run rules live in reports/validation/README.md.
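The naming rules above can be sketched as a small path helper. This is a minimal sketch under assumptions: `report_path` is a hypothetical function, the audit-name slug (for example `system`) is illustrative, and the authoritative rules remain those in reports/validation/README.md.

```python
from datetime import date
from pathlib import Path


def report_path(root: Path, cycle_date: date, number: int, slug: str) -> Path:
    """Return the path the next report for an audit would get on a given day.

    The first run of the day gets <NN>_<slug>.md; same-day re-runs get
    _v2, _v3, ... appended before the extension.
    """
    cycle_dir = root / cycle_date.strftime("%Y%m%d")  # one subdirectory per cycle
    base = f"{number:02d}_{slug}"                     # zero-padded audit number
    candidate = cycle_dir / f"{base}.md"
    version = 1
    # Bump the version suffix until an unused file name is found.
    while candidate.exists():
        version += 1
        candidate = cycle_dir / f"{base}_v{version}.md"
    return candidate
```

Calling it with `Path("reports/validation")`, `date(2026, 4, 24)`, `0` and `"system"` would yield `reports/validation/20260424/00_system.md` if no report exists yet for that day.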

Cycles run to date

Date	Type	Audits	Summary
2026-04-24	Full	00–11 (`v2` of 00 same day)	SUMMARY.md
2026-04-26	Delta	00 only	SUMMARY.md
2026-04-28	Full	00–12	SUMMARY.md
2026-04-30	Full	00–14, plus a same-day `v2` re-run of all audits	SUMMARY.md · SUMMARY_v2.md
2026-05-16	Full	00–14, plus a same-day `v2` re-run of all audits	SUMMARY.md · SUMMARY_v2.md

What a `SUMMARY.md` contains

Every full-cycle summary follows the same structure, enforced by the skill:

Header — date, auditor (model + skill), cycle scope, baseline reference, one-line verdict.
Per-audit links — table with audit number, link, one-line outcome.
Best-practices compliance scorecard — per-audit % and overall % with delta vs prior baseline.
Consolidated risk register — all P0 / P1 findings deduplicated; status open / re-confirmed / new / resolved-since-baseline.
Delta vs prior baseline — explicit statement of what changed (or that nothing changed).
Top must-fix items — ordered list (≤ 10) for the next cycle.
Open questions / unverified areas — anything the cycle could not validate.

The summary is a synthesis only — it never introduces a new finding without first adding it to one of the per-audit reports.
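The status column of the consolidated risk register can be pictured as a set comparison between the baseline cycle and the current one. The sketch below is an assumption about how such a classification could work, not the skill's actual logic: `classify_findings` and the finding IDs are hypothetical, and the plain "open" status is left out because the text does not say how it differs from "re-confirmed".

```python
def classify_findings(baseline: set[str], current: set[str]) -> dict[str, str]:
    """Assign a register status to each deduplicated finding ID."""
    status = {}
    for finding_id in sorted(baseline | current):
        if finding_id in baseline and finding_id in current:
            status[finding_id] = "re-confirmed"             # seen in both cycles
        elif finding_id in current:
            status[finding_id] = "new"                      # first seen this cycle
        else:
            status[finding_id] = "resolved-since-baseline"  # no longer observed
    return status
```

Using sets makes the deduplication implicit: a finding reported by several audits appears once in the register.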

Example: 2026-04-30 and 2026-05-16 cycle outcomes

The 2026-04-30 cycle (prior baseline: 2026-04-28) ran all 15 audits for the first time, including 13 (Reproducibility) and 14 (Monitoring & Drift). It followed a large refactor (commit 45d2c86, +16 411 / −3 088 LoC) that resolved most P0/P1 risks from the baseline. Overall score: 65.7%.

The 2026-05-16 cycle (prior baseline: 2026-04-30) is the most recent. Overall score improved to 73.6% (+7.9 pp). Notable gains: Features (+28.6 pp), MLflow (+27.8 pp), Reproducibility (+37.5 pp), Monitoring & Drift (+30.0 pp). Main open risk: the docs-validation score dropped sharply (−62.1 pp), reflecting new doc↔code contradictions introduced since the prior cycle.

See: reports/validation/20260516/SUMMARY.md.
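The deltas quoted here are in percentage points (pp), meaning a simple difference between two percentage scores rather than a relative change. A minimal sketch of that arithmetic, using the two overall scores reported above (`delta_pp` is a hypothetical helper name):

```python
def delta_pp(current_pct: float, baseline_pct: float) -> float:
    """Difference between two percentage scores, in percentage points."""
    # Round to one decimal place to match the precision used in the summaries.
    return round(current_pct - baseline_pct, 1)


# Overall score: 65.7% (2026-04-30 baseline) to 73.6% (2026-05-16).
print(f"{delta_pp(73.6, 65.7):+.1f} pp")  # prints "+7.9 pp"
```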

How a cycle is invoked

/skill-audit-system full        # full cycle 00 → 14
/skill-audit-system 05          # only audit 05 (MLflow Registry)
/skill-audit-system 00 system   # alias form

The skill writes reports to reports/validation/YYYYMMDD/<NN>_<name>.md automatically. Same-day re-runs are versioned by suffix; cross-cycle history is preserved by date folder.
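When reading a cycle folder back, the suffix convention means the highest version of each audit report is the one to consult. The following is a sketch of that lookup, assuming file names follow the `<NN>_<name>.md` pattern with an optional `_vN` suffix; `REPORT_RE` and `latest_reports` are hypothetical names, and the example file names are illustrative.

```python
import re

# <NN>_<name>.md with an optional _vN suffix for same-day re-runs.
REPORT_RE = re.compile(r"^(\d{2})_(.+?)(?:_v(\d+))?\.md$")


def latest_reports(filenames):
    """Map each audit number to the file name of its highest-version report."""
    latest = {}
    for name in filenames:
        match = REPORT_RE.match(name)
        if match is None:
            continue  # not a per-audit report, e.g. SUMMARY.md
        number = int(match.group(1))
        version = int(match.group(3) or 1)  # no suffix means the first run
        if number not in latest or version > latest[number][0]:
            latest[number] = (version, name)
    return {number: name for number, (_, name) in sorted(latest.items())}
```

Taking plain file names as input keeps the helper independent of the file system; in practice the names could come from listing a reports/validation/YYYYMMDD/ directory.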