
Continuous System Audits

This page documents the audit-cycle practice used to keep the system honest. The procedure itself is encoded as a skill (audit-system); the artifacts it produces live under reports/validation/.

Status: Operational. Five cycles (four full, one delta) have been executed.


What an audit cycle is

An audit cycle is a numbered, read-only inspection of every layer of the system at a point in time. It does not change code — it only observes and reports mismatches between intended design (docs / contracts) and implemented behaviour (code).

The cycle is reproducible because the procedure is captured as a skill, not as a freeform chat. Re-running the same skill on the same code produces structurally identical reports — so cycle-over-cycle deltas are meaningful.


Audit catalog (00 – 14)

Each audit number has a fixed scope. A full cycle runs all 15 in order; a delta cycle runs only the audits relevant to recent changes.

| #  | Audit | Scope |
|----|-------|-------|
| 00 | System | High-level architecture, component map, key risks |
| 01 | Data | Ingestion, storage contracts, versioning, freshness |
| 02 | Features | Feature pipeline, leakage, train/inference consistency |
| 03 | Training & Evaluation | CV/split, model selection, metrics |
| 04 | Pipeline (DVC + Hydra) | DVC DAG, params, reproducibility |
| 05 | MLflow Registry | Experiments, lineage, registry, promotion |
| 06 | Train ↔ Serve | Train/serve skew, preprocessing consistency |
| 07 | Serving | FastAPI endpoints, model loading, Celery async |
| 08 | Orchestration | Airflow DAGs, scheduling, retrain loop |
| 09 | UI | Streamlit UI, API integration, error handling |
| 10 | Ops / Security / Observability | Infra, secrets, metrics, model monitoring |
| 11 | Docs & Tests | Docs vs code, test coverage, contract tests |
| 12 | Docs Validation | Full claim-by-claim validation against codebase |
| 13 | Reproducibility | DVC lock state, seed discipline, env pinning, MLflow lineage |
| 14 | Monitoring & Drift | Drift artifacts, freshness SLAs, monitoring stack readiness |

The catalog itself lives in audit-system/SKILL.md; the per-audit procedures live in audit-system/audits/.
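The difference between a full and a delta cycle can be sketched in a few lines of Python. This is only an illustration: the audit numbers and names come from the catalog table, while `AUDIT_CATALOG` and `select_audits` are hypothetical names and not part of the skill.

```python
# Audit catalog as listed in the table (number -> audit name).
AUDIT_CATALOG = {
    "00": "System",
    "01": "Data",
    "02": "Features",
    "03": "Training & Evaluation",
    "04": "Pipeline (DVC + Hydra)",
    "05": "MLflow Registry",
    "06": "Train ↔ Serve",
    "07": "Serving",
    "08": "Orchestration",
    "09": "UI",
    "10": "Ops / Security / Observability",
    "11": "Docs & Tests",
    "12": "Docs Validation",
    "13": "Reproducibility",
    "14": "Monitoring & Drift",
}


def select_audits(requested=None):
    """Return the audit numbers to run, in catalog order.

    requested=None models a full cycle; a list of audit numbers models
    a delta cycle. Hypothetical helper, not the skill's real interface.
    """
    if requested is None:
        return sorted(AUDIT_CATALOG)
    unknown = set(requested) - set(AUDIT_CATALOG)
    if unknown:
        raise ValueError(f"unknown audit numbers: {sorted(unknown)}")
    # Deduplicate and restore catalog order regardless of input order.
    return sorted(set(requested))


print(len(select_audits()))          # 15
print(select_audits(["05", "00"]))   # ['00', '05']
```

Sorting the two-digit strings keeps the run order identical to the catalog order, which is what makes reports from different cycles line up.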


Conventions

  • One subdirectory per cycle, named YYYYMMDD.
  • One file per audit, named <NN>_<audit-name>.md.
  • Same-day re-runs add a version suffix: _v2, _v3, and so on.
  • A full cycle covers 00–14 and ends with a consolidated SUMMARY.md (for example, reports/validation/20260428/SUMMARY.md).
  • A delta cycle covers only the audits relevant to recent changes (e.g. only 00).

The full file-naming and re-run rules live in reports/validation/README.md.
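As a rough sketch of how these conventions combine into a path, the helper below builds a report location from a date, an audit number, a name, and a version. `report_path` is a hypothetical function, and the exact spelling of the `<audit-name>` part (here `system`) is an assumption; the authoritative rules are in reports/validation/README.md.

```python
import re
from datetime import date
from pathlib import Path


def report_path(cycle_date, number, name, version=1,
                root=Path("reports/validation")):
    """Build reports/validation/YYYYMMDD/<NN>_<audit-name>[_vN].md.

    Hypothetical helper that mirrors the conventions listed above.
    """
    if not re.fullmatch(r"\d{2}", number):
        raise ValueError("audit number must be two digits, e.g. '05'")
    if version < 1:
        raise ValueError("version must be >= 1")
    # The first run of the day has no suffix; re-runs get _v2, _v3, ...
    suffix = "" if version == 1 else f"_v{version}"
    cycle_dir = cycle_date.strftime("%Y%m%d")
    return root / cycle_dir / f"{number}_{name}{suffix}.md"


print(report_path(date(2026, 4, 24), "00", "system").as_posix())
# reports/validation/20260424/00_system.md
print(report_path(date(2026, 4, 24), "00", "system", version=2).as_posix())
# reports/validation/20260424/00_system_v2.md
```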


Cycles run to date

| Date | Type | Audits | Summary |
|------|------|--------|---------|
| 2026-04-24 | Full | 00–11 (v2 of 00 same day) | SUMMARY.md |
| 2026-04-26 | Delta | 00 only | SUMMARY.md |
| 2026-04-28 | Full | 00–12 | SUMMARY.md |
| 2026-04-30 | Full | 00–14 + v2 of all (v2 same day) | SUMMARY.md · SUMMARY_v2.md |
| 2026-05-16 | Full | 00–14 + v2 of all (v2 same day) | SUMMARY.md · SUMMARY_v2.md |
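Because every cycle lives in its own date folder, a listing like the one above could in principle be rebuilt from the directory tree. The sketch below assumes the naming conventions described on this page; `list_cycles` and both regular expressions are illustrative, not part of the skill.

```python
import re
from pathlib import Path

CYCLE_DIR = re.compile(r"\d{8}")  # YYYYMMDD
# <NN>_<audit-name>.md with an optional _vN re-run suffix.
REPORT_FILE = re.compile(r"(?P<num>\d{2})_(?P<name>.+?)(?:_v(?P<ver>\d+))?\.md")


def list_cycles(root):
    """Map each cycle folder name to the sorted audit numbers found in it.

    Returns an empty dict if root does not exist. Files that do not
    match the report pattern (SUMMARY.md, README.md) are ignored.
    """
    root = Path(root)
    if not root.is_dir():
        return {}
    cycles = {}
    for cycle in sorted(p for p in root.iterdir()
                        if p.is_dir() and CYCLE_DIR.fullmatch(p.name)):
        numbers = set()
        for report in cycle.iterdir():
            match = REPORT_FILE.fullmatch(report.name)
            if match:
                numbers.add(match.group("num"))
        cycles[cycle.name] = sorted(numbers)
    return cycles


print(list_cycles("reports/validation"))
```

Re-runs such as `00_system_v2.md` collapse onto the same audit number, so the result reflects which audits a cycle covered rather than how many times each one ran.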

What a SUMMARY.md contains

Every full-cycle summary follows the same structure, enforced by the skill:

  1. Header — date, auditor (model + skill), cycle scope, baseline reference, one-line verdict.
  2. Per-audit links — table with audit number, link, one-line outcome.
  3. Best-practices compliance scorecard — per-audit % and overall % with delta vs prior baseline.
  4. Consolidated risk register — all P0 / P1 findings deduplicated; status open / re-confirmed / new / resolved-since-baseline.
  5. Delta vs prior baseline — explicit statement of what changed (or that nothing changed).
  6. Top must-fix items — ordered list (≤ 10) for the next cycle.
  7. Open questions / unverified areas — anything the cycle could not validate.

The summary is a synthesis only — it never introduces a new finding without first adding it to one of the per-audit reports.
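One way to picture the consolidated risk register (item 4) is as a deduplication step followed by a comparison with the baseline. The sketch below is a guess at that logic, not the skill's implementation: the finding fields (`id`, `priority`, `audit`), the example finding ids, and the rule that "open" is used when there is no baseline are all assumptions.

```python
def build_risk_register(current_findings, baseline_ids=None):
    """Deduplicate P0/P1 findings and assign a status to each.

    current_findings: dicts with hypothetical keys 'id', 'priority', 'audit',
    taken from the per-audit reports (the summary adds none of its own).
    baseline_ids: set of finding ids from the prior cycle, or None.
    """
    register = {}
    for finding in current_findings:
        if finding["priority"] not in ("P0", "P1"):
            continue
        entry = register.setdefault(
            finding["id"], {"priority": finding["priority"], "audits": []}
        )
        # "P0" sorts before "P1", so min() keeps the more severe priority.
        entry["priority"] = min(entry["priority"], finding["priority"])
        if finding["audit"] not in entry["audits"]:
            entry["audits"].append(finding["audit"])

    for finding_id, entry in register.items():
        if baseline_ids is None:
            entry["status"] = "open"  # assumed meaning: no baseline to compare
        elif finding_id in baseline_ids:
            entry["status"] = "re-confirmed"
        else:
            entry["status"] = "new"

    # Findings present in the baseline but absent now count as resolved.
    for finding_id in sorted((baseline_ids or set()) - set(register)):
        register[finding_id] = {
            "priority": None, "audits": [], "status": "resolved-since-baseline"
        }
    return register


# Hypothetical findings, for illustration only.
findings = [
    {"id": "train-serve-skew", "priority": "P1", "audit": "02"},
    {"id": "train-serve-skew", "priority": "P0", "audit": "06"},
    {"id": "unpinned-env", "priority": "P1", "audit": "13"},
    {"id": "typo-in-readme", "priority": "P2", "audit": "11"},
]
register = build_risk_register(findings, baseline_ids={"train-serve-skew", "plain-secrets"})
for finding_id, entry in register.items():
    print(finding_id, entry["priority"], entry["audits"], entry["status"])
```

Since the register is built only from per-audit findings, it cannot contain an item that is missing from the per-audit reports, which matches the synthesis-only rule stated above.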


Example: 2026-04-30 and 2026-05-16 cycle outcomes

The 2026-04-30 cycle (prior baseline: 2026-04-28) ran all 15 audits for the first time, including 13 (Reproducibility) and 14 (Monitoring & Drift). It followed a large refactor (commit 45d2c86, +16 411 / −3 088 LoC) that resolved most P0/P1 risks from the baseline. Overall score: 65.7%.

The 2026-05-16 cycle (prior baseline: 2026-04-30) is the most recent. Overall score improved to 73.6% (+7.9 pp). Notable gains: Features (+28.6 pp), MLflow (+27.8 pp), Reproducibility (+37.5 pp), Monitoring & Drift (+30.0 pp). Main open risk: the docs-validation score dropped sharply (−62.1 pp), reflecting new doc↔code contradictions introduced since the prior cycle.

See: reports/validation/20260516/SUMMARY.md.
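The deltas quoted above are percentage-point differences: the current score minus the baseline score, not a relative change. A minimal sketch of that arithmetic follows, using only the two overall scores reported on this page; `score_deltas` is a hypothetical helper, and how the skill derives the overall score from per-audit scores is not shown here.

```python
def score_deltas(current, baseline):
    """Percentage-point delta per scorecard entry (current - baseline).

    Entries with no baseline value (for example an audit that was run
    for the first time) get None instead of a number.
    """
    deltas = {}
    for key, score in current.items():
        prior = baseline.get(key)
        deltas[key] = None if prior is None else round(score - prior, 1)
    return deltas


baseline = {"overall": 65.7}   # 2026-04-30 cycle
current = {"overall": 73.6}    # 2026-05-16 cycle
print(score_deltas(current, baseline))  # {'overall': 7.9}
```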


How a cycle is invoked

/skill-audit-system full        # full cycle 00 → 14
/skill-audit-system 05          # only audit 05 (MLflow Registry)
/skill-audit-system 00 system   # alias form

The skill writes reports to reports/validation/YYYYMMDD/<NN>_<name>.md automatically. Same-day re-runs are versioned by suffix; cross-cycle history is preserved by date folder.