Development Workflow & AI Tooling

This section documents how the project is developed and maintained day-to-day. It covers two things: the engineering guardrails that keep the codebase correct and reproducible, and the AI tooling wired to operate within those guardrails.

Status: Operational. The customization layer, audit cycles, and iteration plans below are all active. See Implementation Status for system-level readiness.

Engineering discipline first

The codebase is governed by explicit rules that apply to every change, whether human-written or AI-assisted:

No bypassing Hydra, DVC, or MLflow.
No coupling training logic to serving logic.
No cross-layer shortcuts (data / features / models / pipelines / app are independent layers).
No silent changes to public behavior.
No presenting planned design as implemented.
No new dependencies without justification.

These rules are enforced by code review, CI quality gates, and the test suite. The AI tooling is configured to operate within the same rules; it does not lower the bar.

How a request flows

```mermaid
flowchart TD
    U[Developer request] --> A{Activation tier}
    A -->|Always-on| R1[copilot-instructions.md\nproject-wide rules]
    A -->|Auto-attached| R2["*.instructions.md\nscoped by file path"]
    A -->|On-demand| R3[/prompts, skills, agents/]
    A -->|Deterministic| R4[hooks\ntool lifecycle]
    R1 --> M[Model proposes change]
    R2 --> M
    R3 --> M
    R4 --> M
    M --> H[Human review\n+ tests + DVC + MLflow]
    H --> C[Commit]
```

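The four activation tiers map onto familiar IDE concepts, applied to an agent: always-on and auto-attached rules behave like autocomplete (present without being asked), on-demand prompts, skills, and agents behave like the command palette (invoked explicitly), and hooks behave like CI hooks (run deterministically around tool calls). Full mechanics: `.github/AGENT_CUSTOMIZATION.md`.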
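The auto-attached tier can be pictured as a glob match from a file path to the instruction files that apply to it. The sketch below is only an illustration under stated assumptions: the scoped file names and glob patterns in `SCOPED_INSTRUCTIONS` are hypothetical, and the actual scoping rules are whatever `.github/AGENT_CUSTOMIZATION.md` and the instruction files themselves define.

```python
from fnmatch import fnmatchcase

# The always-on file is named in the flowchart above.
ALWAYS_ON = "copilot-instructions.md"

# Hypothetical file names and patterns, for illustration only.
# Note: fnmatch's "*" also matches "/" so "*.py" matches nested paths.
SCOPED_INSTRUCTIONS = {
    "python.instructions.md": "*.py",
    "tests.instructions.md": "tests/*",
    "docs.instructions.md": "docs/*.md",
    "dvc.instructions.md": "dvc.yaml",
}


def instructions_for(path: str) -> list:
    """Return the always-on file plus every scoped file whose glob matches path."""
    attached = [
        name
        for name, pattern in SCOPED_INSTRUCTIONS.items()
        if fnmatchcase(path, pattern)
    ]
    return [ALWAYS_ON, *sorted(attached)]
```

Under these assumed patterns, editing `tests/test_api.py` would attach the always-on rules plus the Python and tests instruction files, while a file that matches no pattern gets only the always-on rules.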

What AI tooling is used for

| Workflow | What the AI does | Human still does |
| --- | --- | --- |
| New feature / endpoint | Scaffolds boilerplate, checks for boundary violations | Reviews, runs tests, verifies contracts |
| Refactoring | Proposes change with scope limit | Approves scope, checks no hidden side effects |
| Test coverage | Identifies gaps, generates property tests | Reviews invariants, runs in CI |
| Architecture review | Checks against documented boundaries and contracts | Decides on design changes |
| Audit cycle | Runs `audit-system` skill, generates report | Reviews findings, creates prioritized plan |
| Documentation | Generates drafts from code and specs | Verifies accuracy, removes speculation |
| Debug a test failure | Traces failure to root cause | Decides on fix |

The principle: AI accelerates the typing; the engineering discipline comes from the project rules.

Real productivity examples

A new DVC pipeline stage (e.g., `add_feature_group`) takes ~20 minutes with AI scaffolding vs. ~90 minutes from scratch. The AI fills in boilerplate; the human designs the contract.
System-wide audit cycles (finding contradictions between docs and code, and status mismatches) that would take a full day manually are reduced to a 30-minute structured session using the `audit-system` skill.
Generating property tests for the leakage invariant took one iteration with AI vs. multiple hours of manual research into Hypothesis strategies.

AI validation rules

Every non-trivial AI-assisted change must satisfy:

`pytest tests/ -q` passes with no new failures.
`ruff check src/` passes (no lint regressions).
`dvc repro` still produces the same pipeline output (for pipeline-adjacent changes).
Documentation updated to match any behavior change.
No speculative or planned behavior described as implemented.

AI-generated code is never merged without these checks. The CI pipeline enforces most of them automatically.
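The automatable part of this checklist can be sketched as a small runner. This is a minimal illustration, not the project's CI configuration: the function names and the injectable `runner` parameter are hypothetical, and only the three commands themselves come from the list above. An exit code of zero from `dvc repro` shows the pipeline ran, not that its output is unchanged, so that comparison and the two documentation rules still need human review.

```python
import subprocess
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Commands taken from the validation rules above.
BASE_CHECKS: List[Tuple[str, List[str]]] = [
    ("tests", ["pytest", "tests/", "-q"]),
    ("lint", ["ruff", "check", "src/"]),
]
PIPELINE_CHECK: Tuple[str, List[str]] = ("pipeline", ["dvc", "repro"])


def planned_checks(pipeline_adjacent: bool) -> List[Tuple[str, List[str]]]:
    """Return the checks to run; dvc repro is added only for pipeline-adjacent changes."""
    checks = list(BASE_CHECKS)
    if pipeline_adjacent:
        checks.append(PIPELINE_CHECK)
    return checks


def run_checks(
    pipeline_adjacent: bool,
    runner: Optional[Callable[[Sequence[str]], int]] = None,
) -> Dict[str, bool]:
    """Run each check and map its name to True when the exit code is 0."""
    if runner is None:
        # Default: execute the command and report its exit code.
        def runner(cmd: Sequence[str]) -> int:
            return subprocess.run(list(cmd)).returncode

    return {name: runner(cmd) == 0 for name, cmd in planned_checks(pipeline_adjacent)}
```

Calling `run_checks(pipeline_adjacent=True)` from the repository root would execute all three commands in order; passing a custom `runner` makes the function testable without invoking the real tools.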

Hard constraints enforced in every instruction file

Hydra / DVC / MLflow: configuration, pipeline, and tracking tools are mandatory, not optional.
Layer boundaries: `src/data/`, `src/features/`, `src/models/`, `src/pipelines/`, `src/app/` are isolated.
No opportunistic refactor: AI changes only what the task requires. Unrequested cleanup is rejected.
Status honesty: every status claim must be backed by code or explicit task context.
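The layer-boundary constraint lends itself to a mechanical check. The sketch below is one possible approach and rests on assumptions: it treats any import of another layer as a violation (the strictest reading of "isolated"), it assumes modules are imported by absolute `src.<layer>` paths, and the function name is hypothetical. The project's actual enforcement may allow specific directions of dependency.

```python
import ast

# Layer names taken from the constraint above.
LAYERS = ("data", "features", "models", "pipelines", "app")


def cross_layer_imports(source: str, own_layer: str) -> list:
    """Return imported module names that point into a different src layer.

    Assumption: cross-layer imports look like `src.<layer>...`; relative
    imports are ignored because they stay inside the current package.
    """
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            parts = name.split(".")
            if (
                len(parts) >= 2
                and parts[0] == "src"
                and parts[1] in LAYERS
                and parts[1] != own_layer
            ):
                violations.append(name)
    return violations
```

A test could read each file under `src/data/`, call `cross_layer_imports(text, "data")`, and fail when the returned list is non-empty.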

Tooling inventory

| Artifact | Count | Purpose |
| --- | --- | --- |
| Always-on rules | 1 file (`copilot-instructions.md`) | Project-wide guardrails |
| Scoped instruction files | 9 | Python, FastAPI, Airflow, MLflow, DVC, features, tests, docs, agent-customization |
| Subagents | 2 | Code reviewer (read-only), Docs agent (read-only) |
| Prompts | 7 | Add endpoint, add pipeline stage, add feature, register model, release checklist, debug test, sync docs |
| Skills | 5 | `audit-system`, `plan-test-coverage`, `error-analysis`, `dvc-pipeline-optimize`, `train-serve-skew-check` |
| Hooks | 1 config | `pre-tool-checks.json` |
| MCP servers | 2 | `awesome-copilot-main` reference catalogue, `soccer-docs` filesystem |
| Audit cycles completed | 5 | 2026-04-24, 2026-04-26, 2026-04-28, 2026-04-30, 2026-05-16 |
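In the spirit of the status-honesty rule, the counts in this inventory can be compared against the repository. The sketch below is a rough illustration under assumptions: it supposes each artifact kind lives in its own subdirectory of `.github/` and that one file corresponds to one artifact, which may not hold (a skill, for instance, could span several files). Both function names are hypothetical.

```python
from pathlib import Path


def count_artifacts(root: Path) -> dict:
    """Count files (recursively) under each immediate subdirectory of root."""
    return {
        sub.name: sum(1 for p in sub.rglob("*") if p.is_file())
        for sub in sorted(root.iterdir())
        if sub.is_dir()
    }


def mismatches(documented: dict, actual: dict) -> dict:
    """Map each artifact kind whose documented count differs to (documented, actual)."""
    return {
        kind: (count, actual.get(kind, 0))
        for kind, count in documented.items()
        if actual.get(kind, 0) != count
    }
```

For example, `mismatches({"prompts": 7}, count_artifacts(Path(".github")))` returns an empty dict when the documented prompt count matches what is on disk, given the assumed layout.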

What’s in this section

| Page | What it covers |
| --- | --- |
| Customization Layer | `.github/` contents: agents, instructions, prompts, skills, hooks. When each activates and how to invoke it. |
| Continuous System Audits | The `audit-system` skill and `reports/validation/` artifacts. How a system-wide health check is run as a reproducible procedure. |
| Iteration Plans | The `reports/planning/` artifacts: dated, phased plans generated and tracked with AI assistance. |