Development Workflow & AI Tooling

This section documents how the project is developed and maintained day-to-day. It covers two things: the engineering guardrails that keep the codebase correct and reproducible, and the AI tooling wired to operate within those guardrails.

Status: Operational. The customization layer, audit cycles, and iteration plans below are all active. See Implementation Status for system-level readiness.


Engineering discipline first

The codebase is governed by explicit rules that apply to every change — human or AI-assisted:

  • No bypassing Hydra, DVC, or MLflow.
  • No coupling training logic to serving logic.
  • No cross-layer shortcuts (data / features / models / pipelines / app are independent layers).
  • No silent change of public behavior.
  • No claiming planned design as implemented.
  • No new dependencies without justification.

These rules are enforced by code review, CI quality gates, and the test suite. The AI tooling is configured to operate within these same rules — it does not lower the bar.


How a request flows

flowchart TD
    U[Developer request] --> A{Activation tier}
    A -->|Always-on| R1[copilot-instructions.md\nproject-wide rules]
    A -->|Auto-attached| R2["*.instructions.md\nscoped by file path"]
    A -->|On-demand| R3[/prompts, skills, agents/]
    A -->|Deterministic| R4[hooks\ntool lifecycle]
    R1 --> M[Model proposes change]
    R2 --> M
    R3 --> M
    R4 --> M
    M --> H[Human review\n+ tests + DVC + MLflow]
    H --> C[Commit]

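The four activation tiers map roughly onto familiar IDE and CI mechanisms, applied to an agent: always-on and auto-attached rules behave like autocomplete context, on-demand prompts, skills, and agents behave like a command palette, and deterministic hooks behave like CI hooks. Full mechanics: .github/AGENT_CUSTOMIZATION.md.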
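
To make the "scoped by file path" tier concrete, here is a minimal sketch of how auto-attached instruction files could be selected for a given file. The file names and glob patterns in `SCOPED_INSTRUCTIONS` are hypothetical and are not taken from the repository. The on-demand and hook tiers are not modelled.

```python
from fnmatch import fnmatchcase

# Always-on tier: attached to every request.
ALWAYS_ON = "copilot-instructions.md"

# Hypothetical mapping of scoped instruction files to path globs.
# The real names and patterns live in the repository and may differ.
SCOPED_INSTRUCTIONS = {
    "python.instructions.md": ["*.py"],
    "tests.instructions.md": ["tests/*"],
    "docs.instructions.md": ["docs/*", "*.md"],
    "dvc.instructions.md": ["dvc.yaml", "*.dvc"],
}


def instructions_for(path: str) -> list[str]:
    """Return the instruction files that would attach to `path`."""
    attached = [ALWAYS_ON]
    for name, patterns in SCOPED_INSTRUCTIONS.items():
        # fnmatchcase lets "*" span "/" and is case-sensitive on every OS.
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            attached.append(name)
    return attached
```

Under these assumed patterns, a file such as `tests/test_features.py` picks up the always-on rules plus the Python and test instructions, while `dvc.yaml` picks up only the always-on rules and the DVC instructions.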


What AI tooling is used for

| Workflow | What the AI does | Human still does |
|---|---|---|
| New feature / endpoint | Scaffolds boilerplate, checks for boundary violations | Reviews, runs tests, verifies contracts |
| Refactoring | Proposes change with scope limit | Approves scope, checks no hidden side effects |
| Test coverage | Identifies gaps, generates property tests | Reviews invariants, runs in CI |
| Architecture review | Checks against documented boundaries and contracts | Decides on design changes |
| Audit cycle | Runs audit-system skill, generates report | Reviews findings, creates prioritized plan |
| Documentation | Generates drafts from code and specs | Verifies accuracy, removes speculation |
| Debug a test failure | Traces failure to root cause | Decides on fix |

The principle: AI accelerates the typing; the engineering discipline comes from the project rules.


Real productivity examples

  • A new DVC pipeline stage (e.g., add_feature_group) takes ~20 minutes with AI scaffolding vs. ~90 minutes from scratch. The AI fills in boilerplate; the human designs the contract.
  • System-wide audit cycles (finding contradictions between docs and code, and status mismatches) that would take a full day manually are reduced to a 30-minute structured session using the audit-system skill.
  • Generating property tests for the leakage invariant took one iteration with AI vs. multiple hours of manual research into Hypothesis strategies.

AI validation rules

Every non-trivial AI-assisted change must satisfy:

  1. pytest tests/ -q passes with no new failures.
  2. ruff check src/ passes (no lint regressions).
  3. dvc repro still produces the same pipeline output (if pipeline-adjacent).
  4. Documentation updated to match any behavior change.
  5. No speculative or planned behavior described as implemented.

AI-generated code is never merged without these checks. The CI pipeline enforces most of them automatically.
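
The automatable part of this checklist can be sketched as a small local runner. This is an illustration, not the project's CI configuration. The commands are the ones listed above; the function names and the injectable `run` parameter are assumptions made for the example. Note that a zero exit code from `dvc repro` only shows the pipeline ran, so comparing outputs, along with checks 4 and 5, still needs a human.

```python
import subprocess
from typing import Callable, Sequence

# Commands copied from the checklist above.
BASE_CHECKS = [
    ["pytest", "tests/", "-q"],
    ["ruff", "check", "src/"],
]
# Only relevant when the change touches the pipeline.
PIPELINE_CHECK = ["dvc", "repro"]


def planned_checks(pipeline_adjacent: bool) -> list[list[str]]:
    """Return the commands to run for this change, in order."""
    checks = [list(cmd) for cmd in BASE_CHECKS]
    if pipeline_adjacent:
        checks.append(list(PIPELINE_CHECK))
    return checks


def _run(cmd: Sequence[str]) -> int:
    # check=False: we inspect the exit code ourselves instead of raising.
    return subprocess.run(cmd, check=False).returncode


def run_checks(
    pipeline_adjacent: bool,
    run: Callable[[Sequence[str]], int] = _run,
) -> list[str]:
    """Run every planned check and return the ones that failed."""
    failed = []
    for cmd in planned_checks(pipeline_adjacent):
        if run(cmd) != 0:
            failed.append(" ".join(cmd))
    return failed
```

Calling `run_checks(pipeline_adjacent=True)` from the repository root runs all three commands and returns an empty list when they all exit cleanly. Passing a custom `run` callable makes the runner testable without invoking the real tools.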


Hard constraints enforced in every instruction file

  • Hydra / DVC / MLflow: configuration, pipeline, and tracking tools are mandatory, not optional.
  • Layer boundaries: src/data/, src/features/, src/models/, src/pipelines/, src/app/ are isolated.
  • No opportunistic refactor: AI changes only what the task requires. Unrequested cleanup is rejected.
  • Status honesty: every status claim must be backed by code or explicit task context.
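
The layer-boundary constraint lends itself to a mechanical check. The sketch below scans a module's imports and reports any layer it reaches that is not allowed. The `ALLOWED` map is a hypothetical dependency policy invented for this example, since the source names the layers but not which imports between them are permitted. It also assumes modules are imported as `src.<layer>` and ignores relative imports.

```python
import ast

# Layer names taken from the src/ directories listed above.
LAYERS = ("data", "features", "models", "pipelines", "app")

# Hypothetical allow-list: which other layers each layer may import from.
# The project's actual policy may be stricter or shaped differently.
ALLOWED = {
    "data": set(),
    "features": {"data"},
    "models": {"features"},
    "pipelines": {"data", "features", "models"},
    "app": {"models"},
}


def imported_layers(source: str) -> set[str]:
    """Return the project layers imported by the given module source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Also covers `from src import models` via the joined name.
            names = [node.module] + [f"{node.module}.{a.name}" for a in node.names]
        else:
            continue  # relative imports are not handled in this sketch
        for name in names:
            parts = name.split(".")
            if len(parts) >= 2 and parts[0] == "src" and parts[1] in LAYERS:
                found.add(parts[1])
    return found


def violations(layer: str, source: str) -> set[str]:
    """Return layers imported by `source` that `layer` may not depend on."""
    return imported_layers(source) - ALLOWED[layer] - {layer}
```

For example, under this assumed policy a module in `src/features/` that contains `import src.app.main` would be reported with `{"app"}`, while imports from `src.data` or third-party packages pass.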

Tooling inventory

| Artifact | Count | Purpose |
|---|---|---|
| Always-on rules | 1 file (copilot-instructions.md) | Project-wide guardrails |
| Scoped instruction files | 9 | Python, FastAPI, Airflow, MLflow, DVC, features, tests, docs, agent-customization |
| Subagents | 2 | Code reviewer (read-only), Docs agent (read-only) |
| Prompts | 7 | Add endpoint, add pipeline stage, add feature, register model, release checklist, debug test, sync docs |
| Skills | 5 | audit-system, plan-test-coverage, error-analysis, dvc-pipeline-optimize, train-serve-skew-check |
| Hooks | 1 config | pre-tool-checks.json |
| MCP servers | 2 | awesome-copilot-main reference catalogue, soccer-docs filesystem |
| Audit cycles completed | 5 | 2026-04-24, 2026-04-26, 2026-04-28, 2026-04-30, 2026-05-16 |

What’s in this section

| Page | What it covers |
|---|---|
| Customization Layer | .github/ contents: agents, instructions, prompts, skills, hooks. When each activates and how to invoke it. |
| Continuous System Audits | The audit-system skill and reports/validation/ artifacts. How a system-wide health check is a reproducible procedure. |
| Iteration Plans | The reports/planning/ artifacts: dated, phased plans generated and tracked with AI assistance. |