5. Workflow
The meta-layer that ties pillars 1–4 together into a continuously running loop, both periodic and proactive, that improves with each cycle.
Max: 30 points
Split into the plumbing that moves work through the pipeline (5a) and the compounding loop that makes each cycle cheaper than the last (5b).
5a. Pipeline mechanics
The plumbing — how work flows from idea to shipped change.
| # | Criterion | 0 | 1 | 2 | 3 |
|---|---|---|---|---|---|
| 5.1 | PL5-pipeline-reliability Pipeline reliability — plan → implement → PR flows end-to-end with reliable triggers, webhooks, and transitions between stages | Manual end-to-end; or triggers frequently break | Pipeline exists but flaky — triggers break, webhooks drop, stages require manual nudging | Reliable pipeline with agent-driven transitions; webhooks monitored and self-healing | Pipeline bottlenecks and trigger failures auto-detected; throughput improves measurably over time; flakiness decreases |
| 5.2 | PL5-cicd-pipeline-health CI/CD pipeline health — the CI/CD pipeline itself (distinct from 2.1–2.4 which score what CI runs) is fast, reliable, observable, versioned, and environment-matched to production. A slow or flaky CI pipeline makes downstream scores meaningless | Slow (>30min), flaky (>10% infrastructure-caused failures), opaque logs, pipeline config untested | Config as code and versioned, but pipeline is slow or flaky; infrastructure failures mixed with real failures in ways agents can’t distinguish | Fast (<15min for PR validation), reliable (<2% infrastructure failures), agent-readable logs that distinguish infrastructure vs. real failures, config reviewed like production code, environment matches production | Pipeline performance and flakiness tracked over time; regressions auto-investigated; slow stages auto-flagged for optimisation; new stages emerge from observed task patterns |
| 5.3 | PL5-change-sets Change sets / release management — aggregated changelogs in a monorepo | Ad-hoc | Manual changelog | Automated change sets with release notes | Release patterns inform sizing recommendations; risky-release fingerprints learned from past incidents |
| 5.4 | PL5-multi-agent-delegation Multi-agent delegation — different agent roles (investigator, implementer, reviewer, planner) operate as differentiated full-stack roles: own context scope, tools, permissions, skills, and prompts | Single generic agent | Two or more roles differentiated by prompt only; shared tools, permissions, and context scope | Coordinator → field-agent hierarchy with differentiated full-stack roles — each role has its own context scope (structurally enforced per PL1-codebase-scoping), tools, permissions, and prompt. The partitioning serves not only parallelism and specialisation but also lethal-trifecta separation: no single role simultaneously holds (1) access to private data, (2) exposure to untrusted content, and (3) ability to externally communicate. Concrete realisations include sparse-checkout context scoping (role sees only its subset of the repo), differentiated MCP / tool scopes per role (investigator reads private data without outbound capability; outbound-capable roles do not ingest untrusted content), and segregated credential tenancy per role. Clear handoffs between roles. Approval authority for material changes (elevation requests, protected-branch merges, release promotions, production-impacting actions) remains human, extending PL4-branch-protection’s human-approval requirement to approval surfaces outside git. Agent-to-agent approval is permitted only within a platform-codified low-risk policy (e.g. dependency updates within declared semver bounds, cosmetic-only commits, test-only changes) and only where the approving agent is in a distinctly-credentialed role with audit trail. Segregation of incompatible duties between agent roles — both approval duties and trifecta-leg duties — is load-bearing, not optional | Role-specific skill libraries and tool sets evolve from outcomes; delegation patterns improve over time; underperforming roles auto-flagged for prompt / skill refinement |
| 5.5 | PL5-spec-first-loop Spec-first agent loop — implementation tasks with specifiable behaviour enter the agent’s loop with an executable acceptance criterion (failing test, type signature, or conformance check). The agent iterates against that gate before opening a PR. Exploratory work and UI spikes are explicitly exempt — the criterion scopes to tasks where behaviour can be specified up-front | Agent generates code first; tests written after (or not at all) in the same agent run | Spec-first used ad-hoc on some task types; no convention; acceptance criteria captured inconsistently | Default for implementation tasks with specifiable behaviour: task templates (see PL1-task-decomposition) include an acceptance-criterion field that becomes the pre-code gate; agent iterates until green before opening PR. Exploratory and UI spikes explicitly exempted | Gate quality compounds: recurring spec shapes become reusable acceptance-criterion templates; failed-gate patterns (tests that pass but don’t catch behaviour) inform prompt and template refinement; spec-to-green iteration count tracked and trends down |
| 5.6 | PL5-pr-reviewability PR reviewability — agent-generated PRs include test evidence, screenshots, decision rationale, and rejected alternatives so a human can glance in <5 min. Branch is current with target when review is requested | Code diff only | Some PRs include evidence; branches are sometimes out-of-sync when review is requested | Every PR is self-explanatory: tests, screenshots, reasoning, alternatives considered. Branch is current with target when review is requested — reviewers don’t receive stale diffs | PRs with high glance-rejection rates analysed; agent prompts updated to address recurring weaknesses; review time (Glance Threshold) trends down over time. PR freshness maintained across the review lifecycle — the diff a reviewer sees is the diff that actually merges |
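The infrastructure-vs-real failure distinction that 5.2 requires can be approximated with a curated pattern list. The patterns below are hypothetical examples, not a known-good set; a real pipeline would grow them from its own observed logs. A minimal sketch in Python:

```python
import re

# Hypothetical patterns; a real pipeline curates these from its own failure history.
INFRA_PATTERNS = [
    r"connection (reset|refused|timed out)",
    r"runner lost communication",
    r"no space left on device",
    r"rate limit",
]

def classify_failure(log_tail: str) -> str:
    """Label a failed job so agents retry infra flakes and investigate real failures."""
    for pattern in INFRA_PATTERNS:
        if re.search(pattern, log_tail, re.IGNORECASE):
            return "infrastructure"
    return "real"
```

The classification itself is a signal worth tracking: a rising "infrastructure" rate is the flakiness regression the level-3 cell asks the platform to auto-investigate.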
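The lethal-trifecta separation that 5.4 treats as load-bearing reduces to a checkable invariant: no role may hold all three legs at once. `Role` and its boolean legs below are illustrative, not any platform's real permission schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Role:
    name: str
    private_data: bool     # leg 1: access to private data
    untrusted_input: bool  # leg 2: exposure to untrusted content
    external_comms: bool   # leg 3: ability to communicate externally

def trifecta_violations(roles: list[Role]) -> list[str]:
    """Names of roles holding all three legs at once, the split 5.4 forbids."""
    return [r.name for r in roles
            if r.private_data and r.untrusted_input and r.external_comms]
```

In this framing an investigator may read private data and ingest untrusted content but cannot communicate outward, while an outbound-capable role never ingests untrusted content; the check can run as a policy gate whenever role definitions change.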
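The spec-first loop of 5.5 can be sketched as a small control loop. `Task`, its `acceptance_gate`, and the `implement` callback standing in for the agent's edit step are all hypothetical names, a minimal sketch rather than any real agent API:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Task:
    name: str
    acceptance_gate: Optional[Callable[[], bool]]  # executable criterion, e.g. a failing test
    attempts: int = 0

def spec_first_loop(task: Task, implement: Callable[[Task], None],
                    max_iterations: int = 5) -> str:
    """Iterate the agent's implement step against the pre-code gate until green."""
    if task.acceptance_gate is None:
        # Exploratory work and UI spikes are exempt and should never reach this loop.
        raise ValueError("implementation task entered the loop without a gate")
    for _ in range(max_iterations):
        implement(task)             # agent edits code toward the spec
        if task.acceptance_gate():  # run the acceptance criterion
            return "open_pr"        # gate green: open the PR
    return "escalate"               # gate never went green: hand to a human
```

The iteration count per task is exactly the "spec-to-green" metric the level-3 cell asks to trend downward.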
5b. Compounding loop
Every criterion here closes a loop that would otherwise require a human to step in. This is where the FSD-of-engineering vision lives.
| # | Criterion | 0 | 1 | 2 | 3 |
|---|---|---|---|---|---|
| 5.7 | PL5-signal-driven-tasks Signal-driven task generation — signals from both proactive sources (scheduled security scans, UI regression runs, mutation testing, health checks) and reactive sources (user reviews, support tickets, app store ratings, meeting notes, production metrics) flow through automated triage into typed task creation. Both sources contribute; neither is manually gated. The proactive-source path depends on an agent-invokable scheduling primitive — scored here rather than as its own criterion, but load-bearing for PL2-test-quality, PL2-ui-test-coverage, PL2-load-stress-testing, PL4-release-strategy, PL5-pipeline-reliability, PL5-outcome-input-loop as well | Neither source creates tasks; signals are lost or read by humans only | Partial coverage — some scheduled jobs produce reports or external signals are manually triaged into tickets, but not both automated | Comprehensive across both sources: scheduled runs (scans, regression, mutation tests, health checks) and external signals (reviews, tickets, metrics, meeting notes) flow through automated triage into typed task creation; sanitisation applied at ingestion per PL4-prompt-injection-defence. The proactive-source path requires an agent-invokable scheduler — the agent can create, edit, and cancel scheduled jobs through the project’s own tool surface, not merely observe jobs ops configured out-of-band. Without this primitive the criterion caps at level 1 regardless of reactive-source coverage | Signal classification improves from past triage decisions; scan cadence and scope auto-tuned from finding-rate trends; new scan types added from observed incident classes; signal-to-task conversion quality measured and improves over time |
| 5.8 | PL5-outcome-input-loop Outcome → input loop — production / canary metrics from a deployed change automatically generate the next decision: deprecate, expand, A/B continue. Closes the FSD loop | Canary results read manually if at all | Metrics visible, decisions still manual | Metric thresholds trigger automated next-cycle tasks (deprecate / promote / extend experiment) | Decision quality improves — system learns which canaries predict success vs. false signals; thresholds auto-tune |
| 5.9 | PL5-experiment-tracking Experiment tracking — canary results feed into a learnings doc, including negative results, queryable by future agents | None | Wins logged, losses forgotten | Both wins and losses captured, indexed, retrievable as agent context | Learnings retrieval rate measured; recurring-decision rate trends downward; experiment-design quality improves from past failures |
| 5.10 | PL5-portfolio-skill-reuse Reusable skills extracted across projects — the compounding effect at the portfolio level | Each project reinvents | Informal sharing | Central skill / tool library reused across clients. Extracted skills operate on the abstract pattern, not the specific instance — tenant-specific context (client names, negotiated decisions, proprietary patterns) stays within its tenant; cross-project reuse carries only the generalised form | Skill usage tracked; high-value skills get versioned and improved; new skills extracted automatically from observed patterns across projects |
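The agent-invokable scheduling primitive that 5.7 makes load-bearing needs only a small tool surface: create, edit, cancel. This in-memory sketch is illustrative, not a production scheduler:

```python
from dataclasses import dataclass

@dataclass
class ScheduledJob:
    job_id: str
    cron: str       # e.g. "0 3 * * 1" for a weekly security scan
    task_type: str  # the typed task a run emits when it finds something

class AgentScheduler:
    """Minimal agent-invokable surface: the agent creates, edits, and cancels jobs itself."""
    def __init__(self) -> None:
        self._jobs: dict[str, ScheduledJob] = {}

    def create(self, job: ScheduledJob) -> None:
        self._jobs[job.job_id] = job

    def edit(self, job_id: str, cron: str) -> None:
        self._jobs[job_id].cron = cron  # e.g. retune cadence from finding-rate trends

    def cancel(self, job_id: str) -> None:
        del self._jobs[job_id]

    def jobs(self) -> list[ScheduledJob]:
        return list(self._jobs.values())
```

The distinction the criterion draws is between this surface being callable by the agent versus jobs only being configured out-of-band by ops; the methods here are the minimum for the former.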
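The automated next-cycle decision of 5.8 is, at its simplest, a threshold map from canary metrics to a typed task. Metric and threshold names here are placeholders, not a recommended set:

```python
def next_cycle_decision(metrics: dict, thresholds: dict) -> str:
    """Map canary metrics to the automated next-cycle task of 5.8."""
    if metrics["error_rate"] > thresholds["error_rate_max"]:
        return "deprecate"          # canary regressed: roll back, open a deprecation task
    if metrics["conversion_lift"] >= thresholds["lift_min"]:
        return "promote"            # clear win: promote to full rollout
    return "extend_experiment"      # inconclusive: widen the A/B cohort
```

The level-3 behaviour would wrap this in a feedback step that adjusts `thresholds` based on which past decisions proved correct.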
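A learnings doc in the sense of 5.9 can start as an append-only log that keeps losses alongside wins and stays queryable by a future agent. The schema below is a minimal assumption, not a prescribed format:

```python
def record_learning(log: list, experiment: str, outcome: str, note: str) -> None:
    """Append a result; losses included, since those are the entries usually lost."""
    log.append({"experiment": experiment, "outcome": outcome, "note": note})

def query_learnings(log: list, keyword: str) -> list[dict]:
    """Retrieve past results, negative ones included, as context for a future agent."""
    k = keyword.lower()
    return [e for e in log
            if k in e["experiment"].lower() or k in e["note"].lower()]
```

Keyword lookup is the floor; the same log indexed for semantic retrieval is what makes the "retrievable as agent context" level-2 cell practical at scale.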
Recipes that advance criteria in this pillar
Each recipe is an abstract pattern that moves one or more criteria. Recipes that advance the same criterion are grouped together. A criterion with no listed recipe is a gap the canon has not yet named a known-good pattern for.
PL5-pipeline-reliability — Pipeline reliability
PL5-cicd-pipeline-health — CI/CD pipeline health
No recipes yet.
PL5-change-sets — Change sets / release management
No recipes yet.
PL5-multi-agent-delegation — Multi-agent delegation
PL5-spec-first-loop — Spec-first agent loop
No recipes yet.
PL5-pr-reviewability — PR reviewability
No recipes yet.
PL5-signal-driven-tasks — Signal-driven task generation
PL5-outcome-input-loop — Outcome → input loop
PL5-experiment-tracking — Experiment tracking
No recipes yet.
PL5-portfolio-skill-reuse — Reusable skills extracted across projects
No recipes yet.