
5. Workflow

The meta-layer that ties pillars 1–4 together into a continuously running loop, both periodic and proactive, that improves with each cycle.

Max: 30 points

Split into the plumbing that moves work through the pipeline (5a) and the compounding loop that makes each cycle cheaper than the last (5b).

5a. Pipeline mechanics

The plumbing — how work flows from idea to shipped change.

Each criterion is scored on levels 0–3.
5.1 PL5-pipeline-reliability — Pipeline reliability: plan → implement → PR flows end-to-end with reliable triggers, webhooks, and transitions between stages.
Level 0: Manual end-to-end; or triggers frequently break.
Level 1: Pipeline exists but flaky — triggers break, webhooks drop, stages require manual nudging.
Level 2: Reliable pipeline with agent-driven transitions; webhooks monitored and self-healing.
Level 3: Pipeline bottlenecks and trigger failures auto-detected; throughput improves measurably over time; flakiness decreases.
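The self-healing-webhook half of level 2 can be sketched as a pure check over delivery records. This is a minimal illustration, not a prescribed implementation; the `Webhook` shape, field names, and one-hour silence threshold are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Webhook:
    name: str
    last_delivery: float  # unix seconds of the last successful delivery


def stale_webhooks(hooks: list[Webhook], now: float,
                   max_silence_s: float = 3600) -> list[str]:
    """Flag webhooks silent past the threshold as candidates for
    automatic re-registration (the self-healing half of the loop)."""
    return [h.name for h in hooks if now - h.last_delivery > max_silence_s]
```

A monitoring agent would run this on a schedule and re-register (or alert on) whatever comes back, keeping the check itself trivially testable.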
5.2 PL5-cicd-pipeline-health — CI/CD pipeline health: the CI/CD pipeline itself (distinct from 2.1–2.4, which score what CI runs) is fast, reliable, observable, versioned, and environment-matched to production. A slow or flaky CI pipeline makes downstream scores meaningless.
Level 0: Slow (>30 min), flaky (>10% infrastructure-caused failures), opaque logs, pipeline config untested.
Level 1: Config as code and versioned, but pipeline is slow or flaky; infrastructure failures mixed with real failures in ways agents can’t distinguish.
Level 2: Fast (<15 min for PR validation), reliable (<2% infrastructure failures), agent-readable logs that distinguish infrastructure vs. real failures, config reviewed like production code, environment matches production.
Level 3: Pipeline performance and flakiness tracked over time; regressions auto-investigated; slow stages auto-flagged for optimisation; new stages emerge from observed task patterns.
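The level-2 requirement that logs distinguish infrastructure failures from real ones can start as simple pattern matching. A sketch, with placeholder patterns a real deployment would tune from its own CI logs:

```python
# Hypothetical substrings that mark an infrastructure-caused failure;
# tune these from your own CI history, they are illustrative only.
INFRA_PATTERNS = ("connection reset", "dns lookup failed", "rate limit",
                  "runner lost", "timed out pulling image")


def classify_failure(log_tail: str) -> str:
    """Label a failed job 'infrastructure' or 'real' so an agent can
    retry the former and investigate only the latter."""
    text = log_tail.lower()
    return "infrastructure" if any(p in text for p in INFRA_PATTERNS) else "real"
```

Tracking the ratio of the two labels over time also gives the flakiness trend that level 3 asks for.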
5.3 PL5-change-sets — Change sets / release management: aggregated changelogs in a monorepo.
Level 0: Ad-hoc.
Level 1: Manual changelog.
Level 2: Automated change sets with release notes.
Level 3: Release patterns inform sizing recommendations; risky-release fingerprints learned from past incidents.
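The level-2 aggregation step can be as small as grouping commit subjects into release-note sections. A sketch assuming conventional-commit-style prefixes (`feat:`, `fix:`, ...), which the criterion itself does not mandate:

```python
def aggregate_changelog(commit_subjects: list[str]) -> dict[str, list[str]]:
    """Group conventional-commit subjects into release-note sections
    for an aggregated monorepo change set."""
    sections: dict[str, list[str]] = {}
    for subject in commit_subjects:
        kind, sep, rest = subject.partition(": ")
        key = kind if sep else "other"  # unprefixed commits land in "other"
        sections.setdefault(key, []).append(rest if sep else subject)
    return sections
```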
5.4 PL5-multi-agent-delegation — Multi-agent delegation: different agent roles (investigator, implementer, reviewer, planner) operate as differentiated full-stack roles with their own context scope, tools, permissions, skills, and prompts.
Level 0: Single generic agent.
Level 1: Two or more roles differentiated by prompt only; shared tools, permissions, and context scope.
Level 2: Coordinator → field-agent hierarchy with differentiated full-stack roles — each role has its own context scope (structurally enforced per PL1-codebase-scoping), tools, permissions, and prompt. The partitioning serves not only parallelism and specialisation but also lethal-trifecta separation: no single role simultaneously holds (1) access to private data, (2) exposure to untrusted content, and (3) the ability to communicate externally. Concrete realisations include sparse-checkout context scoping (a role sees only its subset of the repo), differentiated MCP / tool scopes per role (the investigator reads private data without outbound capability; outbound-capable roles do not ingest untrusted content), and segregated credential tenancy per role. Clear handoffs between roles. Approval authority for material changes (elevation requests, protected-branch merges, release promotions, production-impacting actions) remains human, extending PL4-branch-protection’s human-approval requirement to approval surfaces outside git. Agent-to-agent approval is permitted only within a platform-codified low-risk policy (e.g. dependency updates within declared semver bounds, cosmetic-only commits, test-only changes) and only where the approving agent is in a distinctly-credentialed role with an audit trail. Segregation of incompatible duties between agent roles — both approval duties and trifecta-leg duties — is load-bearing, not optional.
Level 3: Role-specific skill libraries and tool sets evolve from outcomes; delegation patterns improve over time; underperforming roles auto-flagged for prompt / skill refinement.
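The lethal-trifecta invariant in level 2 is mechanically checkable: no role may hold all three legs at once. A minimal sketch, with the role names and capability labels as illustrative assumptions:

```python
# The three trifecta legs named in the criterion.
TRIFECTA = {"private_data", "untrusted_content", "external_comms"}


def trifecta_violations(role_capabilities: dict[str, set[str]]) -> list[str]:
    """Return roles that simultaneously hold all three legs; a
    compliant role partition returns an empty list."""
    return sorted(r for r, caps in role_capabilities.items() if TRIFECTA <= caps)
```

A check like this belongs in the platform's own CI, so a config change that quietly hands one role the third leg fails before it deploys.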
5.5 PL5-spec-first-loop — Spec-first agent loop: implementation tasks with specifiable behaviour enter the agent’s loop with an executable acceptance criterion (failing test, type signature, or conformance check). The agent iterates against that gate before opening a PR. Exploratory work and UI spikes are explicitly exempt — the criterion scopes to tasks where behaviour can be specified up-front.
Level 0: Agent generates code first; tests written after (or not at all) in the same agent run.
Level 1: Spec-first used ad-hoc on some task types; no convention; acceptance criteria captured inconsistently.
Level 2: Default for implementation tasks with specifiable behaviour: task templates (see PL1-task-decomposition) include an acceptance-criterion field that becomes the pre-code gate; the agent iterates until green before opening a PR. Exploratory and UI spikes explicitly exempted.
Level 3: Gate quality compounds: recurring spec shapes become reusable acceptance-criterion templates; failed-gate patterns (tests that pass but don’t catch behaviour) inform prompt and template refinement; spec-to-green iteration count tracked and trends down.
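The iterate-until-green control flow reduces to a small loop. In this sketch both `acceptance` and `propose` are hypothetical callables (the executable gate and the agent's revision step respectively); returning the iteration count gives the spec-to-green metric level 3 tracks.

```python
def spec_first_loop(acceptance, propose, max_iters: int = 10):
    """Run a proposal step against an executable acceptance gate
    until green; return the passing candidate and iteration count."""
    candidate = None
    for attempt in range(1, max_iters + 1):
        candidate = propose(candidate)  # agent revises its last attempt
        if acceptance(candidate):
            return candidate, attempt
    raise RuntimeError("acceptance gate never went green")
```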
5.6 PL5-pr-reviewability — PR reviewability: agent-generated PRs include test evidence, screenshots, decision rationale, and rejected alternatives so a human can glance in <5 min. The branch is current with its target when review is requested.
Level 0: Code diff only.
Level 1: Some PRs include evidence; branches are sometimes out of sync when review is requested.
Level 2: Every PR is self-explanatory: tests, screenshots, reasoning, alternatives considered. The branch is current with its target when review is requested — reviewers don’t receive stale diffs.
Level 3: PRs with high glance-rejection rates analysed; agent prompts updated to address recurring weaknesses; review time (Glance Threshold) trends down over time. PR freshness maintained across the review lifecycle — the diff a reviewer sees is the diff that actually merges.
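The evidence requirement can be enforced as a pre-review gate on the PR body. The section headings below are hypothetical; any fixed evidence schema the team agrees on works the same way.

```python
# Hypothetical required headings for an agent-generated PR body.
REQUIRED_SECTIONS = ("## Tests", "## Screenshots", "## Rationale",
                     "## Alternatives considered")


def missing_evidence(pr_body: str) -> list[str]:
    """Evidence sections absent from a PR body; an empty result
    means the PR meets the glance-review bar."""
    return [s for s in REQUIRED_SECTIONS if s not in pr_body]
```

Running this in CI, and blocking review requests while it is non-empty, turns the level-2 convention into a mechanical check.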

5b. Compounding loop

Every criterion here closes a loop that would otherwise require a human to step in. This is where the FSD-of-engineering vision lives.

Each criterion is scored on levels 0–3.
5.7 PL5-signal-driven-tasks — Signal-driven task generation: signals from both proactive sources (scheduled security scans, UI regression runs, mutation testing, health checks) and reactive sources (user reviews, support tickets, app store ratings, meeting notes, production metrics) flow through automated triage into typed task creation. Both sources contribute; neither is manually gated. The proactive-source path depends on an agent-invokable scheduling primitive — scored here rather than as its own criterion, but load-bearing for PL2-test-quality, PL2-ui-test-coverage, PL2-load-stress-testing, PL4-release-strategy, PL5-pipeline-reliability, and PL5-outcome-input-loop as well.
Level 0: Neither source creates tasks; signals are lost or read by humans only.
Level 1: Partial coverage — some scheduled jobs produce reports, or external signals are manually triaged into tickets, but not both automated.
Level 2: Comprehensive across both sources: scheduled runs (scans, regression, mutation tests, health checks) and external signals (reviews, tickets, metrics, meeting notes) flow through automated triage into typed task creation; sanitisation applied at ingestion per PL4-prompt-injection-defence. The proactive-source path requires an agent-invokable scheduler — the agent can create, edit, and cancel scheduled jobs through the project’s own tool surface, not merely observe jobs ops configured out-of-band. Without this primitive the criterion caps at level 1 regardless of reactive-source coverage.
Level 3: Signal classification improves from past triage decisions; scan cadence and scope auto-tuned from finding-rate trends; new scan types added from observed incident classes; signal-to-task conversion quality measured and improves over time.
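The agent-invokable scheduling primitive is an interface, not a product: create, edit, cancel. A minimal in-memory sketch of that surface, with the method names and job fields as assumptions; any real backend (cron, a workflow engine) can sit behind it.

```python
class AgentScheduler:
    """In-memory sketch of the agent-invokable scheduling primitive:
    the agent itself can create, edit, and cancel jobs, rather than
    merely observing jobs configured out-of-band."""

    def __init__(self):
        self.jobs: dict[str, dict] = {}

    def create(self, job_id: str, cron: str, task_type: str) -> None:
        self.jobs[job_id] = {"cron": cron, "task_type": task_type}

    def edit(self, job_id: str, **changes) -> None:
        self.jobs[job_id].update(changes)

    def cancel(self, job_id: str) -> None:
        self.jobs.pop(job_id, None)
```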
5.8 PL5-outcome-input-loop — Outcome → input loop: production / canary metrics from a deployed change automatically generate the next decision: deprecate, expand, or continue the A/B test. Closes the FSD loop.
Level 0: Canary results read manually, if at all.
Level 1: Metrics visible, decisions still manual.
Level 2: Metric thresholds trigger automated next-cycle tasks (deprecate / promote / extend experiment).
Level 3: Decision quality improves — the system learns which canaries predict success vs. false signals; thresholds auto-tune.
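Level 2 reduces to a threshold function from canary metrics to the next typed task. A sketch with illustrative placeholder thresholds; level 3 is what tunes them automatically.

```python
def next_cycle_decision(metric_delta: float,
                        promote_at: float = 0.05,
                        deprecate_at: float = -0.02) -> str:
    """Map a canary's relative metric delta to the next typed task.
    Thresholds are placeholders, not recommended values."""
    if metric_delta >= promote_at:
        return "promote"
    if metric_delta <= deprecate_at:
        return "deprecate"
    return "extend-experiment"
```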
5.9 PL5-experiment-tracking — Experiment tracking: canary results feed into a learnings doc, including negative results, queryable by future agents.
Level 0: None.
Level 1: Wins logged, losses forgotten.
Level 2: Both wins and losses captured, indexed, retrievable as agent context.
Level 3: Learnings retrieval rate measured; recurring-decision rate trends downward; experiment-design quality improves from past failures.
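The level-2 shape, capturing losses alongside wins and making both retrievable, fits in a few lines. The class and field names here are assumptions; real systems would back this with an indexed store rather than a list.

```python
class LearningsLog:
    """Captures wins and losses alike so future agents can retrieve
    negative results before repeating a failed experiment."""

    def __init__(self):
        self.entries: list[dict] = []

    def record(self, hypothesis: str, outcome: str, won: bool) -> None:
        self.entries.append(
            {"hypothesis": hypothesis, "outcome": outcome, "won": won})

    def query(self, term: str) -> list[dict]:
        # Substring match stands in for real retrieval / indexing.
        return [e for e in self.entries
                if term.lower() in e["hypothesis"].lower()]
```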
5.10 PL5-portfolio-skill-reuse — Reusable skills extracted across projects: the compounding effect at the portfolio level.
Level 0: Each project reinvents.
Level 1: Informal sharing.
Level 2: Central skill / tool library reused across clients. Extracted skills operate on the abstract pattern, not the specific instance — tenant-specific context (client names, negotiated decisions, proprietary patterns) stays within its tenant; cross-project reuse carries only the generalised form.
Level 3: Skill usage tracked; high-value skills get versioned and improved; new skills extracted automatically from observed patterns across projects.
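The tenant-boundary rule in level 2 can be enforced mechanically when promoting a skill to the central library. The field names below are hypothetical stand-ins for whatever a skill schema actually carries.

```python
# Hypothetical tenant-specific keys that must not leave their tenant.
TENANT_FIELDS = {"client_name", "negotiated_decisions", "proprietary_patterns"}


def generalise_skill(skill: dict) -> dict:
    """Strip tenant-specific context before promoting a skill to the
    central cross-project library; only the abstract pattern travels."""
    return {k: v for k, v in skill.items() if k not in TENANT_FIELDS}
```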


Recipes that advance criteria in this pillar

Each recipe is an abstract pattern that advances one or more criteria. Recipes that advance the same criterion are grouped together. A criterion with no listed recipe is a gap for which the canon has not yet named a known-good pattern.

PL5-pipeline-reliability — Pipeline reliability

PL5-cicd-pipeline-health — CI/CD pipeline health

No recipes yet.

PL5-change-sets — Change sets / release management

No recipes yet.

PL5-multi-agent-delegation — Multi-agent delegation

PL5-spec-first-loop — Spec-first agent loop

No recipes yet.

PL5-pr-reviewability — PR reviewability

No recipes yet.

PL5-signal-driven-tasks — Signal-driven task generation

PL5-outcome-input-loop — Outcome → input loop

PL5-experiment-tracking — Experiment tracking

No recipes yet.

PL5-portfolio-skill-reuse — Reusable skills extracted across projects

No recipes yet.