
5. Workflow

The meta-layer that ties pillars 1–4 together into a continuously running loop, both periodic and proactive, that improves with each cycle.

Max: 30 points

Split into the plumbing that moves work through the pipeline (5a) and the compounding loop that makes each cycle cheaper than the last (5b).

5a. Pipeline mechanics

The plumbing — how work flows from idea to shipped change.

Each criterion is scored on levels 0–3.
5.1 PL5-pipeline-reliability — Pipeline reliability: plan → implement → PR flows end-to-end with reliable triggers, webhooks, and transitions between stages.
Level 0: Manual end-to-end; or triggers frequently break.
Level 1: Pipeline exists but flaky — triggers break, webhooks drop, stages require manual nudging.
Level 2: Reliable pipeline with agent-driven transitions; webhooks monitored and self-healing.
Level 3: Pipeline bottlenecks and trigger failures auto-detected; throughput improves measurably over time; flakiness decreases.
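The self-healing-webhook half of level 2 can be sketched as a pure check over delivery records. This is a minimal illustration, not a prescribed implementation; the `Webhook` shape, field names, and one-hour silence threshold are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Webhook:
    name: str
    last_delivery: float  # unix seconds of the last successful delivery


def stale_webhooks(hooks: list[Webhook], now: float,
                   max_silence_s: float = 3600) -> list[str]:
    """Flag webhooks silent past the threshold as candidates for
    automatic re-registration (the self-healing half of the loop)."""
    return [h.name for h in hooks if now - h.last_delivery > max_silence_s]
```

A monitoring agent would run this on a schedule and re-register (or alert on) whatever comes back, keeping the check itself trivially testable.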
5.2 PL5-cicd-pipeline-health — CI/CD pipeline health: the CI/CD pipeline itself (distinct from 2.1–2.4, which score what CI runs) is fast, reliable, observable, versioned, and environment-matched to production. A slow or flaky CI pipeline makes downstream scores meaningless.
Level 0: Slow (>30 min), flaky (>10% infrastructure-caused failures), opaque logs, pipeline config untested.
Level 1: Config as code and versioned, but pipeline is slow or flaky; infrastructure failures mixed with real failures in ways agents can’t distinguish.
Level 2: Fast (<15 min for PR validation), reliable (<2% infrastructure failures), agent-readable logs that distinguish infrastructure vs. real failures, config reviewed like production code, environment matches production.
Level 3: Pipeline performance and flakiness tracked over time; regressions auto-investigated; slow stages auto-flagged for optimisation; new stages emerge from observed task patterns.
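The level-2 requirement that logs distinguish infrastructure failures from real ones can start as simple pattern matching. A sketch, with placeholder patterns a real deployment would tune from its own CI logs:

```python
# Hypothetical substrings that mark an infrastructure-caused failure;
# tune these from your own CI history, they are illustrative only.
INFRA_PATTERNS = ("connection reset", "dns lookup failed", "rate limit",
                  "runner lost", "timed out pulling image")


def classify_failure(log_tail: str) -> str:
    """Label a failed job 'infrastructure' or 'real' so an agent can
    retry the former and investigate only the latter."""
    text = log_tail.lower()
    return "infrastructure" if any(p in text for p in INFRA_PATTERNS) else "real"
```

Tracking the ratio of the two labels over time also gives the flakiness trend that level 3 asks for.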
5.3 PL5-change-sets — Change sets / release management: aggregated changelogs in a monorepo.
Level 0: Ad-hoc.
Level 1: Manual changelog.
Level 2: Automated change sets with release notes.
Level 3: Release patterns inform sizing recommendations; risky-release fingerprints learned from past incidents.
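The level-2 aggregation step can be as small as grouping commit subjects into release-note sections. A sketch assuming conventional-commit-style prefixes (`feat:`, `fix:`, ...), which the criterion itself does not mandate:

```python
def aggregate_changelog(commit_subjects: list[str]) -> dict[str, list[str]]:
    """Group conventional-commit subjects into release-note sections
    for an aggregated monorepo change set."""
    sections: dict[str, list[str]] = {}
    for subject in commit_subjects:
        kind, sep, rest = subject.partition(": ")
        key = kind if sep else "other"  # unprefixed commits land in "other"
        sections.setdefault(key, []).append(rest if sep else subject)
    return sections
```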
5.4 PL5-multi-agent-delegation — Multi-agent delegation: different agent roles (investigator, implementer, reviewer, planner) operate as differentiated full-stack roles with their own context scope, tools, permissions, skills, and prompts.
Level 0: Single generic agent.
Level 1: Two or more roles differentiated by prompt only; shared tools, permissions, and context scope.
Level 2: Coordinator → field-agent hierarchy with differentiated full-stack roles — each role has its own context scope (structurally enforced per PL1-codebase-scoping), tools, permissions, and prompt. The partitioning serves not only parallelism and specialisation but also lethal-trifecta separation: no single role simultaneously holds (1) access to private data, (2) exposure to untrusted content, and (3) the ability to communicate externally. Concrete realisations include sparse-checkout context scoping (a role sees only its subset of the repo), differentiated MCP / tool scopes per role (the investigator reads private data without outbound capability; outbound-capable roles do not ingest untrusted content), and segregated credential tenancy per role. Clear handoffs between roles. Approval authority for material changes (elevation requests, protected-branch merges, release promotions, production-impacting actions) remains human, extending PL4-branch-protection’s human-approval requirement to approval surfaces outside git. Agent-to-agent approval is permitted only within a platform-codified low-risk policy (e.g. dependency updates within declared semver bounds, cosmetic-only commits, test-only changes) and only where the approving agent is in a distinctly-credentialed role with an audit trail. Segregation of incompatible duties between agent roles — both approval duties and trifecta-leg duties — is load-bearing, not optional.
Level 3: Role-specific skill libraries and tool sets evolve from outcomes; delegation patterns improve over time; underperforming roles auto-flagged for prompt / skill refinement.
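The lethal-trifecta invariant in level 2 is mechanically checkable: no role may hold all three legs at once. A minimal sketch, with the role names and capability labels as illustrative assumptions:

```python
# The three trifecta legs named in the criterion.
TRIFECTA = {"private_data", "untrusted_content", "external_comms"}


def trifecta_violations(role_capabilities: dict[str, set[str]]) -> list[str]:
    """Return roles that simultaneously hold all three legs; a
    compliant role partition returns an empty list."""
    return sorted(r for r, caps in role_capabilities.items() if TRIFECTA <= caps)
```

A check like this belongs in the platform's own CI, so a config change that quietly hands one role the third leg fails before it deploys.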
5.5 PL5-spec-first-loop — Spec-first agent loop: implementation tasks with specifiable behaviour enter the agent’s loop with an executable acceptance criterion (failing test, type signature, or conformance check). The agent iterates against that gate before opening a PR. Exploratory work and UI spikes are explicitly exempt — the criterion scopes to tasks where behaviour can be specified up-front.
Level 0: Agent generates code first; tests written after (or not at all) in the same agent run.
Level 1: Spec-first used ad-hoc on some task types; no convention; acceptance criteria captured inconsistently.
Level 2: Default for implementation tasks with specifiable behaviour: task templates (see PL1-task-decomposition) include an acceptance-criterion field that becomes the pre-code gate; the agent iterates until green before opening a PR. Exploratory and UI spikes explicitly exempted.
Level 3: Gate quality compounds: recurring spec shapes become reusable acceptance-criterion templates; failed-gate patterns (tests that pass but don’t catch behaviour) inform prompt and template refinement; spec-to-green iteration count tracked and trends down.
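The iterate-until-green control flow reduces to a small loop. In this sketch both `acceptance` and `propose` are hypothetical callables (the executable gate and the agent's revision step respectively); returning the iteration count gives the spec-to-green metric level 3 tracks.

```python
def spec_first_loop(acceptance, propose, max_iters: int = 10):
    """Run a proposal step against an executable acceptance gate
    until green; return the passing candidate and iteration count."""
    candidate = None
    for attempt in range(1, max_iters + 1):
        candidate = propose(candidate)  # agent revises its last attempt
        if acceptance(candidate):
            return candidate, attempt
    raise RuntimeError("acceptance gate never went green")
```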
5.6 PL5-pr-reviewability — PR reviewability: agent-generated PRs include test evidence, screenshots, decision rationale, and rejected alternatives so a human can glance in <5 min. The branch is current with its target when review is requested.
Level 0: Code diff only.
Level 1: Some PRs include evidence; branches are sometimes out of sync when review is requested.
Level 2: Every PR is self-explanatory: tests, screenshots, reasoning, alternatives considered. The branch is current with its target when review is requested — reviewers don’t receive stale diffs.
Level 3: PRs with high glance-rejection rates analysed; agent prompts updated to address recurring weaknesses; review time (Glance Threshold) trends down over time. PR freshness maintained across the review lifecycle — the diff a reviewer sees is the diff that actually merges.
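The evidence requirement can be enforced as a pre-review gate on the PR body. The section headings below are hypothetical; any fixed evidence schema the team agrees on works the same way.

```python
# Hypothetical required headings for an agent-generated PR body.
REQUIRED_SECTIONS = ("## Tests", "## Screenshots", "## Rationale",
                     "## Alternatives considered")


def missing_evidence(pr_body: str) -> list[str]:
    """Evidence sections absent from a PR body; an empty result
    means the PR meets the glance-review bar."""
    return [s for s in REQUIRED_SECTIONS if s not in pr_body]
```

Running this in CI, and blocking review requests while it is non-empty, turns the level-2 convention into a mechanical check.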

5b. Compounding loop

Every criterion here closes a loop that would otherwise require a human to step in. This is where the FSD-of-engineering vision lives.

Each criterion is scored on levels 0–3.
5.7 PL5-signal-driven-tasks — Signal-driven task generation: signals from both proactive sources (scheduled security scans, UI regression runs, mutation testing, health checks) and reactive sources (user reviews, support tickets, app store ratings, meeting notes, production metrics) flow through automated triage into typed task creation. Both sources contribute; neither is manually gated. The proactive-source path depends on an agent-invokable scheduling primitive — scored here rather than as its own criterion, but load-bearing for PL2-test-quality, PL2-ui-test-coverage, PL2-load-stress-testing, PL4-release-strategy, PL5-pipeline-reliability, and PL5-outcome-input-loop as well.
Level 0: Neither source creates tasks; signals are lost or read by humans only.
Level 1: Partial coverage — some scheduled jobs produce reports, or external signals are manually triaged into tickets, but not both automated.
Level 2: Comprehensive across both sources: scheduled runs (scans, regression, mutation tests, health checks) and external signals (reviews, tickets, metrics, meeting notes) flow through automated triage into typed task creation; sanitisation applied at ingestion per PL4-prompt-injection-defence. The proactive-source path requires an agent-invokable scheduler — the agent can create, edit, and cancel scheduled jobs through the project’s own tool surface, not merely observe jobs ops configured out-of-band. Without this primitive the criterion caps at level 1 regardless of reactive-source coverage.
Level 3: Signal classification improves from past triage decisions; scan cadence and scope auto-tuned from finding-rate trends; new scan types added from observed incident classes; signal-to-task conversion quality measured and improves over time.
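The agent-invokable scheduling primitive is an interface, not a product: create, edit, cancel. A minimal in-memory sketch of that surface, with the method names and job fields as assumptions; any real backend (cron, a workflow engine) can sit behind it.

```python
class AgentScheduler:
    """In-memory sketch of the agent-invokable scheduling primitive:
    the agent itself can create, edit, and cancel jobs, rather than
    merely observing jobs configured out-of-band."""

    def __init__(self):
        self.jobs: dict[str, dict] = {}

    def create(self, job_id: str, cron: str, task_type: str) -> None:
        self.jobs[job_id] = {"cron": cron, "task_type": task_type}

    def edit(self, job_id: str, **changes) -> None:
        self.jobs[job_id].update(changes)

    def cancel(self, job_id: str) -> None:
        self.jobs.pop(job_id, None)
```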
5.8 PL5-outcome-input-loop — Outcome → input loop: production / canary metrics from a deployed change automatically generate the next decision: deprecate, expand, or continue the A/B test. Closes the FSD loop.
Level 0: Canary results read manually, if at all.
Level 1: Metrics visible, decisions still manual.
Level 2: Metric thresholds trigger automated next-cycle tasks (deprecate / promote / extend experiment).
Level 3: Decision quality improves — the system learns which canaries predict success vs. false signals; thresholds auto-tune.
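Level 2 reduces to a threshold function from canary metrics to the next typed task. A sketch with illustrative placeholder thresholds; level 3 is what tunes them automatically.

```python
def next_cycle_decision(metric_delta: float,
                        promote_at: float = 0.05,
                        deprecate_at: float = -0.02) -> str:
    """Map a canary's relative metric delta to the next typed task.
    Thresholds are placeholders, not recommended values."""
    if metric_delta >= promote_at:
        return "promote"
    if metric_delta <= deprecate_at:
        return "deprecate"
    return "extend-experiment"
```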
5.9 PL5-experiment-tracking — Experiment tracking: canary results feed into a learnings doc, including negative results, queryable by future agents.
Level 0: None.
Level 1: Wins logged, losses forgotten.
Level 2: Both wins and losses captured, indexed, retrievable as agent context.
Level 3: Learnings retrieval rate measured; recurring-decision rate trends downward; experiment-design quality improves from past failures.
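The level-2 shape, capturing losses alongside wins and making both retrievable, fits in a few lines. The class and field names here are assumptions; real systems would back this with an indexed store rather than a list.

```python
class LearningsLog:
    """Captures wins and losses alike so future agents can retrieve
    negative results before repeating a failed experiment."""

    def __init__(self):
        self.entries: list[dict] = []

    def record(self, hypothesis: str, outcome: str, won: bool) -> None:
        self.entries.append(
            {"hypothesis": hypothesis, "outcome": outcome, "won": won})

    def query(self, term: str) -> list[dict]:
        # Substring match stands in for real retrieval / indexing.
        return [e for e in self.entries
                if term.lower() in e["hypothesis"].lower()]
```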
5.10 PL5-portfolio-skill-reuse — Reusable skills extracted across projects: the compounding effect at the portfolio level.
Level 0: Each project reinvents.
Level 1: Informal sharing.
Level 2: Central skill / tool library reused across clients. Extracted skills operate on the abstract pattern, not the specific instance — tenant-specific context (client names, negotiated decisions, proprietary patterns) stays within its tenant; cross-project reuse carries only the generalised form.
Level 3: Skill usage tracked; high-value skills get versioned and improved; new skills extracted automatically from observed patterns across projects.
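The tenant-boundary rule in level 2 can be enforced mechanically when promoting a skill to the central library. The field names below are hypothetical stand-ins for whatever a skill schema actually carries.

```python
# Hypothetical tenant-specific keys that must not leave their tenant.
TENANT_FIELDS = {"client_name", "negotiated_decisions", "proprietary_patterns"}


def generalise_skill(skill: dict) -> dict:
    """Strip tenant-specific context before promoting a skill to the
    central cross-project library; only the abstract pattern travels."""
    return {k: v for k, v in skill.items() if k not in TENANT_FIELDS}
```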


Recipes that advance criteria in this pillar

Each recipe is an abstract pattern that advances one or more criteria. Recipes that advance the same criterion are grouped together. A criterion with no listed recipe is a gap for which the canon has not yet named a known-good pattern.

PL5-pipeline-reliability — Pipeline reliability

PL5-cicd-pipeline-health — CI/CD pipeline health

No recipes yet.

PL5-change-sets — Change sets / release management

No recipes yet.

PL5-multi-agent-delegation — Multi-agent delegation

PL5-spec-first-loop — Spec-first agent loop

No recipes yet.

PL5-pr-reviewability — PR reviewability

No recipes yet.

PL5-signal-driven-tasks — Signal-driven task generation

PL5-outcome-input-loop — Outcome → input loop

PL5-experiment-tracking — Experiment tracking

No recipes yet.

PL5-portfolio-skill-reuse — Reusable skills extracted across projects

No recipes yet.