2. Validation
Max: 28 points (PL2-hard-validation-gates and PL2-secret-hygiene are capped at 2; every other criterion is capped at 3)
Hard, deterministic rules that catch non-deterministic output before it reaches production.
Criteria are ordered by how often they run: the fastest, most frequent checks first and the least frequent last.
| # | Criterion | 0 points | 1 point | 2 points | 3 points |
|---|---|---|---|---|---|
| 2.1 | PL2-hard-validation-gates Hard validation gates — violations of deterministic code-quality rules (lint, typecheck, format; e.g. Biome, ESLint, SwiftLint) cannot reach the protected branch undetected. Enforcement happens at multiple checkpoints using the same ruleset at every layer, so code that passes one checkpoint does not unexpectedly fail another, and bypassing an earlier checkpoint is caught by a later one (max 2) | No enforcement at any checkpoint; violations routinely reach merge | Partial enforcement — rules enforced at one checkpoint (typically CI at merge) but not at earlier ones; a developer who bypasses that checkpoint can land violations before review | Enforcement at multiple checkpoints (commonly pre-commit hooks + CI; also valid: pre-push + CI, merge queue + CI, pre-receive server-side hook + CI) with a consistent ruleset across layers; bypass at any earlier checkpoint is caught by a later one, so violations cannot reach the protected branch undetected | — |
| 2.2 | PL2-test-colocation-coverage Test location discipline and coverage — a single test-location convention (colocated with source, parallel tree, or other) is applied consistently repo-wide, so tests are predictably findable for humans and agents; coverage is enforced via a global target and a per-PR differential threshold | No coverage target; tests scattered across multiple conventions within a single repo | Coverage target set but location conventions inconsistent (mix of conventions within a single repo); differential coverage not enforced | Single test-location convention enforced repo-wide (no exceptions); 95% global coverage met; changed lines in each PR must also meet a differential coverage threshold (e.g. 90% of new/modified lines covered); both global and differential gates enforced in CI; deviations block merge | Coverage gaps (both global and per-PR differential) trigger automated test-generation tasks; uncovered code paths flagged from production traces |
| 2.3 | PL2-test-quality Test quality verification — tests are verified to actually catch bugs, not just exercise lines. Mutation testing (Stryker for TS, Muter for Swift) or an equivalent mechanism confirms that tests assert behaviour, not merely execution | No mechanism beyond line coverage | PR-review checklist includes “do tests assert behaviour, not just execution?” | Mutation testing on critical-path modules (payments, domain protocols, state machines, auth flows) with a mutation score target (e.g. >75%); run periodically, not per-PR (scheduling scored under PL5-signal-driven-tasks) | Surviving mutants auto-create test-improvement tasks; weak-test patterns identified and prevented at template level; mutation score trends tracked over time |
| 2.4 | PL2-ui-test-coverage UI test coverage on mobile / frontend | None | Happy path only | Coverage across critical flows, run daily | Test failures auto-create investigation tasks; flaky tests tracked and quarantined; new features auto-generate test stubs from spec |
| 2.5 | PL2-sast-dast SAST / DAST present — static and dynamic application security testing with agent-actionable findings; findings, suppressions, and rule disables carry accountability (rationale, named reviewer, expiry where applicable). Tool choice is project-dependent (e.g. Aikido, SonarQube for compliance cases); the concern is coverage across both testing classes, not a specific vendor | None | One tool, partial coverage | Both static and dynamic application security testing in place, tuned, with agent-actionable findings. All finding suppressions and rule disables carry a documented rationale and a named reviewer, stored in version control alongside the code they cover; suppressions of high-severity findings additionally carry an expiry date for mandatory re-review. Suppressions without documentation block merge | Findings triaged by past resolution patterns; recurring vulnerability classes auto-generate prevention tasks. Stale or expired suppressions auto-flagged for review; recurring suppression patterns generate hardening tasks rather than additional waivers; suppression rate tracked and trends toward zero |
| 2.6 | PL2-secret-hygiene Secret hygiene — a secret scanner (e.g. Aikido) blocks new leaks and historical secrets are rotated / cleaned (max 2) | Secrets in repo history | New leaks blocked, history dirty | New leaks blocked, history clean, keys rotated | — |
| 2.7 | PL2-external-pr-review External PR review — human glance or multi-model review (e.g. Claude Code Reviews $25/PR council-of-experts model) | None | Human review only, often rushed | Layered: agent pre-review + human glance | Review feedback patterns inform agent prompts; reviewer-rejection categories tracked and reduced over time |
| 2.8 | PL2-taste-validation Qualitative taste validation — humans (not agents) test for taste / UX / defect discovery, not just correctness. Delivered either via a dedicated usability research service (Netizen Experience, UserTesting, Lookback, Maze, PlaybookUX) or via your own user base through canary releases and A/B tests. Catches what automated tests can’t: is this confusing? Does this feel wrong? Would a real user hit this edge case? | None | Occasional usability review or ad-hoc user feedback | Continuous human validation wired into release loop — either via research-service cohort or via canary/A-B rollouts to real users with feedback capture | Taste feedback patterns extracted; design-system rules updated; prompts/components evolved to address recurring taste failures |
| 2.9 | PL2-agent-audit-trail Agent action audit trail — every agent decision is logged with reasoning, retrievable, and reversible at granular level (catches quiet drift — subtle wrong actions humans don’t notice for weeks) | No audit trail | Commit-level revert only | Decision-level reasoning captured; full action history queryable and reversible | Patterns extracted from audit log inform future agent prompts; “agents in this codebase tend to over-use approach X” feeds back as guidance. Audit signal also identifies gaps in pre-action gating — actions that slipped past a gate surface here and drive gate refinement |
| 2.10 | PL2-load-stress-testing Load / stress testing — capability exists and is actually exercised. The least frequent of the validation types, typically run weekly or on major releases | Never run | Run ad-hoc, non-representative | Run on a production-mirrored environment, on a schedule | Historical baselines stored; regressions auto-detected and auto-create tasks; baselines update with legitimate growth |
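The 28-point maximum is eight criteria capped at 3 plus two capped at 2 (8 × 3 + 2 × 2 = 28). A minimal scoring sketch, assuming an assessment is recorded as a mapping from criterion ID to awarded level; `MAX_SCORE`, `PILLAR_MAX`, and `pillar_score` are hypothetical names, not part of any published tooling:

```python
# Per-criterion caps for this pillar, keyed by the IDs used in the table.
MAX_SCORE: dict[str, int] = {
    "PL2-hard-validation-gates": 2,
    "PL2-test-colocation-coverage": 3,
    "PL2-test-quality": 3,
    "PL2-ui-test-coverage": 3,
    "PL2-sast-dast": 3,
    "PL2-secret-hygiene": 2,
    "PL2-external-pr-review": 3,
    "PL2-taste-validation": 3,
    "PL2-agent-audit-trail": 3,
    "PL2-load-stress-testing": 3,
}

PILLAR_MAX = sum(MAX_SCORE.values())  # 8 * 3 + 2 * 2 = 28


def pillar_score(scores: dict[str, int]) -> int:
    """Sum awarded levels, rejecting unknown criteria and levels above the cap."""
    total = 0
    for criterion, level in scores.items():
        if criterion not in MAX_SCORE:
            raise KeyError(f"unknown criterion: {criterion}")
        cap = MAX_SCORE[criterion]
        if not 0 <= level <= cap:
            raise ValueError(f"{criterion}: level {level} outside 0..{cap}")
        total += level
    return total
```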
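For PL2-hard-validation-gates, one way to keep the ruleset identical across checkpoints is to define the checks once and have every layer (pre-commit hook, CI job) call the same entry point. A minimal sketch, assuming a TypeScript project checked with Biome and `tsc`; `CHECKS`, `run_gate`, and the injectable runner are hypothetical:

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical single source of truth for the gate. The pre-commit hook and the
# CI job both import this list, so the ruleset cannot drift between layers.
CHECKS: list[list[str]] = [
    ["npx", "biome", "check", "."],  # lint + format check
    ["npx", "tsc", "--noEmit"],      # typecheck only, no build output
]

Runner = Callable[[Sequence[str]], int]


def subprocess_runner(cmd: Sequence[str]) -> int:
    """Run one check as a child process and return its exit code."""
    return subprocess.run(list(cmd), check=False).returncode


def run_gate(runner: Runner = subprocess_runner) -> list[str]:
    """Run every check and return the commands that failed; empty means pass."""
    failures: list[str] = []
    for cmd in CHECKS:
        if runner(cmd) != 0:
            failures.append(" ".join(cmd))
    return failures
```

A pre-commit hook and a CI step would each call `run_gate()` and exit non-zero if it returns anything; because both read the same `CHECKS`, a rule change lands in every layer at once.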
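The per-PR differential threshold in PL2-test-colocation-coverage measures coverage over only the lines a PR adds or modifies. A minimal sketch of that arithmetic and of the two merge gates, assuming changed executable line numbers have already been extracted from the diff and covered line numbers from a coverage report (tools such as `diff-cover` automate this step); the function names are hypothetical and the default thresholds simply mirror the example values in the table:

```python
def diff_coverage(changed: dict[str, set[int]], covered: dict[str, set[int]]) -> float:
    """Fraction of changed executable lines hit by the test run.

    changed: file -> executable line numbers added or modified by the PR
    covered: file -> line numbers executed by the test suite
    """
    total = sum(len(lines) for lines in changed.values())
    if total == 0:
        return 1.0  # nothing measurable changed, so there is nothing to gate
    hit = sum(len(lines & covered.get(path, set())) for path, lines in changed.items())
    return hit / total


def coverage_gate(global_pct: float, diff_pct: float,
                  global_min: float = 0.95, diff_min: float = 0.90) -> list[str]:
    """Return the reasons to block the merge; an empty list means both gates pass."""
    reasons: list[str] = []
    if global_pct < global_min:
        reasons.append(f"global coverage {global_pct:.1%} is below {global_min:.0%}")
    if diff_pct < diff_min:
        reasons.append(f"differential coverage {diff_pct:.1%} is below {diff_min:.0%}")
    return reasons
```

The differential gate is what stops a large, well-covered legacy codebase from masking an untested new feature: 95% globally can coexist with 0% on the lines a PR touches.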
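Why PL2-test-quality looks beyond line coverage can be shown with a hand-rolled mutation test. Both suites below execute every line of the unit under test, yet only the one that asserts behaviour at the boundary kills every mutant. This is a toy, assuming a hypothetical `is_eligible` rule; tools such as Stryker and Muter generate and run mutants automatically across a real codebase:

```python
import operator
from typing import Callable

Predicate = Callable[[int], bool]


def make_is_eligible(compare: Callable[[int, int], bool]) -> Predicate:
    """Hypothetical unit under test: eligible from age 18 onward."""
    return lambda age: compare(age, 18)


is_eligible = make_is_eligible(operator.ge)  # the real implementation: age >= 18

# Each mutant swaps the comparison operator, the kind of edit mutation tools make.
MUTANTS: dict[str, Predicate] = {
    "age > 18": make_is_eligible(operator.gt),
    "age <= 18": make_is_eligible(operator.le),
    "age == 18": make_is_eligible(operator.eq),
}


def weak_suite(f: Predicate) -> bool:
    """Executes the code (full line coverage) but checks one value far from the boundary."""
    return f(30) is True


def strong_suite(f: Predicate) -> bool:
    """Asserts behaviour on both sides of, and exactly at, the boundary."""
    return f(17) is False and f(18) is True and f(30) is True


def mutation_score(suite: Callable[[Predicate], bool], mutants: dict[str, Predicate]) -> float:
    """Share of mutants the suite kills, i.e. on which the suite fails."""
    killed = sum(1 for mutant in mutants.values() if not suite(mutant))
    return killed / len(mutants)
```

Here `weak_suite` leaves the `age > 18` mutant alive (score 2/3), while `strong_suite` kills all three: the surviving mutant is exactly the off-by-one bug the weak suite would miss.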
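Level 3 of PL2-ui-test-coverage asks for flaky tests to be tracked and quarantined. One common heuristic, assumed here rather than prescribed by the rubric, treats a test as flaky when it both passed and failed against the same commit, since the code did not change between runs. The `Run` record and `flaky_tests` helper are hypothetical:

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Run(NamedTuple):
    test: str
    commit: str
    passed: bool


def flaky_tests(runs: Iterable[Run]) -> set[str]:
    """Tests that both passed and failed on the same commit.

    Same code with different outcomes usually points at nondeterminism (timing,
    shared state, network) rather than a real regression, so these are
    quarantine candidates.
    """
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for run in runs:
        outcomes[(run.test, run.commit)].add(run.passed)
    return {test for (test, _commit), seen in outcomes.items() if len(seen) == 2}
```

A test that fails on one commit and passes on the next is not flagged, because a code change between the runs is a legitimate explanation.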
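The suppression rules in PL2-sast-dast can be checked mechanically before merge. A minimal sketch, assuming suppressions are stored in version control as structured records; the `Suppression` fields, the severity scale, and treating an expired entry as a blocking error are assumptions, not a format any particular scanner uses:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

HIGH_SEVERITY = {"high", "critical"}  # assumed severity scale


@dataclass
class Suppression:
    rule_id: str
    path: str
    severity: str
    rationale: str = ""
    reviewer: str = ""
    expires: Optional[date] = None


def suppression_errors(s: Suppression, today: date) -> list[str]:
    """Return the reasons this suppression should block merge (empty means OK)."""
    errors: list[str] = []
    if not s.rationale.strip():
        errors.append("missing rationale")
    if not s.reviewer.strip():
        errors.append("missing named reviewer")
    if s.severity in HIGH_SEVERITY and s.expires is None:
        errors.append("high-severity suppression has no expiry date")
    if s.expires is not None and s.expires < today:
        errors.append(f"expired on {s.expires.isoformat()}, needs re-review")
    return errors
```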
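For the "new leaks blocked" half of PL2-secret-hygiene, the core mechanism is pattern matching on the lines a change introduces. The sketch below uses three well-known credential formats purely as an illustration; dedicated scanners such as Aikido ship far larger rule sets and entropy checks. `PATTERNS` and `scan_added_lines` are hypothetical names:

```python
import re

# A few well-known credential shapes; illustrative only, not a complete rule set.
PATTERNS: dict[str, re.Pattern[str]] = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github-classic-pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private-key-block": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def scan_added_lines(unified_diff: str) -> list[tuple[str, str]]:
    """Return (rule, line) for each added line in a unified diff that matches a pattern."""
    hits: list[tuple[str, str]] = []
    for line in unified_diff.splitlines():
        # Only "+" lines are new content; "+++" is the new-file header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for rule, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((rule, line[1:]))
    return hits
```

Blocking new leaks does not clean history: a secret that was ever committed should be treated as compromised and rotated, because rewriting history does not reach clones or forks that already exist.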
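When PL2-taste-validation is delivered through canary releases, each user should land in the same cohort on every visit so that their feedback maps to one experience. A minimal sketch of deterministic percentage bucketing, assuming string user IDs; `rollout_bucket` and `in_canary` are hypothetical:

```python
import hashlib


def rollout_bucket(user_id: str, feature: str) -> int:
    """Map a user to a stable bucket in [0, 100) for one feature.

    Hashing "feature:user" rather than the user alone gives each rollout an
    independent cohort, so the same users are not always the first exposed.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_canary(user_id: str, feature: str, percent: int) -> bool:
    """True if this user falls inside the first `percent` buckets of the rollout."""
    return rollout_bucket(user_id, feature) < percent
```

Because a user's bucket never changes, raising `percent` only ever adds users to the canary, and nobody flips back and forth between variants mid-study.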
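PL2-agent-audit-trail asks for decision-level reasoning that is queryable and reversible at a finer grain than a commit. A minimal in-memory sketch, assuming each agent action can supply its own inverse at the moment it runs; `Decision`, `AuditLog`, `set_key`, and the config example are hypothetical:

```python
import itertools
from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    id: int
    agent: str
    action: str
    reasoning: str                # why the agent chose this action
    undo: Callable[[], None]      # inverse operation captured when the action ran
    reverted: bool = False


class AuditLog:
    """Append-only record of agent decisions; entries are marked, never deleted."""

    def __init__(self) -> None:
        self._entries: list[Decision] = []
        self._ids = itertools.count(1)

    def record(self, agent: str, action: str, reasoning: str,
               undo: Callable[[], None]) -> int:
        decision = Decision(next(self._ids), agent, action, reasoning, undo)
        self._entries.append(decision)
        return decision.id

    def query(self, **filters: object) -> list[Decision]:
        """Decisions whose fields equal every filter, e.g. query(agent="fixer")."""
        return [d for d in self._entries
                if all(getattr(d, key) == value for key, value in filters.items())]

    def revert(self, decision_id: int) -> None:
        """Undo one decision without touching decisions recorded after it."""
        decision = next((d for d in self._entries if d.id == decision_id), None)
        if decision is None:
            raise KeyError(decision_id)
        if decision.reverted:
            raise ValueError(f"decision {decision_id} is already reverted")
        decision.undo()
        decision.reverted = True


# Usage: an agent makes two config changes, then only the first is reverted.
config = {"retries": 3, "timeout": 10}
log = AuditLog()


def set_key(key: str, value: int, reasoning: str) -> int:
    old = config[key]
    config[key] = value
    return log.record("fixer", f"set {key}={value}", reasoning,
                      lambda: config.__setitem__(key, old))


first = set_key("retries", 5, "example reasoning: upstream returned transient errors")
set_key("timeout", 30, "example reasoning: upstream responses slower than 10s")
log.revert(first)
```

Reverting one decision out of order is only safe when later decisions do not depend on it; a production version would need to track those dependencies and persist the log outside the agent's reach.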
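For the level 3 behaviour in PL2-load-stress-testing, a regression check compares each run against a stored baseline. A minimal sketch using a nearest-rank 95th percentile and a relative tolerance; the 10% default and the function names are assumptions:

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def is_regression(latencies_ms: list[float], baseline_p95_ms: float,
                  tolerance: float = 0.10) -> bool:
    """True when this run's p95 exceeds the stored baseline by more than `tolerance`."""
    return percentile(latencies_ms, 95) > baseline_p95_ms * (1 + tolerance)
```

When a release legitimately raises load, the accepted run's p95 would replace the stored baseline rather than the tolerance being widened, which keeps the check sensitive to genuine regressions.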
Recipes that advance criteria in this pillar
Each recipe is an abstract pattern that moves one or more criteria. Recipes that advance the same criterion are grouped together. A criterion with no listed recipe is a gap the canon has not yet named a known-good pattern for.
PL2-hard-validation-gates — Hard validation gates
No recipes yet.
PL2-test-colocation-coverage — Test location discipline and coverage
No recipes yet.
PL2-test-quality — Test quality verification
PL2-ui-test-coverage — UI test coverage on mobile / frontend
PL2-sast-dast — SAST / DAST present
No recipes yet.
PL2-secret-hygiene — Secret hygiene
No recipes yet.
PL2-external-pr-review — External PR review
PL2-taste-validation — Qualitative taste validation
No recipes yet.
PL2-agent-audit-trail — Agent action audit trail
PL2-load-stress-testing — Load / stress testing