
2. Validation


Max: 28 points. Each criterion is scored 0 to 3, except PL2-hard-validation-gates and PL2-secret-hygiene, which are capped at 2.
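The 28-point maximum follows from the caps: eight criteria at 3 points plus two at 2 points. A minimal Python sketch of a scorer that enforces those caps; the `MAX_SCORES` table and the `pillar_score` helper are hypothetical names for illustration, not part of the rubric.

```python
# Hypothetical per-criterion caps for this pillar: every criterion is
# scored 0 to 3 except the two capped at 2.
MAX_SCORES: dict[str, int] = {
    "PL2-hard-validation-gates": 2,
    "PL2-test-colocation-coverage": 3,
    "PL2-test-quality": 3,
    "PL2-ui-test-coverage": 3,
    "PL2-sast-dast": 3,
    "PL2-secret-hygiene": 2,
    "PL2-external-pr-review": 3,
    "PL2-taste-validation": 3,
    "PL2-agent-audit-trail": 3,
    "PL2-load-stress-testing": 3,
}


def pillar_score(scores: dict[str, int]) -> int:
    """Sum per-criterion scores, rejecting unknown criteria and out-of-range values."""
    total = 0
    for criterion, score in scores.items():
        if criterion not in MAX_SCORES:
            raise KeyError(f"unknown criterion: {criterion}")
        cap = MAX_SCORES[criterion]
        if not 0 <= score <= cap:
            raise ValueError(f"{criterion}: score {score} outside 0..{cap}")
        total += score
    return total


# Sanity check: 8 criteria x 3 + 2 criteria x 2 = 28.
assert sum(MAX_SCORES.values()) == 28
```

For example, `pillar_score({"PL2-secret-hygiene": 2, "PL2-test-quality": 1})` returns 3, while a score of 3 for either capped criterion raises `ValueError`.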

Hard, deterministic rules that catch non-deterministic output before it reaches production.

Criteria are ordered by frequency of execution: the fastest and most frequently run first, the least frequently run last.

| # | Criterion | Score 0 | Score 1 | Score 2 | Score 3 |
|---|---|---|---|---|---|
| 2.1 | PL2-hard-validation-gates: Hard validation gates. Violations of deterministic code-quality rules (lint, typecheck, format; e.g. Biome, ESLint, SwiftLint) cannot reach the protected branch undetected. Enforcement happens at multiple checkpoints with the same ruleset across layers, so a change that passes one layer gets no surprise verdict at the next, and bypassing an earlier checkpoint is caught by a later one (max 2) | No enforcement at any checkpoint; violations routinely reach merge | Partial enforcement: rules enforced at one checkpoint (typically CI at merge) but not at earlier ones; a developer who bypasses that checkpoint can land violations before review | Enforcement at multiple checkpoints (commonly pre-commit hooks + CI; also valid: pre-push + CI, merge queue + CI, pre-receive server-side hook + CI) with a consistent ruleset across layers; bypassing any earlier checkpoint is caught by a later one, so violations cannot reach the protected branch undetected | n/a (max 2) |
| 2.2 | PL2-test-colocation-coverage: Test location discipline and coverage. A single test-location convention (colocated with source, parallel tree, or other) is applied consistently repo-wide, so tests are predictably findable for humans and agents; coverage is enforced via a global target and a per-PR differential threshold | No coverage target; tests scattered across multiple conventions within a single repo | Coverage target set, but location conventions inconsistent (a mix of conventions within a single repo); differential coverage not enforced | Single test-location convention enforced repo-wide (no exceptions); 95% global coverage met; changed lines in each PR must also meet a differential coverage threshold (e.g. 90% of new/modified lines covered); both global and differential gates enforced in CI; deviations block merge | Coverage gaps (both global and per-PR differential) trigger automated test-generation tasks; uncovered code paths flagged from production traces |
| 2.3 | PL2-test-quality: Test quality verification. Tests are verified to actually catch bugs, not just exercise lines. Mutation testing (Stryker for TS, Muter for Swift) or an equivalent mechanism confirms that tests assert behaviour, not merely execution | No mechanism beyond line coverage | PR-review checklist includes “do tests assert behaviour, not just execution?” | Mutation testing on critical-path modules (payments, domain protocols, state machines, auth flows) with a mutation score target (e.g. >75%); run periodically, not per PR (scheduling is scored under PL5-signal-driven-tasks) | Surviving mutants auto-create test-improvement tasks; weak-test patterns identified and prevented at template level; mutation score trends tracked over time |
| 2.4 | PL2-ui-test-coverage: UI test coverage on mobile / frontend | None | Happy path only | Coverage across critical flows, run daily | Test failures auto-create investigation tasks; flaky tests tracked and quarantined; new features auto-generate test stubs from spec |
| 2.5 | PL2-sast-dast: SAST / DAST present. Static and dynamic application security testing with agent-actionable findings; findings, suppressions, and rule disables carry accountability (rationale, named reviewer, expiry where applicable). Tool choice is project-dependent (e.g. Aikido, or SonarQube for compliance cases); the concern is coverage across both testing classes, not a specific vendor | None | One tool, partial coverage | Both static and dynamic application security testing in place, tuned, with agent-actionable findings. All finding suppressions and rule disables carry a documented rationale and a named reviewer, stored in version control alongside the code they cover; suppressions of high-severity findings additionally carry an expiry date for mandatory re-review. Suppressions without documentation block merge | Findings triaged by past resolution patterns; recurring vulnerability classes auto-generate prevention tasks. Stale or expired suppressions auto-flagged for review; recurring suppression patterns generate hardening tasks rather than additional waivers; suppression rate tracked and trending toward zero |
| 2.6 | PL2-secret-hygiene: Secret hygiene. Aikido blocks new leaks, and historical secrets are rotated / cleaned (max 2) | Secrets in repo history | New leaks blocked, history dirty | New leaks blocked, history clean, keys rotated | n/a (max 2) |
| 2.7 | PL2-external-pr-review: External PR review. Human glance or multi-model review (e.g. Claude Code Reviews, $25/PR, council-of-experts model) | None | Human review only, often rushed | Layered: agent pre-review + human glance | Review feedback patterns inform agent prompts; reviewer-rejection categories tracked and reduced over time |
| 2.8 | PL2-taste-validation: Qualitative taste validation. Humans (not agents) test for taste, UX, and defect discovery, not just correctness. Delivered either via a dedicated usability research service (Netizen Experience, UserTesting, Lookback, Maze, PlaybookUX) or via your own user base through canary releases and A/B tests. Catches what automated tests can’t: Is this confusing? Does this feel wrong? Would a real user hit this edge case? | None | Occasional usability review or ad-hoc user feedback | Continuous human validation wired into the release loop, either via a research-service cohort or via canary/A-B rollouts to real users with feedback capture | Taste feedback patterns extracted; design-system rules updated; prompts/components evolved to address recurring taste failures |
| 2.9 | PL2-agent-audit-trail: Agent action audit trail. Every agent decision is logged with its reasoning, retrievable, and reversible at a granular level (this catches quiet drift: subtle wrong actions humans don’t notice for weeks) | No audit trail | Commit-level revert only | Decision-level reasoning captured; full action history queryable and reversible | Patterns extracted from the audit log inform future agent prompts; “agents in this codebase tend to over-use approach X” feeds back as guidance. The audit signal also identifies gaps in pre-action gating: actions that slipped past a gate surface here and drive gate refinement |
| 2.10 | PL2-load-stress-testing: Load / stress testing. The capability exists and is actually exercised. Least frequent of the validation types, typically weekly or on major releases | Never run | Run ad hoc, non-representative | Run on a production-mirrored environment, scheduled | Historical baselines stored; regressions auto-detected and turned into tasks automatically; baselines updated with legitimate growth |
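For PL2-hard-validation-gates, one way to keep the ruleset consistent across layers is to define it once and call that single definition from every checkpoint. A minimal Python sketch, assuming a hypothetical gate script invoked by both the pre-commit hook and the CI job; the Biome and TypeScript commands are placeholders for whatever tools the project actually uses.

```python
import subprocess
import sys

# Hypothetical single source of truth for the ruleset. Both the pre-commit
# hook and the CI job run this same list, so the layers cannot drift apart.
# The commands are placeholders; substitute the project's real tools.
CHECKS: list[list[str]] = [
    ["npx", "biome", "ci", "."],   # lint + format check (Biome)
    ["npx", "tsc", "--noEmit"],    # typecheck only, no output files
]


def run_gate(checks: list[list[str]]) -> int:
    """Run every check and return 0 only if all of them pass.

    The return value is meant to become the process exit code: non-zero
    blocks the commit (hook) or fails the job (CI).
    """
    failed: list[str] = []
    for cmd in checks:
        try:
            ok = subprocess.run(cmd).returncode == 0
        except FileNotFoundError:
            ok = False  # a missing tool counts as a failure, never a pass
        if not ok:
            failed.append(" ".join(cmd))
    for name in failed:
        print(f"gate failed: {name}", file=sys.stderr)
    return 1 if failed else 0


def main() -> None:
    # Entry point for both the hook and the CI job; call it from the
    # script's `if __name__ == "__main__":` guard.
    sys.exit(run_gate(CHECKS))
```

Because `git commit --no-verify` skips the pre-commit hook, the CI run of the same script is the later checkpoint that catches that bypass. Running every check instead of stopping at the first failure reports all violations in one pass.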
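For PL2-test-colocation-coverage, the two gates measure different things: global coverage can stay above target while a single PR adds mostly untested code. A minimal Python sketch of both checks, using the 95% global and 90% differential figures from the table. It assumes per-file line sets have already been extracted from a coverage report and from the PR diff; that parsing step, the `Lines` alias, and `evaluate_coverage` are hypothetical.

```python
from dataclasses import dataclass

Lines = dict[str, set[int]]  # file path -> set of line numbers


@dataclass
class CoverageResult:
    global_pct: float
    diff_pct: float
    passed: bool


def evaluate_coverage(covered: Lines, executable: Lines, changed: Lines,
                      global_target: float = 95.0,
                      diff_target: float = 90.0) -> CoverageResult:
    """Apply a global and a per-PR differential coverage gate.

    covered:    lines executed by the test suite
    executable: lines that can be executed at all
    changed:    lines added or modified by the PR
    """
    total = sum(len(lines) for lines in executable.values())
    hit = sum(len(lines & covered.get(path, set())) for path, lines in executable.items())
    global_pct = 100.0 * hit / total if total else 100.0

    # Only executable changed lines count; comments and blank lines in the
    # diff can never be covered and would otherwise drag the figure down.
    changed_exec = {path: lines & executable.get(path, set()) for path, lines in changed.items()}
    diff_total = sum(len(lines) for lines in changed_exec.values())
    diff_hit = sum(len(lines & covered.get(path, set())) for path, lines in changed_exec.items())
    # A PR with no executable changes passes the differential gate trivially.
    diff_pct = 100.0 * diff_hit / diff_total if diff_total else 100.0

    passed = global_pct >= global_target and diff_pct >= diff_target
    return CoverageResult(global_pct, diff_pct, passed)
```

For instance, with 96 of 100 executable lines covered but only 2 of 4 changed lines covered, `global_pct` is 96.0, `diff_pct` is 50.0, and `passed` is False: the global gate alone would have let that PR through.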
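For PL2-test-quality, a toy example makes the gap between line coverage and behavioural assertions concrete. This Python sketch hand-writes the kind of mutants a tool such as Stryker or Muter generates automatically, and uses a simplified score (killed mutants divided by all mutants; real tools refine this). The `discounted` function, the mutants, and both suites are hypothetical.

```python
from typing import Callable

Pricing = Callable[[int], int]


def discounted(total: int) -> int:
    """Code under test: orders of 100 or more get 10% off (integer amounts)."""
    return total - total // 10 if total >= 100 else total


# Hand-written mutants: small syntactic changes of the kind a mutation tool
# would generate automatically (boundary, operator, and constant mutations).
MUTANTS: dict[str, Pricing] = {
    ">= 100 becomes > 100": lambda t: t - t // 10 if t > 100 else t,
    "- becomes +": lambda t: t + t // 10 if t >= 100 else t,
    "// 10 becomes // 5": lambda t: t - t // 5 if t >= 100 else t,
    ">= 100 becomes >= 99": lambda t: t - t // 10 if t >= 99 else t,
}


def execution_only_suite(fn: Pricing) -> None:
    # Runs both branches (full line and branch coverage) but asserts nothing.
    fn(50)
    fn(150)


def behavioural_suite(fn: Pricing) -> None:
    assert fn(50) == 50
    assert fn(100) == 90    # boundary case
    assert fn(150) == 135


def mutation_score(suite: Callable[[Pricing], None]) -> float:
    """Percentage of mutants killed, i.e. mutants that make the suite fail."""
    suite(discounted)  # the suite must pass on the unmutated code first
    killed = 0
    for mutant in MUTANTS.values():
        try:
            suite(mutant)
        except AssertionError:
            killed += 1
    return 100.0 * killed / len(MUTANTS)
```

Both suites execute every line, yet the execution-only suite scores 0% and the behavioural suite 75%. The surviving mutant (threshold moved from 100 to 99) points at the missing test, an order of exactly 99, which is the kind of finding the score-3 level turns into a test-improvement task.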
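For PL2-ui-test-coverage, the score-3 level tracks and quarantines flaky tests. One common heuristic, sketched here in Python under the assumption that CI records a (test, commit, outcome) triple for every attempt including retries: a test that both passed and failed on the same commit is flaky, because the code did not change between attempts. The function names are hypothetical.

```python
from collections import defaultdict


def find_flaky(runs: list[tuple[str, str, bool]]) -> set[str]:
    """Return tests that both passed and failed on the same commit.

    Each run is (test_name, commit_sha, passed). The code did not change
    between those attempts, so the outcome flip is not caused by the change.
    """
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for test, commit, passed in runs:
        outcomes[(test, commit)].add(passed)
    return {test for (test, _commit), seen in outcomes.items() if seen == {True, False}}


def split_failures(failed: set[str], quarantined: set[str]) -> tuple[set[str], set[str]]:
    """Split failures into (blocking, quarantined).

    In this sketch, quarantined failures would open investigation tasks
    instead of failing the build.
    """
    return failed - quarantined, failed & quarantined
```

A test that failed on one commit and passed on a later one is not flagged, since the code changed in between; only a flip on identical code counts as flakiness here.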
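For PL2-sast-dast, the score-2 accountability rules for suppressions are mechanical enough to check in CI. A minimal Python sketch, assuming suppressions are kept in version control in some structured form and already parsed into `Suppression` records; the record shape, the severity labels, and the helper names are hypothetical. It blocks on a missing rationale or reviewer and on high-severity suppressions without an expiry, and also flags expired suppressions, which the score-3 level calls for.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical severity labels; map the scanner's own levels onto these.
HIGH_SEVERITY = {"high", "critical"}


@dataclass
class Suppression:
    rule_id: str
    path: str
    severity: str
    rationale: str = ""
    reviewer: str = ""
    expires: Optional[date] = None


def suppression_problems(s: Suppression, today: date) -> list[str]:
    """List every accountability rule this suppression violates."""
    problems: list[str] = []
    if not s.rationale.strip():
        problems.append("missing rationale")
    if not s.reviewer.strip():
        problems.append("missing named reviewer")
    if s.severity in HIGH_SEVERITY and s.expires is None:
        problems.append("high-severity suppression without expiry")
    if s.expires is not None and s.expires < today:
        problems.append(f"expired on {s.expires.isoformat()}")
    return problems


def suppressions_gate(suppressions: list[Suppression], today: date) -> bool:
    """Print all problems and return True only if the merge may proceed."""
    ok = True
    for s in suppressions:
        for problem in suppression_problems(s, today):
            print(f"{s.path} [{s.rule_id}]: {problem}")
            ok = False
    return ok
```

Passing `today` in explicitly, rather than reading the clock inside the check, keeps the gate deterministic and easy to test.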
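For PL2-secret-hygiene, "new leaks blocked" means scanning what a change adds, separately from cleaning and rotating what is already in history. Aikido is the tool named in the table; this Python sketch only illustrates the idea with two well-known patterns (the AWS access key ID format and PEM private-key headers) applied to the added lines of a unified diff. The pattern table and `scan_added_lines` are hypothetical and far smaller than a real scanner's rule set.

```python
import re

# Illustrative patterns only; dedicated scanners maintain much larger rule sets.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan_added_lines(diff: str) -> list[str]:
    """Return findings for lines added by a unified diff.

    Removed and context lines are ignored, so only new leaks block the change.
    """
    findings: list[str] = []
    for number, line in enumerate(diff.splitlines(), start=1):
        # Added lines start with "+"; "+++ " introduces the new-file header.
        if not line.startswith("+") or line.startswith("+++ "):
            continue
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line[1:]):
                # Report where the match is, never the matched value itself.
                findings.append(f"diff line {number}: possible {name}")
    return findings
```

Findings carry the position rather than the matched value, so the secret is not copied into CI logs.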
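For PL2-taste-validation, canary and A/B rollouts only produce usable feedback if each user's cohort is stable, so feedback can be attributed to the variant that user actually saw. A minimal Python sketch of deterministic bucketing by hashing the feature and user identifiers; `in_canary` and the identifiers are hypothetical, and feedback capture itself is not shown.

```python
import hashlib


def in_canary(user_id: str, feature: str, percent: int) -> bool:
    """Return True if this user falls inside the feature's canary cohort.

    Hashing "feature:user_id" gives each user a stable bucket per feature,
    and different features get independent buckets.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # 0..99
    return bucket < percent
```

Because the bucket depends only on the feature and the user, raising a rollout from 10% to 20% keeps the original 10% in the cohort and adds new users, rather than reshuffling everyone.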
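For PL2-agent-audit-trail, the difference between score 1 and score 2 is the unit of reversal: a single decision rather than a whole commit. A minimal in-memory Python sketch, assuming each agent action is recorded together with its reasoning and a callable that undoes it; `AuditLog`, `Decision`, and the field names are hypothetical, and a real trail would need durable, append-only storage.

```python
import itertools
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class Decision:
    id: int
    agent: str
    action: str
    reasoning: str                  # why the agent chose this action
    undo: Callable[[], None]        # reverses exactly this action
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    reverted: bool = False


class AuditLog:
    """In-memory log of agent decisions; entries are never deleted."""

    def __init__(self) -> None:
        self._entries: dict[int, Decision] = {}
        self._ids = itertools.count(1)

    def record(self, agent: str, action: str, reasoning: str,
               undo: Callable[[], None]) -> int:
        decision = Decision(next(self._ids), agent, action, reasoning, undo)
        self._entries[decision.id] = decision
        return decision.id

    def query(self, agent: Optional[str] = None, contains: str = "") -> list[Decision]:
        """Filter history (in recording order) by agent and action substring."""
        return [d for d in self._entries.values()
                if (agent is None or d.agent == agent) and contains in d.action]

    def revert(self, decision_id: int) -> None:
        """Undo a single decision, leaving every other decision in place."""
        decision = self._entries[decision_id]  # KeyError for unknown ids
        if decision.reverted:
            raise ValueError(f"decision {decision_id} is already reverted")
        decision.undo()
        decision.reverted = True
```

Reverted entries stay in the log with `reverted=True`, so the history remains queryable. Reverting one decision in isolation assumes later decisions do not depend on it; a production design would need to detect such dependencies before undoing.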
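For PL2-load-stress-testing, automatic regression detection needs a stored baseline and a rule for comparing each run against it. A minimal Python sketch that computes p95 latency by the nearest-rank method and flags a run that exceeds the baseline by more than a tolerance; the 10% default tolerance and the function names are assumptions, and storing baselines and raising tasks are left out.

```python
def p95(samples_ms: list[float]) -> float:
    """95th percentile latency using the nearest-rank method."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in exact integer math
    return ordered[rank - 1]


def check_against_baseline(samples_ms: list[float], baseline_p95_ms: float,
                           tolerance: float = 0.10) -> tuple[float, bool]:
    """Return (current p95, regressed); regressed means the run exceeded the
    stored baseline by more than `tolerance` (0.10 = 10%)."""
    current = p95(samples_ms)
    return current, current > baseline_p95_ms * (1 + tolerance)
```

Comparing a high percentile rather than the mean makes tail-latency regressions visible even when the average barely moves; the integer form of the rank avoids floating-point rounding pushing `ceil` one position too far.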


Recipes that advance criteria in this pillar

Each recipe is an abstract pattern that advances one or more criteria. Recipes that advance the same criterion are grouped together. A criterion with no listed recipe is a gap: the canon has not yet named a known-good pattern for it.

PL2-hard-validation-gates: Hard validation gates

No recipes yet.

PL2-test-colocation-coverage: Test location discipline and coverage

No recipes yet.

PL2-test-quality: Test quality verification

PL2-ui-test-coverage: UI test coverage on mobile / frontend

PL2-sast-dast: SAST / DAST present

No recipes yet.

PL2-secret-hygiene: Secret hygiene

No recipes yet.

PL2-external-pr-review: External PR review

PL2-taste-validation: Qualitative taste validation

No recipes yet.

PL2-agent-audit-trail: Agent action audit trail

PL2-load-stress-testing: Load / stress testing