4. Safe Space
Blast-radius containment, so "going wrong" has bounded cost.
Max: 28 points (eight criteria scored 0 to 3; PL4-branch-protection and PL4-agent-invokable-rollback are capped at 2)
If the agent goes wrong, the damage is bounded; the security of the setup is what lets the agent operate freely.
Safety is a composition of mechanisms, not a single gate. The layers are human-in-the-loop review, deterministic policy gates, metric-gated progression, capability scoping, dry-run defaults, and audit with fast rollback; the agent operates freely when these layers compose.
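The composition idea can be illustrated with a minimal sketch, assuming each layer is a function that either passes an action or returns a reason to block it. The `Action` fields, the layer functions, and the capability names are hypothetical and exist only for illustration; real enforcement would live in the platform, not in agent-side code.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Action:
    name: str
    writes: bool                       # does the action mutate state?
    dry_run: bool = True               # dry-run is the default
    approved_by: Optional[str] = None  # human approver, if any


# A layer returns None to pass, or a reason string to block.
Layer = Callable[[Action], Optional[str]]


def capability_scope(action: Action) -> Optional[str]:
    allowed = {"read_logs", "deploy_canary"}  # hypothetical capability set
    if action.name in allowed:
        return None
    return f"{action.name} is outside the agent's scope"


def human_in_the_loop(action: Action) -> Optional[str]:
    # Only live (non-dry-run) writes need a human approver.
    if action.writes and not action.dry_run and action.approved_by is None:
        return "live write requires human approval"
    return None


def evaluate(action: Action, layers: List[Layer]) -> List[str]:
    """Run every layer and collect block reasons; an empty list means allowed."""
    reasons = [layer(action) for layer in layers]
    return [r for r in reasons if r is not None]


LAYERS: List[Layer] = [capability_scope, human_in_the_loop]
```

Every layer is evaluated rather than stopping at the first failure, so the audit record shows all the reasons an action was blocked.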
The criteria are ordered as environment isolation → permissions and data boundaries → release safety → cost → memory governance.
| # | Criterion | 0 | 1 | 2 | 3 |
|---|---|---|---|---|---|
| 4.1 | PL4-environment-isolation Environment isolation — staging/production separation with parity-checked isolation AND on-demand production-mirrored replica for load testing | Shared or leaky environments; no load-testing isolation | Separate staging/prod but drift-prone; static undersized staging; no on-demand replica | Fully isolated staging/prod with parity-checking; on-demand production-mirrored replica for load testing with prod-sized data | Replica spin-up patterns optimised from past runs; staging configurations evolve with production topology; replica lifecycle automated |
| 4.2 | PL4-least-privilege IAM scoped read-only by default — DB, Kubernetes, AWS | Agents run as admin | Mixed scopes | Strict least-privilege; write requires structurally-enforced elevation — platform-gated (IAM policy-as-code + JIT, credential tenancy, GitOps-triggered grants), not procedural (ticketed approval that then executes with unscoped credentials). See GitOps JIT privilege elevation for a known-good shape | Permission requests logged; recurring legitimate elevations get scoped permanent grants; unused permissions auto-revoked |
| 4.3 | PL4-branch-protection Branch protection and source-control write scoping — protected branches are locked against direct push and direct merge by any actor, including agents. All changes to protected branches flow through a PR; agents have unrestricted write access to feature/task branches, but write access to protected branches is structurally impossible, not merely discouraged (max 2) | No branch protection; agents (and humans) can push directly to main; the PR review gate (PL2-external-pr-review) is procedural only | Branch protection enabled on main but inconsistently applied across repos or branches (e.g. develop unprotected); approval requirement exists but bypassable by repo admins without audit | All protected branches locked across every repo: direct push blocked; merge requires at least one human approval OR an audited automated-merge rule; branches must be current with target at merge time (platform “require up to date before merging” OR merge queue that rebases and tests before merge); bypass requires explicit override with audit log. Agents are scoped to feature branches by platform-enforced rules, not convention | — |
| 4.4 | PL4-pii-masking PII masking at data-access and telemetry layers — e.g. pg_columnmask for DB; scrubbing / allowlists for logs, metrics, traces | No masking | Application-layer only, or DB-only (logs leak PII) | Enforced at DB and telemetry layers; agent and logs cannot see raw PII | New PII fields auto-detected and masked; masking rules evolve with data model changes |
| 4.5 | PL4-prompt-injection-defence Prompt injection defence at ingestion boundary — all external content entering persistent agent context passes through an ingestion sanitization layer before indexing. Scope is durable ingestion paths (memory writes, indexed knowledge, unsupervised scheduled ingestion); interactive turn context in user-supervised sessions is out of scope — blast radius there is contained by Pillar 4 substrate (PL4-least-privilege, PL4-branch-protection). The layer strips, escapes, or sandboxes instruction-shaped text. The same policy is applied consistently across every ingestion surface — PL1-real-world-feedback (real-world feedback loop), PL5-signal-driven-tasks (signal-driven task generation), PL4-memory-safety (memory write-path) | No sanitization; untrusted text flows directly into context | Ad-hoc sanitization on some surfaces (e.g. PII redaction only); inconsistent between ingestion paths | Unified sanitization layer applied at every ingestion surface; instruction-shaped patterns (role-prompts, system-message mimicry, fake tool calls, jailbreak patterns) stripped, escaped, or sandboxed; policy version-controlled | Layer adversarially tested; evasion rate measured; new attack patterns auto-update policy; near-misses from PL1-real-world-feedback / PL5-signal-driven-tasks / PL4-memory-safety feed back into layer refinement |
| 4.6 | PL4-egress-capability-scoping Egress capability scoping at emission boundary — all outbound communications from unsupervised agent paths (chat posts, webhook calls, email sends, HTTP requests, image-rendering URLs, link-preview fetches) pass through an egress gate before leaving the trust boundary. Scope is application-layer egress from automated / scheduled / unattended agent action; interactive responses in user-supervised sessions are out of scope — symmetric with PL4-prompt-injection-defence’s ingestion-scope narrowing. IAM-level resource writes are covered separately by PL4-least-privilege. Gate enforces destination allowlists per channel, rate limits per destination, elevation gates on novel destinations. Content-based output scanning is defence-in-depth, not primary | No scoping; unsupervised agent paths can reach arbitrary external destinations | IAM-level write restrictions in place (per PL4-least-privilege) but application-layer outbound surfaces (Slack channels, email, webhooks, HTTP, image / link rendering) not individually scoped per channel | Per-destination allowlist per outbound surface; rate limits per destination; novel-destination sends require elevation (PR review for git egress; tenancy + approval gate for chat / email / webhook surfaces via bot-token credential tenancy + GitOps JIT privilege elevation). Image-rendering and link-preview egress vectors explicitly considered as potential exfiltration paths | Egress patterns learned from legitimate traffic; novel-destination attempts auto-flagged; exfiltration-shaped patterns (bursty volume, novel recipient combined with sensitive-content signatures) auto-detected with automated block + human review; allowlist evolves from observed legitimate use |
| 4.7 | PL4-release-strategy Canary / blue-green / partial release — percentage rollouts with metric-driven promotion, with the agent structurally bounded by platform constraints so it cannot bypass rollout stages or exceed policy-defined parameters | Big-bang releases | Blue-green only, or canary with no platform-level constraints on the agent | Percentage rollouts with automated metric gating, and the agent is structurally bounded by platform constraints: deployment-tool parameter caps (agent cannot exceed policy-defined rollout percentages); pipeline-defined stages the agent can advance but cannot modify or skip; platform-verified metric gates on stage promotion | Rollout patterns learn safe vs. risky change types; canary thresholds auto-tune from historical false-positive rates; platform-enforced stage constraints auto-update from observed near-misses |
| 4.8 | PL4-agent-invokable-rollback Rollback is trivial and agent-invokable (max 2) | Manual, undocumented, and risky to attempt | Documented manual procedure | One-command, agent-callable | — |
| 4.9 | PL4-cost-governance Operating cost is observable, capped, and attributed — agent inference, CI minutes, log retention, canary spin-up costs are tracked per project / per agent run | Cost invisible until invoice arrives | Aggregate cost visible, no attribution | Per-project / per-run cost tracked, capped, alerted on overrun. Cost-prone domains with runaway characteristics (verbose logging, inference tokens, canary spin-up) have domain-specific containment mechanisms in addition to global observability — see recipes under this criterion’s criteria_advanced for known-good shapes (e.g. dynamic debug logging for verbose-logging containment) | Cost anomalies auto-investigated; expensive patterns flagged before recurrence; cost-per-feature-type baselined and tracked |
| 4.10 | PL4-memory-safety Memory safety — hygiene (staleness, contradiction, decay), access control (PII safety, tenant scoping), write-path validation (adversarial-write protection), and retention discipline over the memory substrate (PL3-memory-substrate) | No hygiene, no access controls, no write-path validation, no retention discipline; memory may leak PII, cross tenants, persist adversarial content from feedback-loop writes, or accumulate past commitments | Manual cleanup occasionally; application-layer scoping only; no write-path validation; retention ad hoc | Stale items flagged with ownership routing; tenant-scoped retrieval; PII-safe memory contents enforced at substrate layer; write-path sanitization applied at ingestion using the same policy as PL4-prompt-injection-defence. Retention discipline: the primary disposal trigger is relevance decay — items lose their place in memory when they stop being useful for retrieval. Where the project carries time-bound commitments (customer contracts, privacy obligations, regulated data), those define floors and ceilings that operate as backstops: items are retained at least until their floor, disposed no later than their ceiling, with relevance decay as the default trigger in between. The retention policy (relevance definition, time-bound backstops, override events) is documented and testable; disposal events are logged | Hygiene runs continuously; contradiction rate measured; memory corpus self-prunes; cross-tenant leak attempts auto-flagged; scope rules evolve with new data classes; write-path evasion tracked and policy auto-updates; retention policy auto-tunes from observed retrieval patterns (relevance thresholds calibrate; backstops update as commitments change) |
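
The pillar maximum follows from the per-criterion caps in the table. A minimal tally sketch, assuming unscored criteria count as 0 (an assumption, not stated in the rubric); `pillar_score` is a hypothetical helper name:

```python
# Per-criterion maximum level, taken from the table above.
MAX_BY_CRITERION = {
    "PL4-environment-isolation": 3,
    "PL4-least-privilege": 3,
    "PL4-branch-protection": 2,
    "PL4-pii-masking": 3,
    "PL4-prompt-injection-defence": 3,
    "PL4-egress-capability-scoping": 3,
    "PL4-release-strategy": 3,
    "PL4-agent-invokable-rollback": 2,
    "PL4-cost-governance": 3,
    "PL4-memory-safety": 3,
}


def pillar_score(scores: dict) -> int:
    """Sum the levels, rejecting unknown criteria and levels above the cap."""
    unknown = set(scores) - set(MAX_BY_CRITERION)
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    total = 0
    for criterion, cap in MAX_BY_CRITERION.items():
        level = scores.get(criterion, 0)  # assumption: unscored counts as 0
        if not 0 <= level <= cap:
            raise ValueError(f"{criterion}: level {level} is outside 0..{cap}")
        total += level
    return total
```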
Recipes that advance criteria in this pillar
Each recipe is an abstract pattern that moves one or more criteria. Recipes that advance the same criterion are grouped together. A criterion with no listed recipe marks a gap: the canon has not yet named a known-good pattern for it.
PL4-environment-isolation — Environment isolation
No recipes yet.
PL4-least-privilege — IAM scoped read-only by default
PL4-branch-protection — Branch protection and source-control write scoping
PL4-prompt-injection-defence — Prompt injection defence at ingestion boundary
PL4-egress-capability-scoping — Egress capability scoping at emission boundary
No recipes yet.
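Until a recipe is named, the level-2 shape of this criterion (per-surface destination allowlist, per-destination rate limit, elevation on novel destinations) can be illustrated with a minimal sketch. The `EgressGate` class, its return values, and the one-minute window are hypothetical choices made for illustration, not a canon pattern.

```python
import time
from collections import defaultdict, deque


class EgressGate:
    def __init__(self, allowlist, max_per_minute, clock=time.monotonic):
        self.allowlist = allowlist        # {surface: {allowed destinations}}
        self.max_per_minute = max_per_minute
        self.clock = clock                # injectable so the gate is testable
        self._sent = defaultdict(deque)   # (surface, destination) -> send times

    def check(self, surface, destination):
        """Return 'allow', 'rate_limited', or 'needs_elevation'."""
        # Novel destinations never pass silently; they go to an elevation path.
        if destination not in self.allowlist.get(surface, set()):
            return "needs_elevation"
        now = self.clock()
        window = self._sent[(surface, destination)]
        # Drop sends that have aged out of the 60-second window.
        while window and now - window[0] >= 60:
            window.popleft()
        if len(window) >= self.max_per_minute:
            return "rate_limited"
        window.append(now)
        return "allow"
```

The gate decides on destination and volume only, consistent with the criterion treating content-based output scanning as defence-in-depth rather than the primary control.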
PL4-release-strategy — Canary / blue-green / partial release
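As an illustration only (not one of the canon's named recipes), the level-2 constraint that the agent can advance pipeline-defined stages but cannot modify or skip them might look like the sketch below. The stage percentages, error-rate thresholds, and the `Rollout` class are hypothetical; in practice the bound would be enforced by the deployment platform rather than by code the agent runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    percent: int           # share of traffic at this stage
    max_error_rate: float  # promotion gate, verified by the platform


# Stages are defined by the pipeline, not by the agent (hypothetical values).
STAGES = (Stage(1, 0.01), Stage(10, 0.01), Stage(50, 0.005), Stage(100, 0.005))


class Rollout:
    def __init__(self, stages=STAGES):
        self._stages = tuple(stages)
        self._index = 0

    @property
    def percent(self) -> int:
        return self._stages[self._index].percent

    def advance(self, observed_error_rate: float) -> bool:
        """Promote by at most one stage, and only if the metric gate passes.

        Returns True when the gate passed (even at the final stage, where
        there is nothing left to promote to), False when it failed.
        """
        stage = self._stages[self._index]
        if observed_error_rate > stage.max_error_rate:
            return False  # gate failed; rollout stays at the current stage
        if self._index < len(self._stages) - 1:
            self._index += 1
        return True
```

The only operation exposed is `advance`, which takes no target percentage, so a caller has no parameter through which to request a skip or an out-of-policy rollout size.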
PL4-agent-invokable-rollback — Rollback is trivial and agent-invokable
No recipes yet.
PL4-cost-governance — Operating cost is observable, capped, and attributed
PL4-memory-safety — Memory safety
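The retention discipline described at level 2 of this criterion (relevance decay as the default trigger, with time-bound floors and ceilings as backstops) can be sketched as a single decision function. This is an illustrative sketch, not a canon recipe: the `MemoryItem` fields, the relevance score, and the 0.2 threshold are hypothetical, and letting the ceiling win when a floor and ceiling conflict is an assumption the rubric does not settle.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MemoryItem:
    age_days: int
    relevance: float                    # hypothetical score in [0, 1]
    floor_days: Optional[int] = None    # retain at least this long
    ceiling_days: Optional[int] = None  # dispose no later than this


def disposal_decision(item: MemoryItem,
                      relevance_threshold: float = 0.2) -> Tuple[bool, str]:
    """Return (dispose?, reason); the reason is what a disposal log would record."""
    # Ceiling backstop: dispose regardless of relevance.
    # Assumption: the ceiling takes precedence if it conflicts with the floor.
    if item.ceiling_days is not None and item.age_days >= item.ceiling_days:
        return True, "ceiling"
    # Floor backstop: retain regardless of relevance.
    if item.floor_days is not None and item.age_days < item.floor_days:
        return False, "floor"
    # Default trigger between the backstops: relevance decay.
    if item.relevance < relevance_threshold:
        return True, "relevance_decay"
    return False, "still_relevant"
```

Returning the reason alongside the decision keeps the policy testable and gives each logged disposal event an explanation.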