4. Safe Space

Blast-radius containment, so "going wrong" has bounded cost.

Max: 28 points (PL4-branch-protection and PL4-agent-invokable-rollback are max 2)

Blast-radius containment — if the agent goes wrong, the damage is bounded. The security of the setup is what lets the agent operate freely.

Safety is a composition of mechanisms, not a single gate. Human-in-the-loop, deterministic policy gates, metric-gated progression, capability scoping, dry-run defaults, audit with fast rollback — the agent operates freely when these layers compose.

The criteria are ordered as environment isolation → permissions and data boundaries → release safety → cost → memory governance.

| # | Criterion | 0 | 1 | 2 | 3 |
|---|---|---|---|---|---|
| 4.1 | PL4-environment-isolation Environment isolation: staging/production separation with parity-checked isolation AND on-demand production-mirrored replica for load testing | Shared or leaky environments; no load-testing isolation | Separate staging/prod but drift-prone; static undersized staging; no on-demand replica | Fully isolated staging/prod with parity-checking; on-demand production-mirrored replica for load testing with prod-sized data | Replica spin-up patterns optimised from past runs; staging configurations evolve with production topology; replica lifecycle automated |
| 4.2 | PL4-least-privilege IAM scoped read-only by default: DB, Kubernetes, AWS | Agents run as admin | Mixed scopes | Strict least-privilege; write requires structurally-enforced elevation: platform-gated (IAM policy-as-code + JIT, credential tenancy, GitOps-triggered grants), not procedural (ticketed approval that then executes with unscoped credentials). See GitOps JIT privilege elevation for a known-good shape | Permission requests logged; recurring legitimate elevations get scoped permanent grants; unused permissions auto-revoked |
| 4.3 | PL4-branch-protection Branch protection and source-control write scoping: protected branches are locked against direct push and direct merge by any actor, including agents. All changes to protected branches flow through a PR; agents have unrestricted write access to feature/task branches, but write access to protected branches is structurally impossible, not merely discouraged (max 2) | No branch protection; agents (and humans) can push directly to main; the PR review gate (PL2-external-pr-review) is procedural only | Branch protection enabled on main but inconsistently applied across repos or branches (e.g. develop unprotected); approval requirement exists but bypassable by repo admins without audit | All protected branches locked across every repo: direct push blocked; merge requires at least one human approval OR an audited automated-merge rule; branches must be current with target at merge time (platform “require up to date before merging” OR merge queue that rebases and tests before merge); bypass requires explicit override with audit log. Agents are scoped to feature branches by platform-enforced rules, not convention | |
| 4.4 | PL4-pii-masking PII masking at data-access and telemetry layers: e.g. pg_columnmask for DB; scrubbing / allowlists for logs, metrics, traces | No masking | Application-layer only, or DB-only (logs leak PII) | Enforced at DB and telemetry layers; agent and logs cannot see raw PII | New PII fields auto-detected and masked; masking rules evolve with data model changes |
| 4.5 | PL4-prompt-injection-defence Prompt injection defence at ingestion boundary: all external content entering persistent agent context passes through an ingestion sanitization layer before indexing. Scope is durable ingestion paths (memory writes, indexed knowledge, unsupervised scheduled ingestion); interactive turn context in user-supervised sessions is out of scope, as blast radius there is contained by Pillar 4 substrate (PL4-least-privilege, PL4-branch-protection). The layer strips, escapes, or sandboxes instruction-shaped text. The same policy is applied consistently across every ingestion surface: PL1-real-world-feedback (real-world feedback loop), PL5-signal-driven-tasks (signal-driven task generation), PL4-memory-safety (memory write-path) | No sanitization; untrusted text flows directly into context | Ad-hoc sanitization on some surfaces (e.g. PII redaction only); inconsistent between ingestion paths | Unified sanitization layer applied at every ingestion surface; instruction-shaped patterns (role-prompts, system-message mimicry, fake tool calls, jailbreak patterns) stripped, escaped, or sandboxed; policy version-controlled | Layer adversarially tested; evasion rate measured; new attack patterns auto-update policy; near-misses from PL1-real-world-feedback / PL5-signal-driven-tasks / PL4-memory-safety feed back into layer refinement |
| 4.6 | PL4-egress-capability-scoping Egress capability scoping at emission boundary: all outbound communications from unsupervised agent paths (chat posts, webhook calls, email sends, HTTP requests, image-rendering URLs, link-preview fetches) pass through an egress gate before leaving the trust boundary. Scope is application-layer egress from automated / scheduled / unattended agent action; interactive responses in user-supervised sessions are out of scope, symmetric with PL4-prompt-injection-defence’s ingestion-scope narrowing. IAM-level resource writes are covered separately by PL4-least-privilege. Gate enforces destination allowlists per channel, rate limits per destination, elevation gates on novel destinations. Content-based output scanning is defence-in-depth, not primary | No scoping; unsupervised agent paths can reach arbitrary external destinations | IAM-level write restrictions in place (per PL4-least-privilege) but application-layer outbound surfaces (Slack channels, email, webhooks, HTTP, image / link rendering) not individually scoped per channel | Per-destination allowlist per outbound surface; rate limits per destination; novel-destination sends require elevation (PR review for git egress; tenancy + approval gate for chat / email / webhook surfaces via bot-token credential tenancy + GitOps JIT privilege elevation). Image-rendering and link-preview egress vectors explicitly considered as potential exfiltration paths | Egress patterns learned from legitimate traffic; novel-destination attempts auto-flagged; exfiltration-shaped patterns (bursty volume, novel recipient combined with sensitive-content signatures) auto-detected with automated block + human review; allowlist evolves from observed legitimate use |
| 4.7 | PL4-release-strategy Canary / blue-green / partial release: percentage rollouts with metric-driven promotion, with the agent structurally bounded by platform constraints so it cannot bypass rollout stages or exceed policy-defined parameters | Big-bang releases | Blue-green only, or canary with no platform-level constraints on the agent | Percentage rollouts with automated metric gating, and the agent is structurally bounded by platform constraints: deployment-tool parameter caps (agent cannot exceed policy-defined rollout percentages); pipeline-defined stages the agent can advance but cannot modify or skip; platform-verified metric gates on stage promotion | Rollout patterns learn safe vs. risky change types; canary thresholds auto-tune from historical false-positive rates; platform-enforced stage constraints auto-update from observed near-misses |
| 4.8 | PL4-agent-invokable-rollback Rollback is trivial and agent-invokable (max 2) | Manual, scary | Documented manual procedure | One-command, agent-callable | |
| 4.9 | PL4-cost-governance Operating cost is observable, capped, and attributed: agent inference, CI minutes, log retention, canary spin-up costs are tracked per project / per agent run | Cost invisible until invoice arrives | Aggregate cost visible, no attribution | Per-project / per-run cost tracked, capped, alerted on overrun. Cost-prone domains with runaway characteristics (verbose logging, inference tokens, canary spin-up) have domain-specific containment mechanisms in addition to global observability; see recipes under this criterion’s criteria_advanced for known-good shapes (e.g. dynamic debug logging for verbose-logging containment) | Cost anomalies auto-investigated; expensive patterns flagged before recurrence; cost-per-feature-type baselined and tracked |
| 4.10 | PL4-memory-safety Memory safety: hygiene (staleness, contradiction, decay), access control (PII safety, tenant scoping), write-path validation (adversarial-write protection), and retention discipline over the memory substrate (PL3-memory-substrate) | No hygiene, no access controls, no write-path validation, no retention discipline; memory may leak PII, cross tenants, persist adversarial content from feedback-loop writes, or accumulate past commitments | Manual cleanup occasionally; application-layer scoping only; no write-path validation; retention ad hoc | Stale items flagged with ownership routing; tenant-scoped retrieval; PII-safe memory contents enforced at substrate layer; write-path sanitization applied at ingestion using the same policy as PL4-prompt-injection-defence. Retention discipline: the primary disposal trigger is relevance decay, i.e. items lose their place in memory when they stop being useful for retrieval. Where the project carries time-bound commitments (customer contracts, privacy obligations, regulated data), those define floors and ceilings that operate as backstops: items are retained at least until their floor, disposed no later than their ceiling, with relevance decay as the default trigger in between. The retention policy (relevance definition, time-bound backstops, override events) is documented and testable; disposal events are logged | Hygiene runs continuously; contradiction rate measured; memory corpus self-prunes; cross-tenant leak attempts auto-flagged; scope rules evolve with new data classes; write-path evasion tracked and policy auto-updates; retention policy auto-tunes from observed retrieval patterns (relevance thresholds calibrate; backstops update as commitments change) |


Recipes that advance criteria in this pillar

Each recipe is an abstract pattern that moves one or more criteria. Recipes that advance the same criterion are grouped together. A criterion with no listed recipe marks a gap: the canon has not yet named a known-good pattern for it.

PL4-environment-isolation Environment isolation

No recipes yet.

PL4-least-privilege IAM scoped read-only by default

PL4-branch-protection Branch protection and source-control write scoping

PL4-prompt-injection-defence Prompt injection defence at ingestion boundary

PL4-egress-capability-scoping Egress capability scoping at emission boundary

No recipes yet.
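
As an illustration only, and not a canon recipe, the Level 2 behaviour for this criterion (per-channel destination allowlist, per-destination rate limit, elevation on novel destinations) can be sketched in a few lines. The `EgressGate` class, the `Verdict` names, and the default limits are hypothetical; a real gate would sit in the platform layer where the agent cannot bypass it.

```python
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Set, Tuple


class Verdict(Enum):
    ALLOW = "allow"
    RATE_LIMITED = "rate_limited"
    NEEDS_ELEVATION = "needs_elevation"  # novel destination: route to approval


@dataclass
class EgressGate:
    # Hypothetical policy shape: channel name -> destinations allowed on it.
    allowlist: Dict[str, Set[str]]
    max_sends: int = 5             # sends allowed per destination per window
    window_seconds: float = 60.0
    # (channel, destination) -> timestamps of recent allowed sends
    _sent: Dict[Tuple[str, str], List[float]] = field(default_factory=dict)

    def check(self, channel: str, destination: str,
              now: Optional[float] = None) -> Verdict:
        now = time.monotonic() if now is None else now
        # Anything not explicitly allowlisted for this channel is "novel".
        if destination not in self.allowlist.get(channel, set()):
            return Verdict.NEEDS_ELEVATION
        key = (channel, destination)
        # Keep only sends that are still inside the sliding window.
        recent = [t for t in self._sent.get(key, []) if now - t < self.window_seconds]
        if len(recent) >= self.max_sends:
            self._sent[key] = recent
            return Verdict.RATE_LIMITED
        recent.append(now)
        self._sent[key] = recent
        return Verdict.ALLOW
```

With `EgressGate(allowlist={"slack": {"#deploys"}}, max_sends=2)`, two posts to `#deploys` within a minute are allowed, a third is rate-limited, and a post to any other channel or surface returns `NEEDS_ELEVATION` instead of being sent.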

PL4-release-strategy Canary / blue-green / partial release
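
As an illustration only, and not a canon recipe, the Level 2 constraints for this criterion (pipeline-defined stages the agent can advance but not modify or skip, with a metric gate on promotion) can be sketched as below. `RolloutPolicy`, `Rollout`, the stage percentages, and the error-rate threshold are all hypothetical. Python cannot truly hide attributes, so in practice the enforcement would live in the deployment platform; the sketch only shows the shape of the interface the agent is given.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


class RolloutError(Exception):
    """Raised when a request would bypass the rollout policy."""


@dataclass(frozen=True)  # frozen: the policy cannot be mutated after creation
class RolloutPolicy:
    # Hypothetical, platform-owned values.
    stages: Tuple[int, ...] = (1, 10, 50, 100)  # traffic percentages, in order
    max_error_rate: float = 0.01                # promotion gate


class Rollout:
    def __init__(self, policy: RolloutPolicy, error_rate: Callable[[], float]):
        self._policy = policy
        self._error_rate = error_rate  # metric source, assumed platform-verified
        self._index = -1               # -1 means nothing is released yet

    @property
    def percent(self) -> int:
        return 0 if self._index < 0 else self._policy.stages[self._index]

    def advance(self) -> int:
        """The only operation exposed to the agent: move to the next stage."""
        if self._index + 1 >= len(self._policy.stages):
            raise RolloutError("already at the final stage")
        # Promotion past a live stage requires healthy metrics at that stage.
        if self._index >= 0 and self._error_rate() > self._policy.max_error_rate:
            raise RolloutError("metric gate failed; promotion refused")
        self._index += 1
        return self.percent
```

There is deliberately no `set_percent` method: the agent can call `advance()` repeatedly, but each call moves exactly one stage and is refused when the current stage's error rate exceeds the policy threshold.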

PL4-agent-invokable-rollback Rollback is trivial and agent-invokable

No recipes yet.

PL4-cost-governance Operating cost is observable, capped, and attributed
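
As an illustration only, and not a canon recipe, the Level 2 wording "per-project / per-run cost tracked, capped, alerted on overrun" can be sketched as a small ledger. `CostLedger`, the 80% alert fraction, and the category names are hypothetical, and the sketch assumes each charge is an estimate checked before the spend happens.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


class BudgetExceeded(Exception):
    """Raised when a charge would push a project past its cap."""


@dataclass
class CostLedger:
    caps: Dict[str, float]        # project -> cap in USD (hypothetical numbers)
    alert_fraction: float = 0.8   # alert once this share of the cap is used
    alerts: List[str] = field(default_factory=list)
    # (project, run_id, category) -> USD; category is e.g. "inference" or "ci"
    _spend: Dict[Tuple[str, str, str], float] = field(
        default_factory=lambda: defaultdict(float)
    )

    def total(self, project: str, run_id: Optional[str] = None) -> float:
        """Spend attributed to a project, optionally narrowed to one run."""
        return sum(
            usd for (p, r, _), usd in self._spend.items()
            if p == project and (run_id is None or r == run_id)
        )

    def record(self, project: str, run_id: str, category: str, usd: float) -> None:
        cap = self.caps[project]
        projected = self.total(project) + usd
        if projected > cap:
            # Refuse the charge instead of recording an overrun.
            raise BudgetExceeded(f"{project}: {projected:.2f} > cap {cap:.2f}")
        self._spend[(project, run_id, category)] += usd
        if projected >= self.alert_fraction * cap:
            self.alerts.append(f"{project} at {projected / cap:.0%} of cap")
```

Keying spend by project, run, and category is what separates Level 2 from Level 1 here: the same ledger answers both "what has this project spent" and "which run and which cost domain spent it".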

PL4-memory-safety Memory safety
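
As an illustration only, and not a canon recipe, the Level 2 retention rule for this criterion (relevance decay as the default disposal trigger, with time-bound floors and ceilings as backstops) can be written as a single decision function. `MemoryItem`, the relevance score, and the 0.2 threshold are hypothetical; the returned reason string is there so that each disposal decision can be logged.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MemoryItem:
    age_days: float
    relevance: float                      # hypothetical usefulness score in [0, 1]
    floor_days: Optional[float] = None    # retain at least this long
    ceiling_days: Optional[float] = None  # dispose no later than this

    def __post_init__(self):
        # A floor above the ceiling would make the two backstops contradictory.
        if (self.floor_days is not None and self.ceiling_days is not None
                and self.floor_days > self.ceiling_days):
            raise ValueError("floor_days must not exceed ceiling_days")


def should_dispose(item: MemoryItem,
                   relevance_threshold: float = 0.2) -> Tuple[bool, str]:
    """Return (dispose?, reason) for one memory item."""
    # Ceiling backstop: dispose regardless of how relevant the item still is.
    if item.ceiling_days is not None and item.age_days >= item.ceiling_days:
        return True, "ceiling"
    # Floor backstop: retain even if relevance has already decayed.
    if item.floor_days is not None and item.age_days < item.floor_days:
        return False, "floor"
    # Between the backstops, relevance decay is the default trigger.
    if item.relevance < relevance_threshold:
        return True, "relevance_decay"
    return False, "relevant"
```

For example, an item with relevance 0.05 and a 30-day floor is kept at day 10 (reason `floor`) and disposed at day 40 (reason `relevance_decay`), while a highly relevant item past a 365-day ceiling is disposed with reason `ceiling`.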