
3. Actions

The agent’s ability to act externally in the real world.

Max: 30 points

Concretely, the agent can query data, investigate, retrieve, change code, ship, and communicate.

Ordered as read → write → meta: foundational read capabilities first, then write actions, then the skill library as a whole.

Criteria (number, ID, criterion, then levels 0 to 3)
3.1 PL3-structured-state-read
Structured state read access: agent has read access to the project’s structured state stores: application database, infrastructure-as-code state (Terraform/Pulumi/Cloudflare config), and equivalent backends. Scores capability and usability; PII masking and IAM scoping are scored under PL4-pii-masking and PL4-least-privilege respectively.
Level 0: Agent has no read access
Level 1: Read access to app DB only; infra state remains human-only
Level 2: Read-only across app DB and infra state, queryable via MCP or equivalent
Level 3: Query patterns logged; common queries become saved views; unusual queries flagged for review

3.2 PL3-emission-quality
Emission quality: code produces structured, correlated, breadcrumb-style signal. Silent handlers are a bug, not a feature. Correlation identifiers reference production entities via pseudonymous tokens (user-ID, session-ID, request-ID), not PII-derived values (email, phone, name). Logs are an engineering surface, and PL4-pii-masking applies.
Level 0: Sparse or unstructured logs; silent handlers leave the agent blind to what happened
Level 1: Logs exist but inconsistent structure, missing correlation IDs
Level 2: Structured logs with correlation IDs, state transitions, and decision breadcrumbs throughout. Correlation IDs are pseudonymous; raw PII is not used as a correlation identifier, and log payloads are scrubbed of PII at the emission boundary per PL4-pii-masking
Level 3: Emission gaps detected from incident postmortems auto-generate logging-improvement tasks; emission quality measurably improves over time

3.3 PL3-agent-queryability
Agent queryability: agent can investigate via MCP / API across logs, metrics, traces, not just humans via dashboards.
Level 0: Agent cannot query telemetry
Level 1: Human dashboards only; agent access limited or read-heavy
Level 2: Agent directly queries logs, metrics, and traces via MCP or equivalent
Level 3: Query patterns inform pre-built investigation playbooks; recurring investigations get saved as reusable agent skills

3.4 PL3-memory-substrate
Memory substrate exists: agent has a unified memory tool (markdown + vector DB + structured DB + event log, MCP-exposed) covering decisions, postmortems, customer-context references (pseudonymous; raw PII does not enter the memory substrate), performance baselines.
Level 0: No memory infrastructure
Level 1: Some substrate exists (e.g. docs only); no unified retrieval
Level 2: Hybrid substrate in place; agent can retrieve across types via single MCP interface
Level 3: Memory utilisation rate is measured; retrieval relevance is tracked; substrate evolves based on what agents actually retrieve

3.5 PL3-source-control
Source control interaction: agent works the git platform end-to-end: branches, opens PRs, resolves merge conflicts, tags releases, and reads PR history, review comments, commit metadata.
Level 0: Human manages all git operations
Level 1: Agent can commit and push; PRs opened manually; no metadata read
Level 2: Agent manages full branch lifecycle and queries PR history, review patterns, commit blame via MCP or API
Level 3: Conflict-resolution patterns improve from outcomes; merge-failure causes auto-classified; branching strategies and review responses evolve from project patterns

3.6 PL3-domain-action-skills
Domain-specific action skills: skills that let the agent complete real, end-to-end domain flows (e.g. virtual EV chargers and disposable payment cards for charging workflows; test tenants for multi-tenant SaaS; sandbox accounts for integrations). Examples are per-project; what matters is that critical domain flows don’t require a human in the loop.
Level 0: Critical external actions require human
Level 1: Partial (some actions automated)
Level 2: Agent can complete real end-to-end flows
Level 3: Action skills track success/failure rates; failing flows auto-generate fix tasks; new domain actions extracted from observed manual workflows

3.7 PL3-deployment-cicd
Deployment and CI/CD interaction: agent triggers CI/deploys end-to-end (Fastlane, TestFlight, staging) and reads results: test outcomes, build logs, historical run data, coverage and mutation reports.
Level 0: Human-only
Level 1: Agent can trigger some steps; results remain human-only; or the deploy target is configured only via vendor dashboards (SaaS project creation, DNS, domain binding, CDN / WAF / edge config), so that even a fully-agent-driven trigger still rides on out-of-band state
Level 2: Agent ships to staging end-to-end and queries CI results and run history via MCP or equivalent. The deploy target itself (cloud project, DNS, TLS, CDN / proxy, WAF, edge config) is declared in-repo as IaC; dashboard-configured infra is level-1 regardless of trigger capability
Level 3: Deployment outcomes tracked; failed deploys auto-classified by cause; deploy and test patterns optimised over time

3.8 PL3-browser-web
Browser / web interaction: agent can interact with web UIs: navigating dashboards, filling forms, verifying deployed staging visually. Browser actions are deterministic (reproducible across runs), inspectable (humans can read what the agent will do before it runs), and version-controlled (automation artefacts live with code).
Level 0: No browser capability
Level 1: Browser interaction present but fails at least one of the three architectural properties: non-deterministic (e.g. runtime AI DOM parsing via Browseruse / Stagehand, where each run reinterprets the page and outcomes vary), non-inspectable (opaque agent behaviour at runtime; you can’t read what the agent will do without running it), or not version-controlled. Automation cannot be audited before it runs
Level 2: All three architectural properties satisfied; critical flows reproducible across runs; read-only / dry-run modes available for investigation without side effects. See deterministic browser automation for the known-good pattern
Level 3: Stale automation auto-detected from flow failures; agent regenerates automation artefacts from recorded actions; automation library compounds across flows and projects

3.9 PL3-communication
Communication actions: agent can notify and present results via external channels: Slack messages, Linear comments, email summaries, docs-portal publishing. Outbound passes through a structural safety layer.
Level 0: Agent cannot communicate outside the terminal
Level 1: Can comment on PRs only
Level 2: Agent communicates via multiple channels (Slack, Linear, email, docs portal) appropriate to audience. Outbound communication passes through at least one structural safety layer: allowlist of recipients and channels, no-PII content filter at send boundary, dry-run-by-default with explicit send confirmation, rate limiting, or human approval for sensitive categories
Level 3: Communication patterns tracked; message quality rated by recipients; templates evolve from feedback. Safety-layer gaps detected from near-misses feed back into layer refinement

3.10 PL3-skill-library-health
Skill library health: beyond individual skills (3.1–3.9), the project’s skill library as a whole is well-curated: inventoried, documented, tested, versioned, with coverage mapped against the domain surface. Distinct from 5.10, which scores portfolio-level reuse; this criterion scores this project’s own skill infrastructure.
Level 0: Skills exist ad-hoc; no inventory, no documentation, no testing; unclear what the agent can and can’t do
Level 1: Partial inventory; some skills documented but coverage gaps invisible; no systematic way to identify missing skills
Level 2: Library is inventoried, documented, tested, versioned; coverage map against domain surface maintained; gaps visible and prioritised
Level 3: New skills auto-extracted from recurring agent-invoked manual workflows; low-usage skills flagged for deprecation; per-skill quality metrics (success rate, latency, cost) tracked; library evolves with the domain


Recipes that advance criteria in this pillar

Each recipe is an abstract pattern that moves one or more criteria. Recipes that advance the same criterion are grouped together. A criterion with no listed recipe is a gap the canon has not yet named a known-good pattern for.

PL3-structured-state-read: Structured state read access

No recipes yet.
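
As an illustration only (not a canon recipe), the "read-only" half of level 2 can be sketched with a connection that rejects writes at the driver level. SQLite stands in for the application database here, and `open_read_only` and `demo` are hypothetical names; exposing the connection through MCP is out of scope for this sketch.

```python
import os
import sqlite3
import tempfile


def open_read_only(db_path: str) -> sqlite3.Connection:
    """Open an SQLite database so that any write attempt fails."""
    # mode=ro is SQLite's URI flag for a read-only connection.
    return sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)


def demo() -> tuple[list[tuple], bool]:
    """Seed a throwaway DB, then read and attempt a write as the agent."""
    tmp_dir = tempfile.mkdtemp()
    db_path = os.path.join(tmp_dir, "app.db")

    # Seeding uses a normal read-write connection (not the agent's).
    seed = sqlite3.connect(db_path)
    seed.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
    seed.execute("INSERT INTO orders (status) VALUES ('paid'), ('refunded')")
    seed.commit()
    seed.close()

    agent = open_read_only(db_path)
    rows = agent.execute("SELECT id, status FROM orders ORDER BY id").fetchall()
    try:
        agent.execute("DELETE FROM orders")
        write_blocked = False
    except sqlite3.OperationalError:
        # SQLite refuses the statement because the connection is read-only.
        write_blocked = True
    agent.close()
    return rows, write_blocked
```

Enforcing read-only at the connection (or, in a server database, at the role) means the guarantee does not depend on the agent choosing the right statements.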

PL3-emission-quality: Emission quality

No recipes yet.
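
As an illustration only (not a canon recipe), the level-2 wording for this criterion could look roughly like the sketch below: one JSON line per event, correlated by pseudonymous IDs, with PII dropped or redacted at the point of emission. The `PII_KEYS` policy, the email regex, and the function names are assumptions made for the example; a real scrubber would cover more than email addresses.

```python
import json
import re
import uuid

# Hypothetical policy: field names that must never reach a log line.
PII_KEYS = {"email", "phone", "name"}
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def scrub(value):
    """Redact email-like substrings in string values; pass others through."""
    if isinstance(value, str):
        return EMAIL_RE.sub("[redacted]", value)
    return value


def new_request_id() -> str:
    """A random token: correlates lines without deriving from any PII."""
    return uuid.uuid4().hex


def emit(event: str, *, request_id: str, user_id: str, **fields) -> str:
    """Build one structured log line, scrubbing PII at the emission boundary."""
    record = {"event": event, "request_id": request_id, "user_id": user_id}
    for key, value in fields.items():
        if key in PII_KEYS:
            continue  # drop PII-named fields outright
        record[key] = scrub(value)
    return json.dumps(record, sort_keys=True)


example_line = emit(
    "payment.declined",
    request_id=new_request_id(),
    user_id="u_8f3a",
    email="ana@example.com",
    reason="card expired for ana@example.com",
)
```

Here the `email` field never reaches the output, and the address embedded in `reason` is replaced with `[redacted]`, while `request_id` and `user_id` remain available for correlation.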

PL3-agent-queryability: Agent queryability

No recipes yet.

PL3-memory-substrate: Memory substrate exists

No recipes yet.
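
As an illustration only (not a canon recipe), "retrieve across types via a single interface" can be sketched as a fan-out over several stores behind one `retrieve` call. Everything named here (`Hit`, `MemoryStore`, `KeywordStore`, `UnifiedMemory`) is hypothetical, and the keyword scorer is a toy stand-in for a vector or structured backend.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Hit:
    source: str   # which backing store produced the hit
    text: str
    score: float  # higher is more relevant


class MemoryStore(Protocol):
    name: str

    def search(self, query: str) -> list[Hit]: ...


class KeywordStore:
    """Toy store: scores a document by the fraction of query words it contains."""

    def __init__(self, name: str, docs: list[str]) -> None:
        self.name = name
        self.docs = docs

    def search(self, query: str) -> list[Hit]:
        words = query.lower().split()
        hits = []
        for doc in self.docs:
            matched = sum(1 for w in words if w in doc.lower())
            if matched:
                hits.append(Hit(self.name, doc, matched / len(words)))
        return hits


class UnifiedMemory:
    """Single entry point that fans a query out to every registered store."""

    def __init__(self, stores: list[MemoryStore]) -> None:
        self.stores = stores

    def retrieve(self, query: str, limit: int = 5) -> list[Hit]:
        hits = [hit for store in self.stores for hit in store.search(query)]
        return sorted(hits, key=lambda h: h.score, reverse=True)[:limit]
```

Each hit keeps its `source`, so the agent can tell a decision record from a postmortem even though both came back from one call.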

PL3-source-control: Source control interaction

PL3-domain-action-skills: Domain-specific action skills

No recipes yet.

PL3-deployment-cicd: Deployment and CI/CD interaction

No recipes yet.

PL3-browser-web: Browser / web interaction
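
As a rough illustration of the three architectural properties in 3.8 (and not the canon recipe itself), a browser flow can be held as plain data: the same steps run every time, a human can read the plan before execution, and the file lives in the repository. `Step`, `LOGIN_FLOW`, `RecordingDriver`, the URL, and the selectors are all hypothetical; the driver only records calls and would be replaced by a real browser driver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    action: str        # "goto", "fill", or "click"
    target: str        # URL for goto, CSS selector otherwise
    value: str = ""    # text to type, used by "fill" only


# Hypothetical flow; the URL and selectors are placeholders.
LOGIN_FLOW = [
    Step("goto", "https://staging.example.com/login"),
    Step("fill", "#username", "qa-bot"),
    Step("click", "button[type=submit]"),
]


def describe(flow: list[Step]) -> list[str]:
    """Render the plan as text so a human can read it before anything runs."""
    lines = []
    for number, step in enumerate(flow, start=1):
        suffix = f" with {step.value!r}" if step.value else ""
        lines.append(f"{number}. {step.action} {step.target}{suffix}")
    return lines


class RecordingDriver:
    """Stand-in for a real browser driver; it only records the calls it gets."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str, str]] = []

    def perform(self, step: Step) -> None:
        self.calls.append((step.action, step.target, step.value))


def run(flow: list[Step], driver: RecordingDriver, dry_run: bool = True) -> list[str]:
    """Return the plan; touch the driver only when dry_run is switched off."""
    plan = describe(flow)
    if not dry_run:
        for step in flow:
            driver.perform(step)
    return plan
```

Defaulting `dry_run` to `True` gives the investigation mode the level-2 description asks for: the plan is produced without side effects unless the caller opts in.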

PL3-communication: Communication actions
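
As a rough illustration of the structural safety layer described in 3.9 (not a canon recipe), the sketch below stacks four of the listed checks in front of a send: channel allowlist, a no-PII content filter, rate limiting, and dry-run-by-default. The criterion asks for at least one; combining them is a choice made for the example. `OutboundGate` and its fields are hypothetical, the PII filter only looks for email addresses, and delivery is simulated by appending to `outbox`.

```python
import re
import time
from dataclasses import dataclass, field

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass
class OutboundGate:
    allowed_channels: set[str]
    max_per_minute: int = 5
    sent_at: list[float] = field(default_factory=list)
    outbox: list[tuple[str, str]] = field(default_factory=list)

    def send(self, channel: str, text: str, confirm: bool = False) -> str:
        """Return a status string; deliver only if every check passes."""
        if channel not in self.allowed_channels:
            return "blocked: channel not on allowlist"
        if EMAIL_RE.search(text):
            return "blocked: message contains an email address"
        now = time.monotonic()
        recent = [t for t in self.sent_at if now - t < 60]
        if len(recent) >= self.max_per_minute:
            return "blocked: rate limit reached"
        if not confirm:
            return f"dry-run: would send to {channel}"
        self.sent_at = recent + [now]
        # A real implementation would call the channel's API here.
        self.outbox.append((channel, text))
        return "sent"
```

Because `confirm` defaults to `False`, a caller that forgets the flag produces a preview rather than a message, which is the point of making the safety layer structural instead of relying on prompt instructions.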

PL3-skill-library-health: Skill library health
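
As a rough illustration of the level-2 bar in 3.10 (not a canon recipe), an inventory plus a coverage map can be as small as the sketch below. The `Skill` fields, the flow names, and `coverage_report` are hypothetical; the idea is only that gaps and undocumented or untested skills become visible from data rather than from memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    version: str
    documented: bool
    tested: bool
    covers: frozenset[str]  # domain flows this skill exercises


def coverage_report(skills: list[Skill], domain_surface: set[str]) -> dict:
    """Map each domain flow to the skills that cover it and list the gaps."""
    coverage = {flow: [] for flow in sorted(domain_surface)}
    for skill in skills:
        for flow in skill.covers & domain_surface:
            coverage[flow].append(skill.name)
    return {
        "coverage": coverage,
        # Flows in the domain surface that no skill touches.
        "gaps": [flow for flow, names in coverage.items() if not names],
        # Skills missing documentation or tests.
        "unhealthy": sorted(s.name for s in skills if not (s.documented and s.tested)),
    }
```

With an inventory of two skills covering "charging" and "payment" and a domain surface that also includes "refund", the report lists "refund" under `gaps`, which is the kind of visible, prioritisable gap the criterion describes.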