3. Actions
The agent’s ability to act externally in the real world: query data, investigate, retrieve, change code, ship, communicate.
Max: 30 points
Ordered as read → write → meta: foundational read capabilities first, then write actions, then the skill library as a whole.
| # | Criterion | 0 | 1 | 2 | 3 |
|---|---|---|---|---|---|
| 3.1 | PL3-structured-state-read Structured state read access — agent has read access to the project’s structured state stores: application database, infrastructure-as-code state (Terraform/Pulumi/Cloudflare config), and equivalent backends. Scores capability and usability; PII masking and IAM scoping are scored under PL4-least-privilege / PL4-pii-masking | Agent has no read access | Read access to app DB only; infra state remains human-only | Read-only across app DB and infra state, queryable via MCP or equivalent | Query patterns logged; common queries become saved views; unusual queries flagged for review |
| 3.2 | PL3-emission-quality Emission quality — code produces structured, correlated, breadcrumb-style signal. Silent handlers are a bug, not a feature. Correlation identifiers reference production entities via pseudonymous tokens (user-ID, session-ID, request-ID), not PII-derived values (email, phone, name) — logs are an engineering surface, and PL4-pii-masking applies | Sparse or unstructured logs; silent handlers leave the agent blind to what happened | Logs exist but inconsistent structure, missing correlation IDs | Structured logs with correlation IDs, state transitions, and decision breadcrumbs throughout. Correlation IDs are pseudonymous; raw PII is not used as a correlation identifier, and log payloads are scrubbed of PII at the emission boundary per PL4-pii-masking | Emission gaps detected from incident postmortems auto-generate logging-improvement tasks; emission quality measurably improves over time |
| 3.3 | PL3-agent-queryability Agent queryability — agent can investigate via MCP / API across logs, metrics, traces — not just humans via dashboards | Agent cannot query telemetry | Human dashboards only; agent access limited or read-heavy | Agent directly queries logs, metrics, and traces via MCP or equivalent | Query patterns inform pre-built investigation playbooks; recurring investigations get saved as reusable agent skills |
| 3.4 | PL3-memory-substrate Memory substrate exists — agent has a unified memory tool (markdown + vector DB + structured DB + event log, MCP-exposed) covering decisions, postmortems, customer-context references (pseudonymous; raw PII does not enter the memory substrate), performance baselines | No memory infrastructure | Some substrate exists (e.g. docs only); no unified retrieval | Hybrid substrate in place; agent can retrieve across types via single MCP interface | Memory utilisation rate is measured; retrieval relevance is tracked; substrate evolves based on what agents actually retrieve |
| 3.5 | PL3-source-control Source control interaction — agent works the git platform end-to-end: branches, opens PRs, resolves merge conflicts, tags releases, and reads PR history, review comments, commit metadata | Human manages all git operations | Agent can commit and push; PRs opened manually; no metadata read | Agent manages full branch lifecycle and queries PR history, review patterns, commit blame via MCP or API | Conflict-resolution patterns improve from outcomes; merge-failure causes auto-classified; branching strategies and review responses evolve from project patterns |
| 3.6 | PL3-domain-action-skills Domain-specific action skills — skills that let the agent complete real, end-to-end domain flows (e.g. virtual EV chargers and disposable payment cards for charging workflows; test tenants for multi-tenant SaaS; sandbox accounts for integrations). Examples are per-project; what matters is that critical domain flows don’t require a human in the loop | Critical external actions require human | Partial (some actions automated) | Agent can complete real end-to-end flows | Action skills track success/failure rates; failing flows auto-generate fix tasks; new domain actions extracted from observed manual workflows |
| 3.7 | PL3-deployment-cicd Deployment and CI/CD interaction — agent triggers CI/deploys end-to-end (Fastlane, TestFlight, staging) and reads results: test outcomes, build logs, historical run data, coverage and mutation reports | Human-only | Agent can trigger some steps; results remain human-only; or the deploy target is configured only via vendor dashboards (SaaS project creation, DNS, domain binding, CDN / WAF / edge config), so that even a fully-agent-driven trigger still rides on out-of-band state | Agent ships to staging end-to-end and queries CI results and run history via MCP or equivalent. The deploy target itself — cloud project, DNS, TLS, CDN / proxy, WAF, edge config — is declared in-repo as IaC; dashboard-configured infra is level-1 regardless of trigger capability | Deployment outcomes tracked; failed deploys auto-classified by cause; deploy and test patterns optimised over time |
| 3.8 | PL3-browser-web Browser / web interaction — agent can interact with web UIs: navigating dashboards, filling forms, verifying deployed staging visually. Browser actions are deterministic (reproducible across runs), inspectable (humans can read what the agent will do before it runs), and version-controlled (automation artefacts live with code) | No browser capability | Browser interaction present but fails at least one of the three architectural properties — non-deterministic (e.g. runtime AI DOM parsing via Browser Use / Stagehand, where each run reinterprets the page and outcomes vary), non-inspectable (opaque agent behaviour at runtime — you can’t read what the agent will do without running it), or not version-controlled (automation cannot be audited before it runs) | All three architectural properties satisfied; critical flows reproducible across runs; read-only / dry-run modes available for investigation without side effects. See deterministic browser automation for the known-good pattern | Stale automation auto-detected from flow failures; agent regenerates automation artefacts from recorded actions; automation library compounds across flows and projects |
| 3.9 | PL3-communication Communication actions — agent can notify and present results via external channels: Slack messages, Linear comments, email summaries, docs-portal publishing. Outbound passes through a structural safety layer | Agent cannot communicate outside the terminal | Can comment on PRs only | Agent communicates via multiple channels (Slack, Linear, email, docs portal) appropriate to audience. Outbound communication passes through at least one structural safety layer: allowlist of recipients and channels, no-PII content filter at send boundary, dry-run-by-default with explicit send confirmation, rate limiting, or human approval for sensitive categories | Communication patterns tracked; message quality rated by recipients; templates evolve from feedback. Safety-layer gaps detected from near-misses feed back into layer refinement |
| 3.10 | PL3-skill-library-health Skill library health — beyond individual skills (3.1–3.9), the project’s skill library as a whole is well-curated: inventoried, documented, tested, versioned, with coverage mapped against the domain surface. Distinct from 5.10 which scores portfolio-level reuse — this scores this project’s skill infrastructure | Skills exist ad-hoc; no inventory, no documentation, no testing; unclear what the agent can and can’t do | Partial inventory; some skills documented but coverage gaps invisible; no systematic way to identify missing skills | Library is inventoried, documented, tested, versioned; coverage map against domain surface maintained; gaps visible and prioritised | New skills auto-extracted from recurring agent-invoked manual workflows; low-usage skills flagged for deprecation; per-skill quality metrics (success rate, latency, cost) tracked; library evolves with the domain |
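The level-2 cell of 3.2 can be sketched as a minimal emission boundary. This is an illustrative assumption, not a prescribed implementation: the `StructuredLogger` class, field names, and deny-list are made up, and a real codebase would hang this off its existing logging pipeline.

```python
import json
import re
import uuid

# Hypothetical deny-list: field names treated as PII and dropped at the
# emission boundary (criterion 3.2 / PL4-pii-masking).
PII_FIELDS = {"email", "phone", "name"}
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


class StructuredLogger:
    """Emits JSON log lines carrying a pseudonymous correlation ID."""

    def __init__(self, request_id=None):
        # Correlation via an opaque token, never a PII-derived value.
        self.request_id = request_id or f"req-{uuid.uuid4().hex[:12]}"

    def emit(self, event, **fields):
        # Drop known PII fields, then scrub email-shaped strings from the rest.
        clean = {k: v for k, v in fields.items() if k not in PII_FIELDS}
        clean = {
            k: EMAIL_RE.sub("[redacted]", v) if isinstance(v, str) else v
            for k, v in clean.items()
        }
        return json.dumps({"event": event, "request_id": self.request_id, **clean})


log = StructuredLogger(request_id="req-123")
line = log.emit("checkout.failed", step="payment", email="a@b.example", user_id="u-42")
```

The pseudonymous `user_id` survives as a correlation handle; the `email` field never reaches the log line, which is what makes logs safe to treat as an agent-queryable engineering surface.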
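The three architectural properties in 3.8 can be sketched with recorded steps stored as data: the flow lives in the repo (version-controlled), can be read before running (inspectable), and replays identically each run (deterministic). The flow, selectors, and driver protocol below are hypothetical.

```python
# A recorded browser flow as plain data, checked in next to the code it tests.
RECORDED_FLOW = [
    {"action": "goto", "url": "https://staging.example.com/login"},
    {"action": "fill", "selector": "#email", "value": "qa@example.com"},
    {"action": "click", "selector": "#submit"},
]


def replay(flow, driver, dry_run=False):
    """Replays a recorded flow; dry_run lists the steps without side effects."""
    executed = []
    for step in flow:
        executed.append(step["action"])
        if not dry_run:
            args = {k: v for k, v in step.items() if k != "action"}
            getattr(driver, step["action"])(**args)
    return executed


# Read-only investigation mode: inspect the plan without touching a browser.
plan = replay(RECORDED_FLOW, driver=None, dry_run=True)
```

Contrast with runtime AI DOM parsing: there the "flow" exists only inside the model at execution time, so there is nothing to diff, review, or replay, which is exactly what the level-1 cell penalises.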
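Two of the structural safety layers in 3.9 (recipient allowlist, dry-run-by-default with explicit send confirmation) compose into a small wrapper. The channel names and `send_fn` callable are assumptions for illustration; a real deployment would wire this to its chat or email client.

```python
# Hypothetical allowlist: the only channels the agent may address.
ALLOWED_CHANNELS = {"#eng-alerts", "#deploys"}


def send_message(channel, text, send_fn, dry_run=True):
    """Outbound messages pass the allowlist; actually sending needs opt-in."""
    if channel not in ALLOWED_CHANNELS:
        raise PermissionError(f"channel {channel!r} is not on the allowlist")
    if dry_run:
        # Default path: report what would be sent, with no side effects.
        return {"sent": False, "channel": channel, "text": text}
    return send_fn(channel, text)


preview = send_message("#deploys", "staging deploy green", send_fn=print)
```

The point of the structure is that the safe behaviour is the default: the agent must pass `dry_run=False` deliberately, and a disallowed recipient fails closed rather than silently routing elsewhere.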
Recipes that advance criteria in this pillar
Each recipe is an abstract pattern that moves one or more criteria. Recipes that advance the same criterion are grouped together. A criterion with no listed recipe marks a gap: the canon has not yet named a known-good pattern for it.
PL3-structured-state-read — Structured state read access
No recipes yet.
PL3-emission-quality — Emission quality
No recipes yet.
PL3-agent-queryability — Agent queryability
No recipes yet.
PL3-memory-substrate — Memory substrate exists
No recipes yet.
PL3-source-control — Source control interaction
PL3-domain-action-skills — Domain-specific action skills
No recipes yet.
PL3-deployment-cicd — Deployment and CI/CD interaction
No recipes yet.
PL3-browser-web — Browser / web interaction
deterministic browser automation
PL3-communication — Communication actions
PL3-skill-library-health — Skill library health