Security Model

Trust boundary

┌─────────────────────────────────────────────────────┐
│  Untrusted zone                                     │
│                                                     │
│  Python adapter  ─── POST /v1/tool ──►  Go runtime │ ◄─ Trusted zone
│  LangChain tool                                     │
│  Any HTTP client                         │          │
└─────────────────────────────────────────│──────────┘
                                          │
                              policy evaluation (OPA)
                                          │
                                    Allow │ Deny
                                          │
                              tool execution (allow only)

The runtime is the sole policy authority. Adapters submit tool call intents via POST /v1/tool. The runtime evaluates policy and either executes the tool (allow) or returns HTTP 403 (deny). No adapter code path can flip deny to allow.

Fail-closed: if the policy evaluator returns an error for any reason, the runtime defaults to Deny. An unavailable or corrupt policy bundle produces deny-all behavior, not allow-all.


Runtime API contract

POST /v1/tool
Content-Type: application/json

{"kind":"<tool-kind>","params":{...}}

| HTTP status | Meaning |
|-------------|---------|
| 200 | Decision = allow; tool executed; response contains output |
| 403 | Decision = deny; tool never ran; response contains reason |
| 400 | Allow decision but tool execution failed (invalid params, endpoint not allowlisted) |
| 405 | Non-POST request |

The decision field is always present in the response body. The policy_ref field carries the version of the active bundle that made the decision.
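From the adapter side, the contract above could be consumed roughly as follows. This is a minimal sketch, not the shipped Python adapter: PolicyDeniedError and ToolResult are named in this document, but ToolExecutionError, build_request, interpret_response, and the output and reason body keys are assumptions made for illustration.

```python
import json
from dataclasses import dataclass


class PolicyDeniedError(Exception):
    """Raised when the runtime answers 403 (decision = deny)."""


class ToolExecutionError(Exception):
    """Hypothetical name: raised on 400 (allowed, but execution failed)."""


@dataclass
class ToolResult:
    decision: str
    policy_ref: str
    output: object


def build_request(kind: str, params: dict) -> bytes:
    """Serialize the POST /v1/tool body: {"kind": ..., "params": {...}}."""
    return json.dumps({"kind": kind, "params": params}).encode("utf-8")


def interpret_response(status: int, body: dict) -> ToolResult:
    """Map an HTTP status and decoded JSON body onto the documented contract."""
    if status == 200:
        # "output" is an assumed key name for the tool output.
        return ToolResult(body["decision"], body.get("policy_ref", ""), body.get("output"))
    if status == 403:
        # "reason" is an assumed key name for the deny reason.
        raise PolicyDeniedError(body.get("reason", "denied"))
    if status == 400:
        raise ToolExecutionError(body.get("reason", "tool execution failed"))
    # Anything else (including 405) is a failure, never an allow.
    raise RuntimeError(f"unexpected status {status}")
```

Only a 200 yields a result object; every other status raises, so a caller cannot mistake a deny for a successful call.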


Supply-chain verification order

autonomy verify applies four checks in sequence. All must pass; the command is fail-closed.

Step 1 — Signatures
  cosign verify image
  cosign verify <tag>-lock     (if --require-lock)
  cosign verify <tag>-policy   (if --require-policy)

Step 2 — OCI digest integrity
  Resolve live manifest digest
  Compare against agent_artifact.digest in attached lock file

Step 3 — Behavioral fingerprint
  Recompute BLAKE3 fingerprint of lock file
  Compare against behavioral_fingerprint field

Step 4 — Semver consistency
  Parse version tag from policy_bundle.ref
  Compare major.minor against bundle manifest.json version field

Each step assumes the previous step passed. Skipping --require-lock or --require-policy removes those cosign checks but does not skip the OCI digest and fingerprint checks for whichever artifacts are present.
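Step 4 can be illustrated with a short Python sketch. It assumes that policy_bundle.ref has the form registry/repo:tag and that the tag is a plain vMAJOR.MINOR[.PATCH] string; the actual ref format and parser are not described here, and the function names are hypothetical. Pre-release suffixes are deliberately rejected to keep the example small.

```python
import re
from typing import Tuple

# Accepts "1.2", "1.2.3", "v1.2.3"; anything else is rejected.
_SEMVER = re.compile(r"v?(\d+)\.(\d+)(?:\.(\d+))?")


def major_minor(version: str) -> Tuple[int, int]:
    m = _SEMVER.fullmatch(version.strip())
    if m is None:
        raise ValueError(f"not a semver string: {version!r}")
    return int(m.group(1)), int(m.group(2))


def tag_of(ref: str) -> str:
    """Extract the tag, assuming ref looks like 'registry/repo:tag'."""
    _, sep, tag = ref.rpartition(":")
    # A "/" after the last colon means the colon belonged to a registry port.
    if not sep or "/" in tag:
        raise ValueError(f"ref has no tag: {ref!r}")
    return tag


def semver_compatible(policy_bundle_ref: str, manifest_version: str) -> bool:
    """True only when major.minor of the ref tag equals the manifest's."""
    try:
        return major_minor(tag_of(policy_bundle_ref)) == major_minor(manifest_version)
    except ValueError:
        return False  # fail closed: unparsable input never passes
```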


Signing implementation

AutonomyOps uses cosign as an external CLI binary (Option A — subprocess) rather than the cosign Go SDK.

Why CLI and not the Go SDK?

The cosign Go SDK pulls in hundreds of transitive dependencies: sigstore, TUF, OIDC, Rekor, and their chains. Keeping the cosign binary as an external dependency gives the same functionality while keeping the Go binary lean and the dependency graph auditable.

Where the CLI is invoked:

| Component | Invocation | Purpose |
|-----------|------------|---------|
| oci/sign/cosign.go runCosign() | subprocess | Shared cosign wrapper for sign + verify |
| oci/sign/sign.go | via runCosign | cosign sign with annotations |
| oci/sign/verify.go | via runCosign | cosign verify --output=json (4-step pipeline) |
| oci/verify.go VerifyBundle() | subprocess | Single-artifact signature check |
| policy/verifier.go Verify() | subprocess | Policy bundle signature pre-flight |

No Go SDK import: github.com/sigstore/cosign/v2 is not imported anywhere.

Prerequisites: cosign must be in PATH for autonomy sign, autonomy verify, and autonomy policy fetch to function. Runtime policy evaluation does not require cosign.
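The subprocess approach can be sketched in Python as follows. This is an illustration of the pattern only, not a port of runCosign: the function names and the CosignNotFound exception (a stand-in for ErrCosignNotFound) are hypothetical, and the real wrapper may pass additional flags.

```python
import shutil
import subprocess
from typing import List


class CosignNotFound(Exception):
    """Python stand-in for the ErrCosignNotFound sentinel."""


def cosign_verify_args(image: str, pub_key: str) -> List[str]:
    """Build argv for a key-based `cosign verify` with JSON output."""
    return ["cosign", "verify", "--key", pub_key, "--output=json", image]


def run_cosign(args: List[str], binary: str = "cosign") -> subprocess.CompletedProcess:
    """Run cosign as an external process; fail early if it is not in PATH."""
    path = shutil.which(binary)
    if path is None:
        raise CosignNotFound(f"{binary} binary not in PATH")
    # args[0] is the program name; replace it with the resolved path.
    return subprocess.run([path, *args[1:]], capture_output=True, text=True, check=False)
```

Resolving the binary before spawning keeps "cosign is missing" distinguishable from "cosign ran and rejected the signature".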

Install cosign: https://github.com/sigstore/cosign/releases

Distinct error types for each failure mode:

| Sentinel | Step | Cause |
|----------|------|-------|
| ErrNotSigned | 1 | No cosign signature in registry |
| ErrDigestMismatch | 2 | SHA-256 OCI digest ≠ lock record |
| ErrFingerprintMismatch | 3 | BLAKE3 behavioral fingerprint ≠ stored value |
| ErrSemverIncompat | 4 | policy_bundle_version major.minor mismatch |
| ErrTimestampMissing | 1 | autonomy.signed-at annotation absent |
| ErrTimestampExpired | 1 | autonomy.signed-at older than --max-age |
| ErrCosignNotFound | any | cosign binary not in PATH |


Key management

Demo keys live in demo/keys/. They are not suitable for production.

Regenerate (the demo/keys/generate.sh script ships both in the repo and in the extracted quickstart bundle, so the command is identical for both audiences):

In-repo:

bash demo/keys/generate.sh

Installed:

cd ~/.autonomyops/quickstart
bash demo/keys/generate.sh

The cosign.key file is PKCS8 PEM, encrypted with COSIGN_PASSWORD. Set the environment variable before signing or verifying:

In-repo:

COSIGN_PASSWORD=<secret> autonomy sign \
  --image localhost:5000/agent:v1 \
  --key demo/keys/cosign.key \
  --lock --policy

Installed:

cd ~/.autonomyops/quickstart
COSIGN_PASSWORD=<secret> autonomy sign \
  --image localhost:5000/agent:v1 \
  --key demo/keys/cosign.key \
  --lock --policy

Timestamp annotation

By default (AUTONOMY_TRUST_TIME=true), every signature carries an autonomy.signed-at annotation (RFC3339 UTC). autonomy verify rejects signatures older than --max-age (default: 8760h / 1 year).
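The age check can be sketched in Python as below. The exception names mirror ErrTimestampMissing and ErrTimestampExpired but are otherwise hypothetical, and the sketch assumes "older than" means strictly greater than --max-age; the runtime's exact boundary behavior is not stated here.

```python
from datetime import datetime, timedelta, timezone


class TimestampMissing(Exception):
    """Stand-in for ErrTimestampMissing."""


class TimestampExpired(Exception):
    """Stand-in for ErrTimestampExpired."""


def check_signed_at(annotations: dict, now: datetime,
                    max_age: timedelta = timedelta(hours=8760)) -> None:
    """Raise unless autonomy.signed-at is present and within max_age of now."""
    raw = annotations.get("autonomy.signed-at")
    if not raw:
        raise TimestampMissing("autonomy.signed-at annotation absent")
    # fromisoformat() accepts a trailing "Z" only from Python 3.11 onward.
    signed_at = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    if now - signed_at > max_age:
        raise TimestampExpired(f"signed at {raw}, older than {max_age}")
```

A malformed timestamp raises ValueError from fromisoformat, so an unparsable annotation also fails the check rather than passing it.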

Disable timestamp enforcement for air-gapped environments where clocks are unreliable:

AUTONOMY_TRUST_TIME=false autonomy verify \
  --image <ref> --pub <key>

Weakened: with AUTONOMY_TRUST_TIME=false, a compromised private key can produce signatures with no temporal bound. Use only with an out-of-band key rotation process.


Air-gapped operation

All runtime operations work without external network access.

  1. Build the binary on an internet-connected machine, copy to the air-gapped host.

  2. Push images and policy bundles to a private registry before the gap.

  3. autonomy policy fetch pulls a cached bundle from the private registry.

  4. autonomy verify requires the cosign public key on disk and a local registry; no Sigstore transparency log or certificate authority is contacted when using key-based signing.

The capability probe (autonomy oci probe) and attachment operations contact only the registry specified in --image. There are no callbacks to external services.


Policy evaluation

Runtime policy evaluation uses OPA/Rego from the loaded policy bundle. The query is data.autonomy.allow, with input containing action kind and params.

Decision behavior is fail-closed:

  • If OPA returns allow == true, runtime allows the action.

  • If evaluation errors, returns no result, or returns false, runtime denies.

This preserves the same trust boundary: adapters submit requests, runtime remains the sole policy authority.
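The fail-closed rule reduces to a small decision function. The sketch below is in Python rather than the runtime's Go, and the evaluate callback is a hypothetical stand-in for the OPA query against data.autonomy.allow; the input key names kind and params are assumed from the request body.

```python
from typing import Callable, Optional


def decide(evaluate: Callable[[dict], Optional[bool]], action: dict) -> str:
    """Return "allow" only for an explicit True; everything else denies."""
    try:
        result = evaluate({"kind": action["kind"], "params": action.get("params", {})})
    except Exception:
        # Evaluator errors and malformed actions both deny.
        return "deny"
    # `is True` rejects None (no result), False, and truthy non-booleans.
    return "allow" if result is True else "deny"
```

Comparing with `is True` rather than relying on truthiness means a string such as "true" or a non-empty object cannot be read as an allow.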


Adapter trust level

Adapters — Python RuntimeClient, LangChain RuntimeTool, any HTTP client — are untrusted. They:

  • Can submit tool call requests.

  • Cannot override policy decisions.

  • Cannot execute tools directly; they receive only the runtime’s output.

  • Cannot suppress PolicyDeniedError; the exception propagates unconditionally.

The Python adapter enforces this at the type level: call_tool() returns ToolResult on allow and raises PolicyDeniedError on deny. There is no return path that converts a deny into a non-exceptional result.


Telemetry and audit

Every policy decision emits an autonomy.decision event to the WAL with:

  • tool — the action kind

  • outcome — allow or deny

  • reason — human-readable explanation from the evaluator

  • policy_ref — the bundle version that made the decision

The WAL is append-only and fsynced on each write. Events cannot be deleted from the WAL; telemetry drain deletes only from the SQLite priority buffer, not from the source WAL. telemetry drain reads from the consumer cursor (telemetry.pos) and advances that cursor only after successful delivery, providing at-least-once semantics. Receivers should deduplicate by stable event identity (event_id).
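Because delivery is at-least-once, a receiver may see the same event twice. A minimal receiver-side deduplication sketch in Python follows; the deliver_once function and the in-memory seen set are illustrative assumptions, and a real receiver would persist the set of seen IDs.

```python
from typing import Callable, Iterable, Set


def deliver_once(events: Iterable[dict], seen: Set[str],
                 handle: Callable[[dict], None]) -> int:
    """Handle each event whose event_id has not been seen; return the count."""
    handled = 0
    for event in events:
        event_id = event["event_id"]
        if event_id in seen:
            continue  # redelivery of an event that was already processed
        handle(event)
        # Mark as seen only after handling succeeds, so a crash in handle()
        # leaves the event eligible for the next redelivery.
        seen.add(event_id)
        handled += 1
    return handled
```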

To forward events to an OTLP collector:

autonomy telemetry drain --endpoint http://collector:4318

Error events drain first (PriorityHigh), decisions and actions second (PriorityNormal), lifecycle events last (PriorityLow).
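The drain order can be pictured as a stable sort by priority. In this Python sketch the numeric priority values and the category labels (error, decision, action, lifecycle) are assumptions for illustration; only the three priority names come from the text above.

```python
from enum import IntEnum
from typing import List


class Priority(IntEnum):
    HIGH = 0    # PriorityHigh: error events
    NORMAL = 1  # PriorityNormal: decisions and actions
    LOW = 2     # PriorityLow: lifecycle events


# Hypothetical category labels; the real buffer may classify differently.
CATEGORY_PRIORITY = {
    "error": Priority.HIGH,
    "decision": Priority.NORMAL,
    "action": Priority.NORMAL,
    "lifecycle": Priority.LOW,
}


def drain_order(events: List[dict]) -> List[dict]:
    """Sort by priority; sorted() is stable, so arrival order is kept within a tier."""
    return sorted(events, key=lambda e: CATEGORY_PRIORITY[e["category"]])
```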


Layered governance — the dual-emit audit story

A single tool call can be evaluated by more than one governance layer. Today the runtime composes three:

  1. Policy layer — Rego evaluation against the active signed bundle (see Policy evaluation). Returns allow or deny.

  2. Runtime-enforcement layer — the runtime allowlist for tool.http_get. Runs after the policy layer has allowed, before the tool executes. May still deny — e.g. when the endpoint key is not in ServerOptions.AllowedDomains.

  3. Attestation gate — enrollment + provenance + execution-window sub-checks (see Attestation gate). Runs in the same post-policy slot as the runtime allowlist. Default off; flipping AUTONOMY_ATTESTATION_MODE to advisory or enforce activates it.

Each layer that reaches a verdict on a request, whether it allows the request through to the next layer or denies it, emits an autonomy.decision WAL frame. Every frame under one request carries the same audit_id. When the policy layer allows and a later layer denies, the WAL records both frames: the policy-layer allow, then the runtime-layer or attestation-layer deny. This is the layered-governance audit story, the same shape PR #706 introduced for the runtime allowlist and PR #725 extended for the attestation gate.

Why both frames are recorded

A single deny frame would leave the operator’s most important question unanswered: which layer blocked, and had an earlier layer already allowed? The dual-emit pattern makes that reconstruction trivial: group WAL entries by audit_id, sort by seq, and the chain of decisions is the sequence of frames. The GET /v1/audit/{audit_id} endpoint returns the last frame (the canonical final-outcome answer) for the common single-row lookup; for chain reconstruction, consumers grep the WAL or the event stream by audit_id.
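The group-and-sort reconstruction can be sketched in Python as follows. The event.attrs path follows the jq recipes later in this section; placing seq at the top level of each frame is an assumption, as are the function names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def decision_chains(frames: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group decision frames by audit_id and order each group by seq."""
    chains: Dict[str, List[dict]] = defaultdict(list)
    for frame in frames:
        chains[frame["event"]["attrs"]["audit_id"]].append(frame)
    for chain in chains.values():
        chain.sort(key=lambda f: f["seq"])  # assumed top-level sequence number
    return dict(chains)


def final_outcome(chain: List[dict]) -> str:
    """The last frame in a chain is the final outcome for that request."""
    return chain[-1]["event"]["attrs"]["outcome"]
```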

Reason-prefix shape — <layer>: <subreason>

Each post-policy enforcement layer stamps a stable, lowercase layer prefix onto its deny frame’s reason field. Plain policy-layer denies — produced by the Rego evaluator itself against the operator-authored bundle — surface the evaluator’s text without a prefix. The prefix-vs-no-prefix distinction is the wire-visible way to tell which layer ended the request:

| Layer | reason prefix shape | Reason source |
|-------|---------------------|---------------|
| Policy (Rego) | none — raw evaluator text | Rego data.autonomy.allow result + operator-authored reason |
| Runtime allowlist | runtime allowlist: <endpoint> | ReasonRuntimeAllowlist constant + dynamic tail |
| Attestation gate | attestation: <subcheck> | One of 6 attestation: * exported constants |

The asymmetry is intentional, not an oversight: the policy layer is the base evaluation that every other layer composes on top of, so a deny with no prefix unambiguously identifies “the operator’s Rego policy said no” — distinct from “a later enforcement layer disagreed with the policy allow.” A future change to stamp policy: <…> would be a wire-breaking rename for every consumer; the current shape is deliberate.

For the post-policy layers that do stamp a prefix, the shape is one of two forms:

| Form | Example | When used |
|------|---------|-----------|
| Full | attestation: enrollment_mismatch | Subreason enumerates a fixed sub-check |
| Prefix-only | runtime allowlist: api.example.com:443 | Subreason is dynamic, assembled at the emit site |

The prefix-only constants stop at the colon; the call site appends a space and the dynamic tail (an endpoint, a token name, etc.). The full form has the space and subreason baked into the constant.
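The two forms, and the consumer-side classification they enable, can be sketched in Python. The constant values are the ones quoted in this section; the helper names allowlist_reason and layer_of are hypothetical.

```python
# Prefix-only constant: stops at the colon.
REASON_RUNTIME_ALLOWLIST = "runtime allowlist:"
# Full-form constant: space and subreason are part of the constant.
REASON_ENROLLMENT_MISMATCH = "attestation: enrollment_mismatch"


def allowlist_reason(endpoint: str) -> str:
    """The emit site appends a space and the dynamic tail."""
    return f"{REASON_RUNTIME_ALLOWLIST} {endpoint}"


def layer_of(reason: str) -> str:
    """Classify a deny reason by its layer prefix; no prefix means policy."""
    if reason.startswith("runtime allowlist:"):
        return "runtime allowlist"
    if reason.startswith("attestation:"):
        return "attestation"
    return "policy"
```

This classification relies on operator-authored Rego reasons not starting with one of the reserved layer prefixes.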

Layer prefixes are exported Reason* string constants — one per package, co-located with the layer’s sentinel error:

| Layer | Package | Constant(s) |
|-------|---------|-------------|
| Runtime allowlist | runtime/tools.go | ReasonRuntimeAllowlist = "runtime allowlist:" |
| Attestation gate | runtime/attestation/types.go | ReasonEnrollmentMismatch, ReasonEnrollmentRevoked, ReasonRolloutStateInvalid, ReasonWindowExpired, ReasonWindowNotYetActive, ReasonSourceUnavailable (all attestation: ) |

Operator grep patterns, WAL consumers, CLI deny renderings, and SIEM filters all key on these constants. Renaming one is a breaking change for every consumer; per-constant stability is pinned by TestReasonRuntimeAllowlist_Stable (runtime/tools_test.go) and TestReasonPrefixes_Stable (runtime/attestation/types_test.go).

Shape is CI-enforced

runtime/reasons_shape_test.go (added in #712 PR-7122) walks every non-test Go file under runtime/ and validates that every exported Reason* constant matches the shape regex ^[a-z]+( [a-z]+)*:( [a-z][a-z0-9_]*)?$. A new runtime-enforcement layer that ships a constant outside the shape fails the build with a precise file:line:name = "value" diagnostic — the audit-trail vocabulary cannot fragment unannounced.
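The shape regex can be exercised directly. The Python sketch below applies the same pattern to a dictionary of constant names and values; it is not the Go test itself (which walks the AST of the source files), and shape_violations is a hypothetical helper.

```python
import re
from typing import Dict, List

# Same pattern as quoted above; the syntax is identical in Python's re module.
REASON_SHAPE = re.compile(r"^[a-z]+( [a-z]+)*:( [a-z][a-z0-9_]*)?$")


def shape_violations(constants: Dict[str, str]) -> List[str]:
    """Return the names of constants whose value does not match the shape."""
    return sorted(
        name for name, value in constants.items()
        if REASON_SHAPE.fullmatch(value) is None
    )
```

Both documented forms pass: "runtime allowlist:" matches with the optional subreason group absent, and "attestation: enrollment_mismatch" matches with it present.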

The companion TestReasonConstants_UnresolvedDetection pins the loud-failure behavior for Reason* constants whose initializer the AST walker cannot statically resolve (aliases, concat expressions, function calls): rather than silently skipping them, the test fails with an “inline the literal” remediation. The contract is that every Reason* constant must be statically auditable at parse time.

Operator grep recipes

To filter the WAL by layer prefix (any host with autonomy wal inspect available):

# All denies (any layer), most-recent 1h:
autonomy wal inspect --kind autonomy.decision --since 1h --json \
  | jq -c 'select(.event.attrs.outcome == "deny")'

# Runtime allowlist denies only:
autonomy wal inspect --kind autonomy.decision --json \
  | jq -c 'select(.event.attrs.reason | startswith("runtime allowlist:"))'

# Attestation denies only:
autonomy wal inspect --kind autonomy.decision --json \
  | jq -c 'select(.event.attrs.reason | startswith("attestation:"))'

# Reconstruct the layered-governance chain for one audit_id
# (every frame written under that id, ordered by sequence):
autonomy wal inspect --kind autonomy.decision --json \
  | jq -c 'select(.event.attrs.audit_id == "<id>")'

See docs/runbooks/12-attestation-rollout.md for the operator-facing diagnostic procedure that wraps these recipes.


Attestation join key — enrollment_ref

EnrolledNode.enrollment_ref is the canonical join key that binds a bundle to the set of nodes permitted to execute it. The substrate is in place today (orchestrator stores it, the CLI sets it at enrollment); the runtime gate that consults it lands as PR6 of #725’s substrate-prerequisite slate.

Contract

  • An operator binds the key once, at initial enrollment, on each node:

    autonomy node enroll \
        --node-id robot-arm-007 \
        --enrollment-ref deploy:fleet-alpha:v1.2.3
    
  • A bundle author declares the matching key on the bundle’s signed provenance block (bundle.Manifest.Provenance.EnrollmentRef).

  • The attestation gate (PR6 of #725) performs an exact-match check between the two values at every POST /v1/tool evaluation. A mismatch (or a node with no enrollment record at all) denies the action under AUTONOMY_ATTESTATION_MODE=enforce.

The match is intentionally a literal string compare; there is no parsing, no semver matching, no glob. Operators control granularity by how they shape the ref — a per-fleet ref (deploy:fleet-alpha:v1) permits broad re-use; a per-robot ref (deploy:fleet-alpha:robot-007) locks a bundle to a single node.

Idempotency — set once at enrollment

The enrollment_ref is captured on initial enrollment only. Because autonomy node enroll is idempotent (re-enrolling an existing node_id returns the original record unchanged), re-running the command with a different --enrollment-ref does not rotate the binding. The orchestrator returns the original record with created=false and the original ref preserved.
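The set-once behavior can be modelled with an in-memory store. This Python sketch is illustrative only: the enroll function and dictionary-backed store are assumptions, while the created flag and the preserved original ref follow the description above.

```python
from typing import Dict, Tuple


def enroll(store: Dict[str, dict], node_id: str, enrollment_ref: str) -> Tuple[dict, bool]:
    """Return (record, created); an existing node_id is never modified."""
    existing = store.get(node_id)
    if existing is not None:
        # Idempotent re-enroll: the original record, including its
        # enrollment_ref, is returned unchanged with created=False.
        return existing, False
    record = {"node_id": node_id, "enrollment_ref": enrollment_ref}
    store[node_id] = record
    return record, True
```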

Mutating an enrollment_ref post-enrollment (rotating it, revoking it) requires a separate operator action with audit-log + role-gating semantics — a PUT /v1/enrollment/{node_id} endpoint that does not exist today. The omission is deliberate: silently allowing in-place rotation via re-enroll would mean any operator with node enroll permission could move a node onto an arbitrary bundle’s allowlist, sidestepping the audit trail that explicit re-enrollment provides.

Why not the labels map?

EnrolledNode.Labels is a string→string bag for arbitrary operator metadata (env, tier, region, …). It is the wrong shape for the attestation join — it is searchable but not unique, has no schema, and silently drifts when operators retag for unrelated reasons. The attestation join needs a typed, single-valued field that the gate can read without prefix conventions and that audit consumers can query without scanning a JSON blob. enrollment_ref is that field.

Node identity binding

The attestation gate joins two values per decision: bundle.Manifest.Provenance.EnrollmentRef (from the signed bundle) and the local node’s enrollment record (looked up by node_id). The first half is the bundle’s claim; the second half is the runtime’s identity. This section is about the second half: how the runtime decides which node_id to present to the orchestrator.

Precedence

The runtime/identity package (identity.Resolve) is the single source of truth. Highest-precedence source wins:

  1. AUTONOMY_NODE_ID environment variable

  2. identity.node_id from the unified config (Config.Identity.NodeID)

  3. unset → ErrNodeIDUnset

Both inputs are trimmed of leading/trailing whitespace before the precedence check, so a Compose env block that resolves to AUTONOMY_NODE_ID=\n falls through to the config value rather than resolving to an invisible newline. The returned string is also trimmed so a padded value can’t key a different row than the operator-typed enrollment value.
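The precedence and trimming rules can be sketched in Python as follows. This mirrors the described behavior of identity.Resolve but is not its implementation; the resolve signature and the NodeIDUnset exception (a stand-in for ErrNodeIDUnset) are assumptions.

```python
from typing import Mapping, Optional


class NodeIDUnset(Exception):
    """Stand-in for ErrNodeIDUnset."""


def resolve(env: Mapping[str, str], config_node_id: Optional[str]) -> str:
    """Env var wins over config; both are trimmed before the check."""
    for candidate in (env.get("AUTONOMY_NODE_ID", ""), config_node_id or ""):
        trimmed = candidate.strip()
        if trimmed:
            return trimmed  # the returned value is the trimmed one
    raise NodeIDUnset("set AUTONOMY_NODE_ID or identity.node_id")
```

A whitespace-only env value is treated as unset, so it falls through to the config value instead of winning the precedence check.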

Why no machine-id or hostname fallback

The runtime deliberately does not fall back to /etc/machine-id, the kernel hostname, or any other auto-derived value. The node_id is the join key into enrolled_nodes, and it must match what the operator typed into autonomy node enroll --node-id <id>. Silently defaulting to a machine-derived value would mean:

  • two distinct nodes cloned from the same disk image would pick the same identity and collide on the enrollment row;

  • a node enrolled before a hostname change would silently re-enroll under a new identity after the change, leaving the original row orphaned in the allowlist;

  • the operator-visible failure mode (a missing enrollment record → startup error) would be replaced by a silent successful start under an identity nobody enrolled.

Failing closed via ErrNodeIDUnset is the safer default. Operators who want machine-derived identity can opt in explicitly by populating the config or env from /etc/machine-id in their provisioning step — but the runtime never makes that choice on its behalf.

Path-safety contract

The node_id flows into orchestrator path parameters (GET /v1/enrollment/{node_id}, GET /v1/rollout/node/{node_id}). A value containing /, ?, or # becomes unaddressable. Whitespace is also rejected, not because URL routing breaks on it, but because runtime/identity.Resolve trims its output with TrimSpace for env-var resilience (a Compose env block that resolves to AUTONOMY_NODE_ID=\n falls through to the config rather than producing an invisible value). If config validation accepted a padded value, the configured ID would resolve to a different string than the one persisted on enrollment, silently breaking the attestation-gate lookup. Both rules are enforced by the same nodeid.Validate function:

| Reject | Sentinel |
|--------|----------|
| empty string | ErrEmpty |
| any Unicode whitespace | ErrWhitespace |
| any character in /?# | ErrForbiddenChar |

Three layers consume that contract:

| Layer | When checked | Failure mode |
|-------|--------------|--------------|
| Config | Config.Validate() at startup | startup error before bind |
| Orchestrator API | request handler at the HTTP boundary | 400 Bad Request |
| Storage | store.Register* / store.Upsert* | error returned to handler |

Any value that survives validation at the config layer round-trips byte-identical through Resolve — that invariant is the foundation of the attestation-gate lookup and is pinned by a dedicated test.

runtime/identity.Resolve intentionally does not call nodeid.Validate on its return value. Callers know whether they are about to use the value as a path parameter (attestation, enrollment lookup) or as a free-form telemetry label, and forcing the check at resolve time would either obscure errors that callers care about or surface errors they don’t. Callers that do need a path-safe value should chain the calls: id, err := identity.Resolve(...), handle err, then err = nodeid.Validate(id). This matters most for the env-var path: the config layer has already validated its value; an env override has not.
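The path-safety rules amount to three checks. The Python sketch below mirrors the reject table; the exception classes stand in for ErrEmpty, ErrWhitespace, and ErrForbiddenChar, and the order in which the checks run is an assumption. Python's str.isspace is used as an approximation of the Unicode whitespace test.

```python
class EmptyNodeID(ValueError):
    """Stand-in for ErrEmpty."""


class WhitespaceInNodeID(ValueError):
    """Stand-in for ErrWhitespace."""


class ForbiddenCharInNodeID(ValueError):
    """Stand-in for ErrForbiddenChar."""


FORBIDDEN = "/?#"


def validate(node_id: str) -> None:
    """Raise if node_id is empty, contains whitespace, or contains / ? #."""
    if node_id == "":
        raise EmptyNodeID("node_id is empty")
    if any(ch.isspace() for ch in node_id):
        raise WhitespaceInNodeID("node_id contains whitespace")
    if any(ch in FORBIDDEN for ch in node_id):
        raise ForbiddenCharInNodeID("node_id contains one of / ? #")
```

A caller that needs a path-safe value would run this check on whatever the resolver returned, which is the only validation an env-var override receives.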

Operational guidance

  • Set identity.node_id in the config file for the persistent identity an operator is willing to commit to version control.

  • Use AUTONOMY_NODE_ID for one-off overrides (a debug shell, a temporary canary on a host where the config can’t be edited).

  • Match what you enrolled. The enrollment CLI is the source of truth: whatever string went into autonomy node enroll --node-id is what identity.Resolve must return. Validate any new value at the config layer first — a typo containing /, ?, or # will fail config validation at startup instead of landing as a silent deny in AUTONOMY_ATTESTATION_MODE=enforce later.

Cache freshness — polling vs subscription

The runtime cache that the attestation gate consults needs to know when the orchestrator’s authoritative records (enrollment, per-node rollout state) change. Two paths are supported and produce identical decoded values:

| Path | When updates land | Hold-open cost |
|------|-------------------|----------------|
| Polling | next refresh interval (≤30 s) | one HTTP call per tick |
| SSE subscription | next 500 ms event-stream tick | one open connection |

The subscription path consumes two orchestrator-emitted event types on GET /v1/events/stream:

  • enrollment.node.registered — fires once on initial enrollment; idempotent re-enrolls are silent.

  • rollout.node_state.transitioned — fires only when the persisted (state, directive_id) actually changes; no-op upserts are silent.

Both events commit in the same SQL transaction as the underlying state mutation, so a subscriber that sees an event can trust the corresponding row already exists when it queries the orchestrator. The silence-on-no-op contract makes “saw an event” usable as a cache-refresh trigger without false positives.

See Event Stream for endpoint details, payload schemas, and the consumer pattern.

Attestation gate

The attestation gate is the runtime’s post-policy enforcement layer that binds bundle provenance (the enrollment_ref declared above) to the orchestrator’s per-node enrollment + rollout state, and optionally to a manifest-declared time-bounded authorization window. It mirrors the ErrDomainNotAllowed dual-emit pattern from PR #706: when a sub-check denies under AUTONOMY_ATTESTATION_MODE=enforce, a second autonomy.decision frame lands in the WAL under the same audit_id as the policy-layer allow, and GET /v1/audit/{audit_id} returns the deny as the final outcome.

Sub-checks

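Five sub-checks today, plus one wiring-layer reason (source_unavailable, described below), for six deny reasons in total. Each fails with a stable deny-reason prefix that wire consumers and operator grep patterns can lock onto: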

| Reason prefix | Trigger |
|---------------|---------|
| attestation: enrollment_revoked | EnrolledNode.Status != "active" |
| attestation: enrollment_mismatch | bundle Provenance.EnrollmentRef differs from EnrolledNode.EnrollmentRef, or no enrollment row exists for the local node_id while the bundle declares a binding |
| attestation: rollout_state_invalid | NodeRolloutState.State is one of rollback_pending, rollback_complete, failed |
| attestation: window_not_yet_active | active manifest’s ExecutionWindow.NotBefore is later than Input.Now (bundle scheduled for a future activation) |
| attestation: window_expired | active manifest’s ExecutionWindow.NotAfter is at or before Input.Now (time-bounded authorization has expired; half-open [NotBefore, NotAfter) semantic) |
| attestation: source_unavailable | cold-cache fetch from the orchestrator failed AND the active manifest declares an enrollment binding the gate cannot evaluate without a fresh value (control-plane outage on a runtime with no prior cached snapshot) |

Order matters. The gate evaluates enrollment_revoked first because revocation is the operationally most severe condition — an operator seeing the revoked reason is told the right thing to fix. enrollment_mismatch is checked second because the bundle’s declared binding is the durable license to execute. rollout_state_invalid is third because rollout state is plan-scoped (transient). The two window sub-checks come last because the time bound is a constraint on top of an otherwise-valid execution authorization — telling an operator “your window expired” when the actual problem is “this node was never enrolled for this bundle” sends them to the wrong remediation.

Denying a nil *EnrolledNode (no row) via enrollment_mismatch is the correct shape when the bundle declares a binding the orchestrator has no record of. A nil *NodeRolloutState is not a deny (“no row persisted” is distinct from pending).
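The ordering of the first three sub-checks can be sketched in Python. The sketch assumes the bundle declares a non-empty EnrollmentRef (unbound bundles are out of scope here), omits the two window checks that would follow, and uses hypothetical names; it is not the gate's implementation.

```python
from dataclasses import dataclass
from typing import Optional

INVALID_ROLLOUT_STATES = {"rollback_pending", "rollback_complete", "failed"}


@dataclass
class EnrolledNode:
    status: str
    enrollment_ref: str


def first_deny_reason(bundle_ref: str, node: Optional[EnrolledNode],
                      rollout_state: Optional[str]) -> Optional[str]:
    """Return the first deny reason in gate order, or None if all three pass.

    Precondition (assumed): bundle_ref is the bundle's non-empty EnrollmentRef.
    """
    # 1. Revocation first: the most severe condition.
    if node is not None and node.status != "active":
        return "attestation: enrollment_revoked"
    # 2. Exact string match; a missing row (None) is also a mismatch.
    if node is None or node.enrollment_ref != bundle_ref:
        return "attestation: enrollment_mismatch"
    # 3. Rollout state; a missing row (None) is not a deny.
    if rollout_state is not None and rollout_state in INVALID_ROLLOUT_STATES:
        return "attestation: rollout_state_invalid"
    return None
```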

The two window sub-checks evaluate bundle.Manifest.ExecutionWindow (schema v1.3+, optional at all schema versions). A bundle that doesn’t declare the block — including every v1.0 / v1.1 / v1.2 bundle and any v1.3 bundle that opts out — skips both window checks; the gate only enforces what the bundle claims. The block carries NotBefore and NotAfter (both RFC3339, both independently optional), and the manifest validator at build time rejects malformed timestamps + any inverted or zero-length window. The window is half-open: [NotBefore, NotAfter) — a now exactly equal to NotBefore is allowed (inclusive start), but now == NotAfter denies (exclusive end). Without the strict-after-NotAfter semantic a workload could get a one-tick grace after its authorization expired, which is exactly the failure mode time-bounded authorization is meant to prevent.
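The half-open comparison can be written out in a few lines of Python. The function name is hypothetical; the reason strings and the inclusive-start, exclusive-end semantics are taken from the text above.

```python
from datetime import datetime, timezone
from typing import Optional


def window_deny_reason(now: datetime, not_before: Optional[datetime],
                       not_after: Optional[datetime]) -> Optional[str]:
    """Apply [NotBefore, NotAfter): either bound may be absent."""
    if not_before is not None and now < not_before:
        return "attestation: window_not_yet_active"
    # `>=` makes the end exclusive: now == NotAfter already denies.
    if not_after is not None and now >= not_after:
        return "attestation: window_expired"
    return None
```

With both bounds absent the function returns None, matching the rule that a bundle without the block skips both window checks.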

source_unavailable is the cold-cache distinction the wiring layer draws on top of the gate’s sub-checks: a control-plane outage on a fresh runtime (no prior cached snapshot) cannot be safely distinguished from a real enrollment miss, so the wiring fails closed with a reason that names the actual failure mode. The Source layer’s stale-cache fallback covers transient outages once any value has been cached, so this reason fires only on the first fetch after process start. The remediation is different from the other reasons — operators see this and check control-plane connectivity, not enrollment hygiene. The wiring scopes the check to “manifest declares a binding”; an unbound bundle (no Provenance or empty EnrollmentRef) would have allowed regardless, so a Source error in that case is irrelevant.

Enforcement modes

Set via AUTONOMY_ATTESTATION_MODE; parsed by runtime/attestation.ParseEnforcementMode. Empty / unknown values fall back to off (safe default), and the runtime logs a warning for unrecognised values so typos surface without changing the fallback semantic.

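| Mode | Sub-checks | WAL evidence | Wire 403 | Use case |
|------|------------|--------------|----------|----------|
| off | skipped | none | no | fleets that haven’t validated enrollment |
| advisory | run | written | no | soak before enforce; operators read WAL |
| enforce | run | written | yes | production gate |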

off is the default. A release that ships the gate must not change the wire shape for fleets that haven’t opted in. Operators flip to advisory, soak until the WAL evidence matches their expectations, then flip to enforce.
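The mode table can be condensed into two small functions. This Python sketch assumes exact, case-sensitive matching of the mode string, which may differ from ParseEnforcementMode; the function names and the returned warn flag are illustrative.

```python
from typing import Optional, Tuple

KNOWN_MODES = ("off", "advisory", "enforce")


def parse_enforcement_mode(raw: Optional[str]) -> Tuple[str, bool]:
    """Return (mode, warn). Empty or unknown values fall back to "off";
    warn is True only for a non-empty value that was not recognised."""
    if raw in KNOWN_MODES:
        return raw, False
    return "off", bool(raw)


def wire_status(mode: str, gate_denies: bool) -> int:
    """Only enforce turns a gate deny into a 403 on the wire."""
    return 403 if mode == "enforce" and gate_denies else 200
```

In advisory mode a gate deny is still recorded in the WAL, but wire_status stays at 200, which is what allows a soak period without disrupting traffic.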

enforce and advisory both require a non-empty node_id (AUTONOMY_NODE_ID env or identity.node_id from config). The runtime refuses to start without it — an actionable startup error that names both sources, rather than a mysterious silent deny on the first decision.

Cache + freshness

The gate reads the per-decision enrollment + rollout-state via the Source layer, which holds a 30s TTL cache against the orchestrator read APIs introduced in PR1 and PR2. A transient orchestrator outage within the TTL window does not trigger a deny — the cache serves its last-known-good value. Beyond TTL on a failed fetch, the source returns the last cached value with a stale-flag note in the advisory WAL frame (PR6 ships with this; future commits may surface the staleness as a separate sub-check).
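The TTL-plus-stale-fallback behavior can be sketched as a small cache class. This is a simplified Python model, not the Source layer itself: CachedSource, its fetch callback, and the (value, stale) return shape are assumptions, and the clock is injectable so the behavior can be tested without sleeping.

```python
import time
from typing import Any, Callable, Optional, Tuple


class CachedSource:
    """Serve a cached value within the TTL; fall back to it when a refresh fails."""

    def __init__(self, fetch: Callable[[], Any], ttl: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._value: Any = None
        self._fetched_at: Optional[float] = None

    def get(self) -> Tuple[Any, bool]:
        """Return (value, stale)."""
        now = self._clock()
        if self._fetched_at is not None and now - self._fetched_at < self._ttl:
            return self._value, False  # fresh: no orchestrator call
        try:
            self._value = self._fetch()
            self._fetched_at = now
            return self._value, False
        except Exception:
            if self._fetched_at is None:
                raise  # cold cache: nothing to fall back on
            return self._value, True  # last-known-good, flagged stale
```

The cold-cache branch, where the very first fetch fails, is the situation the source_unavailable reason describes.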

See Event Stream — Cache freshness for the subscription-driven alternative to TTL polling.

Layered WAL trail

Per decision in advisory or enforce, the WAL records two autonomy.decision frames under the same audit_id:

audit_id=req-abc-001
  outcome=allow  reason="policy: ok"
  outcome=deny   reason="attestation: enrollment_mismatch"

The first frame is the policy layer’s verdict (recorded before executeTool); the second is the attestation layer’s. GET /v1/audit/{audit_id} returns the last matching frame so the audit-endpoint answer agrees with the wire response on the final outcome — same shape PR #706 established for ErrDomainNotAllowed.

In advisory mode the second frame is written but the wire response stays 200 + allow. The dual frame is what lets operators soak the gate via the WAL without disrupting traffic.

Operator workflow

  1. Enroll every node with a deliberate --enrollment-ref that names the deployment scope (e.g. deploy:fleet-alpha:v1).

  2. Sign bundles with bundle.Manifest.Provenance.EnrollmentRef set to the matching value.

  3. Set AUTONOMY_NODE_ID (or identity.node_id) on every runtime to the same string used at enrollment.

  4. Start with AUTONOMY_ATTESTATION_MODE=off until the substrate is in place.

  5. Flip to advisory. Watch the WAL via autonomy wal inspect (or autonomy attestation status per-node).

  6. Confirm the advisory denies match expectations (none for well-aligned nodes; the right reason prefix for any drift).

  7. Flip to enforce. The runtime now produces 403s on any drift.

  8. CI pipelines should run autonomy attestation eval --bundle <new-bundle> --node-id <each> before promoting a candidate bundle. Non-zero exit on deny gates the rollout at command-line time, not at production-traffic time.