Attestation Mode Rollout

Audience: operators turning on the runtime attestation gate (#725 PR6) on a fleet that has been running with AUTONOMY_ATTESTATION_MODE=off. The gate denies tool execution when a node's enrollment or rollout state does not agree with the bundle's declared provenance. If you flip straight to enforce without a soak period, any operator misalignment surfaces as production 403s. This runbook describes the safe sequence: verify, switch to advisory, soak, then enforce.

Prerequisites

  • All nodes in the target fleet have been enrolled via autonomy node enroll --node-id <id> --enrollment-ref <ref>.

  • Bundle authors have started declaring bundle.Manifest.Provenance.EnrollmentRef on signed bundles (schema v1.2+, see Security Model — Attestation join key).

  • Each runtime has AUTONOMY_NODE_ID (or identity.node_id in config) set to the same string used at enrollment.

  • The operator has access to the WAL via autonomy wal inspect and to the orchestrator read APIs via autonomy attestation status.

Procedure

1. Verify enrollment substrate fleet-wide

For each node, confirm the enrollment row matches the bundle’s declared binding:

autonomy attestation status --node-id <id>

Expected output: ENROLLMENT_REF populated, STATUS=active, and ROLLOUT_STATE either <none> or one of pending / acknowledged / active. If ENROLLMENT_REF is <none>, the node was missed. Re-enroll it before proceeding.
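The expected-output rules above can be encoded as a small pre-flight check. This is a sketch under stated assumptions. The NodeStatus struct, its field names and checkNode are hypothetical stand-ins for whatever you parse out of autonomy attestation status. Only the field values (<none>, active, pending, acknowledged) come from this runbook.

```go
package main

import "fmt"

// NodeStatus is a hypothetical holder for the three fields reported by
// `autonomy attestation status --node-id <id>`. How it gets populated
// (parsing CLI text, calling an API) is deliberately left open.
type NodeStatus struct {
	NodeID        string
	EnrollmentRef string // "<none>" when the node was never enrolled
	Status        string // expected: "active"
	RolloutState  string // "<none>", "pending", "acknowledged", or "active"
}

// acceptableRollout lists the rollout states this runbook treats as healthy.
var acceptableRollout = map[string]bool{
	"<none>":       true,
	"pending":      true,
	"acknowledged": true,
	"active":       true,
}

// checkNode returns nil when the node matches the expected output, or an
// error describing the first problem found.
func checkNode(s NodeStatus) error {
	if s.EnrollmentRef == "" || s.EnrollmentRef == "<none>" {
		return fmt.Errorf("%s: no enrollment; re-enroll before proceeding", s.NodeID)
	}
	if s.Status != "active" {
		return fmt.Errorf("%s: STATUS=%s, want active", s.NodeID, s.Status)
	}
	if !acceptableRollout[s.RolloutState] {
		return fmt.Errorf("%s: unexpected ROLLOUT_STATE=%s", s.NodeID, s.RolloutState)
	}
	return nil
}

func main() {
	nodes := []NodeStatus{
		{"node-a", "ref-a", "active", "pending"},
		{"node-b", "<none>", "active", "<none>"},
	}
	for _, n := range nodes {
		if err := checkNode(n); err != nil {
			fmt.Println("NOT READY:", err)
		} else {
			fmt.Println("ok:", n.NodeID)
		}
	}
}
```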

For CI or scripted verification, run autonomy attestation eval against every (node, candidate bundle) pair:

autonomy bundle pull <ref> /tmp/bundle.tar
autonomy attestation eval --bundle /tmp/bundle.tar --node-id <id>

If any pair exits non-zero (a deny), the fleet is not ready for enforce. Fix the misalignment before flipping the mode.

2. Flip to advisory

On every runtime:

AUTONOMY_ATTESTATION_MODE=advisory <restart runtime process>

Advisory mode runs the gate but never returns 403. Every would-be deny is written to the WAL as a second autonomy.decision frame under the same audit_id as the policy-layer allow.
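The mode semantics can be summarised as a small decision function. This is an illustrative model, not the runtime's code. Mode, Outcome and applyGate are hypothetical names. The assumption that enforce also writes the attestation frame is inferred from the deny-path audit check later in this runbook. Treating an unrecognised mode as enforce (fail closed) is a choice made for this sketch only.

```go
package main

import "fmt"

// Mode mirrors the AUTONOMY_ATTESTATION_MODE values used in this runbook.
type Mode string

const (
	ModeOff      Mode = "off"
	ModeAdvisory Mode = "advisory"
	ModeEnforce  Mode = "enforce"
)

// Outcome is what happens to a request the policy layer already allowed.
type Outcome struct {
	HTTPStatus     int  // 200 to proceed, 403 to deny
	WriteDenyFrame bool // second autonomy.decision frame under the same audit_id
}

// applyGate combines the mode with the gate's verdict; denyReason is empty
// when every sub-check passed.
func applyGate(mode Mode, denyReason string) Outcome {
	if denyReason == "" {
		return Outcome{HTTPStatus: 200} // happy path: no attestation frame
	}
	switch mode {
	case ModeOff:
		return Outcome{HTTPStatus: 200} // gate not consulted
	case ModeAdvisory:
		return Outcome{HTTPStatus: 200, WriteDenyFrame: true} // record, never 403
	default:
		// enforce, and (in this sketch) any unrecognised value: fail closed
		return Outcome{HTTPStatus: 403, WriteDenyFrame: true}
	}
}

func main() {
	reason := "attestation: enrollment_mismatch"
	for _, m := range []Mode{ModeOff, ModeAdvisory, ModeEnforce} {
		o := applyGate(m, reason)
		fmt.Printf("%-8s status=%d deny_frame=%v\n", m, o.HTTPStatus, o.WriteDenyFrame)
	}
}
```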

3. Soak

Let traffic run for at least one full directive cycle (typically 24h). Scrape advisory denies from the WAL:

autonomy wal inspect --kind autonomy.decision \
  | jq 'select((.attrs.reason // "") | startswith("attestation:"))'

Possible advisory denies, in order of severity:

Reason                                Action
------------------------------------  ------------------------------------------------------------
attestation: enrollment_revoked       Investigate why the node was revoked; re-enroll only if the revocation was unintentional
attestation: enrollment_mismatch      The bundle's binding doesn't match the enrollment row. Verify the operator typed the same string at both ends; re-enroll if needed
attestation: rollout_state_invalid    The node is in a rollback or failed state; resolve the rollout incident before flipping to enforce

Do not flip to enforce while any advisory denies are still appearing, unless you intend those denies to start returning 403.
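The go/no-go rule for this step is: zero advisory denies, or denies only on nodes you revoked on purpose. The sketch below expresses that rule together with the triage table. Deny, actions and readyForEnforce are hypothetical names. The action strings paraphrase the table. Unknown sub-checks are treated as blocking, which is an assumption made here.

```go
package main

import "fmt"

// Deny is one advisory deny observed during the soak.
type Deny struct {
	NodeID   string
	SubCheck string // e.g. "enrollment_revoked"
}

// actions restates the triage table for the soak-phase sub-checks.
var actions = map[string]string{
	"enrollment_revoked":    "investigate the revocation; re-enroll only if it was unintentional",
	"enrollment_mismatch":   "compare the bundle binding with the enrollment row; re-enroll if needed",
	"rollout_state_invalid": "resolve the rollout incident first",
}

// readyForEnforce reports whether enforce would only turn intentional
// revocations into 403s, and returns the denies that block the flip.
func readyForEnforce(denies []Deny, intentionallyRevoked map[string]bool) (bool, []Deny) {
	var blocking []Deny
	for _, d := range denies {
		if d.SubCheck == "enrollment_revoked" && intentionallyRevoked[d.NodeID] {
			continue // expected: these nodes should start receiving 403
		}
		blocking = append(blocking, d)
	}
	return len(blocking) == 0, blocking
}

func main() {
	denies := []Deny{
		{"node-old", "enrollment_revoked"},
		{"node-b", "enrollment_mismatch"},
	}
	ok, blocking := readyForEnforce(denies, map[string]bool{"node-old": true})
	fmt.Println("ready for enforce:", ok)
	for _, d := range blocking {
		action, known := actions[d.SubCheck]
		if !known {
			action = "unrecognised sub-check; investigate before enforcing"
		}
		fmt.Printf("%s (%s): %s\n", d.NodeID, d.SubCheck, action)
	}
}
```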

4. Flip to enforce

When the advisory denies are zero (or only on nodes you’ve intentionally revoked), flip:

AUTONOMY_ATTESTATION_MODE=enforce <restart runtime process>

The runtime now returns 403 on any sub-check failure. The wire response carries the canonical reason prefix (attestation: <sub-check>) so client code can branch on it.
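On the client side, branching on the canonical prefix might look like the sketch below. The runbook does not say how the reason reaches the client (a body field, a header, or something else), so this sketch assumes you have already extracted it as a string. attestationSubCheck is a hypothetical helper. The sub-check names are the ones used elsewhere in this runbook.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// attestationSubCheck returns the sub-check from a denied response, or
// ok=false if this is not a 403 carrying "attestation: <sub-check>", so
// callers can fall back to generic error handling.
func attestationSubCheck(status int, reason string) (subCheck string, ok bool) {
	const prefix = "attestation:"
	if status != http.StatusForbidden || !strings.HasPrefix(reason, prefix) {
		return "", false
	}
	return strings.TrimSpace(strings.TrimPrefix(reason, prefix)), true
}

func main() {
	sub, ok := attestationSubCheck(http.StatusForbidden, "attestation: window_expired")
	if !ok {
		fmt.Println("not an attestation deny")
		return
	}
	switch sub {
	case "enrollment_revoked", "enrollment_mismatch":
		fmt.Println("node-side problem: check the enrollment row")
	case "rollout_state_invalid":
		fmt.Println("node is mid-incident: check rollout state")
	case "window_expired", "window_not_yet_active":
		fmt.Println("bundle-side problem: ship a new or renewed bundle")
	default:
		fmt.Println("attestation deny:", sub)
	}
}
```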

5. Verify the layered WAL

Pick a known-good audit_id (any tool call that succeeded post-enforce) and confirm the dual-emit shape:

autonomy audit get <audit_id>

Expected: outcome=allow, reason from the policy layer. No attestation frame means the gate didn’t object — the happy path.

For a deny path, force a known failure (for example, revoke a test node and send it a request), then confirm:

autonomy audit get <audit_id>

Expected: outcome=deny, reason starts with attestation:.

Recovery

If enforce surfaces unexpected denies in production:

  1. Immediate: flip back to advisory and restart. Traffic resumes; the WAL still records the would-be denies for investigation.

  2. Diagnose: autonomy attestation status per affected node; confirm enrollment + rollout state match expectations.

  3. Re-test: re-run autonomy attestation eval --bundle <new> against each node before flipping enforce again.

Optional: time-bounded authorization (ExecutionWindow)

Bundles signed at schema v1.3 can declare an ExecutionWindow block in the manifest:

{
  "schema_version": "1.3",
  "execution_window": {
    "not_before": "2026-05-15T00:00:00Z",
    "not_after":  "2026-08-15T00:00:00Z"
  }
}
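For illustration, decoding and validating this block in Go might look like the sketch below. It mirrors the build-time rules in the notes that follow: RFC3339 timestamps, no inverted or zero-length windows, and either endpoint optional. The type names and parseWindow are hypothetical, not the bundle tooling's API. Accepting an empty execution_window object as unbounded is an assumption.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

// ExecutionWindow holds the parsed endpoints; nil means that side is unbounded.
type ExecutionWindow struct {
	NotBefore *time.Time
	NotAfter  *time.Time
}

type rawWindow struct {
	NotBefore string `json:"not_before"`
	NotAfter  string `json:"not_after"`
}

type manifest struct {
	SchemaVersion   string     `json:"schema_version"`
	ExecutionWindow *rawWindow `json:"execution_window"`
}

// parseWindow decodes the manifest and validates the window. It returns
// (nil, nil) when no execution_window block is present.
func parseWindow(data []byte) (*ExecutionWindow, error) {
	var m manifest
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, err
	}
	if m.ExecutionWindow == nil {
		return nil, nil
	}
	parse := func(field, s string) (*time.Time, error) {
		if s == "" {
			return nil, nil // endpoint omitted
		}
		t, err := time.Parse(time.RFC3339, s)
		if err != nil {
			return nil, fmt.Errorf("%s: %w", field, err)
		}
		return &t, nil
	}
	var w ExecutionWindow
	var err error
	if w.NotBefore, err = parse("not_before", m.ExecutionWindow.NotBefore); err != nil {
		return nil, err
	}
	if w.NotAfter, err = parse("not_after", m.ExecutionWindow.NotAfter); err != nil {
		return nil, err
	}
	// Strictly-before rejects both inverted and zero-length windows.
	if w.NotBefore != nil && w.NotAfter != nil && !w.NotBefore.Before(*w.NotAfter) {
		return nil, errors.New("execution_window: not_before must be strictly before not_after")
	}
	return &w, nil
}

func main() {
	doc := []byte(`{"schema_version":"1.3","execution_window":{"not_before":"2026-05-15T00:00:00Z","not_after":"2026-08-15T00:00:00Z"}}`)
	w, err := parseWindow(doc)
	if err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Println("window:", w.NotBefore.Format(time.RFC3339), "->", w.NotAfter.Format(time.RFC3339))
}
```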

The gate evaluates not_before and not_after against the runtime’s wall clock (time.Now) on every decision once enforce is active. Operational notes:

  • The window is half-open [NotBefore, NotAfter). A bundle whose not_after is 2026-08-15T00:00:00Z is denied from exactly that instant; there is no one-tick grace.

  • Both endpoints are independently optional. A bundle with only not_after runs from any starting point up to the deadline (cert-style expiry). A bundle with only not_before runs from that moment forever (scheduled release).

  • An expired bundle surfaces attestation: window_expired on the wire and in the WAL; a future-dated bundle surfaces attestation: window_not_yet_active. Operator remediation is to ship a new (or renewed) bundle, which differs from the enrollment-side remediations.

  • Build-time validation rejects malformed RFC3339 timestamps and inverted / zero-length windows, so a bundle that passes bundle inspect won’t surface as a runtime-only window failure.

  • Pre-promotion check: autonomy attestation eval --bundle <path> reports the verdict against the local wall clock; for windows starting in the future the dry-run shows window_not_yet_active until activation day.
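The half-open evaluation described in these notes can be checked with a few boundary instants. This is a sketch: Window, Check and mustParse are hypothetical names. Check returns the sub-check name, and the runtime reports it on the wire prefixed with "attestation: ". A nil endpoint is unbounded, which covers both the cert-style expiry case and the scheduled-release case.

```go
package main

import (
	"fmt"
	"time"
)

// Window holds the parsed endpoints; nil means that side is unbounded.
type Window struct {
	NotBefore *time.Time
	NotAfter  *time.Time
}

// Check applies [NotBefore, NotAfter) to now. It returns "" when the bundle
// may run, otherwise the sub-check name the gate would report.
func (w Window) Check(now time.Time) string {
	if w.NotBefore != nil && now.Before(*w.NotBefore) {
		return "window_not_yet_active"
	}
	if w.NotAfter != nil && !now.Before(*w.NotAfter) {
		return "window_expired" // now >= NotAfter: no grace at the boundary
	}
	return ""
}

func mustParse(s string) *time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return &t
}

func main() {
	w := Window{
		NotBefore: mustParse("2026-05-15T00:00:00Z"),
		NotAfter:  mustParse("2026-08-15T00:00:00Z"),
	}
	for _, s := range []string{
		"2026-05-14T23:59:59Z",
		"2026-05-15T00:00:00Z",
		"2026-08-14T23:59:59Z",
		"2026-08-15T00:00:00Z",
	} {
		verdict := w.Check(*mustParse(s))
		if verdict == "" {
			verdict = "allow"
		}
		fmt.Println(s, verdict)
	}
}
```

Run against the example manifest, this prints window_not_yet_active for the first instant, allow for the next two, and window_expired at exactly 2026-08-15T00:00:00Z.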

References