Demo Runbook

Failure drills, expected outputs, and recovery procedures for the make demo-up / make demo-run stack.


Pre-flight

make build
make demo-up

Wait for all services to report healthy:

docker compose -f demo/docker-compose.yml ps

Expected (State column):

NAME                  SERVICE     STATE     PORTS
demo-jaeger-1         jaeger      running   0.0.0.0:16686->16686/tcp ...
demo-otel-sink-1      otel-sink   running   0.0.0.0:4319->4318/tcp
demo-registry-1       registry    running   0.0.0.0:5000->5000/tcp
demo-runtime-1        runtime     running   0.0.0.0:7777->7777/tcp

All four must be running (not starting or unhealthy) before proceeding.
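The health gate can be scripted instead of eyeballed. A minimal polling sketch: `wait_for` is an invented helper, not part of the demo tooling, and the probe URLs are taken from the port mappings above (the runtime health path is an assumption, adjust to the real endpoint):

```shell
# wait_for CMD...: retry a probe until it succeeds or attempts run out.
# WAIT_TRIES / WAIT_DELAY override the default 30 x 1s window.
wait_for() {
  tries=${WAIT_TRIES:-30}
  while [ "$tries" -gt 0 ]; do
    "$@" >/dev/null 2>&1 && return 0
    tries=$((tries - 1))
    sleep "${WAIT_DELAY:-1}"
  done
  return 1
}

# Probes for the demo services (ports from the compose listing above):
# wait_for curl -sf http://localhost:5000/v2/    # registry
# wait_for curl -sf http://localhost:16686/      # jaeger UI
# wait_for curl -sf http://localhost:7777/health # runtime (assumed path)
```

Gating `make demo-run` on such a loop avoids racing a service that reports `starting`.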

Bootstrap policy and run the agent:

make demo-run

Expected final output:

✓ PASS — echo allowed, shell denied correctly

Golden demo sequence (≤10 commands)

The repeatable demo path from a clean machine. Each command is idempotent and prints a stable success marker the operator can show on screen.

# 0. Verify tools, keys, paths, and regressions — fail fast before wasting time
make demo-preflight
# Expected: "All preflight checks passed — ready for: make demo-up && make demo-run"

# 1. Build the Go binary
make build

# 2. Start infrastructure + gate on health checks
make demo-smoke
# Expected: "Smoke test passed — stack is healthy"

# 3. Build policy, push OCI artifacts, attach sidecars, sign with cosign,
#    verify supply chain, and run the Python agent
make demo-run
# Expected: "✓ PASS — echo allowed, shell denied correctly"

# 4. Offline telemetry buffering + priority drain
make demo-offline-drain
# Expected: "telemetry drain: OK — N events sent"

# 5. Failure-injection drills (optional)
make demo-drills
# Expected: "Drills complete — passed: 9, failed: 0"

# 6. Tear down
make demo-clean

Stable markers worth confirming after each step:

| Step | Command                  | Stable marker                                  |
|------|--------------------------|------------------------------------------------|
| 0    | make demo-preflight      | All preflight checks passed                    |
| 2    | make demo-smoke          | Smoke test passed                              |
| 3    | make demo-run            | Supply-chain verification passed               |
| 3    | make demo-run            | decision: allow for tool.echo                  |
| 3    | make demo-run            | decision: deny for tool.shell                  |
| 3    | make demo-run            | ✓ PASS — echo allowed, shell denied correctly  |
| 4    | make demo-offline-drain  | telemetry drain: OK — N events sent            |
| 5    | make demo-drills         | Drills complete — passed: 9, failed: 0         |

audit_id UUIDs vary per call (format is stable: xxxxxxxx-xxxx-4xxx-xxxx-xxxxxxxxxxxx).
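The format claim can be asserted mechanically. A sketch using grep: `is_uuid_v4` is an invented helper and the sample UUID is made up for illustration:

```shell
# is_uuid_v4: exit 0 iff stdin is a lowercase v4-shaped UUID
# (the xxxxxxxx-xxxx-4xxx-xxxx-xxxxxxxxxxxx template above).
is_uuid_v4() {
  grep -Eq '^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[0-9a-f]{4}-[0-9a-f]{12}$'
}

echo '3f2b8c1a-9d4e-4f6a-8b2c-1a2b3c4d5e6f' | is_uuid_v4 && echo "format ok"
```

Useful in scripted checks where the audit_id value varies but its shape must not.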


Drill 1 — Registry offline

Simulates: OCI registry failure during push/attach operations.

docker compose -f demo/docker-compose.yml stop registry
autonomy oci push-test-artifact --image localhost:5000/autonomy-demo/agent:v1

Expected (non-zero exit; the exact message varies):

Error: ... connection refused

Restore:

docker compose -f demo/docker-compose.yml start registry

Wait for registry to pass its health check (≤15s), then confirm recovery:

curl -sf http://localhost:5000/v2/

Expected:

{}

Push succeeds:

autonomy oci push-test-artifact --image localhost:5000/autonomy-demo/agent:v1
pushed  ref=localhost:5000/autonomy-demo/agent:v1  digest=sha256:...

Drill 2 — Incompatible policy bundle

Simulates: Deploying a bundle whose required_runtime_version does not satisfy the runtime binary version (0.1.0).

Build an incompatible bundle:

autonomy policy build \
  --in demo/policies \
  --out /tmp/bad-bundle.tar.gz \
  --version 99.0.0 \
  --name bad \
  --runtime-version ">=99.0.0"

Attempt to load:

autonomy policy load \
  --bundle      /tmp/bad-bundle.tar.gz \
  --manager-dir demo/data/policy

Expected (exit 1):

policy load: REJECTED — bundle requires runtime >=99.0.0, have 0.1.0

The current and LKG slots are unchanged. Confirm:

autonomy policy status --manager-dir demo/data/policy
Current: version=1.0.0 digest=sha256:... loaded=...
LKG:     (none)

The runtime continues to serve requests under the previous policy. Verify:

curl -s -X POST http://localhost:7777/v1/tool \
  -H 'Content-Type: application/json' \
  -d '{"kind":"tool.echo","params":{"message":"still working"}}'
{"decision":"allow","output":"still working","policy_ref":"1.0.0"}

Drill 3 — OTLP backend offline during drain

Simulates: Telemetry backend unavailable during the drain cycle.

Confirm WAL has events from prior tool calls:

autonomy telemetry export --dir demo/data/wal --out - | wc -l

Stop the OTLP sink:

docker compose -f demo/docker-compose.yml stop otel-sink

Attempt drain to the now-dead endpoint:

autonomy telemetry drain \
  --dir      demo/data/wal \
  --endpoint http://localhost:4319

Expected (exit 1):

telemetry drain: send error: ...connection refused

WAL is not modified by a failed drain. Confirm entry count is unchanged:

autonomy telemetry export --dir demo/data/wal --out - | wc -l

The count must be at least the pre-drain value: a failed drain never removes entries, and the count can only grow if new events arrive in the meantime.
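The durability invariant can be wrapped in a reusable assertion. A sketch: `assert_no_shrink` is an invented helper, and the commented usage reuses the CLI commands from the steps above:

```shell
# assert_no_shrink BEFORE AFTER: fail unless AFTER >= BEFORE.
assert_no_shrink() {
  if [ "$2" -ge "$1" ]; then
    echo "WAL intact ($1 -> $2)"
  else
    echo "WAL lost entries ($1 -> $2)"
    return 1
  fi
}

# Usage against the live stack:
# before=$(autonomy telemetry export --dir demo/data/wal --out - | wc -l)
# autonomy telemetry drain --dir demo/data/wal --endpoint http://localhost:4319 || true
# after=$(autonomy telemetry export --dir demo/data/wal --out - | wc -l)
# assert_no_shrink "$before" "$after"
```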

Restore the sink:

docker compose -f demo/docker-compose.yml start otel-sink

Wait for the sink to accept connections (≤15s):

curl -s -o /dev/null -w "%{http_code}" -X POST http://localhost:4319/v1/logs \
  -H 'Content-Type: application/json' -d '{}'

Expected: 200

Drain successfully:

autonomy telemetry drain \
  --dir      demo/data/wal \
  --endpoint http://localhost:4319

Expected:

telemetry drain: OK — N events sent to http://localhost:4319

Re-run drain immediately (no new events):

autonomy telemetry drain \
  --dir      demo/data/wal \
  --endpoint http://localhost:4319

Expected:

telemetry drain: nothing to drain

Offline → drain scenario

demo/scripts/04_offline_then_drain.sh runs the full offline accumulation and priority drain sequence:

bash demo/scripts/04_offline_then_drain.sh

Expected sequence:

[demo] Simulating offline: stopping otel-sink...
  ✓ otel-sink stopped
[demo] Generating tool calls (runtime buffers events in WAL while sink is offline)...
  [allow] tool.echo ×3
          call 1: "decision":"allow"
          call 2: "decision":"allow"
          call 3: "decision":"allow"
  [deny]  tool.shell ×2
          call 1: "decision":"deny"
          call 2: "decision":"deny"
  ✓ 5 tool calls made (3 allow, 2 deny)
[demo] WAL entry count:
  N events buffered in WAL
[demo] Bringing otel-sink back online...
  ✓ otel-sink ready at http://localhost:4319
[demo] Draining WAL → http://localhost:4319 in priority order (errors first, lifecycle last)...
telemetry drain: OK — N events sent to http://localhost:4319
  ✓ Drain complete
  ✓ Script 04 complete — offline WAL accumulation and priority drain demonstrated

Run all failure drills

make demo-drills

Expected summary:

[demo] Drills complete — passed: 9, failed: 0
  ✓ All failure drills behaved correctly

If any drill fails, the output shows [DRILL FAIL] with the specific assertion that did not hold.


Golden-output check

make demo-golden asserts that the live make demo-run output matches the checked-in golden fixtures under demo/fixtures/. Run it after a successful make demo-run:

make demo-golden

Expected:

All golden checks passed

The check strips ANSI colour codes and variable fields (audit-ID UUIDs and timestamps) before comparing against the structural markers in the fixture. Any drift between what the runtime currently emits and the recorded structure fails the check with a precise diff, so the fixture can be regenerated deliberately rather than updated silently.
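The scrubbing can be reproduced with standard tools. A sketch of the two passes (a guess at the mechanism, not the actual demo-golden implementation; the timestamp pass is omitted for brevity, and the sample line is invented):

```shell
# normalize: strip ANSI colour codes, then mask v4 UUIDs as <AUDIT_ID>.
normalize() {
  esc=$(printf '\033')
  sed -e "s/${esc}\[[0-9;]*m//g" \
      -e 's/[0-9a-f]\{8\}-[0-9a-f]\{4\}-4[0-9a-f]\{3\}-[0-9a-f]\{4\}-[0-9a-f]\{12\}/<AUDIT_ID>/g'
}

printf '\033[32maudit_id=3f2b8c1a-9d4e-4f6a-8b2c-1a2b3c4d5e6f\033[0m\n' | normalize
# prints: audit_id=<AUDIT_ID>
```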

Fixture contents:

| File                                   | Purpose                                                                  |
|----------------------------------------|--------------------------------------------------------------------------|
| demo/fixtures/golden_tool_calls.jsonl  | Expected runtime responses for the two demo calls (echo allow, shell deny) |
| demo/fixtures/expected_run_output.txt  | Console-output template with <AUDIT_ID> placeholders                     |


Python-version guard

The demo Python agent is pinned to Python 3.12 via demo/agent_py/.python-version. To assert the local environment satisfies the pin without starting Docker:

make demo-check-python

Pydantic v2 and LangChain are fully compatible with 3.12; running the agent on a different minor version produces deprecation warnings and may break the supply-chain demo.


Operator workflows (orchestrator + fleet)

These targets exercise the optional control-plane and fleet surfaces. They require make demo-up to have already started the orchestrator at localhost:8888.

Control-plane smoke (make demo-orchestrator-smoke)

Asserts the orchestrator’s health endpoint responds and that event ingestion is idempotent — the same event_id posted twice is silently deduplicated (does not double-count).

make demo-orchestrator-smoke

Runs demo/scripts/orchestrator_smoke.sh. Pass criterion: both the health probe and the duplicate-ingest check exit zero.
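The dedup semantics (first write wins; a replayed event_id is a no-op) can be illustrated locally. This is a plain-shell analogy with invented event IDs, not the orchestrator's storage code:

```shell
# dedupe: keep only the first occurrence of each event_id (field 1 of CSV).
dedupe() { awk -F, '!seen[$1]++'; }

# evt-001 appears twice; the replay is dropped, so nothing double-counts.
printf 'evt-001,node-alpha,ack\nevt-002,node-beta,ack\nevt-001,node-alpha,ack\n' | dedupe
```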

Recent control-plane events (make demo-cp-check)

Prints the last 10 events from the orchestrator’s event store. Useful as a quick sanity check after make demo-run or after a polled-release demo:

make demo-cp-check

Sends GET http://localhost:8888/v1/events?limit=10 and pretty-prints the JSON. If python3 is not available, prints raw JSON; if no events have been ingested yet, the response is {"events":[],"count":0}.

Fleet snapshot (make demo-show-fleet)

A two-query orchestrator snapshot for channel=stable. The Makefile target runs:

  1. GET /v1/releases/latest?channel=stable — the latest release pointer.

  2. GET /v1/events?limit=10 — the ten most recent events from the orchestrator’s event store.

make demo-show-fleet

The target does not call /v1/releases/{release_id}/acks directly, so it does not show a structured per-node ack table. Acks are visible indirectly — they appear in the recent-events stream as ai.deployment.ack event rows when the orchestrator has received recent traffic. Useful immediately after make demo-publish-release to confirm the release pointer advanced and that ack events are landing in the event store.

If no releases have been published yet, the first query prints (no releases yet run: make demo-publish-release) and the target continues to the recent-events block.

HA failover (make demo-ha-failover)

Demonstrates that killing the leader control-plane does not produce a double-promotion. Runs demo/scripts/10_ha_failover.sh which kills CP-1 (the current leader), waits for CP-2 to acquire the advisory lock, then asserts no two CPs ever held the lock simultaneously.

make demo-ha-failover

Pass criterion: CP-2 reports session_lock_held=1 and CP-1 reports 0 within the failover window; no two-leader interval recorded in the audit log.

Publish a desired-state release (make demo-publish-release)

Demonstrates the pull-based release model (v1.13 §1.2.3, advisory only — the control plane never pushes to nodes). The script (demo/scripts/06_releases.sh) waits for the orchestrator to be healthy, then:

  1. Publishes a release to channel stable via POST /v1/releases.

  2. Simulates an agent poll: GET /v1/releases/latest?channel=stable.

  3. Records three node acks via POST /v1/nodes/{node_id}/ack, exercising the full ack-status set:

    | Node       | Status   | Reason                                                    |
    |------------|----------|-----------------------------------------------------------|
    | node-alpha | accepted | Lock fingerprint matched, runtime compatible.             |
    | node-beta  | rejected | required_runtime_version >=1.0.0 not satisfied by 0.1.0.  |
    | node-gamma | failed   | Disk-full or transient runtime failure during adoption.   |

  4. Shows the fleet view via GET /v1/releases/{release_id}/acks.

  5. Publishes a second release and confirms the latest pointer advances.

make demo-publish-release

Each run publishes new releases (sequence increments). Pair with make demo-show-fleet to see the latest release pointer and the most recent events (acks appear as ai.deployment.ack rows in that event stream); pair with make demo-cp-check for the same recent-events view without the release-pointer query. For a structured per-node ack table, query the orchestrator directly: curl http://localhost:8888/v1/releases/{release_id}/acks. The longer narrative flow that frames this script in a multi-node story is in 02-multi-node-seed-once-update-everywhere.md; this section is the single-command operator entry point.

Release poll loop and lifecycle events (make demo-poll-loop)

Demonstrates the runtime’s release poll loop emitting lifecycle events that flow through the WAL → OTel bridge → orchestrator pipeline. The script (demo/scripts/07_poll_loop.sh):

  1. Verifies the orchestrator and runtime are healthy.

  2. Publishes a new release to channel stable.

  3. Waits for the runtime’s poll loop to fire (default interval 30 s; override with POLL_WAIT=10 make demo-poll-loop if the runtime is started with a smaller --poll-interval).

  4. Drains the runtime WAL to the OTel bridge so events reach the orchestrator.

  5. Queries GET /v1/events?event_type=ai.deployment.lifecycle and parses the emitted phases.

make demo-poll-loop

Lifecycle phases emitted by runtime/poller.go (the script asserts the first two; the verify-* phases require an explicit cosign pubkey configuration):

| Phase              | When emitted                                                        |
|--------------------|---------------------------------------------------------------------|
| polled             | Every poll cycle.                                                   |
| candidate_detected | New target_lock_fingerprint differs from current.                   |
| verify_started     | Cosign pubkey is configured (AUTONOMY_COSIGN_PUBKEY).               |
| verify_passed      | OCI + cosign + fingerprint + policy verification all succeeded.     |
| verify_failed      | Any verification step failed (non-fatal — the poller keeps running). |

Prerequisites: make demo-up-build (full stack with orchestrator + runtime + poll loop). The runtime must be started with AUTONOMY_ORCHESTRATOR_URL set; docker-compose sets this automatically.

vNext acceptance harness (make demo-verify-vnext)

Runs the vNext Definition-of-Done harness end-to-end: supply-chain demo, failure drills, control-plane telemetry, and lifecycle events. Builds the binary first, then runs demo/scripts/verify_vnext.sh:

make demo-verify-vnext

Use this as the single command that asserts every demo-relevant invariant in one pass. It is the canonical pre-release acceptance check; CI runs the same script on every release-tag pipeline.


CI acceptance scripts

The same behaviors are tested automatically in CI via make ci:

| Script                               | What it asserts                                                     |
|--------------------------------------|---------------------------------------------------------------------|
| ci/test_lock_determinism.sh          | 10× fingerprint stability; canonicalize round-trip; Go unit tests   |
| ci/test_policy_enforcement.sh        | Deny-all before load; allow/deny after load; Python adapter tests   |
| ci/test_oci_attach_verify.sh         | Sidecar attach + pull byte-equality; cosign sign + verify (optional) |
| ci/test_offline_telemetry_drain.sh   | WAL accumulation; durability on failed drain; priority drain        |

Run locally (requires Docker):

make ci

Telemetry event priority order

The drain delivers events in this order:

Priority 0 (High)    autonomy.error     ← security and fault events
Priority 1 (Normal)  autonomy.decision  ← policy allow/deny decisions
Priority 1 (Normal)  autonomy.action    ← tool execution results
Priority 2 (Low)     autonomy.lifecycle ← bundle load, stale, rejected

Within each priority tier, events are ordered by age (TierHot < TierWarm < TierCold) and then by sequence number.
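The ordering can be reproduced with a stable numeric sort over (priority, tier, sequence). A sketch with invented rows; the CSV encoding is illustrative, not the store's on-disk format:

```shell
# Columns: priority,tier,seq,event_type — sort numerically on the first three.
sort -t, -k1,1n -k2,2n -k3,3n <<'EOF'
2,0,4,autonomy.lifecycle
0,1,2,autonomy.error
1,0,3,autonomy.decision
1,0,1,autonomy.action
0,0,5,autonomy.error
EOF
# errors drain first regardless of sequence number; lifecycle drains last
```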

Events expire after 30 days and are removed by store.Purge() at the start of each drain cycle.


Jaeger UI

Traces from the runtime are forwarded to Jaeger at http://localhost:16686.

  1. Open http://localhost:16686.

  2. Select service: autonomy-adk.

  3. Click Find Traces.

Each POST /v1/tool request appears as a trace with spans for policy evaluation and tool execution.


Cleanup

Tear down and remove all generated data:

make demo-clean

This runs docker compose down -v (removes named volumes including registry data) and rm -rf demo/data/.

To preserve the registry content across restarts, use demo-down instead:

make demo-down

Screen recording

make demo-record prints a step-by-step recording checklist with narration cues and the stable output markers to call out during a 2–3 minute session.

make demo-record

Stable markers worth showing on screen, in order:

| Step               | Command                  | Marker to show                                 |
|--------------------|--------------------------|------------------------------------------------|
| Preflight          | make demo-preflight      | All preflight checks passed                    |
| Supply-chain demo  | make demo-run            | Supply-chain verification passed               |
| Policy enforcement | make demo-run            | ✓ PASS — echo allowed, shell denied correctly  |
| Telemetry drain    | make demo-offline-drain  | telemetry drain: OK — N events sent            |
| Failure drills     | make demo-drills         | Drills complete — passed: 9, failed: 0         |

Port-conflict notes during preflight:

  • If the demo stack is already running, port checks show demo stack already running (green, not a warning).

  • If a port is occupied by an unrelated process, preflight prints a conflict warning with remediation steps (make demo-clean, sudo lsof -i :<port>).
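The port probe itself takes one line per port. A sketch, assuming ss is available (`check_port` is an invented helper, not part of preflight):

```shell
# check_port PORT: report whether anything is listening on the port.
check_port() {
  if ss -tln 2>/dev/null | grep -q ":$1 "; then
    echo "port $1: in use"
  else
    echo "port $1: free"
  fi
}

check_port 7777
check_port 4318
```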


Troubleshooting

| Symptom | Cause | Fix |
|---------|-------|-----|
| make demo-run fails: cosign not found | cosign not on PATH | Install: curl -sSfL https://github.com/sigstore/cosign/releases/latest/download/cosign-linux-amd64 -o /usr/local/bin/cosign && chmod +x /usr/local/bin/cosign. Alternative: run make demo-run-unsigned (weaker — no supply-chain verification). |
| make demo-preflight fails: Legacy OpenSSL key format | Demo key is not in cosign-native format | Regenerate: bash demo/keys/generate.sh, then re-run make demo-preflight. |
| make demo-run fails at Step 1: no cosign signature | Image was pushed but not signed | Re-run from script 02: bash demo/scripts/02_push_attach_sign.sh, then bash demo/scripts/03_verify_and_run.sh. |
| Runtime starts in deny-all mode | No policy loaded | Run bash demo/scripts/01_build.sh to build and load the demo policy bundle. |
| uv not found | Python launcher missing | Install: curl -LsSf https://astral.sh/uv/install.sh \| sh. |
| Port conflict on 4318 or 7777 | Another process is listening | Check: ss -tlnp \| grep -E '4318\|7777'; stop the conflicting process; then make demo-up. |
| Docker permission denied | User not in docker group | sudo usermod -aG docker $USER && newgrp docker. |
| Python agent prints version warning | Local Python is not 3.12 | Install Python 3.12 (the demo pins to 3.12 via demo/agent_py/.python-version); confirm with make demo-check-python. |
| make demo-golden fails with diff | Demo output drifted from recorded fixture | Inspect the diff. If the drift is intentional (legitimate runtime change), regenerate the fixture; if not, fix the runtime regression. |