Architecture Overview

The AutonomyOps ADK is a toolkit for building and operating autonomous agents under a deterministic, policy-governed runtime. The system is structured around two planes:

Control plane (orchestrator/): manages release lifecycle, event ingestion, and fleet-level desired state. Nodes poll it; it never pushes to nodes.
Runtime plane (per-node agent): enforces policy, executes tools, verifies artifacts, and buffers telemetry. It operates with local-only state and local-only decision authority.

Module Map

Module	Path	Role
`lock`	`lock/`	Deterministic lock file schema (JSON v0 MVP) + BLAKE3 behavioral fingerprint
`runtime`	`runtime/`	Interceptor, Decision types, ToolServer (HTTP API)
`policy`	`policy/`	OPA-based policy bundle loader + evaluator + active/LKG slot manager
`oci`	`oci/`	Content-addressable blob cache, push/pull, cosign sign/verify pipeline
`telemetry`	`telemetry/`	WAL-based event buffer + async OTLP exporter + OTel bridge
`orchestrator`	`orchestrator/`	HTTP API — event ingestion, release management, SQLite store
`edge`	`edge/`	Deterministic content relay + pre-staging capability (INV-01..INV-13)
`cmd/autonomy`	`cmd/autonomy/`	Unified CLI (`autonomy`) — all subcommands

Control Plane vs. Runtime Plane

Control Plane

The control plane (orchestrator/) exposes an HTTP API for:

Ingesting telemetry events from edge nodes (POST /v1/events)
Publishing desired-state releases (POST /v1/releases)
Querying fleet event history (GET /v1/events)

The control plane has no push channel to individual nodes. Nodes poll via the release poller (runtime/poller.go) on a configurable interval.
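
The poll-only relationship can be sketched as a ticker-driven loop on the node. This is a minimal illustration, not the contents of runtime/poller.go: `fetchFunc`, `pollOnce`, `pollLoop`, and `errUnreachable` are hypothetical names, and the fetch function stands in for whatever HTTP request the real poller issues.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// fetchFunc is a hypothetical stand-in for the node's request to the control plane.
// It returns the fingerprint of the latest published release.
type fetchFunc func() (fingerprint string, err error)

// errUnreachable simulates a control plane that cannot be contacted.
var errUnreachable = errors.New("control plane unreachable")

// pollOnce performs one poll. It returns the fingerprint to remember and
// whether a new candidate was seen. Errors never change local state.
func pollOnce(lastSeen string, fetch fetchFunc) (string, bool) {
	fp, err := fetch()
	if err != nil || fp == "" || fp == lastSeen {
		return lastSeen, false
	}
	return fp, true
}

// pollLoop polls on a fixed interval until ctx is cancelled.
// All traffic is initiated by the node; nothing is pushed to it.
func pollLoop(ctx context.Context, interval time.Duration, lastSeen string,
	fetch fetchFunc, onCandidate func(string)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			next, changed := pollOnce(lastSeen, fetch)
			if changed {
				onCandidate(next)
			}
			// Simplification: remember the candidate so it is reported once.
			lastSeen = next
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()

	calls := 0
	fetch := func() (string, error) {
		calls++
		if calls == 1 {
			return "", errUnreachable // first poll fails; the node keeps running
		}
		return "fp-2", nil
	}
	pollLoop(ctx, 10*time.Millisecond, "fp-1", fetch, func(fp string) {
		fmt.Println("candidate detected:", fp)
	})
}
```

Keeping the single-poll step separate from the timing loop makes the "errors leave local state untouched" property easy to check in isolation.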

Runtime Plane

The runtime plane is the per-node agent. Its key components:

Tool server (runtime/server.go): accepts POST /v1/tool from adapters, evaluates policy, executes approved tools, and emits a decision telemetry event for every call plus an action event for each executed tool.
Policy evaluator (policy/): loads and evaluates OPA/Rego policy bundles using an active/LKG (last-known-good) two-slot design. A corrupted or incompatible bundle causes a fallback to the LKG slot; if no bundle is active, every request is denied.
Telemetry WAL (telemetry/): buffers events to a write-ahead log, survives collector downtime, and drains asynchronously via OTLP once the collector is reachable.
OCI verifier (oci/sign/): a four-step pipeline consisting of a cosign signature check, an agent digest match, a behavioral fingerprint match, and a policy bundle semver compatibility check.
Release poller (runtime/poller.go): a background loop that polls the control plane for new releases, runs the verification pipeline, and emits lifecycle telemetry.
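
The active/LKG two-slot design described for the policy evaluator can be sketched as follows. This is an assumption-laden illustration rather than the code in policy/manager.go: `Bundle`, `Slots`, `Activate`, `Decide`, and the `Corrupt` flag are hypothetical, and real validation would involve parsing and compatibility checks instead of a boolean.

```go
package main

import (
	"errors"
	"fmt"
)

// Bundle is a hypothetical stand-in for a loaded policy bundle.
type Bundle struct {
	Version string
	Corrupt bool // set by the demo to simulate a bundle that fails validation
}

// Slots holds the active bundle and the last-known-good (LKG) bundle.
type Slots struct {
	Active *Bundle
	LKG    *Bundle
}

var errInvalidBundle = errors.New("bundle corrupted or incompatible")

// Activate tries to promote candidate into the active slot.
// The bundle that was serving until now is recorded as LKG first,
// so a bad candidate can always be rolled back.
func (s *Slots) Activate(candidate Bundle) error {
	if s.Active != nil {
		s.LKG = s.Active
	}
	if candidate.Corrupt {
		s.Active = s.LKG // stays nil if there was never a good bundle
		return fmt.Errorf("activate %s: %w", candidate.Version, errInvalidBundle)
	}
	s.Active = &candidate
	return nil
}

// Decide denies everything when no bundle is active.
func (s *Slots) Decide(allowedBy func(*Bundle) bool) string {
	if s.Active == nil || !allowedBy(s.Active) {
		return "deny"
	}
	return "allow"
}

func main() {
	s := &Slots{}
	always := func(*Bundle) bool { return true }

	fmt.Println("no bundle:", s.Decide(always))

	_ = s.Activate(Bundle{Version: "1.0.0"})
	err := s.Activate(Bundle{Version: "1.1.0", Corrupt: true})
	fmt.Println("activation error:", err)
	fmt.Println("serving version:", s.Active.Version, "decision:", s.Decide(always))
}
```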

Node Autonomy Model

Each runtime node enforces policy locally. Core properties:

Property	Meaning
Local-only state	No peer state is considered in any policy decision
Local-only policy	Policy bundles are fetched, cached, and evaluated on-device
Local-only decision authority	The Go runtime (`runtime/`) is the sole authority; adapters cannot override decisions
Fail-closed	If the policy evaluator errors, the decision defaults to Deny
Offline-first	WAL buffers events; lock + policy are cached locally; no live CP connection required for decisions
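
The fail-closed property in the table above can be illustrated with a small wrapper around an evaluator. The names here (`Decision`, `Evaluator`, `decide`, `evalFunc`, `errBundle`) are hypothetical and are not taken from runtime/ or policy/; the sketch only shows one way to make every error path resolve to Deny.

```go
package main

import (
	"errors"
	"fmt"
)

// Decision is a hypothetical decision type. Deny is the zero value,
// so an uninitialised Decision can never mean Allow.
type Decision int

const (
	Deny Decision = iota
	Allow
)

func (d Decision) String() string {
	if d == Allow {
		return "allow"
	}
	return "deny"
}

// Evaluator is a hypothetical interface for a policy evaluator.
type Evaluator interface {
	Evaluate(tool string) (Decision, error)
}

// decide wraps an Evaluator so that a missing evaluator or any
// evaluation error results in Deny, regardless of the returned value.
func decide(ev Evaluator, tool string) Decision {
	if ev == nil {
		return Deny
	}
	d, err := ev.Evaluate(tool)
	if err != nil {
		return Deny
	}
	return d
}

// evalFunc adapts a plain function to the Evaluator interface.
type evalFunc func(string) (Decision, error)

func (f evalFunc) Evaluate(tool string) (Decision, error) { return f(tool) }

var errBundle = errors.New("bundle unreadable")

func main() {
	// An evaluator that reports Allow together with an error is still denied.
	broken := evalFunc(func(string) (Decision, error) { return Allow, errBundle })
	healthy := evalFunc(func(string) (Decision, error) { return Allow, nil })

	fmt.Println("broken evaluator:", decide(broken, "shell.exec"))
	fmt.Println("healthy evaluator:", decide(healthy, "shell.exec"))
}
```

Making Deny the zero value is a deliberate choice in this sketch: forgetting to set a decision then fails closed rather than open.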

Governing Constraints

The full invariant set is documented in Invariants. High-level structural constraints:

Constraint	Enforced by
No shared state (INV-01)	Startup path check + `edge/ci/scan_prohibited`
No convergence tracking (INV-02)	`edge/ci/scan_prohibited` (prohibited symbol scan)
No leader election (INV-03)	`edge/ci/scan_prohibited` + code review
Disk ceiling (INV-04)	`edge/storage/localstore.go` kernel `statfs` check + FI tests
Platform assurance binding (INV-05)	`edge/assurance/assurance.go` cgroup v2 probe at startup
Mission-layer decoupling (INV-10)	`edge/ci/scan_dependencies` import-graph check
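
A prohibited-symbol scan of the kind listed above could look roughly like the following. This is only a guess at the general technique, not the logic of edge/ci/scan_prohibited: the function `scanSource`, the sample source, and the prohibited-term list are all hypothetical. It uses the standard go/parser and go/ast packages to flag identifiers containing a prohibited term.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// demoProhibited is a hypothetical list of lower-case terms to reject.
var demoProhibited = []string{"gossip", "crdt", "leaderelect"}

// demoSrc is a made-up source file containing one offending identifier.
const demoSrc = `package edge

func startGossip() {}

func relay() {}
`

// scanSource parses Go source and returns one finding per identifier
// whose lower-cased name contains a prohibited term.
func scanSource(src string, prohibited []string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	var hits []string
	ast.Inspect(file, func(n ast.Node) bool {
		id, ok := n.(*ast.Ident)
		if !ok {
			return true
		}
		lower := strings.ToLower(id.Name)
		for _, term := range prohibited {
			if strings.Contains(lower, term) {
				hits = append(hits, fmt.Sprintf("%s: %s", fset.Position(id.Pos()), id.Name))
				break
			}
		}
		return true
	})
	return hits, nil
}

func main() {
	hits, err := scanSource(demoSrc, demoProhibited)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, h := range hits {
		fmt.Println("prohibited symbol:", h)
	}
}
```

A CI job built this way would exit non-zero when any finding is reported, which turns the structural constraint into a merge gate.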

Key Data Flows

Tool Call (allow path)

Adapter → POST /v1/tool → ToolServer → policy.Evaluator (Allow)
       → executeTool → response {decision:"allow", output:...}
       → telemetry.Emit(EventKindDecision) + telemetry.Emit(EventKindAction)
       → WAL → OTLP → OTel Collector → Control Plane /v1/events
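
The WAL stage of this flow can be sketched as an append-only file that is truncated only after a successful export. This is a simplified illustration, not telemetry/wal.go: `Event`, `WAL`, `Append`, `Drain`, and `demo` are hypothetical, the JSON-lines format is an assumption, and the export callback stands in for the OTLP exporter.

```go
package main

import (
	"bufio"
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// Event is a hypothetical, minimal telemetry record.
type Event struct {
	Kind string `json:"kind"`
	Tool string `json:"tool"`
}

// WAL appends events to a file as JSON lines so they survive restarts.
type WAL struct{ path string }

// Append writes one event and syncs it to disk before returning.
func (w *WAL) Append(ev Event) error {
	f, err := os.OpenFile(w.path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o600)
	if err != nil {
		return err
	}
	defer f.Close()
	line, err := json.Marshal(ev)
	if err != nil {
		return err
	}
	if _, err := f.Write(append(line, '\n')); err != nil {
		return err
	}
	return f.Sync()
}

// Drain hands all buffered events to export and returns how many were sent.
// The log is truncated only after export succeeds.
func (w *WAL) Drain(export func([]Event) error) (int, error) {
	f, err := os.Open(w.path)
	if errors.Is(err, os.ErrNotExist) {
		return 0, nil
	}
	if err != nil {
		return 0, err
	}
	var batch []Event
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		var ev Event
		if err := json.Unmarshal(sc.Bytes(), &ev); err != nil {
			f.Close()
			return 0, err
		}
		batch = append(batch, ev)
	}
	f.Close()
	if err := sc.Err(); err != nil {
		return 0, err
	}
	if len(batch) == 0 {
		return 0, nil
	}
	if err := export(batch); err != nil {
		return 0, err // collector unreachable: keep the log intact
	}
	return len(batch), os.Truncate(w.path, 0)
}

// demo buffers two events, attempts a failing export, then drains successfully.
func demo() (failedDrain, okDrain int, err error) {
	dir, err := os.MkdirTemp("", "wal-demo")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(dir)
	w := &WAL{path: filepath.Join(dir, "events.wal")}
	for _, ev := range []Event{{"decision", "echo"}, {"action", "echo"}} {
		if err := w.Append(ev); err != nil {
			return 0, 0, err
		}
	}
	failedDrain, _ = w.Drain(func([]Event) error { return errors.New("collector down") })
	okDrain, err = w.Drain(func([]Event) error { return nil })
	return failedDrain, okDrain, err
}

func main() {
	failed, ok, err := demo()
	fmt.Println("sent while collector down:", failed, "sent after recovery:", ok, "err:", err)
}
```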

Tool Call (deny path)

Adapter → POST /v1/tool → ToolServer → policy.Evaluator (Deny)
       → HTTP 403 {decision:"deny", reason:...}
       → telemetry.Emit(EventKindDecision)
       → WAL (decision event only; no action event)
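
The allow and deny paths can be put side by side in a small handler sketch. It is not the code in runtime/server.go: the `ToolRequest` and `ToolResponse` shapes, the hook fields on `ToolServer`, and `newDemoServer` are hypothetical, and the response fields only mirror the `decision`, `output`, and `reason` keys shown in the flows. The sketch assumes HTTP 200 on allow, which the flows do not state explicitly.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// ToolRequest is a hypothetical request body for POST /v1/tool.
type ToolRequest struct {
	Tool string `json:"tool"`
}

// ToolResponse mirrors the keys shown in the allow and deny flows.
type ToolResponse struct {
	Decision string `json:"decision"`
	Output   string `json:"output,omitempty"`
	Reason   string `json:"reason,omitempty"`
}

// ToolServer wires hypothetical hooks for policy, execution, and telemetry.
type ToolServer struct {
	Evaluate func(ToolRequest) (allow bool, reason string)
	Execute  func(ToolRequest) string
	Emit     func(kind string)
}

// handle contains the decision logic independent of HTTP.
func (s *ToolServer) handle(req ToolRequest) (int, ToolResponse) {
	allow, reason := s.Evaluate(req)
	if !allow {
		s.Emit("decision") // deny path: decision event only
		return http.StatusForbidden, ToolResponse{Decision: "deny", Reason: reason}
	}
	out := s.Execute(req)
	s.Emit("decision") // allow path: decision event, then action event
	s.Emit("action")
	return http.StatusOK, ToolResponse{Decision: "allow", Output: out}
}

// ServeHTTP decodes the request, delegates to handle, and writes JSON.
func (s *ToolServer) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	var req ToolRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "bad request", http.StatusBadRequest)
		return
	}
	status, resp := s.handle(req)
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	_ = json.NewEncoder(w).Encode(resp)
}

// newDemoServer allows only the "echo" tool and records emitted event kinds.
func newDemoServer(events *[]string) *ToolServer {
	return &ToolServer{
		Evaluate: func(r ToolRequest) (bool, string) {
			if r.Tool == "echo" {
				return true, ""
			}
			return false, "tool not permitted by policy"
		},
		Execute: func(r ToolRequest) string { return "ran " + r.Tool },
		Emit:    func(kind string) { *events = append(*events, kind) },
	}
}

func main() {
	var events []string
	srv := newDemoServer(&events)

	status, resp := srv.handle(ToolRequest{Tool: "echo"})
	fmt.Println(status, resp.Decision, events)

	events = nil
	status, resp = srv.handle(ToolRequest{Tool: "rm"})
	fmt.Println(status, resp.Decision, resp.Reason, events)
}
```

Because `ToolServer` implements `http.Handler`, it could be mounted with `http.Handle("/v1/tool", srv)`; the demo calls `handle` directly to avoid opening a socket.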

Release Verification

Poller → GET {CP}/v1/releases/latest → candidate fingerprint differs from current
       → emitLifecycle("candidate_detected")
       → oci/sign Verify(imageRef, pubKeyPath):
           Step 1: cosign signature check
           Step 2: agent artifact digest match
           Step 3: behavioral fingerprint match
           Step 4: policy bundle semver compatibility
       → emitLifecycle("verify_passed")
       → PolicyActivator (when configured) → emitLifecycle("activated" | "activate_failed")

Do Not Do

❌ Do NOT allow the control plane to push commands to individual nodes; nodes poll only
❌ Do NOT make or override policy decisions outside the runtime/ module; it is the sole decision authority
❌ Do NOT add convergence, gossip, or CRDT logic to any edge module; edge/ci/scan_prohibited enforces this

Evidence

runtime/interceptor.go, runtime/interceptor_test.go
runtime/poller.go
policy/evaluator.go, policy/manager.go
telemetry/wal.go, telemetry/exporter.go
orchestrator/server.go, orchestrator/server_fleet.go, orchestrator/server_releases.go
edge/ci/scan_prohibited/main.go, edge/ci/scan_dependencies/main.go