Architecture Overview

The AutonomyOps ADK is a toolkit for building and operating autonomous agents under a deterministic, policy-governed runtime. The system is structured around two planes:

  • Control plane (orchestrator/) — manages release lifecycle, event ingestion, and fleet-level desired state. Nodes poll it; it never pushes to nodes.

  • Runtime plane (per-node agent) — enforces policy, executes tools, verifies artifacts, and buffers telemetry. Operates with local-only state and local-only decision authority.

Module Map

| Module | Path | Role |
|---|---|---|
| lock | lock/ | Deterministic lock file schema (JSON v0 MVP) + BLAKE3 behavioral fingerprint |
| runtime | runtime/ | Interceptor, Decision types, ToolServer (HTTP API) |
| policy | policy/ | OPA-based policy bundle loader + evaluator + active/LKG slot manager |
| oci | oci/ | Content-addressable blob cache, push/pull, cosign sign/verify pipeline |
| telemetry | telemetry/ | WAL-based event buffer + async OTLP exporter + OTel bridge |
| orchestrator | orchestrator/ | HTTP API: event ingestion, release management, SQLite store |
| edge | edge/ | Deterministic content relay + pre-staging capability (INV-01..INV-13) |
| cmd/autonomy | cmd/autonomy/ | Unified CLI (autonomy): all subcommands |

Control-Plane vs Runtime-Plane

flowchart TB
    CP["Control Plane (orchestrator/)<br/>POST /v1/releases<br/>GET /v1/events<br/>GET /v1/fleet"]
    subgraph RP["Runtime Plane — per-node agent"]
        AD["Adapter (HTTP)"]
        TS["runtime/ ToolServer"]
        PO["policy/ Evaluator"]
        OC["oci/"]
        TE["telemetry/"]
        AD -->|"POST /v1/tool"| TS
        TS -->|"policy eval"| PO
        TS -->|"OCI fetch"| OC
        PO -->|"WAL emit"| TE
    end
    RP -->|"HTTPS poll (runtime → CP; CP never pushes)"| CP

Control Plane

The control plane (orchestrator/) exposes an HTTP API for:

  • Ingesting telemetry events from edge nodes (POST /v1/events)

  • Publishing desired-state releases (POST /v1/releases)

  • Querying fleet event history (GET /v1/events)

The control plane has no push channel to individual nodes. Nodes poll via the release poller (runtime/poller.go) on a configurable interval.

Runtime Plane

The runtime plane is the per-node agent. Its key components:

  1. Tool server (runtime/server.go) — accepts POST /v1/tool from adapters, evaluates policy, executes approved tools, emits decision + action telemetry events.

  2. Policy evaluator (policy/) — loads and evaluates OPA/Rego policy bundles. Uses an active/LKG (last-known-good) two-slot design. If the active bundle is corrupted or incompatible, the evaluator falls back to the LKG slot. With no active bundle, it denies all requests.

  3. Telemetry WAL (telemetry/) — buffers events to a write-ahead log, survives collector downtime, drains asynchronously via OTLP when the collector is reachable.

  4. OCI verifier (oci/sign/) — four-step pipeline: cosign signature, agent digest match, behavioral fingerprint match, policy bundle semver compatibility check.

  5. Release poller (runtime/poller.go) — background loop that polls the control plane for new releases, runs the verification pipeline, and emits lifecycle telemetry.
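
The active/LKG two-slot behavior described in item 2 can be sketched as follows. This is an assumption-laden illustration rather than the policy/ implementation: Bundle, Slots, and the Allowed map are hypothetical, and the map lookup stands in for real OPA/Rego evaluation. The sketch also assumes that an unusable LKG slot results in deny-all.

```go
package main

// Decision is a hypothetical two-valued policy outcome.
type Decision int

const (
	Deny Decision = iota
	Allow
)

// Bundle is a hypothetical stand-in for a loaded policy bundle.
// Valid is false when the bundle is corrupted or incompatible.
type Bundle struct {
	Version string
	Valid   bool
	Allowed map[string]bool // placeholder for Rego evaluation
}

// Slots holds the active bundle and the last-known-good bundle.
type Slots struct {
	Active *Bundle
	LKG    *Bundle
}

// usable returns the bundle to evaluate against, or nil for deny-all.
func (s *Slots) usable() *Bundle {
	if s.Active == nil {
		return nil // no active bundle: deny-all
	}
	if s.Active.Valid {
		return s.Active
	}
	if s.LKG != nil && s.LKG.Valid {
		return s.LKG // active is corrupted or incompatible: fall back
	}
	return nil // assumption: nothing usable means deny-all
}

// Evaluate returns Allow only when a usable bundle explicitly permits the tool.
func (s *Slots) Evaluate(tool string) Decision {
	b := s.usable()
	if b == nil || !b.Allowed[tool] {
		return Deny
	}
	return Allow
}
```

Because Deny is the zero value of Decision, any code path that forgets to set a decision also ends up denying.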

Node Autonomy Model

Each runtime node enforces policy locally. Core properties:

| Property | Meaning |
|---|---|
| Local-only state | No peer state is considered in any policy decision |
| Local-only policy | Policy bundles are fetched, cached, and evaluated on-device |
| Local-only decision authority | The Go runtime (runtime/) is the sole authority; adapters cannot override decisions |
| Fail-closed | If the policy evaluator errors, the decision defaults to Deny |
| Offline-first | WAL buffers events; lock + policy are cached locally; no live CP connection required for decisions |

Governing Constraints

The full invariant set is documented in Invariants. High-level structural constraints:

| Constraint | Enforced by |
|---|---|
| No shared state (INV-01) | Startup path check + edge/ci/scan_prohibited |
| No convergence tracking (INV-02) | edge/ci/scan_prohibited (prohibited symbol scan) |
| No leader election (INV-03) | edge/ci/scan_prohibited + code review |
| Disk ceiling (INV-04) | edge/storage/localstore.go kernel statfs check + FI tests |
| Platform assurance binding (INV-05) | edge/assurance/assurance.go cgroup v2 probe at startup |
| Mission-layer decoupling (INV-10) | edge/ci/scan_dependencies import-graph check |
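
To make the idea of a prohibited symbol scan concrete, the sketch below parses Go source and reports identifiers that contain a prohibited term. It is not the edge/ci/scan_prohibited tool, whose rules and matching strategy are not described here; ScanProhibited and the substring-matching approach are assumptions made for illustration.

```go
package main

import (
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
	"strings"
)

// ScanProhibited parses src as a Go file and returns, in sorted order, each
// prohibited term that appears inside at least one identifier. Terms are
// expected in lower case; identifiers are lower-cased before matching.
func ScanProhibited(src string, prohibited []string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	hits := map[string]bool{}
	ast.Inspect(file, func(n ast.Node) bool {
		id, ok := n.(*ast.Ident)
		if !ok {
			return true // not an identifier: keep walking
		}
		name := strings.ToLower(id.Name)
		for _, term := range prohibited {
			if strings.Contains(name, term) {
				hits[term] = true
			}
		}
		return true
	})
	out := make([]string, 0, len(hits))
	for term := range hits {
		out = append(out, term)
	}
	sort.Strings(out)
	return out, nil
}
```

A CI step built on this idea would fail the build whenever the returned slice is non-empty. Substring matching is coarse and can produce false positives, so a production scanner would likely need an allowlist or stricter matching.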

Key Data Flows

Tool Call (allow path)

Adapter → POST /v1/tool → ToolServer → policy.Evaluator (Allow)
       → executeTool → response {decision:"allow", output:...}
       → telemetry.Emit(EventKindDecision) + telemetry.Emit(EventKindAction)
       → WAL → OTLP → OTel Collector → Control Plane /v1/events

Tool Call (deny path)

Adapter → POST /v1/tool → ToolServer → policy.Evaluator (Deny)
       → HTTP 403 {decision:"deny", reason:...}
       → telemetry.Emit(EventKindDecision)
       → WAL (decision event only; no action event)
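
The allow and deny paths above can be sketched as a single handler. This is an illustrative outline, not runtime/server.go: ToolRequest, ToolResponse, Deps, and the string event kinds are hypothetical, and the HTTP 500 response on tool failure is an assumption, since the flows above do not describe that case.

```go
package main

import (
	"encoding/json"
	"net/http"
)

// ToolRequest and ToolResponse are hypothetical wire types for POST /v1/tool.
type ToolRequest struct {
	Tool string            `json:"tool"`
	Args map[string]string `json:"args,omitempty"`
}

type ToolResponse struct {
	Decision string `json:"decision"`
	Reason   string `json:"reason,omitempty"`
	Output   string `json:"output,omitempty"`
}

// Hypothetical event kinds mirroring EventKindDecision and EventKindAction.
const (
	EventKindDecision = "decision"
	EventKindAction   = "action"
)

// Deps bundles the collaborators the handler needs (all hypothetical).
type Deps struct {
	Evaluate func(tool string) (allow bool, reason string, err error)
	Execute  func(req ToolRequest) (string, error)
	Emit     func(kind string)
}

// handle implements both paths: a decision event is always emitted, and an
// action event is emitted only when the tool actually runs.
func (d Deps) handle(req ToolRequest) (int, ToolResponse) {
	allow, reason, err := d.Evaluate(req.Tool)
	if err != nil {
		allow, reason = false, "policy evaluation error" // fail closed
	}
	d.Emit(EventKindDecision)
	if !allow {
		return http.StatusForbidden, ToolResponse{Decision: "deny", Reason: reason}
	}
	out, err := d.Execute(req)
	d.Emit(EventKindAction)
	if err != nil {
		// Assumption: a failed tool run is reported as a server error.
		return http.StatusInternalServerError, ToolResponse{Decision: "allow", Reason: err.Error()}
	}
	return http.StatusOK, ToolResponse{Decision: "allow", Output: out}
}

// ServeHTTP decodes the request, delegates to handle, and writes JSON.
func (d Deps) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	var req ToolRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "bad request", http.StatusBadRequest)
		return
	}
	status, resp := d.handle(req)
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	_ = json.NewEncoder(w).Encode(resp)
}
```

Since Deps satisfies http.Handler, it could be mounted with http.Handle("/v1/tool", deps). Keeping the decision logic in handle, separate from the HTTP plumbing, lets both paths be exercised without a running server.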

Release Verification

Poller → GET {CP}/v1/releases/latest → candidate fingerprint differs from current
       → emitLifecycle("candidate_detected")
       → oci/sign Verify(imageRef, pubKeyPath):
           Step 1: cosign signature check
           Step 2: agent artifact digest match
           Step 3: behavioral fingerprint match
           Step 4: policy bundle semver compatibility
       → emitLifecycle("verify_passed")
       → PolicyActivator (when configured) → emitLifecycle("activated" | "activate_failed")

Do Not Do

  • ❌ Do NOT allow the control plane to push commands to individual nodes — poll only

  • ❌ Do NOT evaluate policy outside the runtime/ module — it is the sole authority

  • ❌ Do NOT add convergence, gossip, or CRDT to any edge module — edge/ci/scan_prohibited enforces this

Evidence

  • runtime/interceptor.go, runtime/interceptor_test.go

  • runtime/poller.go

  • policy/evaluator.go, policy/manager.go

  • telemetry/wal.go, telemetry/exporter.go

  • orchestrator/server.go, orchestrator/server_fleet.go, orchestrator/server_releases.go

  • edge/ci/scan_prohibited/main.go, edge/ci/scan_dependencies/main.go

See Also