Architecture Overview
The AutonomyOps ADK is a toolkit for building and operating autonomous agents under a deterministic, policy-governed runtime. The system is structured around two planes:
- Control plane (orchestrator/): manages release lifecycle, event ingestion, and fleet-level desired state. Nodes poll it; it never pushes to nodes.
- Runtime plane (per-node agent): enforces policy, executes tools, verifies artifacts, and buffers telemetry. Operates with local-only state and local-only decision authority.
Module Map
| Module | Path | Role |
|---|---|---|
| | | Deterministic lock file schema (JSON v0 MVP) + BLAKE3 behavioral fingerprint |
| | | Interceptor, Decision types, ToolServer (HTTP API) |
| | | OPA-based policy bundle loader + evaluator + active/LKG slot manager |
| | | Content-addressable blob cache, push/pull, cosign sign/verify pipeline |
| | | WAL-based event buffer + async OTLP exporter + OTel bridge |
| | | HTTP API: event ingestion, release management, SQLite store |
| | | Deterministic content relay + pre-staging capability (INV-01..INV-13) |
| | | Unified CLI ( |
Control-Plane vs Runtime-Plane
Control Plane
The control plane (orchestrator/) exposes an HTTP API for:
- Ingesting telemetry events from edge nodes (POST /v1/events)
- Publishing desired-state releases (POST /v1/releases)
- Querying fleet event history (GET /v1/events)
The control plane has no push channel to individual nodes. Nodes poll via the
release poller (runtime/poller.go) on a configurable interval.
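The poll-only relationship can be sketched as a ticker loop that compares the fetched release fingerprint with the one the node is running. This is a minimal illustration, not the contents of runtime/poller.go: the names `Release`, `FetchFunc`, `CheckOnce`, `Poll`, and `ErrUnreachable` are hypothetical, and the fetch function is injected so the sketch needs no network access.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Release is a hypothetical, minimal view of a desired-state release.
type Release struct {
	Fingerprint string
}

// ErrUnreachable simulates a control plane that cannot be contacted.
var ErrUnreachable = errors.New("control plane unreachable")

// FetchFunc stands in for the request to the control plane; it is
// injected so the sketch runs without a network.
type FetchFunc func() (Release, error)

// CheckOnce performs one poll. It reports a candidate only when the fetch
// succeeds and the fingerprint differs from the one currently running.
func CheckOnce(current string, fetch FetchFunc) (Release, bool) {
	rel, err := fetch()
	if err != nil || rel.Fingerprint == current {
		return Release{}, false
	}
	return rel, true
}

// Poll runs CheckOnce on every tick until stop is closed. The node always
// initiates the exchange; nothing is ever pushed to it.
func Poll(interval time.Duration, current string, fetch FetchFunc, onCandidate func(Release), stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	lastSeen := current
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			if rel, ok := CheckOnce(lastSeen, fetch); ok {
				lastSeen = rel.Fingerprint // avoid re-reporting the same candidate
				onCandidate(rel)
			}
		}
	}
}

func main() {
	fetch := func() (Release, error) { return Release{Fingerprint: "fp-2"}, nil }
	stop := make(chan struct{})
	go Poll(10*time.Millisecond, "fp-1", fetch, func(r Release) {
		fmt.Println("candidate detected:", r.Fingerprint)
		close(stop)
	}, stop)
	<-stop
}
```

A failed fetch is treated the same as "no change" here, so a node that cannot reach the control plane simply keeps running on its cached state.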
Runtime Plane
The runtime plane is the per-node agent. Its key components:
- Tool server (runtime/server.go): accepts POST /v1/tool from adapters, evaluates policy, executes approved tools, emits decision + action telemetry events.
- Policy evaluator (policy/): loads and evaluates OPA/Rego policy bundles. Uses an active/LKG (last-known-good) two-slot design. Corrupted or incompatible bundles fall back to LKG. No active bundle → deny-all.
- Telemetry WAL (telemetry/): buffers events to a write-ahead log, survives collector downtime, drains asynchronously via OTLP when the collector is reachable.
- OCI verifier (oci/sign/): four-step pipeline: cosign signature, agent digest match, behavioral fingerprint match, policy bundle semver compatibility check.
- Release poller (runtime/poller.go): background loop that polls the control plane for new releases, runs the verification pipeline, and emits lifecycle telemetry.
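The active/LKG two-slot behavior of the policy evaluator can be illustrated with a small sketch. It is an assumption-laden model rather than the code in policy/: `Bundle`, `Slots`, `Load`, and the `Valid` flag (which simulates a corrupted or incompatible bundle) are hypothetical names, and a real bundle would be evaluated by OPA rather than by a Go closure.

```go
package main

import "fmt"

// Decision is the outcome of a policy evaluation.
type Decision string

const (
	Allow Decision = "allow"
	Deny  Decision = "deny"
)

// Bundle is a hypothetical stand-in for a loaded policy bundle.
type Bundle struct {
	Version string
	Valid   bool // false simulates a corrupted or incompatible bundle
	Eval    func(tool string) Decision
}

// Slots holds the active bundle and the last-known-good (LKG) bundle.
type Slots struct {
	active *Bundle
	lkg    *Bundle
}

// Load places b in the active slot. The previous active bundle is kept as
// LKG only if it was usable, so the LKG slot never holds a broken bundle.
func (s *Slots) Load(b *Bundle) {
	if s.active != nil && s.active.Valid {
		s.lkg = s.active
	}
	s.active = b
}

// Evaluate tries the active slot, then LKG, and denies everything when
// neither slot holds a usable bundle.
func (s *Slots) Evaluate(tool string) Decision {
	for _, b := range []*Bundle{s.active, s.lkg} {
		if b != nil && b.Valid {
			return b.Eval(tool)
		}
	}
	return Deny
}

func main() {
	s := &Slots{}
	fmt.Println(s.Evaluate("read_file")) // no bundle loaded yet

	s.Load(&Bundle{Version: "1.0.0", Valid: true, Eval: func(string) Decision { return Allow }})
	s.Load(&Bundle{Version: "2.0.0", Valid: false}) // corrupted update
	fmt.Println(s.Evaluate("read_file"))            // answered by the LKG slot
}
```

Keeping the fallback inside `Evaluate` means a bad bundle never needs an explicit rollback step: the next decision simply skips it.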
Node Autonomy Model
Each runtime node enforces policy locally. Core properties:
| Property | Meaning |
|---|---|
| Local-only state | No peer state is considered in any policy decision |
| Local-only policy | Policy bundles are fetched, cached, and evaluated on-device |
| Local-only decision authority | The Go runtime ( |
| Fail-closed | If the policy evaluator errors, the decision defaults to Deny |
| Offline-first | WAL buffers events; lock + policy are cached locally; no live CP connection required for decisions |
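The fail-closed property can be shown with a short wrapper around an evaluator. This is a sketch under stated assumptions: `Evaluator`, `EvalFunc`, `Decide`, and `ErrEval` are hypothetical names, not the runtime's actual types. The point is only that any error, including one returned alongside an Allow value, results in Deny.

```go
package main

import (
	"errors"
	"fmt"
)

// Decision's zero value is Deny, so an uninitialized decision is never permissive.
type Decision int

const (
	Deny Decision = iota
	Allow
)

// Evaluator is a hypothetical interface for a policy evaluator.
type Evaluator interface {
	Evaluate(tool string) (Decision, error)
}

// EvalFunc adapts a plain function to the Evaluator interface.
type EvalFunc func(tool string) (Decision, error)

// Evaluate calls the wrapped function.
func (f EvalFunc) Evaluate(tool string) (Decision, error) { return f(tool) }

// ErrEval simulates an evaluator failure.
var ErrEval = errors.New("policy evaluation failed")

// Decide returns Allow only when the evaluator succeeds and says Allow.
// A missing evaluator, an error, or any other value yields Deny.
func Decide(ev Evaluator, tool string) (Decision, string) {
	if ev == nil {
		return Deny, "no evaluator configured"
	}
	d, err := ev.Evaluate(tool)
	if err != nil {
		return Deny, "evaluator error: " + err.Error()
	}
	if d != Allow {
		return Deny, "denied by policy"
	}
	return Allow, ""
}

func main() {
	// The evaluator claims Allow but also reports an error; the error wins.
	broken := EvalFunc(func(string) (Decision, error) { return Allow, ErrEval })
	d, reason := Decide(broken, "read_file")
	fmt.Println(d == Deny, reason)
}
```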
Governing Constraints
The full invariant set is documented in Invariants. High-level structural constraints:
| Constraint | Enforced by |
|---|---|
| No shared state (INV-01) | Startup path check + |
| No convergence tracking (INV-02) | |
| No leader election (INV-03) | |
| Disk ceiling (INV-04) | |
| Platform assurance binding (INV-05) | |
| Mission-layer decoupling (INV-10) | |
Key Data Flows
Tool Call (allow path)
Adapter → POST /v1/tool → ToolServer → policy.Evaluator (Allow)
→ executeTool → response {decision:"allow", output:...}
→ telemetry.Emit(EventKindDecision) + telemetry.Emit(EventKindAction)
→ WAL → OTLP → OTel Collector → Control Plane /v1/events
Tool Call (deny path)
Adapter → POST /v1/tool → ToolServer → policy.Evaluator (Deny)
→ HTTP 403 {decision:"deny", reason:...}
→ telemetry.Emit(EventKindDecision)
→ WAL (decision event only; no action event)
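The two tool-call paths above differ in what gets executed and which telemetry events are emitted. The sketch below models that difference as a pure function. `HandleToolCall`, `PolicyFunc`, and `Response` are hypothetical names, the string values of the event kinds are assumed, and the 200 status on the allow path is an assumption (the flow above specifies only the 403 for deny).

```go
package main

import (
	"fmt"
	"net/http"
)

// EventKind names a telemetry event type; the string values are assumed.
type EventKind string

const (
	EventKindDecision EventKind = "decision"
	EventKindAction   EventKind = "action"
)

// Response is a hypothetical shape for the tool server's reply.
type Response struct {
	Status   int
	Decision string
	Output   string
	Reason   string
}

// PolicyFunc stands in for the policy evaluator.
type PolicyFunc func(tool string) (allow bool, reason string)

// HandleToolCall evaluates policy, runs the tool only when allowed, and
// emits a decision event always and an action event only on execution.
func HandleToolCall(tool string, policy PolicyFunc, execute func(string) string, emit func(EventKind)) Response {
	allow, reason := policy(tool)
	emit(EventKindDecision)
	if !allow {
		return Response{Status: http.StatusForbidden, Decision: "deny", Reason: reason}
	}
	out := execute(tool)
	emit(EventKindAction)
	return Response{Status: http.StatusOK, Decision: "allow", Output: out}
}

func main() {
	policy := func(tool string) (bool, string) {
		if tool == "read_file" {
			return true, ""
		}
		return false, "tool not permitted"
	}
	execute := func(tool string) string { return "ran " + tool }
	emit := func(k EventKind) { fmt.Println("event:", k) }

	fmt.Printf("%+v\n", HandleToolCall("read_file", policy, execute, emit))
	fmt.Printf("%+v\n", HandleToolCall("delete_all", policy, execute, emit))
}
```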
Release Verification
Poller → GET {CP}/v1/releases/latest → candidate fingerprint differs from current
→ emitLifecycle("candidate_detected")
→ oci/sign Verify(imageRef, pubKeyPath):
Step 1: cosign signature check
Step 2: agent artifact digest match
Step 3: behavioral fingerprint match
Step 4: policy bundle semver compatibility
→ emitLifecycle("verify_passed")
→ PolicyActivator (when configured) → emitLifecycle("activated" | "activate_failed")
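The four verification steps run in a fixed order and stop at the first failure. A minimal sketch of that short-circuit structure follows. `Step`, `Verify`, and `ErrMismatch` are hypothetical names, and each check is a stub rather than a real cosign or digest operation. The flow above does not name a failure event, so the sketch returns the error to the caller and emits only "verify_passed".

```go
package main

import (
	"errors"
	"fmt"
)

// Step is one named check in the verification pipeline.
type Step struct {
	Name  string
	Check func() error
}

// ErrMismatch simulates a failed comparison in one of the steps.
var ErrMismatch = errors.New("mismatch")

// Verify runs the steps in order and stops at the first failure. The
// lifecycle event is emitted only when every step has passed.
func Verify(steps []Step, emitLifecycle func(event string)) error {
	for _, s := range steps {
		if err := s.Check(); err != nil {
			return fmt.Errorf("%s: %w", s.Name, err)
		}
	}
	emitLifecycle("verify_passed")
	return nil
}

func main() {
	pass := func() error { return nil }
	steps := []Step{
		{"cosign signature check", pass},
		{"agent artifact digest match", pass},
		{"behavioral fingerprint match", func() error { return ErrMismatch }},
		{"policy bundle semver compatibility", pass},
	}
	err := Verify(steps, func(e string) { fmt.Println("lifecycle:", e) })
	fmt.Println("result:", err)
}
```

Wrapping the error with the step name keeps the failing stage visible to whatever handles the result, without the pipeline itself deciding how failures are reported.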
Do Not Do
❌ Do NOT allow the control plane to push commands to individual nodes; nodes poll only
❌ Do NOT evaluate policy outside the runtime/ module; it is the sole authority
❌ Do NOT add convergence, gossip, or CRDT to any edge module; edge/ci/scan_prohibited enforces this
Evidence
- runtime/interceptor.go, runtime/interceptor_test.go
- runtime/poller.go
- policy/evaluator.go, policy/manager.go
- telemetry/wal.go, telemetry/exporter.go
- orchestrator/server.go, orchestrator/server_fleet.go, orchestrator/server_releases.go
- edge/ci/scan_prohibited/main.go, edge/ci/scan_dependencies/main.go
See Also
Edge Layer — edge capability design: offer/accept/store/relay
Invariants — all 13 invariants with rationale and enforcement
Threat Model — trust boundaries, adversary model, fail-closed design
Traceability → Invariant Map — invariant → code → test