Fleet Monitoring for Robotics¶
Once robot edge nodes are deployed, they check in with the orchestrator
periodically. The autonomy fleet status and autonomy logs commands give
operators a real-time view of fleet health, release state, and structured logs
without requiring SSH access to individual nodes.
Prerequisites¶
A running autonomy-orchestrator instance (see Run)
Edge nodes configured with AUTONOMY_ORCHESTRATOR_URL
The fleet:read RBAC permission (assigned by default to the operator role)
Fleet status snapshot¶
autonomy fleet status \
--orchestrator-url http://localhost:8888
Example output (text format):
Channel: stable Generated: 2026-04-07T10:00:00Z
Total: 3 Up: 2 Stale: 1 Unknown: 0
Latest Release:
ID: rel-004
Policy: arm-safety:v1.2.0
Sequence: 7
Created: 2026-04-06T09:00:00Z
NODE_ID STATUS LAST_SEEN POLICY RELEASE SEQ
node-alpha up 2026-04-07T09:59:00Z arm-safety:v1.2 rel-004 7
node-beta up 2026-04-07T09:58:45Z arm-safety:v1.2 rel-004 7
node-gamma stale 2026-04-07T08:00:00Z arm-safety:v1.1 rel-003 5
Node health states¶
| Status | Meaning |
|---|---|
| up | Last heartbeat within the stale threshold (--stale-threshold) |
| stale | Last heartbeat older than the stale threshold |
| unknown | Node has never sent a heartbeat to this orchestrator |
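The classification in the table can be sketched in a few lines of Python. This is an illustration of the rule as described, not the orchestrator's implementation: the function names are hypothetical, the 180-second threshold is borrowed from the example later on this page rather than from the CLI default, and treating a heartbeat exactly at the threshold as up is an assumption.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_ts(ts: str) -> datetime:
    """Parse a UTC timestamp such as 2026-04-07T09:59:00Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def classify(last_seen: Optional[str], now: str, stale_threshold_s: int) -> str:
    """Return 'up', 'stale', or 'unknown' for one node (assumed rule)."""
    if last_seen is None:
        return "unknown"  # the node has never sent a heartbeat
    age_s = (parse_ts(now) - parse_ts(last_seen)).total_seconds()
    # Assumption: a heartbeat exactly at the threshold still counts as up.
    return "up" if age_s <= stale_threshold_s else "stale"


now = "2026-04-07T10:00:00Z"
print(classify("2026-04-07T09:59:00Z", now, 180))  # node-alpha, 60 s old -> up
print(classify("2026-04-07T08:00:00Z", now, 180))  # node-gamma, 7200 s old -> stale
print(classify(None, now, 180))                    # never seen -> unknown
```

The timestamps are taken from the example output above, so the results match the STATUS column shown there.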
Flags¶
| Flag | Default | Description |
|---|---|---|
| --orchestrator-url | env / config | Orchestrator base URL |
| --channel | | Release channel to query |
| --stale-threshold | | Seconds before a node is classified as stale |
| --output | | Output format (text or json) |
JSON output¶
autonomy fleet status \
--orchestrator-url http://localhost:8888 \
--output json | jq '.nodes[] | {node: .node_id, status: .node_status}'
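For anything beyond a one-line filter, the same JSON can be processed in Python. The sketch below groups node IDs by status. Only the nodes, node_id, and node_status names come from the jq filter above; the sample payload is hypothetical and the real output may carry more fields.

```python
import json
from collections import defaultdict

# Hypothetical payload. Only `nodes`, `node_id`, and `node_status`
# are taken from the jq filter shown above.
sample = """{"nodes": [
  {"node_id": "node-alpha", "node_status": "up"},
  {"node_id": "node-beta", "node_status": "up"},
  {"node_id": "node-gamma", "node_status": "stale"}
]}"""


def group_by_status(payload: str) -> dict:
    """Map each status to the list of node IDs reporting it."""
    by_status = defaultdict(list)
    for node in json.loads(payload).get("nodes", []):
        by_status[node["node_status"]].append(node["node_id"])
    return dict(by_status)


groups = group_by_status(sample)
for status, node_ids in groups.items():
    print(f"{status}: {', '.join(node_ids)}")
```

In practice the payload would come from the command above, for example by piping its output into a script that reads sys.stdin.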
Filtering by release channel¶
Robot fleets often use multiple release channels (dev, canary, stable).
Query each independently:
# Check canary channel
autonomy fleet status \
--orchestrator-url http://localhost:8888 \
--channel canary
# Check dev channel
autonomy fleet status \
--orchestrator-url http://localhost:8888 \
--channel dev
Adjusting the stale threshold¶
For robots with slower heartbeat intervals, increase the stale threshold:
# 120-second heartbeat interval → stale after 3 minutes
autonomy fleet status \
--orchestrator-url http://localhost:8888 \
--stale-threshold 180
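The comment in the example pairs a 120-second heartbeat interval with a 180-second threshold, a factor of 1.5. A small helper can apply the same ratio to other intervals. The function name and the default factor are assumptions made for illustration; the right margin depends on how many missed heartbeats a fleet should tolerate.

```python
import math


def stale_threshold(heartbeat_interval_s: float, factor: float = 1.5) -> int:
    """Suggest a --stale-threshold value in whole seconds.

    The default factor of 1.5 is an assumption that reproduces the
    120 s -> 180 s example; it is not a documented recommendation.
    """
    if heartbeat_interval_s <= 0:
        raise ValueError("heartbeat interval must be positive")
    if factor < 1:
        raise ValueError("factor below 1 would mark healthy nodes as stale")
    return math.ceil(heartbeat_interval_s * factor)


print(stale_threshold(120))            # 180
print(stale_threshold(30, factor=2))   # 60
```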
Structured log streaming¶
Edge nodes emit structured logs through the relay infrastructure. The
orchestrator aggregates them in a ring-buffer table and exposes them at
GET /v1/logs.
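Ring-buffer semantics explain why only recent entries are available: once the buffer is full, each new entry displaces the oldest one. The following in-memory sketch shows that behavior. It is an analogy only; the orchestrator stores logs in a table, and the LogRing class and its capacity are hypothetical.

```python
from collections import deque


class LogRing:
    """Fixed-capacity log store that drops the oldest entry when full."""

    def __init__(self, capacity: int) -> None:
        self._entries = deque(maxlen=capacity)

    def append(self, entry: str) -> None:
        # deque(maxlen=...) silently discards from the left when full.
        self._entries.append(entry)

    def recent(self, limit: int) -> list:
        """Return up to `limit` of the newest entries, oldest first."""
        if limit <= 0:
            return []
        return list(self._entries)[-limit:]


ring = LogRing(capacity=3)
for i in range(5):
    ring.append(f"entry-{i}")

print(ring.recent(2))   # ['entry-3', 'entry-4']
print(ring.recent(10))  # ['entry-2', 'entry-3', 'entry-4']
```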
Fetch recent logs (batch)¶
autonomy logs \
--orchestrator-url http://localhost:8888 \
--limit 50
Example output (text format):
2026-04-07T09:58:00Z [INFO ] node-alpha relay: started
2026-04-07T09:58:01Z [WARN ] node-beta relay: peer-b unreachable
2026-04-07T09:59:00Z [INFO ] node-alpha arm_controller: proximity cleared
Filter to a single node¶
autonomy logs \
--orchestrator-url http://localhost:8888 \
--node node-alpha \
--limit 100
Stream logs in real time (SSE)¶
autonomy logs \
--orchestrator-url http://localhost:8888 \
--follow
Press Ctrl-C to stop. If the stream is interrupted, reconnect with the same
command — the orchestrator ring buffer retains recent entries.
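For clients that consume the stream directly instead of going through the CLI, the standard Server-Sent Events framing is simple to parse: data lines accumulate until a blank line ends the event. The sketch below handles that framing on any iterable of text lines. It assumes each event carries one JSON log entry in its data field, which this page does not confirm, and the sample stream is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each event in a text/event-stream."""
    data = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield "\n".join(data)
                data = []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The format strips one optional space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event, id, retry) are ignored in this sketch.


# Hypothetical stream contents; the real payload shape may differ.
stream = [
    ": keep-alive\n",
    'data: {"level":"info","node":"node-alpha","msg":"relay: started"}\n',
    "\n",
    'data: {"level":"warn","node":"node-beta","msg":"relay: peer-b unreachable"}\n',
    "\n",
]
entries = [json.loads(payload) for payload in iter_sse_data(stream)]
for entry in entries:
    print(entry["node"], entry["msg"])
```

Because the parser only needs an iterable of lines, it can be fed from a file, a socket wrapper, or an HTTP response body read line by line.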
JSON output (machine-readable)¶
# JSONL — one JSON object per line
autonomy logs \
--orchestrator-url http://localhost:8888 \
--output json \
--limit 20
Each entry:
{"level":"info","node":"node-alpha","ts":"2026-04-07T09:58:00Z","msg":"relay: started"}
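Because each line is a standalone JSON object, the output can be parsed line by line and filtered in a script. The sketch below keeps entries at or above a minimum severity. The level, node, ts, and msg keys come from the entry above; the severity order and the lowercase "warn" and "error" names are assumptions, and the sample lines are a hypothetical JSON rendering of the text example.

```python
import json

# Assumed severity order; only "info" appears in the entry shown above.
LEVELS = {"debug": 0, "info": 1, "warn": 2, "error": 3}

# Hypothetical JSONL rendering of the text-format example.
sample = """
{"level":"info","node":"node-alpha","ts":"2026-04-07T09:58:00Z","msg":"relay: started"}
{"level":"warn","node":"node-beta","ts":"2026-04-07T09:58:01Z","msg":"relay: peer-b unreachable"}
{"level":"info","node":"node-alpha","ts":"2026-04-07T09:59:00Z","msg":"arm_controller: proximity cleared"}
"""


def parse_jsonl(text: str) -> list:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def at_least(entries: list, min_level: str) -> list:
    """Keep entries whose level is min_level or more severe."""
    floor = LEVELS[min_level]
    # Entries with an unrecognized level are dropped rather than guessed at.
    return [
        e for e in entries
        if LEVELS.get(str(e.get("level", "")).lower(), -1) >= floor
    ]


warnings = at_least(parse_jsonl(sample), "warn")
for entry in warnings:
    print(entry["ts"], entry["node"], entry["msg"])
```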
RBAC¶
fleet status requires the fleet:read permission, which the predefined
operator role includes.
# Assign the operator role (run once per operator)
autonomy rbac role assign \
--role operator \
--subject alice@example.com
# Verify access
AUTONOMY_OPERATOR=alice@example.com \
AUTONOMY_RBAC_ENFORCEMENT=1 \
autonomy fleet status --orchestrator-url http://localhost:8888
To deny a subject, ensure they are not assigned any role with fleet:read.
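The access rule described here amounts to a lookup from subject to roles to permissions: a subject may read fleet status only if at least one assigned role grants fleet:read. The sketch below models that rule. It is a conceptual illustration, not the orchestrator's RBAC code; the viewer role, the bob@example.com subject, and the data layout are hypothetical.

```python
# Role -> permissions. Only operator/fleet:read comes from this page;
# the viewer role is hypothetical.
ROLE_PERMISSIONS = {
    "operator": {"fleet:read"},
    "viewer": set(),
}

# Subject -> assigned roles (hypothetical assignments).
ASSIGNMENTS = {
    "alice@example.com": {"operator"},
    "bob@example.com": {"viewer"},
}


def has_permission(subject: str, permission: str) -> bool:
    """True if any role assigned to the subject grants the permission."""
    roles = ASSIGNMENTS.get(subject, set())
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)


print(has_permission("alice@example.com", "fleet:read"))  # True
print(has_permission("bob@example.com", "fleet:read"))    # False
print(has_permission("carol@example.com", "fleet:read"))  # False (no roles)
```

In this model there is no explicit deny entry: access is removed by leaving a subject without any role that carries the permission, which matches the sentence above.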
Monitoring with the Gazebo sim stack¶
When the Gazebo sim stack is running (demo/docker-compose.gazebo.yml), the
orchestrator is available on port 8889:
# Fleet status (no robots registered yet in a fresh sim)
autonomy fleet status --orchestrator-url http://localhost:8889
# Log streaming from the sim stack
autonomy logs --orchestrator-url http://localhost:8889 --follow
Scripting a fleet health check¶
#!/bin/bash
# Fail if any node in the given channel (default: stable) is in the unknown state.
# Stale nodes are counted and reported but do not fail the check.
set -euo pipefail
ORCH_URL="${AUTONOMY_ORCHESTRATOR_URL:-http://localhost:8888}"
CHANNEL="${1:-stable}"
result=$(autonomy fleet status \
--orchestrator-url "$ORCH_URL" \
--channel "$CHANNEL" \
--output json)
unknown=$(echo "$result" | python3 -c "import sys,json; d=json.load(sys.stdin); print(d.get('unknown',0))")
if [ "$unknown" -gt 0 ]; then
  echo "WARN: $unknown node(s) in UNKNOWN state on channel $CHANNEL" >&2
  exit 1
fi
stale=$(echo "$result" | python3 -c "import sys,json; d=json.load(sys.stdin); print(d.get('stale',0))")
echo "Fleet OK: channel=$CHANNEL unknown=0 stale=$stale"
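The script reads the unknown and stale counts with d.get(..., 0), so a payload that lacks those keys would pass silently. A Python variant can fall back to counting entries in the nodes array when a summary key is absent. This is a sketch under assumptions: the top-level unknown and stale keys are taken from the script above, the nodes and node_status names from the jq example, and the function names are hypothetical.

```python
import json


def count_status(report: dict, status: str) -> int:
    """Use the summary count if present, else count matching nodes."""
    if status in report:
        return int(report[status])
    return sum(1 for n in report.get("nodes", []) if n.get("node_status") == status)


def evaluate(report: dict, channel: str) -> tuple:
    """Return (exit_code, message) using the same policy as the script."""
    unknown = count_status(report, "unknown")
    stale = count_status(report, "stale")
    if unknown > 0:
        return 1, f"WARN: {unknown} node(s) in UNKNOWN state on channel {channel}"
    return 0, f"Fleet OK: channel={channel} unknown=0 stale={stale}"


# Hypothetical payload without summary keys, to exercise the fallback.
report = json.loads("""{"nodes": [
  {"node_id": "node-alpha", "node_status": "up"},
  {"node_id": "node-gamma", "node_status": "stale"},
  {"node_id": "node-delta", "node_status": "unknown"}
]}""")

code, message = evaluate(report, "stable")
print(code, message)
```

A wrapper script could feed the output of autonomy fleet status --output json to json.load(sys.stdin) and pass the exit code to sys.exit.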
See also¶
Robotics Quickstart: full demo run
ROS2 Markers and Observability: WAL and PASS markers
Hardware Adaptation: registering real robot nodes
cmd/autonomy/commands/fleet.go: fleet CLI source
cmd/autonomy/commands/logs.go: logs CLI source