Fleet Monitoring for Robotics

Once robot edge nodes are deployed, they check in with the orchestrator periodically. The `autonomy fleet status` and `autonomy logs` commands give operators a real-time view of fleet health, release state, and structured logs, with no need for SSH access to individual nodes.

Prerequisites

A running autonomy-orchestrator instance (see Run)
Edge nodes configured with `AUTONOMY_ORCHESTRATOR_URL`
The `fleet:read` RBAC permission (assigned by default to the operator role)

Fleet status snapshot

autonomy fleet status \
    --orchestrator-url http://localhost:8888

Example output (text format):

Channel:    stable    Generated: 2026-04-07T10:00:00Z

  Total: 3   Up: 2   Stale: 1   Unknown: 0

Latest Release:
  ID:              rel-004
  Policy:          arm-safety:v1.2.0
  Sequence:        7
  Created:         2026-04-06T09:00:00Z

NODE_ID          STATUS  LAST_SEEN             POLICY           RELEASE   SEQ
node-alpha       up      2026-04-07T09:59:30Z  arm-safety:v1.2  rel-004   7
node-beta        up      2026-04-07T09:59:15Z  arm-safety:v1.2  rel-004   7
node-gamma       stale   2026-04-07T08:00:00Z  arm-safety:v1.1  rel-003   5

Node health states

Status	Meaning
`up`	Last heartbeat within `--stale-threshold` seconds (default: 60)
`stale`	Last heartbeat older than `--stale-threshold` seconds
`unknown`	Node has never sent a heartbeat to this orchestrator
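
The classification rule in the table can be sketched in Python as follows. This is an illustration only: the helper name `classify_node` and the choice to treat a heartbeat exactly at the threshold as `up` are assumptions, not the orchestrator's actual code.

```python
from datetime import datetime, timezone
from typing import Optional


def classify_node(last_seen: Optional[datetime], now: datetime,
                  stale_threshold: float = 60.0) -> str:
    """Return 'up', 'stale', or 'unknown' for a node (hypothetical helper)."""
    if last_seen is None:
        # The node has never sent a heartbeat to this orchestrator.
        return "unknown"
    age = (now - last_seen).total_seconds()
    # Assumption: a heartbeat exactly at the threshold still counts as up.
    return "up" if age <= stale_threshold else "stale"


now = datetime(2026, 4, 7, 10, 0, 0, tzinfo=timezone.utc)
print(classify_node(datetime(2026, 4, 7, 9, 59, 30, tzinfo=timezone.utc), now))  # up
print(classify_node(datetime(2026, 4, 7, 8, 0, 0, tzinfo=timezone.utc), now))    # stale
print(classify_node(None, now))                                                  # unknown
```

Because the status depends on the threshold passed at query time, the same node can appear as `up` or `stale` depending on the `--stale-threshold` value used.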

Flags

Flag	Default	Description
`--orchestrator-url`	env / config	Orchestrator base URL
`--channel`	`stable`	Release channel to query
`--stale-threshold`	`60`	Seconds before a node is classified as stale
`--output`	`text`	`text` or `json`

JSON output

autonomy fleet status \
    --orchestrator-url http://localhost:8888 \
    --output json | jq '.nodes[] | {node: .node_id, status: .node_status}'
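
For scripts that need more than a one-line filter, the same JSON can be processed in Python. The sketch below counts nodes per status. It assumes only what the jq filter above implies: a top-level `nodes` list whose items carry `node_id` and `node_status`. The sample payload and the helper name `summarize` are hypothetical.

```python
import json
from collections import Counter


def summarize(status_json: str) -> Counter:
    """Count nodes per status in `autonomy fleet status --output json` output."""
    doc = json.loads(status_json)
    # Assumption: a top-level "nodes" list whose items carry "node_status".
    return Counter(node["node_status"] for node in doc["nodes"])


# Hypothetical sample payload; real output may contain more fields.
sample = json.dumps({"nodes": [
    {"node_id": "node-alpha", "node_status": "up"},
    {"node_id": "node-beta", "node_status": "up"},
    {"node_id": "node-gamma", "node_status": "stale"},
]})
counts = summarize(sample)
print(counts["up"], counts["stale"], counts["unknown"])  # 2 1 0
```

A `Counter` returns 0 for statuses that do not occur, so the caller does not need to special-case an empty category.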

Filtering by release channel

Robot fleets often use multiple release channels (dev, canary, stable). Query each independently:

# Check canary channel
autonomy fleet status \
    --orchestrator-url http://localhost:8888 \
    --channel canary

# Check dev channel
autonomy fleet status \
    --orchestrator-url http://localhost:8888 \
    --channel dev
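
To query several channels in one pass, the CLI can be driven from a small Python wrapper. The sketch below uses only the flags documented on this page; the helper names `fleet_status_cmd` and `query_channels` are hypothetical, and the wrapper assumes the `autonomy` binary is on `PATH`.

```python
import subprocess
from typing import Dict, List


def fleet_status_cmd(orchestrator_url: str, channel: str) -> List[str]:
    """Build the argv for one `autonomy fleet status` call."""
    return ["autonomy", "fleet", "status",
            "--orchestrator-url", orchestrator_url,
            "--channel", channel,
            "--output", "json"]


def query_channels(orchestrator_url: str, channels: List[str]) -> Dict[str, str]:
    """Run the CLI once per channel and return the raw JSON text keyed by channel."""
    results = {}
    for channel in channels:
        # check=True raises CalledProcessError if the CLI exits non-zero.
        proc = subprocess.run(fleet_status_cmd(orchestrator_url, channel),
                              capture_output=True, text=True, check=True)
        results[channel] = proc.stdout
    return results


# Show the command that would be run for the canary channel.
print(" ".join(fleet_status_cmd("http://localhost:8888", "canary")))
```

Calling `query_channels("http://localhost:8888", ["dev", "canary", "stable"])` then yields one JSON document per channel, each of which can be inspected independently.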

Adjusting the stale threshold

For robots with slower heartbeat intervals, increase the stale threshold:

# 120-second heartbeat interval → stale after 3 minutes
autonomy fleet status \
    --orchestrator-url http://localhost:8888 \
    --stale-threshold 180
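
The threshold has to exceed the heartbeat interval, otherwise a healthy node would flip to `stale` between two heartbeats. A minimal sketch of that arithmetic follows. The helper `stale_threshold_for` is hypothetical, and its default factor of 1.5 merely reproduces the 120-second to 180-second example above; it is not a documented rule.

```python
import math


def stale_threshold_for(heartbeat_interval_s: float, factor: float = 1.5) -> int:
    """Suggest a --stale-threshold value in whole seconds (hypothetical helper).

    The default factor of 1.5 only reproduces the 120 s -> 180 s example;
    pick a larger factor to tolerate more missed heartbeats.
    """
    if factor <= 1.0:
        raise ValueError("factor must be > 1 so the threshold exceeds the interval")
    return math.ceil(heartbeat_interval_s * factor)


print(stale_threshold_for(120))       # 180
print(stale_threshold_for(120, 2.5))  # 300
```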

Structured log streaming

Edge nodes emit structured logs through the relay infrastructure. The orchestrator aggregates them in a ring-buffer table and exposes them at GET /v1/logs.
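
A ring buffer keeps a fixed number of the most recent entries and discards the oldest as new ones arrive. The Python sketch below illustrates that behaviour with `collections.deque`. It is a conceptual illustration only: the capacity is made up, and the orchestrator's actual table layout and size are not described on this page.

```python
from collections import deque

# Hypothetical capacity; the orchestrator's real buffer size is not documented here.
CAPACITY = 3

buffer = deque(maxlen=CAPACITY)
for i in range(5):
    # Appending to a full deque silently drops the oldest entry.
    buffer.append({"seq": i, "msg": f"entry {i}"})

print([entry["seq"] for entry in buffer])  # [2, 3, 4]
```

The practical consequence for operators is that only recent logs are available from the orchestrator; older entries are overwritten rather than archived.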

Fetch recent logs (batch)

autonomy logs \
    --orchestrator-url http://localhost:8888 \
    --limit 50

Example output (text format):

2026-04-07T09:58:00Z [INFO ] node-alpha relay: started
2026-04-07T09:58:01Z [WARN ] node-beta  relay: peer-b unreachable
2026-04-07T09:59:00Z [INFO ] node-alpha arm_controller: proximity cleared

Filter to a single node

autonomy logs \
    --orchestrator-url http://localhost:8888 \
    --node node-alpha \
    --limit 100

Stream logs in real time (SSE)

autonomy logs \
    --orchestrator-url http://localhost:8888 \
    --follow

Press Ctrl-C to stop. If the stream is interrupted, reconnect with the same command; the orchestrator's ring buffer retains recent entries.
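
For clients that consume the stream directly instead of through the CLI, the following Python sketch parses Server-Sent Events from an iterable of text lines. It follows the standard SSE framing (`data:` fields, blank line ends an event, lines starting with `:` are comments). The assumption that each event carries exactly one JSON log entry in its `data` field is mine; the wire format of the stream is not documented on this page, and the sample lines are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per SSE event from an iterable of text lines."""
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield json.loads("\n".join(data_parts))
                data_parts = []
        elif line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE strips a single leading space after the colon.
            data_parts.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.


# Hypothetical stream contents, assuming one JSON log entry per event.
stream = [
    'data: {"level":"info","node":"node-alpha","ts":"2026-04-07T09:58:00Z","msg":"relay: started"}\n',
    "\n",
    ": keep-alive\n",
    'data: {"level":"warn","node":"node-beta","ts":"2026-04-07T09:58:01Z","msg":"relay: peer-b unreachable"}\n',
    "\n",
]
for event in iter_sse_events(stream):
    print(event["node"], event["msg"])
```

An event that is still incomplete when the connection drops is discarded, which matches standard SSE behaviour.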

JSON output (machine-readable)

# JSONL — one JSON object per line
autonomy logs \
    --orchestrator-url http://localhost:8888 \
    --output json \
    --limit 20

Each entry:

{"level":"info","node":"node-alpha","ts":"2026-04-07T09:58:00Z","msg":"relay: started"}
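
Because each line is an independent JSON object, the output can be processed line by line. The Python sketch below filters entries by level using the `level`, `node`, `ts`, and `msg` fields shown above. The helper name `filter_logs` is hypothetical, and the `"warn"` spelling in the second sample line is an assumption based on the lowercase `"info"` in the example entry.

```python
import json
from typing import Iterable, List


def filter_logs(jsonl_lines: Iterable[str], level: str) -> List[dict]:
    """Parse JSONL log lines and keep entries whose "level" matches."""
    entries = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        if entry.get("level") == level:
            entries.append(entry)
    return entries


# The second line is a hypothetical entry; its "warn" spelling is assumed.
sample = [
    '{"level":"info","node":"node-alpha","ts":"2026-04-07T09:58:00Z","msg":"relay: started"}',
    '{"level":"warn","node":"node-beta","ts":"2026-04-07T09:58:01Z","msg":"relay: peer-b unreachable"}',
]
for entry in filter_logs(sample, "warn"):
    print(entry["ts"], entry["node"], entry["msg"])
```

In practice the lines would come from the command's standard output, for example by iterating over `sys.stdin` when the script sits at the end of a pipe.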

RBAC

`autonomy fleet status` requires the `fleet:read` permission, which the predefined operator role includes.

# Assign the operator role (run once per operator)
autonomy rbac role assign \
    --role operator \
    --subject alice@example.com

# Verify access
AUTONOMY_OPERATOR=alice@example.com \
AUTONOMY_RBAC_ENFORCEMENT=1 \
  autonomy fleet status --orchestrator-url http://localhost:8888

To deny a subject access, make sure they hold no role that grants `fleet:read`.
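
The allow and deny behaviour described here amounts to a role lookup: a subject is allowed only if at least one of their roles grants the permission. The Python sketch below models that rule with hypothetical in-memory mappings; the `guest` role, the subjects other than alice, and the helper `is_allowed` are invented for illustration and say nothing about how the orchestrator stores roles.

```python
from typing import Dict, Set

# Hypothetical in-memory model; real assignments are managed with `autonomy rbac`.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "operator": {"fleet:read"},  # the operator role includes fleet:read
    "guest": set(),              # hypothetical role used only for this example
}
SUBJECT_ROLES: Dict[str, Set[str]] = {
    "alice@example.com": {"operator"},
    "bob@example.com": {"guest"},
}


def is_allowed(subject: str, permission: str) -> bool:
    """Allow only if at least one of the subject's roles grants the permission."""
    roles = SUBJECT_ROLES.get(subject, set())
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)


print(is_allowed("alice@example.com", "fleet:read"))  # True
print(is_allowed("bob@example.com", "fleet:read"))    # False
print(is_allowed("carol@example.com", "fleet:read"))  # False (no roles at all)
```

Under this model there is no explicit deny rule: access is removed by removing every role that carries the permission.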

Monitoring with the Gazebo sim stack

When the Gazebo sim stack is running (demo/docker-compose.gazebo.yml), the orchestrator is available on port 8889:

# Fleet status (no robots registered yet in a fresh sim)
autonomy fleet status --orchestrator-url http://localhost:8889

# Log streaming from the sim stack
autonomy logs --orchestrator-url http://localhost:8889 --follow

Scripting a fleet health check

#!/bin/bash
# Fail if any node on the selected channel (first argument, default: stable)
# is in the unknown state; stale nodes are reported but tolerated.
set -euo pipefail

ORCH_URL="${AUTONOMY_ORCHESTRATOR_URL:-http://localhost:8888}"
CHANNEL="${1:-stable}"

result=$(autonomy fleet status \
    --orchestrator-url "$ORCH_URL" \
    --channel "$CHANNEL" \
    --output json)

unknown=$(echo "$result" | python3 -c "import sys,json; d=json.load(sys.stdin); print(d.get('unknown',0))")
if [ "$unknown" -gt 0 ]; then
    echo "WARN: $unknown node(s) in UNKNOWN state on channel $CHANNEL" >&2
    exit 1
fi

stale=$(echo "$result" | python3 -c "import sys,json; d=json.load(sys.stdin); print(d.get('stale',0))")
echo "Fleet OK: channel=$CHANNEL unknown=0 stale=$stale"