Fleet Monitoring for Robotics

Once robot edge nodes are deployed, they check in with the orchestrator periodically. The autonomy fleet status and autonomy logs commands give operators a real-time view of fleet health, release state, and structured logs, without SSH access to individual nodes.

Prerequisites

  • Running autonomy-orchestrator instance (see Run)

  • Edge nodes configured with AUTONOMY_ORCHESTRATOR_URL

  • fleet:read RBAC permission (assigned by default to the operator role)

Fleet status snapshot

autonomy fleet status \
    --orchestrator-url http://localhost:8888

Example output (text format):

Channel:    stable    Generated: 2026-04-07T10:00:00Z

  Total: 3   Up: 2   Stale: 1   Unknown: 0

Latest Release:
  ID:              rel-004
  Policy:          arm-safety:v1.2.0
  Sequence:        7
  Created:         2026-04-06T09:00:00Z

NODE_ID          STATUS  LAST_SEEN             POLICY           RELEASE   SEQ
node-alpha       up      2026-04-07T09:59:00Z  arm-safety:v1.2  rel-004   7
node-beta        up      2026-04-07T09:59:45Z  arm-safety:v1.2  rel-004   7
node-gamma       stale   2026-04-07T08:00:00Z  arm-safety:v1.1  rel-003   5

Node health states

Status   Meaning
up       Last heartbeat within --stale-threshold seconds (default: 60)
stale    Last heartbeat older than --stale-threshold seconds
unknown  Node has never sent a heartbeat to this orchestrator
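The three states can be illustrated with a minimal Python sketch. This is not the orchestrator's implementation: the function name classify_node is hypothetical, and treating a heartbeat that is exactly --stale-threshold seconds old as still up is an assumption about the boundary.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def classify_node(last_seen: Optional[datetime], now: datetime,
                  stale_threshold_s: int = 60) -> str:
    """Return 'up', 'stale', or 'unknown' for a single node."""
    if last_seen is None:
        # The node has never sent a heartbeat to this orchestrator.
        return "unknown"
    age = now - last_seen
    # Assumption: a heartbeat exactly at the threshold still counts as up.
    if age <= timedelta(seconds=stale_threshold_s):
        return "up"
    return "stale"


now = datetime(2026, 4, 7, 10, 0, 0, tzinfo=timezone.utc)
alpha = datetime(2026, 4, 7, 9, 59, 0, tzinfo=timezone.utc)
gamma = datetime(2026, 4, 7, 8, 0, 0, tzinfo=timezone.utc)

print(classify_node(alpha, now))  # up
print(classify_node(gamma, now))  # stale
print(classify_node(None, now))   # unknown
```

The timestamps are taken from the example output above; only the classification rule itself is sketched here.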

Flags

Flag                Default       Description
--orchestrator-url  env / config  Orchestrator base URL
--channel           stable        Release channel to query
--stale-threshold   60            Seconds before a node is classified as stale
--output            text          text or json

JSON output

autonomy fleet status \
    --orchestrator-url http://localhost:8888 \
    --output json | jq '.nodes[] | {node: .node_id, status: .node_status}'
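When jq is not available, the same JSON can be post-processed in Python. The sketch below assumes only what the jq filter above implies: a top-level nodes array whose items carry node_id and node_status. The sample payload is a trimmed, hypothetical stand-in for real output, and the helper names are illustrative.

```python
import json
from collections import Counter
from typing import Dict, List


def count_by_status(report_json: str) -> Dict[str, int]:
    """Count nodes per node_status value in a fleet status JSON report."""
    report = json.loads(report_json)
    return dict(Counter(n["node_status"] for n in report.get("nodes", [])))


def nodes_not_up(report_json: str) -> List[str]:
    """Return the node_id of every node whose status is not 'up'."""
    report = json.loads(report_json)
    return [n["node_id"] for n in report.get("nodes", [])
            if n["node_status"] != "up"]


# Hypothetical sample; real output may contain additional fields.
sample = json.dumps({"nodes": [
    {"node_id": "node-alpha", "node_status": "up"},
    {"node_id": "node-beta", "node_status": "up"},
    {"node_id": "node-gamma", "node_status": "stale"},
]})

print(count_by_status(sample))
print(nodes_not_up(sample))
```

In practice the report would come from the command above, for example by piping its output into a script that reads sys.stdin.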

Filtering by release channel

Robot fleets often use multiple release channels (dev, canary, stable). Query each independently:

# Check canary channel
autonomy fleet status \
    --orchestrator-url http://localhost:8888 \
    --channel canary

# Check dev channel
autonomy fleet status \
    --orchestrator-url http://localhost:8888 \
    --channel dev

Adjusting the stale threshold

For robots with slower heartbeat intervals, increase the stale threshold:

# 120-second heartbeat interval → stale after 3 minutes
autonomy fleet status \
    --orchestrator-url http://localhost:8888 \
    --stale-threshold 180
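The example above uses a threshold of 1.5 times the heartbeat interval, which tolerates one late heartbeat before a node is flagged. A small helper can make that choice explicit. The 1.5 factor is inferred from this single example and is not a documented rule; the function name is hypothetical.

```python
import math


def stale_threshold_seconds(heartbeat_interval_s: float,
                            grace_factor: float = 1.5) -> int:
    """Derive a --stale-threshold value from a node's heartbeat interval."""
    if heartbeat_interval_s <= 0:
        raise ValueError("heartbeat interval must be positive")
    if grace_factor < 1:
        # A factor below 1 would mark healthy nodes stale between heartbeats.
        raise ValueError("grace factor must be at least 1")
    return math.ceil(heartbeat_interval_s * grace_factor)


print(stale_threshold_seconds(120))  # 180
```

A larger factor trades slower detection of dead nodes for fewer false stale reports on lossy links.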

Structured log streaming

Edge nodes emit structured logs through the relay infrastructure. The orchestrator aggregates them in a ring-buffer table and exposes them at GET /v1/logs.
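A ring buffer keeps only the most recent entries and silently discards the oldest once capacity is reached. The in-memory sketch below illustrates that behavior and nothing more: the orchestrator stores logs in a table, and the class name, capacity, and method names here are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional


class LogRingBuffer:
    """Fixed-capacity log store that drops the oldest entry when full."""

    def __init__(self, capacity: int) -> None:
        # deque with maxlen evicts from the left on append when full.
        self._entries: deque = deque(maxlen=capacity)

    def append(self, entry: Dict[str, str]) -> None:
        self._entries.append(entry)

    def recent(self, limit: int, node: Optional[str] = None) -> List[Dict[str, str]]:
        """Return up to `limit` newest entries, optionally for one node."""
        if limit <= 0:
            return []
        matches = [e for e in self._entries
                   if node is None or e["node"] == node]
        return matches[-limit:]


buf = LogRingBuffer(capacity=3)
for i in range(5):
    node = "node-alpha" if i % 2 == 0 else "node-beta"
    buf.append({"node": node, "msg": f"m{i}"})

# Only the three newest entries survive: m2, m3, m4.
print([e["msg"] for e in buf.recent(limit=10)])
```

The limit and node parameters mirror the --limit and --node flags shown in the commands that follow.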

Fetch recent logs (batch)

autonomy logs \
    --orchestrator-url http://localhost:8888 \
    --limit 50

Example output (text format):

2026-04-07T09:58:00Z [INFO ] node-alpha relay: started
2026-04-07T09:58:01Z [WARN ] node-beta  relay: peer-b unreachable
2026-04-07T09:59:00Z [INFO ] node-alpha arm_controller: proximity cleared

Filter to a single node

autonomy logs \
    --orchestrator-url http://localhost:8888 \
    --node node-alpha \
    --limit 100

Stream logs in real time (SSE)

autonomy logs \
    --orchestrator-url http://localhost:8888 \
    --follow

Press Ctrl-C to stop. If the stream is interrupted, reconnect with the same command; the orchestrator ring buffer retains recent entries.
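The CLI handles the stream for you, but a custom client would need to decode the Server-Sent Events wire format itself. The sketch below parses standard SSE framing (data: fields, comment lines, blank-line event terminators) from any iterable of text lines. That each event carries one JSON log entry is an assumption, and the sample lines are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete SSE event."""
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the event; skip events with no data.
            if data_parts:
                yield "\n".join(data_parts)
                data_parts = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one leading space after the colon is dropped
        if field == "data":
            data_parts.append(value)
    # An unterminated trailing event is discarded, as the SSE spec requires.


# Hypothetical stream contents; assumes one JSON log entry per event.
stream = [
    ": keep-alive\n",
    'data: {"level":"info","node":"node-alpha","msg":"relay: started"}\n',
    "\n",
    'data: {"level":"warn","node":"node-beta","msg":"relay: peer-b unreachable"}\n',
    "\n",
]

for payload in iter_sse_data(stream):
    entry = json.loads(payload)
    print(entry["node"], entry["msg"])
```

Because the parser accepts any iterable of lines, it can be fed from an HTTP response iterator or from a captured file for testing.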

JSON output (machine-readable)

# JSONL — one JSON object per line
autonomy logs \
    --orchestrator-url http://localhost:8888 \
    --output json \
    --limit 20

Each entry:

{"level":"info","node":"node-alpha","ts":"2026-04-07T09:58:00Z","msg":"relay: started"}
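Because each line is a standalone JSON object, the output can be filtered line by line without loading everything at once. The sketch below keeps entries at or above a minimum severity. The level, node, ts, and msg fields come from the entry shown above; the level names other than info, and their ordering, are assumptions.

```python
import json
from typing import Dict, Iterable, Iterator

# Assumption: only "info" appears in the documented JSON; the rest is guessed.
LEVELS = {"debug": 10, "info": 20, "warn": 30, "error": 40}


def filter_logs(lines: Iterable[str], min_level: str = "warn") -> Iterator[Dict]:
    """Yield parsed JSONL entries whose level is at least min_level."""
    threshold = LEVELS[min_level]
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between entries
        entry = json.loads(line)
        # Unrecognized levels rank lowest and are filtered out.
        if LEVELS.get(entry.get("level", ""), 0) >= threshold:
            yield entry


sample_lines = [
    '{"level":"info","node":"node-alpha","ts":"2026-04-07T09:58:00Z","msg":"relay: started"}',
    '{"level":"warn","node":"node-beta","ts":"2026-04-07T09:58:01Z","msg":"relay: peer-b unreachable"}',
]

for entry in filter_logs(sample_lines):
    print(entry["ts"], entry["node"], entry["msg"])
```

To use it against live output, pass sys.stdin as the lines argument and pipe the command above into the script.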

RBAC

The fleet status command requires the fleet:read permission. The predefined operator role includes this permission.

# Assign the operator role (run once per operator)
autonomy rbac role assign \
    --role operator \
    --subject alice@example.com

# Verify access
AUTONOMY_OPERATOR=alice@example.com \
AUTONOMY_RBAC_ENFORCEMENT=1 \
  autonomy fleet status --orchestrator-url http://localhost:8888

To deny a subject, ensure they are not assigned any role with fleet:read.

Monitoring with the Gazebo sim stack

When the Gazebo sim stack is running (demo/docker-compose.gazebo.yml), the orchestrator is available on port 8889:

# Fleet status (no robots registered yet in a fresh sim)
autonomy fleet status --orchestrator-url http://localhost:8889

# Log streaming from the sim stack
autonomy logs --orchestrator-url http://localhost:8889 --follow

Scripting a fleet health check

#!/bin/bash
# Check that no node in the given channel (default: stable) is in the unknown state
set -euo pipefail

ORCH_URL="${AUTONOMY_ORCHESTRATOR_URL:-http://localhost:8888}"
CHANNEL="${1:-stable}"

result=$(autonomy fleet status \
    --orchestrator-url "$ORCH_URL" \
    --channel "$CHANNEL" \
    --output json)

unknown=$(echo "$result" | python3 -c "import sys,json; d=json.load(sys.stdin); print(d.get('unknown',0))")
if [ "$unknown" -gt 0 ]; then
    echo "WARN: $unknown node(s) in UNKNOWN state on channel $CHANNEL" >&2
    exit 1
fi

stale=$(echo "$result" | python3 -c "import sys,json; d=json.load(sys.stdin); print(d.get('stale',0))")
echo "Fleet OK: channel=$CHANNEL unknown=0 stale=$stale"

Share snippet

A compact, copy-pasteable summary of this demo. Suitable for an email, issue, sales note, or proof artifact.

Prerequisites

  • A running autonomy-orchestrator instance (for example http://localhost:8888)

  • One or more edge nodes posting heartbeats to that orchestrator

  • fleet:read permission on the calling identity (operator role grants it by default)

Run it

autonomy fleet status --orchestrator-url http://localhost:8888
autonomy logs --orchestrator-url http://localhost:8888 --limit 50

Expected proof markers

  • Header line of the form Total: N   Up: N   Stale: N   Unknown: N

  • A node table with each row showing up, stale, or unknown status

  • Structured log lines tagged with [INFO ] or [WARN ] and originating node id

What this proves

Operators see fleet health and structured logs centrally, without per-node SSH. Node states are deterministically classified by --stale-threshold (up / stale / unknown), and edge logs flow through the orchestrator's ring buffer with both batch (--limit) and streaming (--follow) reads. This is the same surface that CI harnesses and operator runbooks use.

See also