Bandwidth Troubleshooting

Audience: operators managing edge node relay delivery bandwidth.

Background

Each edge node’s relay subsystem enforces two optional bandwidth controls:

  1. Rate limit (bytes_per_second): A token-bucket rate limiter that refills at the configured rate. Segments are throttled if the bucket is empty.

  2. Daily quota (daily_quota_bytes): A rolling 24-hour byte budget. Segments are dropped from the delivery queue if the daily quota is exhausted.

Both controls default to 0, which means unlimited. A throttled relay moves to the Failed state and is retried with backoff; throttling alone never permanently deadletters it.
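To make the rate-limit behaviour concrete, here is a minimal token-bucket sketch in POSIX shell. It is a model for reasoning only, not edged's actual implementation. The function names are hypothetical, and the bucket capacity is assumed to equal one second of the configured rate (the sample output's "50% of burst" figure is consistent with that, but the source does not state it outright).

```shell
# Hypothetical token-bucket model. Assumption: capacity = one second of the
# configured rate. This is an illustration, not edged's code.

# bucket_refill TOKENS RATE CAPACITY ELAPSED_SECONDS
# Prints the token count after refilling, capped at CAPACITY.
bucket_refill() (
  tokens=$(( $1 + $2 * $4 ))
  [ "$tokens" -gt "$3" ] && tokens=$3
  echo "$tokens"
)

# bucket_take TOKENS SEGMENT_BYTES
# Prints the remaining tokens and returns 0 if the segment fits.
# Prints the unchanged count and returns 1 (throttled) if it does not.
bucket_take() (
  if [ "$2" -le "$1" ]; then
    echo $(( $1 - $2 ))
  else
    echo "$1"
    exit 1
  fi
)
```

With a 1 MiB/s limit, `bucket_take 1048576 524288` leaves 524288 tokens. A following 768 KiB segment (`bucket_take 524288 786432`) returns 1, which in this model corresponds to a throttled relay that must wait for a refill.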


1. Check current bandwidth status

edgectl relay config get [--socket /run/edged/rpc.sock]

Example output (unlimited):

Relay Configuration
  Enabled:                 true
  Workers:                 4
  Success Condition:       ack
  Dial Timeout:            10s
  Ack Timeout:             30s

Bandwidth
  Rate Limit:              unlimited
  Daily Quota:             unlimited
  Available Tokens:        (n/a — unlimited)
  Daily Used:              (n/a — unlimited)

Example output (rate-limited, quota set):

Relay Configuration
  Enabled:                 true
  Workers:                 4
  Success Condition:       ack

Bandwidth
  Rate Limit:              1,048,576 bytes/sec  (1 MiB/s)
  Daily Quota:             10,737,418,240 bytes  (10 GiB)
  Available Tokens:        524,288  (50% of burst)
  Daily Used:              3,221,225,472  (3 GiB / 10 GiB, 30% consumed)
  Throttle Count:          14
  Quota Drops:             0

Key fields

  Field              Meaning
  Available Tokens   Current token-bucket depth. If near 0, relays are being throttled.
  Daily Used         Bytes delivered in the current 24-hour quota window. Compare to Daily Quota.
  Throttle Count     Total relay attempts blocked by the rate limit since last restart.
  Quota Drops        Total relay attempts dropped because the daily quota was exhausted.
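The percentage shown next to Daily Used can be reproduced by hand when comparing nodes or scripting a check. The sketch below uses integer shell arithmetic; the function names are hypothetical, and the inputs are the raw byte values with the thousands separators removed.

```shell
# quota_pct_used USED_BYTES QUOTA_BYTES
# Prints the whole-number percentage consumed, or "unlimited" when the
# quota is 0 (0 = unlimited, per the Background section).
quota_pct_used() (
  used=$1 quota=$2
  if [ "$quota" -eq 0 ]; then
    echo "unlimited"
  else
    echo $(( used * 100 / quota ))
  fi
)

# quota_remaining USED_BYTES QUOTA_BYTES
# Prints the bytes left in the current window (never below 0).
quota_remaining() (
  left=$(( $2 - $1 ))
  [ "$left" -lt 0 ] && left=0
  echo "$left"
)

quota_pct_used 3221225472 10737418240    # prints 30
quota_remaining 3221225472 10737418240   # prints 7516192768 (7 GiB)
```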


2. Diagnose throttle events

Symptom: relay queue growing, inflight not clearing

watch -n 5 'edgectl relay status --socket /run/edged/rpc.sock 2>&1 | grep -E "Scheduled|Inflight|Failed|Throttle"'

If the Failed count is growing and Throttle Count (from edgectl relay config get) is incrementing, the rate limit is the cause. Throttled relays move to Failed and re-enter Scheduled once their backoff expires, so the queue keeps growing for as long as the rate limit is too low for the delivery volume.
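One way to confirm that Throttle Count is actually incrementing is to sample it twice and compare. The helper below is a sketch that assumes the output layout shown in the example above ("Throttle Count:" followed by a number, possibly with thousands separators); throttle_count is a hypothetical name, not an edgectl feature.

```shell
# throttle_count
# Reads `edgectl relay config get` output on stdin and prints the
# Throttle Count value with spaces and thousands separators removed.
throttle_count() {
  awk -F: '/Throttle Count/ { gsub(/[ ,]/, "", $2); print $2 }'
}

# Usage sketch (requires a running edged, so it is left commented out):
#   before=$(edgectl relay config get | throttle_count)
#   sleep 60
#   after=$(edgectl relay config get | throttle_count)
#   echo "throttled in the last minute: $(( after - before ))"
```

A delta of zero over several samples, with the Failed count still rising, suggests the failures have a cause other than the rate limit.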

Symptom: deliveries stop completely during a window

If Quota Drops is non-zero, the 24-hour quota has been exhausted at least once; if the counter is still incrementing, it is exhausted now. No new delivery attempts will succeed until the quota window resets (24h after the first byte was counted in the current window).
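Given the reset rule above, the wait until deliveries resume is simple epoch arithmetic. The sketch below assumes you already know the Unix timestamp of the first byte counted in the current window; the example output does not show that value, so obtaining it (for example from the audit log) is left to the operator. The function name is hypothetical.

```shell
# seconds_until_reset FIRST_BYTE_EPOCH NOW_EPOCH
# Prints the seconds left until the 24h quota window resets, or 0 if
# the window has already elapsed.
seconds_until_reset() (
  first=$1 now=$2
  remaining=$(( first + 86400 - now ))
  [ "$remaining" -lt 0 ] && remaining=0
  echo "$remaining"
)

# Example: a window that opened 12 hours ago has 43200 seconds left.
seconds_until_reset 1700000000 1700043200   # prints 43200
```

In practice the second argument would be `$(date +%s)`.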

Check the audit log for quota events:

# Look for relay.bandwidth.quota_exceeded in the slog output
autonomy audit query --audit-dir "$AUTONOMY_AUDIT_DIR" --category relay --limit 20



3. Adjust bandwidth limits

Remove rate limit (set to unlimited)

edgectl relay config set-bandwidth \
  --bytes-per-second 0 \
  [--socket /run/edged/rpc.sock]

Set a rate limit (e.g. 2 MiB/s)

edgectl relay config set-bandwidth \
  --bytes-per-second 2097152 \
  [--socket /run/edged/rpc.sock]

Set a daily quota (e.g. 20 GiB)

edgectl relay config set-bandwidth \
  --daily-quota 21474836480 \
  [--socket /run/edged/rpc.sock]

Set both

edgectl relay config set-bandwidth \
  --bytes-per-second 1048576 \
  --daily-quota 10737418240 \
  [--socket /run/edged/rpc.sock]

Expected output:

Bandwidth updated
  Rate Limit:   1,048,576 bytes/sec
  Daily Quota:  10,737,418,240 bytes
  Applied:      immediately

Configuration changes take effect immediately on the running edged process without restart. The daily quota accumulator is preserved across config updates (a config update does not reset the daily counter).

Validation: Both --bytes-per-second and --daily-quota must be ≥ 0. Negative values are rejected with an error.
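Because both flags take raw byte counts, a small conversion helper can reduce arithmetic mistakes. The following is a sketch with hypothetical function names; it mirrors the non-negative integer rule above but is not the validation edgectl itself performs.

```shell
# mib_to_bytes N / gib_to_bytes N: convert binary units to bytes.
mib_to_bytes() ( echo $(( $1 * 1048576 )) )
gib_to_bytes() ( echo $(( $1 * 1073741824 )) )

# is_valid_limit VALUE
# Returns 0 when VALUE is a non-negative integer, 1 otherwise.
is_valid_limit() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
}

mib_to_bytes 2     # prints 2097152
gib_to_bytes 20    # prints 21474836480
is_valid_limit -1 || echo "rejected: must be >= 0"
```

The two printed values match the byte counts used in the 2 MiB/s and 20 GiB examples above.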


4. Verify the change

edgectl relay config get [--socket /run/edged/rpc.sock]

Confirm the new limits are reflected. Then watch relay status to verify delivery resumes:

watch -n 5 'edgectl relay status --socket /run/edged/rpc.sock 2>&1 | grep -E "Scheduled|Inflight|Acked|Throttle"'



5. Reference: bandwidth sizing

  Scenario                               Recommended setting
  Low-bandwidth link (LTE, satellite)    100–500 KiB/s rate limit; 1–5 GiB daily quota
  Standard broadband edge                1–5 MiB/s rate limit; 10–50 GiB daily quota
  High-throughput data center            Unlimited or very high limit
  Contested-connectivity (shared link)   Rate limit + daily quota; monitor Throttle Count
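When choosing a rate limit and a daily quota together, it helps to check which of the two binds first. The sketch below is plain arithmetic with hypothetical function names; it assumes delivery runs continuously at the full rate limit, which is a worst case rather than a typical load.

```shell
# max_bytes_per_day RATE_BYTES_PER_SECOND
# Prints the most a node could deliver in 24h at a sustained rate.
max_bytes_per_day() ( echo $(( $1 * 86400 )) )

# seconds_to_exhaust RATE_BYTES_PER_SECOND QUOTA_BYTES
# Prints how long a sustained rate takes to consume the quota.
# Both arguments must be non-zero (0 means unlimited).
seconds_to_exhaust() ( echo $(( $2 / $1 )) )

rate=1048576        # 1 MiB/s
quota=10737418240   # 10 GiB

max_bytes_per_day "$rate"            # prints 90596966400
seconds_to_exhaust "$rate" "$quota"  # prints 10240
```

For the 1 MiB/s and 10 GiB pairing, sustained delivery at the limit would consume the quota in 10240 seconds (about 2 h 51 min), so the quota, not the rate limit, is the binding constraint under continuous load.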


Known gaps

  • Rate limit change resets the token bucket to full: When UpdateConfig is called, the token bucket is reset to the new rate limit value. This is intentional, to prevent starvation after a limit increase, but it means a sudden increase allows an immediate burst equal to the new rate limit.

  • Daily quota window is rolling 24h from first byte: There is no midnight-reset option. The window starts when the first byte is delivered and rolls forward from there.

  • No per-peer bandwidth controls: The current implementation applies rate and quota limits globally across all peers on the node. Per-peer quotas are a follow-on item.

  • Bandwidth metrics not in Prometheus: Throttle Count and Quota Drops are available via edgectl relay config get only. Prometheus metric export is a follow-on item.

  • No config file persistence: Bandwidth limits set via edgectl relay config set-bandwidth take effect immediately but are not persisted to the config file. After edged restart, the limits revert to the values in the config file. Update the config file (relay.bandwidth_bytes_per_second, relay.bandwidth_daily_quota_bytes) to persist the change across restarts.