Deadletter Inspection and Retry Workflow
Audience: operators managing edge node relay delivery.
What is a deadletter entry?
A relay deadletter entry is an outbound segment delivery record that has exhausted its
retry budget (MaxRetries) or whose segment is missing from local storage at delivery
time (SEGMENT_MISSING outcome). The entry transitions to StateDeadletter (terminal)
and no further automatic retry attempts are made.
The relay ledger is bound by INV-12: terminal states (Acked, Deadletter) are never
exited by the normal delivery path. Manual operator intervention via retry is required
to re-queue a deadletter entry. Manual purge permanently removes it.
1. Check relay and deadletter status
Overall relay health
edgectl relay status [--socket /run/edged/rpc.sock]
Example output:
Relay Status
Enabled: true
Workers: 4
Success Condition: ack
Queue Depth
Scheduled: 12
Inflight: 2
Acked: 847
Failed: 3
Deadletter: 5
Total: 869
Bandwidth
Rate Limit: unlimited
Daily Quota: unlimited
If the Deadletter count is non-zero, list the deadletter entries and then inspect them.
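For scripted health checks, the count can be pulled out of the status output. The following is a minimal sketch, assuming the `Deadletter: <n>` line format shown in the example output above; the function name `deadletter_count` is hypothetical and not part of edgectl.

```shell
# Print the Deadletter count from `edgectl relay status` output read on stdin.
# Assumption: the output contains a line of the form "Deadletter: <n>",
# as in the example output. Exits non-zero if no such line is found.
deadletter_count() {
  awk '$1 == "Deadletter:" { print $2; found = 1 }
       END { if (!found) exit 1 }'
}

# Possible wiring (adjust the socket path for your node):
#   count=$(edgectl relay status --socket /run/edged/rpc.sock | deadletter_count) || exit 1
#   if [ "$count" -gt 0 ]; then echo "deadletter entries present: $count"; fi
```

Reading from stdin keeps the parsing separate from the edgectl call, so the function can be checked against saved output before it is used on a live node.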
List deadletter entries
edgectl relay deadletter list [--limit 50] [--socket /run/edged/rpc.sock]
Example output:
SEGMENT-ID PEER-ID ATTEMPTS FIRST-QUEUED LAST-UPDATED
seg-0a1b2c3d4e5f6789 peer-alpha 8 2026-03-18T06:00:00Z 2026-03-18T09:30:00Z
seg-1a2b3c4d5e6f7890 peer-beta 5 2026-03-18T07:15:00Z 2026-03-18T10:00:00Z
...
5 deadletter entries (showing 5 of 5)
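To feed the list into further commands, the segment/peer pairs can be extracted from the tabular output. This is a sketch under two assumptions: the column order matches the example above, and segment IDs start with `seg-` as they do in the example. The function name `deadletter_pairs` is hypothetical.

```shell
# Emit "<segment-id> <peer-id>" pairs from `edgectl relay deadletter list`
# output read on stdin.
# Assumption: data rows begin with a "seg-" prefixed ID, as in the example.
# The header row, "..." line, and summary line do not match and are skipped.
deadletter_pairs() {
  awk '$1 ~ /^seg-/ && NF >= 2 { print $1, $2 }'
}

# Possible wiring: inspect every listed entry in turn.
#   edgectl relay deadletter list --limit 50 | deadletter_pairs |
#   while read -r seg peer; do
#     edgectl relay deadletter inspect "$seg" "$peer"
#   done
```

Note that the list is capped by `--limit`, so the pairs cover only the entries actually shown.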
2. Inspect a specific deadletter entry
edgectl relay deadletter inspect seg-0a1b2c3d4e5f6789 peer-alpha \
[--socket /run/edged/rpc.sock]
Example output:
Deadletter Entry
Segment ID: seg-0a1b2c3d4e5f6789
Peer ID: peer-alpha
State: deadletter
Attempt Count: 8
First Queued: 2026-03-18T06:00:00Z
Last Updated: 2026-03-18T09:30:00Z
Attempt History
#1 started: 2026-03-18T06:00:05Z completed: 2026-03-18T06:00:08Z outcome: FAILED error: connection refused
#2 started: 2026-03-18T06:01:20Z completed: 2026-03-18T06:01:25Z outcome: FAILED error: connection refused
#3 started: (none — ABANDONED) completed: 2026-03-18T07:15:00Z outcome: ABANDONED
#4 started: 2026-03-18T07:15:30Z completed: 2026-03-18T07:15:31Z outcome: FAILED error: dial timeout
#5 started: 2026-03-18T07:45:00Z completed: 2026-03-18T07:45:01Z outcome: SEGMENT_MISSING error: segment not in local store
...
#8 started: 2026-03-18T09:30:00Z completed: 2026-03-18T09:30:01Z outcome: FAILED error: peer unreachable
Key outcomes to look for:
| Outcome | Cause | Action |
|---|---|---|
| FAILED | Peer unreachable, connection refused | Verify peer connectivity; if peer is restored, retry |
| SEGMENT_MISSING | Segment evicted from local storage | Do not retry; purge instead and re-deploy if needed |
| ABANDONED | edged process crashed mid-attempt | Normal on restart; retry is safe |
3. Decision tree
What is the most recent failure outcome?
│
├─ SEGMENT_MISSING → The segment is gone from local storage.
│ Retrying will immediately fail again with SEGMENT_MISSING.
│ → Purge this entry (§5). Re-deploy the artifact if needed.
│
├─ FAILED / ABANDONED → Network or peer connectivity issue (FAILED), or edged restarted mid-attempt (ABANDONED).
│ Is the peer now reachable?
│ ├─ YES → Retry (§4). The re-queued entry will attempt delivery normally.
│ └─ NO → Do not retry yet. Fix the peer or network issue first.
│ If the entry is stale and the data is no longer needed → Purge (§5).
│
└─ Error text such as "connection refused" or "dial timeout" → These are FAILED errors; follow the FAILED branch above.
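The first branch of the tree can be sketched as a small script step. This is an illustration only: it assumes the attempt history is printed in chronological order with an `outcome: <NAME>` token on each attempt line, as in the example inspect output, and the function names `last_outcome` and `suggest_action` are hypothetical. It deliberately stops short of checking peer reachability, which remains an operator decision.

```shell
# Print the outcome of the last attempt line found in
# `edgectl relay deadletter inspect` output read on stdin.
# Assumption: attempt lines contain "outcome: <NAME>" and appear oldest first.
last_outcome() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "outcome:") last = $(i + 1) }
       END { if (last == "") exit 1; print last }'
}

# Map an outcome name to the action suggested by the decision tree.
suggest_action() {
  case "$1" in
    SEGMENT_MISSING)  echo "purge" ;;
    FAILED|ABANDONED) echo "retry-if-peer-reachable" ;;
    *)                echo "inspect-manually" ;;
  esac
}

# Possible wiring:
#   outcome=$(edgectl relay deadletter inspect "$seg" "$peer" | last_outcome) &&
#   suggest_action "$outcome"
```

Unknown outcome names fall through to `inspect-manually` rather than to a retry or purge, so the sketch never recommends a destructive action for output it does not recognize.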
4. Retry a deadletter entry
retry transitions StateDeadletter → StateScheduled. The AttemptCount is preserved
(the retry budget is not reset). Delivery will be attempted again with the normal backoff
schedule.
edgectl relay deadletter retry seg-0a1b2c3d4e5f6789 peer-alpha \
[--socket /run/edged/rpc.sock]
Expected output:
Retried: seg-0a1b2c3d4e5f6789 / peer-alpha → scheduled for re-delivery
Monitor the result:
# Watch the deadletter count drop (or the entry move to Acked)
watch -n 5 'edgectl relay status --socket /run/edged/rpc.sock 2>&1 | grep -E "Deadletter|Acked"'
5. Purge deadletter entries
purge permanently removes the outbound ledger record, its attempt history, and its
segment index entry. The original segment data may still exist in local storage; purge
only removes the delivery tracking record.
--force is required to execute a purge. Without it, purge runs as a dry-run
and shows a count of entries that would be removed.
Purge a single entry
edgectl relay deadletter purge \
--segment-id seg-0a1b2c3d4e5f6789 \
--peer-id peer-alpha \
--force \
[--socket /run/edged/rpc.sock]
Purge all entries for a segment (all peers)
edgectl relay deadletter purge \
--segment-id seg-0a1b2c3d4e5f6789 \
--force \
[--socket /run/edged/rpc.sock]
Purge entries older than N seconds
# Dry-run first to see what would be removed
edgectl relay deadletter purge --older-than 86400 [--socket /run/edged/rpc.sock]
# Execute
edgectl relay deadletter purge --older-than 86400 --force [--socket /run/edged/rpc.sock]
Expected output:
Purged: 3 deadletter entries removed
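Because `--older-than` takes a value in seconds, a small helper can make age thresholds easier to read in scripts. This is a minimal sketch; the function name `days_to_seconds` is hypothetical and not part of edgectl.

```shell
# Convert whole days to the seconds value expected by --older-than.
# Rejects empty or non-numeric input instead of passing it to arithmetic.
days_to_seconds() {
  case "$1" in
    ''|*[!0-9]*)
      echo "days must be a non-negative integer" >&2
      return 1
      ;;
  esac
  echo $(( $1 * 86400 ))
}

# Possible wiring (dry-run first, then add --force once the count looks right):
#   edgectl relay deadletter purge --older-than "$(days_to_seconds 7)"
```

For example, `days_to_seconds 1` prints `86400`, the value used in the commands above.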
6. Post-action verification
edgectl relay status [--socket /run/edged/rpc.sock]
# Deadletter count should reflect the retried/purged entries
edgectl relay deadletter list [--socket /run/edged/rpc.sock]
# Remaining entries should only be those not yet actioned
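The list check can also be scripted for a single entry. The sketch below assumes the column order from the example list output (segment ID first, peer ID second); the function name `entry_cleared` is hypothetical. Keep in mind that the list is capped by `--limit`, so absence from a truncated list is not proof that the entry is gone.

```shell
# Succeed (exit 0) if the given segment/peer pair does NOT appear in
# `edgectl relay deadletter list` output read on stdin; fail (exit 1) if it does.
# Assumption: segment ID is column 1 and peer ID is column 2.
entry_cleared() {
  ! awk -v seg="$1" -v peer="$2" \
      '$1 == seg && $2 == peer { found = 1 } END { exit !found }'
}

# Possible wiring:
#   if edgectl relay deadletter list | entry_cleared seg-0a1b2c3d4e5f6789 peer-alpha; then
#     echo "entry no longer in deadletter"
#   fi
```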
Known gaps
Retry does not reset AttemptCount: After retry, the entry resumes from its current AttemptCount. If the peer required many attempts to fail, the entry may reach MaxRetries again quickly. A --reset-retries flag is a follow-on item.
No bulk retry by filter: Retrying all deadletter entries for a specific peer requires individual retry calls per segment. Bulk retry by peer filter is a follow-on item.
No automatic retention: Deadletter entries are not automatically purged after a configurable age. A retention policy with auto-purge is a follow-on item.
Audit trail: Deadletter retry and purge events are logged via slog only. Full audit event persistence to the audit store is a follow-on item.
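Until bulk retry by peer filter exists, one workaround is to generate the individual retry calls from the list output. The sketch below only prints the commands so they can be reviewed before anything is executed. It assumes the list column order from the example output and segment IDs starting with `seg-`; the function name `retry_commands_for_peer` is hypothetical.

```shell
# Print one `edgectl relay deadletter retry` command per deadletter entry
# belonging to the given peer. Reads `edgectl relay deadletter list` output
# on stdin and executes nothing itself.
# Assumption: column 1 is a "seg-" prefixed segment ID, column 2 is the peer ID.
retry_commands_for_peer() {
  awk -v peer="$1" '$1 ~ /^seg-/ && $2 == peer {
    printf "edgectl relay deadletter retry %s %s\n", $1, $2
  }'
}

# Possible wiring: review the generated commands, then run them.
#   edgectl relay deadletter list --limit 50 | retry_commands_for_peer peer-alpha
#   edgectl relay deadletter list --limit 50 | retry_commands_for_peer peer-alpha | sh
```

Each generated call behaves like a manual retry, so AttemptCount is still preserved for every entry, and entries whose last outcome is SEGMENT_MISSING should be purged rather than included.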