VAL22 — Deadletter Workflow Validation¶
Audience: engineers and reviewers who want a reproducible local lab proving the
complete relay deadletter operator workflow against a live edged daemon,
including delivery outcome verification and failure injection.
1. Scope¶
VAL22 validates five operational goals:
List and inspect accuracy —
relay deadletter listandinspectoutput is correctly formatted and matches the seeded ledger state.Retry delivery success —
relay deadletter retrycauses the relay executor to actually deliver the segment to the peer (not just move it back to the queue), confirmed bypeer-received.json.Retry delivery failure and re-deadletter — when the peer is unreachable (outage proxy), the retried segment exhausts its retry budget and returns to DEADLETTER state.
Retention — DEADLETTER entries persist across an edged process restart (BoltDB durability).
Purge —
relay deadletter purge --forceremoves entries from the queue and emits the correct audit event.
Branch rule: coverage by existing runner¶
The existing run_edge_deadletter_lab.sh covers relay deadletter command
execution and evidence capture, but has two gaps that VAL22 addresses:
Gap |
Existing lab |
VAL22 |
|---|---|---|
Retry outcome verified (delivery to peer) |
No — no live peer in default mode; retry always fails |
Yes — clean proxy + live peer confirms delivery |
Retry success rate as a measured metric |
No |
Yes — 4/8 = 50% with controlled group split |
Retention across restart |
No |
Yes — Phase 4 |
New standalone runner is justified because:
The existing lab hardcodes
peer-aat a refusing address: retry always fails, making it impossible to verify successful deliveryAdding a live peer requires modifying shared setup infrastructure used by other PR evidence runs
Port isolation (19050–19053) prevents interference
Out of scope¶
Bandwidth quota retry interaction (covered by PR-16)
Bulk
--allpurge (not implemented as of VAL22)Multi-peer deadletter scenarios
Deadletter list pagination beyond
--limitproof
2. Architecture¶
edged (relay executor)
→ relay_impairment_proxy:19051 ← clean (Phases 1–2, 4–5) or outage (Phase 3)
→ edge_deadletter_lab_peer:19052 ← mTLS receive + JSON evidence
Port assignments (isolated from VAL19–VAL21)¶
Component |
Address |
|---|---|
edged |
|
proxy (edged→) |
|
peer server |
|
proxy ctrl API |
|
3. Fixture Design¶
The setup binary (edge_relay_deadletter_val22_setup.go) seeds 8 segments into
DEADLETTER state using the same Schedule → TryTransitionInflight → RecordAttempt → TransitionFailed → ForceDeadletter seeding API used in the
VAL19 impairment lab.
Each segment has one prior failed attempt with error_detail = "seed: peer unreachable".
Group |
Segment IDs |
Count |
Behaviour at retry |
|---|---|---|---|
R |
val22-r-001..004 |
4 |
Clean proxy → delivery succeeds |
U |
val22-u-001..004 |
4 |
Outage proxy → fails → re-deadletter |
All segments are 64 B and target peer-val22. The proxy mode at retry time
controls the outcome, not the segment or peer identity.
Configuration for fast re-deadletter¶
[retry]
max_retry_count = 3
backoff_base_seconds = 1
window_seconds = 10
Group U segments have 1 prior attempt. After relay deadletter retry:
Attempt 2: outage proxy → FAIL
Attempt 3: outage proxy → FAIL (max 3 reached → DEADLETTER)
Total time to re-deadletter: ~10 s. The runner waits 30 s as a safety margin.
4. Workflow Matrix¶
Phase |
Proxy mode |
Operation |
Expected outcome |
|---|---|---|---|
1 |
clean |
|
8 entries in list, inspect shows 1 attempt |
2 |
clean |
|
4 delivered to peer; audit has 4× retried events |
3 |
outage |
|
4 re-deadletter after max retries |
4 |
clean |
edged restart |
4 Group U entries still in deadletter list |
5 |
clean |
|
list empty; audit has 4× purged events |
5. Failure Injection Plan¶
Injection |
Mechanism |
Phase |
|---|---|---|
Initial failures (1 attempt per segment) |
|
Before Phase 1 |
Peer unreachable for Group U |
|
Phase 3 |
edged process kill and restart |
|
Phase 4 |
No kernel-level tools (tc, iptables) required.
6. 10-Check Matrix¶
ID |
Phase |
Description |
Pass criterion |
|---|---|---|---|
VAL22-01 |
1 |
Baseline list: all 8 seeded entries visible |
|
VAL22-02 |
1 |
List |
exactly 4 entries returned in output |
VAL22-03 |
1 |
Inspect: correct state, attempt history, outcome |
output contains |
VAL22-04 |
2 |
Group R retry delivery |
|
VAL22-05 |
2 |
Group R audit events |
|
VAL22-06 |
2 |
List after Group R: only Group U remains |
deadletter list contains exactly the 4 |
VAL22-07 |
3 |
Group U re-deadletter after outage retry |
deadletter list still contains exactly the 4 |
VAL22-08 |
2/3 |
Retry success rate |
measured from exact IDs: aggregate |
VAL22-09 |
4 |
Retention: entries survive edged restart |
restart deadletter list preserves exactly the 4 |
VAL22-10 |
5 |
Purge: all Group U removed + audit event |
final |
Operator-visible output checks¶
Evidence file |
What to verify |
|---|---|
|
8 lines starting with |
|
4 |
|
|
|
|
|
count=4, segments array contains exactly the 4 |
|
4 lines containing |
|
combined retry audit extract from the full session log |
|
4 |
|
same exact 4 |
|
|
|
|
|
≥ 4 lines containing |
7. Metrics and Audit Checks¶
Retry success rate¶
The runner computes and records in val22-summary.json:
{
"retry_success_rate": 0.5,
"group_r_retry_rate": 1.0,
"group_u_retry_rate": 0.0,
"group_u_redeadlettered": 4
}
The 50% aggregate rate is a documented workflow baseline, not a failure. The runner computes it from exact observed segment IDs:
Group R success means the peer received all 4
val22-r-*IDsGroup U success means a
val22-u-*ID reached the peerGroup U re-deadletter means the same
val22-u-*ID remains in the deadletter list after the outage retry window
The meaningful metric is Group R rate = 100%: every retried segment with a reachable peer must deliver end-to-end.
Audit events verified¶
Event |
When emitted |
Verified in |
|---|---|---|
|
each |
|
|
each |
|
Audit evidence is extracted from edged.log by grepping for the event string.
8. Run the Lab¶
export GOROOT=/home/ubuntu/.local/go1.25.7
export PATH="$GOROOT/bin:$PATH"
export GOTOOLCHAIN=local
bash scripts/labs/run_relay_deadletter_val22_lab.sh
Optional custom evidence directory:
bash scripts/labs/run_relay_deadletter_val22_lab.sh \
"$PWD/evidence/val22-relay-deadletter-local-$(date +%F)"
Expected runtime: 3–5 minutes (dominated by Phase 3 Group U retry exhaustion wait of 30 s).
9. Final Report Format¶
VAL22 — Deadletter Workflow Validation
Date: <YYYY-MM-DD>
Environment: <OS, Go version>
Evidence dir: <path>
Workflow summary:
Initial deadletter entries: 8 (4 Group R + 4 Group U)
Group R delivered on retry: 4 / 4 (clean proxy)
Group U re-deadlettered: 4 / 4 (outage proxy)
Retry success rate: 4 / 8 = 50% (Group R: 100%, Group U: 0%)
Entries after restart: 4 exact Group U IDs retained
Entries after purge: 0
10-check matrix:
VAL22-01 PASS/FAIL initial list: <count> entries
VAL22-02 PASS/FAIL list --limit 4: <count> entries
VAL22-03 PASS/FAIL inspect: <state/attempt summary>
VAL22-04 PASS/FAIL Group R delivery: <N>/4
VAL22-05 PASS/FAIL retry audit: <N> events
VAL22-06 PASS/FAIL list after Group R: <count>
VAL22-07 PASS/FAIL Group U re-deadletter: <count>
VAL22-08 PASS/FAIL retry success rate: <rate>
VAL22-09 PASS/FAIL retention after restart: <count>
VAL22-10 PASS/FAIL purge + audit: <final_count> / <audit_events>
Overall: PASS=<N> FAIL=<N>
10. Tooling¶
File |
Role |
|---|---|
|
Lab runner (entry point) |
|
Setup binary — seeds 8 DEADLETTER fixtures |
|
Reused from VAL19 — clean/outage proxy modes |
|
Reused from PR-14 lab — mTLS peer server |
|
Reused — final ledger state dump |