VAL22 — Deadletter Workflow Validation

Audience: engineers and reviewers who want a reproducible local lab proving the complete relay deadletter operator workflow against a live edged daemon, including delivery outcome verification and failure injection.

1. Scope

VAL22 validates five operational goals:

  1. List and inspect accuracy: the output of relay deadletter list and relay deadletter inspect is correctly formatted and matches the seeded ledger state.

  2. Retry delivery success: relay deadletter retry causes the relay executor to actually deliver the segment to the peer (not just move it back to the queue), confirmed by peer-received.json.

  3. Retry delivery failure and re-deadletter: when the peer is unreachable (outage proxy), the retried segment exhausts its retry budget and returns to DEADLETTER state.

  4. Retention: DEADLETTER entries persist across an edged process restart (BoltDB durability).

  5. Purge: relay deadletter purge --force removes entries from the queue and emits the correct audit event.

Branch rule: coverage by existing runner

The existing run_edge_deadletter_lab.sh covers relay deadletter command execution and evidence capture, but has three gaps that VAL22 addresses:

| Gap | Existing lab | VAL22 |
|---|---|---|
| Retry outcome verified (delivery to peer) | No: no live peer in default mode, so retry always fails | Yes: clean proxy + live peer confirms delivery |
| Retry success rate as a measured metric | No | Yes: 4/8 = 50% with controlled group split |
| Retention across restart | No | Yes: Phase 4 |

A new standalone runner is justified because:

  • The existing lab hardcodes peer-a at a refusing address: retry always fails, making it impossible to verify successful delivery

  • Adding a live peer requires modifying shared setup infrastructure used by other PR evidence runs

  • Port isolation (19050–19053) prevents interference with the VAL19–VAL21 labs

Out of scope

  • Bandwidth quota retry interaction (covered by PR-16)

  • Bulk --all purge (not implemented as of VAL22)

  • Multi-peer deadletter scenarios

  • Deadletter list pagination beyond --limit proof

2. Architecture

edged (relay executor)
  → relay_impairment_proxy:19051   ← clean (Phases 1–2, 4–5) or outage (Phase 3)
      → edge_deadletter_lab_peer:19052  ← mTLS receive + JSON evidence

Port assignments (isolated from VAL19–VAL21)

| Component | Address |
|---|---|
| edged | 127.0.0.1:19050 |
| proxy (edged→) | 127.0.0.1:19051 |
| peer server | 127.0.0.1:19052 |
| proxy ctrl API | 127.0.0.1:19053 |

3. Fixture Design

The setup binary (edge_relay_deadletter_val22_setup.go) seeds 8 segments into DEADLETTER state using the same seeding API call sequence (Schedule → TryTransitionInflight → RecordAttempt → TransitionFailed → ForceDeadletter) used in the VAL19 impairment lab.

Each segment has one prior failed attempt with error_detail = "seed: peer unreachable".

| Group | Segment IDs | Count | Behaviour at retry |
|---|---|---|---|
| R | val22-r-001..004 | 4 | Clean proxy → delivery succeeds |
| U | val22-u-001..004 | 4 | Outage proxy → fails → re-deadletter |

All segments are 64 B and target peer-val22. The proxy mode at retry time controls the outcome, not the segment or peer identity.
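The two fixture groups differ only in the letter embedded in the segment ID, so the ID lists can be generated rather than typed out. A minimal sketch, assuming a hypothetical helper named val22_ids (not part of the runner shown here):

```shell
# Print the four fixture segment IDs for one group ("r" or "u"), one per line.
# val22_ids is a hypothetical helper; the ID pattern comes from the fixture table.
val22_ids() {
    group=$1
    for n in 001 002 003 004; do
        printf 'val22-%s-%s\n' "$group" "$n"
    done
}

val22_ids r   # Group R: retried through the clean proxy
val22_ids u   # Group U: retried through the outage proxy
```

Keeping the group letter as the only parameter mirrors the fixture design: peer and size are identical for all 8 segments.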

Configuration for fast re-deadletter

[retry]
max_retry_count = 3
backoff_base_seconds = 1
window_seconds = 10

Group U segments have 1 prior attempt. After relay deadletter retry:

  • Attempt 2: outage proxy → FAIL

  • Attempt 3: outage proxy → FAIL (max 3 reached → DEADLETTER)

Total time to re-deadletter: ~10 s. The runner waits 30 s as a safety margin.
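The attempt arithmetic above can be written out as a small sketch. It assumes, as the attempt list implies, that the seeded failed attempt counts against max_retry_count; remaining_attempts is a hypothetical helper name, and the backoff schedule itself is not modelled.

```shell
# Values from the [retry] fixture configuration.
max_retry_count=3
prior_attempts=1    # each seeded segment already has one failed attempt

# Assumption: the seeded attempt counts against the budget, so a retried
# Group U segment gets (max_retry_count - prior_attempts) further attempts.
remaining_attempts() {
    echo $(( $1 - $2 ))
}

left=$(remaining_attempts "$max_retry_count" "$prior_attempts")
echo "attempts before re-deadletter: $left"

# Headroom between the runner's 30 s wait and the ~10 s estimate.
echo "wait headroom: $(( 30 - 10 )) s"
```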

4. Workflow Matrix

| Phase | Proxy mode | Operation | Expected outcome |
|---|---|---|---|
| 1 | clean | list, list --limit 4, inspect val22-r-001 peer-val22 | 8 entries in list, inspect shows 1 attempt |
| 2 | clean | retry val22-r-{001..004} peer-val22 | 4 delivered to peer; audit has 4× retried events |
| 3 | outage | retry val22-u-{001..004} peer-val22 | 4 re-deadletter after max retries |
| 4 | clean | edged restart | 4 Group U entries still in deadletter list |
| 5 | clean | purge --force val22-u-{001..004} peer-val22 | list empty; audit has 4× purged events |
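The per-segment operations in Phases 2, 3, and 5 expand to four commands each. The sketch below only prints that command plan. It assumes the subcommands are invoked as relay deadletter with the arguments from the matrix. plan_phase is a hypothetical helper, and how the runner actually invokes the CLI is not shown in this document.

```shell
# Print the operator commands for one phase without executing them.
# plan_phase is a hypothetical helper; subcommands and arguments are
# taken from the workflow matrix.
plan_phase() {
    op=$1       # "retry" or "purge --force"
    group=$2    # "r" or "u"
    for n in 001 002 003 004; do
        echo "relay deadletter $op val22-$group-$n peer-val22"
    done
}

plan_phase retry r              # Phase 2, clean proxy
plan_phase retry u              # Phase 3, outage proxy
plan_phase "purge --force" u    # Phase 5, clean proxy
```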

5. Failure Injection Plan

| Injection | Mechanism | Phase |
|---|---|---|
| Initial failures (1 attempt per segment) | ForceDeadletter via setup binary (deterministic seeding) | Before Phase 1 |
| Peer unreachable for Group U | PUT /mode {"type":"outage"} to proxy control API | Phase 3 |
| edged process kill and restart | kill $EDGED_PID + fresh launch | Phase 4 |

No kernel-level tools (tc, iptables) required.
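The proxy mode switch can be sketched with curl. The PUT /mode request, the outage body, and the control address 127.0.0.1:19053 come from this document. The http:// scheme, the Content-Type header, and using {"type":"clean"} to restore clean mode are assumptions. The helper names are hypothetical, and DRY_RUN=1 prints the command instead of sending it.

```shell
# Control API address from the port table; the http:// scheme is an assumption.
PROXY_CTRL="${PROXY_CTRL:-http://127.0.0.1:19053}"

# Build the JSON body for a mode change. "outage" is documented above;
# {"type":"clean"} for restoring clean mode is an assumption.
mode_body() {
    printf '{"type":"%s"}' "$1"
}

# Switch the proxy mode. With DRY_RUN=1 the curl command is printed only.
set_proxy_mode() {
    body=$(mode_body "$1")
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "curl -fsS -X PUT -H 'Content-Type: application/json' -d '$body' $PROXY_CTRL/mode"
    else
        curl -fsS -X PUT -H 'Content-Type: application/json' -d "$body" "$PROXY_CTRL/mode"
    fi
}

DRY_RUN=1
set_proxy_mode outage    # before Phase 3
set_proxy_mode clean     # before Phase 4
```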

6. 10-Check Matrix

| ID | Phase | Description | Pass criterion |
|---|---|---|---|
| VAL22-01 | 1 | Baseline list: all 8 seeded entries visible | deadletter_count = 8 |
| VAL22-02 | 1 | List --limit 4 truncation | exactly 4 entries returned in output |
| VAL22-03 | 1 | Inspect: correct state, attempt history, outcome | output contains val22-r-001, deadletter, attempt history, FAILED |
| VAL22-04 | 2 | Group R retry delivery | peer-received.json contains exactly the 4 val22-r-* IDs once each |
| VAL22-05 | 2 | Group R audit events | audit-retry-r.log contains exactly the 4 Group R segment IDs |
| VAL22-06 | 2 | List after Group R: only Group U remains | deadletter list contains exactly the 4 val22-u-* IDs |
| VAL22-07 | 3 | Group U re-deadletter after outage retry | deadletter list still contains exactly the 4 val22-u-* IDs after wait |
| VAL22-08 | 2/3 | Retry success rate | measured from exact IDs: aggregate 4/8 = 50%, Group R 4/4, Group U 0/4 |
| VAL22-09 | 4 | Retention: entries survive edged restart | restart deadletter list preserves exactly the 4 val22-u-* IDs |
| VAL22-10 | 5 | Purge: all Group U removed + audit event | final deadletter_count = 0, purge audit contains exactly the 4 Group U IDs |
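Several pass criteria require exactly the four IDs of one group, and VAL22-04 additionally requires each ID once. One way to check both is sketched below. The helper names are hypothetical, and the IDs are pulled out by pattern so the check does not depend on the exact list or JSON layout, which this document does not specify.

```shell
# Succeed only if FILE mentions exactly val22-<group>-001..004 and no
# other val22-* segment ID. Hypothetical helper, format-agnostic.
has_exact_group() {
    file=$1
    group=$2
    expected=$(for n in 001 002 003 004; do echo "val22-$group-$n"; done)
    actual=$(grep -oE 'val22-[a-z]-[0-9]{3}' "$file" | sort -u)
    [ "$actual" = "$expected" ]
}

# Succeed only if no segment ID appears more than once in FILE.
has_no_duplicates() {
    dups=$(grep -oE 'val22-[a-z]-[0-9]{3}' "$1" | sort | uniq -d)
    [ -z "$dups" ]
}

# Illustrative input only; the real list output has more fields.
tmp=$(mktemp)
printf 'segment=val22-u-00%s\n' 1 2 3 4 > "$tmp"
has_exact_group "$tmp" u && has_no_duplicates "$tmp" && echo "exact Group U set: ok"
rm -f "$tmp"
```

The duplicate check assumes each ID appears once per entry in the file being checked, which may not hold for every evidence file.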

Operator-visible output checks

| Evidence file | What to verify |
|---|---|
| deadletter-list-initial.txt | 8 lines starting with segment=; footer says showing 8 |
| deadletter-list-limit-4.txt | 4 segment= lines; footer says showing 4 of 8 |
| deadletter-inspect-r001.txt | Segment: val22-r-001, State: deadletter, Attempts: 1, Attempt History: section with one FAILED line |
| retry-r/val22-r-001-retry.txt | retried=true for each Group R segment |
| peer-received.json | count=4, segments array contains exactly the 4 val22-r-* IDs with no duplicates |
| audit-retry-r.log | 4 lines containing relay.deadletter.retried, one for each val22-r-* segment |
| audit-retry-all.log | combined retry audit extract from the full session log |
| deadletter-list-after-retry-u.txt | 4 segment= lines, all and only val22-u-* |
| deadletter-list-after-restart.txt | same exact 4 val22-u-* lines as before restart |
| purge-u/val22-u-001-purge.txt | purged=true for each Group U segment |
| deadletter-list-after-purge.txt | showing 0 deadletter entries (empty) |
| audit-purge-u.log | ≥ 4 lines containing relay.deadletter.purged |
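The list checks above reduce to counting segment= lines and matching the footer text. A minimal sketch with hypothetical helper names follows. The footer strings are the ones quoted in the table, and the sample file content is illustrative only.

```shell
# Count lines that start with "segment=" in a captured list output.
# "|| true" keeps the helper usable under "set -e" when the count is 0.
count_segments() {
    grep -c '^segment=' "$1" || true
}

# Check both the entry count and the footer text of one list capture.
check_list() {
    file=$1
    want=$2
    footer=$3
    [ "$(count_segments "$file")" -eq "$want" ] && grep -qF "$footer" "$file"
}

# Illustrative capture; real output lines carry more fields.
tmp=$(mktemp)
{ printf 'segment=val22-r-00%s\n' 1 2 3 4; echo 'showing 4 of 8'; } > "$tmp"
check_list "$tmp" 4 'showing 4 of 8' && echo "limit-4 list: ok"
rm -f "$tmp"
```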

7. Metrics and Audit Checks

Retry success rate

The runner computes the following values and records them in val22-summary.json:

{
  "retry_success_rate": 0.5,
  "group_r_retry_rate": 1.0,
  "group_u_retry_rate": 0.0,
  "group_u_redeadlettered": 4
}

The 50% aggregate rate is a documented workflow baseline, not a failure. The runner computes it from exact observed segment IDs:

  • Group R success means the peer received all 4 val22-r-* IDs

  • Group U success means a val22-u-* ID reached the peer

  • Group U re-deadletter means the same val22-u-* ID remains in the deadletter list after the outage retry window

The meaningful metric is Group R rate = 100%: every retried segment with a reachable peer must deliver end-to-end.
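The rate calculation can be sketched in shell with awk for the division. The counts below are the expected values from this document, entered by hand for illustration. In a real run they would come from the observed segment IDs. The rate helper is hypothetical.

```shell
# Format delivered/retried as a one-decimal fraction (sh has no floats).
rate() {
    awk -v d="$1" -v t="$2" 'BEGIN { printf "%.1f", (t > 0 ? d / t : 0) }'
}

# Expected counts, for illustration only.
r_delivered=4; r_total=4    # Group R IDs seen in peer-received.json
u_delivered=0; u_total=4    # Group U IDs seen in peer-received.json
u_redeadlettered=4          # Group U IDs still listed after the wait

cat <<EOF
{
  "retry_success_rate": $(rate $((r_delivered + u_delivered)) $((r_total + u_total))),
  "group_r_retry_rate": $(rate "$r_delivered" "$r_total"),
  "group_u_retry_rate": $(rate "$u_delivered" "$u_total"),
  "group_u_redeadlettered": $u_redeadlettered
}
EOF
```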

Audit events verified

| Event | When emitted | Verified in |
|---|---|---|
| relay.deadletter.retried | each relay deadletter retry command | audit-retry-r.log proves exact Group R retry coverage; audit-retry-all.log keeps the full session extract |
| relay.deadletter.purged | each relay deadletter purge --force | audit-purge-u.log proves exact Group U purge coverage |

Audit evidence is extracted from edged.log by grepping for the event string.
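The grep-based extraction might look like the sketch below. It assumes each audit line carries both the event string and the segment ID. The real edged.log line layout is not shown in this document, so the sample lines and the extract_audit helper are hypothetical.

```shell
# Extract audit lines for one event, limited to one fixture group.
# Assumption: event string and segment ID appear on the same log line.
extract_audit() {
    log=$1
    event=$2
    group=$3
    grep -F "$event" "$log" | grep -E "val22-$group-[0-9]{3}"
}

# Hypothetical log lines, for illustration only.
tmp=$(mktemp -d)
cat > "$tmp/edged.log" <<'EOF'
event=relay.deadletter.retried segment=val22-r-001
event=relay.deadletter.retried segment=val22-u-001
event=relay.deadletter.purged segment=val22-u-001
EOF

extract_audit "$tmp/edged.log" relay.deadletter.retried r > "$tmp/audit-retry-r.log"
extract_audit "$tmp/edged.log" relay.deadletter.purged u > "$tmp/audit-purge-u.log"
cat "$tmp/audit-retry-r.log" "$tmp/audit-purge-u.log"
rm -rf "$tmp"
```

Using grep -F for the event keeps the dots in the event name literal rather than treating them as regex wildcards.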

8. Run the Lab

export GOROOT=/home/ubuntu/.local/go1.25.7
export PATH="$GOROOT/bin:$PATH"
export GOTOOLCHAIN=local

bash scripts/labs/run_relay_deadletter_val22_lab.sh

Optional custom evidence directory:

bash scripts/labs/run_relay_deadletter_val22_lab.sh \
  "$PWD/evidence/val22-relay-deadletter-local-$(date +%F)"

Expected runtime: 3–5 minutes (dominated by Phase 3 Group U retry exhaustion wait of 30 s).

9. Final Report Format

VAL22 — Deadletter Workflow Validation
Date:           <YYYY-MM-DD>
Environment:    <OS, Go version>
Evidence dir:   <path>

Workflow summary:
  Initial deadletter entries:  8 (4 Group R + 4 Group U)
  Group R delivered on retry:  4 / 4  (clean proxy)
  Group U re-deadlettered:     4 / 4  (outage proxy)
  Retry success rate:          4 / 8 = 50%  (Group R: 100%, Group U: 0%)
  Entries after restart:       4 exact Group U IDs retained
  Entries after purge:         0

10-check matrix:
  VAL22-01 PASS/FAIL  initial list: <count> entries
  VAL22-02 PASS/FAIL  list --limit 4: <count> entries
  VAL22-03 PASS/FAIL  inspect: <state/attempt summary>
  VAL22-04 PASS/FAIL  Group R delivery: <N>/4
  VAL22-05 PASS/FAIL  retry audit: <N> events
  VAL22-06 PASS/FAIL  list after Group R: <count>
  VAL22-07 PASS/FAIL  Group U re-deadletter: <count>
  VAL22-08 PASS/FAIL  retry success rate: <rate>
  VAL22-09 PASS/FAIL  retention after restart: <count>
  VAL22-10 PASS/FAIL  purge + audit: <final_count> / <audit_events>

Overall: PASS=<N> FAIL=<N>
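The check lines and the overall tally in this format could be produced by a small helper like the sketch below. report_check is a hypothetical name, and the two sample calls use illustrative values rather than recorded results.

```shell
pass=0
fail=0

# Record one check result and print it in the report's line format.
# Call it directly (not inside $(...)) so the counters are updated.
report_check() {
    id=$1
    status=$2
    detail=$3
    case $status in
        PASS) pass=$((pass + 1)) ;;
        *)    fail=$((fail + 1)) ;;
    esac
    printf '  %s %s  %s\n' "$id" "$status" "$detail"
}

report_check VAL22-01 PASS "initial list: 8 entries"
report_check VAL22-02 PASS "list --limit 4: 4 entries"
echo
echo "Overall: PASS=$pass FAIL=$fail"
```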

10. Tooling

| File | Role |
|---|---|
| scripts/labs/run_relay_deadletter_val22_lab.sh | Lab runner (entry point) |
| scripts/labs/edge_relay_deadletter_val22_setup.go | Setup binary: seeds 8 DEADLETTER fixtures |
| scripts/labs/relay_impairment_proxy.go | Reused from VAL19: clean/outage proxy modes |
| scripts/labs/edge_deadletter_lab_peer.go | Reused from PR-14 lab: mTLS peer server |
| scripts/labs/edge_deadletter_lab_dump.go | Reused: final ledger state dump |