VAL21 — Relay Queue Depth and Overflow Validation

Audience: engineers and reviewers who want a reproducible local lab proving relay queue behavior under depth pressure, local store overflow (LRU eviction), and relay status accuracy.

1. Scope and Goals

VAL21 validates three operational goals:

  1. Deep queue correctness — relay executor delivers 200 segments without loss, corruption, or scheduling regression (store and ledger correctness at scale).

  2. LRU eviction and graceful degradation — when the local segment store fills and LRU eviction removes payloads before the relay executor can send them, the system fails gracefully (deadletter — no panic, no silent data loss).

  3. Relay status fidelity — edgectl relay status --output json queue_depth fields accurately reflect the live ledger state both before and after delivery.

Branch rule: coverage by existing runners

| Existing runner | Relevance | Verdict |
| --- | --- | --- |
| run_edge_deadletter_lab.sh | operator workflow (list/inspect/retry/purge) | insufficient — no deep-queue or ceiling scenario |
| run_relay_impairment_val19_lab.sh | impairment correctness | insufficient — correctness under network faults, not store overflow |
| run_relay_throughput_val20_lab.sh | throughput measurement | insufficient — no eviction, no status accuracy, max N=100 |

A new standalone runner is required. The reason is concrete: S-B requires a low disk ceiling plus an aggressive eviction threshold, a configuration that would corrupt any other scenario running concurrently in the same edged instance.

Out of scope

  • Multi-peer relay (this lab uses a single peer, peer-val21)

  • Bandwidth quota overflow (covered by PR-16 and VAL20-T5)

  • Replay after ledger corruption (durability, not overflow)

  • Disk-full OS-level failure (OS-level, not application behavior)

  • Automatic eviction of DELIVERED segments (covered by evict_on_relay)

2. Architecture

edged (relay executor)
  → relay_impairment_proxy:19045   ← always clean mode for VAL21
      → edge_deadletter_lab_peer:19046  ← mTLS receive + JSON evidence

Port assignments (isolated from VAL19 19030–19033 and VAL20 19040–19043)

| Component | Address |
| --- | --- |
| edged | 127.0.0.1:19044 |
| proxy (edged→) | 127.0.0.1:19045 |
| peer server | 127.0.0.1:19046 |
| proxy ctrl API | 127.0.0.1:19047 |

3. Scenarios

Scenario A — Deep queue drain

| Parameter | Value |
| --- | --- |
| Segment count | 200 |
| Segment size | 64 B |
| Store ceiling | 1 TB (no effective limit) |
| Proxy mode | clean |
| Relay workers | 4 |

Segments are seeded to PENDING state. The relay executor drains the full 200-segment queue over multiple scheduling rounds.

Queue depth monitoring: snapshots of peer-received.json count are written to sa/queue-depth.jsonl every 4 seconds (initial + periodic + final).

Scenario B — LRU eviction interaction

| Parameter | Value |
| --- | --- |
| Segment count | 10 |
| Segment size | 64 KB (65 536 B) |
| Store ceiling | 327 680 B (5 × 64 KB) |
| Eviction threshold | 0.50 (50% of ceiling) |
| Max retry count | 3 |
| Proxy mode | clean |

Eviction mechanism: The store ceiling (5 × 64 KB) plus aggressive eviction threshold (50%) means evictions begin after storing ~2.5 segments. As the setup binary writes all 10 segments sequentially, the LRU eviction policy discards earlier segments to make room for later ones. By the time the relay executor starts, only the most-recently-written segments remain in the store.

Expected relay behaviour:

  • Segments still in store → delivered correctly

  • Segments evicted from store → relay executor reads, gets “not found”, records failed attempt, retries up to 3 times (exhausted within ~12 s), enters DEADLETTER state

Graceful degradation validation:

  • No panic or crash in edged.log

  • All 10 segments accounted for: delivered + deadletter = 10

  • Delivered segments have correct size (no corruption)

Note: The exact delivered/deadletter split depends on LRU timing and is not asserted precisely. VAL21-05 requires deadletter_count ≥ 1 (eviction occurred) and VAL21-06 requires delivered_count ≥ 1 (some segments survived).

Scenario C — Relay status accuracy

Parameter

Value

Segment count

10

Segment size

64 B

Store ceiling

1 TB (no effective limit)

Proxy mode

clean

Two relay status captures are taken:

  1. Before (t+0.4 s): before the first scheduler tick fires (tick interval = 1 s). Expected: queue_depth.scheduled + queue_depth.inflight = 10.

  2. After delivery: all 10 segments acknowledged by the peer. Expected: queue_depth.scheduled = 0, queue_depth.acked = 10.

4. 10-Check Matrix

| ID | Scenario | Description | Pass criterion |
| --- | --- | --- | --- |
| VAL21-01 | S-A | Deep drain: N=200 × 64 B delivered | all 200 delivered, elapsed ≤ 300 s |
| VAL21-02 | S-A | Zero loss: delivered IDs match seeded IDs | sorted(delivered_ids) == sorted(seeded_ids) |
| VAL21-03 | S-A | No corruption: spot-check 3 delivered sizes | 3 sampled segments (first / mid / last) each have size_bytes = 64 |
| VAL21-04 | S-A | Queue depth non-increasing | ≥ 2 queue-depth snapshots, received[i] ≤ received[i+1] throughout |
| VAL21-05 | S-B | LRU eviction confirmed | deadletter_count ≥ 1 and deadletter inspect shows store_read_failed: ... segment not found |
| VAL21-06 | S-B | Graceful degradation: no panic, some delivery | "panic" absent from edged.log, delivered_count ≥ 1 |
| VAL21-07 | S-B | All segments accounted for | terminal relay status, and seeded IDs exactly reconcile to delivered IDs + deadletter IDs |
| VAL21-08 | S-B | No corruption in delivered set | all delivered segments have size_bytes = 65 536 |
| VAL21-09 | S-C | Status accuracy before delivery | queue_depth.scheduled + inflight = 10 at t+0.4 s |
| VAL21-10 | S-C | Status accuracy after delivery | queue_depth.scheduled = 0 AND queue_depth.acked = 10 |

Pass criterion rationale

  • VAL21-01 ≤ 300 s: 200 segments through 4 workers @ 1 s tick = ~50 rounds. 300 s is 6× safety margin.

  • VAL21-05 deadletter inspect: ties the failure to the expected missing-payload path instead of any generic relay failure.

  • VAL21-07 exact ID reconciliation: strict accounting by identity, not just count. The check also requires scheduled = inflight = failed = 0 after the S-B wait window so the evidence is terminal rather than mid-flight.

  • VAL21-09 at t+0.4 s: the scheduler tick fires at t = 1 s. Capturing at 0.4 s leaves a 600 ms margin; scheduled + inflight are both included to tolerate a slightly early tick.

5. LRU Eviction Policy

The documented drop policy for the edge relay local store is eviction.policy = "lru_priority". When the store approaches the disk ceiling (controlled by eviction_threshold_fraction), the eviction policy removes the oldest (least recently used) segments to make room for incoming writes.

Consequence for relay: If a segment’s payload is evicted between scheduling and relay, the relay executor cannot read the payload. It records a failed attempt and retries. After max_retry_count failures, the segment enters the DEADLETTER state. The operator must use relay deadletter purge to remove it from the queue.

Invariant (verified by VAL21-07): no seeded segment silently disappears. The final proof is identity-based: the exact seeded segment IDs must reconcile to the union of delivered IDs and deadletter IDs once the relay status has no remaining scheduled, inflight, or failed entries.

6. Evidence Files

Per-scenario directory ($EVIDENCE_DIR/s{a|b|c}/)

| File | Description |
| --- | --- |
| setup.log | Output of setup binary (store write errors visible here) |
| seed-manifest.json | Seeded segments with stored bool per entry |
| edge.toml | edged config (ceiling, eviction threshold, retry count) |
| edged.log | edged relay executor log |
| peer.log | Peer server log |
| peer-received.json | Cumulative delivery evidence |
| queue-depth.jsonl | (S-A only) queue depth snapshots every 4 seconds |
| relay-deadletter-list.txt | (S-B only) deadletter list after retry window |
| relay-deadletter-inspect-first.txt | (S-B only) inspection of the first deadletter entry |
| relay-status-after-deadletter.json | (S-B only) relay status after the S-B wait window |
| relay-status-before.json | (S-C only) relay status at t+0.4 s |
| relay-status-after.json | (S-C only) relay status after delivery |

Top-level evidence files

| File | Description |
| --- | --- |
| build.log | Go build output |
| proxy.log | Impairment proxy log |
| val21-summary.json | Scenario metrics + pass/fail counts |

val21-summary.json schema

{
  "val": "VAL21",
  "date": "<RFC3339>",
  "passes": 9,
  "fails": 1,
  "scenarios": {
    "sa": {"delivered": 200, "elapsed_ms": 42100},
    "sb": {"delivered": 3, "deadletter": 7, "accounted": 10},
    "sc": {"delivered": 10}
  }
}

7. Relay Status JSON Reference

queue_depth field definitions (from edge/rpcv1/types.go):

| Field | Meaning |
| --- | --- |
| scheduled | Segments in PENDING state (waiting for executor) |
| inflight | Segments currently being transmitted |
| acked | Segments successfully ACKed by peer |
| failed | Segments that failed delivery (in retry) |
| deadletter | Segments that exhausted retry budget |
| total | Sum of all states |

VAL21 asserts on scheduled (before) and scheduled + acked (after).

8. Run the Lab

export GOROOT=/home/ubuntu/.local/go1.25.7
export PATH="$GOROOT/bin:$PATH"
export GOTOOLCHAIN=local

bash scripts/labs/run_relay_overflow_val21_lab.sh

Optional custom evidence directory:

bash scripts/labs/run_relay_overflow_val21_lab.sh \
  "$PWD/evidence/val21-relay-overflow-local-$(date +%F)"

Expected runtime: 5–8 minutes (S-A 200-segment drain dominates; S-B waits 30 s for deadletter accumulation after delivery).

Tuning if VAL21-05 fails

If deadletter_count = 0 after S-B, the store ceiling was large enough that no evictions occurred before relay ran. Reduce --ceiling-bytes in the run_scenario sb call or lower --eviction-threshold (a smaller fraction triggers eviction more aggressively):

# In run_relay_overflow_val21_lab.sh, S-B run_scenario call:
run_scenario "sb" \
  --ceiling-bytes 163840 --eviction-threshold 0.30 --max-retry-count 3 \
  10 65536 60 0 30

9. Report Template

VAL21 — Relay Queue Depth and Overflow Validation
Date:           <YYYY-MM-DD>
Environment:    <OS, Go version>
Evidence dir:   <path>

Scenario results:
  S-A deep drain:   200 segments delivered in <X>ms
  S-B eviction:     <N> delivered + <M> deadletter = <N+M> accounted
  S-C status:       scheduled=<Y> at t+0.4s → scheduled=0 acked=10 after

10-check matrix:
  VAL21-01 PASS/FAIL  <detail>
  VAL21-02 PASS/FAIL  <detail>
  VAL21-03 PASS/FAIL  <detail>
  VAL21-04 PASS/FAIL  <detail>
  VAL21-05 PASS/FAIL  <detail>
  VAL21-06 PASS/FAIL  <detail>
  VAL21-07 PASS/FAIL  <detail>
  VAL21-08 PASS/FAIL  <detail>
  VAL21-09 PASS/FAIL  <detail>
  VAL21-10 PASS/FAIL  <detail>

Overall: PASS=<N> FAIL=<N>

10. Tooling

| File | Role |
| --- | --- |
| scripts/labs/run_relay_overflow_val21_lab.sh | Lab runner (this lab's entry point) |
| scripts/labs/edge_relay_overflow_setup.go | Setup binary: ceiling/eviction/retry config |
| scripts/labs/relay_impairment_proxy.go | Reused from VAL19 — clean proxy mode |
| scripts/labs/edge_deadletter_lab_peer.go | Reused from deadletter lab — mTLS peer server |