# VAL21 — Relay Queue Depth and Overflow Validation
Audience: engineers and reviewers who want a reproducible local lab proving relay queue behavior under depth pressure, local store overflow (LRU eviction), and relay status accuracy.
## 1. Scope and Goals
VAL21 validates three operational goals:
1. **Deep queue correctness** — the relay executor delivers 200 segments without loss, corruption, or scheduling regression (store and ledger correctness at scale).
2. **LRU eviction and graceful degradation** — when the local segment store fills and LRU eviction removes payloads before the relay executor can send them, the system fails gracefully (deadletter — no panic, no silent data loss).
3. **Relay status fidelity** — the `queue_depth` fields of `edgectl relay status --output json` accurately reflect the live ledger state both before and after delivery.
### Branch rule: coverage by existing runners

| Existing runner | Relevance | Verdict |
|---|---|---|
| deadletter lab | operator workflow (list/inspect/retry/purge) | insufficient — no deep-queue or ceiling scenario |
| VAL19 (impairment lab) | impairment correctness | insufficient — correctness under network faults, not store overflow |
| VAL20 (throughput lab) | throughput measurement | insufficient — no eviction, no status accuracy, max N=100 |

A new standalone runner is required. The reason is concrete: S-B requires a low disk ceiling plus an aggressive eviction threshold that would corrupt the concurrent operation of any other scenario in the same edged instance.
### Out of scope

- Multi-peer relay (single peer `peer-val21`)
- Bandwidth quota overflow (covered by PR-16 and VAL20-T5)
- Replay after ledger corruption (durability, not overflow)
- Disk-full OS-level failure (OS-level, not application behavior)
- Automatic eviction of DELIVERED segments (covered by `evict_on_relay`)
## 2. Architecture

```
edged (relay executor)
  → relay_impairment_proxy:19045    ← always clean mode for VAL21
  → edge_deadletter_lab_peer:19046  ← mTLS receive + JSON evidence
```
### Port assignments (isolated from VAL19 19030–19033 and VAL20 19040–19043)

| Component | Address |
|---|---|
| edged | |
| proxy (edged→) | :19045 |
| peer server | :19046 |
| proxy ctrl API | |
## 3. Scenarios
### Scenario A — Deep queue drain

| Parameter | Value |
|---|---|
| Segment count | 200 |
| Segment size | 64 B |
| Store ceiling | 1 TB (no effective limit) |
| Proxy mode | clean |
| Relay workers | 4 |
Segments are seeded in the PENDING state. The relay executor drains the full 200-segment queue over multiple scheduling rounds.

Queue depth monitoring: snapshots of the `peer-received.json` count are written to `sa/queue-depth.jsonl` every 4 seconds (initial + periodic + final).
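As a sketch of how the VAL21-04 monotonicity check can be run offline, the snippet below walks `sa/queue-depth.jsonl` and asserts the received count never regresses. The JSONL field names (`t_unix_ms`, `received`) are assumptions; match them to whatever the runner actually emits.

```go
// check_queue_depth.go — minimal VAL21-04 sketch (assumed snapshot shape).
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"log"
	"os"
)

type snapshot struct {
	TUnixMs  int64 `json:"t_unix_ms"` // assumed field name
	Received int   `json:"received"`  // cumulative peer-received.json count
}

func main() {
	f, err := os.Open("sa/queue-depth.jsonl")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	var snaps []snapshot
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		var s snapshot
		if err := json.Unmarshal(sc.Bytes(), &s); err != nil {
			log.Fatalf("bad snapshot line: %v", err)
		}
		snaps = append(snaps, s)
	}
	if len(snaps) < 2 {
		log.Fatalf("VAL21-04 needs >= 2 snapshots, got %d", len(snaps))
	}
	// Received counts are cumulative, so they must be non-decreasing
	// (equivalently, queue depth is non-increasing).
	for i := 1; i < len(snaps); i++ {
		if snaps[i].Received < snaps[i-1].Received {
			log.Fatalf("received count regressed at snapshot %d", i)
		}
	}
	fmt.Printf("VAL21-04 PASS: %d snapshots, received is non-decreasing\n", len(snaps))
}
```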
### Scenario B — LRU eviction interaction

| Parameter | Value |
|---|---|
| Segment count | 10 |
| Segment size | 64 KB (65 536 B) |
| Store ceiling | 327 680 B (5 × 64 KB) |
| Eviction threshold | 0.50 (50% of ceiling) |
| Max retry count | 3 |
| Proxy mode | clean |
Eviction mechanism: The store ceiling (5 × 64 KB) plus aggressive eviction threshold (50%) means evictions begin after storing ~2.5 segments. As the setup binary writes all 10 segments sequentially, the LRU eviction policy discards earlier segments to make room for later ones. By the time the relay executor starts, only the most-recently-written segments remain in the store.
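For intuition, the trigger point follows from plain arithmetic on the S-B parameters above; this is not an edged API, just the numbers:

```go
// Back-of-envelope check for when S-B evictions should begin.
package main

import "fmt"

func main() {
	const (
		ceilingBytes  = 327_680 // 5 × 64 KiB store ceiling
		threshold     = 0.50    // eviction threshold fraction
		segmentBytes  = 65_536  // 64 KiB per segment
		segmentsTotal = 10
	)
	triggerBytes := threshold * float64(ceilingBytes)
	fmt.Printf("eviction trigger at %.0f B ≈ %.1f segments (of %d written)\n",
		triggerBytes, triggerBytes/segmentBytes, segmentsTotal)
	// Output: eviction trigger at 163840 B ≈ 2.5 segments (of 10 written)
}
```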
Expected relay behaviour:

- Segments still in store → delivered correctly.
- Segments evicted from store → the relay executor reads, gets "not found", records a failed attempt, retries up to 3 times (exhausted within ~12 s), and enters the DEADLETTER state.
Graceful degradation validation:

- No panic or crash in `edged.log`
- All 10 segments accounted for: `delivered + deadletter = 10`
- Delivered segments have correct size (no corruption)

Note: The exact delivered/deadletter split depends on LRU timing and is not asserted precisely. VAL21-05 requires `deadletter ≥ 1` (eviction occurred) and VAL21-06 requires `delivered ≥ 1` (some segments survived).
### Scenario C — Relay status accuracy

| Parameter | Value |
|---|---|
| Segment count | 10 |
| Segment size | 64 B |
| Store ceiling | 1 TB (no effective limit) |
| Proxy mode | clean |
Two relay status captures are taken:

- Before (t+0.4 s): before the first scheduler tick fires (tick interval = 1 s). Expected: `queue_depth.scheduled + queue_depth.inflight = 10`.
- After delivery: all 10 segments acknowledged by the peer. Expected: `queue_depth.scheduled = 0`, `queue_depth.acked = 10`.
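A minimal sketch of the two capture points, assuming `edgectl` is on PATH, the process starts immediately after seeding, and the output file names are illustrative:

```go
// Sketch of the S-C capture timing; the real runner polls for completion
// instead of using a fixed wait.
package main

import (
	"log"
	"os"
	"os/exec"
	"time"
)

func captureStatus(path string) {
	out, err := exec.Command("edgectl", "relay", "status", "--output", "json").Output()
	if err != nil {
		log.Fatalf("edgectl relay status: %v", err)
	}
	if err := os.WriteFile(path, out, 0o644); err != nil {
		log.Fatal(err)
	}
}

func main() {
	// t+0.4 s: after seeding, before the first 1 s scheduler tick.
	time.Sleep(400 * time.Millisecond)
	captureStatus("sc/status-before.json") // file name is an assumption

	// Wait for delivery to finish (fixed wait here; poll in practice).
	time.Sleep(15 * time.Second)
	captureStatus("sc/status-after.json") // file name is an assumption
}
```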
## 4. 10-Check Matrix

| ID | Scenario | Description | Pass criterion |
|---|---|---|---|
| VAL21-01 | S-A | Deep drain: N=200 × 64 B delivered | all 200 delivered, elapsed ≤ 300 s |
| VAL21-02 | S-A | Zero loss: delivered IDs match seeded IDs | `sorted(delivered_ids) == sorted(seeded_ids)` |
| VAL21-03 | S-A | No corruption: spot-check 3 delivered sizes | 3 sampled segments (first / mid / last) each have `size_bytes = 64` |
| VAL21-04 | S-A | Queue depth non-increasing | ≥ 2 queue-depth snapshots, `received[i] ≤ received[i+1]` throughout |
| VAL21-05 | S-B | LRU eviction confirmed | `deadletter_count ≥ 1` and deadletter inspect shows the missing-payload failure |
| VAL21-06 | S-B | Graceful degradation: no panic, some delivery | "panic" absent from edged.log, `delivered_count ≥ 1` |
| VAL21-07 | S-B | All segments accounted for | terminal relay status, and seeded IDs exactly reconcile to delivered IDs + deadletter IDs |
| VAL21-08 | S-B | No corruption in delivered set | all delivered segments have `size_bytes = 65 536` |
| VAL21-09 | S-C | Status accuracy before delivery | `queue_depth.scheduled + inflight = 10` at t+0.4 s |
| VAL21-10 | S-C | Status accuracy after delivery | `queue_depth.scheduled = 0` AND `queue_depth.acked = 10` |
### Pass criterion rationale

- VAL21-01 ≤ 300 s: 200 segments through 4 workers @ 1 s tick ≈ 50 rounds, so roughly 50 s of drain time. 300 s is a 6× safety margin.
- VAL21-05 deadletter inspect: ties the failure to the expected missing-payload path instead of any generic relay failure.
- VAL21-07 exact ID reconciliation: strict accounting by identity, not just count. The check also requires `scheduled = inflight = failed = 0` after the S-B wait window so the evidence is terminal rather than mid-flight.
- VAL21-09 at t+0.4 s: the scheduler tick fires at t = 1 s. Capturing at 0.4 s leaves a 600 ms margin; `scheduled + inflight` are both included to tolerate a slightly early tick.
## 5. LRU Eviction Policy

The documented drop policy for the edge relay local store is `eviction.policy = "lru_priority"`. When the store approaches the disk ceiling (controlled by `eviction_threshold_fraction`), the eviction policy removes the oldest (least recently used) segments to make room for incoming writes.
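A minimal sketch of byte-ceiling LRU eviction under these two knobs. It illustrates only the LRU half of `lru_priority`; the type names and trigger arithmetic are assumptions, not the edged implementation:

```go
// Byte-ceiling LRU eviction sketch: once usage crosses
// threshold × ceiling, the least recently used segments are dropped.
package main

import (
	"container/list"
	"fmt"
)

type lruStore struct {
	ceiling int64 // hard ceiling; a real store would also enforce this
	trigger int64 // threshold × ceiling: where eviction begins
	used    int64
	order   *list.List               // front = most recently written
	sizes   map[string]int64
}

func newLRUStore(ceiling int64, threshold float64) *lruStore {
	return &lruStore{
		ceiling: ceiling,
		trigger: int64(threshold * float64(ceiling)),
		order:   list.New(),
		sizes:   map[string]int64{},
	}
}

func (s *lruStore) put(id string, size int64) (evicted []string) {
	// Evict from the back (least recently used) until the write fits
	// under the trigger.
	for s.used+size > s.trigger && s.order.Len() > 0 {
		oldest := s.order.Back()
		victim := oldest.Value.(string)
		s.used -= s.sizes[victim]
		s.order.Remove(oldest)
		delete(s.sizes, victim)
		evicted = append(evicted, victim)
	}
	s.order.PushFront(id)
	s.sizes[id] = size
	s.used += size
	return evicted
}

func main() {
	store := newLRUStore(327_680, 0.50) // S-B ceiling and threshold
	for i := 0; i < 10; i++ {
		id := fmt.Sprintf("seg-%02d", i)
		if ev := store.put(id, 65_536); len(ev) > 0 {
			fmt.Printf("writing %s evicted %v\n", id, ev)
		}
	}
	// Evictions begin at the third write (~2.5 segments), matching S-B.
}
```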
Consequence for relay: if a segment's payload is evicted between scheduling and relay, the relay executor cannot read the payload. It records a failed attempt and retries. After `max_retry_count` failures, the segment enters the DEADLETTER state. The operator must use `relay deadletter purge` to remove it from the queue.
Invariant (verified by VAL21-07): no seeded segment silently disappears. The final proof is identity-based: the exact seeded segment IDs must reconcile to the union of delivered IDs and deadletter IDs once the relay status has no remaining scheduled, inflight, or failed entries.
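A sketch of that identity check follows; the helper and hard-coded IDs are hypothetical, but the comparison mirrors the VAL21-07 criterion:

```go
// VAL21-07 sketch: seeded IDs must equal delivered ∪ deadletter, by identity.
package main

import (
	"fmt"
	"log"
	"sort"
)

func reconcile(seeded, delivered, deadletter []string) error {
	terminal := append(append([]string{}, delivered...), deadletter...)
	if len(terminal) != len(seeded) {
		return fmt.Errorf("count mismatch: %d terminal vs %d seeded",
			len(terminal), len(seeded))
	}
	a := append([]string{}, seeded...)
	sort.Strings(a)
	sort.Strings(terminal)
	// Sorted element-wise compare catches both missing IDs and duplicates.
	for i := range a {
		if a[i] != terminal[i] {
			return fmt.Errorf("ID mismatch at %d: %s vs %s", i, a[i], terminal[i])
		}
	}
	return nil // every seeded segment is accounted for by identity
}

func main() {
	seeded := []string{"seg-00", "seg-01", "seg-02"} // read from evidence files in practice
	if err := reconcile(seeded, []string{"seg-02"}, []string{"seg-00", "seg-01"}); err != nil {
		log.Fatal(err)
	}
	fmt.Println("VAL21-07 PASS: seeded == delivered + deadletter")
}
```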
## 6. Evidence Files

### Per-scenario directory (`$EVIDENCE_DIR/s{a|b|c}/`)

| File | Description |
|---|---|
| | Output of setup binary (store write errors visible here) |
| | Seeded segments with … |
| | edged config (ceiling, eviction threshold, retry count) |
| `edged.log` | edged relay executor log |
| | Peer server log |
| `peer-received.json` | Cumulative delivery evidence |
| `queue-depth.jsonl` | (S-A only) queue depth snapshots every 4 seconds |
| | (S-B only) deadletter list after retry window |
| | (S-B only) inspection of the first deadletter entry |
| | (S-B only) relay status after the S-B wait window |
| | (S-C only) relay status at t+0.4 s |
| | (S-C only) relay status after delivery |
### Top-level evidence files

| File | Description |
|---|---|
| | Go build output |
| | Impairment proxy log |
| `val21-summary.json` | Scenario metrics + pass/fail counts |
### `val21-summary.json` schema

```json
{
  "val": "VAL21",
  "date": "<RFC3339>",
  "passes": 9,
  "fails": 1,
  "scenarios": {
    "sa": {"delivered": 200, "elapsed_ms": 42100},
    "sb": {"delivered": 3, "deadletter": 7, "accounted": 10},
    "sc": {"delivered": 10}
  }
}
```
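For tools that consume the summary programmatically, a hypothetical Go mirror of this schema (struct and field names on the Go side are illustrative; the JSON tags follow the schema above):

```go
// Parse val21-summary.json into a typed struct.
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"os"
)

type Summary struct {
	Val       string `json:"val"`
	Date      string `json:"date"` // RFC3339
	Passes    int    `json:"passes"`
	Fails     int    `json:"fails"`
	Scenarios struct {
		SA struct {
			Delivered int   `json:"delivered"`
			ElapsedMs int64 `json:"elapsed_ms"`
		} `json:"sa"`
		SB struct {
			Delivered  int `json:"delivered"`
			Deadletter int `json:"deadletter"`
			Accounted  int `json:"accounted"`
		} `json:"sb"`
		SC struct {
			Delivered int `json:"delivered"`
		} `json:"sc"`
	} `json:"scenarios"`
}

func main() {
	raw, err := os.ReadFile("val21-summary.json")
	if err != nil {
		log.Fatal(err)
	}
	var s Summary
	if err := json.Unmarshal(raw, &s); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s: %d pass / %d fail\n", s.Val, s.Passes, s.Fails)
}
```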
## 7. Relay Status JSON Reference

`queue_depth` field definitions (from `edge/rpcv1/types.go`):

| Field | Meaning |
|---|---|
| `scheduled` | Segments in PENDING state (waiting for executor) |
| `inflight` | Segments currently being transmitted |
| `acked` | Segments successfully ACKed by peer |
| `failed` | Segments that failed delivery (in retry) |
| `deadletter` | Segments that exhausted retry budget |
| `total` | Sum of all states |
VAL21 asserts on `scheduled + inflight` (before) and on `scheduled` and `acked` (after).
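A sketch of the S-C assertions (VAL21-09/10) against captured status JSON. Only the `queue_depth` field names come from the table above; the envelope struct and file names are assumptions:

```go
// Assert the before/after queue_depth expectations for Scenario C.
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"os"
)

type queueDepth struct {
	Scheduled  int `json:"scheduled"`
	Inflight   int `json:"inflight"`
	Acked      int `json:"acked"`
	Failed     int `json:"failed"`
	Deadletter int `json:"deadletter"`
	Total      int `json:"total"`
}

// relayStatus models only the part of the payload VAL21 asserts on.
type relayStatus struct {
	QueueDepth queueDepth `json:"queue_depth"`
}

func load(path string) relayStatus {
	raw, err := os.ReadFile(path)
	if err != nil {
		log.Fatal(err)
	}
	var st relayStatus
	if err := json.Unmarshal(raw, &st); err != nil {
		log.Fatal(err)
	}
	return st
}

func main() {
	before := load("sc/status-before.json").QueueDepth // assumed file names
	after := load("sc/status-after.json").QueueDepth

	if got := before.Scheduled + before.Inflight; got != 10 {
		log.Fatalf("VAL21-09 FAIL: scheduled+inflight = %d, want 10", got)
	}
	if after.Scheduled != 0 || after.Acked != 10 {
		log.Fatalf("VAL21-10 FAIL: scheduled=%d acked=%d", after.Scheduled, after.Acked)
	}
	fmt.Println("VAL21-09/10 PASS")
}
```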
## 8. Run the Lab

```bash
export GOROOT=/home/ubuntu/.local/go1.25.7
export PATH="$GOROOT/bin:$PATH"
export GOTOOLCHAIN=local

bash scripts/labs/run_relay_overflow_val21_lab.sh
```

Optional custom evidence directory:

```bash
bash scripts/labs/run_relay_overflow_val21_lab.sh \
  "$PWD/evidence/val21-relay-overflow-local-$(date +%F)"
```
Expected runtime: 5–8 minutes (S-A 200-segment drain dominates; S-B waits 30 s for deadletter accumulation after delivery).
### Tuning if VAL21-05 fails

If `deadletter_count = 0` after S-B, the store ceiling was large enough that no evictions occurred before relay ran. Reduce `--ceiling-bytes` in the `run_scenario sb` call or lower `--eviction-threshold` (a lower fraction evicts sooner):

```bash
# In run_relay_overflow_val21_lab.sh, S-B run_scenario call:
run_scenario "sb" \
  --ceiling-bytes 163840 --eviction-threshold 0.30 --max-retry-count 3 \
  10 65536 60 0 30
```
## 9. Report Template

```
VAL21 — Relay Queue Depth and Overflow Validation
Date: <YYYY-MM-DD>
Environment: <OS, Go version>
Evidence dir: <path>

Scenario results:
  S-A deep drain: 200 segments delivered in <X>ms
  S-B eviction: <N> delivered + <M> deadletter = <N+M> accounted
  S-C status: scheduled=<Y> at t+0.4s → scheduled=0 acked=10 after

10-check matrix:
  VAL21-01 PASS/FAIL <detail>
  VAL21-02 PASS/FAIL <detail>
  VAL21-03 PASS/FAIL <detail>
  VAL21-04 PASS/FAIL <detail>
  VAL21-05 PASS/FAIL <detail>
  VAL21-06 PASS/FAIL <detail>
  VAL21-07 PASS/FAIL <detail>
  VAL21-08 PASS/FAIL <detail>
  VAL21-09 PASS/FAIL <detail>
  VAL21-10 PASS/FAIL <detail>

Overall: PASS=<N> FAIL=<N>
```
## 10. Tooling

| File | Role |
|---|---|
| `scripts/labs/run_relay_overflow_val21_lab.sh` | Benchmark runner (this lab's entry point) |
| | Setup binary: ceiling/eviction/retry config |
| `relay_impairment_proxy` | Reused from VAL19 — clean proxy mode |
| `edge_deadletter_lab_peer` | Reused from deadletter lab — mTLS peer server |