VAL 19 — Relay Local-Network Impairment Validation

Status: Implemented

Scripts:

  • scripts/labs/run_relay_impairment_val19_lab.sh — single-shot lab runner

  • scripts/labs/relay_impairment_proxy.go — TCP impairment proxy binary

  • scripts/labs/edge_relay_impairment_setup.go — VAL19 lab setup binary

Evidence dir: operator-chosen directory (default evidence/val19-relay-impairment-YYYY-MM-DD/)

Ports: edged → 19030 · proxy listen → 19031 · peer server → 19032 · proxy control → 19033


Purpose

Validates Edge Relay store-and-forward correctness, delivery-after-reconnection, and performance characterisation under four network impairment conditions:

  • Outage — peer unreachable; messages must accumulate in the deadletter queue and be delivered after the peer is restored (workplan: “store-and-forward correctness”)

  • Bandwidth constraint — 1 Mbps and 10 Mbps transport limits; relay must deliver all segments; throughput is measured and captured as evidence

  • Connection delay — 200 ms and 500 ms pre-dial delay per connection; relay must deliver correctly when connection establishment is delayed

  • Connection instability — mid-transfer disconnect (proxy closes after 500 bytes); relay must handle the partial transfer gracefully, retain the segment in the deadletter queue, and deliver on the subsequent clean retry


Branch-Specific Rule Application

| Question | Answer |
| --- | --- |
| Is this covered by an existing LAB? | Partially. run_edge_deadletter_lab.sh covers CLI operations on a pre-seeded deadletter queue (happy path) and in-process bandwidth throttling using a mocked clock. It does NOT cover: real transport-layer latency, mid-transfer disconnect handling, outage accumulation and recovery timing, or throughput measurement under external bandwidth constraints. |
| Which LAB/evidence bundle is extended? | None; a new standalone script is used. Extending run_edge_deadletter_lab.sh would require modifying edge_deadletter_lab_setup.go to route the peer through a proxy address, adding a new proxy binary, and extending the lab’s 30–60 s runtime by 3–4 minutes. This follows the VAL12 / VAL18 precedent for longer runtimes and multi-binary setups. |
| New evidence files | 30+ files in the evidence directory; see the Evidence Files table. |
| Tutorial/runbook docs updated | docs/tutorials/edge-relay-deadletter-lab.md: §VAL19 cross-reference note added. |
| Reason new runner required | (a) Requires two new Go binaries built at runtime. (b) 3–4 min runtime due to outage cycles + bandwidth tests. (c) The proxy routes edged to a different peer address than the deadletter lab, requiring a separate edge.toml (generated by edge_relay_impairment_setup.go). |


Test Architecture

edged (relay executor)
  │  mTLS connection to peer-val19
  │
  ▼
relay_impairment_proxy  (127.0.0.1:19031)
  │  control API at 127.0.0.1:19033
  │  PUT /mode  GET /stats  POST /reset
  │
  ▼  (transparent TCP forward; mTLS end-to-end)
edge_deadletter_lab_peer  (127.0.0.1:19032)
  │  peer-val19.crt — mTLS identity
  │
  ▼
peer-received.json  (segment delivery log)

Key invariant: The proxy is a raw-TCP passthrough; mTLS authentication occurs end-to-end between edged and the peer server. The proxy never decrypts or inspects TLS content — it only delays, rate-limits, or closes connections at the TCP layer.
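The passthrough invariant can be illustrated with a minimal Python sketch. This is not the Go proxy itself; the names relay and bridge are hypothetical. The point it shows is that bytes are copied between two sockets without being parsed, so a TLS session between edged and the peer stays opaque to whatever sits in the middle.

```python
import socket
import threading


def relay(src: socket.socket, dst: socket.socket, bufsize: int = 65536) -> int:
    """Copy bytes from src to dst until src reaches EOF; return bytes forwarded.

    The payload is never parsed, so TLS records pass through untouched.
    """
    forwarded = 0
    while True:
        chunk = src.recv(bufsize)
        if not chunk:  # the sender closed its write side
            break
        dst.sendall(chunk)
        forwarded += len(chunk)
    # Propagate the half-close so the receiver also sees EOF.
    try:
        dst.shutdown(socket.SHUT_WR)
    except OSError:
        pass
    return forwarded


def bridge(client: socket.socket, upstream: socket.socket) -> None:
    """Forward both directions concurrently, then close both sockets."""
    back = threading.Thread(target=relay, args=(upstream, client), daemon=True)
    back.start()
    relay(client, upstream)  # primary (client -> upstream) direction
    back.join()
    client.close()
    upstream.close()
```

An impairment mode would hook into relay (for example, sleeping between chunks or stopping after a byte budget) without ever looking at the chunk contents.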


Network Impairment Plan

Impairment proxy modes

| Mode | PUT /mode body | Effect |
| --- | --- | --- |
| clean | {"type":"clean"} | Full passthrough, no impairment |
| outage | {"type":"outage"} | Immediately closes every new incoming connection |
| bandwidth | {"type":"bandwidth","value_bps":N} | Token-bucket rate limit on the primary (edged→peer) direction, interpreted as bits per second |
| latency | {"type":"latency","value_ms":N} | Sleeps N ms before dialling the target (adds per-connection pre-dial delay) |
| cutoff | {"type":"cutoff","value_bytes":N} | Closes connection after forwarding N bytes in the primary direction |
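The request bodies in the mode table can be generated and validated with a small helper. The sketch below is a hypothetical Python client, not part of the lab scripts: the function names, the plain-http scheme, the Content-Type header, and the "positive integer" validation rule are assumptions; the control port, the PUT /mode endpoint, and the field names come from the table.

```python
import json
import urllib.request

CONTROL_URL = "http://127.0.0.1:19033"  # proxy control port; http scheme assumed

# Mode name -> name of the value field in the PUT /mode body (None = no value).
MODE_FIELDS = {
    "clean": None,
    "outage": None,
    "bandwidth": "value_bps",
    "latency": "value_ms",
    "cutoff": "value_bytes",
}


def mode_body(mode, value=None):
    """Return the JSON body for PUT /mode, validating the mode/value pairing."""
    if mode not in MODE_FIELDS:
        raise ValueError(f"unknown mode: {mode!r}")
    field = MODE_FIELDS[mode]
    body = {"type": mode}
    if field is None:
        if value is not None:
            raise ValueError(f"mode {mode!r} takes no value")
    else:
        # Assumed rule: values must be positive integers.
        if not isinstance(value, int) or isinstance(value, bool) or value <= 0:
            raise ValueError(f"mode {mode!r} needs a positive integer {field}")
        body[field] = value
    return json.dumps(body, separators=(",", ":"))


def mode_request(mode, value=None):
    """Build (but do not send) the PUT /mode request for the control API."""
    return urllib.request.Request(
        CONTROL_URL + "/mode",
        data=mode_body(mode, value).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},  # assumed header
    )
```

With the proxy running, passing mode_request("bandwidth", 1000000) to urllib.request.urlopen would switch it to the 1 Mbps limit used by VAL19-06.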

Scenario-to-mode mapping

| User scenario | Mode used | Value |
| --- | --- | --- |
| 30s–5m intermittent outages | outage | immediate close (outage period controlled by test script) |
| 200–500 ms connection delay | latency | 200 ms (VAL19-08), 500 ms (VAL19-09) |
| 1 Mbps bandwidth | bandwidth | value_bps=1000000 |
| 10 Mbps bandwidth | bandwidth | value_bps=10000000 |
| 1–5% packet loss | cutoff | value_bytes=500 (simulates mid-transfer disconnect at TCP layer) |

Note on “packet loss” simulation: TCP provides reliable delivery, so true per-packet loss cannot be injected via a TCP proxy. The cutoff mode simulates the observable effect — a connection reset mid-transfer — which triggers the same relay retry path as packet-loss-induced TCP RST. True packet-level loss (e.g., via tc netem) requires CAP_NET_ADMIN / root and is out of scope for this lab; the cutoff mechanism provides equivalent relay behaviour coverage without OS privileges.
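The cutoff behaviour can be modelled as a pure function over the chunks a proxy would forward. This is a sketch under stated assumptions, not the proxy's code: the name forward_with_cutoff is hypothetical, and truncating the final chunk so that exactly N bytes are forwarded is an assumption about granularity.

```python
def forward_with_cutoff(chunks, cutoff_bytes):
    """Forward chunks until cutoff_bytes have been sent, then stop.

    Returns (forwarded_bytes, closed_early). The last chunk is truncated so
    that exactly cutoff_bytes are forwarded before the simulated close.
    """
    forwarded = bytearray()
    for chunk in chunks:
        remaining = cutoff_bytes - len(forwarded)
        if len(chunk) >= remaining:
            forwarded += chunk[:remaining]
            return bytes(forwarded), True  # connection closed mid-transfer
        forwarded += chunk
    return bytes(forwarded), False  # stream ended before the cutoff was hit


# A 4 KB payload (the size of val19-seg-10) sent in 1 KB chunks is cut at 500 B.
payload = bytes(4096)
chunks = [payload[i:i + 1024] for i in range(0, len(payload), 1024)]
sent, closed = forward_with_cutoff(chunks, 500)
print(len(sent), closed)  # prints "500 True"
```

Because the proxy does not inspect TLS, handshake bytes in the primary direction count toward the budget as well, so on a real mTLS connection the close may land before any application payload is forwarded.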


Harness and Tooling Setup

Pre-built binaries (built by lab script)

| Binary | Source | Role |
| --- | --- | --- |
| edged | edge/cmd/edged | Edge daemon with relay executor |
| edgectl | edge/cmd/edgectl | CLI: relay deadletter list/retry; relay status |
| relay-impairment-proxy | scripts/labs/relay_impairment_proxy.go | TCP impairment proxy with HTTP control API |

Setup binary (run via go run)

| Binary | Source | Role |
| --- | --- | --- |
| edge_relay_impairment_setup | scripts/labs/edge_relay_impairment_setup.go | Generates CA + certs, edge.toml (proxy address for peer-val19), seeds 10 segments in BoltDB |
| edge_deadletter_lab_peer | scripts/labs/edge_deadletter_lab_peer.go | Reused from deadletter lab; records received segments to peer-received.json |
| edge_deadletter_lab_dump | scripts/labs/edge_deadletter_lab_dump.go | Reused; dumps final ledger state per segment |

Pre-seeded segments

| Segment ID | Size | Purpose |
| --- | --- | --- |
| val19-seg-01 | 64 B | VAL19-01 clean baseline |
| val19-seg-02 | 64 B | VAL19-02/03 outage accumulate + recovery |
| val19-seg-03 | 64 B | VAL19-05 repeated outage cycle 1 |
| val19-seg-04 | 64 B | VAL19-05 repeated outage cycle 2 |
| val19-seg-05 | 64 B | VAL19-05 repeated outage cycle 3 |
| val19-seg-06 | 128 KB | VAL19-06 bandwidth 1 Mbps |
| val19-seg-07 | 128 KB | VAL19-07 bandwidth 10 Mbps |
| val19-seg-08 | 64 B | VAL19-08 latency 200 ms |
| val19-seg-09 | 64 B | VAL19-09 latency 500 ms |
| val19-seg-10 | 4 KB | VAL19-10 cutoff + clean retry |

All segments are seeded in the relay BoltDB ledger with one prior failed attempt (error_detail: "seed: peer unreachable") so they are visible in relay deadletter list at lab start.


Metrics and Logs to Collect

Per-scenario evidence files

| Check | Key evidence file(s) |
| --- | --- |
| VAL19-01 | val19-01-retry.txt, peer-received.json |
| VAL19-02 | val19-02-retry.txt, val19-02-deadletter-list.txt |
| VAL19-03 | val19-03-retry.txt, peer-received.json |
| VAL19-04 | val19-04-integrity.txt, peer-received.json |
| VAL19-05 | val19-05-cycles.txt, val19-05-deadletter-list.txt, peer-received.json |
| VAL19-06 | val19-06-retry.txt, val19-06-proxy-stats.json, val19-06-timing.txt |
| VAL19-07 | val19-07-retry.txt, val19-07-proxy-stats.json, val19-07-timing.txt |
| VAL19-08 | val19-08-retry.txt, val19-08-timing.txt |
| VAL19-09 | val19-09-retry.txt, val19-09-timing.txt |
| VAL19-10 | val19-10-retry-cutoff.txt, val19-10-deadletter-after-cutoff.txt, val19-10-check.txt, val19-10-retry-clean.txt, peer-received.json |

Proxy stats JSON schema (proxy-stats-final.json, val19-0X-proxy-stats.json)

{
  "bytes_fwd":        N,
  "conn_accepted":    N,
  "conn_dropped":     N,
  "active_conn":      N,
  "last_conn_bytes":  N,
  "last_conn_ms":     N
}

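last_conn_bytes and last_conn_ms give per-connection throughput. In bytes per second it is last_conn_bytes * 1000 / last_conn_ms; in bits per second, the unit used by value_bps, it is throughput_bps = last_conn_bytes * 8 * 1000 / last_conn_ms.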
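A minimal sketch of the throughput computation from a proxy stats file follows. It returns both the byte rate and the bit rate, since the bandwidth mode's value_bps is stated in bits per second and a comparison against that limit needs the byte count multiplied by 8. The function name and the sample values are hypothetical; only the two field names come from the schema above.

```python
import json


def throughput(stats):
    """Return (bytes_per_second, bits_per_second) for the last connection."""
    ms = stats["last_conn_ms"]
    if ms <= 0:
        raise ValueError("last_conn_ms must be positive")
    bytes_per_s = stats["last_conn_bytes"] * 1000 / ms
    return bytes_per_s, bytes_per_s * 8


# Hypothetical sample values, not measured results.
sample = json.loads('{"last_conn_bytes": 131072, "last_conn_ms": 1049}')
bytes_per_s, bits_per_s = throughput(sample)
print(f"{bytes_per_s:.0f} B/s, {bits_per_s:.0f} bit/s")
```

In practice the dictionary would come from json.load on val19-06-proxy-stats.json or val19-07-proxy-stats.json.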

Persistent lab logs

| File | Content |
| --- | --- |
| edged.log | Edge daemon relay executor log; includes relay: dial failed on outage and retry-path errors on cutoff |
| peer.log | Peer server log; includes per-segment receive events |
| proxy.log | Proxy startup and error log |


VAL19 10-Check Matrix

| Check | Name | Mode | Pass Criterion |
| --- | --- | --- | --- |
| VAL19-01 | clean_baseline_delivery | clean | peer-received.json contains val19-seg-01 |
| VAL19-02 | outage_deadletter_accumulate | outage | relay deadletter list still shows val19-seg-02 after retry |
| VAL19-03 | outage_recovery_delivery | clean (restored) | peer-received.json contains val19-seg-02 |
| VAL19-04 | data_integrity | (from 01) | Peer-recorded size for val19-seg-01 = 64 bytes |
| VAL19-05 | repeated_outage_resilience | outage+clean ×3 | All of val19-seg-03, 04, 05 in peer-received.json |
| VAL19-06 | bandwidth_1mbps_delivery | bandwidth@1Mbps | peer-received.json contains val19-seg-06 |
| VAL19-07 | bandwidth_10mbps_delivery | bandwidth@10Mbps | peer-received.json contains val19-seg-07 |
| VAL19-08 | latency_200ms_delivery | latency@200ms | peer-received.json contains val19-seg-08 |
| VAL19-09 | latency_500ms_delivery | latency@500ms | peer-received.json contains val19-seg-09 |
| VAL19-10 | cutoff_retry_converges | cutoff@500B → clean | cutoff retry exits non-zero, val19-seg-10 remains in deadletter, then peer-received.json contains it after clean retry |
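VAL19-10 is the only check with a three-part criterion, so it is worth spelling out. The sketch below parses the key=value line format that the Evidence Files table gives for val19-10-check.txt and the timing files, then applies the three conditions. The function names are hypothetical, and the delivered segment IDs are passed in as a plain set because the peer-received.json schema is not shown in this document.

```python
def parse_kv(line):
    """Parse 'key=value key=value' into a dict of strings."""
    return dict(token.split("=", 1) for token in line.split())


def val19_10_passes(check_line, delivered_ids):
    """Apply the three-part VAL19-10 criterion.

    check_line:    content of val19-10-check.txt
    delivered_ids: segment IDs found in peer-received.json after the clean retry
    """
    fields = parse_kv(check_line)
    cutoff_failed = int(fields["cutoff_exit"]) != 0      # retry under cutoff must fail
    retained = fields["deadletter_retained"] == "true"   # segment stayed queued
    delivered = "val19-seg-10" in delivered_ids          # clean retry converged
    return cutoff_failed and retained and delivered


print(val19_10_passes("cutoff_exit=1 deadletter_retained=true", {"val19-seg-10"}))
# prints "True"
```

The same parse_kv helper reads the timing files, for example a line of the form elapsed_ms=N proxy_latency_ms=200.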


Pass/Fail Criteria

| Outcome | Condition |
| --- | --- |
| PASS | All 10 checks pass |
| PARTIAL | Checks 1, 2, 3, 5 pass (store-and-forward + outage resilience) |
| FAIL | Check 1 fails (relay baseline broken) OR check 3 fails (outage recovery broken) OR check 4 fails (data integrity) |

Mandatory checks: VAL19-01 (clean delivery), VAL19-03 (outage recovery), VAL19-04 (data integrity). A relay that delivers segments cleanly but loses them after an outage cycle, or corrupts payload sizes, fails the public store-and-forward claim regardless of bandwidth or latency check outcomes.
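One way to read the outcome rules as code is sketched below. The precedence (PASS, then mandatory failures, then PARTIAL) follows the table and the mandatory-check paragraph; the final fallthrough is an assumption, since the document does not say what happens when the mandatory checks pass but check 2 or 5 fails. The names classify, MANDATORY, and PARTIAL_SET are hypothetical.

```python
ALL_CHECKS = set(range(1, 11))
MANDATORY = {1, 3, 4}        # clean delivery, outage recovery, data integrity
PARTIAL_SET = {1, 2, 3, 5}   # store-and-forward + outage resilience


def classify(passed):
    """Map the set of passed check numbers (1-10) to PASS, PARTIAL, or FAIL."""
    passed = set(passed)
    if passed >= ALL_CHECKS:
        return "PASS"
    if not MANDATORY <= passed:
        return "FAIL"        # any mandatory check failing is a hard FAIL
    if PARTIAL_SET <= passed:
        return "PARTIAL"
    return "FAIL"            # assumption: not specified by the table


print(classify({1, 2, 3, 4, 5}))  # prints "PARTIAL"
```

Evaluating the mandatory set before PARTIAL means a run where checks 1, 2, 3, 5 pass but check 4 fails is reported as FAIL, which matches the statement that payload corruption fails the claim regardless of other outcomes.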


Performance Degradation Characterisation

The lab captures the following performance metrics as evidence (informational, not hard pass/fail gates):

| Metric | Source | Interpretation |
| --- | --- | --- |
| throughput_1mbps_bps | val19-06-proxy-stats.json (last_conn_bytes / last_conn_ms) | Actual relay throughput under 1 Mbps proxy limit |
| throughput_10mbps_bps | val19-07-proxy-stats.json | Actual throughput under 10 Mbps proxy limit |
| latency_200ms_elapsed_ms | val19-08-timing.txt | Wall time of retry + delivery under 200 ms pre-dial delay |
| latency_500ms_elapsed_ms | val19-09-timing.txt | Wall time of retry + delivery under 500 ms pre-dial delay |

Expected characterisation results (128 KB segments for the bandwidth checks, 64 B segments for the delay checks; value_bps taken as bits per second):

  • 1 Mbps: ~1.05 s transfer (131,072 B × 8 / 1,000,000 bps) + scheduler overhead ≈ 2.0–2.5 s total elapsed

  • 10 Mbps: ~105 ms transfer + scheduler overhead ≈ 1.1–1.2 s total elapsed

  • 200 ms delay: 200 ms pre-dial delay + ~10 ms transfer + scheduler ≈ 1.2 s total

  • 500 ms delay: 500 ms pre-dial delay + ~10 ms + scheduler ≈ 1.5 s total
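The bandwidth transfer times can be checked with a few lines of arithmetic. The sketch assumes the bits-per-second interpretation stated for the bandwidth mode and ignores TLS/TCP framing overhead and any token-bucket burst allowance; transfer_ms is a hypothetical helper name. Reading the same limits as bytes per second would instead give about 131 ms and 13 ms.

```python
def transfer_ms(size_bytes, rate_bps):
    """Ideal transfer time in ms for size_bytes at rate_bps (bits per second)."""
    return size_bytes * 8 * 1000 / rate_bps


SEGMENT_BYTES = 128 * 1024  # val19-seg-06 and val19-seg-07

for label, rate in (("1 Mbps", 1_000_000), ("10 Mbps", 10_000_000)):
    print(f"{label}: {transfer_ms(SEGMENT_BYTES, rate):.0f} ms")
# prints:
# 1 Mbps: 1049 ms
# 10 Mbps: 105 ms
```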


Evidence Files

| File | Description |
| --- | --- |
| edge.toml | Edge daemon config with peer-val19 → proxy address |
| seed-manifest.json | VAL19 segment catalogue (IDs, sizes, purposes) |
| initial-deadletter-list.txt | All 10 segments visible before tests start |
| edgectl-status.txt | Initial edgectl status (edged ready) |
| relay-status-initial.txt / .json | Relay status at lab start |
| peer-received.json | All segments received by the peer server (grows across test) |
| edged.log | Edge daemon full log |
| peer.log | Peer server log |
| proxy.log | Proxy log |
| val19-01-retry.txt | Retry output for clean baseline |
| val19-02-retry.txt | Retry output under outage |
| val19-02-deadletter-list.txt | Deadletter list after failed outage retry |
| val19-03-retry.txt | Retry output after proxy restored |
| val19-04-integrity.txt | received_size=N expected_size=64 |
| val19-05-cycles.txt | delivered_count=3 expected=3 |
| val19-05-deadletter-list.txt | Deadletter list after 3 outage cycles |
| val19-06-retry.txt | Retry under 1 Mbps bandwidth |
| val19-06-proxy-stats.json | Proxy stats after 1 Mbps test |
| val19-06-timing.txt | elapsed_ms=N for 1 Mbps test |
| val19-07-retry.txt | Retry under 10 Mbps bandwidth |
| val19-07-proxy-stats.json | Proxy stats after 10 Mbps test |
| val19-07-timing.txt | elapsed_ms=N for 10 Mbps test |
| val19-08-retry.txt | Retry under 200 ms latency |
| val19-08-timing.txt | elapsed_ms=N proxy_latency_ms=200 |
| val19-09-retry.txt | Retry under 500 ms latency |
| val19-09-timing.txt | elapsed_ms=N proxy_latency_ms=500 |
| val19-10-retry-cutoff.txt | Retry output with proxy in cutoff mode |
| val19-10-deadletter-after-cutoff.txt | Deadletter list after cutoff (seg-10 present) |
| val19-10-check.txt | cutoff_exit=<N> deadletter_retained=true/false |
| val19-10-retry-clean.txt | Final retry output after clean restore |
| proxy-stats-final.json | Final proxy stats (cumulative) |
| final-deadletter-list.txt | Remaining deadletter entries after all tests |
| relay-status-final.txt / .json | Final relay status |
| ledger-val19-seg-XX.json | BoltDB ledger dump per segment (10 files) |
| val19-report.txt | Human-readable 10-check PASS/FAIL report |
| val19-report.json | Machine-readable JSON report with throughput metrics |


Known Failure Modes

| Failure | Likely Cause | Mitigation |
| --- | --- | --- |
| VAL19-01 FAIL: baseline not delivered | edged not ready before first retry; scheduler not running | Check edgectl-status.txt for ready; verify edged.log shows relay scheduler started |
| VAL19-02 FAIL: seg-02 not in deadletter after outage retry | Relay scheduled the retry AFTER proxy mode was restored to clean | Increase sleep 2 to sleep 3 after the outage retry |
| VAL19-03 FAIL: recovery delivery not received | Retry command returned non-zero; peer server crashed | Check val19-03-retry.txt and peer.log |
| VAL19-04 FAIL: wrong size | peer-received.json format changed; peer server version mismatch | Check raw peer-received.json content |
| VAL19-05 FAIL: not all 3 delivered | Outage cycles ran too fast; segment didn’t return to deadletter state before clean retry | Increase sleep 1 between cycles to sleep 2 |
| VAL19-06/07 FAIL: not delivered | edged scheduler timed out waiting for the rate-limited transfer; ack_timeout_seconds = 5 too short | Increase ack_timeout_seconds in edge_relay_impairment_setup.go |
| VAL19-08/09 FAIL: not delivered | dial_timeout_seconds = 5 exceeded by pre-dial delay + overhead (should not happen for ≤ 500 ms) | Check edged.log for dial failed: context deadline exceeded |
| VAL19-10 FAIL: seg-10 not delivered after clean | Cutoff partially delivered mTLS handshake leaving peer in bad state | Kill and restart peer server; re-run |
| Proxy control API unreachable | Proxy not started or crashed | Check proxy.log; verify 127.0.0.1:19033 is listening |
| Port conflict | Ports 19030-19033 in use | Kill any leftover processes; lsof -i :19031 |
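The port-conflict row can be checked before a run with a short pre-flight script. This is a hypothetical helper, not part of the lab runner: it tries to bind each lab port on 127.0.0.1 and reports the ones that are already taken. The port numbers and their roles come from the header of this document; the function names are assumptions.

```python
import socket

LAB_PORTS = {
    "edged": 19030,
    "proxy listen": 19031,
    "peer server": 19032,
    "proxy control": 19033,
}


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can bind host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
    return True


def busy_ports(ports=LAB_PORTS):
    """Return the {role: port} entries that cannot currently be bound."""
    return {role: port for role, port in ports.items() if not port_is_free(port)}
```

Calling busy_ports() with no arguments returns an empty dict when all four lab ports are available; any entry it does return names the role whose leftover process should be stopped first.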


Final Report Template

# VAL 19 — Relay Local-Network Impairment Validation

Generated:    <timestamp>
Evidence dir: <path>

## Network Impairment Summary
Proxy listen:   127.0.0.1:19031 → 127.0.0.1:19032
Proxy control:  127.0.0.1:19033
edged listen:   127.0.0.1:19030

## Throughput Characterisation (informational)
1 Mbps constraint:   1,004,320 bps  (1044 ms, 131,072 B)
10 Mbps constraint:  9,962,481 bps  (105 ms, 131,072 B)
Latency 200 ms:      delivery_elapsed=1247 ms
Latency 500 ms:      delivery_elapsed=1531 ms

## Check Results
VAL19-01 clean_baseline_delivery:         PASS
VAL19-02 outage_deadletter_accumulate:    PASS
VAL19-03 outage_recovery_delivery:        PASS
VAL19-04 data_integrity:                  PASS  (size=64 expected=64)
VAL19-05 repeated_outage_resilience:      PASS  (3/3 delivered)
VAL19-06 bandwidth_1mbps_delivery:        PASS
VAL19-07 bandwidth_10mbps_delivery:       PASS
VAL19-08 latency_200ms_delivery:          PASS
VAL19-09 latency_500ms_delivery:          PASS
VAL19-10 cutoff_retry_converges:          PASS

## Summary
pass=10  fail=0  total=10

Relay Store-and-Forward Assessment:

  • PASS requires VAL19-01 (clean delivery) + VAL19-03 (outage recovery) + VAL19-04 (data integrity)

  • VAL19-05 (repeated outage resilience) is the key Gate D evidence for the “store-and-forward correctness” workplan claim

  • Throughput characterisation values (throughput_1mbps_bps, last_conn_ms) are recorded as baseline performance evidence; no hard thresholds imposed (relay delivers correctly regardless of transport speed)