VAL20 — Relay Throughput Benchmark

Audience: engineers and reviewers who want a reproducible local benchmark for edge relay executor throughput. Complements VAL19 (impairment correctness) by characterising raw delivery rate and backpressure behaviour under load.

1. Scope

VAL20 validates three operational goals:

  1. Throughput baseline — measure segments/sec and bytes/sec delivered end-to-end over a loopback mTLS relay session under clean network conditions.

  2. Scheduler scalability — confirm that throughput does not catastrophically degrade as queue depth grows from 1 to 100 segments.

  3. Backpressure evidence — demonstrate that the 1 Mbps bandwidth constraint (via VAL19 impairment proxy) visibly reduces bytes/sec compared to clean mode.

Out of scope:

  • Multi-peer relay (VAL20 uses one peer: peer-val20)

  • Persistent-across-restart durability (covered by VAL11/VAL15)

  • Relay ledger correctness (covered by PR-14/PR-15 deadletter lab)

  • Bandwidth enforcement correctness at the control plane (covered by PR-16)

  • Relay under packet loss / latency / outage (covered by VAL19)

2. Architecture

edged (relay executor)
  → relay_impairment_proxy:19041   ← proxy in clean or bandwidth mode
      → edge_deadletter_lab_peer:19042  ← mTLS receive + JSON evidence

All traffic is loopback. The impairment proxy (relay_impairment_proxy.go) is reused from VAL19 and is kept in clean mode for tiers T1–T4. Tier T5 applies a 1 Mbps bandwidth constraint to demonstrate backpressure.
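
For orientation, the sketch below shows how a runner could ask the proxy to switch from clean mode to the 1 Mbps constraint before T5. The control path (/mode) and the bandwidth_kbps field are hypothetical placeholders for illustration only; the actual control API on port 19043 is whatever relay_impairment_proxy.go (from VAL19) implements.

package main

import (
    "bytes"
    "fmt"
    "net/http"
)

func main() {
    // Hypothetical control call: the endpoint path and JSON fields are
    // placeholders, not the documented VAL19 proxy API. Shown only to
    // illustrate that the runner flips the shared proxy into bandwidth
    // mode for T5 and leaves it in clean mode for T1-T4.
    body := bytes.NewBufferString(`{"mode":"bandwidth","bandwidth_kbps":1000}`)
    resp, err := http.Post("http://127.0.0.1:19043/mode", "application/json", body)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    fmt.Println("proxy control response:", resp.Status)
}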

Port assignments (isolated from the VAL19 range 19030–19033)

  Component        Address
  edged            127.0.0.1:19040
  proxy (edged→)   127.0.0.1:19041
  peer server      127.0.0.1:19042
  proxy ctrl API   127.0.0.1:19043

Relay executor configuration

The edge.toml generated by edge_relay_throughput_setup.go increases concurrency relative to the deadletter lab defaults:

[scheduler]
max_concurrent_relays = 4
schedule_interval_seconds = 1
max_segments_per_scheduling_round = 50

[relay]
worker_count = 4
dial_timeout_seconds = 5
ack_timeout_seconds = 5

The higher concurrency ensures the benchmark is bounded by the loopback network rather than by the 1 s scheduler tick.

3. Workload Tiers

Segments are seeded to PENDING state (not deadletter). The relay executor picks them up automatically on the first scheduling round — no relay deadletter retry invocations are needed.

  Tier  Label        N    Size    Proxy mode  Purpose
  T1    warmup       1    64 B    clean       Baseline single-segment latency
  T2    small-10     10   64 B    clean       Baseline segments/sec
  T3    small-100    100  64 B    clean       Scheduler throughput under load
  T4    large-10     10   128 KB  clean       Bytes/sec characterisation
  T5    constrained  10   128 KB  1 Mbps      Backpressure evidence

Each tier uses a fresh edged and peer instance (separate BoltDB and TLS material) to prevent ledger state from one tier affecting another. The impairment proxy is shared and stays up across all tiers.

Segment seeding

edge_relay_throughput_setup.go generates segment IDs of the form val20-${tier}-${N:03d} (e.g. val20-t3-001 through val20-t3-100).

Each segment payload is deterministic:

val20:<segment_id>:AAAA...

(prefix up to 32 bytes; remainder filled with A).
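
A minimal sketch of this payload construction (the helper name and the exact handling of prefixes longer than 32 bytes are illustrative; edge_relay_throughput_setup.go is authoritative):

package main

import (
    "fmt"
    "strings"
)

// buildPayload mirrors the layout described above: a "val20:<segment_id>:"
// prefix (capped here at 32 bytes) followed by 'A' padding up to the
// tier's segment size.
func buildPayload(segmentID string, size int) []byte {
    prefix := fmt.Sprintf("val20:%s:", segmentID)
    if len(prefix) > 32 {
        prefix = prefix[:32]
    }
    if len(prefix) >= size {
        return []byte(prefix[:size])
    }
    return []byte(prefix + strings.Repeat("A", size-len(prefix)))
}

func main() {
    // T2 example: a 64-byte segment for val20-t2-001.
    p := buildPayload("val20-t2-001", 64)
    fmt.Printf("%d bytes: %s\n", len(p), p)
}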

Queue depth monitoring

During T3 (N=100), the runner writes an initial snapshot immediately, samples peer-received.json every 4 seconds while the tier is running, and records a final snapshot at tier completion; all snapshots are appended to t3/queue-depth.jsonl. Each snapshot records:

{"snapshot": 3, "elapsed_s": 12, "received": 48, "pending": 52}

VAL20-04 requires at least two snapshots and verifies that the received sequence is non-decreasing across the full captured window.
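
A sketch of one way to evaluate the VAL20-04 criterion against queue-depth.jsonl, assuming the snapshot field names shown above:

package main

import (
    "bufio"
    "encoding/json"
    "fmt"
    "os"
)

// snapshot mirrors the queue-depth.jsonl records shown above.
type snapshot struct {
    Snapshot int `json:"snapshot"`
    ElapsedS int `json:"elapsed_s"`
    Received int `json:"received"`
    Pending  int `json:"pending"`
}

// checkVAL2004 sketches the VAL20-04 criterion: at least two snapshots,
// with the received count non-decreasing throughout.
func checkVAL2004(path string) (bool, error) {
    f, err := os.Open(path)
    if err != nil {
        return false, err
    }
    defer f.Close()

    var snaps []snapshot
    sc := bufio.NewScanner(f)
    for sc.Scan() {
        var s snapshot
        if err := json.Unmarshal(sc.Bytes(), &s); err != nil {
            return false, err
        }
        snaps = append(snaps, s)
    }
    if err := sc.Err(); err != nil {
        return false, err
    }
    if len(snaps) < 2 {
        return false, nil
    }
    for i := 1; i < len(snaps); i++ {
        if snaps[i].Received < snaps[i-1].Received {
            return false, nil
        }
    }
    return true, nil
}

func main() {
    ok, err := checkVAL2004("t3/queue-depth.jsonl")
    if err != nil {
        panic(err)
    }
    fmt.Println("VAL20-04 pass:", ok)
}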

4. 10-Check Matrix

  ID        Tier   Description                                        Pass criterion
  VAL20-01  T1     Health: edged starts, single segment delivered     received=1, elapsed ≤ 30 s
  VAL20-02  T2     Baseline throughput: N=10 × 64 B                   all 10 delivered, sps ≥ 0.5 seg/s
  VAL20-03  T3     Scheduler load: N=100 × 64 B                       all 100 delivered within 180 s
  VAL20-04  T3     Queue depth: non-increasing over sample window     at least 2 snapshots, and received[i] ≤ received[i+1] throughout
  VAL20-05  T2/T3  Scaling: T3 sps within 3× of T2 sps                T3 sps ≥ T2 sps / 3
  VAL20-06  T4     Large delivery: N=10 × 128 KB delivered            all 10 delivered within 60 s
  VAL20-07  T4     Bytes/sec on loopback                              bps ≥ 102 400 B/s (100 KB/s)
  VAL20-08  T5     Backpressure timing: constrained > clean           T5 elapsed_ms > T4 elapsed_ms
  VAL20-09  T4/T5  Backpressure rate: 1 Mbps throttle is visible      T5 bps < 50% of T4 bps
  VAL20-10  all    Zero loss across all tiers                         delivered segment IDs exactly match seeded IDs for every tier

Threshold rationale

  • VAL20-01 ≤ 30 s: conservative; single segment on loopback with 1 s scheduler tick should deliver in < 3 s.

  • VAL20-02 sps ≥ 0.5: 10 segments in ≤ 20 s — safe floor for 4 workers on any CI host.

  • VAL20-03 ≤ 180 s: 100 segments through 4 workers with 1 s tick = ~25 rounds theoretically; 180 s is 5× safety margin.

  • VAL20-05 within 3×: T3 has larger queue so some scheduling overhead is expected; 3× band catches regressions without requiring linear scaling.

  • VAL20-07 ≥ 100 KB/s: 10 × 128 KB = 1.28 MB; 100 KB/s implies ≤ 13 s — well below the 60 s timeout on loopback.

  • VAL20-09 T5 < 50% T4: with the 1 Mbps (125 KB/s) constraint versus loopback (typically >> 10 MB/s), the observed ratio should be below 2%, not merely below 50%; the 50% threshold is a deliberately lenient bound.

5. Metrics and Evidence

Per-tier evidence directory ($EVIDENCE_DIR/t${N}/)

  File                 Description
  setup.log            Output of setup binary
  seed-manifest.json   Seeded segment catalogue with IDs, sizes, tier label
  edge.toml            edged configuration for this tier
  edged.log            edged relay executor log (scheduling decisions, delivery events)
  peer.log             Peer server log (connection + receive events)
  peer-received.json   Cumulative delivery evidence (count + segment list)
  proxy-stats.json     Impairment proxy stats at tier completion
  tier-result.json     Computed metrics: elapsed_ms, sps, bps, zero_loss (see the sketch below)
  queue-depth.jsonl    (T3 only) Queue depth snapshots every 4 seconds
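
The tier-result.json arithmetic is simple enough to restate here. The sketch below derives segs_per_sec, bytes_per_sec, and zero_loss from the seeded and delivered segment lists; the struct layout and function name are illustrative, and the runner script remains authoritative.

package main

import (
    "fmt"
    "sort"
)

// tierResult mirrors the computed fields listed above; the JSON layout
// produced by the runner is authoritative.
type tierResult struct {
    Count       int     `json:"count"`
    SizeBytes   int     `json:"size_bytes"`
    TotalBytes  int     `json:"total_bytes"`
    ElapsedMs   int64   `json:"elapsed_ms"`
    Delivered   int     `json:"delivered"`
    SegsPerSec  float64 `json:"segs_per_sec"`
    BytesPerSec float64 `json:"bytes_per_sec"`
    ZeroLoss    bool    `json:"zero_loss"`
}

// computeTierResult derives the tier metrics from the seeded IDs
// (seed-manifest.json), the delivered IDs (peer-received.json), the
// per-segment size, and the measured elapsed time.
func computeTierResult(seeded, received []string, sizeBytes int, elapsedMs int64) tierResult {
    r := tierResult{
        Count:      len(seeded),
        SizeBytes:  sizeBytes,
        TotalBytes: len(seeded) * sizeBytes,
        ElapsedMs:  elapsedMs,
        Delivered:  len(received),
    }
    if secs := float64(elapsedMs) / 1000.0; secs > 0 {
        r.SegsPerSec = float64(r.Delivered) / secs
        r.BytesPerSec = float64(r.Delivered*sizeBytes) / secs
    }
    // zero_loss (and VAL20-10): delivered IDs exactly match seeded IDs.
    sort.Strings(seeded)
    sort.Strings(received)
    r.ZeroLoss = len(seeded) == len(received)
    if r.ZeroLoss {
        for i := range seeded {
            if seeded[i] != received[i] {
                r.ZeroLoss = false
                break
            }
        }
    }
    return r
}

func main() {
    seeded := []string{"val20-t2-001", "val20-t2-002"}
    received := []string{"val20-t2-002", "val20-t2-001"}
    fmt.Printf("%+v\n", computeTierResult(seeded, received, 64, 1823))
}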

Top-level evidence files

  File                 Description
  build.log            Go build output for all binaries
  proxy.log            Impairment proxy log (shared across all tiers)
  val20-baseline.json  Consolidated performance baseline: all tier metrics + 10-check results

val20-baseline.json schema

{
  "val": "VAL20",
  "date": "<RFC3339>",
  "passes": 10,
  "fails": 0,
  "tiers": [
    {
      "tier": "t2",
      "count": 10,
      "size_bytes": 64,
      "total_bytes": 640,
      "elapsed_ms": 1823,
      "delivered": 10,
      "segs_per_sec": 5.486,
      "bytes_per_sec": 351,
      "zero_loss": true
    }
  ],
  "backpressure_ratio": 42.1,
  "throughput_baseline": {
    "small_segment_sps": 5.486,
    "large_segment_bps": 12845056,
    "constrained_1mbps_bps": 119283
  }
}

backpressure_ratio = T4 bps / T5 bps. A ratio > 10 confirms the 1 Mbps constraint is actively throttling.
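
A sketch of the backpressure arithmetic, using the illustrative throughput figures from the schema example above and made-up elapsed times:

package main

import "fmt"

// backpressureChecks sketches VAL20-08/09 and the backpressure_ratio
// figure from the T4 and T5 tier metrics (bps in bytes/sec, elapsed in
// milliseconds).
func backpressureChecks(t4bps, t5bps float64, t4ms, t5ms int64) (ratio float64, val2008, val2009 bool) {
    ratio = t4bps / t5bps       // reported as backpressure_ratio
    val2008 = t5ms > t4ms       // constrained tier must take longer than clean
    val2009 = t5bps < 0.5*t4bps // throttle visible: under half the clean rate
    return ratio, val2008, val2009
}

func main() {
    // Throughput figures from the schema example above; elapsed times
    // are made up purely for illustration.
    ratio, v8, v9 := backpressureChecks(12_845_056, 119_283, 1_000, 10_700)
    fmt.Printf("backpressure_ratio=%.1f VAL20-08=%v VAL20-09=%v throttling=%v\n",
        ratio, v8, v9, ratio > 10)
}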

6. Throughput Characterisation Method

The measured segments/sec and bytes/sec values are end-to-end from the moment edged is started to the moment all segments appear in peer-received.json. This includes:

  • BoltDB ledger read latency (one scan per scheduling round)

  • Relay executor scheduling overhead (1 s tick, up to 50 seg/round)

  • mTLS connection establishment (per segment, or amortised when connections are pooled)

  • Payload transfer over loopback TCP

  • ACK write-back from peer to edged

The 1 s schedule tick is the dominant factor for small queues. As queue depth grows, the scheduler delivers multiple segments per tick (up to 50), so segments/sec climbs toward the connection-rate limit.
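
A back-of-the-envelope illustration of these bounds, using the section 2 configuration values and assuming every segment dispatched in a round completes before the next tick (real runs will not always honour this):

package main

import "fmt"

// Rough scheduler-side bounds derived from the edge.toml in section 2.
// This ignores mTLS handshakes, ledger reads, and ACK round trips.
func main() {
    const (
        tickSeconds      = 1.0 // schedule_interval_seconds
        segmentsPerRound = 50  // max_segments_per_scheduling_round
        concurrentRelays = 4   // max_concurrent_relays
    )

    // Ceiling imposed by the scheduler alone: one round per tick.
    fmt.Printf("scheduler ceiling: %.0f seg/s\n", float64(segmentsPerRound)/tickSeconds)

    // T1 floor: a single segment waits for at least one tick (~1 s).
    fmt.Printf("single-segment floor: >= %.0f s\n", tickSeconds)

    // T3 estimate if only concurrentRelays segments progress per round:
    // 100 / 4 = 25 rounds, roughly 25 s (matches the 180 s rationale).
    fmt.Printf("T3 rounds at %d per round: %d (~%d s)\n",
        concurrentRelays, 100/concurrentRelays, 100/concurrentRelays)
}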

The T3 and T4 completion checks are pass/fail gated on both delivery count and their stated elapsed-time thresholds:

  • VAL20-03: all 100 segments within 180 s

  • VAL20-06: all 10 × 128 KB segments within 60 s

Performance baseline interpretation

  Metric                           Typical loopback value (expected range)
  Single segment latency (T1)      1 000 – 3 000 ms
  Small segment sps (T2, N=10)     2 – 20 seg/s
  Scheduler load sps (T3, N=100)   5 – 40 seg/s
  Large payload bps (T4, 128 KB)   5 MB/s – 50 MB/s
  Constrained bps (T5, 1 Mbps)     80 KB/s – 120 KB/s

Values outside these ranges indicate either a configuration issue (too few workers, wrong scheduler interval) or a system load anomaly.

7. Run the Benchmark

export GOROOT=/home/ubuntu/.local/go1.25.7
export PATH="$GOROOT/bin:$PATH"
export GOTOOLCHAIN=local

bash scripts/labs/run_relay_throughput_val20_lab.sh

Optional custom evidence directory:

bash scripts/labs/run_relay_throughput_val20_lab.sh \
  "$PWD/evidence/val20-relay-throughput-local-$(date +%F)"

Expected runtime: 3–6 minutes (dominated by T3 100-segment delivery and T5 constrained transfer).

8. Report Template

VAL20 — Relay Throughput Benchmark
Date:           <YYYY-MM-DD>
Environment:    <OS, Go version, kernel>
Evidence dir:   <path>

Tier results:
  T1 warmup:        1 seg delivered in <X>ms
  T2 small-10:     10 seg delivered in <X>ms  (sps=<Y>)
  T3 small-100:   100 seg delivered in <X>ms  (sps=<Y>)
  T4 large-10:     10 × 128KB in <X>ms        (bps=<Y>)
  T5 constrained:  10 × 128KB @ 1Mbps in <X>ms (bps=<Y>)

Performance baseline:
  Small segment sps:           <Y> seg/s
  Large segment bytes/sec:     <Y> B/s
  Constrained 1Mbps bytes/sec: <Y> B/s
  Backpressure ratio (T4/T5):  <Y>×

10-check matrix:
  VAL20-01 PASS/FAIL  <detail>
  VAL20-02 PASS/FAIL  <detail>
  VAL20-03 PASS/FAIL  <detail>
  VAL20-04 PASS/FAIL  <detail>
  VAL20-05 PASS/FAIL  <detail>
  VAL20-06 PASS/FAIL  <detail>
  VAL20-07 PASS/FAIL  <detail>
  VAL20-08 PASS/FAIL  <detail>
  VAL20-09 PASS/FAIL  <detail>
  VAL20-10 PASS/FAIL  <detail>

Overall: PASS=<N> FAIL=<N>

9. Tooling

  File                                              Role
  scripts/labs/run_relay_throughput_val20_lab.sh    Benchmark runner (this lab’s entry point)
  scripts/labs/edge_relay_throughput_setup.go       Setup binary — TLS, BoltDB, PENDING segment seeds
  scripts/labs/relay_impairment_proxy.go            Reused from VAL19 — clean/bandwidth proxy modes
  scripts/labs/edge_deadletter_lab_peer.go          Reused from deadletter lab — mTLS peer server

10. Known Limitations

  • 1 s scheduler tick floor: T1 latency is always ≥ 1 s due to schedule_interval_seconds=1. This matches production configuration — the benchmark exposes this, not a defect.

  • BoltDB single-writer lock: each tier restarts edged against a fresh DB directory; concurrent multi-instance benchmarking is out of scope.

  • Loopback ceiling: bytes/sec figures reflect loopback TCP, not real network. The benchmark establishes a floor; real-network deployment will be lower.

  • No “packet loss” tier: true packet loss requires tc netem (root/CAP_NET_ADMIN). The cutoff proxy mode (tested in VAL19) is the closest available approximation.