VAL20 — Relay Throughput Benchmark¶
Audience: engineers and reviewers who want a reproducible local benchmark for edge relay executor throughput. Complements VAL19 (impairment correctness) by characterising raw delivery rate and backpressure behaviour under load.
1. Scope¶
VAL20 validates three operational goals:
Throughput baseline — measure segments/sec and bytes/sec delivered end-to-end over a loopback mTLS relay session under clean network conditions.
Scheduler scalability — confirm that throughput does not catastrophically degrade as queue depth grows from 1 to 100 segments.
Backpressure evidence — demonstrate that the 1 Mbps bandwidth constraint (via VAL19 impairment proxy) visibly reduces bytes/sec compared to clean mode.
Out of scope:
Multi-peer relay (VAL20 uses one peer: peer-val20)
Persistent-across-restart durability (covered by VAL11/VAL15)
Relay ledger correctness (covered by PR-14/PR-15 deadletter lab)
Bandwidth enforcement correctness at the control plane (covered by PR-16)
Relay under packet loss / latency / outage (covered by VAL19)
2. Architecture¶
edged (relay executor)
→ relay_impairment_proxy:19041 ← proxy in clean or bandwidth mode
→ edge_deadletter_lab_peer:19042 ← mTLS receive + JSON evidence
All traffic is loopback. The impairment proxy (relay_impairment_proxy.go)
is reused from VAL19 and is kept in clean mode for tiers T1–T4. Tier T5
applies a 1 Mbps bandwidth constraint to demonstrate backpressure.
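For orientation, here is a minimal sketch of the kind of mTLS session edged opens towards the proxy endpoint. The certificate paths and ServerName are illustrative placeholders, not the lab's actual TLS material, and the segment framing edged uses on the connection is not shown.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"log"
	"os"
)

func main() {
	// Hypothetical paths; the real lab generates per-tier TLS material.
	cert, err := tls.LoadX509KeyPair("client.crt", "client.key")
	if err != nil {
		log.Fatal(err)
	}
	caPEM, err := os.ReadFile("ca.crt")
	if err != nil {
		log.Fatal(err)
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caPEM)

	// Dial the impairment proxy on loopback; the proxy forwards to the peer.
	conn, err := tls.Dial("tcp", "127.0.0.1:19041", &tls.Config{
		Certificates: []tls.Certificate{cert},
		RootCAs:      pool,
		ServerName:   "peer-val20", // illustrative; must match the peer certificate
	})
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	// Segment framing over the connection is edged-internal and not shown here.
}
```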
Port assignments (isolated from VAL19 19030–19033)¶
| Component | Address |
|---|---|
| edged | |
| proxy (edged→) | 127.0.0.1:19041 |
| peer server | 127.0.0.1:19042 |
| proxy ctrl API | |
Relay executor configuration¶
The edge.toml generated by edge_relay_throughput_setup.go increases
concurrency relative to the deadletter lab defaults:
[scheduler]
max_concurrent_relays = 4
schedule_interval_seconds = 1
max_segments_per_scheduling_round = 50
[relay]
worker_count = 4
dial_timeout_seconds = 5
ack_timeout_seconds = 5
The higher concurrency ensures the benchmark is limited by the loopback network rather than by the 1 s scheduler tick.
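As a rough illustration of how these knobs map onto code, the sketch below decodes the two sections with the BurntSushi/toml package; the struct is illustrative and is not edged's real configuration type.

```go
package main

import (
	"fmt"
	"log"

	"github.com/BurntSushi/toml"
)

// Illustrative mirror of the [scheduler] and [relay] sections above.
type edgeConfig struct {
	Scheduler struct {
		MaxConcurrentRelays           int `toml:"max_concurrent_relays"`
		ScheduleIntervalSeconds       int `toml:"schedule_interval_seconds"`
		MaxSegmentsPerSchedulingRound int `toml:"max_segments_per_scheduling_round"`
	} `toml:"scheduler"`
	Relay struct {
		WorkerCount        int `toml:"worker_count"`
		DialTimeoutSeconds int `toml:"dial_timeout_seconds"`
		AckTimeoutSeconds  int `toml:"ack_timeout_seconds"`
	} `toml:"relay"`
}

func main() {
	var cfg edgeConfig
	if _, err := toml.DecodeFile("edge.toml", &cfg); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("workers=%d tick=%ds max/round=%d\n",
		cfg.Relay.WorkerCount,
		cfg.Scheduler.ScheduleIntervalSeconds,
		cfg.Scheduler.MaxSegmentsPerSchedulingRound)
}
```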
3. Workload Tiers¶
Segments are seeded to PENDING state (not deadletter). The relay executor
picks them up automatically on the first scheduling round — no relay deadletter retry invocations are needed.
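To make "seeded to PENDING" concrete, here is a minimal sketch of pre-staging segment records in a fresh BoltDB file before edged starts. The bucket name, key layout, and record shape are hypothetical; the actual ledger schema written by edge_relay_throughput_setup.go is not documented here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"

	bolt "go.etcd.io/bbolt"
)

func main() {
	db, err := bolt.Open("edge.db", 0o600, nil) // hypothetical ledger path
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	err = db.Update(func(tx *bolt.Tx) error {
		// "segments" is a hypothetical bucket name, not edged's real schema.
		b, err := tx.CreateBucketIfNotExists([]byte("segments"))
		if err != nil {
			return err
		}
		for n := 1; n <= 10; n++ {
			id := fmt.Sprintf("val20-t2-%03d", n)
			rec, _ := json.Marshal(map[string]any{
				"id":     id,
				"status": "PENDING", // the executor picks these up on its first round
				"size":   64,
			})
			if err := b.Put([]byte(id), rec); err != nil {
				return err
			}
		}
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}
}
```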
| Tier | Label | N | Size | Proxy mode | Purpose |
|---|---|---|---|---|---|
| T1 | warmup | 1 | 64 B | clean | Baseline single-segment latency |
| T2 | small-10 | 10 | 64 B | clean | Baseline segments/sec |
| T3 | small-100 | 100 | 64 B | clean | Scheduler throughput under load |
| T4 | large-10 | 10 | 128 KB | clean | Bytes/sec characterisation |
| T5 | constrained | 10 | 128 KB | 1 Mbps | Backpressure evidence |
Each tier uses a fresh edged and peer instance (separate BoltDB and TLS
material) to prevent ledger state from one tier affecting another. The
impairment proxy is shared and stays up across all tiers.
Segment seeding¶
edge_relay_throughput_setup.go generates segment IDs of the form
val20-${tier}-${N:03d} (e.g. val20-t3-001 through val20-t3-100).
Each segment payload is deterministic:
val20:<segment_id>:AAAA...
(prefix up to 32 bytes; remainder filled with A).
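A minimal sketch of the documented ID and payload convention (the exact truncation behaviour of the setup binary is assumed):

```go
package main

import "fmt"

// segmentID follows the documented pattern val20-${tier}-${N:03d}.
func segmentID(tier string, n int) string {
	return fmt.Sprintf("val20-%s-%03d", tier, n)
}

// payload builds the deterministic body: a "val20:<segment_id>:" prefix
// (capped at 32 bytes, per the description above) followed by 'A' filler
// up to the tier's segment size.
func payload(id string, size int) []byte {
	prefix := []byte("val20:" + id + ":")
	if len(prefix) > 32 {
		prefix = prefix[:32]
	}
	if len(prefix) >= size {
		return prefix[:size]
	}
	out := make([]byte, size)
	copy(out, prefix)
	for i := len(prefix); i < size; i++ {
		out[i] = 'A'
	}
	return out
}

func main() {
	id := segmentID("t3", 1) // "val20-t3-001"
	fmt.Printf("%s -> %d-byte payload\n", id, len(payload(id, 64)))
}
```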
Queue depth monitoring¶
During T3 (N=100), the runner writes an initial snapshot immediately, samples
peer-received.json every 4 seconds while the tier is running, and records a
final snapshot at tier completion in t3/queue-depth.jsonl. Each snapshot
records:
{"snapshot": 3, "elapsed_s": 12, "received": 48, "pending": 52}
VAL20-04 requires at least two snapshots and verifies that the received
sequence is non-decreasing across the full captured window.
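A sketch of how the VAL20-04 criterion can be re-derived from t3/queue-depth.jsonl using the snapshot fields shown above; the lab runner performs the authoritative check.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"log"
	"os"
)

type snapshot struct {
	Snapshot int `json:"snapshot"`
	ElapsedS int `json:"elapsed_s"`
	Received int `json:"received"`
	Pending  int `json:"pending"`
}

func main() {
	f, err := os.Open("t3/queue-depth.jsonl") // relative to $EVIDENCE_DIR
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	var snaps []snapshot
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		if len(sc.Bytes()) == 0 {
			continue // tolerate blank lines
		}
		var s snapshot
		if err := json.Unmarshal(sc.Bytes(), &s); err != nil {
			log.Fatal(err)
		}
		snaps = append(snaps, s)
	}
	if err := sc.Err(); err != nil {
		log.Fatal(err)
	}

	// VAL20-04: at least two snapshots, received never decreasing.
	pass := len(snaps) >= 2
	for i := 1; i < len(snaps); i++ {
		if snaps[i].Received < snaps[i-1].Received {
			pass = false
		}
	}
	fmt.Println("VAL20-04 pass:", pass)
}
```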
4. 10-Check Matrix¶
| ID | Tier | Description | Pass criterion |
|---|---|---|---|
| VAL20-01 | T1 | Health: edged starts, single segment delivered | received=1, elapsed ≤ 30 s |
| VAL20-02 | T2 | Baseline throughput: N=10 × 64 B | all 10 delivered, sps ≥ 0.5 seg/s |
| VAL20-03 | T3 | Scheduler load: N=100 × 64 B | all 100 delivered within 180 s |
| VAL20-04 | T3 | Queue depth: non-increasing over sample window | at least 2 snapshots, and received count non-decreasing across the window |
| VAL20-05 | T2/3 | Scaling: T3 sps within 3× of T2 sps | T3 sps ≥ T2 sps / 3 |
| VAL20-06 | T4 | Large delivery: N=10 × 128 KB delivered | all 10 delivered within 60 s |
| VAL20-07 | T4 | Bytes/sec on loopback | bps ≥ 102 400 B/s (100 KB/s) |
| VAL20-08 | T5 | Backpressure timing: constrained > clean | T5 elapsed_ms > T4 elapsed_ms |
| VAL20-09 | T4/5 | Backpressure rate: 1 Mbps throttle is visible | T5 bps < 50% of T4 bps |
| VAL20-10 | all | Zero loss across all tiers | delivered segment IDs exactly match seeded IDs for every tier |
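VAL20-10 is a plain set-equality check. The sketch below regenerates the expected IDs from the documented naming pattern and compares them with a delivered-ID list; how the runner extracts the delivered list from peer-received.json is not shown here, so the input is a simple string slice.

```go
package main

import "fmt"

// zeroLoss reports whether the delivered IDs are exactly the seeded IDs
// val20-<tier>-001 … val20-<tier>-<n>, with no gaps, extras, or duplicates.
func zeroLoss(tier string, n int, delivered []string) bool {
	if len(delivered) != n {
		return false
	}
	expected := make(map[string]bool, n)
	for i := 1; i <= n; i++ {
		expected[fmt.Sprintf("val20-%s-%03d", tier, i)] = true
	}
	for _, id := range delivered {
		if !expected[id] {
			return false // unknown ID, or a duplicate seen after the delete below
		}
		delete(expected, id)
	}
	return len(expected) == 0
}

func main() {
	delivered := []string{"val20-t2-001", "val20-t2-002", "val20-t2-003"}
	fmt.Println("zero loss:", zeroLoss("t2", 3, delivered))
}
```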
Threshold rationale¶
VAL20-01 ≤ 30 s: conservative; single segment on loopback with 1 s scheduler tick should deliver in < 3 s.
VAL20-02 sps ≥ 0.5: 10 segments in ≤ 20 s — safe floor for 4 workers on any CI host.
VAL20-03 ≤ 180 s: 100 segments through 4 workers with 1 s tick = ~25 rounds theoretically; 180 s is 5× safety margin.
VAL20-05 within 3×: T3 has larger queue so some scheduling overhead is expected; 3× band catches regressions without requiring linear scaling.
VAL20-07 ≥ 100 KB/s: 10 × 128 KB = 1.28 MB; 100 KB/s implies ≤ 13 s — well below the 60 s timeout on loopback.
VAL20-09 T5 < 50% T4: with a 1 Mbps (125 KB/s) constraint against loopback (typically >> 10 MB/s), T5 bps should land below 2% of T4 bps, not merely below 50%; the 50% threshold is a deliberately loose bound.
5. Metrics and Evidence¶
Per-tier evidence directory ($EVIDENCE_DIR/t${N}/)¶
| File | Description |
|---|---|
| | Output of setup binary |
| | Seeded segment catalogue with IDs, sizes, tier label |
| edge.toml | edged configuration for this tier |
| | edged relay executor log (scheduling decisions, delivery events) |
| | Peer server log (connection + receive events) |
| peer-received.json | Cumulative delivery evidence (count + segment list) |
| | Impairment proxy stats at tier completion |
| | Computed metrics: elapsed_ms, sps, bps, zero_loss |
| queue-depth.jsonl | (T3 only) Queue depth snapshots every 4 seconds |
Top-level evidence files¶
| File | Description |
|---|---|
| | Go build output for all binaries |
| | Impairment proxy log (shared across all tiers) |
| val20-baseline.json | Consolidated performance baseline: all tier metrics + 10-check results |
val20-baseline.json schema¶
{
"val": "VAL20",
"date": "<RFC3339>",
"passes": 10,
"fails": 0,
"tiers": [
{
"tier": "t2",
"count": 10,
"size_bytes": 64,
"total_bytes": 640,
"elapsed_ms": 1823,
"delivered": 10,
"segs_per_sec": 5.486,
"bytes_per_sec": 351,
"zero_loss": true
}
],
"backpressure_ratio": 42.1,
"throughput_baseline": {
"small_segment_sps": 5.486,
"large_segment_bps": 12845056,
"constrained_1mbps_bps": 119283
}
}
backpressure_ratio = T4 bps / T5 bps. A ratio > 10 confirms the 1 Mbps
constraint is actively throttling.
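For reviewers consuming the baseline programmatically, here is a sketch that loads val20-baseline.json and re-applies the two headline assertions (zero loss in every tier, backpressure ratio > 10). Only fields from the schema above are used.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"os"
)

type tierMetrics struct {
	Tier      string `json:"tier"`
	Count     int    `json:"count"`
	Delivered int    `json:"delivered"`
	ZeroLoss  bool   `json:"zero_loss"`
}

type baseline struct {
	Val               string        `json:"val"`
	Passes            int           `json:"passes"`
	Fails             int           `json:"fails"`
	Tiers             []tierMetrics `json:"tiers"`
	BackpressureRatio float64       `json:"backpressure_ratio"`
}

func main() {
	raw, err := os.ReadFile("val20-baseline.json")
	if err != nil {
		log.Fatal(err)
	}
	var b baseline
	if err := json.Unmarshal(raw, &b); err != nil {
		log.Fatal(err)
	}
	for _, t := range b.Tiers {
		if !t.ZeroLoss || t.Delivered != t.Count {
			log.Fatalf("tier %s lost segments", t.Tier)
		}
	}
	if b.BackpressureRatio <= 10 {
		log.Fatalf("backpressure ratio %.1f too low: throttle not visible", b.BackpressureRatio)
	}
	fmt.Printf("%s: passes=%d fails=%d ratio=%.1f\n", b.Val, b.Passes, b.Fails, b.BackpressureRatio)
}
```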
6. Throughput Characterisation Method¶
The measured segments/sec and bytes/sec values are end-to-end from the
moment edged is started to the moment all segments appear in
peer-received.json. This includes:
BoltDB ledger read latency (one scan per scheduling round)
Relay executor scheduling overhead (1 s tick, up to 50 seg/round)
mTLS connection establishment (per segment or per connection pool)
Payload transfer over loopback TCP
ACK write-back from peer to edged
The 1 s schedule tick is the dominant factor for small queues. As queue depth grows, the scheduler delivers multiple segments per tick (up to 50), so segments/sec climbs toward the connection-rate limit.
The T3 and T4 completion checks are pass/fail gated on both delivery count and their stated elapsed-time thresholds:
VAL20-03: all 100 segments within 180 s
VAL20-06: all 10 × 128 KB segments within 60 s
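The per-tier rates in val20-baseline.json follow directly from the delivered count, total payload bytes, and wall-clock elapsed time. A sketch of the arithmetic, matching the T2 example in the schema (10 segments, 640 bytes, 1823 ms gives 5.486 seg/s and 351 B/s):

```go
package main

import "fmt"

// tierRates computes segments/sec and bytes/sec from the end-to-end
// elapsed time, as recorded per tier in val20-baseline.json.
func tierRates(delivered, totalBytes, elapsedMs int) (sps, bps float64) {
	secs := float64(elapsedMs) / 1000.0
	return float64(delivered) / secs, float64(totalBytes) / secs
}

func main() {
	// T2 example from the schema: 10 × 64 B delivered in 1823 ms.
	sps, bps := tierRates(10, 640, 1823)
	fmt.Printf("segs_per_sec=%.3f bytes_per_sec=%.0f\n", sps, bps) // ≈ 5.486 and ≈ 351
}
```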
Performance baseline interpretation¶
| Metric | Typical loopback value (expected range) |
|---|---|
| Single segment latency (T1) | 1 000 – 3 000 ms |
| Small segment sps (T2, N=10) | 2 – 20 seg/s |
| Scheduler load sps (T3, N=100) | 5 – 40 seg/s |
| Large payload bps (T4, 128 KB) | 5 MB/s – 50 MB/s |
| Constrained bps (T5, 1 Mbps) | 80 KB/s – 120 KB/s |
Values outside these ranges indicate either a configuration issue (too few workers, wrong scheduler interval) or a system load anomaly.
7. Run the Benchmark¶
export GOROOT=/home/ubuntu/.local/go1.25.7
export PATH="$GOROOT/bin:$PATH"
export GOTOOLCHAIN=local
bash scripts/labs/run_relay_throughput_val20_lab.sh
Optional custom evidence directory:
bash scripts/labs/run_relay_throughput_val20_lab.sh \
"$PWD/evidence/val20-relay-throughput-local-$(date +%F)"
Expected runtime: 3–6 minutes (dominated by T3 100-segment delivery and T5 constrained transfer).
8. Report Template¶
VAL20 — Relay Throughput Benchmark
Date: <YYYY-MM-DD>
Environment: <OS, Go version, kernel>
Evidence dir: <path>
Tier results:
T1 warmup: 1 seg delivered in <X>ms
T2 small-10: 10 seg delivered in <X>ms (sps=<Y>)
T3 small-100: 100 seg delivered in <X>ms (sps=<Y>)
T4 large-10: 10 × 128KB in <X>ms (bps=<Y>)
T5 constrained: 10 × 128KB @ 1Mbps in <X>ms (bps=<Y>)
Performance baseline:
Small segment sps: <Y> seg/s
Large segment bytes/sec: <Y> B/s
Constrained 1Mbps bytes/sec: <Y> B/s
Backpressure ratio (T4/T5): <Y>×
10-check matrix:
VAL20-01 PASS/FAIL <detail>
VAL20-02 PASS/FAIL <detail>
VAL20-03 PASS/FAIL <detail>
VAL20-04 PASS/FAIL <detail>
VAL20-05 PASS/FAIL <detail>
VAL20-06 PASS/FAIL <detail>
VAL20-07 PASS/FAIL <detail>
VAL20-08 PASS/FAIL <detail>
VAL20-09 PASS/FAIL <detail>
VAL20-10 PASS/FAIL <detail>
Overall: PASS=<N> FAIL=<N>
9. Tooling¶
| File | Role |
|---|---|
| scripts/labs/run_relay_throughput_val20_lab.sh | Benchmark runner (this lab's entry point) |
| edge_relay_throughput_setup.go | Setup binary — TLS, BoltDB, PENDING segment seeds |
| relay_impairment_proxy.go | Reused from VAL19 — clean/bandwidth proxy modes |
| edge_deadletter_lab_peer | Reused from deadletter lab — mTLS peer server |
10. Known Limitations¶
1 s scheduler tick floor: T1 latency is always ≥ 1 s due to schedule_interval_seconds = 1. This matches the production configuration; the benchmark exposes this behaviour rather than a defect.
BoltDB single-writer lock: each tier restarts edged against a fresh DB directory; concurrent multi-instance benchmarking is out of scope.
Loopback ceiling: bytes/sec figures reflect loopback TCP, not a real network. The benchmark establishes a local baseline; real-network throughput will be lower.
No "packet loss" tier: true packet loss requires tc netem (root/CAP_NET_ADMIN). The cutoff proxy mode (tested in VAL19) is the closest available approximation.