VAL20 — Relay Throughput Benchmark

Audience: engineers and reviewers who want a reproducible local benchmark for edge relay executor throughput. Complements VAL19 (impairment correctness) by characterising raw delivery rate and backpressure behaviour under load.

1. Scope

VAL20 validates three operational goals:

  1. Throughput baseline — measure segments/sec and bytes/sec delivered end-to-end over a loopback mTLS relay session under clean network conditions.

  2. Scheduler scalability — confirm that throughput does not catastrophically degrade as queue depth grows from 1 to 100 segments.

  3. Backpressure evidence — demonstrate that the 1 Mbps bandwidth constraint (via VAL19 impairment proxy) visibly reduces bytes/sec compared to clean mode.

Out of scope:

  • Multi-peer relay (VAL20 uses one peer: peer-val20)

  • Persistent-across-restart durability (covered by VAL11/VAL15)

  • Relay ledger correctness (covered by PR-14/PR-15 deadletter lab)

  • Bandwidth enforcement correctness at the control plane (covered by PR-16)

  • Relay under packet loss / latency / outage (covered by VAL19)

2. Architecture

edged (relay executor)
  → relay_impairment_proxy:19041   ← proxy in clean or bandwidth mode
      → edge_deadletter_lab_peer:19042  ← mTLS receive + JSON evidence

All traffic is loopback. The impairment proxy (relay_impairment_proxy.go) is reused from VAL19 and is kept in clean mode for tiers T1–T4. Tier T5 applies a 1 Mbps bandwidth constraint to demonstrate backpressure.
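
For orientation, the sketch below shows how a runner could ask the proxy to switch from clean mode to the 1 Mbps constraint before T5. The control path (/mode) and the bandwidth_kbps field are hypothetical placeholders for illustration only; the actual control API on port 19043 is whatever relay_impairment_proxy.go (from VAL19) implements.

package main

import (
    "bytes"
    "fmt"
    "net/http"
)

func main() {
    // Hypothetical control call: the endpoint path and JSON fields are
    // placeholders, not the documented VAL19 proxy API. Shown only to
    // illustrate that the runner flips the shared proxy into bandwidth
    // mode for T5 and leaves it in clean mode for T1-T4.
    body := bytes.NewBufferString(`{"mode":"bandwidth","bandwidth_kbps":1000}`)
    resp, err := http.Post("http://127.0.0.1:19043/mode", "application/json", body)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    fmt.Println("proxy control response:", resp.Status)
}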

Port assignments (isolated from the VAL19 range 19030–19033)

  Component        Address
  edged            127.0.0.1:19040
  proxy (edged→)   127.0.0.1:19041
  peer server      127.0.0.1:19042
  proxy ctrl API   127.0.0.1:19043

Relay executor configuration

The edge.toml generated by edge_relay_throughput_setup.go increases concurrency relative to the deadletter lab defaults:

[scheduler]
max_concurrent_relays = 4
schedule_interval_seconds = 1
max_segments_per_scheduling_round = 50

[relay]
worker_count = 4
dial_timeout_seconds = 5
ack_timeout_seconds = 5

The higher concurrency ensures the benchmark is bounded by the loopback network rather than by the 1 s scheduler tick.

3. Workload Tiers

Segments are seeded to PENDING state (not deadletter). The relay executor picks them up automatically on the first scheduling round — no relay deadletter retry invocations are needed.

  Tier  Label        N    Size    Proxy mode  Purpose
  T1    warmup       1    64 B    clean       Baseline single-segment latency
  T2    small-10     10   64 B    clean       Baseline segments/sec
  T3    small-100    100  64 B    clean       Scheduler throughput under load
  T4    large-10     10   128 KB  clean       Bytes/sec characterisation
  T5    constrained  10   128 KB  1 Mbps      Backpressure evidence

Each tier uses a fresh edged and peer instance (separate BoltDB and TLS material) to prevent ledger state from one tier affecting another. The impairment proxy is shared and stays up across all tiers.

Segment seeding

edge_relay_throughput_setup.go generates segment IDs of the form val20-${tier}-${N:03d} (e.g. val20-t3-001 through val20-t3-100).

Each segment payload is deterministic:

val20:<segment_id>:AAAA...

(prefix up to 32 bytes; remainder filled with A).
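
A minimal sketch of this payload construction (the helper name and the exact handling of prefixes longer than 32 bytes are illustrative; edge_relay_throughput_setup.go is authoritative):

package main

import (
    "fmt"
    "strings"
)

// buildPayload mirrors the layout described above: a "val20:<segment_id>:"
// prefix (capped here at 32 bytes) followed by 'A' padding up to the
// tier's segment size.
func buildPayload(segmentID string, size int) []byte {
    prefix := fmt.Sprintf("val20:%s:", segmentID)
    if len(prefix) > 32 {
        prefix = prefix[:32]
    }
    if len(prefix) >= size {
        return []byte(prefix[:size])
    }
    return []byte(prefix + strings.Repeat("A", size-len(prefix)))
}

func main() {
    // T2 example: a 64-byte segment for val20-t2-001.
    p := buildPayload("val20-t2-001", 64)
    fmt.Printf("%d bytes: %s\n", len(p), p)
}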

Queue depth monitoring

During T3 (N=100), the runner writes an initial snapshot immediately, samples peer-received.json every 4 seconds while the tier is running, and records a final snapshot at tier completion; all snapshots are appended to t3/queue-depth.jsonl. Each snapshot records:

{"snapshot": 3, "elapsed_s": 12, "received": 48, "pending": 52}

VAL20-04 requires at least two snapshots and verifies that the received sequence is non-decreasing across the full captured window.
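
A sketch of one way to evaluate the VAL20-04 criterion against queue-depth.jsonl, assuming the snapshot field names shown above:

package main

import (
    "bufio"
    "encoding/json"
    "fmt"
    "os"
)

// snapshot mirrors the queue-depth.jsonl records shown above.
type snapshot struct {
    Snapshot int `json:"snapshot"`
    ElapsedS int `json:"elapsed_s"`
    Received int `json:"received"`
    Pending  int `json:"pending"`
}

// checkVAL2004 sketches the VAL20-04 criterion: at least two snapshots,
// with the received count non-decreasing throughout.
func checkVAL2004(path string) (bool, error) {
    f, err := os.Open(path)
    if err != nil {
        return false, err
    }
    defer f.Close()

    var snaps []snapshot
    sc := bufio.NewScanner(f)
    for sc.Scan() {
        var s snapshot
        if err := json.Unmarshal(sc.Bytes(), &s); err != nil {
            return false, err
        }
        snaps = append(snaps, s)
    }
    if err := sc.Err(); err != nil {
        return false, err
    }
    if len(snaps) < 2 {
        return false, nil
    }
    for i := 1; i < len(snaps); i++ {
        if snaps[i].Received < snaps[i-1].Received {
            return false, nil
        }
    }
    return true, nil
}

func main() {
    ok, err := checkVAL2004("t3/queue-depth.jsonl")
    if err != nil {
        panic(err)
    }
    fmt.Println("VAL20-04 pass:", ok)
}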

4. 10-Check Matrix

  ID        Tier   Description                                        Pass criterion
  VAL20-01  T1     Health: edged starts, single segment delivered     received=1, elapsed ≤ 30 s
  VAL20-02  T2     Baseline throughput: N=10 × 64 B                   all 10 delivered, sps ≥ 0.5 seg/s
  VAL20-03  T3     Scheduler load: N=100 × 64 B                       all 100 delivered within 180 s
  VAL20-04  T3     Queue depth: non-increasing over sample window     at least 2 snapshots, and received[i] ≤ received[i+1] throughout
  VAL20-05  T2/T3  Scaling: T3 sps within 3× of T2 sps                T3 sps ≥ T2 sps / 3
  VAL20-06  T4     Large delivery: N=10 × 128 KB delivered            all 10 delivered within 60 s
  VAL20-07  T4     Bytes/sec on loopback                              bps ≥ 102 400 B/s (100 KB/s)
  VAL20-08  T5     Backpressure timing: constrained > clean           T5 elapsed_ms > T4 elapsed_ms
  VAL20-09  T4/T5  Backpressure rate: 1 Mbps throttle is visible      T5 bps < 50% of T4 bps
  VAL20-10  all    Zero loss across all tiers                         delivered segment IDs exactly match seeded IDs for every tier

Threshold rationale

  • VAL20-01 ≤ 30 s: conservative; single segment on loopback with 1 s scheduler tick should deliver in < 3 s.

  • VAL20-02 sps ≥ 0.5: 10 segments in ≤ 20 s — safe floor for 4 workers on any CI host.

  • VAL20-03 ≤ 180 s: 100 segments through 4 workers with 1 s tick = ~25 rounds theoretically; 180 s is 5× safety margin.

  • VAL20-05 within 3×: T3 has larger queue so some scheduling overhead is expected; 3× band catches regressions without requiring linear scaling.

  • VAL20-07 ≥ 100 KB/s: 10 × 128 KB = 1.28 MB; 100 KB/s implies ≤ 13 s — well below the 60 s timeout on loopback.

  • VAL20-09 T5 < 50% T4: with the 1 Mbps (125 KB/s) constraint versus loopback (typically >> 10 MB/s), the observed ratio should be below 2%, not merely below 50%; the 50% threshold is a deliberately lenient bound.

5. Metrics and Evidence

Per-tier evidence directory ($EVIDENCE_DIR/t${N}/)

  File                 Description
  setup.log            Output of setup binary
  seed-manifest.json   Seeded segment catalogue with IDs, sizes, tier label
  edge.toml            edged configuration for this tier
  edged.log            edged relay executor log (scheduling decisions, delivery events)
  peer.log             Peer server log (connection + receive events)
  peer-received.json   Cumulative delivery evidence (count + segment list)
  proxy-stats.json     Impairment proxy stats at tier completion
  tier-result.json     Computed metrics: elapsed_ms, sps, bps, zero_loss (see the sketch below)
  queue-depth.jsonl    (T3 only) Queue depth snapshots every 4 seconds
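
The tier-result.json arithmetic is simple enough to restate here. The sketch below derives segs_per_sec, bytes_per_sec, and zero_loss from the seeded and delivered segment lists; the struct layout and function name are illustrative, and the runner script remains authoritative.

package main

import (
    "fmt"
    "sort"
)

// tierResult mirrors the computed fields listed above; the JSON layout
// produced by the runner is authoritative.
type tierResult struct {
    Count       int     `json:"count"`
    SizeBytes   int     `json:"size_bytes"`
    TotalBytes  int     `json:"total_bytes"`
    ElapsedMs   int64   `json:"elapsed_ms"`
    Delivered   int     `json:"delivered"`
    SegsPerSec  float64 `json:"segs_per_sec"`
    BytesPerSec float64 `json:"bytes_per_sec"`
    ZeroLoss    bool    `json:"zero_loss"`
}

// computeTierResult derives the tier metrics from the seeded IDs
// (seed-manifest.json), the delivered IDs (peer-received.json), the
// per-segment size, and the measured elapsed time.
func computeTierResult(seeded, received []string, sizeBytes int, elapsedMs int64) tierResult {
    r := tierResult{
        Count:      len(seeded),
        SizeBytes:  sizeBytes,
        TotalBytes: len(seeded) * sizeBytes,
        ElapsedMs:  elapsedMs,
        Delivered:  len(received),
    }
    if secs := float64(elapsedMs) / 1000.0; secs > 0 {
        r.SegsPerSec = float64(r.Delivered) / secs
        r.BytesPerSec = float64(r.Delivered*sizeBytes) / secs
    }
    // zero_loss (and VAL20-10): delivered IDs exactly match seeded IDs.
    sort.Strings(seeded)
    sort.Strings(received)
    r.ZeroLoss = len(seeded) == len(received)
    if r.ZeroLoss {
        for i := range seeded {
            if seeded[i] != received[i] {
                r.ZeroLoss = false
                break
            }
        }
    }
    return r
}

func main() {
    seeded := []string{"val20-t2-001", "val20-t2-002"}
    received := []string{"val20-t2-002", "val20-t2-001"}
    fmt.Printf("%+v\n", computeTierResult(seeded, received, 64, 1823))
}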

Top-level evidence files

  File                 Description
  build.log            Go build output for all binaries
  proxy.log            Impairment proxy log (shared across all tiers)
  val20-baseline.json  Consolidated performance baseline: all tier metrics + 10-check results

val20-baseline.json schema

{
  "val": "VAL20",
  "date": "<RFC3339>",
  "passes": 10,
  "fails": 0,
  "tiers": [
    {
      "tier": "t2",
      "count": 10,
      "size_bytes": 64,
      "total_bytes": 640,
      "elapsed_ms": 1823,
      "delivered": 10,
      "segs_per_sec": 5.486,
      "bytes_per_sec": 351,
      "zero_loss": true
    }
  ],
  "backpressure_ratio": 42.1,
  "throughput_baseline": {
    "small_segment_sps": 5.486,
    "large_segment_bps": 12845056,
    "constrained_1mbps_bps": 119283
  }
}

backpressure_ratio = T4 bps / T5 bps. A ratio > 10 confirms the 1 Mbps constraint is actively throttling.
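
A sketch of the backpressure arithmetic, using the illustrative throughput figures from the schema example above and made-up elapsed times:

package main

import "fmt"

// backpressureChecks sketches VAL20-08/09 and the backpressure_ratio
// figure from the T4 and T5 tier metrics (bps in bytes/sec, elapsed in
// milliseconds).
func backpressureChecks(t4bps, t5bps float64, t4ms, t5ms int64) (ratio float64, val2008, val2009 bool) {
    ratio = t4bps / t5bps       // reported as backpressure_ratio
    val2008 = t5ms > t4ms       // constrained tier must take longer than clean
    val2009 = t5bps < 0.5*t4bps // throttle visible: under half the clean rate
    return ratio, val2008, val2009
}

func main() {
    // Throughput figures from the schema example above; elapsed times
    // are made up purely for illustration.
    ratio, v8, v9 := backpressureChecks(12_845_056, 119_283, 1_000, 10_700)
    fmt.Printf("backpressure_ratio=%.1f VAL20-08=%v VAL20-09=%v throttling=%v\n",
        ratio, v8, v9, ratio > 10)
}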

6. Throughput Characterisation Method

The measured segments/sec and bytes/sec values are end-to-end from the moment edged is started to the moment all segments appear in peer-received.json. This includes:

  • BoltDB ledger read latency (one scan per scheduling round)

  • Relay executor scheduling overhead (1 s tick, up to 50 seg/round)

  • mTLS connection establishment (per segment, or amortised when connections are pooled)

  • Payload transfer over loopback TCP

  • ACK write-back from peer to edged

The 1 s schedule tick is the dominant factor for small queues. As queue depth grows, the scheduler delivers multiple segments per tick (up to 50), so segments/sec climbs toward the connection-rate limit.
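
A back-of-the-envelope illustration of these bounds, using the section 2 configuration values and assuming every segment dispatched in a round completes before the next tick (real runs will not always honour this):

package main

import "fmt"

// Rough scheduler-side bounds derived from the edge.toml in section 2.
// This ignores mTLS handshakes, ledger reads, and ACK round trips.
func main() {
    const (
        tickSeconds      = 1.0 // schedule_interval_seconds
        segmentsPerRound = 50  // max_segments_per_scheduling_round
        concurrentRelays = 4   // max_concurrent_relays
    )

    // Ceiling imposed by the scheduler alone: one round per tick.
    fmt.Printf("scheduler ceiling: %.0f seg/s\n", float64(segmentsPerRound)/tickSeconds)

    // T1 floor: a single segment waits for at least one tick (~1 s).
    fmt.Printf("single-segment floor: >= %.0f s\n", tickSeconds)

    // T3 estimate if only concurrentRelays segments progress per round:
    // 100 / 4 = 25 rounds, roughly 25 s (matches the 180 s rationale).
    fmt.Printf("T3 rounds at %d per round: %d (~%d s)\n",
        concurrentRelays, 100/concurrentRelays, 100/concurrentRelays)
}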

The T3 and T4 completion checks are pass/fail gated on both delivery count and their stated elapsed-time thresholds:

  • VAL20-03: all 100 segments within 180 s

  • VAL20-06: all 10 × 128 KB segments within 60 s

Performance baseline interpretation

  Metric                           Typical loopback value (expected range)
  Single segment latency (T1)      1 000 – 3 000 ms
  Small segment sps (T2, N=10)     2 – 20 seg/s
  Scheduler load sps (T3, N=100)   5 – 40 seg/s
  Large payload bps (T4, 128 KB)   5 MB/s – 50 MB/s
  Constrained bps (T5, 1 Mbps)     80 KB/s – 120 KB/s

Values outside these ranges indicate either a configuration issue (too few workers, wrong scheduler interval) or a system load anomaly.

7. Run the Benchmark

export GOROOT=/home/ubuntu/.local/go1.25.7
export PATH="$GOROOT/bin:$PATH"
export GOTOOLCHAIN=local

bash scripts/labs/run_relay_throughput_val20_lab.sh

Optional custom evidence directory:

bash scripts/labs/run_relay_throughput_val20_lab.sh \
  "$PWD/evidence/val20-relay-throughput-local-$(date +%F)"

Expected runtime: 3–6 minutes (dominated by T3 100-segment delivery and T5 constrained transfer).

8. Report Template

VAL20 — Relay Throughput Benchmark
Date:           <YYYY-MM-DD>
Environment:    <OS, Go version, kernel>
Evidence dir:   <path>

Tier results:
  T1 warmup:        1 seg delivered in <X>ms
  T2 small-10:     10 seg delivered in <X>ms  (sps=<Y>)
  T3 small-100:   100 seg delivered in <X>ms  (sps=<Y>)
  T4 large-10:     10 × 128KB in <X>ms        (bps=<Y>)
  T5 constrained:  10 × 128KB @ 1Mbps in <X>ms (bps=<Y>)

Performance baseline:
  Small segment sps:           <Y> seg/s
  Large segment bytes/sec:     <Y> B/s
  Constrained 1Mbps bytes/sec: <Y> B/s
  Backpressure ratio (T4/T5):  <Y>×

10-check matrix:
  VAL20-01 PASS/FAIL  <detail>
  VAL20-02 PASS/FAIL  <detail>
  VAL20-03 PASS/FAIL  <detail>
  VAL20-04 PASS/FAIL  <detail>
  VAL20-05 PASS/FAIL  <detail>
  VAL20-06 PASS/FAIL  <detail>
  VAL20-07 PASS/FAIL  <detail>
  VAL20-08 PASS/FAIL  <detail>
  VAL20-09 PASS/FAIL  <detail>
  VAL20-10 PASS/FAIL  <detail>

Overall: PASS=<N> FAIL=<N>

9. Tooling

  File                                              Role
  scripts/labs/run_relay_throughput_val20_lab.sh    Benchmark runner (this lab’s entry point)
  scripts/labs/edge_relay_throughput_setup.go       Setup binary — TLS, BoltDB, PENDING segment seeds
  scripts/labs/relay_impairment_proxy.go            Reused from VAL19 — clean/bandwidth proxy modes
  scripts/labs/edge_deadletter_lab_peer.go          Reused from deadletter lab — mTLS peer server

10. Known Limitations

  • 1 s scheduler tick floor: T1 latency is always ≥ 1 s due to schedule_interval_seconds=1. This matches production configuration — the benchmark exposes this, not a defect.

  • BoltDB single-writer lock: each tier restarts edged against a fresh DB directory; concurrent multi-instance benchmarking is out of scope.

  • Loopback ceiling: bytes/sec figures reflect loopback TCP, not real network. The benchmark establishes a floor; real-network deployment will be lower.

  • No “packet loss” tier: true packet loss requires tc netem (root/CAP_NET_ADMIN). The cutoff proxy mode (tested in VAL19) is the closest available approximation.