VAL 08 - Fleet Rollout Throughput Validation
Purpose
This plan validates that the AutonomyOps control-plane can accept concurrent rollout plan creation at the workplan target of ≥100 concurrent device rollouts without errors.
It measures throughput at four concurrency tiers (N=1, 10, 50, 100 workers), documents the scaling behaviour of the SQLite-backed store under load, and produces a repeatable evidence record that is safe to quote in design-partner conversations.
Claims Under Test

| ID | Claim |
|---|---|
| VAL08-C1 | The control-plane accepts 500 concurrent plan creates (100 workers × 5 plans each) with zero errors |
| VAL08-C2 | All plans created across the full N=1/10/50/100 matrix are stored durably and appear in paginated |
| VAL08-C3 | The entire N=100 scenario completes within 30 seconds wall-clock |
| VAL08-C4 | Throughput at N=100 is ≥ throughput at N=1 (no regression under concurrency) |
Why a New Function (Not Extending VAL07)

| Concern | VAL07 | VAL08 |
|---|---|---|
| Focus | Single-operation latency (p50/p95/p99) | Throughput at increasing concurrency |
| Plan history | Polluted by 45 plans from VAL07 sequential + concurrent batches | Clean DB (removed and recreated before each run) |
| Concurrency tiers | Fixed N=5 for wall-clock check | N=1, 10, 50, 100 (scenario matrix) |
| Workplan target |  |  |
Using a fresh CP prevents accumulated plan history from influencing the list-consistency check and isolates the timing measurements.
Architecture
Tooling

| Tool | Purpose |
|---|---|
|  | Concurrent worker simulation: N subshells, each creating |
|  | HTTP status capture without response body overhead |
|  | Wall-clock millisecond timestamps bracketing each scenario |
|  | Throughput arithmetic (plans/sec = total / elapsed_s) |
| Per-worker temp files ( | Error count aggregation across concurrent subshells |
Port Allocation

| Resource | Value |
|---|---|
| CP listen |  |
| Metrics |  |
| Data dir |  |
Concurrency Model
Each scenario launches N bash background subshells. Each subshell POSTs batch_per_worker
(5) plans sequentially, using deterministic plan IDs of the form val08-n${N}-w${w}-b${b}.
The script waits for every subshell (shell builtin wait) before stopping the wall clock,
so the measured time covers the slowest worker.
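The fan-out described above can be sketched roughly as follows. This is a minimal illustration, not the lab script: `create_plan` is a hypothetical stub that only prints an HTTP status (a real run would replace it with the POST to the control-plane), and the `run_scenario` name and the temp-directory layout are assumptions. It is written in POSIX-compatible shell so that it also runs under bash.

```shell
# Sketch only: create_plan is a hypothetical stub, not the lab script's request.
BATCH_PER_WORKER=5

# Hypothetical stand-in for the real POST; prints an HTTP status code.
create_plan() {
  echo 201
}

# run_scenario N OUTDIR: fan out N workers, wait for all, print the error total.
run_scenario() (
  n="$1"; outdir="$2"
  mkdir -p "$outdir"
  w=1
  while [ "$w" -le "$n" ]; do
    (
      errors=0
      b=1
      while [ "$b" -le "$BATCH_PER_WORKER" ]; do
        status="$(create_plan "val08-n${n}-w${w}-b${b}")"
        case "$status" in
          2??) ;;                          # any 2xx counts as success
          *) errors=$((errors + 1)) ;;
        esac
        b=$((b + 1))
      done
      echo "$errors" > "$outdir/worker-${w}.txt"
    ) &
    w=$((w + 1))
  done
  wait   # every worker must finish before the wall clock is stopped
  total=0
  for f in "$outdir"/worker-*.txt; do
    total=$((total + $(cat "$f")))
  done
  echo "$total"
)

tmp="$(mktemp -d)"
total_errors="$(run_scenario 10 "$tmp/scenario-n10")"
echo "errors=${total_errors}"
```

Writing one small file per worker avoids sharing a counter between concurrent subshells; the parent only reads the files after `wait` returns.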
Plan IDs are globally unique across all scenarios because N, w, and b together form a
unique triple.
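The uniqueness claim can be checked offline by enumerating every ID in the matrix and comparing the raw count with the de-duplicated count. The sketch below assumes only the ID pattern and the tier sizes stated in this plan; the `gen_ids` helper name is hypothetical.

```shell
# Enumerate every plan ID in the matrix and confirm there are no duplicates.
BATCH_PER_WORKER=5

gen_ids() {
  for n in 1 10 50 100; do
    w=1
    while [ "$w" -le "$n" ]; do
      b=1
      while [ "$b" -le "$BATCH_PER_WORKER" ]; do
        echo "val08-n${n}-w${w}-b${b}"
        b=$((b + 1))
      done
      w=$((w + 1))
    done
  done
}

total="$(gen_ids | wc -l | tr -d ' ')"
unique="$(gen_ids | sort -u | wc -l | tr -d ' ')"
echo "total=${total} unique=${unique}"   # prints: total=805 unique=805
```

Both counts come out at 805, matching (1+10+50+100) × 5.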
SQLite Serialisation Note
The control-plane uses modernc.org/sqlite with SetMaxOpenConns(1), which serialises all
writes through a single connection. This is the expected bottleneck. VAL08-07 checks that
throughput at N=100 does not drop below throughput at N=1. A plateau is acceptable and
expected; a regression would indicate lock contention or a panic loop.
Environment Assumptions

| Assumption | Value |
|---|---|
| Platform | Linux ( |
| CP binary |  |
| Transport | Plain HTTP (no TLS) |
| RBAC enforcement | Not set (bootstrap-mode; see lab script CP invocation) |
| Workers per scenario | N ∈ {1, 10, 50, 100} |
| Plans per worker | 5 (sequential within worker) |
| Total plans created | (1+10+50+100) × 5 = 805 |
| Wall-clock bound (N=100) | 30 s |
Scenario Matrix
VAL08-01 - Control-Plane Reachable
Action: GET /v1/health against dedicated CP at 18993.
Evidence: val08-health.txt
Pass criterion: HTTP 200.
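A status-only probe of this kind might look like the sketch below. The curl options used (`-s`, `-o /dev/null`, `-w '%{http_code}'`) are standard; the helper names and the 127.0.0.1 host for the CP are assumptions, and the actual request is left as a commented usage line so the snippet has no network dependency.

```shell
# Capture only the HTTP status code; the response body is discarded.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Map a status code to the VAL08-01 verdict.
health_verdict() {
  if [ "$1" = "200" ]; then echo PASS; else echo FAIL; fi
}

# Usage against the dedicated CP (host is an assumption):
#   health_verdict "$(http_status http://127.0.0.1:18993/v1/health)"
```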
VAL08-02 - N=1 Scenario: Zero Errors
Action: 1 worker creates 5 plans sequentially.
Evidence: scenario-n1/scenario-report.txt, scenario-n1/worker-1.txt
Pass criterion: errors=0, ok=5.
VAL08-03 - N=10 Scenario: Zero Errors
Action: 10 concurrent workers each create 5 plans (50 total).
Evidence: scenario-n10/scenario-report.txt, scenario-n10/worker-{1..10}.txt
Pass criterion: errors=0, ok=50.
VAL08-04 - N=50 Scenario: Zero Errors
Action: 50 concurrent workers each create 5 plans (250 total).
Evidence: scenario-n50/scenario-report.txt, scenario-n50/worker-{1..50}.txt
Pass criterion: errors=0, ok=250.
VAL08-05 - N=100 Scenario: Zero Errors (workplan target)
Action: 100 concurrent workers each create 5 plans (500 total).
Evidence: scenario-n100/scenario-report.txt, scenario-n100/worker-{1..100}.txt
Pass criterion: errors=0, ok=500.
Workplan reference: “≥100 concurrent device rollouts (proposed validation target)”
VAL08-06 - N=100 Wall-Clock ≤ 30 s
Action: Measure elapsed time for the N=100 scenario.
Evidence: val08-wall-clock-n100.txt
Pass criterion: elapsed_ms ≤ 30000.
VAL08-07 - Throughput Scaling (N=100 ≥ N=1)
Action: Compare tput_n100 (plans/sec) to tput_n1.
Evidence: val08-throughput-scaling.txt
Pass criterion: tput_n100 ≥ tput_n1.
Rationale: Verifies that issuing more concurrent requests does not make throughput worse.
A plateau (equal throughput) is acceptable given SQLite’s single-writer model.
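The throughput arithmetic (plans/sec = total / elapsed_s) and the scaling comparison can be sketched with awk, which handles the floating-point division and comparison that shell integer arithmetic cannot. The function names are hypothetical and the input numbers are illustrative only, not measured results. The sketch assumes a non-zero elapsed time.

```shell
# plans/sec = total / elapsed_s, with elapsed time supplied in milliseconds.
throughput() {
  awk -v total="$1" -v ms="$2" 'BEGIN { printf "%.2f\n", total / (ms / 1000) }'
}

# Float-safe ">=" comparison; prints PASS or FAIL.
scaling_check() {
  awk -v a="$1" -v b="$2" 'BEGIN { print ((a + 0 >= b + 0) ? "PASS" : "FAIL") }'
}

# Illustrative numbers only, not measured results.
tput_n1="$(throughput 5 50)"          # 5 plans in 50 ms
tput_n100="$(throughput 500 4000)"    # 500 plans in 4000 ms
echo "tput_n1=${tput_n1} tput_n100=${tput_n100} scaling=$(scaling_check "$tput_n100" "$tput_n1")"
```

Because the check is "greater than or equal", a plateau where both tiers report the same plans/sec still yields PASS.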
VAL08-08 - Aggregate Zero Errors
Action: Sum error counts across all four scenarios.
Evidence: val08-error-aggregate.txt
Pass criterion: total_errors=0.
VAL08-09 - List Count Consistent
Action: Page through GET /v1/rollouts?limit=100 after all scenarios complete and compare the
accumulated .plans[] count to grand_total − total_errors.
Evidence: val08-list-consistency.txt
Pass criterion: list_count ≥ expected_min.
VAL08-10 - Prometheus Observations Recorded
Action: Scrape http://127.0.0.1:19093/metrics; sum cp_http_requests_total.
Evidence: val08-metrics-raw.txt, val08-prometheus-check.txt
Pass criterion: cp_http_requests_total > 0.
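Summing a counter across all of its label combinations can be sketched with awk over the Prometheus text exposition format. The `sum_metric` helper is hypothetical, the sample scrape (including its label names and values) is made up for illustration, and the sketch assumes samples carry no trailing timestamp, so the value is the last field on each line.

```shell
# Sum every sample of one metric family in Prometheus text exposition format.
# Assumes samples carry no trailing timestamp, so the value is the last field.
sum_metric() {
  awk -v name="$1" '
    /^#/ { next }                          # skip HELP and TYPE comment lines
    {
      split($1, parts, "{")                # drop the label set, keep the name
      if (parts[1] == name) total += $NF
    }
    END { printf "%d\n", total }
  '
}

# Illustrative scrape; label names and values here are made up.
scrape='# HELP cp_http_requests_total Total HTTP requests.
# TYPE cp_http_requests_total counter
cp_http_requests_total{method="POST",code="201"} 805
cp_http_requests_total{method="GET",code="200"} 10
cp_other_metric 7'

total="$(printf '%s\n' "$scrape" | sum_metric cp_http_requests_total)"
if [ "$total" -gt 0 ]; then verdict=PASS; else verdict=FAIL; fi
echo "cp_http_requests_total=${total} verdict=${verdict}"
```

In a real run the scrape text would come from the metrics endpoint rather than an inline string.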
Evidence Files

| File | Description |
|---|---|
|  | Control-plane stdout/stderr |
|  | Health check result |
|  | N=1 throughput + error summary |
|  | N=10 throughput + error summary |
|  | N=50 throughput + error summary |
|  | N=100 throughput + error summary |
|  | Per-worker error counts (one file per worker) |
|  | Elapsed ms for N=100 scenario |
|  | Throughput at all four tiers + scaling pass flag |
|  | Total error count across all scenarios |
|  | List endpoint count vs expected minimum |
|  | Raw Prometheus scrape |
|  |  |
|  | Human-readable composite report (10 checks) |
|  | Machine-readable composite report |
Pass/Fail Criteria
Full pass: All 10 checks report PASS.
Minimum acceptable: VAL08-01, VAL08-05, VAL08-08 pass (workplan target + zero-error guarantee). Remaining checks are context for performance characterisation.
Key thresholds:
| Check | Threshold |
|---|---|
| VAL08-05 (N=100 success) |  |
| VAL08-06 (wall clock) |  |
| VAL08-07 (scaling) |  |
| VAL08-08 (aggregate) |  |
Failure Handling

| Symptom | Likely Cause | Resolution |
|---|---|---|
| VAL08-01 FAIL | CP binary missing or port conflict | Check |
| VAL08-05 FAIL, errors > 0 | SQLite write contention returning 5xx | Inspect |
| VAL08-06 FAIL, elapsed > 30 s | Very slow host or high swap | Note hardware; bound is intentionally generous (30 s for 500 creates) |
| VAL08-07 FAIL, tput_n100 < tput_n1 | Serialisation regression or panic loop | Check CP logs for errors during N=100 scenario |
| VAL08-09 FAIL, list_count low | Missing pages, failed creates, or list-path regression | Check |