VAL 07: Fleet Rollout Latency Baseline
1. Purpose and Claims
This validation establishes a reproducible latency baseline for the operator-facing rollout API and proves the system meets the workplan performance targets under normal and modest-concurrency conditions.
| # | Claim |
|---|---|
| VAL07-C1 | The control plane starts, becomes reachable, and responds to health probes |
| VAL07-C2 | Rollout plan creation (POST /v1/rollouts) meets the p50/p95/p99 bounds of 100/300/500 ms over 20 sequential requests |
| VAL07-C3 | Rollout plan listing (GET /v1/rollouts) meets the p99 bound of 500 ms with 20 plans in the store |
| VAL07-C4 | Under modest concurrency (5 parallel plan creates), all requests succeed and the total wall-clock time is ≤ 2000 ms, proving operator-facing responsiveness is not blocked by single-writer SQLite serialisation |
Relationship to workplan:
The workplan lists rollout plan creation latency < 500 ms p99 as a Proposed
validation target (Decision Tier: Proposed validation target) and states that
the proof artifact required for Gate D is a “Performance baseline report with
p50/p95/p99 latencies.” VAL07 produces that baseline report for the
local-SQLite control plane running in the lab environment.
What these numbers mean:
VAL07 is a local-lab baseline, not a production load test. The bounds are deliberately generous (100/300/500 ms) for an in-process SQLite store on development hardware. They prove the system is not slow by accident — a mis-wired handler, a missing index, or an unintentional synchronous fsync would show up as latency regressions against these bounds. Production sizing against a PostgreSQL store at network scale is out of scope here.
2. Existing Lab Coverage
| Question | Answer |
|---|---|
| Is this covered by an existing LAB? | No. The |
| Which LAB/evidence bundle is extended? | |
| Why a new function rather than extending | The rollout lab kills its control plane and cleans up at the end of its phase. VAL07 needs a fresh, isolated control-plane instance so latency measurements are not contaminated by prior write history. Using a different port (18992) guarantees no socket conflicts with any other lab phase. |
3. Benchmark Architecture
Tooling
| Tool | Purpose |
|---|---|
| curl -w '%{time_total}' | Per-request HTTP timing (fractional seconds, microsecond precision) |
| Python (inline heredoc) | Percentile computation using the nearest-rank method |
| date +%s%3N | Concurrent batch wall-clock measurement |
| curl + grep | Prometheus counter extraction for VAL07-09 |
Percentile method
The nearest-rank method is used with N=20 samples:
idx = max(0, min(int(p × N / 100 + 0.5) − 1, N − 1))
With 20 samples, p99 maps to the maximum observation. This is intentional: the test deliberately sizes the sample so that the stated p99 bound is the maximum tolerated latency, not a statistical approximation. Any single request that exceeds 500 ms will cause VAL07-04 to fail.
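The index formula above can be checked with a few lines of Python. This is a minimal sketch, not the harness's inline heredoc; the function names and the 1..20 ms sample are illustrative assumptions.

```python
def nearest_rank_index(p: float, n: int) -> int:
    """Zero-based index into a sorted sample of size n for percentile p."""
    return max(0, min(int(p * n / 100 + 0.5) - 1, n - 1))


def percentile(sorted_samples, p):
    """Pick the nearest-rank percentile from an already sorted list."""
    return sorted_samples[nearest_rank_index(p, len(sorted_samples))]


# Illustrative sample: 20 latencies of 1 ms, 2 ms, ..., 20 ms.
samples_ms = sorted(float(i) for i in range(1, 21))

p50 = percentile(samples_ms, 50)
p95 = percentile(samples_ms, 95)
p99 = percentile(samples_ms, 99)
print(p50, p95, p99)
```

With N=20 the indices are 9, 18, and 19, so p50 is the 10th-smallest sample, p95 the 19th, and p99 the largest.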
Measurement scope
curl time_total measures the full round-trip from connection establishment
through response body receipt. It includes:
TCP connect (loopback → negligible)
HTTP request serialisation
Server-side handler: JSON unmarshal → SQLite write → JSON marshal
HTTP response serialisation
It excludes DNS resolution (not applicable for 127.0.0.1) and TLS handshake (VAL07 runs over plain HTTP; TLS is covered in the cert lab).
Control plane isolation
VAL07 starts a dedicated control-plane instance:
Listen: 127.0.0.1:18992
Metrics: 127.0.0.1:19092
Data dir: $WORK_DIR/val07 (removed and recreated on each run so the SQLite database is fresh and has no prior state)
Lifecycle: started at the beginning of run_rollout_latency_val07_lab(), killed with kill + wait at the end; does not share the global $ORCHESTRATOR_PID cleanup path
SQLite serialisation note
The control plane uses a single SQLite connection (SetMaxOpenConns(1)) to
serialise all writes. Concurrent POST /v1/rollouts requests are therefore
serialised by the database layer. VAL07-06/07 prove that this serialisation
does not cause client-visible errors or unacceptable latency at N=5 concurrency
— the expected pattern is sequential queuing, not failure.
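As a rough mental model (an idealised queuing assumption, not a measurement of the control plane), requests that arrive together at a single writer complete one after another, so the batch wall-clock time is roughly the sum of the individual service times. The helper name and the service times below are hypothetical.

```python
from itertools import accumulate


def serialised_completion_ms(service_ms):
    """Completion time of each request when one writer handles them in turn.

    Assumes all requests arrive at t=0 and ignores network and scheduling overhead.
    """
    return list(accumulate(service_ms))


# Five creates at a hypothetical 8 ms each queue behind one another.
fast = serialised_completion_ms([8, 8, 8, 8, 8])

# Five creates at 400 ms each land exactly on the 2000 ms wall-clock bound.
slow = serialised_completion_ms([400] * 5)

print(fast, "wall_ms =", fast[-1])
print(slow, "wall_ms =", slow[-1])
```

In this model the last request waits for the four ahead of it, which is the queuing pattern the note describes: higher tail latency for later requests, but no errors.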
4. Harness
VAL07 is implemented as run_rollout_latency_val07_lab() in
scripts/labs/run_cli_audit_lab.sh. It runs after run_support_bundle_val06_lab
as the final validation slice.
No prior lab state is required. VAL07 creates its own fresh control-plane
and SQLite database. It does not read from $AUTONOMY_AUDIT_DIR or depend on
any other lab phase.
Evidence directory: $EVIDENCE_DIR/val07/
5. Exact Scenarios
VAL07-01: Control Plane Reachable
Purpose: Confirm that the dedicated VAL07 control plane starts successfully and responds to health probes before the benchmark begins. A failure here indicates a port conflict or binary build problem, not a latency issue.
Action:
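# Start the dedicated control plane in the background so the probe can run.
autonomy-orchestrator serve \
  --listen 127.0.0.1:18992 \
  --metrics-addr 127.0.0.1:19092 \
  --data-dir "$WORK_DIR/val07" \
  --log-format text &
# Print only the HTTP status code of the health probe.
curl -s -o /dev/null -w '%{http_code}' http://127.0.0.1:18992/v1/health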
Evidence file: val07/val07-health.txt — health_code, pass
Pass criterion: health_code=200.
VAL07-02, 03, 04: Plan-Create Latency (p50/p95/p99)
Purpose: Establish the latency profile for POST /v1/rollouts across 20
sequential requests and assert each percentile against its bound. VAL07-04
(p99 ≤ 500 ms) is the primary workplan target.
Action:
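# NNN stands for the per-request plan number; the rest of the payload is elided.
for i in $(seq 1 20); do
  curl -s -o /dev/null -w '%{http_code} %{time_total}\n' \
    -X POST -H 'Content-Type: application/json' \
    -d '{"metadata":{"id":"val07-plan-NNN"},...}' \
    http://127.0.0.1:18992/v1/rollouts >> "$EVIDENCE_DIR/val07/val07-create-raw.txt"
done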
Each response line is appended to val07-create-raw.txt (format: <code> <seconds>). Only 2xx responses contribute to the timing sample. Python
computes p50/p95/p99 using the nearest-rank method, and the validation requires
the full 20 successful timing samples to be present before any percentile check
can pass.
Evidence files:
val07/val07-create-raw.txt: 20 raw <code> <seconds> lines
val07/val07-create-percentiles.txt: expected_n, n, sample_complete, p50_ms, p95_ms, p99_ms, min_ms, max_ms
Pass criteria:
| Check | Bound |
|---|---|
| VAL07-02 | n=20 and p50_ms ≤ 100 |
| VAL07-03 | n=20 and p95_ms ≤ 300 |
| VAL07-04 | n=20 and p99_ms ≤ 500 |
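The parsing and gating rules for the create phase can be sketched as follows. This is an illustration under stated assumptions rather than the harness's actual script: the function names, the 201 and 500 status codes, and the timings are hypothetical, while the field names and bounds follow the evidence file and pass criteria described above.

```python
BOUNDS_MS = {"p50_ms": 100, "p95_ms": 300, "p99_ms": 500}


def summarise(raw_lines, expected_n=20):
    """Build the percentile summary from '<code> <seconds>' lines."""
    ok_ms = []
    for line in raw_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        code, seconds = parts
        if len(code) == 3 and code.startswith("2"):
            ok_ms.append(float(seconds) * 1000.0)  # seconds -> milliseconds
    ok_ms.sort()
    n = len(ok_ms)
    summary = {"expected_n": expected_n, "n": n, "sample_complete": n == expected_n}
    if n:
        def pct(p):
            # Nearest-rank index, same formula as the percentile method section.
            return ok_ms[max(0, min(int(p * n / 100 + 0.5) - 1, n - 1))]
        summary.update(p50_ms=pct(50), p95_ms=pct(95), p99_ms=pct(99),
                       min_ms=ok_ms[0], max_ms=ok_ms[-1])
    return summary


def evaluate(summary):
    """A percentile check passes only if the sample is complete and within bound."""
    return {key: summary["sample_complete"] and summary[key] <= bound
            for key, bound in BOUNDS_MS.items()}


# Hypothetical data: 19 fast successes and one server error.
incomplete = summarise(["201 0.002000"] * 19 + ["500 0.001000"])
complete = summarise(["201 0.002000"] * 20)
print(incomplete["n"], evaluate(incomplete))
print(complete["n"], evaluate(complete))
```

The first sample fails all three checks even though every recorded latency is far below its bound, because only 19 of the 20 requests contributed a timing.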
VAL07-05: Plan-List Latency (p99)
Purpose: Confirm that GET /v1/rollouts (list all plans) meets the latency
bound when 20 plans exist in the store. The list path scans the SQLite events
table and marshals all records; this exercises a realistic operator query.
Action:
for i in $(seq 1 20); do
  curl -s -o /dev/null -w '%{http_code} %{time_total}\n' \
    http://127.0.0.1:18992/v1/rollouts >> "$EVIDENCE_DIR/val07/val07-list-raw.txt"
done
Evidence files:
val07/val07-list-raw.txt: 20 raw lines
val07/val07-list-percentiles.txt: expected_n, n, sample_complete, p50_ms, p95_ms, p99_ms, min_ms, max_ms
Pass criterion: n=20 and p99_ms ≤ 500.
VAL07-06 + 07: Concurrent Plan Creates
Purpose: Prove operator-facing responsiveness under modest concurrency. 5
parallel POST /v1/rollouts requests are launched as background curl processes;
all must return 2xx (VAL07-06) and the total wall-clock time must be ≤ 2000 ms
(VAL07-07). This documents the expected serialisation behaviour of the single-
writer SQLite connection without treating it as a failure.
Action:
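t_start=$(date +%s%3N)   # milliseconds since the epoch (GNU date)
for j in 1 2 3 4 5; do
  curl ... POST /v1/rollouts &   # launched concurrently
done
wait                     # block until all five curls have exited
t_end=$(date +%s%3N)
wall_ms=$((t_end - t_start))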
Evidence files:
val07/val07-concurrent-raw.txt: 5 lines (one per background curl)
val07/val07-concurrent-summary.txt: concurrent_n, conc_ok, conc_errors, wall_ms, bound_ms=2000, wall_pass
Pass criteria:
VAL07-06: conc_errors=0 and conc_ok=5
VAL07-07: wall_ms ≤ 2000
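The two concurrency checks reduce to a small summary over the raw lines and the measured wall-clock time. The sketch below assumes each raw line starts with the HTTP status code (the exact line format is not stated here); the function name and sample values are hypothetical, and the keys mirror the summary file fields.

```python
def concurrent_summary(raw_lines, wall_ms, bound_ms=2000):
    """Summarise a concurrent batch; assumes each line starts with the HTTP code."""
    codes = [line.split()[0] for line in raw_lines if line.strip()]
    conc_ok = sum(1 for code in codes if len(code) == 3 and code.startswith("2"))
    return {
        "concurrent_n": len(codes),
        "conc_ok": conc_ok,
        "conc_errors": len(codes) - conc_ok,
        "wall_ms": wall_ms,
        "bound_ms": bound_ms,
        "wall_pass": wall_ms <= bound_ms,
    }


# Hypothetical batch: five successful creates finishing in 42 ms overall.
summary = concurrent_summary(["201 0.010000"] * 5, wall_ms=42)

val07_06 = summary["conc_errors"] == 0 and summary["conc_ok"] == 5
val07_07 = summary["wall_pass"]
print(summary, val07_06, val07_07)
```

Keeping the success count and the wall-clock verdict as separate fields lets a slow but error-free batch fail VAL07-07 without also failing VAL07-06.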
VAL07-08: Zero Client Errors
Purpose: Confirm that no benchmark request returned a non-2xx HTTP status code. Counts errors from all three phases: sequential creates (20), sequential lists (20), and concurrent creates (5) — 45 requests total.
Evidence file: val07/val07-error-summary.txt — total_requests, error_count, pass
Pass criterion: error_count = 0.
VAL07-09: Prometheus Observations
Purpose: Confirm that the Prometheus cp_http_requests_total counter
reflects the benchmark traffic, proving Prometheus instrumentation is wired and
receiving real observations from the VAL07 control-plane instance.
Action:
curl -fsS http://127.0.0.1:19092/metrics | grep '^cp_http_requests_total'
Evidence file: val07/val07-prometheus-check.txt — cp_http_requests_total, pass
Pass criterion: cp_http_requests_total > 0.
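For reference, the same extraction can be done by parsing the Prometheus text exposition directly instead of using grep. This is a sketch only: the label names and values in the sample are assumptions, since the counter's label set is not described here, and the function name is hypothetical.

```python
def counter_total(exposition, metric="cp_http_requests_total"):
    """Sum every sample of `metric` in Prometheus text exposition output."""
    total = 0.0
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            name = line[: line.index("{")]
            fields = line[line.rindex("}") + 1:].split()
        else:
            name, *fields = line.split()
        if name == metric and fields:
            total += float(fields[0])  # first field is the value; a timestamp may follow
    return total


# Hypothetical exposition; label names and counts are illustrative only.
SAMPLE = """\
# HELP cp_http_requests_total Total HTTP requests.
# TYPE cp_http_requests_total counter
cp_http_requests_total{method="POST",path="/v1/rollouts",code="201"} 25
cp_http_requests_total{method="GET",path="/v1/rollouts",code="200"} 20
cp_other_total 7
"""

observed = counter_total(SAMPLE)
print(observed, observed > 0)
```

Matching on the exact metric name avoids counting other series that merely share the prefix, which a bare prefix match can do.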
6. Evidence Files
All files are written to $EVIDENCE_DIR/val07/.
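| File | Produced by | Contains |
|---|---|---|
| val07-cp.log | orchestrator stdout+stderr | Startup logs and per-request log lines |
| val07-health.txt | | health_code, pass |
| val07-create-raw.txt | create loop | 20 lines: <code> <seconds> |
| val07-create-percentiles.txt | Python percentile script | expected_n, n, sample_complete, p50_ms, p95_ms, p99_ms, min_ms, max_ms |
| val07-list-raw.txt | list loop | 20 lines: <code> <seconds> |
| val07-list-percentiles.txt | Python percentile script | expected_n, n, sample_complete, p50_ms, p95_ms, p99_ms, min_ms, max_ms |
| val07-concurrent-raw.txt | concurrent curl batch | 5 lines: one per background curl |
| val07-concurrent-summary.txt | | concurrent_n, conc_ok, conc_errors, wall_ms, bound_ms=2000, wall_pass |
| val07-error-summary.txt | error counter | total_requests, error_count, pass |
| val07-metrics-raw.txt | | Prometheus text exposition from VAL07 CP |
| val07-prometheus-check.txt | | cp_http_requests_total, pass |
| val07-report.txt | composite report | 9-check PASS/FAIL + latency summary |
| val07-report.json | composite report | Machine-readable JSON with latency objects |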
7. Pass/Fail Criteria
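| Check ID | Name | File | Pass condition |
|---|---|---|---|
| VAL07-01 | orchestrator_reachable | val07-health.txt | health_code=200 |
| VAL07-02 | plan_create_p50 | val07-create-percentiles.txt | n=20 and p50_ms ≤ 100 |
| VAL07-03 | plan_create_p95 | val07-create-percentiles.txt | n=20 and p95_ms ≤ 300 |
| VAL07-04 | plan_create_p99 | val07-create-percentiles.txt | n=20 and p99_ms ≤ 500 |
| VAL07-05 | plan_list_p99 | val07-list-percentiles.txt | n=20 and p99_ms ≤ 500 |
| VAL07-06 | concurrent_success | val07-concurrent-summary.txt | conc_errors=0 and conc_ok=5 |
| VAL07-07 | concurrent_wall_clock | val07-concurrent-summary.txt | wall_ms ≤ 2000 |
| VAL07-08 | zero_client_errors | val07-error-summary.txt | error_count=0 |
| VAL07-09 | prometheus_observations | val07-prometheus-check.txt | cp_http_requests_total > 0 |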
Overall pass: all 9 checks pass and val07-report.txt reports
pass=9 fail=0 total=9.
Failure handling:
VAL07-01 fails: inspect val07-cp.log for startup errors; check whether port 18992 is already in use by another process
VAL07-02..04 fail (latency bound exceeded or incomplete sample): inspect val07-create-percentiles.txt; if n < 20, the control plane returned non-2xx responses and the latency corpus is incomplete, so cross-reference val07-create-raw.txt before interpreting the percentile numbers; if max_ms is very high but p95_ms is within bound, investigate what caused the outlier (disk I/O spike, OS scheduling, GC pause); consider whether the SQLite WAL sync setting is appropriate for the test environment
VAL07-05 fails (list latency or incomplete sample): inspect val07-list-percentiles.txt; if n < 20, one or more list requests failed; otherwise, if latency grows with plan count, the list endpoint may be performing a full table scan without a suitable index
VAL07-06 fails (conc_errors > 0): inspect val07-concurrent-raw.txt for the failing HTTP codes; a 409 Conflict indicates duplicate plan IDs; a 503 indicates the single-writer connection is returning errors rather than queuing, so investigate store.go
VAL07-07 fails (wall_ms > 2000): the 5 concurrent requests are serialised through a single SQLite connection; if wall time > 2000 ms, each request is taking > 400 ms on average. VAL07-04 (p99 ≤ 500 ms) catches this in the sequential phase only when a request exceeds 500 ms, so also compare against max_ms in val07-create-percentiles.txt
VAL07-08 fails: cross-reference the error count with val07-create-raw.txt, val07-list-raw.txt, and val07-concurrent-raw.txt to identify which operation returned a non-2xx code
VAL07-09 fails: inspect val07-metrics-raw.txt; if the file is empty, the metrics server failed to start (check val07-cp.log for --metrics-addr binding errors); if the file contains metrics but cp_http_requests_total is missing, the Prometheus instrumentation was not registered for this endpoint
8. Report Template
# VAL 07 — Fleet Rollout Latency Baseline Report
timestamp: 2026-03-21T10:00:00Z
samples_per_operation: 20
## Results
VAL07-01 orchestrator_reachable: PASS
VAL07-02 plan_create_p50: PASS (p50=2ms bound=100ms)
VAL07-03 plan_create_p95: PASS (p95=5ms bound=300ms)
VAL07-04 plan_create_p99: PASS (p99=8ms bound=500ms)
VAL07-05 plan_list_p99: PASS (p99=4ms bound=500ms)
VAL07-06 concurrent_success: PASS (5/5 succeeded)
VAL07-07 concurrent_wall_clock: PASS (elapsed=42ms bound=2000ms)
VAL07-08 zero_client_errors: PASS (errors=0)
VAL07-09 prometheus_observations: PASS (count=45)
## Summary
pass=9 fail=0 total=9
The runner also prints
VAL 07: pass=9 fail=0 total=9 (report: val07-report.txt) to stdout so CI
log scanners can grep for VAL 07: pass= without parsing the report file.
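A CI log scanner that wants the counts rather than just a grep hit could parse the summary line with a regular expression. The sketch below assumes the stdout line has exactly the shape shown above; the function name is hypothetical.

```python
import re

# Matches the runner's stdout summary line, e.g.
# "VAL 07: pass=9 fail=0 total=9 (report: val07-report.txt)".
SUMMARY_RE = re.compile(r"VAL 07: pass=(\d+) fail=(\d+) total=(\d+)")


def parse_summary(log_text):
    """Return the pass/fail/total counts, or None if the line is absent."""
    match = SUMMARY_RE.search(log_text)
    if match is None:
        return None
    passed, failed, total = (int(group) for group in match.groups())
    return {"pass": passed, "fail": failed, "total": total}


log = "setup done\nVAL 07: pass=9 fail=0 total=9 (report: val07-report.txt)\n"
result = parse_summary(log)
print(result)
```

Returning None when the line is missing lets the caller distinguish "VAL07 never ran" from "VAL07 ran and failed".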
The machine-readable val07-report.json includes the full latency objects:
{
"plan_create_ms": {"p50": 2, "p95": 5, "p99": 8},
"plan_list_ms": {"p99": 4},
"concurrent": {"n": 5, "ok": 5, "wall_ms": 42}
}
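A consumer of the JSON report can re-check the latency objects against the documented bounds. This is a hedged sketch that assumes the three keys shown above are present; the function name and message wording are illustrative.

```python
import json

# Report body copied from the template above.
REPORT_JSON = """
{
  "plan_create_ms": {"p50": 2, "p95": 5, "p99": 8},
  "plan_list_ms": {"p99": 4},
  "concurrent": {"n": 5, "ok": 5, "wall_ms": 42}
}
"""


def bound_violations(report):
    """List every documented bound the report exceeds (empty list means all met)."""
    violations = []
    for key, bound in (("p50", 100), ("p95", 300), ("p99", 500)):
        value = report["plan_create_ms"][key]
        if value > bound:
            violations.append(f"plan_create {key}={value}ms exceeds {bound}ms")
    list_p99 = report["plan_list_ms"]["p99"]
    if list_p99 > 500:
        violations.append(f"plan_list p99={list_p99}ms exceeds 500ms")
    concurrent = report["concurrent"]
    if concurrent["ok"] != concurrent["n"]:
        violations.append(f"only {concurrent['ok']}/{concurrent['n']} concurrent creates succeeded")
    if concurrent["wall_ms"] > 2000:
        violations.append(f"concurrent wall_ms={concurrent['wall_ms']} exceeds 2000ms")
    return violations


violations = bound_violations(json.loads(REPORT_JSON))
print(violations or "all bounds met")
```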
9. Environment Assumptions
| Assumption | Detail |
|---|---|
| Hardware | Development machine or CI runner with local SSD; loopback TCP |
| Control plane | In-process SQLite with SetMaxOpenConns(1) |
| Isolation | Fresh data directory ($WORK_DIR/val07), removed and recreated on each run |
| RBAC | |
| TLS | Plain HTTP on loopback; TLS overhead not included |
| Concurrency | N=5 concurrent creates; not a saturation test |
| Sample size | N=20 per operation; p99 = max observation with this sample size |
The bounds are chosen to pass comfortably on normal development hardware while still catching regressions:
| Operation | p50 bound | p95 bound | p99 bound | Rationale |
|---|---|---|---|---|
| POST /v1/rollouts | 100 ms | 300 ms | 500 ms | Workplan target; typical measured value is 1–10 ms |
| GET /v1/rollouts | n/a | n/a | 500 ms | Read path; typically faster than write path |
| 5× concurrent wall clock | n/a | n/a | 2000 ms | Wall-clock bound: 5 × 400 ms average per request, tighter than 5 × the 500 ms p99 bound (2500 ms); confirms queuing, not failure |
10. How to Run
VAL07 executes automatically as the final validation slice when the full lab is run:
export GOROOT=/home/ubuntu/.local/go1.25.7
export PATH="$GOROOT/bin:$PATH"
export GOTOOLCHAIN=local
bash scripts/labs/run_cli_audit_lab.sh
To inspect results after a run:
# Quick pass/fail
cat evidence/pr17-cli-audit-local-2026-03-17/val07/val07-report.txt
# Latency numbers
cat evidence/pr17-cli-audit-local-2026-03-17/val07/val07-create-percentiles.txt
cat evidence/pr17-cli-audit-local-2026-03-17/val07/val07-list-percentiles.txt
# Concurrent test result
cat evidence/pr17-cli-audit-local-2026-03-17/val07/val07-concurrent-summary.txt
# Error count
cat evidence/pr17-cli-audit-local-2026-03-17/val07/val07-error-summary.txt
# Prometheus counter
cat evidence/pr17-cli-audit-local-2026-03-17/val07/val07-prometheus-check.txt
# Machine-readable report with latency objects
jq '{pass_count, fail_count, plan_create_ms, plan_list_ms, concurrent}' \
evidence/pr17-cli-audit-local-2026-03-17/val07/val07-report.json
# Raw timing data
cat evidence/pr17-cli-audit-local-2026-03-17/val07/val07-create-raw.txt