VAL 07 — Fleet Rollout Latency Baseline

1. Purpose and Claims

This validation establishes a reproducible latency baseline for the operator-facing rollout API and proves the system meets the workplan performance targets under normal and modest-concurrency conditions.

#	Claim
VAL07-C1	The control plane starts, becomes reachable, and responds to health probes
VAL07-C2	Rollout plan creation (`POST /v1/rollouts`) meets the workplan latency target: p50 ≤ 100 ms, p95 ≤ 300 ms, p99 ≤ 500 ms (20 sequential samples)
VAL07-C3	Rollout plan listing (`GET /v1/rollouts`) meets the same p99 target as plan creation: p99 ≤ 500 ms (20 sequential samples after 20 plans exist)
VAL07-C4	Under modest concurrency (5 parallel plan creates), all requests succeed and the total wall-clock time is ≤ 2000 ms, proving operator-facing responsiveness is not blocked by single-writer SQLite serialisation

Relationship to workplan:

The workplan lists rollout plan creation latency < 500 ms p99 under Decision Tier: Proposed validation target, and states that the proof artifact required for Gate D is a “Performance baseline report with p50/p95/p99 latencies.” VAL07 produces that baseline report for the local SQLite control plane running in the lab environment.

What these numbers mean:

VAL07 is a local-lab baseline, not a production load test. The bounds are deliberately generous (100/300/500 ms) for an in-process SQLite store on development hardware. They prove the system is not slow by accident — a mis-wired handler, a missing index, or an unintentional synchronous fsync would show up as latency regressions against these bounds. Production sizing against a PostgreSQL store at network scale is out of scope here.

2. Existing Lab Coverage

Question	Answer
Is this covered by an existing LAB?	No. The `run_rollout_lab()` phase in `run_cli_audit_lab.sh` exercises rollout plan create/publish/cancel but takes no timing measurements.
Which LAB/evidence bundle is extended?	`scripts/labs/run_cli_audit_lab.sh` — new function `run_rollout_latency_val07_lab()` added as the final slice.
Why a new function rather than extending `run_rollout_lab()`?	The rollout lab kills its control plane and cleans up at the end of its phase. VAL07 needs a fresh, isolated control-plane instance so latency measurements are not contaminated by prior write history. Using a dedicated port (18992) avoids socket conflicts with the other lab phases.

3. Benchmark Architecture

Tooling

Tool	Purpose
`curl -s -o /dev/null -w '%{http_code} %{time_total}'`	Per-request HTTP timing (fractional seconds, microsecond precision)
Python (inline heredoc)	Percentile computation using the nearest-rank method
`date +%s%3N` (millisecond wall clock)	Concurrent batch wall-clock measurement
`grep '^cp_http_requests_total'`	Prometheus counter extraction for VAL07-09

Percentile method

The nearest-rank method is used with N=20 samples. For percentile p, the zero-based index into the ascending-sorted sample list is:

idx = max(0, min(int(p × N / 100 + 0.5) − 1, N − 1))

With 20 samples, p50 maps to the 10th-smallest observation (index 9), p95 to the 19th-smallest (index 18), and p99 to the maximum (index 19). This is intentional: the sample is sized so that the stated p99 bound is the maximum tolerated latency, not a statistical approximation. Any single successful request that exceeds 500 ms will cause VAL07-04 (or, on the list path, VAL07-05) to fail.
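
The index formula can be sketched in Python as follows. This is a minimal illustration rather than the harness's actual inline script; the function name `nearest_rank` and the synthetic sample values are assumptions made for the example.

```python
def nearest_rank(samples, p):
    """Return the p-th percentile using the index formula from this section.

    nearest_rank is a hypothetical helper name, not taken from the harness.
    """
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    n = len(ordered)
    # Zero-based index, clamped to the valid range [0, n - 1].
    idx = max(0, min(int(p * n / 100 + 0.5) - 1, n - 1))
    return ordered[idx]


# Twenty synthetic latencies in milliseconds: 1, 2, ..., 20.
latencies_ms = list(range(1, 21))
p50 = nearest_rank(latencies_ms, 50)
p95 = nearest_rank(latencies_ms, 95)
p99 = nearest_rank(latencies_ms, 99)
print(p50, p95, p99)
```

With these 20 values, p99 equals the largest sample, which is why a single slow request is enough to breach the p99 bound.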

Measurement scope

curl's time_total measures the full operation, from the start of the transfer through receipt of the complete response body. It includes:

TCP connect (loopback → negligible)
HTTP request serialisation
Server-side handler: JSON unmarshal → SQLite write → JSON marshal
HTTP response serialisation

It excludes DNS resolution (not applicable for 127.0.0.1) and TLS handshake (VAL07 runs over plain HTTP; TLS is covered in the cert lab).

Control plane isolation

VAL07 starts a dedicated control-plane instance:

Listen: 127.0.0.1:18992
Metrics: 127.0.0.1:19092
Data dir: $WORK_DIR/val07 (removed and recreated on each run so the SQLite database is fresh and has no prior state)
Lifecycle: started at the beginning of run_rollout_latency_val07_lab(), killed with kill+wait at the end — does not share the global $ORCHESTRATOR_PID cleanup path

SQLite serialisation note

The control plane uses a single SQLite connection (SetMaxOpenConns(1)) to serialise all writes. Concurrent POST /v1/rollouts requests are therefore serialised by the database layer. VAL07-06/07 prove that this serialisation does not cause client-visible errors or unacceptable latency at N=5 concurrency — the expected pattern is sequential queuing, not failure.
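
The queuing pattern can be illustrated with a toy model. The sketch below is not the control plane's code: a `threading.Lock` stands in for the single SQLite connection, and `create_plan`, `WRITE_SECONDS`, and the 201 status are hypothetical stand-ins chosen for the example.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Assumption: one lock models the single writer connection.
write_lock = threading.Lock()
WRITE_SECONDS = 0.01  # hypothetical cost of one serialised write


def create_plan(plan_id):
    """Simulate a plan create that must hold the single writer."""
    with write_lock:
        time.sleep(WRITE_SECONDS)
    return 201  # every request succeeds; it only waits its turn


start = time.monotonic()
with ThreadPoolExecutor(max_workers=5) as pool:
    codes = list(pool.map(create_plan, range(5)))
wall_ms = (time.monotonic() - start) * 1000.0
print(codes, round(wall_ms))
```

In this model, all five requests succeed and the wall-clock time is roughly the sum of the individual write times, which is the behaviour VAL07-06/07 are designed to observe.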

4. Harness

VAL07 is implemented as run_rollout_latency_val07_lab() in scripts/labs/run_cli_audit_lab.sh. It runs after run_support_bundle_val06_lab as the final validation slice.

No prior lab state is required. VAL07 creates its own fresh control-plane and SQLite database. It does not read from $AUTONOMY_AUDIT_DIR or depend on any other lab phase.

Evidence directory: $EVIDENCE_DIR/val07/

5. Exact Scenarios

VAL07-01 — Control Plane Reachable

Purpose: Confirm that the dedicated VAL07 control plane starts successfully and responds to health probes before the benchmark begins. A failure here indicates a port conflict or binary build problem, not a latency issue.

Action:

autonomy-orchestrator serve \
  --listen 127.0.0.1:18992 \
  --metrics-addr 127.0.0.1:19092 \
  --data-dir $WORK_DIR/val07 \
  --log-format text

curl -s -o /dev/null -w '%{http_code}' http://127.0.0.1:18992/v1/health

Evidence file: val07/val07-health.txt — health_code, pass

Pass criterion: health_code=200.

VAL07-02, 03, 04 — Plan-Create Latency (p50/p95/p99)

Purpose: Establish the latency profile for POST /v1/rollouts across 20 sequential requests and assert each percentile against its bound. VAL07-04 (p99 ≤ 500 ms) is the primary workplan target.

Action:

For i in 1..20:

curl -s -o /dev/null -w '%{http_code} %{time_total}\n' \
  -X POST -H 'Content-Type: application/json' \
  -d '{"metadata":{"id":"val07-plan-NNN"},...}' \
  http://127.0.0.1:18992/v1/rollouts

Each response line is appended to val07-create-raw.txt (format: <code> <seconds>). Only 2xx responses contribute to the timing sample. Python computes p50/p95/p99 using the nearest-rank method, and the validation requires the full 20 successful timing samples to be present before any percentile check can pass.

Evidence files:

val07/val07-create-raw.txt — 20 raw <code> <seconds> lines
val07/val07-create-percentiles.txt — expected_n, n, sample_complete, p50_ms, p95_ms, p99_ms, min_ms, max_ms

Pass criteria:

Check	Bound
VAL07-02	`n=20` and `p50_ms` ≤ 100
VAL07-03	`n=20` and `p95_ms` ≤ 300
VAL07-04	`n=20` and `p99_ms` ≤ 500
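
The parsing and gating rules for these checks can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the harness's inline script: the names `summarise` and `check` are hypothetical, and values are kept as floating-point milliseconds because the rounding used by the real script is not specified here.

```python
def summarise(raw_lines, expected_n=20):
    """Turn '<code> <seconds>' lines into the percentile-file fields."""
    samples_ms = []
    for line in raw_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # ignore blank or malformed lines
        code, seconds = parts
        # Only 2xx responses contribute to the timing sample.
        if len(code) == 3 and code.startswith("2"):
            samples_ms.append(float(seconds) * 1000.0)
    samples_ms.sort()
    n = len(samples_ms)

    def pct(p):
        if n == 0:
            return None
        idx = max(0, min(int(p * n / 100 + 0.5) - 1, n - 1))
        return samples_ms[idx]

    return {
        "expected_n": expected_n,
        "n": n,
        "sample_complete": n == expected_n,
        "p50_ms": pct(50),
        "p95_ms": pct(95),
        "p99_ms": pct(99),
        "min_ms": samples_ms[0] if n else None,
        "max_ms": samples_ms[-1] if n else None,
    }


def check(summary, field, bound_ms):
    """A percentile check passes only on a complete sample within bound."""
    return summary["sample_complete"] and summary[field] <= bound_ms


# Synthetic inputs: one slow outlier, and one run with a failed request.
slow_tail = summarise(["201 0.002000"] * 19 + ["201 0.600000"])
incomplete = summarise(["201 0.002000"] * 19 + ["500 0.001000"])
print(check(slow_tail, "p50_ms", 100), check(slow_tail, "p99_ms", 500))
print(incomplete["n"], check(incomplete, "p50_ms", 100))
```

The first synthetic run fails only the p99 check because of its single 600 ms outlier. The second fails every check because the sample holds 19 timings instead of 20.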

VAL07-05 — Plan-List Latency (p99)

Purpose: Confirm that GET /v1/rollouts (list all plans) meets the latency bound when 20 plans exist in the store. The list path scans the SQLite events table and marshals all records; this exercises a realistic operator query.

Action:

For i in 1..20:

curl -s -o /dev/null -w '%{http_code} %{time_total}\n' \
  http://127.0.0.1:18992/v1/rollouts

Evidence files:

val07/val07-list-raw.txt — 20 raw lines
val07/val07-list-percentiles.txt — expected_n, n, sample_complete, p50_ms, p95_ms, p99_ms, min_ms, max_ms

Pass criterion: n=20 and p99_ms ≤ 500.

VAL07-06 + 07 — Concurrent Plan Creates

Purpose: Prove operator-facing responsiveness under modest concurrency. Five parallel POST /v1/rollouts requests are launched as background curl processes; all must return 2xx (VAL07-06) and the total wall-clock time must be ≤ 2000 ms (VAL07-07). This documents the expected serialisation behaviour of the single-writer SQLite connection without treating it as a failure.

Action:

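t_start=$(date +%s%3N)           # milliseconds since epoch (%3N is a GNU date extension)
for j in 1 2 3 4 5; do
  curl ... POST /v1/rollouts &   # concurrent
done
wait                             # block until all five background curls exit
t_end=$(date +%s%3N)
wall_ms=$((t_end - t_start))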

Evidence files:

val07/val07-concurrent-raw.txt — 5 lines (one per background curl)
val07/val07-concurrent-summary.txt — concurrent_n, conc_ok, conc_errors, wall_ms, bound_ms=2000, wall_pass

Pass criteria:

VAL07-06: conc_errors=0 and conc_ok=5
VAL07-07: wall_ms ≤ 2000

VAL07-08 — Zero Client Errors

Purpose: Confirm that no benchmark request returned a non-2xx HTTP status code. Counts errors from all three phases: sequential creates (20), sequential lists (20), and concurrent creates (5) — 45 requests total.

Evidence file: val07/val07-error-summary.txt — total_requests, error_count, pass

Pass criterion: error_count = 0.
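
The aggregation across the three phases can be sketched as below. The function name `error_summary` and the `errors_by_phase` breakdown are hypothetical additions for illustration; the evidence file itself is documented as containing only `total_requests`, `error_count`, and `pass`.

```python
def error_summary(phases):
    """Count non-2xx responses across named phases of raw curl output."""
    errors_by_phase = {}
    total_requests = 0
    for name, lines in phases.items():
        codes = [line.split()[0] for line in lines if line.strip()]
        total_requests += len(codes)
        # curl reports 000 when no HTTP response was received at all,
        # so that also counts as an error here.
        errors_by_phase[name] = sum(
            1 for c in codes if not (len(c) == 3 and c.startswith("2"))
        )
    error_count = sum(errors_by_phase.values())
    return {
        "total_requests": total_requests,
        "error_count": error_count,
        "pass": error_count == 0,
        "errors_by_phase": errors_by_phase,  # hypothetical extra field
    }


# Synthetic raw data: 20 creates, 20 lists (one failed), 5 concurrent creates.
result = error_summary({
    "create": ["201 0.002"] * 20,
    "list": ["200 0.001"] * 19 + ["000 0.000"],
    "concurrent": ["201 0.005"] * 5,
})
print(result["total_requests"], result["error_count"], result["pass"])
```

Keeping a per-phase count makes it quicker to see which raw file to open when the check fails.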

VAL07-09 — Prometheus Observations

Purpose: Confirm that the Prometheus cp_http_requests_total counter reflects the benchmark traffic, proving Prometheus instrumentation is wired and receiving real observations from the VAL07 control-plane instance.

Action:

curl -fsS http://127.0.0.1:19092/metrics | grep '^cp_http_requests_total'

Evidence file: val07/val07-prometheus-check.txt — cp_http_requests_total, pass

Pass criterion: cp_http_requests_total > 0.
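
The harness uses grep and awk for this step; the equivalent logic is sketched in Python below for clarity. It assumes the counter may be exposed as several label sets that need to be summed. The label names and the second metric in the sample text are invented for illustration and are not taken from the control plane.

```python
def sum_counter(metrics_text, name="cp_http_requests_total"):
    """Sum every sample of one counter in Prometheus text exposition."""
    total = 0.0
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            metric = line[: line.index("{")]
            rest = line[line.rindex("}") + 1:]
        else:
            metric, _, rest = line.partition(" ")
        fields = rest.split()  # value, then an optional timestamp
        if metric == name and fields:
            total += float(fields[0])
    return total


# Hypothetical exposition text; label names are assumptions.
sample = """\
# HELP cp_http_requests_total Total HTTP requests.
# TYPE cp_http_requests_total counter
cp_http_requests_total{method="POST",path="/v1/rollouts",code="201"} 25
cp_http_requests_total{method="GET",path="/v1/rollouts",code="200"} 20
cp_other_metric_total{path="/v1/rollouts"} 7
"""
total = sum_counter(sample)
print(total, total > 0)
```

Matching the metric name exactly, rather than by prefix, avoids counting unrelated series whose names merely begin with the same string.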

6. Evidence Files

All files are written to $EVIDENCE_DIR/val07/.

File	Produced by	Contains
`val07-cp.log`	orchestrator stdout+stderr	Startup logs and per-request log lines
`val07-health.txt`	`curl -w '%{http_code}'`	`health_code`, `pass`
`val07-create-raw.txt`	create loop	20 lines: `<http_code> <time_total_s>`
`val07-create-percentiles.txt`	Python percentile script	`expected_n`, `n`, `sample_complete`, `p50_ms`, `p95_ms`, `p99_ms`, `min_ms`, `max_ms`
`val07-list-raw.txt`	list loop	20 lines: `<http_code> <time_total_s>`
`val07-list-percentiles.txt`	Python percentile script	`expected_n`, `n`, `sample_complete`, `p50_ms`, `p95_ms`, `p99_ms`, `min_ms`, `max_ms`
`val07-concurrent-raw.txt`	concurrent curl batch	5 lines: `<http_code> <time_total_s>`
`val07-concurrent-summary.txt`	`date +%s%3N` wall clock	`concurrent_n`, `conc_ok`, `conc_errors`, `wall_ms`, `bound_ms`, `wall_pass`
`val07-error-summary.txt`	error counter	`total_requests=45`, `error_count`, `pass`
`val07-metrics-raw.txt`	`curl /metrics`	Prometheus text exposition from VAL07 CP
`val07-prometheus-check.txt`	`grep + awk`	`cp_http_requests_total`, `pass`
`val07-report.txt`	composite report	9-check PASS/FAIL + latency summary
`val07-report.json`	composite report	Machine-readable JSON with latency objects

7. Pass/Fail Criteria

Check ID	Name	File	Pass condition
VAL07-01	orchestrator_reachable	`val07-health.txt`	`health_code=200`
VAL07-02	plan_create_p50	`val07-create-percentiles.txt`	`p50_ms` ≤ 100
VAL07-03	plan_create_p95	`val07-create-percentiles.txt`	`p95_ms` ≤ 300
VAL07-04	plan_create_p99	`val07-create-percentiles.txt`	`p99_ms` ≤ 500
VAL07-05	plan_list_p99	`val07-list-percentiles.txt`	`p99_ms` ≤ 500
VAL07-06	concurrent_success	`val07-concurrent-summary.txt`	`conc_errors=0` and `conc_ok=5`
VAL07-07	concurrent_wall_clock	`val07-concurrent-summary.txt`	`wall_ms` ≤ 2000
VAL07-08	zero_client_errors	`val07-error-summary.txt`	`error_count=0`
VAL07-09	prometheus_observations	`val07-prometheus-check.txt`	`cp_http_requests_total` > 0

Overall pass: all 9 checks pass and val07-report.txt reports pass=9 fail=0 total=9.

Failure handling:

VAL07-01 fails: inspect val07-cp.log for startup errors; check whether port 18992 is already in use by another process
VAL07-02..04 fail (latency bound exceeded or incomplete sample): inspect val07-create-percentiles.txt; if n < 20, the control plane returned non-2xx responses and the latency corpus is incomplete, so cross-reference val07-create-raw.txt before interpreting the percentile numbers; if max_ms is very high but p95_ms is within bound, investigate what caused the outlier (disk I/O spike, OS scheduling, GC pause); consider whether the SQLite WAL sync setting is appropriate for the test environment
VAL07-05 fails (list latency or incomplete sample): inspect val07-list-percentiles.txt; if n < 20, one or more list requests failed; otherwise, if latency grows with plan count, the list endpoint may be performing a full table scan without a suitable index
VAL07-06 fails (conc_errors > 0): inspect val07-concurrent-raw.txt for the failing HTTP codes; a 409 Conflict indicates duplicate plan IDs; a 503 indicates the single-writer connection is returning errors rather than queuing — investigate store.go
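VAL07-07 fails (wall_ms > 2000): the 5 concurrent requests are serialised through a single SQLite connection; if wall time > 2000 ms, each request is taking > 400 ms on average; sustained latency at that level in the sequential phase would already fail VAL07-02 (p50 ≤ 100 ms) and VAL07-03 (p95 ≤ 300 ms); if those checks pass, the slowdown is specific to concurrent access, so compare the per-request times in val07-concurrent-raw.txt with the sequential samples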
VAL07-08 fails: cross-reference the error count with val07-create-raw.txt and val07-list-raw.txt to identify which operation returned a non-2xx code
VAL07-09 fails: inspect val07-metrics-raw.txt; if the file is empty, the metrics server failed to start (check val07-cp.log for --metrics-addr binding errors); if the file contains metrics but cp_http_requests_total is missing, the Prometheus instrumentation was not registered for this endpoint

8. Report Template

# VAL 07 — Fleet Rollout Latency Baseline Report
timestamp: 2026-03-21T10:00:00Z
samples_per_operation: 20

## Results
VAL07-01 orchestrator_reachable:  PASS
VAL07-02 plan_create_p50:         PASS  (p50=2ms   bound=100ms)
VAL07-03 plan_create_p95:         PASS  (p95=5ms   bound=300ms)
VAL07-04 plan_create_p99:         PASS  (p99=8ms   bound=500ms)
VAL07-05 plan_list_p99:           PASS  (p99=4ms   bound=500ms)
VAL07-06 concurrent_success:      PASS  (5/5 succeeded)
VAL07-07 concurrent_wall_clock:   PASS  (elapsed=42ms  bound=2000ms)
VAL07-08 zero_client_errors:      PASS  (errors=0)
VAL07-09 prometheus_observations: PASS  (count=45)

## Summary
pass=9  fail=0  total=9

The runner also prints VAL 07: pass=9 fail=0 total=9 (report: val07-report.txt) to stdout so CI log scanners can grep for VAL 07: pass= without parsing the report file.
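
A CI log scanner for that line could look like the sketch below. The regular expression and the function name `parse_val07_summary` are assumptions based only on the line format quoted above; the pattern is deliberately unanchored in case the CI system prefixes log lines with timestamps.

```python
import re

# Assumption: the summary line has exactly the format quoted above.
SUMMARY_RE = re.compile(r"VAL 07: pass=(\d+) fail=(\d+) total=(\d+)")


def parse_val07_summary(log_text):
    """Return the pass/fail/total counts, or None if the line is absent."""
    match = SUMMARY_RE.search(log_text)
    if match is None:
        return None
    passed, failed, total = (int(group) for group in match.groups())
    return {"pass": passed, "fail": failed, "total": total}


log = (
    "2026-03-21T10:00:03Z starting VAL07\n"
    "2026-03-21T10:00:05Z VAL 07: pass=9 fail=0 total=9 (report: val07-report.txt)\n"
)
counts = parse_val07_summary(log)
ok = counts is not None and counts["fail"] == 0 and counts["pass"] == counts["total"]
print(counts, ok)
```

Treating a missing summary line as a failure (the `None` case) guards against a run that crashed before VAL07 reported anything.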

The machine-readable val07-report.json includes the full latency objects:

{
  "plan_create_ms": {"p50": 2, "p95": 5, "p99": 8},
  "plan_list_ms": {"p99": 4},
  "concurrent": {"n": 5, "ok": 5, "wall_ms": 42}
}

9. Environment Assumptions

Assumption	Detail
Hardware	Development machine or CI runner with local SSD; loopback TCP
Control plane	In-process SQLite with `SetMaxOpenConns(1)`; no PostgreSQL
Isolation	Fresh data directory (`$WORK_DIR/val07`); no prior state
RBAC	`AUTONOMY_RBAC_ENFORCEMENT=0` (not the subject of this validation)
TLS	Plain HTTP on loopback; TLS overhead not included
Concurrency	N=5 concurrent creates; not a saturation test
Sample size	N=20 per operation; p99 = max observation with this sample size

The bounds are chosen to pass comfortably on normal development hardware while still catching regressions:

Operation	p50 bound	p95 bound	p99 bound	Rationale
POST /v1/rollouts	100 ms	300 ms	500 ms	Workplan target; typical measured value is 1–10 ms
GET /v1/rollouts	—	—	500 ms	Read path; typically faster than write path
5× concurrent wall clock	—	—	2000 ms	5 × 400 ms average per request when fully serialised, stricter than 5 × the 500 ms p99 bound (2500 ms); confirms queuing, not failure

10. How to Run

VAL07 executes automatically as the final validation slice when the full lab is run:

export GOROOT=/home/ubuntu/.local/go1.25.7
export PATH="$GOROOT/bin:$PATH"
export GOTOOLCHAIN=local

bash scripts/labs/run_cli_audit_lab.sh

To inspect results after a run:

# Quick pass/fail
cat evidence/pr17-cli-audit-local-2026-03-17/val07/val07-report.txt

# Latency numbers
cat evidence/pr17-cli-audit-local-2026-03-17/val07/val07-create-percentiles.txt
cat evidence/pr17-cli-audit-local-2026-03-17/val07/val07-list-percentiles.txt

# Concurrent test result
cat evidence/pr17-cli-audit-local-2026-03-17/val07/val07-concurrent-summary.txt

# Error count
cat evidence/pr17-cli-audit-local-2026-03-17/val07/val07-error-summary.txt

# Prometheus counter
cat evidence/pr17-cli-audit-local-2026-03-17/val07/val07-prometheus-check.txt

# Machine-readable report with latency objects
jq '{pass_count, fail_count, plan_create_ms, plan_list_ms, concurrent}' \
  evidence/pr17-cli-audit-local-2026-03-17/val07/val07-report.json

# Raw timing data
cat evidence/pr17-cli-audit-local-2026-03-17/val07/val07-create-raw.txt