VAL 07 — Fleet Rollout Latency Baseline

1. Purpose and Claims

This validation establishes a reproducible latency baseline for the operator-facing rollout API and proves the system meets the workplan performance targets under normal and modest-concurrency conditions.

| # | Claim |
|---|-------|
| VAL07-C1 | The control plane starts, becomes reachable, and responds to health probes |
| VAL07-C2 | Rollout plan creation (POST /v1/rollouts) meets the workplan latency target: p50 ≤ 100 ms, p95 ≤ 300 ms, p99 ≤ 500 ms (20 sequential samples) |
| VAL07-C3 | Rollout plan listing (GET /v1/rollouts) meets the same latency target: p99 ≤ 500 ms (20 sequential samples after 20 plans exist) |
| VAL07-C4 | Under modest concurrency (5 parallel plan creates), all requests succeed and the total wall-clock time is ≤ 2000 ms, proving operator-facing responsiveness is not blocked by single-writer SQLite serialisation |

Relationship to workplan:

The workplan lists rollout plan creation latency < 500 ms p99 as a Proposed validation target (its Decision Tier) and states that the proof artifact required for Gate D is a “Performance baseline report with p50/p95/p99 latencies.” VAL07 produces that baseline report for the local-SQLite control plane running in the lab environment.

What these numbers mean:

VAL07 is a local-lab baseline, not a production load test. The bounds are deliberately generous (100/300/500 ms) for an in-process SQLite store on development hardware. They prove the system is not slow by accident — a mis-wired handler, a missing index, or an unintentional synchronous fsync would show up as latency regressions against these bounds. Production sizing against a PostgreSQL store at network scale is out of scope here.


2. Existing Lab Coverage

| Question | Answer |
|---|---|
| Is this covered by an existing LAB? | No. The run_rollout_lab() phase in run_cli_audit_lab.sh exercises rollout plan create/publish/cancel but takes no timing measurements. |
| Which LAB/evidence bundle is extended? | scripts/labs/run_cli_audit_lab.sh — new function run_rollout_latency_val07_lab() added as the final slice. |
| Why a new function rather than extending run_rollout_lab()? | The rollout lab kills its control plane and cleans up at the end of its phase. VAL07 needs a fresh, isolated control-plane instance so latency measurements are not contaminated by prior write history. Using a different port (18992) guarantees no socket conflicts with any other lab phase. |


3. Benchmark Architecture

Tooling

| Tool | Purpose |
|---|---|
| `curl -s -o /dev/null -w '%{http_code} %{time_total}'` | Per-request HTTP timing (fractional seconds, microsecond precision) |
| Python (inline heredoc) | Percentile computation using the nearest-rank method |
| `date +%s%3N` (millisecond wall clock) | Concurrent batch wall-clock measurement |
| `grep '^cp_http_requests_total'` | Prometheus counter extraction for VAL07-09 |

Percentile method

The nearest-rank method is used with N=20 samples:

idx = max(0, min(int(p × N / 100 + 0.5) − 1, N − 1))

With 20 samples, p99 maps to the maximum observation. This is intentional: the test deliberately sizes the sample so that the stated p99 bound is the maximum tolerated latency, not a statistical approximation. Any single request that exceeds 500 ms will cause VAL07-04 to fail.
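The formula can be exercised standalone. The sketch below is assumed helper code, not the lab's exact heredoc: it feeds 20 synthetic timings (1–20 ms) through the nearest-rank rule and shows p99 landing on the maximum observation.

```shell
# Nearest-rank percentiles over 20 synthetic timings (seconds).
# Demo path /tmp/val07-demo-percentiles.txt is invented for illustration.
python3 - <<'EOF' > /tmp/val07-demo-percentiles.txt
def nearest_rank(samples, p):
    s = sorted(samples)
    n = len(s)
    idx = max(0, min(int(p * n / 100 + 0.5) - 1, n - 1))
    return s[idx]

samples = [i / 1000 for i in range(1, 21)]  # 1 ms .. 20 ms, synthetic
for p in (50, 95, 99):
    print(f"p{p}_ms={nearest_rank(samples, p) * 1000:.0f}")
EOF
cat /tmp/val07-demo-percentiles.txt
```

With this sample, p99_ms equals max_ms (20 ms), matching the "p99 maps to the maximum observation" property above.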

Measurement scope

curl time_total measures the full round-trip from connection establishment through response body receipt. It includes:

  • TCP connect (loopback → negligible)

  • HTTP request serialisation

  • Server-side handler: JSON unmarshal → SQLite write → JSON marshal

  • HTTP response serialisation

It excludes DNS resolution (not applicable for 127.0.0.1) and TLS handshake (VAL07 runs over plain HTTP; TLS is covered in the cert lab).

Control plane isolation

VAL07 starts a dedicated control-plane instance:

  • Listen: 127.0.0.1:18992

  • Metrics: 127.0.0.1:19092

  • Data dir: $WORK_DIR/val07 (removed and recreated on each run so the SQLite database is fresh and has no prior state)

  • Lifecycle: started at the beginning of run_rollout_latency_val07_lab(), killed with kill+wait at the end — does not share the global $ORCHESTRATOR_PID cleanup path
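The lifecycle above can be sketched with a placeholder process. This is a shape sketch only: `sleep` stands in for `autonomy-orchestrator serve`, and the `/tmp/val07-demo-*` paths are invented for the demo.

```shell
# Fresh data dir, dedicated process, explicit kill+wait at the end --
# deliberately separate from the global $ORCHESTRATOR_PID cleanup path.
DATA_DIR=/tmp/val07-demo-data
rm -rf "$DATA_DIR" && mkdir -p "$DATA_DIR"   # fresh SQLite state each run
sleep 60 &                                   # placeholder control plane
VAL07_CP_PID=$!
# ... health check and benchmark phases would run here ...
kill "$VAL07_CP_PID" 2>/dev/null
wait "$VAL07_CP_PID" 2>/dev/null || true     # reap; kill makes wait nonzero
echo "cp_stopped=true" | tee /tmp/val07-demo-lifecycle.txt
```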

SQLite serialisation note

The control plane uses a single SQLite connection (SetMaxOpenConns(1)) to serialise all writes. Concurrent POST /v1/rollouts requests are therefore serialised by the database layer. VAL07-06/07 prove that this serialisation does not cause client-visible errors or unacceptable latency at N=5 concurrency — the expected pattern is sequential queuing, not failure.


4. Harness

VAL07 is implemented as run_rollout_latency_val07_lab() in scripts/labs/run_cli_audit_lab.sh. It runs after run_support_bundle_val06_lab() as the final validation slice.

No prior lab state is required. VAL07 creates its own fresh control-plane and SQLite database. It does not read from $AUTONOMY_AUDIT_DIR or depend on any other lab phase.

Evidence directory: $EVIDENCE_DIR/val07/


5. Exact Scenarios

VAL07-01 — Control Plane Reachable

Purpose: Confirm that the dedicated VAL07 control plane starts successfully and responds to health probes before the benchmark begins. A failure here indicates a port conflict or binary build problem, not a latency issue.

Action:

```shell
autonomy-orchestrator serve \
  --listen 127.0.0.1:18992 \
  --metrics-addr 127.0.0.1:19092 \
  --data-dir $WORK_DIR/val07 \
  --log-format text

curl -s -o /dev/null -w '%{http_code}' http://127.0.0.1:18992/v1/health
```

Evidence file: val07/val07-health.txt — health_code, pass

Pass criterion: health_code=200.
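Since the server needs a moment to bind, a bounded retry loop is the natural probe pattern. The lab's exact startup-wait code isn't shown here; the sketch below is an assumed shape, with a stub probe() standing in for the real curl.

```shell
# Poll until the health endpoint returns 200, with a bounded retry budget.
# probe() is a stand-in for:
#   curl -s -o /dev/null -w '%{http_code}' http://127.0.0.1:18992/v1/health
probe() { echo 200; }
health_code=000
for attempt in $(seq 1 30); do
  health_code=$(probe)
  [ "$health_code" = "200" ] && break
  sleep 0.2
done
echo "health_code=${health_code}" | tee /tmp/val07-demo-health.txt
```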


VAL07-02, 03, 04 — Plan-Create Latency (p50/p95/p99)

Purpose: Establish the latency profile for POST /v1/rollouts across 20 sequential requests and assert each percentile against its bound. VAL07-04 (p99 ≤ 500 ms) is the primary workplan target.

Action:

For i in 1..20:

```shell
curl -s -o /dev/null -w '%{http_code} %{time_total}\n' \
  -X POST -H 'Content-Type: application/json' \
  -d '{"metadata":{"id":"val07-plan-NNN"},...}' \
  http://127.0.0.1:18992/v1/rollouts
```

Each response line is appended to val07-create-raw.txt (format: <code> <seconds>). Only 2xx responses contribute to the timing sample. Python computes p50/p95/p99 using the nearest-rank method, and the validation requires the full 20 successful timing samples to be present before any percentile check can pass.
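The 2xx-only filter and seconds-to-milliseconds conversion can be sketched with synthetic raw lines (the `/tmp/val07-demo-*` paths are invented for illustration; the real file is $EVIDENCE_DIR/val07/val07-create-raw.txt):

```shell
# Synthetic lines in the documented '<code> <seconds>' format; only 2xx
# rows contribute timing samples, converted here to whole milliseconds.
printf '200 0.004\n201 0.012\n500 0.950\n200 0.006\n' > /tmp/val07-demo-raw.txt
awk '$1 ~ /^2/ { printf "%.0f\n", $2 * 1000 }' \
  /tmp/val07-demo-raw.txt > /tmp/val07-demo-ms.txt
echo "n=$(wc -l < /tmp/val07-demo-ms.txt)"
```

The 500 row is dropped from the sample, so n < expected_n; in the real lab that incompleteness fails the percentile checks before any bound is evaluated.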

Evidence files:

  • val07/val07-create-raw.txt — 20 raw <code> <seconds> lines

  • val07/val07-create-percentiles.txt — expected_n, n, sample_complete, p50_ms, p95_ms, p99_ms, min_ms, max_ms

Pass criteria:

| Check | Bound |
|---|---|
| VAL07-02 | n=20 and p50_ms ≤ 100 |
| VAL07-03 | n=20 and p95_ms ≤ 300 |
| VAL07-04 | n=20 and p99_ms ≤ 500 |


VAL07-05 — Plan-List Latency (p99)

Purpose: Confirm that GET /v1/rollouts (list all plans) meets the latency bound when 20 plans exist in the store. The list path scans the SQLite events table and marshals all records; this exercises a realistic operator query.

Action:

For i in 1..20:

```shell
curl -s -o /dev/null -w '%{http_code} %{time_total}\n' \
  http://127.0.0.1:18992/v1/rollouts
```

Evidence files:

  • val07/val07-list-raw.txt — 20 raw lines

  • val07/val07-list-percentiles.txt — expected_n, n, sample_complete, p50_ms, p95_ms, p99_ms, min_ms, max_ms

Pass criterion: n=20 and p99_ms ≤ 500.


VAL07-06 + 07 — Concurrent Plan Creates

Purpose: Prove operator-facing responsiveness under modest concurrency. 5 parallel POST /v1/rollouts requests are launched as background curl processes; all must return 2xx (VAL07-06) and the total wall-clock time must be ≤ 2000 ms (VAL07-07). This documents the expected serialisation behaviour of the single-writer SQLite connection without treating it as a failure.

Action:

```shell
t_start=$(date +%s%3N)
for j in 1..5:
  curl ... POST /v1/rollouts &   # concurrent
done
wait
t_end=$(date +%s%3N)
```

Evidence files:

  • val07/val07-concurrent-raw.txt — 5 lines (one per background curl)

  • val07/val07-concurrent-summary.txt — concurrent_n, conc_ok, conc_errors, wall_ms, bound_ms=2000, wall_pass

Pass criteria:

  • VAL07-06: conc_errors=0 and conc_ok=5

  • VAL07-07: wall_ms ≤ 2000
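The wall-clock measurement pattern runs standalone with `sleep` standing in for the curl POSTs (assumes GNU date, which the lab already relies on for %3N; the demo path is invented):

```shell
# Five background jobs + wait, timed exactly as the lab describes.
# `sleep 0.1` is a placeholder for the real POST /v1/rollouts curls.
t_start=$(date +%s%3N)
for j in 1 2 3 4 5; do
  sleep 0.1 &
done
wait
t_end=$(date +%s%3N)
echo "wall_ms=$((t_end - t_start)) bound_ms=2000" | tee /tmp/val07-demo-wall.txt
```

Because the jobs overlap, wall_ms comes out near the duration of one job, not five; that is the same "queued, not failed" shape the 2000 ms bound is checking for.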


VAL07-08 — Zero Client Errors

Purpose: Confirm that no benchmark request returned a non-2xx HTTP status code. Counts errors from all three phases: sequential creates (20), sequential lists (20), and concurrent creates (5) — 45 requests total.

Evidence file: val07/val07-error-summary.txt — total_requests, error_count, pass

Pass criterion: error_count = 0.
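The aggregation across the three phases can be sketched with synthetic raw files (the `/tmp/val07-demo-*` paths and the 503 line are invented for the demo):

```shell
# Count non-2xx lines across all three raw files, as VAL07-08 describes.
printf '200 0.004\n201 0.006\n' > /tmp/val07-demo-create-raw.txt
printf '200 0.002\n200 0.003\n' > /tmp/val07-demo-list-raw.txt
printf '201 0.005\n503 0.001\n' > /tmp/val07-demo-concurrent-raw.txt
cat /tmp/val07-demo-create-raw.txt /tmp/val07-demo-list-raw.txt \
    /tmp/val07-demo-concurrent-raw.txt \
  | awk '$1 !~ /^2/ { n++ } END { print "error_count=" n+0 }' \
  > /tmp/val07-demo-errors.txt
cat /tmp/val07-demo-errors.txt
```

In a passing run the same pipeline over the real 45 lines yields error_count=0.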


VAL07-09 — Prometheus Observations

Purpose: Confirm that the Prometheus cp_http_requests_total counter reflects the benchmark traffic, proving Prometheus instrumentation is wired and receiving real observations from the VAL07 control-plane instance.

Action:

```shell
curl -fsS http://127.0.0.1:19092/metrics | grep '^cp_http_requests_total'
```

Evidence file: val07/val07-prometheus-check.txt — cp_http_requests_total, pass

Pass criterion: cp_http_requests_total > 0.
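The grep + awk extraction can be exercised against a synthetic exposition snippet. The label set (`code="…"`) is an assumption for the demo; real output comes from 127.0.0.1:19092/metrics, and summing the last field handles both labelled and unlabelled series.

```shell
# Extract and sum the counter from Prometheus text exposition.
cat > /tmp/val07-demo-metrics.txt <<'EOF'
# HELP cp_http_requests_total Total HTTP requests handled.
# TYPE cp_http_requests_total counter
cp_http_requests_total{code="200"} 40
cp_http_requests_total{code="201"} 5
EOF
grep '^cp_http_requests_total' /tmp/val07-demo-metrics.txt \
  | awk '{ s += $NF } END { print "cp_http_requests_total=" s+0 }' \
  > /tmp/val07-demo-prom.txt
cat /tmp/val07-demo-prom.txt
```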


6. Evidence Files

All files are written to $EVIDENCE_DIR/val07/.

| File | Produced by | Contains |
|---|---|---|
| val07-cp.log | orchestrator stdout+stderr | Startup logs and per-request log lines |
| val07-health.txt | `curl -w '%{http_code}'` | health_code, pass |
| val07-create-raw.txt | create loop | 20 lines: `<http_code> <time_total_s>` |
| val07-create-percentiles.txt | Python percentile script | expected_n, n, sample_complete, p50_ms, p95_ms, p99_ms, min_ms, max_ms |
| val07-list-raw.txt | list loop | 20 lines: `<http_code> <time_total_s>` |
| val07-list-percentiles.txt | Python percentile script | expected_n, n, sample_complete, p50_ms, p95_ms, p99_ms, min_ms, max_ms |
| val07-concurrent-raw.txt | concurrent curl batch | 5 lines: `<http_code> <time_total_s>` |
| val07-concurrent-summary.txt | `date +%s%3N` wall clock | concurrent_n, conc_ok, conc_errors, wall_ms, bound_ms, wall_pass |
| val07-error-summary.txt | error counter | total_requests=45, error_count, pass |
| val07-metrics-raw.txt | `curl /metrics` | Prometheus text exposition from VAL07 CP |
| val07-prometheus-check.txt | grep + awk | cp_http_requests_total, pass |
| val07-report.txt | composite report | 9-check PASS/FAIL + latency summary |
| val07-report.json | composite report | Machine-readable JSON with latency objects |


7. Pass/Fail Criteria

| Check ID | Name | File | Pass condition |
|---|---|---|---|
| VAL07-01 | orchestrator_reachable | val07-health.txt | health_code=200 |
| VAL07-02 | plan_create_p50 | val07-create-percentiles.txt | p50_ms ≤ 100 |
| VAL07-03 | plan_create_p95 | val07-create-percentiles.txt | p95_ms ≤ 300 |
| VAL07-04 | plan_create_p99 | val07-create-percentiles.txt | p99_ms ≤ 500 |
| VAL07-05 | plan_list_p99 | val07-list-percentiles.txt | p99_ms ≤ 500 |
| VAL07-06 | concurrent_success | val07-concurrent-summary.txt | conc_errors=0 and conc_ok=5 |
| VAL07-07 | concurrent_wall_clock | val07-concurrent-summary.txt | wall_ms ≤ 2000 |
| VAL07-08 | zero_client_errors | val07-error-summary.txt | error_count=0 |
| VAL07-09 | prometheus_observations | val07-prometheus-check.txt | cp_http_requests_total > 0 |

Overall pass: all 9 checks pass and val07-report.txt reports pass=9 fail=0 total=9.

Failure handling:

  • VAL07-01 fails: inspect val07-cp.log for startup errors; check whether port 18992 is already in use by another process

  • VAL07-02..04 fail (latency bound exceeded or incomplete sample): inspect val07-create-percentiles.txt; if n < 20, the control plane returned non-2xx responses and the latency corpus is incomplete, so cross-reference val07-create-raw.txt before interpreting the percentile numbers; if max_ms is very high but p95_ms is within bound, investigate what caused the outlier (disk I/O spike, OS scheduling, GC pause); consider whether the SQLite WAL sync setting is appropriate for the test environment

  • VAL07-05 fails (list latency or incomplete sample): inspect val07-list-percentiles.txt; if n < 20, one or more list requests failed; otherwise, if latency grows with plan count, the list endpoint may be performing a full table scan without a suitable index

  • VAL07-06 fails (conc_errors > 0): inspect val07-concurrent-raw.txt for the failing HTTP codes; a 409 Conflict indicates duplicate plan IDs; a 503 indicates the single-writer connection is returning errors rather than queuing — investigate store.go

  • VAL07-07 fails (wall_ms > 2000): the 5 concurrent requests are serialised through a single SQLite connection; if wall time > 2000 ms, each request is taking > 400 ms on average, which should already be caught by VAL07-04 in the sequential phase

  • VAL07-08 fails: cross-reference the error count with val07-create-raw.txt and val07-list-raw.txt to identify which operation returned a non-2xx code

  • VAL07-09 fails: inspect val07-metrics-raw.txt; if the file is empty, the metrics server failed to start (check val07-cp.log for --metrics-addr binding errors); if the file contains metrics but cp_http_requests_total is missing, the Prometheus instrumentation was not registered for this endpoint


8. Report Template

```
# VAL 07 — Fleet Rollout Latency Baseline Report
timestamp: 2026-03-21T10:00:00Z
samples_per_operation: 20

## Results
VAL07-01 orchestrator_reachable:  PASS
VAL07-02 plan_create_p50:         PASS  (p50=2ms   bound=100ms)
VAL07-03 plan_create_p95:         PASS  (p95=5ms   bound=300ms)
VAL07-04 plan_create_p99:         PASS  (p99=8ms   bound=500ms)
VAL07-05 plan_list_p99:           PASS  (p99=4ms   bound=500ms)
VAL07-06 concurrent_success:      PASS  (5/5 succeeded)
VAL07-07 concurrent_wall_clock:   PASS  (elapsed=42ms  bound=2000ms)
VAL07-08 zero_client_errors:      PASS  (errors=0)
VAL07-09 prometheus_observations: PASS  (count=45)

## Summary
pass=9  fail=0  total=9
```

The runner also prints VAL 07: pass=9 fail=0 total=9 (report: val07-report.txt) to stdout so CI log scanners can grep for VAL 07: pass= without parsing the report file.
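A minimal sketch of that CI gate, asserting on the summary line format stated above (the log path is invented for the demo):

```shell
# Grep the runner stdout for the stable summary line, as a CI gate would.
echo 'VAL 07: pass=9 fail=0 total=9 (report: val07-report.txt)' \
  > /tmp/val07-demo-stdout.log
if grep -q '^VAL 07: pass=9 fail=0' /tmp/val07-demo-stdout.log; then
  echo "ci_gate=pass"
fi
```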

The machine-readable val07-report.json includes the full latency objects:

```json
{
  "plan_create_ms": {"p50": 2, "p95": 5, "p99": 8},
  "plan_list_ms": {"p99": 4},
  "concurrent": {"n": 5, "ok": 5, "wall_ms": 42}
}
```

9. Environment Assumptions

| Assumption | Detail |
|---|---|
| Hardware | Development machine or CI runner with local SSD; loopback TCP |
| Control plane | In-process SQLite with SetMaxOpenConns(1); no PostgreSQL |
| Isolation | Fresh data directory ($WORK_DIR/val07); no prior state |
| RBAC | AUTONOMY_RBAC_ENFORCEMENT=0 (not the subject of this validation) |
| TLS | Plain HTTP on loopback; TLS overhead not included |
| Concurrency | N=5 concurrent creates; not a saturation test |
| Sample size | N=20 per operation; p99 = max observation with this sample size |

The bounds are chosen to be passing under normal development hardware while still catching regressions:

| Operation | p50 bound | p95 bound | p99 bound | Rationale |
|---|---|---|---|---|
| POST /v1/rollouts | 100 ms | 300 ms | 500 ms | Workplan target; typical measured value is 1–10 ms |
| GET /v1/rollouts | — | — | 500 ms | Read path; typically faster than write path |
| 5× concurrent wall clock | — | — | 2000 ms (wall) | 5 × p99 bound; confirms queuing, not failure |


10. How to Run

VAL07 executes automatically as the final validation slice when the full lab is run:

```shell
export GOROOT=/home/ubuntu/.local/go1.25.7
export PATH="$GOROOT/bin:$PATH"
export GOTOOLCHAIN=local

bash scripts/labs/run_cli_audit_lab.sh
```

To inspect results after a run:

```shell
# Quick pass/fail
cat evidence/pr17-cli-audit-local-2026-03-17/val07/val07-report.txt

# Latency numbers
cat evidence/pr17-cli-audit-local-2026-03-17/val07/val07-create-percentiles.txt
cat evidence/pr17-cli-audit-local-2026-03-17/val07/val07-list-percentiles.txt

# Concurrent test result
cat evidence/pr17-cli-audit-local-2026-03-17/val07/val07-concurrent-summary.txt

# Error count
cat evidence/pr17-cli-audit-local-2026-03-17/val07/val07-error-summary.txt

# Prometheus counter
cat evidence/pr17-cli-audit-local-2026-03-17/val07/val07-prometheus-check.txt

# Machine-readable report with latency objects
jq '{pass_count, fail_count, plan_create_ms, plan_list_ms, concurrent}' \
  evidence/pr17-cli-audit-local-2026-03-17/val07/val07-report.json

# Raw timing data
cat evidence/pr17-cli-audit-local-2026-03-17/val07/val07-create-raw.txt
```