VAL25 — Fleet Rollout Proof Report Generator

Audience: engineering leads, product managers, and external reviewers who need a consolidated, evidence-backed assessment of fleet rollout readiness.

VAL25 is a report generator, not a test runner. It reads evidence produced by five completed validation slices (VAL07–VAL11) and produces a single production-quality proof report with explicit readiness conclusions.

1. Scope

VAL25 consolidates evidence from:

  Slice   Name                                 What it proves
  ------  -----------------------------------  ----------------------------------------------------------------------
  VAL07   Fleet Rollout Latency Baseline       p50/p95/p99 latency under light load (N=20 samples)
  VAL08   Fleet Rollout Throughput Validation  Zero errors at N=1/10/50/100 concurrent plan creates
  VAL09   Stuck Rollout Detection              Stuck plans reliably detected and excluded when paused/terminal
  VAL10   Rollback Reliability Validation      Retry and rollback batch success rate ≥ 99%
  VAL11   Fleet Rollout Chaos Test Pack        Data durability, rapid restart, stuck proxy detection, cascade recovery

Branch rule: coverage by existing runner

  Existing asset            Coverage
  ------------------------  --------------------------------------------------------------------------
  run_soak_val12_report.sh  30-day fleet soak — separate domain, different metrics
  run_soak_val18_report.sh  30-day HA soak — separate domain
  run_soak_val24_report.sh  30-day relay soak — separate domain
  run_cli_audit_lab.sh      Runs VAL07–VAL11 but produces per-slice reports; no cross-slice aggregation

New aggregator required. No existing script reads and combines VAL07–VAL11 evidence into a single proof artifact.

Out of scope

  • PostgreSQL backend validation (single-node SQLite only)

  • 30-day durability soak (covered by VAL12)

  • Multi-node or multi-region deployment testing

  • SIGKILL chaos (SIGTERM only)

  • Concurrent-under-kill write requests

  • Edge-agent reconnect after control-plane restart

  • Throughput beyond N=100 concurrent plans

2. Evidence Structure

VAL25 reads from the evidence directory produced by run_cli_audit_lab.sh (default: evidence/cli-audit-lab-YYYY-MM-DD).

  Input file               Produced by                           Contents
  -----------------------  ------------------------------------  ---------------------------------------------------------------
  val07/val07-report.json  run_rollout_latency_val07_lab()       Latency percentiles, concurrent timing, 9-check results
  val08/val08-report.json  run_rollout_throughput_val08_lab()    Per-tier throughput, wall times, 10-check results
  val09/val09-report.json  run_stuck_detection_val09_lab()       Stuck detection counts, threshold/sleep config, 10-check results
  val10/val10-report.json  run_rollback_reliability_val10_lab()  Success rates per batch type, 10-check results
  val11/val11-report.json  run_chaos_val11_lab()                 Chaos scenario outcomes, 10-check results

VAL25 expects these five reports to come from one coherent run_cli_audit_lab.sh evidence set. The generator checks each report’s embedded timestamp and requires the found reports to fall within a single 6-hour evidence window before it can issue a design-partner readiness conclusion.

Missing slice reports are reported as MISSING in the coverage table rather than aborting. Reports that exist but do not match the expected schema are also degraded to MISSING with a schema-mismatch detail instead of crashing the generator.
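The loading behaviour above (degrade to MISSING instead of aborting, then check the 6-hour window) can be sketched as follows. This is a hypothetical helper, not the real generator: the field names "generated_at" and "checks" are assumptions about the slice-report schema, chosen for illustration only.

```python
"""Sketch of VAL25's report-loading step: each expected slice report is
loaded and schema-checked; missing or malformed files degrade to MISSING
rather than crashing the generator."""
import json
import os
from datetime import datetime, timedelta

SLICES = ["val07", "val08", "val09", "val10", "val11"]
WINDOW = timedelta(hours=6)  # single coherent evidence window

def load_reports(evidence_dir):
    reports = {}
    for slice_id in SLICES:
        path = os.path.join(evidence_dir, slice_id, f"{slice_id}-report.json")
        try:
            with open(path) as fh:
                doc = json.load(fh)
        except (OSError, json.JSONDecodeError):
            # File absent or unreadable: report as MISSING, do not abort.
            reports[slice_id] = {"status": "MISSING", "detail": "not found"}
            continue
        # Minimal schema check (assumed fields): degrade instead of crashing.
        if "generated_at" not in doc or "checks" not in doc:
            reports[slice_id] = {"status": "MISSING", "detail": "schema-mismatch"}
        else:
            reports[slice_id] = doc
    return reports

def window_coherent(reports):
    """True when all found reports fall within one 6-hour window."""
    stamps = [datetime.fromisoformat(r["generated_at"])
              for r in reports.values() if r.get("status") != "MISSING"]
    return bool(stamps) and max(stamps) - min(stamps) <= WINDOW
```

A MISSING slice still appears in the coverage table; it simply cannot contribute to the design-partner conclusion.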

3. Metric Definitions and Targets

VAL25 reports on the following metrics. Targets are drawn from the Gap Closure Workplan (v1.2). Measured results and proposed targets are explicitly separated throughout the report.

Latency (VAL07)

  Metric              Target      Source
  ------------------  ----------  -------------------------------
  plan_create p50     ≤ 100 ms    VAL07 workplan target
  plan_create p95     ≤ 300 ms    VAL07 workplan target
  plan_create p99     ≤ 500 ms    VAL07 workplan target (primary)
  plan_list p99       ≤ 500 ms    VAL07 workplan target
  Concurrent 5× wall  ≤ 2,000 ms  VAL07 workplan target
  Sample count        N = 20      Fixed in VAL07 runner

Measurement method: curl -w '%{time_total}' on each request; Python nearest-rank percentile computation.
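The nearest-rank method referenced above can be sketched in a few lines. This is the standard nearest-rank definition; the actual VAL07 runner's implementation may differ in detail.

```python
"""Nearest-rank percentile: the value at rank ceil(pct/100 * N) in the
sorted sample list (1-indexed). With N=20, p99 resolves to rank 20, i.e.
the maximum sample."""
import math

def nearest_rank(samples, pct):
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]  # clamp so pct=0 maps to the minimum
```

Note that with only 20 samples, p99 and p100 coincide, so the p99 figure is effectively the worst observed latency.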

Throughput (VAL08)

Metric

Target

Source

N=100 errors

= 0

VAL08 primary target

N=100 wall time

≤ 30,000 ms

VAL08 workplan target

N=1 vs N=100 throughput

N=100 ≥ N=1 plans/sec

Scaling sanity check

SQLite single-writer plateau is expected at high N — linear scaling is explicitly not a target. The throughput figure is informational; zero errors is the binding criterion.
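A hypothetical sketch of the pass logic this implies: zero errors at N=100 is binding, the 30 s wall-time bound is the workplan target, and the scaling check only requires N=100 throughput to be at least the N=1 baseline (the function names below are illustrative, not taken from the runner).

```python
"""VAL08 decision logic as described in the text above (sketch)."""

def val08_n100_pass(errors, wall_ms, wall_target_ms=30_000):
    # Zero errors is the binding criterion; wall time must also meet target.
    return errors == 0 and wall_ms <= wall_target_ms

def scaling_sane(n1_rate, n100_rate):
    # A plateau is fine (SQLite single writer); dropping below the
    # N=1 baseline is not.
    return n100_rate >= n1_rate
```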

Recovery Success Rates (VAL10)

Metric

Target

Source

Retry batch success rate

≥ 0.990

VAL10 workplan target

Rollback batch success rate

≥ 0.990

VAL10 workplan target

Aggregate success rate

≥ 0.990

VAL10 primary target

Batch size: N=5 per type (10 total). Aggregate = (retry_ok + rollback_ok) / 10.
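A worked check of that arithmetic: with only 10 operations total, a single failure yields a rate of 0.900, so the ≥ 0.990 target effectively requires all 10 to succeed in a given run.

```python
"""VAL10 aggregate-rate arithmetic as defined above (batch size 5 per type)."""

TARGET = 0.990

def aggregate_rate(retry_ok, rollback_ok, batch=5):
    # Aggregate = (retry_ok + rollback_ok) / 10 for the default batch size.
    return (retry_ok + rollback_ok) / (2 * batch)
```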

Chaos Resilience (VAL11)

Reported as pass/fail for each of four scenarios:

  Scenario               Description
  ---------------------  -----------------------------------------------------------------
  Data durability        SQLite data survives SIGTERM + restart (1 cycle)
  3× rapid restart       CP survives 3 consecutive SIGTERM + restart cycles
  Stuck proxy detection  Device-unresponsive plan detected by stuck scanner
  Cascade recovery       3 stuck plans return to terminal states after manual intervention

4. Readiness Level Definitions

VAL25 evaluates three readiness levels. Only Design Partner level is achievable with this validation suite.

Design Partner Ready

Criteria (all must hold):

  1. All five VAL07–VAL11 slices pass (zero failed checks each)

  2. plan_create p99 ≤ 500 ms (VAL07)

  3. VAL08 primary N=100 scenario passes (VAL08-05)

  4. Rollback aggregate rate ≥ 0.990 (VAL10)

  5. Data durability after SIGTERM restart (VAL11)

  6. Evidence timestamps are coherent (single 6-hour evidence window)

Meaning: the core rollout lifecycle is functional and stable enough to offer to early adopters under the scope limitations stated in §1.

GA Ready

Not achievable with VAL07–VAL11 alone. Additional requirements:

  1. PostgreSQL backend validation under equivalent load scenarios

  2. VAL12 30-day fleet soak: Gate D requires rollback rate ≥ 0.990 sustained over 1,440 rounds and fleet plan count ≥ 100 reached

  3. Multi-node HA cluster rollout delivery path validated

Public Production Claim

Requires everything for GA Ready plus:

  1. External security hardening audit

  2. SLA-grade observability and alerting validation

  3. Multi-region topology testing

5. 10-Check Matrix

  ID        When     Description                                                       Pass criterion
  --------  -------  ----------------------------------------------------------------  --------------------------------------------------
  VAL25-01  Setup    VAL07 latency report found and all checks pass                    s07 == "PASS" (9/9 checks)
  VAL25-02  Setup    VAL08 throughput report found and all checks pass                 s08 == "PASS" (10/10 checks)
  VAL25-03  Setup    VAL09 stuck detection report found and all checks pass            s09 == "PASS" (10/10 checks)
  VAL25-04  Setup    VAL10 rollback reliability report found and all checks pass       s10 == "PASS" (10/10 checks)
  VAL25-05  Setup    VAL11 chaos report found and all checks pass                      s11 == "PASS" (10/10 checks)
  VAL25-06  Metric   plan_create p99 ≤ 500 ms                                          lat_p99 <= 500
  VAL25-07  Metric   N=100 concurrent with zero errors                                 val08 VAL08-05 passes
  VAL25-08  Metric   Rollback aggregate success rate ≥ 0.990                           agg_rate >= 0.990
  VAL25-09  Metric   Data durability after SIGTERM restart                             val11 VAL11-03 passes
  VAL25-10  Summary  Design partner readiness — all above pass and evidence coherent   VAL25-01..09 all PASS and evidence timestamps fall
                                                                                       within one 6-hour window
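The VAL25-10 summary rule in the matrix above can be sketched as a single conjunction. The `checks` dict (check id → bool) is a hypothetical representation of the per-check results, not the generator's actual data structure.

```python
"""VAL25-10: design-partner readiness holds only when VAL25-01..09 all
pass and the evidence timestamps are coherent."""

def val25_10(checks, evidence_coherent):
    prior = [checks["VAL25-%02d" % i] for i in range(1, 10)]
    return all(prior) and evidence_coherent
```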

6. Run the Report

Prerequisites

Run the full cli-audit-lab (or at minimum VAL07–VAL11 slices):

export GOROOT=/home/ubuntu/.local/go1.25.7
export PATH="$GOROOT/bin:$PATH"
export GOTOOLCHAIN=local

bash scripts/labs/run_cli_audit_lab.sh

The evidence directory is printed at the end: evidence/cli-audit-lab-YYYY-MM-DD.

Generate the proof report

bash scripts/labs/run_fleet_rollout_proof_report_val25.sh \
  evidence/cli-audit-lab-2026-03-23

Output files

  File                           Contents
  -----------------------------  ------------------------------
  stdout                         Human-readable proof report
  val25/val25-proof-report.txt   Same content as stdout
  val25/val25-proof-report.json  Machine-readable JSON artifact
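The JSON artifact is intended for downstream tooling. The document does not specify its schema, so the "verdict" field name in this sketch is an assumption for illustration only; adjust it to whatever the artifact actually emits.

```python
"""Hypothetical consumer of val25-proof-report.json."""
import json

def read_verdict(path):
    with open(path) as fh:
        # Returns None if the assumed "verdict" field is absent.
        return json.load(fh).get("verdict")
```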

7. Final Report Format

VAL25 — Fleet Rollout Proof Report
Generated:    <YYYY-MM-DDTHH:MM:SSZ>
Evidence dir: <path>

Environment:
  Backend:    SQLite (single-node, in-process)
  Chaos:      SIGTERM only (no SIGKILL, no iptables, no network partitions)
  Topology:   Single-node, single-region
  Test suite: VAL07–VAL11 (5 validation slices, embedded in run_cli_audit_lab.sh)
  Evidence:   single coherent evidence window
  Note:       All latency and throughput figures are from controlled lab runs on
              a single host.  Results are NOT representative of multi-node or
              production-scale deployments.

Scenario Coverage:
  VAL07  Fleet Rollout Latency Baseline    PASS     (9/9 checks)
  VAL08  Fleet Rollout Throughput          PASS     (10/10 checks)
  VAL09  Stuck Rollout Detection           PASS     (10/10 checks)
  VAL10  Rollback Reliability              PASS     (10/10 checks)
  VAL11  Fleet Rollout Chaos               PASS     (10/10 checks)

Latency Metrics  (VAL07):
  Samples (plan_create):   20
  p50:    12.4 ms   [target <= 100 ms]   PASS
  p95:    19.8 ms   [target <= 300 ms]   PASS
  p99:    24.1 ms   [target <= 500 ms]   PASS
  plan_list p99:    8.3 ms   [target <= 500 ms]   PASS
  Concurrent:  5/5 succeeded  wall=142 ms   [target: all ok, wall <= 2000 ms]

Throughput Metrics  (VAL08, plans/sec, SQLite single-writer):
  N=1:      18.42 plans/sec   (reference baseline)
  N=10:     62.11 plans/sec
  N=50:     74.33 plans/sec
  N=100:    78.05 plans/sec   wall=6409 ms   errors=0
  Note:   Throughput plateau at high N is expected with SQLite (single writer).
          Zero errors at N=100 is the primary target, not linear scaling.

Recovery Success Rates  (VAL10, rollback reliability):
  Retry    batch:  5/5   rate=1.0000   [target >= 0.990]   PASS
  Rollback batch:  5/5   rate=1.0000   [target >= 0.990]   PASS
  Aggregate:      10/10  rate=1.0000   [target >= 0.990]   PASS

Chaos Resilience  (VAL11, SIGTERM-only):
  Data durability after SIGTERM restart:             PASS
  3x rapid restart resilience:                       PASS
  Device-unresponsive stuck detection (stuck proxy): PASS
  Bulk cascade recovery (3 plans):                   PASS

Stuck Detection Accuracy  (VAL09):
  Threshold:         3 s
  Injection sleep:   4 s
  Empty scans:       1   (no false positives on fresh plans)
  Stale detected:    1   (stuck plans caught above threshold)
  Final scan count:  1   (one residual stuck plan after recovery)

10-Check Matrix:
  VAL25-01 PASS  VAL07 latency report: all 9 checks passed
           val07 9/9 checks passed
  VAL25-02 PASS  VAL08 throughput report: all 10 checks passed
           val08 10/10 checks passed
  VAL25-03 PASS  VAL09 stuck detection report: all 10 checks passed
           val09 10/10 checks passed
  VAL25-04 PASS  VAL10 rollback reliability report: all 10 checks passed
           val10 10/10 checks passed
  VAL25-05 PASS  VAL11 chaos report: all 10 checks passed
           val11 10/10 checks passed
  VAL25-06 PASS  Latency: plan_create p99 <= 500ms [target]
           measured p99=24.1ms
  VAL25-07 PASS  Throughput: N=100 concurrent with zero errors [target]
           total_errors=0
  VAL25-08 PASS  Rollback: aggregate success rate >= 0.990 [target]
           aggregate_rate=1.0000 (10/10)
  VAL25-09 PASS  Chaos: data durability preserved after SIGTERM restart [target]
           val11 VAL11-03 data_durability_post_restart
  VAL25-10 PASS  Summary: design partner readiness — all five slices pass and key targets met

Overall: PASS=10 FAIL=0

Known Failures and Limitations:
  ...  (see §1 Out of scope)

Readiness Conclusion:
  DESIGN PARTNER READY  ✓
  ...
  GA READY  ✗  (NOT YET)
  ...
  PUBLIC PRODUCTION CLAIM  ✗  (NOT YET)
  ...

Verdict: DESIGN PARTNER READY

8. Tooling

  File                                                  Role
  ----------------------------------------------------  ------------------------------
  scripts/labs/run_fleet_rollout_proof_report_val25.sh  VAL25 report generator
  scripts/labs/run_cli_audit_lab.sh                     Source of VAL07–VAL11 evidence
  docs/tutorials/rollout-latency-validation.md          VAL07 formal plan
  docs/tutorials/rollout-throughput-validation.md       VAL08 formal plan
  docs/tutorials/stuck-detection-validation.md          VAL09 formal plan
  docs/tutorials/rollback-reliability-validation.md     VAL10 formal plan
  docs/tutorials/chaos-validation.md                    VAL11 formal plan