VAL25 — Fleet Rollout Proof Report Generator

Audience: engineering leads, product managers, and external reviewers who need a consolidated, evidence-backed assessment of fleet rollout readiness.

VAL25 is a report generator, not a test runner. It reads evidence produced by five completed validation slices (VAL07–VAL11) and produces a single production-quality proof report with explicit readiness conclusions.

1. Scope

VAL25 consolidates evidence from:

  Slice   Name                                 What it proves
  ------  -----------------------------------  ----------------------------------------------------------------------
  VAL07   Fleet Rollout Latency Baseline       p50/p95/p99 latency under light load (N=20 samples)
  VAL08   Fleet Rollout Throughput Validation  Zero errors at N=1/10/50/100 concurrent plan creates
  VAL09   Stuck Rollout Detection              Stuck plans reliably detected and excluded when paused/terminal
  VAL10   Rollback Reliability Validation      Retry and rollback batch success rate ≥ 99%
  VAL11   Fleet Rollout Chaos Test Pack        Data durability, rapid restart, stuck proxy detection, cascade recovery

Branch rule: coverage by existing runner

  Existing asset            Coverage
  ------------------------  --------------------------------------------------------------------------
  run_soak_val12_report.sh  30-day fleet soak — separate domain, different metrics
  run_soak_val18_report.sh  30-day HA soak — separate domain
  run_soak_val24_report.sh  30-day relay soak — separate domain
  run_cli_audit_lab.sh      Runs VAL07–VAL11 but produces per-slice reports; no cross-slice aggregation

New aggregator required. No existing script reads and combines VAL07–VAL11 evidence into a single proof artifact.

Out of scope

  • PostgreSQL backend validation (single-node SQLite only)

  • 30-day durability soak (covered by VAL12)

  • Multi-node or multi-region deployment testing

  • SIGKILL chaos (SIGTERM only)

  • Concurrent-under-kill write requests

  • Edge-agent reconnect after control-plane restart

  • Throughput beyond N=100 concurrent plans

2. Evidence Structure

VAL25 reads from the evidence directory produced by run_cli_audit_lab.sh (default: evidence/cli-audit-lab-YYYY-MM-DD).

  Input file               Produced by                           Contents
  -----------------------  ------------------------------------  ---------------------------------------------------------------
  val07/val07-report.json  run_rollout_latency_val07_lab()       Latency percentiles, concurrent timing, 9-check results
  val08/val08-report.json  run_rollout_throughput_val08_lab()    Per-tier throughput, wall times, 10-check results
  val09/val09-report.json  run_stuck_detection_val09_lab()       Stuck detection counts, threshold/sleep config, 10-check results
  val10/val10-report.json  run_rollback_reliability_val10_lab()  Success rates per batch type, 10-check results
  val11/val11-report.json  run_chaos_val11_lab()                 Chaos scenario outcomes, 10-check results

VAL25 expects these five reports to come from one coherent run_cli_audit_lab.sh evidence set. The generator checks each report’s embedded timestamp and requires the found reports to fall within a single 6-hour evidence window before it can issue a design-partner readiness conclusion.

Missing slice reports are reported as MISSING in the coverage table rather than aborting. Reports that exist but do not match the expected schema are also degraded to MISSING with a schema-mismatch detail instead of crashing the generator.
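The loading behaviour above (degrade to MISSING instead of aborting, then check the 6-hour window) can be sketched as follows. This is a hypothetical helper, not the real generator: the field names "generated_at" and "checks" are assumptions about the slice-report schema, chosen for illustration only.

```python
"""Sketch of VAL25's report-loading step: each expected slice report is
loaded and schema-checked; missing or malformed files degrade to MISSING
rather than crashing the generator."""
import json
import os
from datetime import datetime, timedelta

SLICES = ["val07", "val08", "val09", "val10", "val11"]
WINDOW = timedelta(hours=6)  # single coherent evidence window

def load_reports(evidence_dir):
    reports = {}
    for slice_id in SLICES:
        path = os.path.join(evidence_dir, slice_id, f"{slice_id}-report.json")
        try:
            with open(path) as fh:
                doc = json.load(fh)
        except (OSError, json.JSONDecodeError):
            # File absent or unreadable: report as MISSING, do not abort.
            reports[slice_id] = {"status": "MISSING", "detail": "not found"}
            continue
        # Minimal schema check (assumed fields): degrade instead of crashing.
        if "generated_at" not in doc or "checks" not in doc:
            reports[slice_id] = {"status": "MISSING", "detail": "schema-mismatch"}
        else:
            reports[slice_id] = doc
    return reports

def window_coherent(reports):
    """True when all found reports fall within one 6-hour window."""
    stamps = [datetime.fromisoformat(r["generated_at"])
              for r in reports.values() if r.get("status") != "MISSING"]
    return bool(stamps) and max(stamps) - min(stamps) <= WINDOW
```

A MISSING slice still appears in the coverage table; it simply cannot contribute to the design-partner conclusion.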

3. Metric Definitions and Targets

VAL25 reports on the following metrics. Targets are drawn from the Gap Closure Workplan (v1.2). Measured results and proposed targets are explicitly separated throughout the report.

Latency (VAL07)

  Metric              Target      Source
  ------------------  ----------  -------------------------------
  plan_create p50     ≤ 100 ms    VAL07 workplan target
  plan_create p95     ≤ 300 ms    VAL07 workplan target
  plan_create p99     ≤ 500 ms    VAL07 workplan target (primary)
  plan_list p99       ≤ 500 ms    VAL07 workplan target
  Concurrent 5× wall  ≤ 2,000 ms  VAL07 workplan target
  Sample count        N = 20      Fixed in VAL07 runner

Measurement method: curl -w '%{time_total}' on each request; Python nearest-rank percentile computation.
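The nearest-rank method referenced above can be sketched in a few lines. This is the standard nearest-rank definition; the actual VAL07 runner's implementation may differ in detail.

```python
"""Nearest-rank percentile: the value at rank ceil(pct/100 * N) in the
sorted sample list (1-indexed). With N=20, p99 resolves to rank 20, i.e.
the maximum sample."""
import math

def nearest_rank(samples, pct):
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]  # clamp so pct=0 maps to the minimum
```

Note that with only 20 samples, p99 and p100 coincide, so the p99 figure is effectively the worst observed latency.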

Throughput (VAL08)

Metric

Target

Source

N=100 errors

= 0

VAL08 primary target

N=100 wall time

≤ 30,000 ms

VAL08 workplan target

N=1 vs N=100 throughput

N=100 ≥ N=1 plans/sec

Scaling sanity check

SQLite single-writer plateau is expected at high N — linear scaling is explicitly not a target. The throughput figure is informational; zero errors is the binding criterion.
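A hypothetical sketch of the pass logic this implies: zero errors at N=100 is binding, the 30 s wall-time bound is the workplan target, and the scaling check only requires N=100 throughput to be at least the N=1 baseline (the function names below are illustrative, not taken from the runner).

```python
"""VAL08 decision logic as described in the text above (sketch)."""

def val08_n100_pass(errors, wall_ms, wall_target_ms=30_000):
    # Zero errors is the binding criterion; wall time must also meet target.
    return errors == 0 and wall_ms <= wall_target_ms

def scaling_sane(n1_rate, n100_rate):
    # A plateau is fine (SQLite single writer); dropping below the
    # N=1 baseline is not.
    return n100_rate >= n1_rate
```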

Recovery Success Rates (VAL10)

Metric

Target

Source

Retry batch success rate

≥ 0.990

VAL10 workplan target

Rollback batch success rate

≥ 0.990

VAL10 workplan target

Aggregate success rate

≥ 0.990

VAL10 primary target

Batch size: N=5 per type (10 total). Aggregate = (retry_ok + rollback_ok) / 10.
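A worked check of that arithmetic: with only 10 operations total, a single failure yields a rate of 0.900, so the ≥ 0.990 target effectively requires all 10 to succeed in a given run.

```python
"""VAL10 aggregate-rate arithmetic as defined above (batch size 5 per type)."""

TARGET = 0.990

def aggregate_rate(retry_ok, rollback_ok, batch=5):
    # Aggregate = (retry_ok + rollback_ok) / 10 for the default batch size.
    return (retry_ok + rollback_ok) / (2 * batch)
```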

Chaos Resilience (VAL11)

Reported as pass/fail for each of four scenarios:

  Scenario               Description
  ---------------------  -----------------------------------------------------------------
  Data durability        SQLite data survives SIGTERM + restart (1 cycle)
  3× rapid restart       CP survives 3 consecutive SIGTERM + restart cycles
  Stuck proxy detection  Device-unresponsive plan detected by stuck scanner
  Cascade recovery       3 stuck plans return to terminal states after manual intervention

4. Readiness Level Definitions

VAL25 evaluates three readiness levels. Only Design Partner level is achievable with this validation suite.

Design Partner Ready

Criteria (all must hold):

  1. All five VAL07–VAL11 slices pass (zero failed checks each)

  2. plan_create p99 ≤ 500 ms (VAL07)

  3. VAL08 primary N=100 scenario passes (VAL08-05)

  4. Rollback aggregate rate ≥ 0.990 (VAL10)

  5. Data durability after SIGTERM restart (VAL11)

  6. Evidence timestamps are coherent (single 6-hour evidence window)

Meaning: the core rollout lifecycle is functional and stable enough to offer to early adopters under the scope limitations stated in §1.

GA Ready

Not achievable with VAL07–VAL11 alone. Additional requirements:

  1. PostgreSQL backend validation under equivalent load scenarios

  2. VAL12 30-day fleet soak: Gate D requires rollback rate ≥ 0.990 sustained over 1,440 rounds and fleet plan count ≥ 100 reached

  3. Multi-node HA cluster rollout delivery path validated

Public Production Claim

Requires everything for GA Ready plus:

  1. External security hardening audit

  2. SLA-grade observability and alerting validation

  3. Multi-region topology testing

5. 10-Check Matrix

  ID        When     Description                                                       Pass criterion
  --------  -------  ----------------------------------------------------------------  --------------------------------------------------
  VAL25-01  Setup    VAL07 latency report found and all checks pass                    s07 == "PASS" (9/9 checks)
  VAL25-02  Setup    VAL08 throughput report found and all checks pass                 s08 == "PASS" (10/10 checks)
  VAL25-03  Setup    VAL09 stuck detection report found and all checks pass            s09 == "PASS" (10/10 checks)
  VAL25-04  Setup    VAL10 rollback reliability report found and all checks pass       s10 == "PASS" (10/10 checks)
  VAL25-05  Setup    VAL11 chaos report found and all checks pass                      s11 == "PASS" (10/10 checks)
  VAL25-06  Metric   plan_create p99 ≤ 500 ms                                          lat_p99 <= 500
  VAL25-07  Metric   N=100 concurrent with zero errors                                 val08 VAL08-05 passes
  VAL25-08  Metric   Rollback aggregate success rate ≥ 0.990                           agg_rate >= 0.990
  VAL25-09  Metric   Data durability after SIGTERM restart                             val11 VAL11-03 passes
  VAL25-10  Summary  Design partner readiness — all above pass and evidence coherent   VAL25-01..09 all PASS and evidence timestamps fall
                                                                                       within one 6-hour window
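The VAL25-10 summary rule in the matrix above can be sketched as a single conjunction. The `checks` dict (check id → bool) is a hypothetical representation of the per-check results, not the generator's actual data structure.

```python
"""VAL25-10: design-partner readiness holds only when VAL25-01..09 all
pass and the evidence timestamps are coherent."""

def val25_10(checks, evidence_coherent):
    prior = [checks["VAL25-%02d" % i] for i in range(1, 10)]
    return all(prior) and evidence_coherent
```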

6. Run the Report

Prerequisites

Run the full cli-audit-lab (or at minimum VAL07–VAL11 slices):

export GOROOT=/home/ubuntu/.local/go1.25.7
export PATH="$GOROOT/bin:$PATH"
export GOTOOLCHAIN=local

bash scripts/labs/run_cli_audit_lab.sh

The evidence directory is printed at the end: evidence/cli-audit-lab-YYYY-MM-DD.

Generate the proof report

bash scripts/labs/run_fleet_rollout_proof_report_val25.sh \
  evidence/cli-audit-lab-2026-03-23

Output files

  File                           Contents
  -----------------------------  ------------------------------
  stdout                         Human-readable proof report
  val25/val25-proof-report.txt   Same content as stdout
  val25/val25-proof-report.json  Machine-readable JSON artifact
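The JSON artifact is intended for downstream tooling. The document does not specify its schema, so the "verdict" field name in this sketch is an assumption for illustration only; adjust it to whatever the artifact actually emits.

```python
"""Hypothetical consumer of val25-proof-report.json."""
import json

def read_verdict(path):
    with open(path) as fh:
        # Returns None if the assumed "verdict" field is absent.
        return json.load(fh).get("verdict")
```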

7. Final Report Format

VAL25 — Fleet Rollout Proof Report
Generated:    <YYYY-MM-DDTHH:MM:SSZ>
Evidence dir: <path>

Environment:
  Backend:    SQLite (single-node, in-process)
  Chaos:      SIGTERM only (no SIGKILL, no iptables, no network partitions)
  Topology:   Single-node, single-region
  Test suite: VAL07–VAL11 (5 validation slices, embedded in run_cli_audit_lab.sh)
  Evidence:   single coherent evidence window
  Note:       All latency and throughput figures are from controlled lab runs on
              a single host.  Results are NOT representative of multi-node or
              production-scale deployments.

Scenario Coverage:
  VAL07  Fleet Rollout Latency Baseline    PASS     (9/9 checks)
  VAL08  Fleet Rollout Throughput          PASS     (10/10 checks)
  VAL09  Stuck Rollout Detection           PASS     (10/10 checks)
  VAL10  Rollback Reliability              PASS     (10/10 checks)
  VAL11  Fleet Rollout Chaos               PASS     (10/10 checks)

Latency Metrics  (VAL07):
  Samples (plan_create):   20
  p50:    12.4 ms   [target <= 100 ms]   PASS
  p95:    19.8 ms   [target <= 300 ms]   PASS
  p99:    24.1 ms   [target <= 500 ms]   PASS
  plan_list p99:    8.3 ms   [target <= 500 ms]   PASS
  Concurrent:  5/5 succeeded  wall=142 ms   [target: all ok, wall <= 2000 ms]

Throughput Metrics  (VAL08, plans/sec, SQLite single-writer):
  N=1:      18.42 plans/sec   (reference baseline)
  N=10:     62.11 plans/sec
  N=50:     74.33 plans/sec
  N=100:    78.05 plans/sec   wall=6409 ms   errors=0
  Note:   Throughput plateau at high N is expected with SQLite (single writer).
          Zero errors at N=100 is the primary target, not linear scaling.

Recovery Success Rates  (VAL10, rollback reliability):
  Retry    batch:  5/5   rate=1.0000   [target >= 0.990]   PASS
  Rollback batch:  5/5   rate=1.0000   [target >= 0.990]   PASS
  Aggregate:      10/10  rate=1.0000   [target >= 0.990]   PASS

Chaos Resilience  (VAL11, SIGTERM-only):
  Data durability after SIGTERM restart:             PASS
  3x rapid restart resilience:                       PASS
  Device-unresponsive stuck detection (stuck proxy): PASS
  Bulk cascade recovery (3 plans):                   PASS

Stuck Detection Accuracy  (VAL09):
  Threshold:         3 s
  Injection sleep:   4 s
  Empty scans:       1   (no false positives on fresh plans)
  Stale detected:    1   (stuck plans caught above threshold)
  Final scan count:  1   (one residual stuck plan after recovery)

10-Check Matrix:
  VAL25-01 PASS  VAL07 latency report: all 9 checks passed
           val07 9/9 checks passed
  VAL25-02 PASS  VAL08 throughput report: all 10 checks passed
           val08 10/10 checks passed
  VAL25-03 PASS  VAL09 stuck detection report: all 10 checks passed
           val09 10/10 checks passed
  VAL25-04 PASS  VAL10 rollback reliability report: all 10 checks passed
           val10 10/10 checks passed
  VAL25-05 PASS  VAL11 chaos report: all 10 checks passed
           val11 10/10 checks passed
  VAL25-06 PASS  Latency: plan_create p99 <= 500ms [target]
           measured p99=24.1ms
  VAL25-07 PASS  Throughput: N=100 concurrent with zero errors [target]
           total_errors=0
  VAL25-08 PASS  Rollback: aggregate success rate >= 0.990 [target]
           aggregate_rate=1.0000 (10/10)
  VAL25-09 PASS  Chaos: data durability preserved after SIGTERM restart [target]
           val11 VAL11-03 data_durability_post_restart
  VAL25-10 PASS  Summary: design partner readiness — all five slices pass and key targets met

Overall: PASS=10 FAIL=0

Known Failures and Limitations:
  ...  (see §1 Out of scope)

Readiness Conclusion:
  DESIGN PARTNER READY  ✓
  ...
  GA READY  ✗  (NOT YET)
  ...
  PUBLIC PRODUCTION CLAIM  ✗  (NOT YET)
  ...

Verdict: DESIGN PARTNER READY

8. Tooling

  File                                                  Role
  ----------------------------------------------------  ------------------------------
  scripts/labs/run_fleet_rollout_proof_report_val25.sh  VAL25 report generator
  scripts/labs/run_cli_audit_lab.sh                     Source of VAL07–VAL11 evidence
  docs/tutorials/rollout-latency-validation.md          VAL07 formal plan
  docs/tutorials/rollout-throughput-validation.md       VAL08 formal plan
  docs/tutorials/stuck-detection-validation.md          VAL09 formal plan
  docs/tutorials/rollback-reliability-validation.md     VAL10 formal plan
  docs/tutorials/chaos-validation.md                    VAL11 formal plan