# VAL25 — Fleet Rollout Proof Report Generator
Audience: engineering leads, product managers, and external reviewers who need a consolidated, evidence-backed assessment of fleet rollout readiness.
VAL25 is a report generator, not a test runner. It reads evidence produced by five completed validation slices (VAL07–VAL11) and produces a single production-quality proof report with explicit readiness conclusions.
## 1. Scope

VAL25 consolidates evidence from:

| Slice | Name | What it proves |
|---|---|---|
| VAL07 | Fleet Rollout Latency Baseline | p50/p95/p99 latency under light load (N=20 samples) |
| VAL08 | Fleet Rollout Throughput Validation | Zero errors at N=1/10/50/100 concurrent plan creates |
| VAL09 | Stuck Rollout Detection | Stuck plans reliably detected and excluded when paused/terminal |
| VAL10 | Rollback Reliability Validation | Retry and rollback batch success rate ≥ 99% |
| VAL11 | Fleet Rollout Chaos Test Pack | Data durability, rapid restart, stuck proxy detection, cascade recovery |
### Branch rule: coverage by existing runner

| Existing asset | Coverage |
|---|---|
| | 30-day fleet soak — separate domain, different metrics |
| | 30-day HA soak — separate domain |
| | 30-day relay soak — separate domain |
| | Runs VAL07–VAL11 but produces per-slice reports; no cross-slice aggregation |

New aggregator required: no existing script reads and combines VAL07–VAL11 evidence into a single proof artifact.
### Out of scope

- PostgreSQL backend validation (single-node SQLite only)
- 30-day durability soak (covered by VAL12)
- Multi-node or multi-region deployment testing
- SIGKILL chaos (SIGTERM only)
- Concurrent-under-kill write requests
- Edge-agent reconnect after control-plane restart
- Throughput beyond N=100 concurrent plans
## 2. Evidence Structure

VAL25 reads from the evidence directory produced by `run_cli_audit_lab.sh`
(default: `evidence/cli-audit-lab-YYYY-MM-DD`).

| Input file | Produced by | Contents |
|---|---|---|
| | | Latency percentiles, concurrent timing, 9-check results |
| | | Per-tier throughput, wall times, 10-check results |
| | | Stuck detection counts, threshold/sleep config, 10-check results |
| | | Success rates per batch type, 10-check results |
| | | Chaos scenario outcomes, 10-check results |
VAL25 expects these five reports to come from one coherent `run_cli_audit_lab.sh`
evidence set. The generator checks each report's embedded timestamp and
requires all reports it finds to fall within a single 6-hour evidence window
before it will issue a design-partner readiness conclusion.

Missing slice reports are marked MISSING in the coverage table rather than
aborting the run. Reports that exist but do not match the expected schema are
likewise degraded to MISSING with a schema-mismatch detail instead of crashing
the generator.
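The 6-hour coherence check described above amounts to comparing the earliest and latest embedded timestamps. A minimal sketch, assuming timestamps have already been parsed from the found reports (the function name is illustrative, not the generator's actual API):

```python
from datetime import datetime, timedelta

EVIDENCE_WINDOW = timedelta(hours=6)

def window_is_coherent(timestamps):
    """Return True when every found report's timestamp falls within
    one 6-hour evidence window. MISSING slices contribute nothing,
    so an empty list cannot be judged coherent."""
    if not timestamps:
        return False
    return max(timestamps) - min(timestamps) <= EVIDENCE_WINDOW

# Three reports generated within ~40 minutes are coherent:
ts = [datetime(2026, 3, 23, 10, 0),
      datetime(2026, 3, 23, 10, 25),
      datetime(2026, 3, 23, 10, 40)]
print(window_is_coherent(ts))  # True
```

Note the spread check compares only the extremes, so evidence sets of any size cost a single pass over the timestamps.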
## 3. Metric Definitions and Targets

VAL25 reports on the following metrics. Targets are drawn from the Gap Closure Workplan (v1.2). Measured results and proposed targets are explicitly separated throughout the report.

### Latency (VAL07)

| Metric | Target | Source |
|---|---|---|
| plan_create p50 | ≤ 100 ms | VAL07 workplan target |
| plan_create p95 | ≤ 300 ms | VAL07 workplan target |
| plan_create p99 | ≤ 500 ms | VAL07 workplan target (primary) |
| plan_list p99 | ≤ 500 ms | VAL07 workplan target |
| Concurrent 5× wall | ≤ 2,000 ms | VAL07 workplan target |
| Sample count | N = 20 | Fixed in VAL07 runner |

Measurement method: `curl -w '%{time_total}'` on each request; Python
nearest-rank percentile computation.
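Nearest-rank means the percentile is always an actual sample, the ceil(p/100 × N)-th smallest, with no interpolation. A minimal sketch of that computation (the VAL07 runner's own code is not shown here):

```python
import math

def nearest_rank(samples_ms, p):
    """Nearest-rank percentile: the ceil(p/100 * N)-th smallest sample.

    With no interpolation, the result is always one of the measured
    values; at N=20, p99 resolves to the largest sample (rank 20)
    and p50 to the 10th smallest.
    """
    ordered = sorted(samples_ms)
    rank = math.ceil(p / 100 * len(ordered))  # 1-indexed rank
    return ordered[rank - 1]

samples = list(range(1, 21))  # 20 illustrative latencies, 1..20 ms
print(nearest_rank(samples, 50))  # 10
print(nearest_rank(samples, 99))  # 20
```

This is why the p99 figure at N=20 is simply the worst observed latency; percentiles sharper than 1/N cannot be resolved at this sample count.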
### Throughput (VAL08)

| Metric | Target | Source |
|---|---|---|
| N=100 errors | = 0 | VAL08 primary target |
| N=100 wall time | ≤ 30,000 ms | VAL08 workplan target |
| N=1 vs N=100 throughput | N=100 ≥ N=1 plans/sec | Scaling sanity check |

A SQLite single-writer plateau is expected at high N — linear scaling is explicitly not a target. The throughput figure is informational; zero errors is the binding criterion.
### Recovery Success Rates (VAL10)

| Metric | Target | Source |
|---|---|---|
| Retry batch success rate | ≥ 0.990 | VAL10 workplan target |
| Rollback batch success rate | ≥ 0.990 | VAL10 workplan target |
| Aggregate success rate | ≥ 0.990 | VAL10 primary target |

Batch size: N=5 per type (10 total). Aggregate = (retry_ok + rollback_ok) / 10.
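The aggregate formula above can be sketched directly; note that at this batch size a single failure in either batch already misses the 0.990 target:

```python
def aggregate_rate(retry_ok, rollback_ok, batch_size=5):
    """VAL10 aggregate success rate over both batch types.

    Aggregate = (retry_ok + rollback_ok) / (2 * batch_size); with the
    default N=5 per type this is the (retry_ok + rollback_ok) / 10
    formula from the workplan.
    """
    return (retry_ok + rollback_ok) / (2 * batch_size)

print(aggregate_rate(5, 5))  # 1.0 -> meets the >= 0.990 target
print(aggregate_rate(5, 4))  # 0.9 -> a single failure misses it
```

In other words, at N=5 per type the ≥ 0.990 aggregate target is effectively a zero-failure requirement; the threshold only becomes a true rate once batches grow past 100 operations.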
### Chaos Resilience (VAL11)

Reported as pass/fail for each of four scenarios:

| Scenario | Description |
|---|---|
| Data durability | SQLite data survives SIGTERM + restart (1 cycle) |
| 3× rapid restart | CP survives 3 consecutive SIGTERM + restart cycles |
| Stuck proxy detection | Device-unresponsive plan detected by stuck scanner |
| Cascade recovery | 3 stuck plans return to terminal states after manual intervention |
## 4. Readiness Level Definitions

VAL25 evaluates three readiness levels. Only the Design Partner level is achievable with this validation suite.

### Design Partner Ready

Criteria (all must hold):

- All five VAL07–VAL11 slices pass (zero failed checks each)
- plan_create p99 ≤ 500 ms (VAL07)
- VAL08 primary N=100 scenario passes (VAL08-05)
- Rollback aggregate rate ≥ 0.990 (VAL10)
- Data durability after SIGTERM restart (VAL11)
- Evidence timestamps are coherent (single 6-hour evidence window)

Meaning: the core rollout lifecycle is functional and stable enough to offer to early adopters under the scope limitations stated in §1.
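Because the criteria are a strict conjunction, the readiness decision can be sketched as a single boolean expression. The flat-argument shape and names below are illustrative only; the real generator derives these values from the five slice reports:

```python
def design_partner_ready(slices_pass, p99_ms, n100_errors,
                         rollback_rate, durable, window_coherent):
    """Conjunction of the six Design Partner criteria: any single
    miss (a failed slice, a blown target, or incoherent evidence)
    blocks the readiness conclusion."""
    return (all(slices_pass.values())   # VAL07-VAL11 all pass
            and p99_ms <= 500.0         # plan_create p99 (VAL07)
            and n100_errors == 0        # zero errors at N=100 (VAL08)
            and rollback_rate >= 0.990  # aggregate rate (VAL10)
            and durable                 # SIGTERM durability (VAL11)
            and window_coherent)        # single 6-hour window

slices = {k: True for k in ("VAL07", "VAL08", "VAL09", "VAL10", "VAL11")}
print(design_partner_ready(slices, 24.1, 0, 1.0, True, True))   # True
print(design_partner_ready(slices, 612.0, 0, 1.0, True, True))  # False
```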
### GA Ready

Not achievable with VAL07–VAL11 alone. Additional requirements:

- PostgreSQL backend validation under equivalent load scenarios
- VAL12 30-day fleet soak: Gate D requires a rollback rate ≥ 0.990 sustained over 1,440 rounds and a fleet plan count of ≥ 100
- Multi-node HA cluster rollout delivery path validated
### Public Production Claim

Requires everything for GA Ready plus:

- External security hardening audit
- SLA-grade observability and alerting validation
- Multi-region topology testing
## 5. 10-Check Matrix

| ID | When | Description | Pass criterion |
|---|---|---|---|
| VAL25-01 | Setup | VAL07 latency report found and all checks pass | |
| VAL25-02 | Setup | VAL08 throughput report found and all checks pass | |
| VAL25-03 | Setup | VAL09 stuck detection report found and all checks pass | |
| VAL25-04 | Setup | VAL10 rollback reliability report found and all checks pass | |
| VAL25-05 | Setup | VAL11 chaos report found and all checks pass | |
| VAL25-06 | Metric | plan_create p99 ≤ 500 ms | |
| VAL25-07 | Metric | N=100 concurrent with zero errors | |
| VAL25-08 | Metric | Rollback aggregate success rate ≥ 0.990 | |
| VAL25-09 | Metric | Data durability after SIGTERM restart | |
| VAL25-10 | Summary | Design partner readiness — all above pass and evidence coherent | VAL25-01..09 all PASS and evidence timestamps fall within one 6-hour window |
## 6. Run the Report

### Prerequisites

Run the full cli-audit-lab (or at minimum the VAL07–VAL11 slices):

```shell
export GOROOT=/home/ubuntu/.local/go1.25.7
export PATH="$GOROOT/bin:$PATH"
export GOTOOLCHAIN=local
bash scripts/labs/run_cli_audit_lab.sh
```

The evidence directory is printed at the end:
`evidence/cli-audit-lab-YYYY-MM-DD`.
### Generate the proof report

```shell
bash scripts/labs/run_fleet_rollout_proof_report_val25.sh \
  evidence/cli-audit-lab-2026-03-23
```
### Output files

| File | Contents |
|---|---|
| stdout | Human-readable proof report |
| | Same content as stdout |
| | Machine-readable JSON artifact |
## 7. Final Report Format

```text
VAL25 — Fleet Rollout Proof Report
Generated: <YYYY-MM-DDTHH:MM:SSZ>
Evidence dir: <path>

Environment:
  Backend:    SQLite (single-node, in-process)
  Chaos:      SIGTERM only (no SIGKILL, no iptables, no network partitions)
  Topology:   Single-node, single-region
  Test suite: VAL07–VAL11 (5 validation slices, embedded in run_cli_audit_lab.sh)
  Evidence:   single coherent evidence window

Note: All latency and throughput figures are from controlled lab runs on
a single host. Results are NOT representative of multi-node or
production-scale deployments.

Scenario Coverage:
  VAL07 Fleet Rollout Latency Baseline   PASS (9/9 checks)
  VAL08 Fleet Rollout Throughput         PASS (10/10 checks)
  VAL09 Stuck Rollout Detection          PASS (10/10 checks)
  VAL10 Rollback Reliability             PASS (10/10 checks)
  VAL11 Fleet Rollout Chaos              PASS (10/10 checks)

Latency Metrics (VAL07):
  Samples (plan_create): 20
  p50: 12.4 ms           [target <= 100 ms]  PASS
  p95: 19.8 ms           [target <= 300 ms]  PASS
  p99: 24.1 ms           [target <= 500 ms]  PASS
  plan_list p99: 8.3 ms  [target <= 500 ms]  PASS
  Concurrent: 5/5 succeeded  wall=142 ms  [target: all ok, wall <= 2000 ms]

Throughput Metrics (VAL08, plans/sec, SQLite single-writer):
  N=1:   18.42 plans/sec  (reference baseline)
  N=10:  62.11 plans/sec
  N=50:  74.33 plans/sec
  N=100: 78.05 plans/sec  wall=6409 ms  errors=0
  Note: Throughput plateau at high N is expected with SQLite (single writer).
        Zero errors at N=100 is the primary target, not linear scaling.

Recovery Success Rates (VAL10, rollback reliability):
  Retry batch:    5/5   rate=1.0000  [target >= 0.990]  PASS
  Rollback batch: 5/5   rate=1.0000  [target >= 0.990]  PASS
  Aggregate:      10/10 rate=1.0000  [target >= 0.990]  PASS

Chaos Resilience (VAL11, SIGTERM-only):
  Data durability after SIGTERM restart:             PASS
  3x rapid restart resilience:                       PASS
  Device-unresponsive stuck detection (stuck proxy): PASS
  Bulk cascade recovery (3 plans):                   PASS

Stuck Detection Accuracy (VAL09):
  Threshold: 3 s
  Injection sleep: 4 s
  Empty scans: 1      (no false positives on fresh plans)
  Stale detected: 1   (stuck plans caught above threshold)
  Final scan count: 1 (one residual stuck plan after recovery)

10-Check Matrix:
  VAL25-01 PASS  VAL07 latency report: all 9 checks passed
                 val07 9/9 checks passed
  VAL25-02 PASS  VAL08 throughput report: all 10 checks passed
                 val08 10/10 checks passed
  VAL25-03 PASS  VAL09 stuck detection report: all 10 checks passed
                 val09 10/10 checks passed
  VAL25-04 PASS  VAL10 rollback reliability report: all 10 checks passed
                 val10 10/10 checks passed
  VAL25-05 PASS  VAL11 chaos report: all 10 checks passed
                 val11 10/10 checks passed
  VAL25-06 PASS  Latency: plan_create p99 <= 500ms [target]
                 measured p99=24.1ms
  VAL25-07 PASS  Throughput: N=100 concurrent with zero errors [target]
                 total_errors=0
  VAL25-08 PASS  Rollback: aggregate success rate >= 0.990 [target]
                 aggregate_rate=1.0000 (10/10)
  VAL25-09 PASS  Chaos: data durability preserved after SIGTERM restart [target]
                 val11 VAL11-03 data_durability_post_restart
  VAL25-10 PASS  Summary: design partner readiness — all five slices pass and key targets met

Overall: PASS=10 FAIL=0

Known Failures and Limitations:
  ... (see §1 Out of scope)

Readiness Conclusion:
  DESIGN PARTNER READY ✓
  ...
  GA READY ✗ (NOT YET)
  ...
  PUBLIC PRODUCTION CLAIM ✗ (NOT YET)
  ...

Verdict: DESIGN PARTNER READY
```
## 8. Tooling

| File | Role |
|---|---|
| | VAL25 report generator |
| | Source of VAL07–VAL11 evidence |
| | VAL07 formal plan |
| | VAL08 formal plan |
| | VAL09 formal plan |
| | VAL10 formal plan |
| | VAL11 formal plan |