# VAL29 — AutonomyOps v1 Public-Claim Evidence Matrix
Audience: founders, engineering leads, product managers, and external reviewers making ship/no-ship or claim-level decisions for AutonomyOps v1.
VAL29 is a meta-aggregator, not a test runner. It reads the four proof-report JSON artifacts produced by VAL25–VAL28 and produces a single capability-level evidence matrix with honest, per-claim readiness assessments. Nothing is soft-pedalled. BETA claims are labelled explicitly. Gaps are stated precisely.
## 1. Scope
VAL29 reads from:
| Source | Contents |
|---|---|
| VAL25 proof report | Fleet rollout proof (VAL07–VAL11) |
| VAL26 proof report | HA control-plane proof (VAL13–VAL17) |
| VAL27 proof report | Edge relay proof (VAL19–VAL23, optional VAL24) |
| VAL28 proof report | Cross-cutting proof (VAL01–VAL06) |
VAL29 does not re-run any tests. It is idempotent and read-only.
Before VAL29 may emit DESIGN PARTNER READY, the four proof reports must also fall within a single 7-day evidence campaign window, and a disclosure artifact must exist at `val29/design-partner-disclosures.json` inside the cli-audit-lab evidence directory.
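The two readiness preconditions above can be sketched as follows. This is an illustrative sketch, not the VAL29 implementation: the function names and the idea of comparing report timestamps are assumptions; only the 7-day window and the `val29/design-partner-disclosures.json` path come from this document.

```python
from datetime import datetime, timedelta
from pathlib import Path

CAMPAIGN_WINDOW = timedelta(days=7)  # single 7-day evidence campaign window

def within_campaign_window(timestamps):
    """True when all four proof-report timestamps span at most 7 days."""
    ts = sorted(timestamps)
    return ts[-1] - ts[0] <= CAMPAIGN_WINDOW

def disclosure_artifact_present(evidence_dir):
    """The disclosure artifact must exist before DESIGN PARTNER READY."""
    return (Path(evidence_dir) / "val29" / "design-partner-disclosures.json").is_file()

# Four reports spanning 5 days pass; a 9-day spread fails the window.
ok = within_campaign_window([
    datetime(2024, 6, 1), datetime(2024, 6, 2),
    datetime(2024, 6, 4), datetime(2024, 6, 6),
])
bad = within_campaign_window([datetime(2024, 6, 1), datetime(2024, 6, 10)])
print(ok, bad)  # True False
```

Both checks are read-only, matching VAL29's idempotent, no-test-rerun contract.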
### What VAL29 covers
The full v1 claim set across four capability groups:
| Group | Claims |
|---|---|
| Fleet Rollouts | Latency, throughput, stuck detection, rollback, chaos, soak, PG backend |
| HA Control Plane | Failover, zero data loss, replication lag, backup/restore, split-brain, quorum, soak |
| Edge Relay | Outage, impairment, throughput, queue/overflow, deadletter, bandwidth, soak, multi-peer |
| Cross-Cutting | Cert rotation, trust-chain, RBAC, audit, OTel, support bundle, PG audit, external audit |
### What VAL29 does NOT cover
- OS Reconstruction (hardware-gated, out of scope for the current suite)
- Multi-architecture container validation (Gate E, hardware-gated)
- Native riscv64 hardware CI (Gate E, hardware-gated)
- Any item requiring satellite/cellular connectivity hardware
## 2. Evidence State Definitions
| State | Meaning |
|---|---|
| VALIDATED | Claim fully supported by completed VAL runs with all checks passing |
| BETA | Claim supported but with documented limitations; must be disclosed to users |
| NOT_STARTED | Framework / tooling exists; Gate D not yet run (30-day soaks) |
| DEFER | Not validated; additional engineering or hardware required |
| FUTURE_REQUIRED | Required for Public Production Claim; beyond current VAL scope |
## 3. Recommendation Definitions
| Recommendation | Meaning |
|---|---|
| OK_DESIGN_PARTNER | Safe to include in a design partner ship as-is |
| BETA_ONLY | May ship to design partners with explicit written disclosure |
| NOT_STARTED | Cannot ship until Gate D passes (30-day soaks) |
| DEFER | Must not claim until the gap is closed; omit from marketing until then |
| FUTURE_REQUIRED | Must not claim until third-party evidence is obtained |
## 4. Evidence Matrix
### Fleet Rollouts
| ID | Claim | Evidence | State | Recommendation |
|---|---|---|---|---|
| FR-PERF-01 | Plan creation latency p99 ≤ 500 ms | VAL07 | VALIDATED | OK_DESIGN_PARTNER |
| FR-THRU-01 | N=100 concurrent device rollouts; zero errors | VAL08 | VALIDATED | OK_DESIGN_PARTNER |
| FR-RECV-01 | Stuck rollout detection + recovery (retry/rollback) | VAL09 | VALIDATED | OK_DESIGN_PARTNER |
| FR-RECV-02 | Rollback success rate ≥ 99% (aggregate over 10 plans) | VAL10 | VALIDATED | OK_DESIGN_PARTNER |
| FR-CHOS-01 | Fleet chaos resilience: CP restart, kill cycles, corrupt artifacts | VAL11 | VALIDATED | OK_DESIGN_PARTNER |
| FR-SOAK-01 | 30-day fleet soak: ≥ 100 plans, rollback rate ≥ 0.990 (Gate D) | VAL12 | NOT_STARTED | NOT_STARTED |
| FR-INFR-01 | PostgreSQL backend: CP runs on PG with full validation | None | DEFER | DEFER |
| FR-PERF-02 | Throughput recalibrated on production-representative hardware | None | DEFER | DEFER |
Key limitations:

- FR-CHOS-01: SIGTERM only; SIGKILL-based kills and `iptables` chaos were not tested.
- FR-SOAK-01: The VAL12 framework is complete; Gate D requires a 30-day continuous run.
- FR-INFR-01 / FR-PERF-02: All fleet VALs run against SQLite on a single host. The PG backend and production hardware must be validated before a GA claim.
### HA Control Plane
| ID | Claim | Evidence | State | Recommendation |
|---|---|---|---|---|
| HA-FAIL-01 | Leader failover ≤ 5,000 ms: SIGTERM, SIGKILL, 3× rapid cycles | VAL13 | VALIDATED | OK_DESIGN_PARTNER |
| HA-FAIL-02 | Zero data loss across leader failover | VAL13 | VALIDATED | OK_DESIGN_PARTNER |
| HA-REPL-01 | Replication lag distribution; derived alerting thresholds | VAL14 | BETA | BETA_ONLY |
| HA-BKUP-01 | Backup/restore: timing ≤ 30 s / 60 s, SHA-256 integrity, correctness | VAL15 | VALIDATED | OK_DESIGN_PARTNER |
| HA-SBRC-01 | Split-brain detection (epoch divergence) + manual recovery | VAL16 | VALIDATED | OK_DESIGN_PARTNER |
| HA-QRMO-01 | Quorum loss: detected ≤ 30,000 ms, writes blocked, recovery | VAL17 | VALIDATED | OK_DESIGN_PARTNER |
| HA-FAIL-03 | Streaming-replication promotion (standby → primary, real PG HA) | None | DEFER | DEFER |
| HA-SOAK-01 | 30-day HA soak: ≥ 3 failovers, failover_ms ≤ 10,000, continuity = 1.0 | VAL18 | NOT_STARTED | NOT_STARTED |
| HA-CHOS-01 | Real network partition chaos (iptables / tc) | None | DEFER | DEFER |
Key limitations:

- HA-REPL-01: Alerting thresholds (healthy/degraded/alert ms) are derived from Docker `write_lag` measurements. Docker disk I/O is materially faster than cloud VMs, so these thresholds must be recalibrated against production `write_lag` observations before any alerting deployment. Treat them as informational for design partners.
- HA-SBRC-01: SQL metadata injection only; no real network partitions. Automatic split-brain recovery is out of scope (manual `promote-leader` only).
- HA-QRMO-01: `docker stop/start` only, not `iptables`; the write gate was verified via status fields (no `/v1/rollouts` endpoint on this binary).
- HA-FAIL-03 / HA-CHOS-01: Required for the GA claim.
### Edge Relay
| ID | Claim | Evidence | State | Recommendation |
|---|---|---|---|---|
| RL-IMPW-01 | Outage handling: all segments → DEADLETTER within max_retry exhaustion | VAL19 | VALIDATED | OK_DESIGN_PARTNER |
| RL-IMPW-02 | Bandwidth impairment (1/10 Mbps): delivery confirmed, throughput informational | VAL19 | BETA | BETA_ONLY |
| RL-IMPW-03 | Latency impairment (200/500 ms): delivery confirmed within dial/ack timeouts | VAL19 | VALIDATED | OK_DESIGN_PARTNER |
| RL-THRU-01 | Throughput: 5 tiers T1–T5, zero loss, queue monotonicity | VAL20 | BETA | BETA_ONLY |
| RL-QMGM-01 | Queue drain (200×64 B), LRU eviction accounting, relay-status accuracy | VAL21 | VALIDATED | OK_DESIGN_PARTNER |
| RL-DEAD-01 | Deadletter: retry → delivery (Group R = 1.000), BoltDB retention, purge | VAL22 | VALIDATED | OK_DESIGN_PARTNER |
| RL-BAND-01 | Bandwidth management: unlimited/rate-only/quota-only/hot-reload; S-E unit test | VAL23 | BETA | BETA_ONLY |
| RL-SOAK-01 | 30-day relay soak: rounds ≥ 1,440, clean_delivery_rate ≥ 0.990, loss = 0 | VAL24 | NOT_STARTED / VALIDATED | NOT_STARTED / OK_DESIGN_PARTNER |
| RL-MULT-01 | Multi-peer relay: delivery isolation between ≥ 2 concurrent peers | None | DEFER | DEFER |
| RL-CONN-01 | Contested-connectivity (satellite, cellular, WAN loss > 5%) | None | DEFER | DEFER |
| RL-CRAS-01 | BoltDB crash consistency on unclean power failure | None | DEFER | FUTURE_REQUIRED |
Key limitations:

- RL-IMPW-02: No hard throughput SLA under bandwidth impairment. Figures are informational, taken from proxy stats (`last_conn_bytes / last_conn_ms`). SLA-grade throughput targets under impairment have not been defined or measured.
- RL-THRU-01: segs/sec and bytes/sec figures come from single-host Docker runs only; there is no production hardware baseline. No hard throughput floor is set; zero-loss correctness is the only validated property.
- RL-BAND-01: S-E (daily quota reset) is validated by an injected-clock unit test only; a live 24-hour run has not been performed. This must be resolved (live run or formal exception approval) before the bandwidth-management claim can leave beta.
- RL-SOAK-01: State is driven by the direct `soak_val24.gate_d_overall` signal from VAL27. Until Gate D passes, the row remains NOT_STARTED.
- RL-CONN-01: Must be explicitly disclosed as a gap when shipping to design partners who expect satellite/cellular support.
- RL-CRAS-01: Power-failure crash consistency is required for the Public Production Claim.
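The dual-valued RL-SOAK-01 row resolves to one state per run. A minimal sketch of that derivation, assuming only that VAL27's report exposes the `soak_val24.gate_d_overall` boolean named above (the surrounding report structure is illustrative, not the real VAL27 schema):

```python
def rl_soak_01_row(val27_report: dict) -> tuple:
    """Derive the RL-SOAK-01 (state, recommendation) pair from VAL27.

    gate_d_overall=True means the 30-day relay soak passed Gate D;
    anything else leaves the row NOT_STARTED.
    """
    gate_d = val27_report.get("soak_val24", {}).get("gate_d_overall", False)
    if gate_d:
        return ("VALIDATED", "OK_DESIGN_PARTNER")
    return ("NOT_STARTED", "NOT_STARTED")

print(rl_soak_01_row({"soak_val24": {"gate_d_overall": True}}))
print(rl_soak_01_row({}))  # missing signal defaults to NOT_STARTED
```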
### Cross-Cutting
| ID | Claim | Evidence | State | Recommendation |
|---|---|---|---|---|
| XC-CERT-01 | Zero-downtime cert rotation: detection, mTLS continuity, timing ≤ 300 s, audit | VAL01 | VALIDATED | OK_DESIGN_PARTNER |
| XC-CERT-02 | Trust-chain rejection: missing, invalid chain, expired, revoked, wrong server | VAL02 | VALIDATED | OK_DESIGN_PARTNER |
| XC-RBAC-01 | RBAC enforcement: 5 DENY + 5 ALLOW + 3 NOT_GUARDED + 1 AUDIT check | VAL03 | VALIDATED¹ | OK_DESIGN_PARTNER |
| XC-AUDT-01 | Audit completeness: 25 event types, 6 categories, schema, latency ≤ 2,000 ms | VAL04 | VALIDATED | OK_DESIGN_PARTNER |
| XC-OTEL-01 | OTel: Prometheus /metrics, WAL pipeline, OTLP flush, trace ID propagation | VAL05 | VALIDATED | OK_DESIGN_PARTNER |
| XC-BNDL-01 | Support bundle: archive ≤ 30 s, 6 collectors, secrets redacted, degraded mode | VAL06 | VALIDATED | OK_DESIGN_PARTNER |
| XC-AUDT-02 | PG-backed audit store: query performance under load | None | DEFER | DEFER |
| XC-OTEL-02 | OTel pipeline validated against production-grade OTLP collector | None | DEFER | DEFER |
| XC-SCRT-01 | External security audit of cert management and RBAC surfaces | None | FUTURE_REQUIRED | FUTURE_REQUIRED |
| XC-COMP-01 | Compliance audit of audit completeness (SOC 2, etc.) | None | FUTURE_REQUIRED | FUTURE_REQUIRED |
¹ VAL03 checks can be SKIP (not FAIL) when the HA server is unavailable. If any checks were skipped, re-run with HA server available to confirm full 14-check coverage before a GA claim.
Key limitations:

- XC-CERT-01: The CRL is loaded at CP start; runtime cert revocation requires a CP restart. Tested with a SQLite-backed CP only.
- XC-OTEL-01: The OTLP sink is a local test server (127.0.0.1:14318), not a production collector. Metrics use `prometheus/client_golang` (not OTel SDK metrics).
- XC-BNDL-01: Tested with synthetic secrets (known `deadbeef` salt and `val06-secret-pass` password). Production secret-scanning completeness has not been independently audited.
- XC-AUDT-02: The `--pg-url` audit path was not load-tested; SQLite is the only audited backend. Required before GA.
- XC-SCRT-01 / XC-COMP-01: Third-party evidence is required for the Public Production Claim.
## 5. Readiness Levels

### Design Partner Ready
All four proof reports (VAL25/VAL26/VAL27/VAL28) must confirm `design_partner: true`, the reports must fall within one 7-day evidence campaign window, and the following BETA disclosures must be recorded in `val29/design-partner-disclosures.json` and made in writing to each design partner:
- Replication lag alerting thresholds (HA-REPL-01): derived from Docker measurements; must be recalibrated against production `write_lag` before alerting deployment.
- Relay throughput figures (RL-THRU-01): single-host Docker only; no production hardware calibration; no hard throughput SLA.
- Relay bandwidth impairment throughput (RL-IMPW-02): informational proxy stats only; no SLA-grade target defined.
- Relay daily quota reset (RL-BAND-01): validated by an injected-clock unit test only; a live 24-hour run has not been performed.
- Relay 30-day soak (RL-SOAK-01): not yet completed; reliability claims for long-running deployments are provisional until Gate D passes.
- Contested connectivity (RL-CONN-01): not validated; satellite/cellular connectivity is out of scope for all current relay VALs.
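The Design Partner Ready gate can be sketched as a pair of checks: every proof report asserts `design_partner: true`, and every required disclosure string appears in the artifact. The disclosure strings and the `design_partner` / `written_disclosures` keys come from this document; the rest of the report shape is an assumption.

```python
# The six disclosure identifiers required by the design-partner gate,
# exactly as written to design-partner-disclosures.json.
REQUIRED_DISCLOSURES = {
    "ha_replication_lag_thresholds_docker_derived",
    "relay_throughput_single_host_only",
    "relay_impairment_throughput_informational_only",
    "relay_daily_quota_reset_unit_test_only",
    "relay_soak_reliability_provisional_until_gate_d",
    "relay_contested_connectivity_not_validated",
}

def design_partner_ready(proof_reports, disclosure_doc):
    """All four reports must set design_partner: true, and every
    required BETA disclosure must be recorded in the artifact."""
    flags_ok = all(r.get("design_partner") is True for r in proof_reports)
    recorded = set(disclosure_doc.get("written_disclosures", []))
    return flags_ok and REQUIRED_DISCLOSURES <= recorded

reports = [{"design_partner": True}] * 4   # VAL25..VAL28 stand-ins
disclosures = {"written_disclosures": sorted(REQUIRED_DISCLOSURES)}
print(design_partner_ready(reports, disclosures))  # True
```

(The 7-day campaign-window check described in section 1 would sit alongside these two conditions.)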
### GA Ready
Design Partner PLUS all of:

- VAL12 Gate D (fleet 30-day soak): `rollback_success_rate` ≥ 0.990
- VAL18 Gate D (HA 30-day soak): failovers ≥ 3, `failover_ms` ≤ 10,000, `data_continuity_rate` = 1.0, `ha_uptime_pct` ≥ 99.9
- VAL24 Gate D (relay 30-day soak): rounds ≥ 1,440, `clean_delivery_rate` ≥ 0.990, loss = 0
- Multi-peer relay validation (at least basic 2-peer delivery + isolation)
- VAL20 throughput recalibrated on production-representative hardware
- VAL14 alerting thresholds recalibrated against production `write_lag`
- VAL23 S-E daily reset: live 24-hour run OR formally approved exception
- VAL03 full 14-check coverage with the HA server (no SKIPs)
- Streaming-replication promotion failover (HA-FAIL-03)
- PG-backed audit store query performance (XC-AUDT-02)
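The three Gate D soak criteria above are simple threshold predicates. A minimal sketch, assuming each soak summary is a flat dict keyed by the metric names used in this document (the dict shape itself is an assumption):

```python
def fleet_gate_d(s):
    """VAL12 fleet 30-day soak gate."""
    return s["rollback_success_rate"] >= 0.990

def ha_gate_d(s):
    """VAL18 HA 30-day soak gate."""
    return (s["failovers"] >= 3
            and s["failover_ms"] <= 10_000
            and s["data_continuity_rate"] == 1.0
            and s["ha_uptime_pct"] >= 99.9)

def relay_gate_d(s):
    """VAL24 relay 30-day soak gate."""
    return (s["rounds"] >= 1_440
            and s["clean_delivery_rate"] >= 0.990
            and s["loss"] == 0)

print(ha_gate_d({"failovers": 4, "failover_ms": 8_200,
                 "data_continuity_rate": 1.0, "ha_uptime_pct": 99.95}))  # True
```

Note the asymmetry: `data_continuity_rate` and `loss` are exact equalities (any loss fails the gate), while the other criteria are thresholds.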
### Public Production Claim
GA Ready PLUS:

- BoltDB crash consistency on unclean power failure (RL-CRAS-01)
- Multi-peer relay soaks with message isolation proof
- Production-grade observability: queue depth alerting, deadletter paging, bandwidth quota depletion notification
- Real network partition chaos for HA and relay (HA-CHOS-01)
- External security audit (XC-SCRT-01)
- Penetration testing of mTLS trust-chain boundaries
- Compliance audit of audit event completeness (XC-COMP-01)
- Production-hardware throughput benchmarks for fleet rollouts and relay
## 6. Run the Matrix

### Prerequisites
Run all four proof report generators first:
```bash
# Fleet, HA, and cross-cutting (share a cli-audit-lab evidence dir)
bash scripts/labs/run_fleet_rollout_proof_report_val25.sh \
  evidence/cli-audit-lab-YYYY-MM-DD
bash scripts/labs/run_ha_proof_report_val26.sh \
  evidence/cli-audit-lab-YYYY-MM-DD
bash scripts/labs/run_crosscut_proof_report_val28.sh \
  evidence/cli-audit-lab-YYYY-MM-DD

# Relay (uses auto-discovered standalone evidence dirs)
bash scripts/labs/run_relay_proof_report_val27.sh evidence/
```
### Generate the evidence matrix

Create the disclosure artifact first:
```bash
mkdir -p evidence/cli-audit-lab-YYYY-MM-DD/val29
cat > evidence/cli-audit-lab-YYYY-MM-DD/val29/design-partner-disclosures.json <<'EOF'
{
  "written_disclosures": [
    "ha_replication_lag_thresholds_docker_derived",
    "relay_throughput_single_host_only",
    "relay_impairment_throughput_informational_only",
    "relay_daily_quota_reset_unit_test_only",
    "relay_soak_reliability_provisional_until_gate_d",
    "relay_contested_connectivity_not_validated"
  ]
}
EOF
```
Then run VAL29:

```bash
bash scripts/labs/run_evidence_matrix_val29.sh \
  evidence/cli-audit-lab-YYYY-MM-DD \
  evidence/
```
### Output files
| File | Contents |
|---|---|
| stdout | Evidence matrix report |
| | Same content as stdout |
| | Machine-readable JSON with full matrix |
The JSON artifact contains the full matrix array plus:

- `readiness`
- `evidence_campaign`
- `design_partner_disclosures`
## 7. Tooling
| File | Role |
|---|---|
| `scripts/labs/run_evidence_matrix_val29.sh` | VAL29 evidence matrix generator |
| `scripts/labs/run_fleet_rollout_proof_report_val25.sh` | VAL25 fleet rollout proof (input) |
| `scripts/labs/run_ha_proof_report_val26.sh` | VAL26 HA proof (input) |
| `scripts/labs/run_relay_proof_report_val27.sh` | VAL27 relay proof (input) |
| `scripts/labs/run_crosscut_proof_report_val28.sh` | VAL28 cross-cutting proof (input) |
| | VAL25 formal plan |
| | VAL26 formal plan |
| | VAL27 formal plan |
| | VAL28 formal plan |
| | This document |