VAL 10 — Rollback Reliability Validation
Purpose
This plan validates that the autonomy rollback command surface (preview + execute) is
reliable across the supported rollout strategies, surfaces correct operator diagnostics,
and emits the expected audit events. It establishes a measurable rollback success rate
against the workplan target of ≥99%.
Claims Under Test
| ID | Claim |
|---|---|
| VAL10-C1 | |
| VAL10-C2 | |
| VAL10-C3 | |
| VAL10-C4 | Aggregate rollback success rate across all 10 executions is ≥ 99%; this slice’s own retained |
Branch-Specific Rule
| Question | Answer |
|---|---|
| Covered by existing lab? | Partially. |
| Lab to extend | |
| Why new function? | The existing rollback lab at port 18091 is torn down mid-lab; its |
| New runner required? | No. Extending |
Scenario Matrix
VAL10-01 — Preview All Targets Exit 0
Action: Run autonomy rollback preview --target <kind> for all four kinds
(rollout_plan, rollout_stage, ha_leader_resign, relay_deadletter).
Evidence: val10-preview-rollout_plan.txt, val10-preview-rollout_stage.txt,
val10-preview-ha_leader_resign.txt, val10-preview-relay_deadletter.txt
Pass criterion: All 4 commands exit 0. preview_errors=0.
Note: Preview is read-only. No control-plane connection required.
VAL10-02 — Preview rollout_plan JSON Schema
Action: autonomy rollback preview --target rollout_plan --output json.
Evidence: val10-preview-rollout_plan.json, val10-preview-rollout_plan-check.txt
Pass criterion: safety_class=terminal, orchestrated=true,
valid_strategies contains both retry and rollback.
VAL10-03 — Preview relay_deadletter JSON Schema
Action: autonomy rollback preview --target relay_deadletter --output json.
Evidence: val10-preview-relay_deadletter.json, val10-preview-relay-check.txt
Pass criterion: orchestrated=false, manual_path contains edgectl.
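The two preview schema checks (VAL10-02 and VAL10-03) reduce to a handful of field assertions. A minimal Python sketch, assuming the preview JSON is a flat object whose keys match the names in the pass criteria; the function names and the sample payload are hypothetical and are not the lab's actual check code:

```python
import json


def check_rollout_plan_preview(doc: dict) -> list[str]:
    """Return the VAL10-02 failures for a parsed preview document (empty = PASS)."""
    failures = []
    if doc.get("safety_class") != "terminal":
        failures.append("safety_class is not 'terminal'")
    if doc.get("orchestrated") is not True:
        failures.append("orchestrated is not true")
    missing = {"retry", "rollback"} - set(doc.get("valid_strategies") or [])
    if missing:
        failures.append(f"valid_strategies missing {sorted(missing)}")
    return failures


def check_relay_deadletter_preview(doc: dict) -> list[str]:
    """Return the VAL10-03 failures for a parsed preview document (empty = PASS)."""
    failures = []
    if doc.get("orchestrated") is not False:
        failures.append("orchestrated is not false")
    # manual_path may be a string or a list of steps; str() covers both shapes.
    if "edgectl" not in str(doc.get("manual_path") or ""):
        failures.append("manual_path does not mention edgectl")
    return failures


# Hypothetical payload; the real file would be val10-preview-rollout_plan.json.
sample = json.loads(
    '{"safety_class": "terminal", "orchestrated": true,'
    ' "valid_strategies": ["retry", "rollback"]}'
)
print(check_rollout_plan_preview(sample))  # prints []
```

Returning a list of failures rather than a boolean keeps the check file readable when more than one field is wrong.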
VAL10-04 — Execute retry: Batch Success Rate
Action: Create 5 plans (val10-retry-1 through val10-retry-5); run
autonomy rollback execute --target rollout_plan --strategy retry for each.
Evidence: val10-retry-plans-created.txt, retry/execute-retry-{1..5}.txt,
val10-retry-rate.txt
Pass criterion: ok=5, fail=0, success_rate=1.000 (100% ≥ 99%).
Expected output: executed target=rollout_plan resource=val10-retry-N outcome=success previous=published new=active
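The ok/fail counts can be derived by parsing the expected output line into key=value fields. A minimal Python sketch, assuming each execute evidence file contributes one such line; `parse_execute_line` and `tally` are hypothetical helper names, not part of the lab:

```python
def parse_execute_line(line: str) -> dict[str, str]:
    """Parse 'executed key=value ...' into a dict; return {} for any other line."""
    tokens = line.split()
    if not tokens or tokens[0] != "executed":
        return {}
    return dict(tok.split("=", 1) for tok in tokens[1:] if "=" in tok)


def tally(lines: list[str], expected_new: str) -> tuple[int, int]:
    """Count lines that report outcome=success and the expected new state."""
    ok = fail = 0
    for line in lines:
        fields = parse_execute_line(line)
        if fields.get("outcome") == "success" and fields.get("new") == expected_new:
            ok += 1
        else:
            fail += 1
    return ok, fail


lines = [
    f"executed target=rollout_plan resource=val10-retry-{i} "
    "outcome=success previous=published new=active"
    for i in range(1, 6)
]
print(tally(lines, "active"))  # prints (5, 0)
```

The same helper would apply to the rollback batch by passing `"rolled_back"` as the expected new state.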
VAL10-05 — Execute rollback: Batch Success Rate
Action: Create 5 plans (val10-rollback-1 through val10-rollback-5); run
autonomy rollback execute --target rollout_plan --strategy rollback for each.
Evidence: val10-rollback-plans-created.txt, rollback/execute-rollback-{1..5}.txt,
val10-rollback-rate.txt
Pass criterion: ok=5, fail=0, success_rate=1.000 (100% ≥ 99%).
Expected output: executed target=rollout_plan resource=val10-rollback-N outcome=success previous=published new=rolled_back
VAL10-06 — Execute JSON Output Shape
Action: Re-execute retry on val10-retry-1 (already in active phase, so retry
is idempotent) with --output json.
Evidence: val10-execute-json.json, val10-execute-json-check.txt
Pass criterion: Response JSON has non-empty Outcome (or outcome), NewState
(or new_state), and Kind (or kind) fields.
Note: Field names are Go struct exported names rendered by json.MarshalIndent;
the check handles both CamelCase and snake_case variants.
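A check that tolerates both naming variants can be sketched as follows. This is a minimal Python illustration, assuming the execute response is a single JSON object; `FIELD_VARIANTS` and `missing_fields` are hypothetical names and the sample values are made up:

```python
import json

# Each required field, with the two spellings the check accepts.
FIELD_VARIANTS = {
    "outcome": ("Outcome", "outcome"),
    "new_state": ("NewState", "new_state"),
    "kind": ("Kind", "kind"),
}


def missing_fields(doc: dict) -> list[str]:
    """Return required fields that are absent or empty under both spellings."""
    missing = []
    for label, names in FIELD_VARIANTS.items():
        if not any(doc.get(name) not in (None, "") for name in names):
            missing.append(label)
    return missing


# Hypothetical response; the real file would be val10-execute-json.json.
response = json.loads(
    '{"Outcome": "success", "NewState": "active", "Kind": "rollout_plan"}'
)
print(missing_fields(response))  # prints []
```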
VAL10-07 — Execute Error: Nonexistent Plan
Action: autonomy rollback execute --target rollout_plan --strategy retry --resource val10-nonexistent-plan.
Evidence: val10-execute-nonexistent.txt, val10-nonexistent-check.txt
Pass criterion: Command exits non-zero.
Expected message: rollback execute: ...not found (HTTP 404 from CP).
VAL10-08 — Execute Error: relay_deadletter Not Orchestrated
Action: autonomy rollback execute --target relay_deadletter --resource seg-1/peer-1.
Evidence: val10-execute-relay-not-orchestrated.txt,
val10-relay-not-orchestrated-check.txt
Pass criterion: Command exits non-zero and output contains edgectl instructions.
Expected message: includes edgectl relay deadletter retry|purge instructions.
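Both error-path checks (VAL10-07 and VAL10-08) combine an exit-code test with an optional substring test on the captured output. A minimal Python sketch, assuming the harness has already recorded the exit code and the combined stdout/stderr; `check_error_case` is a hypothetical helper name and the sample messages are illustrative only:

```python
def check_error_case(exit_code: int, output: str, must_contain: str = "") -> bool:
    """PASS when the command failed and, if requested, the output mentions a marker."""
    return exit_code != 0 and must_contain in output


# VAL10-07: only a non-zero exit is required.
print(check_error_case(1, "rollback execute: ...not found"))  # prints True

# VAL10-08: a non-zero exit plus edgectl instructions in the output.
print(check_error_case(1, "use edgectl relay deadletter retry|purge", "edgectl"))  # prints True
```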
VAL10-09 — Audit: rollback.preview.requested Events
Action: autonomy audit query --event-type rollback.preview.requested against
the retained audit store, scoped to this slice’s actor and start time.
Evidence: val10-audit-preview-events.json, val10-audit-preview-check.txt
Pass criterion: count ≥ 4 for actor val10-preview-op with timestamp >= val10_start_time.
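The actor and start-time scoping can be sketched as a filter over the retained events. This minimal Python example assumes the audit query returns a JSON array of objects with `actor` and `timestamp` (ISO 8601) fields; those field names, the helper names, and the sample events are assumptions, since the plan does not show the audit record shape:

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    # datetime.fromisoformat() only accepts 'Z' on Python 3.11+, so normalise it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def count_scoped_events(events: list[dict], actor: str, start_time: str) -> int:
    """Count events from the given actor at or after start_time."""
    start = parse_ts(start_time)
    return sum(
        1
        for event in events
        if event.get("actor") == actor and parse_ts(event["timestamp"]) >= start
    )


# Hypothetical events: four in scope, one too early, one from another actor.
events = [
    {"actor": "val10-preview-op", "timestamp": f"2024-01-01T00:0{i}:00Z"}
    for i in range(1, 5)
] + [
    {"actor": "val10-preview-op", "timestamp": "2023-12-31T23:59:00Z"},
    {"actor": "someone-else", "timestamp": "2024-01-01T00:05:00Z"},
]
count = count_scoped_events(events, "val10-preview-op", "2024-01-01T00:00:00Z")
print(count >= 4)  # prints True
```

Scoping by both actor and start time keeps events left in the retained store by earlier runs from inflating the count.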
VAL10-10 — Audit: rollback.executed Success Events + Aggregate Rate
Action: autonomy audit query --event-type rollback.executed --outcome success
scoped to this slice’s actor and start time; compute aggregate rate from
VAL10-04 + VAL10-05 results.
Evidence: val10-audit-execute-events.json, val10-aggregate-rate.txt
Pass criterion:
- agg_success_rate ≥ 0.990 (workplan target: ≥99%)
- At least 10 retained rollback.executed success events from this slice’s batch executes
Harness Plan
Tools
| Tool | Purpose |
|---|---|
| | Read-only safety profile verification |
| | Dispatches to |
| | Create test plans via raw API |
| | JSON field extraction from preview/execute JSON output |
| | Verify actor-scoped, time-scoped audit events from retained store |
Control-Plane Setup
| Resource | Value |
|---|---|
| Port | |
| Data dir | |
| RBAC | |
| Operator identity | |
Plan Lifecycle
| Plans | Phase at creation | Strategy | Expected new_phase |
|---|---|---|---|
| | published | retry | active |
| | published | rollback | rolled_back |
Both strategies operate on published plans:
- retry: recoverRetry checks not-terminal, not-paused → calls UpdatePhase(active) + refreshes updated_at
- rollback: recoverRollback checks not-terminal → calls RollbackPlan() → terminal phase
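The two guard-then-transition paths can be modelled as a small state function. This is an illustrative Python model only, not the control plane's Go implementation; the `recover` function, the `paused` phase name, and the contents of `TERMINAL_PHASES` are assumptions made for the sketch:

```python
# Assumption: rolled_back is the only terminal phase modelled here;
# the real control plane may define others.
TERMINAL_PHASES = {"rolled_back"}


def recover(phase: str, strategy: str) -> str:
    """Return the new phase for a plan, or raise if a guard rejects the request."""
    if phase in TERMINAL_PHASES:
        raise ValueError(f"plan is in terminal phase {phase!r}")
    if strategy == "retry":
        if phase == "paused":
            raise ValueError("retry is not allowed on a paused plan")
        return "active"
    if strategy == "rollback":
        return "rolled_back"
    raise ValueError(f"unknown strategy {strategy!r}")


print(recover("published", "retry"))     # prints active
print(recover("published", "rollback"))  # prints rolled_back
print(recover("active", "retry"))        # prints active
```

The last call mirrors VAL10-06, where retry on an already active plan is treated as idempotent.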
Success Rate Measurement
retry_rate = retry_ok / (retry_ok + retry_fail)
rollback_rate = rollback_ok / (rollback_ok + rollback_fail)
aggregate_rate = (retry_ok + rollback_ok) / (retry_total + rollback_total)
target: aggregate_rate ≥ 0.990
With 10 clean plan creates and no external interference, the expected result is
aggregate_rate = 1.000.
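The formulas above translate directly into code. A minimal Python sketch, in which the function names and the returned dictionary keys are illustrative assumptions and `retry_total` and `rollback_total` are taken to be ok + fail for each batch:

```python
def rate(ok: int, fail: int) -> float:
    """Success rate for one batch; refuses to divide by zero."""
    total = ok + fail
    if total == 0:
        raise ValueError("no executions recorded")
    return ok / total


def summarize(retry_ok: int, retry_fail: int,
              rollback_ok: int, rollback_fail: int,
              target: float = 0.990) -> dict:
    """Compute per-strategy and aggregate rates and compare against the target."""
    aggregate = rate(retry_ok + rollback_ok, retry_fail + rollback_fail)
    return {
        "retry_rate": rate(retry_ok, retry_fail),
        "rollback_rate": rate(rollback_ok, rollback_fail),
        "aggregate_rate": aggregate,
        "meets_target": aggregate >= target,
    }


print(summarize(5, 0, 5, 0)["aggregate_rate"])  # prints 1.0
print(summarize(5, 0, 4, 1)["meets_target"])    # prints False
```

With only 10 executions, a single failure gives an aggregate rate of 0.900, so meeting the 0.990 target in this lab effectively requires all 10 executions to succeed.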
Known Failure Modes
| Mode | Description | Detectable by |
|---|---|---|
| CP start failure | Port conflict or binary missing | |
| Plan create failure | RBAC blocking or duplicate ID | |
| Strategy=retry on terminal | Plan already in terminal phase | |
| Missing | CLI validation rejects before dispatch | Exit code check; error message in output |
| Audit query empty | Audit store not populated for this slice | |
Out-of-Scope Items
| Item | Reason |
|---|---|
| | Requires |
| | Covered by existing |
| Automatic trigger rollback | |
| 30-day soak (≥100 plans) | Scope of workplan GA gate; not covered by CLI lab |
| PostgreSQL backend rollback | Requires live PG instance |
Evidence Files
| File | Description |
|---|---|
| | CP startup and per-request logs |
| | CP health check result |
| | Text preview output |
| | Text preview output |
| | Text preview output |
| | Text preview output |
| | |
| | JSON preview for rollout_plan |
| | |
| | JSON preview for relay_deadletter |
| | |
| | 5 plan create results (HTTP 201) |
| | Per-plan retry execute stdout/stderr |
| | |
| | 5 plan create results (HTTP 201) |
| | Per-plan rollback execute stdout/stderr |
| | |
| | JSON output for retry execute (VAL10-06) |
| | |
| | Error output for nonexistent plan |
| | |
| | Error + edgectl instructions |
| | |
| | |
| | |
| | |
| | |
| | Human-readable composite report (10 checks + rate table) |
| | Machine-readable JSON with |
Pass/Fail Criteria
Full pass: All 10 checks report PASS.
Minimum acceptable: VAL10-04, VAL10-05, VAL10-10 pass — success path for both strategies at ≥99% rate.
Key thresholds:
| Check | Threshold |
|---|---|
| VAL10-04 (retry rate) | |
| VAL10-05 (rollback rate) | |
| VAL10-07 (nonexistent error) | exit code ≠ 0 |
| VAL10-08 (relay error) | exit code ≠ 0 + |
| VAL10-09 (preview audit) | |
| VAL10-10 (aggregate rate) | |