VAL 01 — Zero-Downtime Certificate Rotation Validation
1. Purpose and Guarantee
This validation proves that autonomy cert rotate preserves point-in-time
connectivity for new mTLS client connections and completes within a bounded time
window.
The zero-downtime claim in this validation is client-side, not server-side:
the control-plane process remains running the entire time, while the client-side
certificate and private key files for node-c are each replaced atomically in place.
The pre-rotation and post-rotation probes use the same file paths, so a new
curl invocation naturally picks up the rotated client certificate without any
control-plane restart.
cert rotate is crash-safe: if the process is interrupted before the rename, the
temporary file is left behind and the original cert is untouched. If interrupted after
the rename, the new cert is in place and the key file is updated in a separate atomic
write. The worst case is a brief window where the cert and key files are from different
generations; this is detectable and correctable by re-running cert rotate.
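The write-then-rename pattern described above can be sketched in shell as follows. This illustrates the general technique only; it is not the cert rotate implementation, and the helper name atomic_replace and the .rotate temp prefix are hypothetical.

```shell
# atomic_replace TARGET: read new content from stdin and swap it into TARGET.
# Hypothetical helper illustrating write-to-temp-then-rename; not the real
# cert rotate code.
atomic_replace() {
    local target=$1 tmp
    # Create the temp file in the target's directory so the final mv is a
    # same-filesystem rename, which replaces the target in one step.
    tmp=$(mktemp "$(dirname "$target")/.rotate.XXXXXX") || return 1
    if cat > "$tmp"; then
        mv -f "$tmp" "$target"
    else
        # Write error before the rename: drop the temp file. If the process
        # is killed instead, the temp file is left behind and TARGET is
        # untouched.
        rm -f "$tmp"
        return 1
    fi
}
```

Because the certificate and the key would each go through a separate call, a crash between the two calls leaves the mixed-generation state described above. Note that mktemp creates the file with mode 0600, so the replaced file ends up owner-readable only.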
This validation captures live evidence of all six claims:
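| # | Claim |
|---|---|
| VAL01-1 | Expiring cert detected by cert list --expiring-within-days |
| VAL01-2 | Old cert accepted over live mTLS before rotation |
| VAL01-3 | Rotation completes within the 300-second bound |
| VAL01-4 | Expiry window clears after rotation to 90-day validity |
| VAL01-5 | New cert accepted over live mTLS without restarting the control-plane |
| VAL01-6 | cert.rotated audit event captured in the retained audit store |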
2. Scope
Covered
Client certificate rotation (edge node → control-plane mTLS path)
Expiry detection via autonomy cert list --expiring-within-days
Atomic in-place rotation via autonomy cert rotate
Timing bound: elapsed wall-clock seconds ≤ 300
New cert accepted without restarting the control-plane (new-connection continuity claim)
cert.rotated audit event captured in the retained file-backed store
Serial number change proving a new keypair was issued (not a no-op renewal)
Not covered (known gaps)
CA rotation: no CLI support; requires manual key replacement and leaf re-issuance
Server-side cert rotation: rotating the control-plane’s own TLS certificate requires a process restart (server TLS config is loaded once at startup); this is out of scope
OCSP-based live status: not implemented; revocation checking is CRL-only
In-flight request continuity: VAL01 proves successful new handshakes before and after rotation, not an uninterrupted stream of requests across the rotation window
Multi-node simultaneous rotation: single node pair (client + server); coordinated rotation across an HA cluster is covered in the self-hosted certificate rotation runbook (HA control-plane context section)
HSM-backed CA keys: the lab uses a locally generated CA; production CA key management is out of scope for automated validation
As of March 19, 2026, the current validation backlog does not define later VAL slices for CA rotation, server-certificate hot reload, uninterrupted in-flight continuity, or coordinated multi-node rotation. The only adjacent planned follow-on is VAL-02 Trust-chain rejection validation, which is expected to cover certificate accept/reject behavior, not the excluded rotation workflows above.
3. Harness
VAL01 is embedded in run_cert_lab() as Phase 8 within
scripts/labs/run_cli_audit_lab.sh. No separate runner is required.
Setup inherited from earlier phases:
Phases 1–4: local CA generated by openssl; node-a and node-b certs issued
Phase 5: mTLS-enforcing control-plane started on localhost:18443 using the same CA (--tls-cert-file, --tls-key-file, --tls-ca-file, --tls-crl-file)
Phases 6–7: CRL distribution and revocation tests
Phase 8 setup:
Issue a 2-day certificate for node-c.edge.local using the same CA (short validity so it falls inside the --expiring-within-days 5 window immediately)
Run the six validation checks against this cert while the control-plane from Phase 5 remains live
The Phase 5 control-plane is not restarted between VAL01-2 (pre-rotation) and VAL01-5 (post-rotation). This is what proves continuity for new client connections across the rotation.
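One way to back the no-restart claim with evidence is to record the control-plane PID when Phase 5 starts and confirm that the same PID is still alive on both sides of the rotation. A minimal sketch, assuming the lab keeps that PID in a variable; the variable name cp_pid and both helper names are hypothetical and do not come from run_cli_audit_lab.sh.

```shell
# process_alive PID: succeed if a process with this PID exists and can be
# signalled by the current user. Signal 0 performs the check without
# delivering a signal.
process_alive() {
    kill -0 "$1" 2>/dev/null
}

# assert_still_running PID LABEL: print one evidence line and fail if the
# process recorded at startup is gone.
assert_still_running() {
    if process_alive "$1"; then
        printf '%s control_plane_pid=%s alive=true\n' "$2" "$1"
    else
        printf '%s control_plane_pid=%s alive=false\n' "$2" "$1"
        return 1
    fi
}
```

Calling assert_still_running "$cp_pid" prerotate before VAL01-2 and assert_still_running "$cp_pid" postrotate after VAL01-5 would produce two lines with the same PID. PID reuse makes this a weak proof on long-running hosts, but it is adequate for a short lab run.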
4. Exact Scenarios
VAL01-1 — Expiry Detection
Purpose: Confirm that cert list --expiring-within-days identifies a certificate
about to expire.
Setup: node-c.edge.local was issued with --validity-days 2 so its expiry is
approximately now + 48 hours, well within the 5-day window.
Action:
AUTONOMY_RBAC_ENFORCEMENT=0 \
autonomy cert list \
--cert-file /tmp/.../node-c.crt \
--expiring-within-days 5
Evidence file: autonomy/cert-rotation-list-expiring.txt
Pass criterion: File contains the string expiring or the identity node-c.
VAL01-2 — Pre-rotation mTLS Connection
Purpose: Baseline — the 2-day cert is accepted by the live control-plane before rotation.
Action:
curl -fsS \
--cacert /tmp/.../ca.crt \
--cert /tmp/.../node-c.crt \
--key /tmp/.../node-c.key \
https://localhost:18443/v1/health
Evidence file: autonomy/cert-rotation-prerotate-health.json
Pass criterion: File contains "status":"ok".
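A literal match on "status":"ok" would miss a response that carries whitespace around the colon. A hedged sketch of a slightly more tolerant check follows; health_ok is a hypothetical helper name, and the lab's actual matching logic may differ.

```shell
# health_ok FILE: succeed if FILE contains a status field equal to "ok",
# allowing optional whitespace around the colon.
health_ok() {
    grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"' "$1"
}
```

This remains a text match rather than a JSON parse, so it would also accept the pattern inside a nested object or a string value.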
VAL01-3 — Rotation Timing (Bounded Downtime)
Purpose: Prove that the rotation operation itself completes within the 5-minute practical bound. (Actual elapsed time is sub-second; the 300-second bound is a formal SLA ceiling rather than a tight measurement target.)
Action:
rotation_start=$(date +%s)
AUTONOMY_RBAC_ENFORCEMENT=0 \
autonomy cert rotate \
--cert-file /tmp/.../node-c.crt \
--key-file /tmp/.../node-c.key \
--ca-cert /tmp/.../ca.crt \
--ca-key /tmp/.../ca.key
rotation_end=$(date +%s)
rotation_elapsed=$((rotation_end - rotation_start))
printf "rotation_elapsed_seconds=%d\nbound_seconds=300\npass=%s\n" \
"$rotation_elapsed" \
"$([ "$rotation_elapsed" -le 300 ] && echo true || echo false)"
Evidence files:
autonomy/cert-rotation-rotate.txt — cert rotate stdout
autonomy/cert-rotation-audit-rotate.log — slog line with cert.rotated
autonomy/cert-rotation-timing.txt — elapsed + bound + pass flag
Pass criterion: cert-rotation-timing.txt contains pass=true.
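The timing capture above can be wrapped in a reusable function that writes the same three-line format. This is a sketch only: run_timed is a hypothetical name, and unlike the snippet above it also requires the wrapped command to exit 0 before reporting pass=true, which is an added assumption rather than documented lab behavior.

```shell
# run_timed BOUND_SECONDS TIMING_FILE CMD [ARGS...]
# Runs CMD, then writes elapsed/bound/pass lines to TIMING_FILE.
# pass=true only if CMD exited 0 and finished within BOUND_SECONDS.
run_timed() {
    local bound=$1 timing_file=$2 start end elapsed status=0 pass=false
    shift 2
    start=$(date +%s)
    "$@" || status=$?
    end=$(date +%s)
    elapsed=$((end - start))
    if [ "$status" -eq 0 ] && [ "$elapsed" -le "$bound" ]; then
        pass=true
    fi
    printf 'rotation_elapsed_seconds=%d\nbound_seconds=%d\npass=%s\n' \
        "$elapsed" "$bound" "$pass" > "$timing_file"
    return "$status"
}
```

The command's own stdout is left alone, so it can still be redirected to cert-rotation-rotate.txt by the caller. Resolution is one second because date +%s is used, which matches the elapsed=0s value seen in the report template.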
VAL01-4 — Expiry Window Cleared
Purpose: After rotation to 90-day validity, the cert no longer appears in the 5-day expiry window.
Action:
AUTONOMY_RBAC_ENFORCEMENT=0 \
autonomy cert list \
--cert-file /tmp/.../node-c.crt \
--expiring-within-days 5
Evidence file: autonomy/cert-rotation-list-after.txt
Pass criterion: File contains the string no certificates matched.
VAL01-5 — Post-rotation mTLS (Zero-Downtime Claim)
Purpose: The core claim. The new cert is accepted by the same, unmodified control-plane process immediately after rotation, with no restart.
Action:
curl -fsS \
--cacert /tmp/.../ca.crt \
--cert /tmp/.../node-c.crt \
--key /tmp/.../node-c.key \
https://localhost:18443/v1/health
Evidence file: autonomy/cert-rotation-postrotate-health.json
Pass criterion: File contains "status":"ok".
Note: the pre-rotation and post-rotation curl calls use the same file paths. The
control-plane process is unchanged; the new curl invocation rereads the rotated
client cert and key from those paths and establishes a fresh mTLS connection
without any explicit restart or reload step.
VAL01-6 — Audit Event Captured
Purpose: Prove the cert.rotated event is written to the retained audit store,
providing an operator-queryable record of every rotation.
Action:
AUTONOMY_RBAC_ENFORCEMENT=0 \
autonomy audit query \
--audit-dir "$AUTONOMY_AUDIT_DIR" \
--event-type cert.rotated \
--output json
Evidence file: autonomy/cert-rotation-audit-events.json
Pass criterion: File contains the string cert.rotated.
Serial Assertion (supplementary)
Not a scored check, but captured as additional proof that a new keypair was actually generated (not a no-op or timestamp-only renewal):
# Before rotation
openssl x509 -in node-c.crt -noout -subject -serial -dates \
> cert-rotation-before-dates.txt
# After rotation
openssl x509 -in node-c.crt -noout -subject -serial -dates \
> cert-rotation-after-dates.txt
Evidence files:
autonomy/cert-rotation-before-dates.txt
autonomy/cert-rotation-after-dates.txt
Criterion: before_serial ≠ after_serial (both non-empty). Reported in the
composite report as serials_differ=true.
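The comparison can be done by pulling the serial= line out of each captured file. A minimal sketch, assuming the files hold the output of openssl x509 -noout -subject -serial -dates as shown above; the helper names are hypothetical and the lab script may compute this differently.

```shell
# extract_serial FILE: print the value of the first serial= line.
extract_serial() {
    sed -n 's/^serial=//p' "$1" | head -n 1
}

# serials_differ BEFORE_FILE AFTER_FILE: print the three report lines and
# succeed only if both serials are non-empty and different.
serials_differ() {
    local before after result=false
    before=$(extract_serial "$1")
    after=$(extract_serial "$2")
    if [ -n "$before" ] && [ -n "$after" ] && [ "$before" != "$after" ]; then
        result=true
    fi
    printf 'before_serial=%s\nafter_serial=%s\nserials_differ=%s\n' \
        "$before" "$after" "$result"
    [ "$result" = true ]
}
```

The values are printed as found in the files; any case normalization applied by the real report is not reproduced here.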
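5. Evidence Files
All files are written to $EVIDENCE_DIR/autonomy/ by the lab runner.
| File | Produced by | Contains |
|---|---|---|
| cert-rotation-list-expiring.txt | VAL01-1 (autonomy cert list --expiring-within-days 5) | Identity row for node-c, or the string expiring |
| cert-rotation-prerotate-health.json | VAL01-2 (curl against /v1/health) | "status":"ok" |
| cert-rotation-before-dates.txt | openssl x509 before rotation | Baseline serial, notBefore, notAfter |
| cert-rotation-rotate.txt | VAL01-3 (autonomy cert rotate) | cert rotate stdout |
| cert-rotation-timing.txt | VAL01-3 (timing capture) | Elapsed seconds, bound, pass flag |
| cert-rotation-audit-rotate.log | VAL01-3 (autonomy cert rotate) | Structured log line: cert.rotated |
| cert-rotation-after-dates.txt | openssl x509 after rotation | Post-rotation serial, notBefore, notAfter |
| cert-rotation-list-after.txt | VAL01-4 (autonomy cert list --expiring-within-days 5) | no certificates matched |
| cert-rotation-postrotate-health.json | VAL01-5 (curl against /v1/health) | "status":"ok" |
| cert-rotation-audit-events.json | VAL01-6 (autonomy audit query) | JSON array with cert.rotated event |
| cert-rotation-val01-report.txt | Composite report written by Phase 8 | 6-check PASS/FAIL + serial assertion |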
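6. Pass/Fail Criteria
| Check ID | Name | File | Pass condition |
|---|---|---|---|
| VAL01-1 | expiry_detection | cert-rotation-list-expiring.txt | contains expiring or node-c |
| VAL01-2 | prerotate_connect | cert-rotation-prerotate-health.json | contains "status":"ok" |
| VAL01-3 | rotation_timing | cert-rotation-timing.txt | contains pass=true |
| VAL01-4 | expiry_cleared | cert-rotation-list-after.txt | contains no certificates matched |
| VAL01-5 | postrotate_connect | cert-rotation-postrotate-health.json | contains "status":"ok" |
| VAL01-6 | audit_captured | cert-rotation-audit-events.json | contains cert.rotated |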
Overall pass: all 6 checks pass and cert-rotation-val01-report.txt reports 6/6 checks PASS.
Serial assertion: serials_differ=true in the report is expected but not a gate. It
confirms a new keypair was issued; serials_differ=false (identical or empty serials)
would indicate a cert rotate implementation regression.
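Each of the six pass conditions reduces to one pattern match against one evidence file. A table-driven sketch, assuming the file names and match strings given in section 4; the function names and exact output formatting are illustrative and are not taken from run_cli_audit_lab.sh.

```shell
# check_contains ID NAME FILE PATTERN
# Prints "ID NAME: PASS" if FILE matches the extended regex PATTERN,
# otherwise "ID NAME: FAIL". A missing file counts as FAIL.
check_contains() {
    if grep -Eq -- "$4" "$3" 2>/dev/null; then
        printf '%s %s: PASS\n' "$1" "$2"
    else
        printf '%s %s: FAIL\n' "$1" "$2"
        return 1
    fi
}

# evaluate_val01 DIR: run all six checks against the evidence directory,
# print a summary line, and succeed only if all six pass.
evaluate_val01() {
    local dir=$1 passed=0
    check_contains VAL01-1 expiry_detection "$dir/cert-rotation-list-expiring.txt" \
        'expiring|node-c' && passed=$((passed + 1))
    check_contains VAL01-2 prerotate_connect "$dir/cert-rotation-prerotate-health.json" \
        '"status":"ok"' && passed=$((passed + 1))
    check_contains VAL01-3 rotation_timing "$dir/cert-rotation-timing.txt" \
        'pass=true' && passed=$((passed + 1))
    check_contains VAL01-4 expiry_cleared "$dir/cert-rotation-list-after.txt" \
        'no certificates matched' && passed=$((passed + 1))
    check_contains VAL01-5 postrotate_connect "$dir/cert-rotation-postrotate-health.json" \
        '"status":"ok"' && passed=$((passed + 1))
    check_contains VAL01-6 audit_captured "$dir/cert-rotation-audit-events.json" \
        'cert\.rotated' && passed=$((passed + 1))
    printf 'VAL 01: %d/6 checks PASS\n' "$passed"
    [ "$passed" -eq 6 ]
}
```

Running evaluate_val01 "$EVIDENCE_DIR/autonomy" after a lab run would give an independent cross-check of the composite report.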
Failure handling:
VAL01-2 or VAL01-5 fails (no "status":"ok"): the control-plane from Phase 5 may have exited; check cert-rotation-audit-rotate.log for errors and re-run the full cert lab
VAL01-3 fails (pass=false): not expected in practice because rotation is sub-second; a failure points to system overload or clock skew in the CI environment
VAL01-4 fails (expiry not cleared): the rotated cert may still carry a short validity; cert rotate defaults to 90 days when --validity-days is omitted, so verify the rotate command output in cert-rotation-rotate.txt
VAL01-6 fails (no cert.rotated): the audit dir may not match; check that AUTONOMY_AUDIT_DIR is set to $RETAINED_EVIDENCE_DIR/store before the query
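The audit directory mismatch behind a VAL01-6 failure can be caught before the query runs. A small preflight sketch, assuming both environment variables are exported in the shell that runs the query; audit_dir_matches is a hypothetical helper name.

```shell
# audit_dir_matches: succeed only if AUTONOMY_AUDIT_DIR equals
# $RETAINED_EVIDENCE_DIR/store and that directory exists.
audit_dir_matches() {
    local expected
    if [ -z "${RETAINED_EVIDENCE_DIR:-}" ]; then
        echo 'RETAINED_EVIDENCE_DIR is unset' >&2
        return 1
    fi
    expected="$RETAINED_EVIDENCE_DIR/store"
    if [ "${AUTONOMY_AUDIT_DIR:-}" != "$expected" ]; then
        printf 'AUTONOMY_AUDIT_DIR=%s (expected %s)\n' \
            "${AUTONOMY_AUDIT_DIR:-<unset>}" "$expected" >&2
        return 1
    fi
    [ -d "$expected" ]
}
```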
7. Deferred Coverage Matrix
The table below closes the remaining coverage ambiguity by stating exactly which rotation-adjacent cases are still outside VAL-01, whether a later committed VAL slice exists, and what kind of future work would be required to validate them.
| Area | Covered by VAL-01? | Later committed VAL? | Current status |
|---|---|---|---|
| CA rotation | No | No | Product workflow not implemented; requires manual CA replacement and leaf re-issuance |
| Server-certificate hot reload | No | No | Product behavior not implemented; control-plane TLS cert is loaded at startup |
| In-flight request continuity | No | No | Not validated; would require a streaming or long-lived connection harness |
| Coordinated multi-node rotation | No | No | Runbook guidance exists, but no committed validation slice exercises clustered rotation |
| Trust-chain acceptance/rejection after cert changes | Partially adjacent | Yes, VAL-02 | Planned separately as trust-chain rejection validation, not as rotation continuity validation |
The safest operator interpretation today is:
Treat VAL-01 as proof of bounded client-certificate rotation for fresh mTLS connections against a live control-plane.
Treat the rows above as open validation gaps unless and until the product gains new capabilities or a later validation slice is explicitly added and merged.
8. Report Template
The composite report written to autonomy/cert-rotation-val01-report.txt follows this
format:
# VAL 01 — Certificate Rotation Validation Report
timestamp: 2026-03-19T10:00:00Z
## Results
VAL01-1 expiry_detection: PASS
VAL01-2 prerotate_connect: PASS
VAL01-3 rotation_timing: PASS (elapsed=0s bound=300s)
VAL01-4 expiry_cleared: PASS
VAL01-5 postrotate_connect: PASS
VAL01-6 audit_captured: PASS
## Serial assertion
before_serial=3e8
after_serial=3e9
serials_differ=true
A run is green when:
All six result lines report PASS
serials_differ=true
The runner also prints VAL 01: 6/6 checks PASS (report: cert-rotation-val01-report.txt) to
stdout so CI log scanners can grep for failure without parsing the report file.
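Both green conditions can be checked mechanically, either from the report file or from the captured stdout. A sketch assuming the report layout shown in the template above; the function names are hypothetical.

```shell
# report_is_green REPORT: succeed only if six VAL01-n result lines report
# PASS and the serial assertion line is serials_differ=true.
report_is_green() {
    local pass_count
    pass_count=$(grep -Ec '^VAL01-[1-6] [a-z_]+: PASS( |$)' "$1" 2>/dev/null || true)
    [ "${pass_count:-0}" -eq 6 ] && grep -qx 'serials_differ=true' "$1"
}

# stdout_is_green LOG: succeed if the captured CI log contains the summary
# line printed by the runner.
stdout_is_green() {
    grep -Fq 'VAL 01: 6/6 checks PASS' "$1"
}
```

The pattern accepts a trailing parenthetical after PASS, because the VAL01-3 line carries the elapsed and bound values.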
9. How to Run
Phase 8 executes automatically as part of run_cert_lab() when the full lab is run:
export GOROOT=/home/ubuntu/.local/go1.25.7
export PATH="$GOROOT/bin:$PATH"
export GOTOOLCHAIN=local
bash scripts/labs/run_cli_audit_lab.sh
The composite report is printed to stdout as part of the cert lab output. All evidence
files land in $EVIDENCE_DIR/autonomy/ (default:
evidence/pr17-cli-audit-local-2026-03-17/autonomy/).
To inspect results after a run:
# Quick pass/fail
cat evidence/pr17-cli-audit-local-2026-03-17/autonomy/cert-rotation-val01-report.txt
# Verify the serial assertion: the serial lines must differ
# (diff exits 1 when the files differ, which is the expected outcome here)
diff \
evidence/pr17-cli-audit-local-2026-03-17/autonomy/cert-rotation-before-dates.txt \
evidence/pr17-cli-audit-local-2026-03-17/autonomy/cert-rotation-after-dates.txt
# Confirm audit record
jq '.[0] | {event, actor, resource, outcome}' \
evidence/pr17-cli-audit-local-2026-03-17/autonomy/cert-rotation-audit-events.json