Tutorial 04 — OS Replacement Survival and Mission Runtime Reconstruction
Objective: Demonstrate that the edge daemon survives a full OS replacement — including a kernel update — by detecting the change deterministically, re-verifying the runtime manifest, and restoring operation with an incremented BootEpoch. All state on the persistent partition is preserved exactly; only the ephemeral rootfs changes.
What you will demonstrate:
Observe the state root directory layout and understand the persistence contract
Run edged precheck on first boot (BootEpoch = 0 initialised)
Simulate an OS replacement by tampering with the stored fingerprint
Watch precheck detect the OS update, verify the manifest signature, and reconstruct the runtime
Confirm BootEpoch increments and epoch evidence is written
Trigger the exit-code-5 fail-closed paths (missing verify key, invalid signature) and observe the log output; precheck also defines fail-closed codes 1, 2 and 3
Run the TestOSUpdateSimulation unit test and read the assertions
Time: ~20 minutes
Architecture
Persistent partition (survives OS replacement):
{state_root}/
├── bootstrap/
│   └── os_fingerprint.json       ← CompositeHash, BootEpoch, CapturedAt
├── identity/
│   └── manifest-verify.pub       ← 32-byte Ed25519 public key (raw)
├── runtime/
│   ├── binaries/                 ← ONLY source allowed for copy_binary ops (GAP-9)
│   │   └── edged                 ← (deployed by operator before first run)
│   ├── manifest.json             ← {version, binary_path, operations:[...]}
│   └── manifest.json.sig         ← 64-byte Ed25519 signature (raw)
├── epoch/
│   ├── current/
│   │   └── evidence.json         ← OSFingerprint + BinaryHash + ConfigHash + RotatedAt
│   └── previous/                 ← prior epoch (overwritten on each rotation)
└── edge/
    ├── segments/                 ← content-addressed segment blobs
    └── relay/
        └── relay.db              ← BoltDB outbound relay ledger
Ephemeral rootfs (wiped on OS replacement):
/etc/os-release, /proc/version, kernel modules, libc, etc.
Recovery invariant: the persistent partition is never wiped during an OS update.
Only the ephemeral rootfs changes. edged precheck (a systemd oneshot service) runs
first, detects the change, and restores the runtime before edged.service starts.
Evidence:
edge/stateroot/stateroot.go (path layout, CheckMount, Validate), edge/bootstrap/bootstrap.go (Bootstrap, OSUpdateDetected), edge/cmd/edged/main.go:runPrecheck
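To make the layout concrete, here is a hypothetical Go sketch of path helpers over the tree above. The StateRoot type and its method names are illustrative only and are not the API of edge/stateroot/stateroot.go; only the relative paths are taken from the tree.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// StateRoot is a hypothetical helper that resolves the persistent-partition
// layout shown in the tree. Only the relative paths come from the tutorial.
type StateRoot string

func (r StateRoot) join(parts ...string) string {
	return filepath.Join(append([]string{string(r)}, parts...)...)
}

func (r StateRoot) FingerprintPath() string  { return r.join("bootstrap", "os_fingerprint.json") }
func (r StateRoot) VerifyKeyPath() string    { return r.join("identity", "manifest-verify.pub") }
func (r StateRoot) BinariesDir() string      { return r.join("runtime", "binaries") }
func (r StateRoot) ManifestPath() string     { return r.join("runtime", "manifest.json") }
func (r StateRoot) ManifestSigPath() string  { return r.ManifestPath() + ".sig" }
func (r StateRoot) EpochCurrentDir() string  { return r.join("epoch", "current") }
func (r StateRoot) EpochPreviousDir() string { return r.join("epoch", "previous") }
func (r StateRoot) SegmentsDir() string      { return r.join("edge", "segments") }
func (r StateRoot) RelayDBPath() string      { return r.join("edge", "relay", "relay.db") }

func main() {
	// "/var/lib/edge" is an example root, not a documented default.
	root := StateRoot("/var/lib/edge")
	fmt.Println(root.FingerprintPath()) // on Linux: /var/lib/edge/bootstrap/os_fingerprint.json
	fmt.Println(root.ManifestSigPath()) // on Linux: /var/lib/edge/runtime/manifest.json.sig
}
```

Deriving every location from one root keeps all persistent paths tied to the single configured value (state_root.root in the config used later).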
OS Fingerprint
The fingerprint is a BLAKE3 hash over three inputs in order:
BLAKE3(
/etc/os-release bytes ← OS name, version, ID
+ /usr/lib/os-release bytes ← vendor supplement (optional; skipped if absent)
+ uname().Release bytes ← kernel version string ← GAP-B4
)
A kernel-only update (same distro, different kernel) changes uname().Release →
different CompositeHash → reconstruction triggered. This is the GAP-B4 guarantee: a
kernel change alone is visible to the fingerprint.
Stored in: {state_root}/bootstrap/os_fingerprint.json
{
"composite_hash": "a3f...d91",
"os_release": "ID=ubuntu\nVERSION_ID=24.04\n...",
"uname_release": "6.8.0-101-generic",
"boot_epoch": 3,
"captured_at": "2026-03-04T10:00:00Z"
}
Evidence:
edge/bootstrap/osfingerprint.go:CaptureOSFingerprint(), edge/bootstrap/bootstrap_test.go:TestFingerprintChange_KernelOnly_TriggersReconstruction
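A minimal sketch of the composite hash, assuming the three inputs are concatenated exactly as the diagram shows. BLAKE3 is not in the Go standard library, so the hash constructor is injected: the demo uses crypto/sha256 purely as a stand-in. A BLAKE3 package whose hasher satisfies hash.Hash (for example github.com/zeebo/blake3, wrapped as func() hash.Hash { return blake3.New() }) would be passed instead. compositeHash and readOptional are hypothetical names, not CaptureOSFingerprint's signature.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"hash"
	"io/fs"
	"os"
)

// demoHasher is a stand-in for BLAKE3, used only because BLAKE3 is not in
// the Go standard library.
var demoHasher = sha256.New

// compositeHash hashes the inputs in the documented order: /etc/os-release,
// then /usr/lib/os-release (nil if absent), then the uname release string.
// Inputs are concatenated with no separators, mirroring the diagram.
func compositeHash(newHash func() hash.Hash, etcOSRelease, usrLibOSRelease, unameRelease []byte) string {
	h := newHash()
	h.Write(etcOSRelease)
	h.Write(usrLibOSRelease) // nil when the optional file is absent: contributes nothing
	h.Write(unameRelease)
	return hex.EncodeToString(h.Sum(nil))
}

// readOptional returns (nil, nil) when the file does not exist, matching the
// "skipped if absent" rule for the vendor supplement.
func readOptional(path string) ([]byte, error) {
	b, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		return nil, nil
	}
	return b, err
}

func main() {
	osRelease := []byte("ID=ubuntu\nVERSION_ID=24.04\n")
	vendor, err := readOptional("/usr/lib/os-release")
	if err != nil {
		fmt.Println("read vendor supplement:", err)
		return
	}
	// Same distro, different kernel: only uname().Release differs.
	before := compositeHash(demoHasher, osRelease, vendor, []byte("6.8.0-100-generic"))
	after := compositeHash(demoHasher, osRelease, vendor, []byte("6.8.0-101-generic"))
	fmt.Println("kernel-only update changes hash:", before != after) // true
}
```

The sketch follows the diagram's plain concatenation. Length-prefixing each input is a common way to make input boundaries unambiguous, so that bytes moved from one file to the next cannot produce the same digest.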
Precheck Exit Codes
edged precheck is a systemd oneshot that must succeed before edged.service starts:
edged-precheck.service (Type=oneshot, RemainAfterExit=yes)
|
└─ Requires= / Before=
|
edged.service
| Exit code | Meaning | What to check |
|---|---|---|
| 0 | Clean: first-run init, fingerprint match, or reconstruction succeeded | Normal operation |
| 1 | Config parse or validation error | |
| 2 | StateRoot validation error (bad path, ephemeral FS) | Mount type, path |
| 3 | mTLS cert unreadable or expires within 7 days | Cert rotation |
| 5 | Reconstruction failed | Manifest, signature, or install op |
Evidence:
edge/cmd/edged/main.go:precheckCmd() (exit code doc comment), edge/cmd/edged/precheck_test.go (9 test cases, one per code/path)
Step 0: Set Up a Simulation State Root
All steps use a temporary directory to simulate the persistent partition. No Docker is required: the mount check is bypassed by an environment variable (a test-only escape hatch).
export PATH=/home/ubuntu/go/bin:$PATH
export EDGE_STATE_ROOT_SKIP_MOUNT_CHECK=1
# Export STATE_ROOT: the Python snippets below read it via os.environ
export STATE_ROOT=$(mktemp -d /tmp/edge-state-XXXX)
echo "State root: $STATE_ROOT"
Generate an Ed25519 key pair for manifest signing:
# Generate raw 32-byte public key and 64-byte private key using Go (stdlib, no openssl)
cat > /tmp/gen_ed25519.go << 'EOF'
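package main
import (
	"crypto/ed25519"
	"crypto/rand"
	"log"
	"os"
)
func main() {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		log.Fatal(err)
	}
	// Raw encodings: 32-byte public key, 64-byte private key (seed || public key)
	if err := os.WriteFile("/tmp/manifest-verify.pub", []byte(pub), 0600); err != nil {
		log.Fatal(err)
	}
	if err := os.WriteFile("/tmp/manifest-verify.key", []byte(priv), 0600); err != nil {
		log.Fatal(err)
	}
}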
EOF
cd /tmp && go run gen_ed25519.go
echo "Ed25519 key pair written to /tmp/manifest-verify.{pub,key}"
Build edged:
cd /home/ubuntu/vsc_workstation/autonomyops/edge
GOWORK=off go build -o /tmp/edged ./cmd/edged
echo "edged built: $(/tmp/edged version)"
Create the minimum state root structure and write a minimal edge.toml:
mkdir -p \
"$STATE_ROOT/bootstrap" \
"$STATE_ROOT/identity" \
"$STATE_ROOT/runtime/binaries" \
"$STATE_ROOT/epoch" \
"$STATE_ROOT/edge/segments" \
"$STATE_ROOT/edge/relay"
# Copy the manifest verify key
cp /tmp/manifest-verify.pub "$STATE_ROOT/identity/manifest-verify.pub"
# Write a minimal config (storage.local_root does not need to be STATE_ROOT)
STORAGE_ROOT=$(mktemp -d /tmp/edge-segments-XXXX)
cat > /tmp/edge.toml << EOF
schema: v1
identity:
  edge_domain_id: tutorial-node-01
  fleet_salt: "abababababababababababababababababababababababababababababababababab"
storage:
  local_root: "$STORAGE_ROOT"
  disk_ceiling_bytes: 1073741824
  eviction_threshold_fraction: 0.85
assurance:
  mode: baseline
  memory_ceiling_bytes: 536870912
transport:
  listen_addr: ":17300"
  cert_file: "/tmp/node.crt"
  key_file: "/tmp/node.key"
  ca_file: "/tmp/node.crt"
  handshake_timeout_seconds: 10
quota:
  max_active_peers: 4
  ingest_bytes_per_peer_per_window: 1048576
  window_seconds: 60
retry:
  max_retry_count: 3
  backoff_base_seconds: 5
  window_seconds: 60
scheduler:
  max_concurrent_relays: 2
  schedule_interval_seconds: 5
  max_segments_per_scheduling_round: 8
  tie_break_comparator: lex_segment_id
eviction:
  policy: lru_priority
  max_eviction_batch_size: 8
  retention_window_seconds: 3600
metrics:
  log_level: info
  log_format: json
state_root:
  root: "$STATE_ROOT"
EOF
Generate a self-signed TLS cert for the config (precheck checks cert expiry):
# Quick self-signed cert using openssl
openssl req -x509 -newkey ec -pkeyopt ec_paramgen_curve:P-256 \
-keyout /tmp/node.key -out /tmp/node.crt \
-days 365 -nodes \
-subj "/CN=tutorial-node-01" 2>/dev/null
echo "TLS cert valid until: $(openssl x509 -noout -enddate -in /tmp/node.crt)"
Step 1: First-Run Precheck (BootEpoch = 0)
On the very first run, no fingerprint exists. precheck captures the live OS fingerprint,
stores it with BootEpoch = 0, and exits 0.
/tmp/edged --config /tmp/edge.toml precheck
echo "Exit code: $?"
Expected log output:
level=INFO msg="edge.os.fingerprint_initialized" composite_hash="<blake3-64-hex>"
Inspect the stored fingerprint:
cat "$STATE_ROOT/bootstrap/os_fingerprint.json" | python3 -m json.tool
Expected (values vary by host):
{
"composite_hash": "a3f...d91",
"os_release": "ID=ubuntu\nVERSION_ID=24.04\n...",
"uname_release": "6.8.0-101-generic",
"boot_epoch": 0,
"captured_at": "2026-..."
}
Run precheck a second time — fingerprint matches, exits 0 silently:
/tmp/edged --config /tmp/edge.toml precheck
echo "Exit code (must be 0): $?"
Step 2: Simulate OS Replacement
Tamper with the stored fingerprint hash (simulates the OS having been replaced):
python3 - << 'EOF'
import json, pathlib, os
fp_path = pathlib.Path(os.environ["STATE_ROOT"]) / "bootstrap/os_fingerprint.json"
fp = json.loads(fp_path.read_text())
print(f"Before: composite_hash={fp['composite_hash'][:16]}... boot_epoch={fp['boot_epoch']}")
fp["composite_hash"] = "0000000000000000000000000000000000000000000000000000000000000000"
fp_path.write_text(json.dumps(fp))
print(f"After: composite_hash=000...000 (zeroed — simulates OS replacement)")
EOF
Create a minimal signed runtime manifest:
# Write the manifest
cat > "$STATE_ROOT/runtime/manifest.json" << 'EOF2'
{"version":"1","binary_path":"edged","operations":[]}
EOF2
# Sign it (raw Ed25519 signature)
cat > /tmp/sign_manifest.go << 'EOF3'
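package main
import (
	"crypto/ed25519"
	"log"
	"os"
)
func main() {
	key, err := os.ReadFile("/tmp/manifest-verify.key")
	if err != nil || len(key) != ed25519.PrivateKeySize {
		log.Fatalf("bad private key (len=%d): %v", len(key), err)
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	// Raw 64-byte Ed25519 signature over the exact manifest bytes
	sig := ed25519.Sign(ed25519.PrivateKey(key), data)
	if err := os.WriteFile(os.Args[1]+".sig", sig, 0600); err != nil {
		log.Fatal(err)
	}
}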
EOF3
cd /tmp && go run sign_manifest.go "$STATE_ROOT/runtime/manifest.json"
echo "Manifest signed: $STATE_ROOT/runtime/manifest.json.sig ($(wc -c < "$STATE_ROOT/runtime/manifest.json.sig") bytes)"
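The manifest is a small JSON document with version, binary_path and operations keys. A sketch of strict decoding in Go, assuming that unknown keys and trailing data should be rejected; Manifest, Operation and parseManifest are hypothetical, and the op/src/dst operation fields are borrowed from the Step 8 example rather than from edged's schema.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// Manifest mirrors {version, binary_path, operations:[...]} from the tree.
type Manifest struct {
	Version    string      `json:"version"`
	BinaryPath string      `json:"binary_path"`
	Operations []Operation `json:"operations"`
}

// Operation uses the op/src/dst keys shown in Step 8 (assumed schema).
type Operation struct {
	Op  string `json:"op"`
	Src string `json:"src"`
	Dst string `json:"dst"`
}

// parseManifest decodes strictly: unknown keys and trailing data are errors.
func parseManifest(b []byte) (*Manifest, error) {
	dec := json.NewDecoder(bytes.NewReader(b))
	dec.DisallowUnknownFields()
	var m Manifest
	if err := dec.Decode(&m); err != nil {
		return nil, err
	}
	if dec.More() {
		return nil, errors.New("trailing data after manifest")
	}
	return &m, nil
}

func main() {
	m, err := parseManifest([]byte(`{"version":"1","binary_path":"edged","operations":[]}`))
	fmt.Println(m.BinaryPath, len(m.Operations), err) // edged 0 <nil>
	_, err = parseManifest([]byte(`{"version":"1","operations":[],"exec":"sh"}`))
	fmt.Println(err != nil) // true
}
```

Parsing should only ever see bytes whose signature has already been verified; strict decoding is an extra layer on top of that, not a substitute.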
Step 3: Reconstruction Run
Run precheck with the tampered fingerprint. The runtime will:
1. Detect hash mismatch → edge.os.update_detected
2. Verify the manifest Ed25519 signature
3. Execute reconstruction (empty operations list)
4. Increment BootEpoch → save fingerprint
5. Rotate epoch evidence
/tmp/edged --config /tmp/edge.toml precheck
echo "Exit code (must be 0): $?"
Expected log output:
level=WARN msg="edge.os.update_detected" previous_hash="000...000" current_hash="a3f...d91"
level=INFO msg="edge.os.reconstruction_started"
level=INFO msg="edge.os.reconstruction_completed" duration_seconds=0.001 boot_epoch=1
Inspect the updated fingerprint:
cat "$STATE_ROOT/bootstrap/os_fingerprint.json" | python3 -c "
import sys, json
fp = json.load(sys.stdin)
print(f'boot_epoch = {fp[\"boot_epoch\"]}')
print(f'composite_hash= {fp[\"composite_hash\"][:32]}...')
"
Expected: boot_epoch = 1
Step 4: Epoch Evidence
After each successful reconstruction, RotateEpoch writes an evidence bundle to
{state_root}/epoch/current/evidence.json. This captures:
The OS fingerprint at reconstruction time
BLAKE3 hash of the edged binary (from BinariesDir)
BLAKE3 hash of bootstrap/config.yaml
rotated_at timestamp
cat "$STATE_ROOT/epoch/current/evidence.json" | python3 -m json.tool
Expected:
{
"os_fingerprint": {
"composite_hash": "a3f...d91",
"boot_epoch": 1,
...
},
"binary_hash": "",
"config_hash": "",
"rotated_at": "2026-..."
}
Note: binary_hash and config_hash are empty when the source files don’t exist in the simulated state root. In production, edged and config.yaml live in BinariesDir and BootstrapDir respectively.
Crash-safe rotation sequence (epoch.RotateEpoch):
1. Write evidence.json to epoch_<nanoseconds>.tmp/
2. fsync the tmp directory
3. Rename current/ → previous/ (best-effort; non-fatal)
4. Rename epoch_<nanoseconds>.tmp/ → current/ ← atomic commit point
5. fsync epoch parent dir (power-loss durability)
If the process crashes before step 4, bootstrap.Bootstrap() on the next run
cleans orphaned *.tmp directories.
Evidence:
edge/epoch/epoch.go:RotateEpoch(), edge/cmd/edged/precheck_test.go:TestPrecheck_EpochRotatedAfterReconstruction
Step 5: Observe Each Fail-Closed Path
Exit 5 — Missing manifest verify key (fail closed):
# Remove the verify key
mv "$STATE_ROOT/identity/manifest-verify.pub" \
"$STATE_ROOT/identity/manifest-verify.pub.bak"
# Tamper fingerprint again
python3 -c "
import json, pathlib, os
fp_path = pathlib.Path(os.environ['STATE_ROOT']) / 'bootstrap/os_fingerprint.json'
fp = json.loads(fp_path.read_text())
fp['composite_hash'] = 'ffff0000'
fp_path.write_text(json.dumps(fp))
"
/tmp/edged --config /tmp/edge.toml precheck; echo "Exit: $?"
Expected:
level=ERROR msg="precheck: manifest verify key missing on reconstruction path" ...
Exit: 5
Exit 5 — Invalid manifest signature:
# Restore key, write a wrong signature
mv "$STATE_ROOT/identity/manifest-verify.pub.bak" \
"$STATE_ROOT/identity/manifest-verify.pub"
python3 -c "open('$STATE_ROOT/runtime/manifest.json.sig','wb').write(bytes(64))"
/tmp/edged --config /tmp/edge.toml precheck; echo "Exit: $?"
Expected:
level=ERROR msg="precheck: manifest signature verification failed" ...
Exit: 5
Restore and verify clean run:
# Restore the correct signature
go run /tmp/sign_manifest.go "$STATE_ROOT/runtime/manifest.json"
/tmp/edged --config /tmp/edge.toml precheck; echo "Exit: $?"
Expected: Exit: 0
Evidence:
edge/cmd/edged/precheck_test.go:TestPrecheck_ExitCode5_MissingManifestVerifyKey, TestPrecheck_InvalidSig_Exit5
Step 6: BootEpoch Increments Monotonically
Simulate three consecutive OS updates:
for i in 1 2 3; do
  # Tamper fingerprint
  python3 -c "
import json, pathlib, os
fp_path = pathlib.Path(os.environ['STATE_ROOT']) / 'bootstrap/os_fingerprint.json'
fp = json.loads(fp_path.read_text())
fp['composite_hash'] = '${i}' * 64
fp_path.write_text(json.dumps(fp))
"
  # Re-sign (same manifest content, key is stable)
  go run /tmp/sign_manifest.go "$STATE_ROOT/runtime/manifest.json"
  /tmp/edged --config /tmp/edge.toml precheck
  rc=$?  # capture precheck's exit status before running anything else
  epoch=$(python3 -c "
import json, pathlib, os
fp = json.loads(pathlib.Path(os.environ['STATE_ROOT']).joinpath('bootstrap/os_fingerprint.json').read_text())
print(fp['boot_epoch'])")
  echo "--- OS update $i: exit=$rc boot_epoch=$epoch"
done
Expected: boot_epoch advances from its previous value by 1 on each run.
Evidence:
edge/cmd/edged/precheck_test.go:TestPrecheck_BootEpochIncrement (three consecutive reconstructions)
Step 7: Unit Test Walkthrough — TestOSUpdateSimulation
The unit test exercises the full reconstruction path in-process (no binary, no config file, no TLS cert):
cd /home/ubuntu/vsc_workstation/autonomyops/edge
GOWORK=off EDGE_STATE_ROOT_SKIP_MOUNT_CHECK=1 \
go test ./bootstrap/... -run TestOSUpdateSimulation -v
Expected:
=== RUN TestOSUpdateSimulation
--- PASS: TestOSUpdateSimulation (0.00s)
What the test asserts (in order):
1. Bootstrap(root) → all 7 directories created
2. SaveFingerprint(root, stale) → stores zeroed hash
3. OSUpdateDetected(root) → detected=true, current.CompositeHash ≠ "000..."
4. RunReconstruction(root, emptyManifest) → no error (empty operations list)
5. Save current with BootEpoch = stale.BootEpoch + 1
6. LoadFingerprint(root) → final.BootEpoch == 1
7. OSUpdateDetected(root) → detected=false (hashes now match)
Evidence:
edge/bootstrap/osupdate_test.go:TestOSUpdateSimulation
Step 8: GAP-9 — Typed Operations, No Shell
RunReconstruction only executes two operation types. Arbitrary shell commands are
rejected at the validation layer:
# Example of a manifest that validation rejects (exec_shell is not a supported op).
# It is not passed to edged below; the go test run exercises the validator directly.
ESCAPE_MANIFEST=$(mktemp)
echo '{"version":"1","operations":[{"op":"exec_shell","src":"/bin/sh","dst":"/tmp/x"}]}' \
  > "$ESCAPE_MANIFEST"
# Test the validation library directly (no edged binary needed)
cd /home/ubuntu/vsc_workstation/autonomyops/edge
GOWORK=off go test ./bootstrap/... -run TestRunReconstruction -v
Expected output includes:
--- PASS: TestRunReconstruction_SrcOutsideBinaries_Rejected
--- PASS: TestRunReconstruction_DstOutsideStateRoot_Rejected
--- PASS: TestRunReconstruction_HashMismatch_Rejected
Supported operations and their constraints:
| Op | Src constraint | Dst constraint | Extra check |
|---|---|---|---|
| | Must be under | Must be under | BLAKE3 hash match (if |
| | Must be under | Must be under | File readable (no parse) |
Shell metacharacters (;, |, $, `, &, >, <, etc.) in either path
cause immediate rejection.
Evidence:
edge/bootstrap/install.go:ValidateInstallOperation(), edge/bootstrap/install.go:shellMetachars
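The validation rules can be sketched as a closed set of op types, a metacharacter check, and path containment. Everything here is hypothetical: only copy_binary is modelled (its name comes from the state-root tree; the second supported op type is not), the Op fields follow the Step 8 example, and the metacharacter set is the subset listed above. Containment uses filepath.Rel on cleaned paths, so ../ traversal is caught; the check is lexical and does not resolve symlinks.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// Op is a hypothetical typed install operation from manifest.json.
type Op struct {
	Op  string `json:"op"`
	Src string `json:"src"`
	Dst string `json:"dst"`
}

// shellMetachars is an assumed subset of the characters the tutorial lists.
const shellMetachars = ";|$`&><"

// within reports whether path lies strictly inside dir after cleaning.
func within(dir, path string) bool {
	rel, err := filepath.Rel(filepath.Clean(dir), filepath.Clean(path))
	if err != nil {
		return false // e.g. one path absolute and the other relative
	}
	return rel != "." && rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

// validateOp rejects anything outside the closed set of typed operations.
func validateOp(op Op, stateRoot string) error {
	if op.Op != "copy_binary" {
		return fmt.Errorf("unsupported operation type %q", op.Op)
	}
	for _, p := range []string{op.Src, op.Dst} {
		if strings.ContainsAny(p, shellMetachars) {
			return fmt.Errorf("shell metacharacter in path %q", p)
		}
	}
	if !within(filepath.Join(stateRoot, "runtime", "binaries"), op.Src) {
		return errors.New("src outside runtime/binaries")
	}
	if !within(stateRoot, op.Dst) {
		return errors.New("dst outside state root")
	}
	return nil
}

func main() {
	root := "/var/lib/edge" // example root; the dst below is also illustrative
	fmt.Println(validateOp(Op{Op: "exec_shell", Src: "/bin/sh", Dst: "/tmp/x"}, root))
	// unsupported operation type "exec_shell"
	fmt.Println(validateOp(Op{Op: "copy_binary", Src: root + "/runtime/binaries/edged", Dst: root + "/bin/edged"}, root))
	// <nil>
}
```

Because the allowed set is closed, a new op type needs a code change and a review rather than a manifest edit.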
What Just Happened
Walked the full state root directory layout and understood the persistence contract
Demonstrated first-run fingerprint initialisation (BootEpoch = 0)
Simulated an OS replacement by zeroing the stored CompositeHash
Observed precheck detect the change, verify the Ed25519 manifest signature, execute reconstruction (empty operations), increment BootEpoch, and write epoch evidence
Triggered both exit-code-5 fail-closed paths (missing verify key, bad signature)
Confirmed BootEpoch increments monotonically across consecutive OS updates
Ran TestOSUpdateSimulation and read each assertion
Evidence Links
| Claim | File | Symbol |
|---|---|---|
| State root directory layout | | Package doc, path helpers |
| OS fingerprint (BLAKE3) | | |
| GAP-B4 (kernel change triggers reconstruction) | | |
| Fingerprint save/load | | |
| OSUpdateDetected | | |
| Precheck exit codes | | |
| Manifest Ed25519 signature | | |
| TOCTOU closure (GAP-B1) | | |
| GAP-9 typed ops | | |
| Epoch crash-safe rotation | | |
| Full precheck tests | | 9 test cases |
| OS update simulation | | |
Next Tutorial
Tutorial 05 — Portability: Run Everywhere (amd64, arm64, riscv64)