Tutorial 04 — OS Replacement Survival and Mission Runtime Reconstruction

Objective: Demonstrate that the edge daemon survives a full OS replacement — including a kernel update — by detecting the change deterministically, re-verifying the runtime manifest, and restoring operation with an incremented BootEpoch. All state on the persistent partition is preserved exactly; only the ephemeral rootfs changes.

What you will demonstrate:

  • Observe the state root directory layout and understand the persistence contract

  • Run edged precheck on first boot (initialises BootEpoch = 0)

  • Simulate an OS replacement by tampering with the stored fingerprint

  • Watch precheck detect the OS update, verify the manifest signature, and reconstruct

  • Confirm BootEpoch increments and epoch evidence is written

  • Trigger the fail-closed exit code 5 on two paths (missing verify key, invalid manifest signature) and observe the log output

  • Run the TestOSUpdateSimulation unit test and read the assertions

Time: ~20 minutes


Architecture

Persistent partition (survives OS replacement):
  {state_root}/
  ├── bootstrap/
  │   └── os_fingerprint.json   ← CompositeHash, BootEpoch, CapturedAt
  ├── identity/
  │   └── manifest-verify.pub   ← 32-byte Ed25519 public key (raw)
  ├── runtime/
  │   ├── binaries/             ← ONLY source allowed for copy_binary ops (GAP-9)
  │   │   └── edged             ← (deployed by operator before first run)
  │   ├── manifest.json         ← {version, binary_path, operations:[...]}
  │   └── manifest.json.sig     ← 64-byte Ed25519 signature (raw)
  ├── epoch/
  │   ├── current/
  │   │   └── evidence.json     ← OSFingerprint + BinaryHash + RotatedAt
  │   └── previous/             ← prior epoch (overwritten on each rotation)
  └── edge/
      ├── segments/             ← content-addressed segment blobs
      └── relay/
          └── relay.db          ← BoltDB outbound relay ledger

Ephemeral rootfs (wiped on OS replacement):
  /etc/os-release, /proc/version, kernel modules, libc, etc.

Recovery invariant: the persistent partition is never wiped during an OS update. Only the ephemeral rootfs changes. edged precheck (a systemd oneshot service) runs first, detects the change, and restores the runtime before edged.service starts.

Evidence: edge/stateroot/stateroot.go (path layout, CheckMount, Validate), edge/bootstrap/bootstrap.go (Bootstrap, OSUpdateDetected), edge/cmd/edged/main.go:runPrecheck


OS Fingerprint

The fingerprint is a BLAKE3 hash over three inputs in order:

BLAKE3(
  /etc/os-release bytes               ← OS name, version, ID
  + /usr/lib/os-release bytes         ← vendor supplement (optional; skipped if absent)
  + uname().Release bytes             ← kernel version string  ← GAP-B4
)

A kernel-only update (same distro, different kernel) changes uname().Release, which changes the CompositeHash and triggers reconstruction. This is the fix for GAP-B4: kernel changes are always visible to the fingerprint.
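The composition above can be sketched in a few lines of Python. This is an illustration only, under two assumptions that the text does not confirm: the three inputs are concatenated with no separators, and BLAKE2b with a 32-byte digest stands in for BLAKE3 (which is not in the Python standard library), so the digests will not match the values edged stores. The function name composite_hash is hypothetical.

```python
import hashlib
from typing import Optional


def composite_hash(os_release: bytes,
                   usr_lib_os_release: Optional[bytes],
                   uname_release: str) -> str:
    """Hash the three fingerprint inputs in the documented order.

    ASSUMPTION: BLAKE2b (32-byte digest) is a stand-in for BLAKE3.
    ASSUMPTION: inputs are concatenated without separators.
    """
    h = hashlib.blake2b(digest_size=32)
    h.update(os_release)                 # /etc/os-release bytes
    if usr_lib_os_release is not None:   # optional input, skipped if absent
        h.update(usr_lib_os_release)
    h.update(uname_release.encode())     # kernel version string
    return h.hexdigest()


os_release = b"ID=ubuntu\nVERSION_ID=24.04\n"
before = composite_hash(os_release, None, "6.8.0-101-generic")
after = composite_hash(os_release, None, "6.8.0-102-generic")
print(before != after)  # a kernel-only update yields a different hash
```

Because the kernel release string is part of the hashed input, changing only that string is enough to produce a mismatch against the stored value.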

Stored in: {state_root}/bootstrap/os_fingerprint.json

{
  "composite_hash": "a3f...d91",
  "os_release":     "ID=ubuntu\nVERSION_ID=24.04\n...",
  "uname_release":  "6.8.0-101-generic",
  "boot_epoch":     3,
  "captured_at":    "2026-03-04T10:00:00Z"
}

Evidence: edge/bootstrap/osfingerprint.go:CaptureOSFingerprint(), edge/bootstrap/bootstrap_test.go:TestFingerprintChange_KernelOnly_TriggersReconstruction


Precheck Exit Codes

edged precheck is a systemd oneshot that must succeed before edged.service starts:

edged-precheck.service  (Type=oneshot, RemainAfterExit=yes)
  |
  └─ Requires= / Before=
  |
edged.service

Exit code  Meaning                                                                What to check
0          Clean: first-run init, fingerprint match, or reconstruction succeeded  Normal operation
1          Config parse or validation error                                       edge.toml syntax/schema
2          StateRoot validation error (bad path, ephemeral FS)                    Mount type, path
3          mTLS cert unreadable or expires within 7 days                          Cert rotation
5          Reconstruction failed; edged MUST NOT start                            Manifest, signature, or install op
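
A wrapper script or health check might interpret these codes as follows. This is a minimal sketch: the names EXIT_MEANINGS, describe_exit, and may_start_edged are hypothetical and not part of edged. Treating any undocumented code as a failure is an assumption made here to stay fail-closed.

```python
# Meanings copied from the exit code table; code 4 is not documented there.
EXIT_MEANINGS = {
    0: "clean: first-run init, fingerprint match, or reconstruction succeeded",
    1: "config parse or validation error",
    2: "StateRoot validation error (bad path, ephemeral FS)",
    3: "mTLS cert unreadable or expires within 7 days",
    5: "reconstruction failed",
}


def describe_exit(code: int) -> str:
    """Return the documented meaning, or a marker for unknown codes."""
    return EXIT_MEANINGS.get(code, "undocumented exit code")


def may_start_edged(code: int) -> bool:
    """Fail closed: only exit code 0 allows edged.service to start."""
    return code == 0


for code in (0, 1, 2, 3, 5):
    print(code, may_start_edged(code), describe_exit(code))
```

In a real deployment systemd enforces this ordering through the unit dependency, so a helper like this is only useful for scripts that call precheck directly.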

Evidence: edge/cmd/edged/main.go:precheckCmd() (exit code doc comment), edge/cmd/edged/precheck_test.go (9 test cases, one per code/path)


Step 0: Set Up a Simulation State Root

All steps use a temporary directory to simulate the persistent partition. No Docker is required: the mount check is bypassed by the EDGE_STATE_ROOT_SKIP_MOUNT_CHECK environment variable (a test-only escape hatch).

export PATH=/home/ubuntu/go/bin:$PATH
export EDGE_STATE_ROOT_SKIP_MOUNT_CHECK=1

# Exported because the python3 snippets in later steps read os.environ["STATE_ROOT"]
export STATE_ROOT=$(mktemp -d /tmp/edge-state-XXXX)
echo "State root: $STATE_ROOT"

Generate an Ed25519 key pair for manifest signing:

# Generate raw 32-byte public key and 64-byte private key using Go (stdlib, no openssl)
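cat > /tmp/gen_ed25519.go << 'EOF'
package main
import (
    "crypto/ed25519"
    "crypto/rand"
    "log"
    "os"
)
func main() {
    // GenerateKey returns a 32-byte public key and a 64-byte private key.
    pub, priv, err := ed25519.GenerateKey(rand.Reader)
    if err != nil {
        log.Fatalf("generate key: %v", err)
    }
    // Write both keys as raw bytes (no PEM or DER wrapping).
    if err := os.WriteFile("/tmp/manifest-verify.pub", []byte(pub), 0600); err != nil {
        log.Fatalf("write public key: %v", err)
    }
    if err := os.WriteFile("/tmp/manifest-verify.key", []byte(priv), 0600); err != nil {
        log.Fatalf("write private key: %v", err)
    }
}
EOF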
cd /tmp && go run gen_ed25519.go
echo "Ed25519 key pair written to /tmp/manifest-verify.{pub,key}"

Build edged:

cd /home/ubuntu/vsc_workstation/autonomyops/edge
GOWORK=off go build -o /tmp/edged ./cmd/edged
echo "edged built: $(/tmp/edged version)"

Create the minimum state root structure and write a minimal edge.toml:

mkdir -p \
  "$STATE_ROOT/bootstrap" \
  "$STATE_ROOT/identity" \
  "$STATE_ROOT/runtime/binaries" \
  "$STATE_ROOT/epoch" \
  "$STATE_ROOT/edge/segments" \
  "$STATE_ROOT/edge/relay"

# Copy the manifest verify key
cp /tmp/manifest-verify.pub "$STATE_ROOT/identity/manifest-verify.pub"

# Write a minimal config (storage.local_root does not need to be STATE_ROOT)
STORAGE_ROOT=$(mktemp -d /tmp/edge-segments-XXXX)
cat > /tmp/edge.toml << EOF
schema = "v1"
[identity]
edge_domain_id = "tutorial-node-01"
fleet_salt = "abababababababababababababababababababababababababababababababababab"
[storage]
local_root = "$STORAGE_ROOT"
disk_ceiling_bytes = 1073741824
eviction_threshold_fraction = 0.85
[assurance]
mode = "baseline"
memory_ceiling_bytes = 536870912
[transport]
listen_addr = ":17300"
cert_file = "/tmp/node.crt"
key_file = "/tmp/node.key"
ca_file = "/tmp/node.crt"
handshake_timeout_seconds = 10
[quota]
max_active_peers = 4
ingest_bytes_per_peer_per_window = 1048576
window_seconds = 60
[retry]
max_retry_count = 3
backoff_base_seconds = 5
window_seconds = 60
[scheduler]
max_concurrent_relays = 2
schedule_interval_seconds = 5
max_segments_per_scheduling_round = 8
tie_break_comparator = "lex_segment_id"
[eviction]
policy = "lru_priority"
max_eviction_batch_size = 8
retention_window_seconds = 3600
[metrics]
log_level = "info"
log_format = "json"
[state_root]
root = "$STATE_ROOT"
EOF

Generate a self-signed TLS cert for the config (precheck checks cert expiry):

# Quick self-signed cert using openssl
openssl req -x509 -newkey ec -pkeyopt ec_paramgen_curve:P-256 \
  -keyout /tmp/node.key -out /tmp/node.crt \
  -days 365 -nodes \
  -subj "/CN=tutorial-node-01" 2>/dev/null
echo "TLS cert valid until: $(openssl x509 -noout -enddate -in /tmp/node.crt)"

Step 1: First-Run Precheck (BootEpoch = 0)

On the very first run, no fingerprint exists. precheck captures the live OS fingerprint, stores it with BootEpoch = 0, and exits 0.

/tmp/edged --config /tmp/edge.toml precheck
echo "Exit code: $?"

Expected log output:

level=INFO msg="edge.os.fingerprint_initialized" composite_hash="<blake3-64-hex>"

Inspect the stored fingerprint:

cat "$STATE_ROOT/bootstrap/os_fingerprint.json" | python3 -m json.tool

Expected (values vary by host):

{
  "composite_hash": "a3f...d91",
  "os_release": "ID=ubuntu\nVERSION_ID=24.04\n...",
  "uname_release": "6.8.0-101-generic",
  "boot_epoch": 0,
  "captured_at": "2026-..."
}

Run precheck a second time — fingerprint matches, exits 0 silently:

/tmp/edged --config /tmp/edge.toml precheck
echo "Exit code (must be 0): $?"

Step 2: Simulate OS Replacement

Tamper with the stored fingerprint hash (this simulates the OS having been replaced):

python3 - << 'EOF'
import json, pathlib, os

fp_path = pathlib.Path(os.environ["STATE_ROOT"]) / "bootstrap/os_fingerprint.json"
fp = json.loads(fp_path.read_text())
print(f"Before: composite_hash={fp['composite_hash'][:16]}... boot_epoch={fp['boot_epoch']}")
fp["composite_hash"] = "0000000000000000000000000000000000000000000000000000000000000000"
fp_path.write_text(json.dumps(fp))
print(f"After:  composite_hash=000...000 (zeroed — simulates OS replacement)")
EOF

Create a minimal signed runtime manifest:

# Write the manifest
cat > "$STATE_ROOT/runtime/manifest.json" << 'EOF2'
{"version":"1","binary_path":"edged","operations":[]}
EOF2

# Sign it (raw Ed25519 signature)
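cat > /tmp/sign_manifest.go << 'EOF3'
package main
import (
    "crypto/ed25519"
    "log"
    "os"
)
func main() {
    if len(os.Args) != 2 {
        log.Fatalf("usage: %s <manifest-path>", os.Args[0])
    }
    key, err := os.ReadFile("/tmp/manifest-verify.key")
    if err != nil {
        log.Fatalf("read private key: %v", err)
    }
    data, err := os.ReadFile(os.Args[1])
    if err != nil {
        log.Fatalf("read manifest: %v", err)
    }
    // Raw 64-byte signature over the exact manifest bytes, written next to the manifest.
    sig := ed25519.Sign(ed25519.PrivateKey(key), data)
    if err := os.WriteFile(os.Args[1]+".sig", sig, 0600); err != nil {
        log.Fatalf("write signature: %v", err)
    }
}
EOF3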
cd /tmp && go run sign_manifest.go "$STATE_ROOT/runtime/manifest.json"
echo "Manifest signed: $STATE_ROOT/runtime/manifest.json.sig ($(wc -c < "$STATE_ROOT/runtime/manifest.json.sig") bytes)"

Step 3: Reconstruction Run

Run precheck with the tampered fingerprint. The runtime will:

  1. Detect hash mismatch → edge.os.update_detected

  2. Verify manifest Ed25519 signature

  3. Execute reconstruction (empty operations list)

  4. Increment BootEpoch → save fingerprint

  5. Rotate epoch evidence

/tmp/edged --config /tmp/edge.toml precheck
echo "Exit code (must be 0): $?"

Expected log output:

level=WARN msg="edge.os.update_detected" previous_hash="000...000" current_hash="a3f...d91"
level=INFO msg="edge.os.reconstruction_started"
level=INFO msg="edge.os.reconstruction_completed" duration_seconds=0.001 boot_epoch=1

Inspect the updated fingerprint:

cat "$STATE_ROOT/bootstrap/os_fingerprint.json" | python3 -c "
import sys, json
fp = json.load(sys.stdin)
print(f'boot_epoch    = {fp[\"boot_epoch\"]}')
print(f'composite_hash= {fp[\"composite_hash\"][:32]}...')
"

Expected: boot_epoch = 1
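
The five steps listed at the start of this step can be condensed into a small decision function. This is a sketch of the control flow only, not the edged implementation: the function precheck and its callback parameters verify_manifest and run_operations are hypothetical, and signature verification is passed in as a callable because Ed25519 is not in the Python standard library.

```python
from typing import Callable, Optional, Tuple

EXIT_OK = 0
EXIT_RECONSTRUCTION_FAILED = 5


def precheck(stored: Optional[dict],
             current_hash: str,
             verify_manifest: Callable[[], bool],
             run_operations: Callable[[], None]) -> Tuple[int, Optional[dict]]:
    """Return (exit_code, fingerprint_to_persist)."""
    if stored is None:
        # First run: initialise the fingerprint with BootEpoch = 0.
        return EXIT_OK, {"composite_hash": current_hash, "boot_epoch": 0}
    if stored["composite_hash"] == current_hash:
        # Fingerprint matches: nothing to do.
        return EXIT_OK, stored
    # Hash mismatch: an OS update was detected. Fail closed on any error.
    if not verify_manifest():
        return EXIT_RECONSTRUCTION_FAILED, stored
    try:
        run_operations()
    except Exception:
        return EXIT_RECONSTRUCTION_FAILED, stored
    # Success: persist the live hash with an incremented BootEpoch.
    return EXIT_OK, {"composite_hash": current_hash,
                     "boot_epoch": stored["boot_epoch"] + 1}


stale = {"composite_hash": "0" * 64, "boot_epoch": 0}
code, fp = precheck(stale, "a3f", lambda: True, lambda: None)
print(code, fp["boot_epoch"])
```

Note that the stored fingerprint is returned unchanged on every failure path, so a failed reconstruction is retried on the next boot rather than silently accepted.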


Step 4: Epoch Evidence

After each successful reconstruction, RotateEpoch writes an evidence bundle to {state_root}/epoch/current/evidence.json. This captures:

  • The OS fingerprint at reconstruction time

  • BLAKE3 hash of the edged binary (from BinariesDir)

  • BLAKE3 hash of bootstrap/config.yaml

  • rotated_at timestamp

cat "$STATE_ROOT/epoch/current/evidence.json" | python3 -m json.tool

Expected:

{
  "os_fingerprint": {
    "composite_hash": "a3f...d91",
    "boot_epoch": 1,
    ...
  },
  "binary_hash": "",
  "config_hash": "",
  "rotated_at": "2026-..."
}

Note: binary_hash and config_hash are empty when the source files don’t exist in the simulated state root. In production, edged and config.yaml live in BinariesDir and BootstrapDir respectively.
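
The empty-hash behaviour described in the note can be sketched as follows. This is an illustration, not the edged code: the helper names hash_file_or_empty and build_evidence are hypothetical, the file locations are taken from the layout and bullet list above, and BLAKE2b stands in for BLAKE3 (an assumption, since BLAKE3 is not in the Python standard library).

```python
import hashlib
import os
from datetime import datetime, timezone


def hash_file_or_empty(path: str) -> str:
    """Return a hex digest of the file, or "" if the file does not exist.

    ASSUMPTION: BLAKE2b (32-byte digest) is a stand-in for BLAKE3.
    """
    try:
        with open(path, "rb") as f:
            return hashlib.blake2b(f.read(), digest_size=32).hexdigest()
    except FileNotFoundError:
        return ""


def build_evidence(state_root: str, fingerprint: dict) -> dict:
    """Assemble the evidence bundle fields listed above."""
    return {
        "os_fingerprint": fingerprint,
        "binary_hash": hash_file_or_empty(
            os.path.join(state_root, "runtime", "binaries", "edged")),
        "config_hash": hash_file_or_empty(
            os.path.join(state_root, "bootstrap", "config.yaml")),
        "rotated_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```

Calling build_evidence on the simulated state root from Step 0 would give empty binary_hash and config_hash values, matching the expected output above.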

Crash-safe rotation sequence (epoch.RotateEpoch):

1. Write evidence.json to epoch_<nanoseconds>.tmp/
2. fsync the tmp directory
3. Rename current/ → previous/  (best-effort; non-fatal)
4. Rename epoch_<nanoseconds>.tmp/ → current/  ← atomic commit point
5. fsync epoch parent dir (power-loss durability)

If the process crashes before step 4, bootstrap.Bootstrap() on the next run cleans orphaned *.tmp directories.
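
The rotation sequence can be sketched in Python for POSIX systems. This is a sketch under stated assumptions, not a port of epoch.RotateEpoch: the function names are hypothetical, and removing an existing previous/ before the rename is an assumption made here because a directory rename onto a non-empty directory fails.

```python
import json
import os
import shutil
import time


def fsync_dir(path: str) -> None:
    """Flush a directory's entries to disk (POSIX only)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def rotate_epoch(epoch_dir: str, evidence: dict) -> None:
    tmp = os.path.join(epoch_dir, f"epoch_{time.time_ns()}.tmp")
    current = os.path.join(epoch_dir, "current")
    previous = os.path.join(epoch_dir, "previous")
    os.mkdir(tmp)
    with open(os.path.join(tmp, "evidence.json"), "w") as f:  # step 1
        json.dump(evidence, f)
        f.flush()
        os.fsync(f.fileno())
    fsync_dir(tmp)                                            # step 2
    try:                                                      # step 3, best-effort
        if os.path.isdir(current):
            shutil.rmtree(previous, ignore_errors=True)  # ASSUMPTION: drop old previous/
            os.rename(current, previous)
    except OSError:
        pass
    os.rename(tmp, current)                                   # step 4, commit point
    fsync_dir(epoch_dir)                                      # step 5


def clean_orphans(epoch_dir: str) -> None:
    """Remove *.tmp directories left behind by a crash before step 4."""
    for name in os.listdir(epoch_dir):
        if name.endswith(".tmp"):
            shutil.rmtree(os.path.join(epoch_dir, name), ignore_errors=True)
```

The design point is that readers only ever see current/ as a complete directory: everything before the final rename happens under a temporary name that a later cleanup pass can discard.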

Evidence: edge/epoch/epoch.go:RotateEpoch(), edge/cmd/edged/precheck_test.go:TestPrecheck_EpochRotatedAfterReconstruction


Step 5: Observe Each Fail-Closed Path

Exit 5 — Missing manifest verify key (fail closed):

# Remove the verify key
mv "$STATE_ROOT/identity/manifest-verify.pub" \
   "$STATE_ROOT/identity/manifest-verify.pub.bak"

# Tamper fingerprint again
python3 -c "
import json, pathlib, os
fp_path = pathlib.Path(os.environ['STATE_ROOT']) / 'bootstrap/os_fingerprint.json'
fp = json.loads(fp_path.read_text())
fp['composite_hash'] = 'ffff0000'
fp_path.write_text(json.dumps(fp))
"

/tmp/edged --config /tmp/edge.toml precheck; echo "Exit: $?"

Expected:

level=ERROR msg="precheck: manifest verify key missing on reconstruction path" ...
Exit: 5

Exit 5 — Invalid manifest signature:

# Restore key, write a wrong signature
mv "$STATE_ROOT/identity/manifest-verify.pub.bak" \
   "$STATE_ROOT/identity/manifest-verify.pub"
python3 -c "open('$STATE_ROOT/runtime/manifest.json.sig','wb').write(bytes(64))"

/tmp/edged --config /tmp/edge.toml precheck; echo "Exit: $?"

Expected:

level=ERROR msg="precheck: manifest signature verification failed" ...
Exit: 5

Restore and verify clean run:

# Restore the correct signature
go run /tmp/sign_manifest.go "$STATE_ROOT/runtime/manifest.json"
/tmp/edged --config /tmp/edge.toml precheck; echo "Exit: $?"

Expected: Exit: 0

Evidence: edge/cmd/edged/precheck_test.go:TestPrecheck_ExitCode5_MissingManifestVerifyKey, TestPrecheck_InvalidSig_Exit5


Step 6: BootEpoch Increments Monotonically

Simulate three consecutive OS updates:

for i in 1 2 3; do
  # Tamper fingerprint
  python3 -c "
import json, pathlib, os
fp_path = pathlib.Path(os.environ['STATE_ROOT']) / 'bootstrap/os_fingerprint.json'
fp = json.loads(fp_path.read_text())
fp['composite_hash'] = '${i}' * 64
fp_path.write_text(json.dumps(fp))
"
  # Re-sign (same manifest content, key is stable)
  go run /tmp/sign_manifest.go "$STATE_ROOT/runtime/manifest.json"

  /tmp/edged --config /tmp/edge.toml precheck
  echo "--- OS update $i: exit=$? boot_epoch=$(python3 -c "
import json,pathlib,os
fp=json.loads(pathlib.Path(os.environ['STATE_ROOT']).joinpath('bootstrap/os_fingerprint.json').read_text())
print(fp['boot_epoch'])")"
done

Expected: boot_epoch advances from its previous value by 1 on each run.

Evidence: edge/cmd/edged/precheck_test.go:TestPrecheck_BootEpochIncrement (three consecutive reconstructions)


Step 7: Unit Test Walkthrough — TestOSUpdateSimulation

The unit test exercises the full reconstruct path in-process (no binary, no config file, no TLS cert):

cd /home/ubuntu/vsc_workstation/autonomyops/edge
GOWORK=off EDGE_STATE_ROOT_SKIP_MOUNT_CHECK=1 \
  go test ./bootstrap/... -run TestOSUpdateSimulation -v

Expected:

=== RUN   TestOSUpdateSimulation
--- PASS: TestOSUpdateSimulation (0.00s)

What the test asserts (in order):

  1. Bootstrap(root) — all 7 directories created

  2. SaveFingerprint(root, stale) — stores zeroed hash

  3. OSUpdateDetected(root) → detected=true, current.CompositeHash ≠ "000..."

  4. RunReconstruction(root, emptyManifest) → no error (empty operations list)

  5. Save current with BootEpoch = stale.BootEpoch + 1

  6. LoadFingerprint(root) → final.BootEpoch == 1

  7. OSUpdateDetected(root) → detected=false (hashes now match)

Evidence: edge/bootstrap/osupdate_test.go:TestOSUpdateSimulation


Step 8: GAP-9 — Typed Operations, No Shell

RunReconstruction only executes two operation types. Arbitrary shell commands are rejected at the validation layer:

# Example of a manifest that validation rejects (unsupported operation type).
# It is shown for reference only; the test run below does not read it.
ESCAPE_MANIFEST=$(mktemp)
echo '{"version":"1","operations":[{"op":"exec_shell","src":"/bin/sh","dst":"/tmp/x"}]}' \
  > "$ESCAPE_MANIFEST"

# Validation failure (no binary needed — test the library directly)
cd /home/ubuntu/vsc_workstation/autonomyops/edge
GOWORK=off go test ./bootstrap/... -run TestRunReconstruction -v

Expected output includes:

--- PASS: TestRunReconstruction_SrcOutsideBinaries_Rejected
--- PASS: TestRunReconstruction_DstOutsideStateRoot_Rejected
--- PASS: TestRunReconstruction_HashMismatch_Rejected

Supported operations and their constraints:

Op             Src constraint             Dst constraint           Extra check
copy_binary    Must be under BinariesDir  Must be under StateRoot  BLAKE3 hash match (if expected_hash set)
verify_config  Must be under BinariesDir  Must be under StateRoot  File readable (no parse)

Shell metacharacters (;, |, $, `, &, >, <, etc.) in either path cause immediate rejection.
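
The constraints above can be sketched as a small validator. This is an illustration, not the logic of ValidateInstallOperation: the function names are hypothetical, the metacharacter set contains only the characters named in the text (the real list is longer), and the check order is an assumption.

```python
import os
from typing import Optional

ALLOWED_OPS = {"copy_binary", "verify_config"}
SHELL_METACHARS = set(";|$`&><")  # subset named in the text; the real set is larger


def _is_under(path: str, base: str) -> bool:
    """True if the normalised absolute path lies inside base."""
    path, base = os.path.normpath(path), os.path.normpath(base)
    return os.path.isabs(path) and os.path.commonpath([path, base]) == base


def validate_operation(op: dict, state_root: str) -> Optional[str]:
    """Return an error string, or None if the operation is acceptable.

    ASSUMPTION: state_root is an absolute path.
    """
    binaries_dir = os.path.join(state_root, "runtime", "binaries")
    if op.get("op") not in ALLOWED_OPS:
        return f"unsupported operation type: {op.get('op')!r}"
    src, dst = op.get("src", ""), op.get("dst", "")
    for p in (src, dst):
        if any(c in SHELL_METACHARS for c in p):
            return "shell metacharacter in path"
    if not _is_under(src, binaries_dir):
        return "src must be under BinariesDir"
    if not _is_under(dst, state_root):
        return "dst must be under StateRoot"
    return None


print(validate_operation(
    {"op": "exec_shell", "src": "/bin/sh", "dst": "/tmp/x"}, "/state"))
```

Normalising paths before the containment check matters: without it, a src such as runtime/binaries/../../../etc/passwd would pass a naive prefix comparison.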

Evidence: edge/bootstrap/install.go:ValidateInstallOperation(), edge/bootstrap/install.go:shellMetachars


What Just Happened

  • Walked the full state root directory layout and understood the persistence contract

  • Demonstrated first-run fingerprint initialisation (BootEpoch = 0)

  • Simulated an OS replacement by zeroing the stored CompositeHash

  • Observed precheck detect the change, verify the Ed25519 manifest signature, execute reconstruction (empty operations), increment BootEpoch, and write epoch evidence

  • Triggered the fail-closed exit code 5 on two paths (missing verify key, invalid manifest signature)

  • Confirmed BootEpoch increments monotonically across consecutive OS updates

  • Ran TestOSUpdateSimulation and read each assertion

Next Tutorial

Tutorial 05 — Portability: Run Everywhere (amd64, arm64, riscv64)