Tutorial 05 — Portability: Run Everywhere (amd64, arm64, riscv64)

Objective: Demonstrate that the edge runtime runs deterministically across three CPU architectures (amd64, arm64, riscv64) and two filesystem types (ext4, xfs) — with a single binary, no CGO, and no platform-specific configuration. The portability matrix is the formal evidence artefact for release certification.

What you will do:

  • Understand the portability claim and its limits (what is implemented vs. roadmap)

  • Run the native-arch portability matrix (1 or 2 cells; no QEMU required)

  • Understand the full 6-cell matrix (Docker + QEMU for cross-arch)

  • Inspect the evidence artefacts produced by each cell

  • Read the CGO-free dependency analysis

  • Run the portability CI gate (strict exit-1 mode)

Time: ~15 minutes (native-only, no QEMU)


Portability Matrix

The matrix has 6 cells: 3 architectures × 2 filesystems.

              ext4          xfs
amd64        [P] PASS      [P] PASS
arm64        [P] PASS      [P] PASS
riscv64      [P] PASS      [P] PASS

Status key:
  [P] PASS — all 4 steps passed
  [F] FAIL — at least one step failed
  [S] SKIP — arch/FS unavailable (non-native without QEMU, or xfs requires root)

Native (amd64): All cells run directly — no Docker, no QEMU, no root required for ext4. xfs requires a loop device (root) or a host xfs mount.

Cross-arch (arm64, riscv64): Cells run inside Docker containers with QEMU via binfmt_misc. If Docker or QEMU registration is absent the cell is marked SKIP, not FAIL. The CI gate does not require SKIP cells to pass.

Evidence: scripts/portability/core_matrix.sh, Makefile:portability-matrix, Makefile:portability-matrix-full


The CGO-Free Contract

All edge packages compile with CGO_ENABLED=0. The dependency tree contains no C library calls:

Package                                CGO needed?  Notes
go.etcd.io/bbolt (BoltDB)              No           Pure Go; uses mmap + os.File.Sync()
github.com/zeebo/blake3                No           Pure Go (assembly acceleration optional, not required)
golang.org/x/sys/unix                  No           Pure Go syscall wrappers; uname(2) uses syscall.RawSyscall
github.com/prometheus/client_golang    No           Pure Go HTTP + metric registry
github.com/spf13/cobra                 No           Pure Go CLI
gopkg.in/yaml.v3                       No           Pure Go YAML parser

No #cgo directives appear in any imported package. Cross-compilation is a GOARCH=<target> go build ./... command — no cross-toolchain, no sysroot.

Evidence: edge/go.mod, edge/ci/scan_prohibited/main.go (INV-10 import ban), edge/ci/scan_dependencies/main.go
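
The "no #cgo directives" claim can be spot-checked with a short script. The sketch below is not the project's scan_prohibited or scan_dependencies tool; it is a hypothetical stand-in that walks a source tree and flags any .go file containing import "C" or a #cgo directive. The name find_cgo_files and the regular expression are assumptions made for illustration, and the match is a text heuristic rather than a real Go parse.

```python
import os
import re

# Heuristic: `import "C"` (single or grouped import form) or a `#cgo` directive.
CGO_PATTERN = re.compile(r'^\s*(import\s+)?"C"\s*$|#cgo\s', re.MULTILINE)


def find_cgo_files(root):
    """Return sorted paths of .go files under root that appear to use cgo."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".go"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                if CGO_PATTERN.search(fh.read()):
                    hits.append(path)
    return sorted(hits)
```

Pointed at a vendored or module-cache copy of the dependency tree, an empty result is consistent with the CGO-free contract; a non-empty result lists the files to inspect by hand.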


What Each Cell Tests

Each matrix cell runs four steps in sequence. A single step failure fails the cell.

Step 1 — Go Unit Tests

On the native arch, runs:

  • telemetry/ module: WAL durability, safe-point recovery, OTLP drain

  • lock/ module: BLAKE3 fingerprint stability, canonical serialisation

On cross-arch cells (inside QEMU Docker image), unit tests are run if Docker/QEMU is available; otherwise this step logs a note and the other 3 steps proceed.

Step 2 — Randomised Crash Harness

TestCrashHarness_Randomized (telemetry/crash_harness_test.go) runs ITERATIONS (default: 20) rounds of:

  1. Append N entries to a WAL (N chosen by seeded PRNG)

  2. SIGKILL the writer process at a random offset

  3. Open the WAL with OpenWAL (recovery path)

  4. Assert recovered entries ≤ written entries (no phantom data)

  5. Assert no sequence gaps in recovered entries

A single seed drives the PRNG for every iteration, so the whole run is reproducible. The seed is captured in the evidence bundle, so a failing run can be replayed exactly:

make portability-crash-harness SEED=12345 ITERATIONS=100
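
The five steps can be modelled without a real process kill. The sketch below is a simplified stand-in, not TestCrashHarness_Randomized: it simulates the crash by truncating an in-memory byte string at a PRNG-chosen offset instead of sending SIGKILL, and it assumes the 4-byte big-endian length-prefixed JSON frame layout that Step 3 checks. The names encode_frame, recover, and run_harness, and the 1..50 range for N, are illustrative assumptions.

```python
import json
import random
import struct


def encode_frame(seq):
    """One WAL frame: 4-byte big-endian length prefix followed by a JSON payload."""
    payload = json.dumps({"seq": seq, "written_at": 0, "event": "e"}).encode()
    return struct.pack(">I", len(payload)) + payload


def recover(buf):
    """Return the seq of every complete frame; stop at the first torn frame."""
    seqs, pos = [], 0
    while pos + 4 <= len(buf):
        (length,) = struct.unpack_from(">I", buf, pos)
        end = pos + 4 + length
        if end > len(buf):
            break  # torn tail: the writer died mid-frame
        seqs.append(json.loads(buf[pos + 4:end])["seq"])
        pos = end
    return seqs


def run_harness(seed, iterations):
    """Run the five steps per iteration; return a list of (written, recovered)."""
    rng = random.Random(seed)  # one PRNG, seeded once, drives every iteration
    results = []
    for _ in range(iterations):
        n = rng.randint(1, 50)                       # step 1: N entries
        wal = b"".join(encode_frame(s) for s in range(1, n + 1))
        crash_at = rng.randint(0, len(wal))          # step 2: simulated crash offset
        seqs = recover(wal[:crash_at])               # step 3: recovery path
        assert len(seqs) <= n                        # step 4: no phantom data
        assert seqs == list(range(1, len(seqs) + 1)) # step 5: no sequence gaps
        results.append((n, len(seqs)))
    return results
```

Calling run_harness twice with the same seed and iteration count returns identical lists, which is the property the SEED variable relies on.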

Step 3 — Known-Good WAL Verify

TestKnownGoodWALFixture writes exactly 5 entries to a temp WAL. scripts/portability/wal_verify.py then independently parses the WAL file using Python (no Go) and asserts:

  • Each frame has the correct 4-byte big-endian length prefix

  • Each payload is valid JSON with seq, written_at, event fields

  • Sequence numbers are 1–5 (no gaps, no duplicates)

This verifies the WAL frame format is stable across architectures — if the frame layout drifts (endianness, alignment), the Python verifier catches it.
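
The three assertions translate into a short parser. The sketch below is not the contents of scripts/portability/wal_verify.py; it is a minimal illustration that assumes frames are stored back to back with no file header or checksum. The names verify_wal and make_fixture are hypothetical, and make_fixture exists only to build test input.

```python
import json
import struct

REQUIRED_FIELDS = ("seq", "written_at", "event")


def make_fixture(count, byteorder=">"):
    """Build `count` frames; byteorder '<' produces a deliberately wrong prefix."""
    out = b""
    for seq in range(1, count + 1):
        payload = json.dumps({"seq": seq, "written_at": 0, "event": "e"}).encode()
        out += struct.pack(byteorder + "I", len(payload)) + payload
    return out


def verify_wal(buf, expected_count=5):
    """Check length prefixes, JSON fields, and sequence numbers; return errors."""
    errors, seqs, pos = [], [], 0
    while pos < len(buf):
        if pos + 4 > len(buf):
            errors.append(f"truncated length prefix at offset {pos}")
            break
        (length,) = struct.unpack_from(">I", buf, pos)
        payload = buf[pos + 4:pos + 4 + length]
        if len(payload) != length:
            errors.append(f"truncated payload at offset {pos}")
            break
        try:
            entry = json.loads(payload)
        except ValueError:
            errors.append(f"payload at offset {pos} is not valid JSON")
            break
        if not isinstance(entry, dict) or any(k not in entry for k in REQUIRED_FIELDS):
            errors.append(f"frame at offset {pos} lacks a required field")
        else:
            seqs.append(entry["seq"])
        pos += 4 + length
    if seqs != list(range(1, expected_count + 1)):
        errors.append(f"sequence numbers {seqs} are not 1..{expected_count}")
    return errors
```

A fixture written with a little-endian prefix decodes to an implausibly large length and is reported as a truncated payload, which is how this kind of check surfaces an endianness drift.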

Step 4 — Atomic Rename Check

Writes a known value to target.tmp, renames atomically to target, reads back. Produces RENAME_ATOMIC: PASS or RENAME_ATOMIC: FAIL in the evidence JSON.

This is a filesystem capability gate: if atomic rename is broken on the target FS+arch combination (e.g. some network filesystems), the cell fails fast rather than producing a subtly incorrect WAL.
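
The write, rename, read-back sequence is small enough to show in full. The sketch below is an assumption-laden illustration, not the check inside core_matrix.sh: the function name rename_check, the file contents, and the {"RENAME_ATOMIC": ...} dictionary shape are hypothetical. It also only confirms that rename replaces the target and removes the temporary file; it does not prove atomicity under a crash.

```python
import os


def rename_check(directory):
    """Write target.tmp, rename it onto target, read back; return a result dict."""
    tmp = os.path.join(directory, "target.tmp")
    target = os.path.join(directory, "target")
    expected = b"known-value\n"
    try:
        with open(tmp, "wb") as fh:
            fh.write(expected)
            fh.flush()
            os.fsync(fh.fileno())  # flush data to disk before the rename
        os.replace(tmp, target)    # rename(2) on POSIX; atomic within one filesystem
        with open(target, "rb") as fh:
            ok = fh.read() == expected and not os.path.exists(tmp)
    except OSError:
        ok = False
    return {"RENAME_ATOMIC": "PASS" if ok else "FAIL"}
```

Running it against a directory on the filesystem under test gives a quick capability check before the full cell is attempted.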

Evidence: scripts/portability/core_matrix.sh:run_cell() (steps 1–4), scripts/portability/crash_harness.sh


Step 0: Prerequisites

# Go
export PATH=/home/ubuntu/go/bin:$PATH
go version  # must be >= 1.23

# Python 3 (for WAL verifier)
python3 --version

# Docker (optional; only needed for arm64/riscv64 cells)
docker info 2>/dev/null | grep "Server Version" || echo "(Docker not available — non-native cells will SKIP)"

Step 1: Run the Native-Arch Matrix (No QEMU Required)

cd /home/ubuntu/vsc_workstation/autonomyops
make portability-matrix

This runs 1 or 2 cells (amd64/ext4, and amd64/xfs if available) with 20 crash harness iterations each.

Expected output (amd64 host, ext4 only):

==> portability matrix: native arch + ext4
Core Matrix seed: 4831762194857362481
Host arch: amd64  Host FS: ext4
Matrix: arches=[amd64 arm64 riscv64]  fs=[ext4 xfs]  iterations=20

─── cell [amd64/ext4] run-1 ─────────────────────
  [1/4] Go unit tests (telemetry + lock)
  [1/4] unit tests PASS
  [2/4] crash harness (seed=4831762194857362481, iter=20)
  [2/4] crash harness PASS
  [3/4] WAL verify (known-good WAL)
  [3/4] WAL verify PASS
  [4/4] atomic rename check
  [4/4] atomic rename check PASS

╔═══════════════════════════════════════════════════════════════╗
║          Core Invariant Matrix — Summary                     ║
╠═══════════════════════════════════════════════════════════════╣
║  amd64/ext4   PASS  all 4 steps passed                       ║
║  amd64/xfs    SKIP  xfs not native                           ║
║  arm64/ext4   SKIP  non-native arch (--native-only set)      ║
║  arm64/xfs    SKIP  non-native arch (--native-only set)      ║
║  riscv64/ext4 SKIP  non-native arch (--native-only set)      ║
║  riscv64/xfs  SKIP  non-native arch (--native-only set)      ║
╚═══════════════════════════════════════════════════════════════╝
==> portability matrix: native arch + ext4
PASS  (1 passed, 5 skipped, 0 failed)

Step 2: Inspect Evidence Artefacts

Each cell produces an evidence directory:

ls evidence/amd64/ext4/run-1/

Expected files:

unit_test.log            — go test output (telemetry + lock)
crash_harness.log        — crash harness output
crash_harness.json       — structured result {pass, seed, iterations, cells}
wal_verify.json          — WAL frame verification result
wal_verify_known_good.log — TestKnownGoodWALFixture output
rename_check.json        — atomic rename result

Read the structured results:

python3 -m json.tool evidence/amd64/ext4/run-1/crash_harness.json
python3 -m json.tool evidence/amd64/ext4/run-1/wal_verify.json
python3 -m json.tool evidence/amd64/ext4/run-1/rename_check.json

Step 3: Reproduce a Specific Run (Seeded)

The seed is recorded in the evidence bundle, so any run can be reproduced exactly:

# Reproduce a previous run (replace SEED with the value from evidence/amd64/ext4/run-1/crash_harness.json)
make portability-crash-harness SEED=4831762194857362481 ITERATIONS=20

This produces deterministic output: the same seed and iteration count always produce the same sequence of crash points, the same number of surviving entries, and the same assertion results.
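
Copying the seed by hand is error-prone for 19-digit values. The sketch below builds the replay command from the evidence file. It assumes crash_harness.json is a JSON object with top-level keys named seed and iterations, inferred from the {pass, seed, iterations, cells} summary in Step 2; the function name reproduce_command is hypothetical.

```python
import json


def reproduce_command(evidence_path):
    """Build the make command that replays the run recorded in crash_harness.json."""
    with open(evidence_path, encoding="utf-8") as fh:
        result = json.load(fh)
    # Key names are assumptions based on the {pass, seed, iterations, cells} summary.
    return ("make portability-crash-harness "
            f"SEED={result['seed']} ITERATIONS={result['iterations']}")
```

For example, reproduce_command("evidence/amd64/ext4/run-1/crash_harness.json") returns a string that can be pasted into the shell.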


Step 4: Run the Full 6-Cell Matrix (Requires Docker + QEMU)

# Check QEMU binfmt_misc registration
ls /proc/sys/fs/binfmt_misc/ | grep -E "qemu|arm|riscv" || \
  echo "QEMU not registered — install qemu-user-static and enable binfmt_misc"

# Register QEMU (one-time, requires root):
# docker run --rm --privileged multiarch/qemu-user-static --reset -p yes

# Run full matrix
make portability-matrix-full

Expected summary (all 6 cells, full QEMU environment):

╔═══════════════════════════════════════════════════════════════╗
║          Core Invariant Matrix — Summary                     ║
╠═══════════════════════════════════════════════════════════════╣
║  amd64/ext4   PASS  all 4 steps passed                       ║
║  amd64/xfs    PASS  all 4 steps passed                       ║
║  arm64/ext4   PASS  all 4 steps passed                       ║
║  arm64/xfs    PASS  all 4 steps passed                       ║
║  riscv64/ext4 PASS  all 4 steps passed                       ║
║  riscv64/xfs  PASS  all 4 steps passed                       ║
╚═══════════════════════════════════════════════════════════════╝
PASS  (6 passed, 0 skipped, 0 failed)

Note on riscv64: The crash harness and WAL verifier run inside a Docker image using linux/riscv64 platform via QEMU. Performance is slower (QEMU emulation), but the 4-step verification is identical to native. riscv64 is not yet a tier-1 release target — it is covered by the matrix to prevent regressions as the architecture matures.


Step 5: Portability CI Gate (Strict Mode)

The CI gate exits 1 if any non-SKIP cell fails. This is the release certification gate.

make portability-ci-gate
echo "Exit: $?"

Expected: Exit: 0

If a cell fails, the gate prints a summary of failing cells and exits 1. The evidence directory contains logs for post-mortem analysis.
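
The gate rule (fail on any FAIL, ignore SKIP) reduces to a few lines. The sketch below is an illustration of that rule, not the implementation behind make portability-ci-gate; the names tally and gate_exit_code and the dictionary input are assumptions. In this sketch a run where every cell is SKIP exits 0; whether the real gate behaves the same way is not stated here.

```python
import sys


def tally(cells):
    """Return (passed, skipped, failed) counts for a {'arch/fs': status} mapping."""
    statuses = list(cells.values())
    return statuses.count("PASS"), statuses.count("SKIP"), statuses.count("FAIL")


def gate_exit_code(cells):
    """Return 1 if any cell is FAIL, else 0; SKIP cells never fail the gate."""
    failing = sorted(name for name, status in cells.items() if status == "FAIL")
    for name in failing:
        print(f"FAIL  {name}", file=sys.stderr)  # summary of failing cells
    return 1 if failing else 0
```

With the native-only result from Step 1 (one PASS, five SKIP), tally returns (1, 5, 0) and gate_exit_code returns 0.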


Step 6: Cross-Compile for All Three Architectures

The edge binary (edged) can be cross-compiled from any host without a cross-toolchain:

cd /home/ubuntu/vsc_workstation/autonomyops/edge

for arch in amd64 arm64 riscv64; do
  out="/tmp/edged_linux_${arch}"
  printf 'Building %s... ' "${arch}"
  # Report OK only if the build actually succeeded
  if GOWORK=off GOOS=linux GOARCH="${arch}" CGO_ENABLED=0 \
      /home/ubuntu/go/bin/go build -o "$out" ./cmd/edged; then
    echo "OK ($(du -sh "$out" | cut -f1))"
    file "$out"
  else
    echo "FAILED"
  fi
done

Expected:

Building amd64... OK (9.4M)
/tmp/edged_linux_amd64: ELF 64-bit LSB executable, x86-64, statically linked
Building arm64... OK (8.9M)
/tmp/edged_linux_arm64: ELF 64-bit LSB executable, ARM aarch64, statically linked
Building riscv64... OK (9.1M)
/tmp/edged_linux_riscv64: ELF 64-bit LSB executable, UCB RISC-V, statically linked

All three binaries are statically linked (no libc dependency). This is the “run everywhere” property: copy the binary to the target system, configure edge.toml, and start.
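
Where the file utility is not installed, the target architecture can be read straight from the ELF header. The sketch below is a convenience check under stated assumptions, not part of the project's tooling: the function name elf_goarch is hypothetical, and it inspects only the e_machine field (it does not verify static linking). The e_machine values 62, 183, and 243 are the ELF constants for x86-64, AArch64, and RISC-V.

```python
import struct

# e_machine values from the ELF specification, mapped to GOARCH names.
ELF_MACHINES = {62: "amd64", 183: "arm64", 243: "riscv64"}


def elf_goarch(header):
    """Map the first 20 bytes of an ELF file to a GOARCH name, or None."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return None
    # e_ident[EI_DATA] is byte 5: 1 = little-endian, 2 = big-endian.
    order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18, right after e_type.
    (machine,) = struct.unpack_from(order + "H", header, 18)
    return ELF_MACHINES.get(machine)
```

To use it, open one of the binaries built above in binary mode and pass the first 20 bytes, for example the result of fh.read(20) on /tmp/edged_linux_arm64.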


Implementation Status

Feature                          Status          Notes
amd64 production support         ✅ Implemented  Tier-1; full test coverage
arm64 support                    ✅ Implemented  Tested via QEMU matrix; production-ready
riscv64 support                  ✅ In matrix    QEMU-tested; not yet tier-1 release target
ext4 support                     ✅ Implemented  Native; atomic rename + mmap verified
xfs support                      ✅ In matrix    Requires loop device or xfs host for full cell
Cross-compilation                ✅ Implemented  Single go build command; no cross-toolchain
Zero CGO                         ✅ Implemented  Verified by edge/ci/scan_dependencies/main.go
Randomised crash harness         ✅ Implemented  TestCrashHarness_Randomized; seeded, reproducible
Container images (multi-arch)    Roadmap         Docker manifests for amd64+arm64+riscv64
Native riscv64 hardware testing  Roadmap         No hardware-in-the-loop CI yet
Windows / macOS edge support     Not planned     Linux-only; unix.Uname + mmap


Troubleshooting

Symptom                             Cause                             Fix
QEMU not available for arm64        binfmt_misc not registered        Run docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
xfs not available                   No xfs mount and not root         Use --native-fs-only flag, or run as root with mkfs.xfs
Crash harness fails intermittently  Race in fsync/fdatasync on tmpfs  Run on ext4; tmpfs doesn’t guarantee ordering
CGO_ENABLED=1 cross-compile fails   Host CGO toolchain missing        Set CGO_ENABLED=0 (all edge packages support it)
riscv64 cell very slow              QEMU emulation overhead           Expected; QEMU is ≈30–50× slower than native


What Just Happened

  • Established the portability claim: 3 arches × 2 filesystems = 6 cells, all testable

  • Ran the native-arch matrix (no QEMU, no root) and read the evidence artefacts

  • Understood each cell’s 4 steps (unit tests, crash harness, WAL verify, rename check)

  • Verified that the WAL frame format is stable across architectures (Python verifier)

  • Demonstrated cross-compilation for all three targets from a single host

  • Established which features are implemented, in matrix, and on the roadmap

Tutorial Pack — Seed Once, Update Everywhere