Core Invariant Matrix

The Core Invariant Matrix is the formal evidence artifact for cross-architecture portability certification (workplan v1.6 §4). It proves six independent cells — three target architectures × two filesystems — and collects structured evidence for each.


Matrix Overview

Cell

Architecture

Filesystem

CI Runner

Status

1

amd64

ext4

ubuntu-24.04 (native)

gated

2

amd64

xfs

ubuntu-24.04 + loop device (root)

gated

3

arm64

ext4

ubuntu-24.04-arm (native)

gated

4

arm64

xfs

ubuntu-24.04-arm + loop device

optional

5

riscv64

ext4

QEMU binfmt_misc via ubuntu-24.04

informational

6

riscv64

xfs

QEMU binfmt_misc + loop device

optional

Gated = failure blocks the release tag. Informational = failure raises a warning but does not block.


What Each Cell Verifies

Each cell runs four sequential steps. All four must pass for the cell to be PASS.

Step 1 — Go unit tests

Runs the full telemetry/ and lock/ module test suites (GOWORK=off go test ./...) on the target architecture. This proves that WAL invariants (R1–R5, S1–S12) hold natively on each arch.

Step 2 — Randomized crash harness

Runs TestCrashHarness_Randomized with a captured PRNG seed (default 20 iterations). Each iteration:

  1. Writes N committed entries (N ∈ [1, 10], seeded) — each Append writes the WAL frame + telemetry.safe_seq.

  2. Injects a phantom frame directly into the WAL file (bypassing writeSafeSeq) to simulate the S3 crash window.

  3. Reopens the WAL (recovery path).

  4. Asserts:

    • ReadFrom(0) returns exactly N entries (phantom discarded — R2/R5)

    • Seq values are 1..N in strict monotonic order (R1/R3)

    • On-disk safe_seq equals N after recovery (S1)

The seed is logged in cmd.txt so any failing run can be reproduced exactly.

Step 3 — Known-good WAL fixture verify

Writes a 5-entry WAL via TestKnownGoodWALFixture, then runs wal_verify.py which checks all six invariants (R1–R5 + no JSON corruption) and emits wal_verify.json.

Step 4 — Atomic rename check

Writes a sentinel value to target.tmp, calls mv target.tmp target, and verifies the target file contains the sentinel. This validates POSIX atomic rename semantics (os.Rename) on the target filesystem, which is the foundation of the telemetry.safe_seq update path (§3.5).


Evidence Directory Structure

Each cell produces a run directory:

evidence/
└── <arch>/
    └── <fs>/
        └── run-<N>/
            ├── cmd.txt          ← exact command (reproducible)
            ├── dmesg.txt        ← kernel ring buffer before+after
            ├── autonomy.log     ← combined stdout+stderr from Go tests
            └── wal_verify.json  ← structured invariant report

wal_verify.json schema

{
  "timestamp":        "2026-03-03T00:00:00Z",
  "seed":             1741910400000000000,
  "arch":             "amd64",
  "fs_type":          "ext4",
  "total_iterations": 20,
  "passed":           20,
  "failed":           0,
  "pass":             true,
  "rename_atomic":    "PASS",
  "run_n":            1,
  "invariants": {
    "R1_FIRST_SEQ_IS_ONE":     { "pass": true, "msg": "ok" },
    "R2_SEQ_CONTIGUOUS":       { "pass": true, "msg": "ok" },
    "R3_SAFESEQ_LE_MAXSEQ":    { "pass": true, "msg": "ok" },
    "R4_SAFESEQ_FRAME_EXISTS": { "pass": true, "msg": "ok" },
    "R5_NO_PHANTOMS":          { "pass": true, "msg": "ok" },
    "NO_JSON_CORRUPTION":      { "pass": true, "msg": "ok" }
  },
  "violations": [],
  "iterations": [ ... ]
}

Running Locally

Fastest path — native arch + ext4 (no QEMU, no root)

make portability-matrix

Runs all four steps on your current arch against the default tmpdir filesystem. Evidence is written to evidence/<arch>/ext4/run-<N>/.

Randomized crash harness only

# Auto-seed (seed printed in output and saved to cmd.txt)
make portability-crash-harness

# Fixed seed — reproduce a specific run
make portability-crash-harness SEED=1741910400000000000

# More iterations
make portability-crash-harness ITERATIONS=100

Full 6-cell matrix (requires Docker + QEMU)

# Register QEMU binfmt handlers (one-time, requires root/Docker)
docker run --privileged --rm tonistiigi/binfmt --install arm64,riscv64

# Run all 6 cells
make portability-matrix-full

XFS cell (requires root on Linux)

sudo apt-get install xfsprogs     # Debian/Ubuntu
sudo make portability-matrix-full MATRIX_FS="ext4 xfs"

Standalone WAL verifier

# Verify any WAL directory directly
python3 scripts/portability/wal_verify.py \
    --dir /path/to/wal \
    --arch amd64 --fs ext4 \
    --out /tmp/wal_verify.json

Reproduce a failing run from its seed

Every wal_verify.json contains the seed. To reproduce:

# From the evidence file:
SEED=$(python3 -c "import json; print(json.load(open('evidence/amd64/ext4/run-1/wal_verify.json'))['seed'])")
make portability-crash-harness SEED=$SEED ITERATIONS=20

CI Invocation

Nightly (02:00 UTC)

The portability-nightly.yml workflow runs all configured cells automatically. Evidence is uploaded as GitHub Actions artifacts (portability-evidence-<arch>-<fs>) with a 30-day retention window.

Workflow status badge:

![Portability Matrix](https://github.com/autonomyops/adk/actions/workflows/portability-nightly.yml/badge.svg)

Release Gate

When a v* tag is pushed, the same workflow runs in strict mode: the release-gate job aggregates all cell results and exits 1 if any non-skip cell has "pass": false in its wal_verify.json. This blocks the GitHub Release from being created.

Evidence for release runs is retained for 90 days and included in the release artifact bundle (build_docs_and_evidence.sh).

Manual trigger

Any run can be triggered manually from the Actions tab with optional overrides:

Input

Description

Default

iterations

Crash harness iterations per cell

20

seed

Fixed PRNG seed (empty = auto-generate)

""

arches

Space-separated arch list

"amd64 arm64 riscv64"


Interpreting Results

pass

rename_atomic

Interpretation

true

PASS

All invariants hold; cell certifies

false

PASS

WAL invariant violation detected; see violations[]

false

FAIL

Atomic rename not working on this FS; fundamental portability issue

null

Cell was skipped (QEMU not available or FS not accessible)

A violations entry always includes a code field:

Code

Invariant

Meaning

FIRST_SEQ_NOT_ONE

R1

First WAL frame Seq ≠ 1

SEQ_GAP

R2

Non-contiguous Seq in committed range

SAFESEQ_GT_MAXSEQ

R3

safe_seq > maxSeq (reorder bug)

SAFESEQ_NOT_FOUND

R4

Frame at safe_seq not found in WAL

PHANTOM_ENTRIES

R5

Entries with Seq > safe_seq survived recovery

CORRUPT_JSON

Structurally valid frame with unparseable JSON


Governance

  • Phase 1 gate: cells 1 (amd64/ext4) and 3 (arm64/ext4) must both be PASS for a release tag to proceed.

  • Phase 2 hardening (deferred): cells 2 (amd64/xfs), 4 (arm64/xfs), and 5 (riscv64/ext4) are added to the release gate.

  • Platform waiver process: a failing cell may be waived by creating an entry in edge/waivers/ following the existing waiver format (INV-series).


Files

File

Purpose

scripts/portability/lib_portability.sh

Shared bash utilities (arch/FS detection, evidence capture)

scripts/portability/wal_verify.py

Standalone WAL verifier → wal_verify.json

scripts/portability/crash_harness.sh

Crash harness wrapper with evidence collection

scripts/portability/core_matrix.sh

6-cell matrix driver

telemetry/crash_harness_test.go

TestCrashHarness_Randomized + TestKnownGoodWALFixture

.github/workflows/portability-nightly.yml

Nightly + release gate CI

evidence/

Evidence output (gitignored except .gitkeep + run-1 example)