Offline-first Runbook: Buffer Then Drain¶
What you’re proving¶
Events are durably buffered when the collector/send path is down.
Drain retries occur, then buffered entries replay after reconnect.
autonomy telemetry drain advances the consumer cursor (telemetry.pos) only after successful delivery.
Offline sender failure does not delete buffered entries.
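The properties above can be illustrated with a minimal in-memory sketch. This is not the code in telemetry/wal.go or telemetry/exporter.go; the names Event, Sender, Buffer, and Drain are hypothetical, and a slice stands in for the durable log. It only shows the shape of the invariant: the cursor moves after a successful send, and a failed send leaves every buffered entry in place.

```go
package main

import (
	"errors"
	"fmt"
)

// Event is a hypothetical telemetry record; ID is its stable identifier.
type Event struct {
	ID      string
	Payload string
}

// Sender delivers one event and returns an error while the collector is unreachable.
type Sender func(Event) error

// Buffer is an in-memory stand-in for the durable log (assumption: the real
// buffer persists to disk). Pos plays the role of the consumer cursor.
type Buffer struct {
	Entries []Event
	Pos     int
}

// Append adds an event to the end of the buffer.
func (b *Buffer) Append(e Event) { b.Entries = append(b.Entries, e) }

// Drain sends entries starting at Pos and advances Pos only after each
// successful send. On the first failure it stops and returns the error;
// nothing is deleted, so a later Drain resumes from the same entry.
func (b *Buffer) Drain(send Sender) (int, error) {
	delivered := 0
	for b.Pos < len(b.Entries) {
		if err := send(b.Entries[b.Pos]); err != nil {
			return delivered, err
		}
		b.Pos++
		delivered++
	}
	return delivered, nil
}

var errOffline = errors.New("collector unreachable")

func main() {
	buf := &Buffer{}
	buf.Append(Event{ID: "e1", Payload: "a"})
	buf.Append(Event{ID: "e2", Payload: "b"})

	// Collector down: the drain fails, the cursor stays put, entries remain.
	n, err := buf.Drain(func(Event) error { return errOffline })
	fmt.Println("offline:", n, err, "pos =", buf.Pos, "buffered =", len(buf.Entries))

	// Collector back: the same entries replay in order from the cursor.
	var got []string
	n, err = buf.Drain(func(e Event) error { got = append(got, e.ID); return nil })
	fmt.Println("online:", n, err, "pos =", buf.Pos, "replayed =", got)
}
```

A real implementation would also persist the cursor and retry with backoff; both are left out here to keep the invariant visible.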
Prereqs¶
Repo root: <repo-root>
Go toolchain available
Steps¶
Run telemetry offline-first tests.
GOCACHE=/tmp/go-build go test ./telemetry \
-run 'TestWALSurvivesCollectorDown|TestWALReplayAfterReconnect|TestDrainWithOfflineSender' -v
Inspect captured output.
sed -n '1,180p' docs/_generated/test-outputs/offline-drain-output.txt
Expected outputs (from real run)¶
=== RUN TestWALSurvivesCollectorDown
--- PASS: TestWALSurvivesCollectorDown
=== RUN TestWALReplayAfterReconnect
WARN telemetry/exporter: drain failed, will retry ...
--- PASS: TestWALReplayAfterReconnect
=== RUN TestDrainWithOfflineSender
--- PASS: TestDrainWithOfflineSender
PASS
Verification¶
Exit code is 0.
Retry warnings appear during replay test.
All three tests pass.
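If you prefer to check the captured output mechanically rather than by eye, a small helper along these lines could work. It is a sketch, not part of the repository: missingPasses and hasRetryWarning are hypothetical names, and the sample string is abbreviated from the expected output above. In practice you would read docs/_generated/test-outputs/offline-drain-output.txt with os.ReadFile instead. It deliberately checks pass/fail status and the presence of a retry warning, not how many retries occurred.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// wantTests lists the three test names from the runbook's go test command.
var wantTests = []string{
	"TestWALSurvivesCollectorDown",
	"TestWALReplayAfterReconnect",
	"TestDrainWithOfflineSender",
}

// missingPasses returns the tests that have no "--- PASS: <name>" line in output.
func missingPasses(output string, tests []string) []string {
	passed := make(map[string]bool)
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, "--- PASS: ") {
			continue
		}
		// The first field after the prefix is the test name; a duration may follow.
		fields := strings.Fields(strings.TrimPrefix(line, "--- PASS: "))
		if len(fields) > 0 {
			passed[fields[0]] = true
		}
	}
	var missing []string
	for _, name := range tests {
		if !passed[name] {
			missing = append(missing, name)
		}
	}
	return missing
}

// hasRetryWarning reports whether the drain retry warning appears at least once.
func hasRetryWarning(output string) bool {
	return strings.Contains(output, "drain failed, will retry")
}

func main() {
	// Sample input abbreviated from the expected output in this runbook.
	sample := strings.Join([]string{
		"=== RUN TestWALSurvivesCollectorDown",
		"--- PASS: TestWALSurvivesCollectorDown",
		"=== RUN TestWALReplayAfterReconnect",
		"WARN telemetry/exporter: drain failed, will retry ...",
		"--- PASS: TestWALReplayAfterReconnect",
		"=== RUN TestDrainWithOfflineSender",
		"--- PASS: TestDrainWithOfflineSender",
		"PASS",
	}, "\n")

	fmt.Println("missing:", missingPasses(sample, wantTests))
	fmt.Println("retry warning seen:", hasRetryWarning(sample))
}
```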
Failure modes¶
Go cache permission errors: use GOCACHE=/tmp/go-build.
Slow environments can increase retry timing noise; verify by pass/fail status, not exact backoff count.
Non-goals¶
This runbook does not prove control-plane push or orchestration.
This does not claim global exactly-once semantics across fleets; delivery is at-least-once and receivers must deduplicate by stable event identifiers.
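Because delivery is at-least-once, a replay after reconnect can hand the receiver an event it has already processed. The sketch below shows one way a receiver might deduplicate by a stable event identifier. The Receiver type, its Accept method, and the Event fields are hypothetical and not taken from this repository; the unbounded in-memory set is a simplification that a real receiver would likely replace with persistent or windowed state.

```go
package main

import "fmt"

// Event is a hypothetical telemetry record keyed by a stable identifier.
type Event struct {
	ID      string
	Payload string
}

// Receiver applies each event at most once by remembering the IDs it has seen.
type Receiver struct {
	seen    map[string]struct{}
	Applied []Event
}

// NewReceiver returns a Receiver with an empty seen-set.
func NewReceiver() *Receiver {
	return &Receiver{seen: make(map[string]struct{})}
}

// Accept applies e and returns true the first time its ID is seen.
// Redelivered events with a known ID are ignored and return false.
func (r *Receiver) Accept(e Event) bool {
	if _, dup := r.seen[e.ID]; dup {
		return false
	}
	r.seen[e.ID] = struct{}{}
	r.Applied = append(r.Applied, e)
	return true
}

func main() {
	r := NewReceiver()
	// "e1" arrives twice, as it might when a drain is retried after a
	// delivery whose acknowledgement was lost.
	for _, e := range []Event{{ID: "e1"}, {ID: "e2"}, {ID: "e1"}} {
		fmt.Println(e.ID, "applied:", r.Accept(e))
	}
	fmt.Println("total applied:", len(r.Applied))
}
```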
Evidence¶
telemetry/wal.go
telemetry/exporter.go
telemetry/buffer_test.go (TestWALSurvivesCollectorDown, TestWALReplayAfterReconnect)
telemetry/integration_test.go (TestDrainWithOfflineSender)
docs/_generated/test-outputs/offline-drain-output.txt