VAL 05 — OTel Integration Validation
1. Purpose and Claims
This validation proves that the autonomy platform’s observability subsystem works end-to-end across its two complementary paths:
- Prometheus metrics path: the control-plane exposes a /metrics endpoint populated with real observations after lab traffic flows through the rollout and cert API.
- Telemetry WAL / OTLP pipeline: the offline-first WAL-based event pipeline correctly persists events, exports them as JSONL, and delivers them to a live OTLP HTTP receiver with correlation IDs intact.
| # | Claim |
|---|---|
| VAL05-C1 | The control-plane Prometheus … |
| VAL05-C2 | The telemetry WAL pipeline accepts events emitted by the adapter runtime API and persists them durably to the local WAL |
| VAL05-C3 | Events exported from the WAL via … |
| VAL05-C4 | Correlation IDs ( … |
Architecture context:
This codebase implements a custom, offline-first OTLP pipeline rather than
using the official go.opentelemetry.io/otel SDK. Key points for interpreting
this validation:
- Metrics: implemented with github.com/prometheus/client_golang (not the OTel metrics SDK). Metrics are exposed on 127.0.0.1:19090/metrics.
- Traces: no OTel TracerProvider or auto-instrumented spans. Correlation IDs (trace_id/span_id) are manually attached to telemetry.Event structs at emit time, then serialized as traceId/spanId in the OTLP log record JSON.
- WAL pipeline: adapter/edge components emit events to a local append-only WAL. The telemetry flush CLI command drains the WAL to an OTLP HTTP endpoint. VAL05 uses telemetry_emit_helper (a small lab binary) to pre-populate a test WAL with known events, because no CLI command exists to emit arbitrary adapter-side events in a lab context.
2. Scope
Covered
- Prometheus /metrics endpoint reachability and HTTP response code
- Presence of expected control-plane metric family names: cp_http_requests_total, cp_http_request_duration_seconds, cp_rollout_plans_total, cp_events_ingested_total
- Non-zero observations for cp_http_requests_total, cp_http_request_duration_seconds_count, cp_rollout_plans_total, and cp_events_ingested_total after the slice exercises both the rollout API and a real POST /v1/events ingest
- WAL durability: events written via telemetry.NewEmitter survive and are readable via telemetry status and telemetry export
- telemetry export --out JSONL format: mandatory fields present in output
- telemetry flush delivery to a live OTLP HTTP sink (autonomy telemetry sink)
- trace_id and span_id fields preserved in JSONL export output
- traceId and spanId fields present in OTLP log records delivered to the live sink
Not covered (known gaps)
- OTel Go SDK traces: no TracerProvider, Tracer, or auto-instrumented spans are present. End-to-end distributed tracing (W3C traceparent → Jaeger) is not implemented.
- OTel metrics SDK: Prometheus is used directly. No MeterProvider or OTel metric instruments. The Prometheus /metrics output is not OTLP-formatted.
- Automatic trace context extraction: trace_id/span_id are not extracted from inbound HTTP traceparent headers. Injection into the WAL is manual at emit time.
- slog structured log integration: trace_id/span_id are not injected into the Go slog structured log output. VAL05 validates only the WAL event / OTLP path.
- Edge Prometheus metrics: edge/metrics/prometheus.go defines 30+ edge metrics but the edge process is not started by run_cli_audit_lab.sh; edge metrics are out of scope for this validation.
- OTLP gRPC path: only OTLP/HTTP is validated; the demo collector config supports gRPC but no CLI command exercises it.
- OTel Collector pipeline: the full demo stack (demo/docker-compose.yml + demo/otel/collector.yaml) with Jaeger is not started by this lab. The local autonomy telemetry sink is used instead.
3. Implementation Notes
telemetry_emit_helper binary
scripts/labs/telemetry_emit_helper.go is a minimal Go program compiled by
the lab runner. It:
- Accepts --dir <path> pointing to an isolated temp WAL directory
- Creates a telemetry.WAL + telemetry.Emitter against that directory
- Emits 3 events:
  - autonomy.decision with trace_id=4bf92f3577b34da6a3ce929d0e0e4736 and span_id=00f067aa0ba902b7
  - autonomy.action without trace context
  - autonomy.error with trace_id only (tests partial correlation)
- Exits 0 and prints emitted 3 events to <dir>
The helper uses an isolated WAL directory to avoid polluting the runtime WAL
at XDG_CACHE_HOME/autonomyops/telemetry.
OTLP sink port
The sink runs on 127.0.0.1:14318 (not the default 4318) to avoid conflicting
with any existing collector processes. The autonomy telemetry sink command
accepts a --listen flag for this purpose.
4. Harness
VAL05 is implemented as run_otel_val05_lab() in
scripts/labs/run_cli_audit_lab.sh. It runs after run_audit_completeness_val04_lab.
Dependencies:
- Control-plane started by run_rollout_lab at 127.0.0.1:18888 with metrics on 127.0.0.1:19090 (used by VAL05-01/02/03)
- telemetry_emit_helper binary built at script start alongside autonomy and orchestrator_ha_server
- autonomy telemetry sink, telemetry flush, telemetry export, telemetry status CLI subcommands
Evidence directory: $EVIDENCE_DIR/val05/
5. Exact Scenarios
VAL05-01 — Prometheus Endpoint HTTP 200
Purpose: Confirm the control-plane Prometheus endpoint is reachable and returns a valid metrics response.
Action:
curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1:19090/metrics
Evidence file: val05/val05-prometheus-status.txt
Pass criterion: http_code=200.
VAL05-02 — Required Metric Families Present
Purpose: Confirm the Prometheus output contains all expected control-plane metric family declarations.
Action: Capture full /metrics output; grep for each of:
- cp_http_requests_total
- cp_http_request_duration_seconds
- cp_rollout_plans_total
- cp_events_ingested_total
Evidence files:
- val05/val05-prometheus-raw.txt: raw /metrics output
- val05/val05-prometheus-families.txt: PRESENT/ABSENT per metric family
Pass criterion: All 4 metric families are PRESENT.
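The family check can be pictured as a small loop over the saved /metrics capture. The following is a minimal sketch under stated assumptions, not the harness's own code: check_metric_families is a hypothetical name, and matching on the "# TYPE" declaration line is an assumption that is slightly stricter than a plain grep for the family name.

```bash
# Sketch only: check_metric_families is a hypothetical helper name.
# Prints "<family>: PRESENT" or "<family>: ABSENT" for each expected family
# and returns non-zero if any family is missing.
check_metric_families() {
  local raw_file="$1"
  local missing=0
  local family
  for family in \
    cp_http_requests_total \
    cp_http_request_duration_seconds \
    cp_rollout_plans_total \
    cp_events_ingested_total
  do
    # Assumption: look for the "# TYPE <family> <kind>" declaration line.
    # The trailing space keeps cp_http_request_duration_seconds from being
    # satisfied by a longer name that merely shares the prefix.
    if grep -q "^# TYPE ${family} " "$raw_file"; then
      echo "${family}: PRESENT"
    else
      echo "${family}: ABSENT"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

A call such as check_metric_families val05-prometheus-raw.txt > val05-prometheus-families.txt would then produce one line per family, with the exit status carrying the overall result.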
VAL05-03 — Metric Observations Non-Zero
Purpose: Prove that real lab traffic produced actual observations, so the metrics are not merely registered and left unused.
Action: The slice first performs a real POST /v1/events against the live
control-plane so cp_events_ingested_total is exercised by VAL05 itself. From
the raw Prometheus output, extract sample lines for:
- cp_http_requests_total
- cp_http_request_duration_seconds_count
- cp_rollout_plans_total
- cp_events_ingested_total
Verify each value is non-zero.
Evidence file: val05/val05-prometheus-observations.txt
Pass criterion: All 4 sample lines exist and end with a non-zero value.
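One way to express the non-zero test is to parse each sample line of the Prometheus text format and compare its last field against zero. This is a hedged sketch: metric_has_nonzero_sample and check_observations are hypothetical names, the NONZERO / ZERO_OR_MISSING labels are invented for illustration, and treating a metric as exercised when any one of its label sets is non-zero is an assumption about what the harness accepts.

```bash
# Sketch only: both function names are hypothetical.
# Returns 0 if at least one sample line for the metric has a non-zero value.
metric_has_nonzero_sample() {
  local raw_file="$1" metric="$2"
  awk -v m="$metric" '
    /^#/ { next }                 # skip HELP and TYPE comment lines
    {
      name = $1
      sub(/\{.*/, "", name)       # drop the label set, if present
      # The sample value is read from the last field of the line.
      if (name == m && ($NF + 0) != 0) found = 1
    }
    END { exit(found ? 0 : 1) }
  ' "$raw_file"
}

# Runs the check for the four samples named above.
check_observations() {
  local raw_file="$1" metric failed=0
  for metric in \
    cp_http_requests_total \
    cp_http_request_duration_seconds_count \
    cp_rollout_plans_total \
    cp_events_ingested_total
  do
    if metric_has_nonzero_sample "$raw_file" "$metric"; then
      echo "${metric}: NONZERO"
    else
      echo "${metric}: ZERO_OR_MISSING"
      failed=1
    fi
  done
  [ "$failed" -eq 0 ]
}
```

Comparing the metric name exactly, rather than by prefix, keeps cp_http_request_duration_seconds_count from being confused with the _sum or _bucket series of the same histogram.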
VAL05-04 — WAL Populated by Emit Helper
Purpose: Confirm the telemetry.Emitter → WAL write path works: events
submitted via the adapter API are persisted durably.
Action:
telemetry_emit_helper --dir <isolated-wal-dir>
autonomy telemetry status --dir <isolated-wal-dir> --json
Evidence files:
- val05/val05-emit-helper.txt: helper stdout (emitted 3 events to <dir>)
- val05/val05-wal-status.json: {"total":3,"exported":0,"pending":3}
- val05/val05-wal-inventory.txt: WAL dir, file count, total events, pass flag
Pass criterion: wal_total_events > 0 (status JSON total > 0).
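The pass criterion reduces to reading the total counter out of the status JSON. A minimal sketch, assuming the flat single-object shape shown in the evidence list above; json_int_field and wal_populated are hypothetical names, and sed is used only to avoid a dependency (jq '.total' would be the more robust choice where jq is available).

```bash
# Sketch only: json_int_field and wal_populated are hypothetical helpers.
# Prints the integer value of KEY from a flat JSON object in FILE.
# Assumption: the key appears once and its value is a bare integer.
json_int_field() {
  sed -n "s/.*\"$2\"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p" "$1" | head -n 1
}

# Prints wal_total_events=<n> and succeeds only when n > 0.
wal_populated() {
  local total
  total=$(json_int_field "$1" total)
  total=${total:-0}               # a missing field is treated as an empty WAL
  echo "wal_total_events=${total}"
  [ "$total" -gt 0 ]
}
```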
VAL05-05 — telemetry export Produces Non-Empty JSONL
Purpose: Confirm the WAL → JSONL export path works for downstream pipeline consumers that do not use OTLP.
Action:
autonomy telemetry export --dir <isolated-wal-dir> --out val05-export.jsonl
Evidence files:
- val05/val05-export.jsonl: JSONL output (3 events, one per line)
- val05/val05-export-stdout.txt: CLI stdout (wrote 3 events to …)
- val05/val05-export-summary.txt: export_lines, pass flag
Pass criterion: val05-export.jsonl contains at least 1 line (≥ 1 event).
VAL05-06 — JSONL Fields Present
Purpose: Verify the JSONL encoding contains all mandatory fields for downstream consumers.
Mandatory fields: "kind", "ts", "seq", "written_at", "attrs"
Action: For each field name, grep val05-export.jsonl.
Evidence file: val05/val05-export-fields.txt
Pass criterion: All 5 mandatory fields are PRESENT.
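The field check and the line count from VAL05-05 can be combined in one pass over the export. The sketch below is illustrative: check_export_fields is a hypothetical name, and a quoted-name grep is a coarse test, since it would also match a string value that happens to equal a field name.

```bash
# Sketch only: check_export_fields is a hypothetical helper name.
# Prints PRESENT/ABSENT per mandatory field plus export_lines=<n>, and
# returns non-zero if any field never appears in the export.
check_export_fields() {
  local export_file="$1" field missing=0
  for field in kind ts seq written_at attrs; do
    # Match the quoted key so that "ts" is not satisfied by "attrs".
    if grep -q "\"${field}\"" "$export_file"; then
      echo "${field}: PRESENT"
    else
      echo "${field}: ABSENT"
      missing=$((missing + 1))
    fi
  done
  # grep -c . counts non-empty lines, i.e. exported entries.
  echo "export_lines=$(grep -c . "$export_file")"
  [ "$missing" -eq 0 ]
}
```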
VAL05-07 — telemetry flush Delivers to Live OTLP Sink
Purpose: Prove the end-to-end OTLP/HTTP delivery path: WAL → telemetry flush
→ OTLP HTTP POST → autonomy telemetry sink.
Action:
# Start sink in background and record its PID
autonomy telemetry sink --listen 127.0.0.1:14318 > val05-sink-output.txt &
sink_pid=$!
# Flush WAL to sink
autonomy telemetry flush --dir <isolated-wal-dir> --endpoint http://127.0.0.1:14318
# Kill sink after flush completes
kill "$sink_pid"
Evidence files:
- val05/val05-sink-output.txt: OTLP payloads received by the sink (pretty-printed JSON)
- val05/val05-flush-stdout.txt: CLI stdout (telemetry flush: OK — 3 events sent to …)
- val05/val05-flush-summary.txt: flush_ok, sink_lines, sink_payloads, pass flag
Pass criterion: flush_ok=true AND sink_payloads > 0 (the sink printed
at least one received N log records payload line, not just its startup banner).
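The two-part criterion could be evaluated roughly as follows. This is a sketch under assumptions: flush_delivered is a hypothetical name, and the exact wording of the sink's payload line is taken from the "received N log records" description above rather than from the sink's source.

```bash
# Sketch only: flush_delivered is a hypothetical helper name.
# Arguments: exit status of "telemetry flush", path to the sink output file.
# Prints flush_ok and sink_payloads, and succeeds only if both conditions hold.
flush_delivered() {
  local flush_rc="$1" sink_file="$2" payloads
  # grep -c prints 0 and exits 1 when nothing matches, so mask its status.
  # Assumed line shape: "received <N> log records".
  payloads=$(grep -c 'received [0-9][0-9]* log records' "$sink_file" 2>/dev/null || true)
  payloads=${payloads:-0}         # unreadable file counts as zero payloads
  if [ "$flush_rc" -eq 0 ]; then
    echo "flush_ok=true"
  else
    echo "flush_ok=false"
  fi
  echo "sink_payloads=${payloads}"
  [ "$flush_rc" -eq 0 ] && [ "$payloads" -gt 0 ]
}
```

Counting payload lines instead of total output lines is what keeps a sink that only printed its startup banner from passing.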
VAL05-08 — trace_id and span_id Propagated in JSONL Export
Purpose: Prove that trace_id and span_id set on a telemetry.Event at
emit time are preserved through the WAL → JSONL export path.
Action: Grep val05-export.jsonl for the known trace ID:
grep '"trace_id":"4bf92f3577b34da6a3ce929d0e0e4736"' val05-export.jsonl
grep '"span_id":"00f067aa0ba902b7"' val05-export.jsonl
Evidence file: val05/val05-traceid-jsonl.txt
Pass criterion: both trace_id_found=4bf92f3577b34da6a3ce929d0e0e4736 and
span_id_found=00f067aa0ba902b7 are reported.
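The two greps above pass independently, so they would also succeed if the trace ID and span ID had ended up on different entries. A stricter variant, offered only as a sketch, counts entries that carry both IDs on the same line. correlated_entry_count is a hypothetical name, and the pattern assumes compact JSON with no whitespace after the colon, as the greps above do.

```bash
# Sketch only: correlated_entry_count is a hypothetical helper name.
# Prints the number of JSONL entries that contain BOTH known IDs.
# Assumption: one entry per line, compact "key":"value" encoding.
correlated_entry_count() {
  local export_file="$1"
  local trace_id="${2:-4bf92f3577b34da6a3ce929d0e0e4736}"
  local span_id="${3:-00f067aa0ba902b7}"
  # First grep keeps lines with the trace ID; second counts those that
  # also carry the span ID. "|| true" masks grep's exit 1 on a zero count.
  grep "\"trace_id\":\"${trace_id}\"" "$export_file" |
    grep -c "\"span_id\":\"${span_id}\"" || true
}
```

With the helper's three events, a count of at least 1 would be expected, since only autonomy.decision is emitted with both IDs.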
VAL05-09 — traceId and spanId Propagated in OTLP Sink Output
Purpose: Prove the OTLP encoding translates trace_id / span_id from the
WAL entry into the traceId / spanId fields expected by OTLP consumers
(Jaeger, Grafana Tempo, etc.).
Action: Grep val05-sink-output.txt for the specific known traceId and
spanId values from the helper event:
grep -qi '"traceId"[[:space:]]*:[[:space:]]*"4bf92f3577b34da6a3ce929d0e0e4736"' val05-sink-output.txt
grep -qi '"spanId"[[:space:]]*:[[:space:]]*"00f067aa0ba902b7"' val05-sink-output.txt
Evidence file: val05/val05-traceid-otlp.txt
Pass criterion: both traceId_found=true and spanId_found=true.
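The case-insensitive grep reflects the OTLP JSON encoding, in which traceId and spanId are hex strings whose case is not significant: 32 hex characters for a trace ID, 16 for a span ID, and all-zero values are invalid. Beyond matching the known values, the shape of every ID in the sink output could be checked as in this sketch. All three function names are hypothetical, and the extraction assumes each key and its value sit on one line of the pretty-printed output.

```bash
# Sketch only: all three helper names are hypothetical.
# Prints "traceId <value>" or "spanId <value>" for each occurrence in FILE.
extract_otlp_ids() {
  grep -Eo '"(traceId|spanId)"[[:space:]]*:[[:space:]]*"[^"]*"' "$1" |
    sed -E 's/^"([A-Za-z]+)"[[:space:]]*:[[:space:]]*"([^"]*)"$/\1 \2/'
}

# Succeeds if VALUE is well-formed hex of the right length and not all zeros.
valid_otlp_id() {
  local kind="$1" value="$2" len
  case "$kind" in
    traceId) len=32 ;;            # 16 bytes, hex-encoded
    spanId)  len=16 ;;            # 8 bytes, hex-encoded
    *) return 1 ;;
  esac
  printf '%s\n' "$value" | grep -Eq "^[0-9a-fA-F]{${len}}\$" &&
    printf '%s\n' "$value" | grep -q '[1-9a-fA-F]'
}

# Succeeds if FILE contains at least one ID and every ID is well-formed.
check_sink_ids() {
  local ids bad
  ids=$(extract_otlp_ids "$1")
  [ -n "$ids" ] || return 1
  bad=$(printf '%s\n' "$ids" | while read -r kind value; do
    valid_otlp_id "$kind" "$value" || echo "invalid ${kind}: ${value}"
  done)
  [ -z "$bad" ] || { echo "$bad"; return 1; }
}
```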
6. Evidence Files
All files are written to $EVIDENCE_DIR/val05/.
| File | Produced by | Contains |
|---|---|---|
|  |  |  |
|  |  | Full Prometheus text exposition |
|  | family check loop | PRESENT/ABSENT per metric family |
|  |  | Ingest response used to exercise … |
|  | sample grep | Sample lines for non-zero check |
|  |  |  |
|  |  |  |
|  | WAL check |  |
|  |  | 3-event JSONL (one entry per line) |
|  |  |  |
|  | line count |  |
|  | field check loop | PRESENT/ABSENT per mandatory field |
|  |  | OTLP payloads as pretty-printed JSON |
|  |  |  |
|  | flush check |  |
|  | JSONL trace_id/span_id grep |  |
|  | OTLP traceId/spanId grep |  |
|  | composite report | 9-check PASS/FAIL + summary line |
|  | composite report | Machine-readable JSON with all check statuses |
7. Pass/Fail Criteria
| Check ID | Name | File | Pass condition |
|---|---|---|---|
| VAL05-01 | prometheus_endpoint |  |  |
| VAL05-02 | metric_families_present |  | All 4 families PRESENT |
| VAL05-03 | metric_observations |  | All 4 exercised samples non-zero |
| VAL05-04 | wal_populated |  |  |
| VAL05-05 | export_jsonl_nonempty |  |  |
| VAL05-06 | export_jsonl_fields |  | All 5 fields PRESENT |
| VAL05-07 | otlp_flush_sink |  |  |
| VAL05-08 | trace_id_jsonl |  | Known trace ID and span ID found in JSONL |
| VAL05-09 | traceid_otlp |  |  |
Overall pass: all 9 checks pass and val05-report.txt reports pass=9 fail=0 total=9.
Failure handling:
- VAL05-01 fails: the control-plane from run_rollout_lab may have exited; check $WORK_DIR/ for the orchestrator process log
- VAL05-02 fails (ABSENT family): a code change removed a metric registration; cross-reference orchestrator/metrics.go NewCPMetrics() for the registration
- VAL05-03 fails (zero observations): either the rollout lab did not exercise the expected API path, or the explicit POST /v1/events ingest in VAL05 did not succeed; check val05-events-ingest.json and the control-plane logs
- VAL05-04 fails: telemetry_emit_helper may have failed (check val05-emit-helper.txt) or the WAL open failed (the new AUTONOMYOPS_WAL_LEGACY_UPGRADE env var may be needed if the WAL directory is reused from a prior run)
- VAL05-05 fails: if VAL05-04 also failed the WAL is empty; otherwise the telemetry export command itself failed, so check val05-export-stdout.txt
- VAL05-06 fails (ABSENT field): the WAL Entry JSON encoding changed; check telemetry/wal.go Entry struct tags
- VAL05-07 fails: either flush exited non-zero (check val05-flush-stdout.txt) or the sink never printed a received N log records payload line; a startup banner alone is not sufficient proof of delivery
- VAL05-08 fails: either the JSONL Entry no longer preserves trace_id/span_id under the nested "event" object, or the helper event set changed
- VAL05-09 fails: the sink may have received payloads, but the OTLP encoding no longer carried the expected traceId/spanId values; inspect val05-sink-output.txt
8. Report Template
# VAL 05 — OTel Integration Validation Report
timestamp: 2026-03-20T10:00:00Z
## Results
VAL05-01 prometheus_endpoint: PASS
VAL05-02 metric_families_present: PASS
VAL05-03 metric_observations: PASS
VAL05-04 wal_populated: PASS
VAL05-05 export_jsonl_nonempty: PASS
VAL05-06 export_jsonl_fields: PASS
VAL05-07 otlp_flush_sink: PASS
VAL05-08 trace_id_jsonl: PASS
VAL05-09 traceid_otlp: PASS
## Summary
pass=9 fail=0 total=9
The runner also prints VAL 05: pass=9 fail=0 total=9 (report: val05-report.txt) to
stdout so CI log scanners can grep for VAL 05: pass=.
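A CI step that wants more than a grep could parse the summary line and gate on it. The following is a minimal sketch, assuming the summary line has exactly the pass=… fail=… total=… shape shown in the template; val05_passed is a hypothetical name and the exit code 2 for a missing summary is an invented convention.

```bash
# Sketch only: val05_passed is a hypothetical helper name.
# Returns 0 if the report's summary shows fail=0 and pass == total,
# 1 if any check failed, and 2 if no summary line is found.
val05_passed() {
  local report="$1" line pass fail total
  line=$(grep -E '^pass=[0-9]+ fail=[0-9]+ total=[0-9]+$' "$report" | tail -n 1)
  [ -n "$line" ] || return 2
  # Peel the three counters out of "pass=P fail=F total=T".
  pass=${line#pass=};   pass=${pass%% *}
  fail=${line#*fail=};  fail=${fail%% *}
  total=${line##*total=}
  [ "$fail" -eq 0 ] && [ "$pass" -eq "$total" ]
}
```

Checking pass against total as well as fail=0 guards against a report in which some checks were skipped without being counted as failures.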
9. How to Run
VAL05 executes automatically as the last validation slice when the full lab is run:
export GOROOT=/home/ubuntu/.local/go1.25.7
export PATH="$GOROOT/bin:$PATH"
export GOTOOLCHAIN=local
bash scripts/labs/run_cli_audit_lab.sh
To inspect results after a run:
# Quick pass/fail
cat evidence/pr17-cli-audit-local-2026-03-17/val05/val05-report.txt
# Prometheus metric families check
cat evidence/pr17-cli-audit-local-2026-03-17/val05/val05-prometheus-families.txt
# WAL status
cat evidence/pr17-cli-audit-local-2026-03-17/val05/val05-wal-status.json
# Verify trace_id in JSONL
grep '"trace_id"' evidence/pr17-cli-audit-local-2026-03-17/val05/val05-export.jsonl | head -3
# Inspect OTLP sink payload (first OTLP log record)
grep -A5 '"traceId"' evidence/pr17-cli-audit-local-2026-03-17/val05/val05-sink-output.txt | head -20
# Machine-readable report
jq '{pass_count, fail_count, checks}' \
evidence/pr17-cli-audit-local-2026-03-17/val05/val05-report.json