VAL 05 — OTel Integration Validation

1. Purpose and Claims

This validation proves that the autonomy platform’s observability subsystem works end-to-end across its two complementary paths:

  1. Prometheus metrics path: the control-plane exposes a /metrics endpoint populated with real observations after lab traffic flows through the rollout and cert APIs.

  2. Telemetry WAL / OTLP pipeline: the offline-first WAL-based event pipeline correctly persists events, exports them as JSONL, and delivers them to a live OTLP HTTP receiver with correlation IDs intact.

| # | Claim |
|---|---|
| VAL05-C1 | The control-plane Prometheus /metrics endpoint returns HTTP 200 and contains all expected metric families, with non-zero observations for the exercised HTTP, duration, rollout, and event-ingestion metrics after lab traffic |
| VAL05-C2 | The telemetry WAL pipeline accepts events emitted by the adapter runtime API and persists them durably to the local WAL |
| VAL05-C3 | Events exported from the WAL via telemetry export produce valid JSONL with all mandatory event fields present |
| VAL05-C4 | Correlation IDs (trace_id / span_id) set on emitted events are preserved in the JSONL export and in the OTLP HTTP encoding delivered to a live sink |

Architecture context:

This codebase implements a custom, offline-first OTLP pipeline rather than using the official go.opentelemetry.io/otel SDK. Key points for interpreting this validation:

  • Metrics: implemented with github.com/prometheus/client_golang (not the OTel metrics SDK). Metrics are exposed on 127.0.0.1:19090/metrics.

  • Traces: no OTel TracerProvider or auto-instrumented spans. Correlation IDs (trace_id / span_id) are manually attached to telemetry.Event structs at emit time, then serialized as traceId / spanId in the OTLP log record JSON.

  • WAL pipeline: adapter/edge components emit events to a local append-only WAL. The telemetry flush CLI command drains the WAL to an OTLP HTTP endpoint. VAL05 uses telemetry_emit_helper (a small lab binary) to pre-populate a test WAL with known events, because no CLI command exists to emit arbitrary adapter-side events in a lab context.


2. Scope

Covered

  • Prometheus /metrics endpoint reachability and HTTP response code

  • Presence of expected control-plane metric family names: cp_http_requests_total, cp_http_request_duration_seconds, cp_rollout_plans_total, cp_events_ingested_total

  • Non-zero observations for cp_http_requests_total, cp_http_request_duration_seconds_count, cp_rollout_plans_total, and cp_events_ingested_total after the slice exercises both the rollout API and a real POST /v1/events ingest

  • WAL durability: events written via telemetry.NewEmitter survive and are readable via telemetry status and telemetry export

  • telemetry export --out JSONL format: mandatory fields present in output

  • telemetry flush delivery to a live OTLP HTTP sink (autonomy telemetry sink)

  • trace_id and span_id fields preserved in JSONL export output

  • traceId and spanId fields present in OTLP log records delivered to the live sink

Not covered (known gaps)

  • OTel Go SDK traces: no TracerProvider, Tracer, or auto-instrumented spans are present. End-to-end distributed tracing (W3C traceparent → Jaeger) is not implemented.

  • OTel metrics SDK: Prometheus is used directly. No MeterProvider or OTel metric instruments. The Prometheus /metrics output is not OTLP-formatted.

  • Automatic trace context extraction: trace_id / span_id are not extracted from inbound HTTP traceparent headers. Injection into the WAL is manual at emit time.

  • slog structured log integration: trace_id / span_id are not injected into the Go slog structured log output. VAL05 validates only the WAL event / OTLP path.

  • Edge Prometheus metrics: edge/metrics/prometheus.go defines 30+ edge metrics but the edge process is not started by run_cli_audit_lab.sh; edge metrics are out of scope for this validation.

  • OTLP gRPC path: only OTLP/HTTP is validated; the demo collector config supports gRPC but no CLI command exercises it.

  • OTel Collector pipeline: the full demo stack (demo/docker-compose.yml + demo/otel/collector.yaml) with Jaeger is not started by this lab. The local autonomy telemetry sink is used instead.


3. Implementation Notes

telemetry_emit_helper binary

scripts/labs/telemetry_emit_helper.go is a minimal Go program compiled by the lab runner. It:

  1. Accepts --dir <path> pointing to an isolated temp WAL directory

  2. Creates a telemetry.WAL + telemetry.Emitter against that directory

  3. Emits 3 events:

    • autonomy.decision with trace_id=4bf92f3577b34da6a3ce929d0e0e4736 and span_id=00f067aa0ba902b7

    • autonomy.action without trace context

    • autonomy.error with trace_id only (tests partial correlation)

  4. Exits 0 and prints "emitted 3 events to <dir>"

The helper uses an isolated WAL directory to avoid polluting the runtime WAL at $XDG_CACHE_HOME/autonomyops/telemetry.

OTLP sink port

The sink runs on 127.0.0.1:14318 (not the default 4318) to avoid conflicting with any existing collector processes. The autonomy telemetry sink command accepts an --listen flag for this purpose.


4. Harness

VAL05 is implemented as run_otel_val05_lab() in scripts/labs/run_cli_audit_lab.sh. It runs after run_audit_completeness_val04_lab.

Dependencies:

  • Control-plane started by run_rollout_lab at 127.0.0.1:18888 with metrics on 127.0.0.1:19090 — used by VAL05-01/02/03

  • telemetry_emit_helper binary built at script start alongside autonomy and orchestrator_ha_server

  • autonomy telemetry sink, telemetry flush, telemetry export, telemetry status CLI subcommands

Evidence directory: $EVIDENCE_DIR/val05/


5. Exact Scenarios

VAL05-01 — Prometheus Endpoint HTTP 200

Purpose: Confirm the control-plane Prometheus endpoint is reachable and returns a valid metrics response.

Action:

curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1:19090/metrics

Evidence file: val05/val05-prometheus-status.txt

Pass criterion: http_code=200.
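
The step from status code to evidence record can be sketched as a small shell function. This is a minimal sketch, not the harness's actual code: the function name val05_status_record and the key=value layout are assumptions, since the source only states that the file holds metrics_url, http_code, and pass.

```shell
# Hypothetical helper: format the VAL05-01 evidence record from an HTTP status code.
# The key=value layout is an assumption, not the harness's confirmed format.
val05_status_record() {
  url=$1
  code=$2
  if [ "$code" = "200" ]; then pass=true; else pass=false; fi
  printf 'metrics_url=%s\nhttp_code=%s\npass=%s\n' "$url" "$code" "$pass"
}

# Usage against the live lab endpoint (not executed here):
#   code=$(curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1:19090/metrics)
#   val05_status_record http://127.0.0.1:19090/metrics "$code" > val05-prometheus-status.txt
```

When the connection fails outright, curl reports 000 for %{http_code}, which this sketch records as pass=false.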


VAL05-02 — Required Metric Families Present

Purpose: Confirm the Prometheus output contains all expected control-plane metric family declarations.

Action: Capture full /metrics output; grep for each of:

  • cp_http_requests_total

  • cp_http_request_duration_seconds

  • cp_rollout_plans_total

  • cp_events_ingested_total

Evidence files:

  • val05/val05-prometheus-raw.txt — raw /metrics output

  • val05/val05-prometheus-families.txt — PRESENT/ABSENT per metric family

Pass criterion: All 4 metric families are PRESENT.
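
The family check loop might look like the following sketch. The function name val05_check_families and the name=PRESENT output layout are assumptions. Matching on "# TYPE" lines is also a choice made here, not something the source confirms; it avoids counting a family as present merely because its name is a prefix of another metric.

```shell
# Hypothetical sketch of the family check loop; reads Prometheus text from a file.
val05_check_families() {
  raw=$1
  for family in cp_http_requests_total cp_http_request_duration_seconds \
                cp_rollout_plans_total cp_events_ingested_total; do
    # A family is declared by a "# TYPE <name> <kind>" line in the text exposition.
    if grep -q "^# TYPE ${family} " "$raw"; then
      echo "${family}=PRESENT"
    else
      echo "${family}=ABSENT"
    fi
  done
}

# Usage (not executed here):
#   val05_check_families val05-prometheus-raw.txt > val05-prometheus-families.txt
```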


VAL05-03 — Metric Observations Non-Zero

Purpose: Prove that real lab traffic produced actual observations, i.e. that the metrics are not merely registered and left unused.

Action: The slice first performs a real POST /v1/events against the live control-plane so cp_events_ingested_total is exercised by VAL05 itself. From the raw Prometheus output, extract sample lines for:

  • cp_http_requests_total

  • cp_http_request_duration_seconds_count

  • cp_rollout_plans_total

  • cp_events_ingested_total

Verify each value is non-zero.

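Evidence files: val05/val05-events-ingest.json (response to the POST /v1/events ingest) and val05/val05-prometheus-observations.txt (extracted sample lines)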

Pass criterion: All 4 sample lines exist and end with a non-zero value.
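
A non-zero check over the raw output could be sketched with awk as below. The function name val05_metric_nonzero and its output labels are hypothetical. The sketch assumes sample lines carry no trailing timestamp, so the value is the last whitespace-separated field.

```shell
# Hypothetical sketch: report whether any sample of a metric has a non-zero value.
# Assumes Prometheus text format without trailing timestamps (value is the last field).
val05_metric_nonzero() {
  metric=$1
  raw=$2
  awk -v m="$metric" '
    # Match "name value" or "name{labels} value"; skip comment lines.
    $0 !~ /^#/ && (index($0, m " ") == 1 || index($0, m "{") == 1) {
      if ($NF + 0 != 0) found = 1
    }
    END { print m "=" (found ? "nonzero" : "zero_or_missing") }
  ' "$raw"
}

# Usage (not executed here):
#   for m in cp_http_requests_total cp_http_request_duration_seconds_count \
#            cp_rollout_plans_total cp_events_ingested_total; do
#     val05_metric_nonzero "$m" val05-prometheus-raw.txt
#   done
```

Requiring the metric name to be followed by a space or "{" keeps cp_http_request_duration_seconds_count from matching the _sum or _bucket series of the same histogram.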


VAL05-04 — WAL Populated by Emit Helper

Purpose: Confirm the telemetry.Emitter → WAL write path works: events submitted via the adapter API are persisted durably.

Action:

telemetry_emit_helper --dir <isolated-wal-dir>
autonomy telemetry status --dir <isolated-wal-dir> --json

Evidence files:

  • val05/val05-emit-helper.txt: helper stdout (emitted 3 events to <dir>)

  • val05/val05-wal-status.json: {"total":3,"exported":0,"pending":3}

  • val05/val05-wal-inventory.txt: WAL dir, file count, total events, pass flag

Pass criterion: wal_total_events > 0 (status JSON total > 0).
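
The pass decision can be derived from the status JSON without extra tooling. The sketch below assumes the flat {"total":N,"exported":N,"pending":N} shape shown in the evidence list. The function names and the output line format are hypothetical.

```shell
# Hypothetical sketch: pull the integer "total" out of the status JSON without jq.
# Assumes the flat {"total":N,"exported":N,"pending":N} shape.
val05_wal_total() {
  sed -n 's/.*"total"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Apply the VAL05-04 criterion: pass only when total > 0.
val05_wal_pass() {
  total=$(val05_wal_total "$1")
  if [ "${total:-0}" -gt 0 ]; then
    echo "wal_total_events=${total} pass=true"
  else
    echo "wal_total_events=${total:-0} pass=false"
  fi
}

# Usage (not executed here):
#   val05_wal_pass val05-wal-status.json
```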


VAL05-05 — telemetry export Produces Non-Empty JSONL

Purpose: Confirm the WAL → JSONL export path works for downstream pipeline consumers that do not use OTLP.

Action:

autonomy telemetry export --dir <isolated-wal-dir> --out val05-export.jsonl

Evidence files:

  • val05/val05-export.jsonl: JSONL output (3 events, one per line)

  • val05/val05-export-stdout.txt: CLI stdout (a line beginning "wrote 3 events to")

  • val05/val05-export-summary.txt: export_lines, pass flag

Pass criterion: val05-export.jsonl contains at least 1 line (≥ 1 event).
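
A line-count check along these lines would produce the summary. The function name val05_export_summary and the single-line output format are assumptions. The sketch counts non-empty lines with grep rather than wc -l, so a final line without a trailing newline is still counted and blank lines are not.

```shell
# Hypothetical sketch of the VAL05-05 summary: count non-empty JSONL lines.
val05_export_summary() {
  # grep -c . prints 0 and exits 1 when nothing matches; "|| true" tolerates that.
  lines=$(grep -c . "$1" 2>/dev/null || true)
  lines=${lines:-0}
  if [ "$lines" -gt 0 ]; then pass=true; else pass=false; fi
  echo "export_lines=${lines} pass=${pass}"
}

# Usage (not executed here):
#   val05_export_summary val05-export.jsonl > val05-export-summary.txt
```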


VAL05-06 — JSONL Fields Present

Purpose: Verify the JSONL encoding contains all mandatory fields for downstream consumers.

Mandatory fields: "kind", "ts", "seq", "written_at", "attrs"

Action: For each field name, grep val05-export.jsonl.

Evidence file: val05/val05-export-fields.txt

Pass criterion: All 5 mandatory fields are PRESENT.
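
The field check might be sketched as follows. The function name val05_check_fields and the output layout are hypothetical. The pattern matches each name only where it is used as a JSON key (quoted and followed by a colon), so a field name appearing inside a string value does not count.

```shell
# Hypothetical sketch of the mandatory-field check over the exported JSONL.
val05_check_fields() {
  jsonl=$1
  for field in kind ts seq written_at attrs; do
    # Match the field used as a JSON key: the quoted name followed by a colon.
    if grep -q "\"${field}\"[[:space:]]*:" "$jsonl"; then
      echo "${field}=PRESENT"
    else
      echo "${field}=ABSENT"
    fi
  done
}

# Usage (not executed here):
#   val05_check_fields val05-export.jsonl > val05-export-fields.txt
```

A grep of this kind shows only that each field occurs in at least one exported line. Confirming that every line carries every field would require a per-line JSON parse.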


VAL05-07 — telemetry flush Delivers to Live OTLP Sink

Purpose: Prove the end-to-end OTLP/HTTP delivery path: WAL → telemetry flush → OTLP HTTP POST → autonomy telemetry sink.

Action:

# Start sink in background and record its PID
autonomy telemetry sink --listen 127.0.0.1:14318 > val05-sink-output.txt &
sink_pid=$!

# Flush WAL to sink
autonomy telemetry flush --dir <isolated-wal-dir> --endpoint http://127.0.0.1:14318

# Stop the sink after the flush completes
kill "$sink_pid"

Evidence files:

  • val05/val05-sink-output.txt: OTLP payloads received by the sink (pretty-printed JSON)

  • val05/val05-flush-stdout.txt: CLI stdout (a line beginning "telemetry flush: OK 3 events sent to")

  • val05/val05-flush-summary.txt: flush_ok, sink_lines, sink_payloads, pass flag

Pass criterion: flush_ok=true AND sink_payloads > 0 (the sink printed at least one "received N log records" payload line, not just its startup banner).
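
The two-part criterion can be sketched as below. The function name val05_flush_summary is hypothetical, and two details are assumptions rather than confirmed harness behavior: flush_ok is derived from the flush command's exit status, and a payload line is recognized by the text "received N log records".

```shell
# Hypothetical sketch of the VAL05-07 summary.
val05_flush_summary() {
  flush_rc=$1   # exit status of "autonomy telemetry flush" (assumed source of flush_ok)
  sink_out=$2   # captured sink stdout
  sink_lines=$(grep -c . "$sink_out" 2>/dev/null || true)
  # Assumed payload marker: a line containing "received N log records".
  sink_payloads=$(grep -c -E 'received [0-9]+ log records' "$sink_out" 2>/dev/null || true)
  if [ "$flush_rc" -eq 0 ]; then flush_ok=true; else flush_ok=false; fi
  if [ "$flush_ok" = true ] && [ "${sink_payloads:-0}" -gt 0 ]; then
    pass=true
  else
    pass=false
  fi
  printf 'flush_ok=%s\nsink_lines=%s\nsink_payloads=%s\npass=%s\n' \
    "$flush_ok" "${sink_lines:-0}" "${sink_payloads:-0}" "$pass"
}

# Usage (not executed here):
#   autonomy telemetry flush --dir <isolated-wal-dir> --endpoint http://127.0.0.1:14318
#   val05_flush_summary "$?" val05-sink-output.txt > val05-flush-summary.txt
```

Counting payload lines separately from total lines is what lets the check reject a run in which the sink started but received nothing.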


VAL05-08 — trace_id and span_id Propagated in JSONL Export

Purpose: Prove that trace_id and span_id set on a telemetry.Event at emit time are preserved through the WAL → JSONL export path.

Action: Grep val05-export.jsonl for the known trace ID and span ID:

grep '"trace_id":"4bf92f3577b34da6a3ce929d0e0e4736"' val05-export.jsonl
grep '"span_id":"00f067aa0ba902b7"' val05-export.jsonl

Evidence file: val05/val05-traceid-jsonl.txt

Pass criterion: both trace_id_found=4bf92f3577b34da6a3ce929d0e0e4736 and span_id_found=00f067aa0ba902b7 are reported.
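
Turning the two greps into the reported evidence lines could look like this sketch. The function name val05_traceid_jsonl is hypothetical. It assumes the export is compact JSON, with no whitespace between a key, the colon, and the value.

```shell
# Hypothetical sketch of the VAL05-08 check; emits <key>_found=<value> or ABSENT.
val05_traceid_jsonl() {
  jsonl=$1
  trace_id=4bf92f3577b34da6a3ce929d0e0e4736
  span_id=00f067aa0ba902b7
  # -F treats the pattern as a fixed string, so no regex escaping is needed.
  if grep -qF "\"trace_id\":\"${trace_id}\"" "$jsonl"; then
    echo "trace_id_found=${trace_id}"
  else
    echo "trace_id_found=ABSENT"
  fi
  if grep -qF "\"span_id\":\"${span_id}\"" "$jsonl"; then
    echo "span_id_found=${span_id}"
  else
    echo "span_id_found=ABSENT"
  fi
}

# Usage (not executed here):
#   val05_traceid_jsonl val05-export.jsonl > val05-traceid-jsonl.txt
```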


VAL05-09 — traceId and spanId Propagated in OTLP Sink Output

Purpose: Prove the OTLP encoding translates trace_id / span_id from the WAL entry into the traceId / spanId fields expected by OTLP consumers (Jaeger, Grafana Tempo, etc.).

Action: Grep val05-sink-output.txt for the specific known traceId and spanId values from the helper event:

grep -qi '"traceId"[[:space:]]*:[[:space:]]*"4bf92f3577b34da6a3ce929d0e0e4736"' val05-sink-output.txt
grep -qi '"spanId"[[:space:]]*:[[:space:]]*"00f067aa0ba902b7"' val05-sink-output.txt

Evidence file: val05/val05-traceid-otlp.txt

Pass criterion: both traceId_found=true and spanId_found=true.
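
The same greps can be wrapped to emit the true/false evidence lines, as in the sketch below (the function name val05_traceid_otlp is hypothetical). The whitespace-tolerant pattern is needed because the sink pretty-prints its JSON, and the case-insensitive match is consistent with OTLP's JSON encoding, which represents traceId and spanId as hex strings.

```shell
# Hypothetical sketch of the VAL05-09 check over the pretty-printed sink output.
val05_traceid_otlp() {
  sink_out=$1
  sp='[[:space:]]*'   # optional whitespace around the colon in pretty-printed JSON
  if grep -qi "\"traceId\"${sp}:${sp}\"4bf92f3577b34da6a3ce929d0e0e4736\"" "$sink_out"; then
    echo "traceId_found=true"
  else
    echo "traceId_found=false"
  fi
  if grep -qi "\"spanId\"${sp}:${sp}\"00f067aa0ba902b7\"" "$sink_out"; then
    echo "spanId_found=true"
  else
    echo "spanId_found=false"
  fi
}

# Usage (not executed here):
#   val05_traceid_otlp val05-sink-output.txt > val05-traceid-otlp.txt
```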


6. Evidence Files

All files are written to $EVIDENCE_DIR/val05/.

| File | Produced by | Contains |
|---|---|---|
| val05-prometheus-status.txt | curl -w "%{http_code}" | metrics_url, http_code, pass |
| val05-prometheus-raw.txt | curl /metrics | Full Prometheus text exposition |
| val05-prometheus-families.txt | family check loop | PRESENT/ABSENT per metric family |
| val05-events-ingest.json | POST /v1/events | Ingest response used to exercise cp_events_ingested_total |
| val05-prometheus-observations.txt | sample grep | Sample lines for non-zero check |
| val05-emit-helper.txt | telemetry_emit_helper stdout | emitted 3 events to <dir> |
| val05-wal-status.json | telemetry status --json | {total, exported, pending} |
| val05-wal-inventory.txt | WAL check | wal_files, wal_total_events, pass |
| val05-export.jsonl | telemetry export --out | 3-event JSONL (one entry per line) |
| val05-export-stdout.txt | telemetry export stdout | wrote 3 events to |
| val05-export-summary.txt | line count | export_lines, pass |
| val05-export-fields.txt | field check loop | PRESENT/ABSENT per mandatory field |
| val05-sink-output.txt | telemetry sink stdout | OTLP payloads as pretty-printed JSON |
| val05-flush-stdout.txt | telemetry flush stdout | telemetry flush: OK N events sent |
| val05-flush-summary.txt | flush check | flush_ok, sink_lines, sink_payloads, pass |
| val05-traceid-jsonl.txt | JSONL trace_id/span_id grep | trace_id_found=<value>, span_id_found=<value>, or ABSENT |
| val05-traceid-otlp.txt | OTLP traceId/spanId grep | traceId_found=true/false, spanId_found=true/false |
| val05-report.txt | composite report | 9-check PASS/FAIL + summary line |
| val05-report.json | composite report | Machine-readable JSON with all check statuses |


7. Pass/Fail Criteria

| Check ID | Name | File | Pass condition |
|---|---|---|---|
| VAL05-01 | prometheus_endpoint | val05-prometheus-status.txt | http_code=200 |
| VAL05-02 | metric_families_present | val05-prometheus-families.txt | All 4 families PRESENT |
| VAL05-03 | metric_observations | val05-prometheus-observations.txt | All 4 exercised samples non-zero |
| VAL05-04 | wal_populated | val05-wal-inventory.txt | wal_total_events > 0 |
| VAL05-05 | export_jsonl_nonempty | val05-export-summary.txt | export_lines > 0 |
| VAL05-06 | export_jsonl_fields | val05-export-fields.txt | All 5 fields PRESENT |
| VAL05-07 | otlp_flush_sink | val05-flush-summary.txt | flush_ok=true and sink_payloads > 0 |
| VAL05-08 | trace_id_jsonl | val05-traceid-jsonl.txt | Known trace ID and span ID found in JSONL |
| VAL05-09 | traceid_otlp | val05-traceid-otlp.txt | traceId_found=true and spanId_found=true |

Overall pass: all 9 checks pass and val05-report.txt reports pass=9 fail=0 total=9.
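
The summary counts can be derived from the per-check result lines. The sketch below assumes one "VAL05-NN <name>: PASS" or "VAL05-NN <name>: FAIL" line per check, as in the report template. The function name val05_summarize is hypothetical.

```shell
# Hypothetical sketch: derive the pass/fail/total summary from per-check result lines.
val05_summarize() {
  report=$1
  pass=$(grep -c -E '^VAL05-[0-9]+ .*:[[:space:]]*PASS[[:space:]]*$' "$report" 2>/dev/null || true)
  fail=$(grep -c -E '^VAL05-[0-9]+ .*:[[:space:]]*FAIL[[:space:]]*$' "$report" 2>/dev/null || true)
  pass=${pass:-0}
  fail=${fail:-0}
  echo "pass=${pass} fail=${fail} total=$((pass + fail))"
}

# Usage (not executed here):
#   val05_summarize val05-report.txt
```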

Failure handling:

  • VAL05-01 fails: the control-plane from run_rollout_lab may have exited; check $WORK_DIR/ for the orchestrator process log

  • VAL05-02 fails (ABSENT family): a code change removed a metric registration; cross-reference orchestrator/metrics.go NewCPMetrics() for the registration

  • VAL05-03 fails (zero observations): either the rollout lab did not exercise the expected API path, or the explicit POST /v1/events ingest in VAL05 did not succeed; check val05-events-ingest.json and the control-plane logs

  • VAL05-04 fails: telemetry_emit_helper may have failed (check val05-emit-helper.txt), or the WAL open failed (the AUTONOMYOPS_WAL_LEGACY_UPGRADE environment variable may need to be set if the WAL directory is reused from a prior run)

  • VAL05-05 fails: if VAL05-04 also failed the WAL is empty; otherwise the telemetry export command itself failed — check val05-export-stdout.txt

  • VAL05-06 fails (ABSENT field): the WAL Entry JSON encoding changed; check telemetry/wal.go Entry struct tags

  • VAL05-07 fails: either flush exited non-zero (check val05-flush-stdout.txt) or the sink never printed a "received N log records" payload line; a startup banner alone is not sufficient proof of delivery

  • VAL05-08 fails: either the JSONL Entry no longer preserves trace_id / span_id under the nested "event" object, or the helper event set changed

  • VAL05-09 fails: the sink may have received payloads, but the OTLP encoding no longer carried the expected traceId / spanId values; inspect val05-sink-output.txt


8. Report Template

# VAL 05 — OTel Integration Validation Report
timestamp: 2026-03-20T10:00:00Z

## Results
VAL05-01 prometheus_endpoint:     PASS
VAL05-02 metric_families_present: PASS
VAL05-03 metric_observations:     PASS
VAL05-04 wal_populated:           PASS
VAL05-05 export_jsonl_nonempty:   PASS
VAL05-06 export_jsonl_fields:     PASS
VAL05-07 otlp_flush_sink:         PASS
VAL05-08 trace_id_jsonl:          PASS
VAL05-09 traceid_otlp:            PASS

## Summary
pass=9  fail=0  total=9

The runner also prints "VAL 05: pass=9 fail=0 total=9 (report: val05-report.txt)" to stdout so CI log scanners can grep for "VAL 05: pass=".
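
A CI log scanner built on that line might look like the sketch below. The function name val05_ci_gate, its messages, and its return codes are hypothetical. It treats a missing summary line as a failure distinct from a reported failing check.

```shell
# Hypothetical CI gate: succeed only if the VAL 05 summary line reports fail=0.
val05_ci_gate() {
  log=$1
  # Take the last matching line in case the lab was run more than once.
  line=$(grep 'VAL 05: pass=' "$log" | tail -n 1)
  [ -n "$line" ] || { echo "VAL 05 summary line not found"; return 2; }
  fail=$(printf '%s\n' "$line" | sed -n 's/.*fail=\([0-9][0-9]*\).*/\1/p')
  if [ "$fail" = "0" ]; then
    echo "VAL 05 OK"
  else
    echo "VAL 05 FAILED (fail=${fail})"
    return 1
  fi
}

# Usage (not executed here):
#   bash scripts/labs/run_cli_audit_lab.sh | tee lab.log
#   val05_ci_gate lab.log
```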


9. How to Run

VAL05 executes automatically as the last validation slice when the full lab is run:

export GOROOT=/home/ubuntu/.local/go1.25.7
export PATH="$GOROOT/bin:$PATH"
export GOTOOLCHAIN=local

bash scripts/labs/run_cli_audit_lab.sh

To inspect results after a run:

# Quick pass/fail
cat evidence/pr17-cli-audit-local-2026-03-17/val05/val05-report.txt

# Prometheus metric families check
cat evidence/pr17-cli-audit-local-2026-03-17/val05/val05-prometheus-families.txt

# WAL status
cat evidence/pr17-cli-audit-local-2026-03-17/val05/val05-wal-status.json

# Verify trace_id in JSONL
grep '"trace_id"' evidence/pr17-cli-audit-local-2026-03-17/val05/val05-export.jsonl | head -3

# Inspect OTLP sink payload (first OTLP log record)
grep -A5 '"traceId"' evidence/pr17-cli-audit-local-2026-03-17/val05/val05-sink-output.txt | head -20

# Machine-readable report
jq '{pass_count, fail_count, checks}' \
  evidence/pr17-cli-audit-local-2026-03-17/val05/val05-report.json