ROS 2 Governed Bridge

Audience: operators turning on, observing, or recovering the long-lived governed_ros2_bridge process — the runtime-owned C++ rclcpp bridge that subscribes on a separate “agent” DDS domain, POSTs every message to the AutonomyOps /v1/tool runtime for policy evaluation, and republishes allowed messages on the “real” DDS domain. This is the per-message counterpart to launch-level governance (the ROS 2 Governance reference covers the launch path).

The bridge is opt-in via --governed-bridge on either autonomy ros2 run (paid) or autonomy run ros2.launch (CE). It is off by default, which preserves the prior AutoRuntime behavior. This page tells you what to do when you turn it on, what to look for, and how to get out of trouble.

Walking through the demo first? Start at ROS 2 Governed Bridge Quickstart; it runs autonomy demo ros2-bridge end-to-end with allow + deny evidence. The runbook below assumes you already have a workload to govern.

Prerequisites

- docker on PATH, for the container path. On that path runtime/ros2bridge.BridgeProcess runs the bridge with NetworkMode=host and IPCMode=host, both dispatched by runtime/exec via Docker. Hosts without Docker can run the bridge as a native binary instead; see Native execution mode (without Docker) below.
- ghcr.io/autonomyops/adk-ros2-runtime:<version> present locally. Pull it with `docker pull ghcr.io/autonomyops/adk-ros2-runtime:latest`, or build it from source (from the repository root) with `docker build -t ghcr.io/autonomyops/adk-ros2-runtime:local -f demo/ros2-runtime/Dockerfile .`.
- A policy bundle that allows the topics the bridge will republish. The embedded policy embedded:ros2-bridge-demo allows /cmd_vel (and /cmd_vel/*) and denies /disable_safety. Production fleets stage a custom bundle with autonomy bundle pull <ref> and pass --policy <ref> to the launch command.

Procedure

1. Pick two ROS_DOMAIN_IDs

The bridge subscribes on one domain and republishes on another. They must differ, or the bridge collapses into a loopback that defeats governance entirely (BridgeProcess.Run returns ErrSameDomain immediately).

Conventions used across the demo + docs:

| Role | Default | Meaning |
| --- | --- | --- |
| `--agent-domain` | `99` | Where the launched workload publishes: the “untrusted” side. Publishes on this domain to the topics listed in --bridge-topics are intercepted. |
| `--real-domain` | `42` | Where allowed messages get republished: the “real robot” side. Production subscribers (motor controller, perception stack) live here. |

If your fleet already uses a particular ROS_DOMAIN_ID for production traffic, pin it to --real-domain and pick any unused integer in 0..101 for --agent-domain. The runbook below assumes 99 / 42.
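If you script launches, a pre-flight check can catch a bad pair before the runtime rejects it. The following is a minimal POSIX sh sketch and is not part of the autonomy CLI. `check_domains` is a hypothetical helper name, and the 0..101 bound mirrors the range recommended above.

```shell
# Hypothetical pre-flight helper (not shipped with autonomy): validate an
# --agent-domain / --real-domain pair before launching the bridge.
check_domains() {
    agent=$1
    real=$2
    for d in "$agent" "$real"; do
        # Reject anything that is not a plain non-negative integer.
        case $d in
            '' | *[!0-9]*)
                echo "not a domain ID: '$d'" >&2
                return 1
                ;;
        esac
        # Stay inside the 0..101 range recommended for default FastDDS.
        if [ "$d" -gt 101 ]; then
            echo "domain $d is outside 0..101" >&2
            return 1
        fi
    done
    # Equal IDs would turn the bridge into a loopback (ErrSameDomain).
    if [ "$agent" -eq "$real" ]; then
        echo "agent and real domain are both $agent" >&2
        return 1
    fi
}
```

For example, `check_domains 99 42` succeeds. `check_domains 42 42` fails for the same reason the runtime reports as ErrSameDomain.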

2. Enable the bridge on the launch

You must pass --bridge-topics to tell the bridge which workload topics to intercept. Without it, the bridge falls back to its compiled-in default (/agent_chat, typed std_msgs/msg/String), so the workload’s publishes on /cmd_vel, sensor topics, and so on are never intercepted. The runner prints a stderr WARN for this combination, but the launch still proceeds. Because the bridge fails closed, none of those publishes reach the real domain ungoverned; they simply never reach it at all.

CE (no orchestrator required):

```
autonomy run \
    --image ghcr.io/autonomyops/adk-ros2-runtime:latest \
    --governed-bridge \
    --agent-domain 99 \
    --real-domain 42 \
    --bridge-topics '/cmd_vel:geometry_msgs/msg/Twist,/disable_safety:std_msgs/msg/Bool' \
    ros2.launch launch demo_robot arm_demo.launch.py
```

Paid tier (same flags, paid-tier surface):

```
autonomy ros2 run \
    --image ghcr.io/autonomyops/adk-ros2-runtime:latest \
    --governed-bridge \
    --agent-domain 99 \
    --real-domain 42 \
    --bridge-topics '/cmd_vel:geometry_msgs/msg/Twist,/disable_safety:std_msgs/msg/Bool' \
    launch demo_robot arm_demo.launch.py
```

What happens, in order:

1. The runtime binds the in-process /v1/tool server to a random 127.0.0.1:<port> and injects that URL into both the bridge container and the launched workload container as AUTONOMY_RUNTIME_URL.
2. The bridge container is spawned (--network host --ipc host, subscribing on ROS_DOMAIN_ID=99).
3. The launch waits for the bridge to print governed_ros2_bridge: ready agent_domain=99 real_domain=42 on stdout. The readiness wait is bounded by --bridge-ready-timeout (default 30s); see Step 5 if it times out.
4. The launched workload starts with ROS_DOMAIN_ID=99 and --ipc=host injected. Its publishes therefore land on the bridge’s subscription domain, and it shares /dev/shm with the bridge container for FastDDS SHM transport.
5. Every message the workload publishes on a bridged topic flows: workload → bridge.subscribe → POST /v1/tool → policy → (allow) → bridge.republish → real domain.
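The readiness step is a line match on the bridge’s stdout. The following POSIX sh sketch models it; it is not the runtime’s implementation, and `wait_bridge_ready` is a hypothetical name. The sketch separates the two outcomes Step 5 distinguishes: the ready line appears, or the stream ends first because the bridge exited. The --bridge-ready-timeout bound is left out.

```shell
# Hypothetical model of the readiness handshake. Reads the bridge's stdout
# on stdin and returns 0 once the ready line for the expected domains
# appears, or 1 if the stream ends first (the bridge exited before
# signaling ready). The --bridge-ready-timeout bound is not modeled here.
wait_bridge_ready() {
    want="governed_ros2_bridge: ready agent_domain=$1 real_domain=$2"
    # `|| [ -n "$line" ]` also checks a final line that lacks a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            "$want" | "$want "*) return 0 ;;  # exact line, or line plus trailing text
        esac
    done
    return 1
}
```

Against a captured log, `wait_bridge_ready 99 42 < bridge.log` returns 0 only if the 99/42 ready line is present. The name bridge.log is a placeholder.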

3. Confirm the loop is closed

In a second terminal, inspect the decision frames the bridge has emitted so far. autonomy wal inspect reads the WAL file end-to-end on each invocation, and there is no streaming wal subcommand, so re-run it after each publish:

```
# Preferred: first-class --bridge-only filter (#939 4-E.a).
autonomy wal inspect --kind autonomy.decision --bridge-only --json

# Equivalent jq form (still works; use this if you need richer projection).
autonomy wal inspect --kind autonomy.decision --json \
  | jq 'select(.event.attrs.bridge_origin == "governed_ros2_bridge")'
```

You should see one frame per bridged publish, each carrying:

- tool=tool.ros2.topic.publish
- outcome=allow (or deny if the policy rejected it)
- bridge_origin=governed_ros2_bridge (#939 4-E.a marker; absent on direct node POSTs, present on bridge-routed POSTs; pinned by bridgeOriginFromRequest in runtime/server.go)
- policy_ref matching the bundle’s manifest.policy_ref

If a bridged publish produces no frame carrying the marker, the bridge is not actually mediating it; most likely the workload is publishing directly on the real domain. Re-check that --agent-domain and --real-domain differ and that the workload’s ROS_DOMAIN_ID env was actually overridden (Step 2 describes the injection).

4a. Choose a deny-action (safety-critical for actuation)

By default a DENY drops the message: the bridge simply doesn’t republish. For a rate-controlled actuation topic like /cmd_vel that is unsafe. The real-side robot holds its last command and keeps moving, so a denied “drive forward” doesn’t stop the robot; it only stops updating it. “Deny the command” ≠ “command a safe state.”

--deny-action controls what the bridge publishes on a deny:

| `--deny-action` | On DENY | Use for |
| --- | --- | --- |
| `drop` (default) | Nothing republished; sink holds its last value. | Observational / non-actuation topics. |
| `hold` | Republish the last allowed message (hold last-known-good). | Topics where the last approved value is a sane fallback. |
| `safe` | Publish a zero-initialized message of the topic type (e.g. zero `Twist`), i.e. a neutral command. | Actuation topics: the robot is commanded to stop rather than coasting through the deny. |

```
autonomy ros2 run --governed-bridge --force-native --allow-reduced-governance \
    --bridge-topics '/cmd_vel:geometry_msgs/msg/Twist' \
    --deny-action safe \
    launch demo_robot arm_demo.launch.py
```

With --deny-action safe the bridge logs DENY … -> SAFE (published zero/neutral value) and the real-side subscriber receives an all-zero message on every deny — no robot-side deadman required. If the zero message can’t be built for a topic’s type (missing typesupport), the bridge refuses to start rather than silently falling back to drop.
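The three deny-actions differ only in what, if anything, reaches the real domain. The following POSIX sh model is hypothetical and is not the bridge’s C++ code. `on_deny` is an illustrative name, and messages are represented as plain strings. What `hold` does before any message has been allowed is not stated on this page; this sketch sends nothing in that case. The startup refusal when a zero message can’t be built is not modeled.

```shell
# Hypothetical model of --deny-action (not the bridge's C++ code).
# Prints the message the bridge would republish on a DENY, or nothing.
#   $1  deny action: drop | hold | safe
#   $2  last allowed message on this topic ("" if none yet)
#   $3  zero-initialized message of the topic type (e.g. a zero Twist)
on_deny() {
    case $1 in
        drop)
            ;;  # republish nothing; the real-side sink keeps its last value
        hold)
            # Re-send last-known-good; with no prior ALLOW this sketch sends nothing.
            if [ -n "$2" ]; then printf '%s\n' "$2"; fi
            ;;
        safe)
            printf '%s\n' "$3"  # neutral command: the robot is told to stop
            ;;
        *)
            echo "unknown deny-action: $1" >&2
            return 2
            ;;
    esac
}
```

For /cmd_vel after an allowed `{linear: {x: 0.5}}`, `drop` leaves the robot driving at 0.5, `hold` re-sends 0.5, and only `safe` sends the zero Twist.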

Common operator situations

5. Troubleshooting the readiness gap

Symptom: the launch fails with one of:

```
ros2: governed bridge did not signal ready within 30s
ros2: governed bridge exited before signaling ready: <wrapped error>
```

The first case is a soft timeout (the bridge is alive but slow). The second is a hard exit. Before the #940 fix, a hard exit fell through to the soft timeout and silently corrupted the run. Both abort the launch: the runtime will not start the workload without a ready bridge.

Triage:

Was the image pull cold? Cold pulls of adk-ros2-runtime regularly exceed 30s on slow networks. Pre-pull:
```
docker pull ghcr.io/autonomyops/adk-ros2-runtime:latest
```
then re-run. On bandwidth-constrained networks, also raise the timeout on the launch: --bridge-ready-timeout 2m.

Is the binary actually in the image?

```
docker run --rm --entrypoint /bin/bash \
    ghcr.io/autonomyops/adk-ros2-runtime:latest \
    -c 'which governed_ros2_bridge && governed_ros2_bridge --version'
```

If the binary is missing, your local image was built before the governed_ros2_bridge colcon target was added to demo/ros2-runtime/Dockerfile. Rebuild:

```
docker build -t ghcr.io/autonomyops/adk-ros2-runtime:local -f demo/ros2-runtime/Dockerfile .
autonomy run --image ghcr.io/autonomyops/adk-ros2-runtime:local --governed-bridge ...
```

Is the bridge container exiting on a config error? Check the wrapped error in the abort message — ErrSameDomain and ErrRuntimeURLRequired both surface here. ErrSameDomain means you passed --agent-domain == --real-domain; pick different integers.
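If you collect launch failures in scripts, the two abort messages quoted in Step 5 are enough to pick the first triage branch. The following POSIX sh helper is a hypothetical sketch (`triage_bridge_abort` is an illustrative name). It matches only those two message texts and does not try to parse the wrapped error.

```shell
# Hypothetical triage helper: map the launch's abort message to the first
# checks from the Step 5 list. Only the two quoted abort messages are
# recognized; anything else is reported as unrecognized (exit status 1).
triage_bridge_abort() {
    case $1 in
        *"did not signal ready within"*)
            echo "soft timeout: pre-pull the image, or raise --bridge-ready-timeout"
            ;;
        *"exited before signaling ready"*)
            echo "hard exit: confirm governed_ros2_bridge is in the image, then read the wrapped error (ErrSameDomain, ErrRuntimeURLRequired)"
            ;;
        *)
            echo "unrecognized: not a bridge readiness failure"
            return 1
            ;;
    esac
}
```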

6. Recovering from a stuck bridge container

Symptom: the launch process has been killed (Ctrl-C, SIGKILL, host reboot) but docker ps shows the bridge container still running. Or: a fresh launch fails with a “port already in use” / “FastDDS already bound” stderr line.

The bridge is spawned as docker run --rm, so a clean shutdown removes the container. A killed launch process may leak it if Docker didn’t get the SIGTERM cascade in time.

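List bridge containers. The ancestor filter also matches any workload container started from the same image, so use the Command column to tell the bridge apart:

```
docker ps --filter ancestor=ghcr.io/autonomyops/adk-ros2-runtime:latest --format 'table {{.ID}}\t{{.Status}}\t{{.Command}}'
```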

Stop with grace (lets the bridge flush its last decisions):
```
docker stop <container-id>
```
If stop hangs >10s, force-remove:
```
docker rm -f <container-id>
```
Re-launch. The runtime starts a fresh bridge on a fresh 127.0.0.1:<random> port; no shared state with the prior process.
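The following POSIX sh sketch wraps the stop-then-force-remove sequence for IDs you have already picked from the docker ps listing above. `stop_bridge_container` is a hypothetical name. It relies only on `docker stop -t` (seconds to wait before the kill) and `docker rm -f`. It deliberately does not select IDs itself, because an image-based filter can also catch workload containers.

```shell
# Hypothetical cleanup helper for leaked bridge containers. Pass explicit
# container IDs; the sketch never selects containers on its own.
stop_bridge_container() {
    for id in "$@"; do
        # Send the stop signal (SIGTERM by default) and allow 10 s before the
        # kill, so the bridge can flush its last decisions.
        if ! docker stop -t 10 "$id" >/dev/null; then
            echo "docker stop failed for $id; force-removing" >&2
            docker rm -f "$id" >/dev/null || return 1
        fi
    done
}
```

Usage: `stop_bridge_container <container-id>`, then re-launch as described above.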

If your shell history shows the bridge launched with --keep, the WAL directory under /tmp/autonomyops-demo-wal-* will still be on disk — that’s intended (see Step 7 for how to read it).

7. Inspecting the WAL after the fact

Every bridge-mediated publish writes one autonomy.decision frame to the WAL with the bridge_origin=governed_ros2_bridge marker. To pull the per-run audit trail:

```
# Preferred: first-class flag, no jq needed (#939 4-E.a).
autonomy wal inspect --kind autonomy.decision --bridge-only --json

# Equivalent older form (still supported).
autonomy wal inspect --kind autonomy.decision --json \
  | jq 'select(.event.attrs.bridge_origin == "governed_ros2_bridge")'
```

To distinguish bridge-routed decisions from direct-node POSTs (e.g. a node inside the launched container that calls /v1/tool itself without going through the bridge):

```
# Bridge-routed (first-class):
autonomy wal inspect --kind autonomy.decision --bridge-only --json \
  | jq '{tool, outcome, attrs: .event.attrs}'

# Direct node-POSTs (no marker — invert via jq; the negative case is
# rarer than the positive case, so it stays in jq).
autonomy wal inspect --kind autonomy.decision --json \
  | jq 'select(.event.attrs.tool == "tool.ros2.topic.publish" and .event.attrs.bridge_origin == null) | {tool, outcome, attrs: .event.attrs}'
```

The runtime sets bridge_origin only when the inbound POST’s params._bridge_origin field is the canonical sentinel governed_ros2_bridge AND the request kind is on the closed BridgeRoutableKinds set (#941 fix — pre-fix the marker could be spoofed by a node calling tool.echo with the marker in params).
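The rule is a conjunction of two checks, so a sentinel in the params alone is not enough. The following POSIX sh model is hypothetical and is not the Go code in runtime/ros2bridge/bridge.go. In particular, `BRIDGE_ROUTABLE_KINDS` is an illustrative stand-in: this page does not list the real BridgeRoutableKinds members, and the sketch assumes only tool.ros2.topic.publish.

```shell
# Hypothetical model of the marker rule: bridge_origin is recorded only when
# BOTH the claimed origin is the canonical sentinel AND the request kind is
# in the closed routable set. The set below is an illustrative assumption.
BRIDGE_ROUTABLE_KINDS="tool.ros2.topic.publish"
BRIDGE_SENTINEL="governed_ros2_bridge"

# Prints the marker to record, or nothing.
#   $1  request kind, e.g. tool.ros2.topic.publish
#   $2  value of params._bridge_origin ("" if absent)
bridge_origin_for() {
    [ "$2" = "$BRIDGE_SENTINEL" ] || return 0   # wrong or missing sentinel
    for k in $BRIDGE_ROUTABLE_KINDS; do
        if [ "$k" = "$1" ]; then
            printf '%s\n' "$BRIDGE_SENTINEL"
            return 0
        fi
    done
    return 0   # kind not routable (e.g. tool.echo): marker dropped
}
```

A node calling tool.echo with the sentinel in its params therefore gets no marker, which is the spoofing case #941 closed.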

8. Disabling the bridge cleanly

Just drop --governed-bridge from the launch. Without that flag the runtime falls back to:

- ExecBridge (the runtime is the publisher of every node-level POST), matching the prior AutoRuntime behavior unchanged.
- No bridge container is spawned, no ROS_DOMAIN_ID injection, no IPCMode=host on the workload.

Bridge containers that were already running won’t be terminated — see Step 6.

Multi-topic + generic-type interception (#939 4-A)

The bridge accepts arbitrary DDS message types on any number of topics in a single process via rclcpp::GenericSubscription + rclcpp::GenericPublisher. Operator configuration:

- --bridge-topics (CLI, on both autonomy ros2 run and autonomy run): comma-separated topic:type pairs, or repeated flags. The runner forwards them to RunOptions.BridgeTopics. runtime/ros2.defaultStartGovernedBridge copies them onto BridgeProcess.Topics, which becomes the GOVERNED_BRIDGE_TOPICS env of the bridge container or native binary.
- GOVERNED_BRIDGE_TOPICS env (set directly when invoking the bridge binary outside the runner, e.g. via docker run): the same comma-separated topic:type pairs. Each entry creates one subscription on the agent domain and one publisher on the real domain, typed by the operator-supplied type.
- GOVERNED_BRIDGE_TOPIC (singular, back-compat): one topic; the C++ side hard-defaults its type to std_msgs/msg/String (the pre-4-A behavior). It is kept for single-topic legacy wiring; new callers should use GOVERNED_BRIDGE_TOPICS with an explicit type.
- Neither set: falls back to /agent_chat + std_msgs/msg/String, the compiled-in default.
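The following POSIX sh sketch models that three-way fallback and is useful when wiring the env by hand. It is hypothetical and is not the bridge’s C++ parser. `resolve_bridge_topics` is an illustrative name, and the precedence when both variables are set (plural wins) is an assumption this page does not state.

```shell
# Hypothetical model of how the bridge's topic list resolves from its env,
# printing one "topic type" pair per line. "Plural wins when both are set"
# is an assumption of this sketch.
# The body is a subshell, so the IFS and noglob changes do not leak.
resolve_bridge_topics() (
    if [ -n "${GOVERNED_BRIDGE_TOPICS:-}" ]; then
        IFS=,    # split the list on commas only
        set -f   # no pathname expansion while splitting
        for entry in $GOVERNED_BRIDGE_TOPICS; do
            topic=${entry%%:*}
            msg_type=${entry#*:}
            # Require a non-empty topic, a colon, and a non-empty type.
            if [ -z "$topic" ] || [ "$topic" = "$entry" ] || [ -z "$msg_type" ]; then
                echo "bad entry (want topic:type): '$entry'" >&2
                exit 1
            fi
            printf '%s %s\n' "$topic" "$msg_type"
        done
    elif [ -n "${GOVERNED_BRIDGE_TOPIC:-}" ]; then
        # Singular back-compat form: type hard-defaults to std_msgs/msg/String.
        printf '%s %s\n' "$GOVERNED_BRIDGE_TOPIC" std_msgs/msg/String
    else
        # Compiled-in default.
        printf '%s %s\n' /agent_chat std_msgs/msg/String
    fi
)
```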

Wire format addition. Every bridge-routed POST now carries params.payload_b64 = base64 of the message’s serialized CDR bytes, alongside params.type and the existing params.topic. The params.data field is still emitted for std_msgs/msg/String only (back-compat with the canonical wire-shape contract test); other types ship the bytes via payload_b64 alone. The runtime currently keys policy on topic + kind; field-level typed-policy via rosidl_typesupport_introspection_cpp decoding of payload_b64 lands in a follow-up.
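To make the wire format concrete, the following POSIX sh sketch builds the params portion of one bridge-routed POST from raw CDR bytes on stdin. It is hypothetical. `bridge_params_json` is an illustrative name, the key order is arbitrary, and the rest of the /v1/tool envelope and the String-only params.data field are omitted. Only the field names params.topic, params.type, params.payload_b64, and params._bridge_origin come from this page.

```shell
# Hypothetical sketch of the params a bridge-routed POST carries for one
# message: topic, type, payload_b64 (base64 of the serialized CDR bytes,
# read from stdin here) and the _bridge_origin sentinel. No JSON escaping
# is done; valid ROS topic and type names need none.
bridge_params_json() {
    topic=$1
    msg_type=$2
    payload=$(base64 | tr -d '\n')   # base64 wraps long output; unwrap it
    printf '{"topic":"%s","type":"%s","payload_b64":"%s","_bridge_origin":"governed_ros2_bridge"}\n' \
        "$topic" "$msg_type" "$payload"
}
```

For example, the four bytes 00 01 00 00 encode to payload_b64 "AAEAAA==". Since the runtime currently keys policy on topic and kind, these bytes are carried but not yet inspected.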

Native + container dual-path is validated. The same C++ source compiles under both apt install ros-humble-ros-base natively and the adk-ros2-runtime docker image build. The bridge can run either way (Go-side: BridgeProcess.Image empty → native, set → container). End-to-end subscribe → POST → republish is smoke-validated on both paths against 3 different types.

Native execution mode (without Docker)

Some robots can’t run Docker: resource-constrained edge devices, hardened deployments that ban container runtimes, kernels without the right cgroup/namespace support. For those, autonomy ros2 run ships a native dispatch path (--force-native) that runs the ros2 binary as a host subprocess. Before #1124 this was a true reduced-governance path: no container isolation AND no message interception.

Issue #1124 closes the message-interception half. The governed_ros2_bridge C++ binary now ships as a first-class native release artifact, so you can run the FULL --governed-bridge chain without Docker, with per-message DDS governance active in native mode. Container-level workload isolation (namespace / cgroup / seccomp / LD_PRELOAD shim) remains container-only by design. The hardening options fail with ErrHardeningRequiresContainer in native mode because their security boundary depends on the container.

Install the bridge on the host

Three supported install paths (any one of them works):

```
# Option 1 — build from source. Operator runs:
bash scripts/install-governed-bridge.sh        # uses colcon + apt-installed ROS dev pkgs

# Option 2 — bundled subcommand (recommended). ROS distro + arch are
# auto-detected (via $ROS_DISTRO / /opt/ros/*/setup.bash + uname -m):
autonomy ros2 bridge install                   # fetches release tarball, cosign + sha256 verifies, installs
# The AutonomyOps/adk release repo is private, so pass a GitHub token
# with read access to its releases (or export GITHUB_TOKEN / GH_TOKEN):
autonomy ros2 bridge install --token "$GITHUB_TOKEN"

# Option 3 — manual: download the
# governed_ros2_bridge_<version>_<distro>_<arch>.tar.gz tarball from a
# GitHub release, extract under /opt or ~/.local, source install/setup.bash.
```
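For Option 3, the following POSIX sh sketch derives the asset name from the same inputs Option 2 auto-detects. It is a hypothetical helper (`bridge_tarball_name` is an illustrative name). It uses `uname -m` verbatim for the arch, and whether release assets use uname-style names such as x86_64 or aarch64 is an assumption to check against the actual release listing.

```shell
# Hypothetical helper for Option 3: compute the expected release asset name
# governed_ros2_bridge_<version>_<distro>_<arch>.tar.gz.
#   $1  release version, formatted as on the release page
#   $2  ROS distro; defaults to $ROS_DISTRO, then the first /opt/ros/<distro>
#   $3  arch; defaults to `uname -m` verbatim (asset naming is an assumption)
bridge_tarball_name() {
    version=$1
    distro=${2:-${ROS_DISTRO:-}}
    arch=${3:-$(uname -m)}
    if [ -z "$distro" ]; then
        for setup in /opt/ros/*/setup.bash; do
            [ -e "$setup" ] || continue          # glob matched nothing
            distro=$(basename "$(dirname "$setup")")
            break
        done
    fi
    if [ -z "$distro" ]; then
        echo "cannot detect ROS distro; pass it as the second argument" >&2
        return 1
    fi
    printf 'governed_ros2_bridge_%s_%s_%s.tar.gz\n' "$version" "$distro" "$arch"
}
```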

Confirm the binary landed:

```
which governed_ros2_bridge
# /home/<you>/.local/bin/governed_ros2_bridge       (Option 1/2 default)
# or wherever Option 3's setup.bash put it
```

Invoke (the #1124 repro shape)

```
export AUTONOMY_RUN_WAL_DIR=/var/log/autonomy/wal   # see "Verify the WAL"
autonomy ros2 run \
    --force-native \
    --allow-reduced-governance \
    --governed-bridge \
    --bridge-topics /cmd_vel:geometry_msgs/msg/Twist \
    --agent-domain 99 --real-domain 42 \
    topic pub /cmd_vel geometry_msgs/msg/Twist '{linear: {x: 0.0}}' --once
```

--allow-reduced-governance is still required because container-level workload isolation is absent (the audit frame tool.ros2.reduced_governance_accepted still fires for the no-container acknowledgment). The bridge layer is on.

What you should see

A clean run prints these to stderr / stdout, in this order:

1. `[INFO] ros2: native mode with --governed-bridge — per-message DDS interception is ACTIVE …`: the bridge-native notice (#1133 fix18). If you instead see `[WARN] REDUCED-GOVERNANCE native mode … no active interception layer`, the bridge isn’t actually running (check the --governed-bridge flag spelling and that governed_ros2_bridge is on PATH).
2. `PASS ros2-telemetry-active context=native`
3. `PASS ros2-stack-start context=native`
4. `PASS ros2-policy-enforced context=native`
5. `PASS ros2-wal-recording context=native`
6. `PASS governed-bridge-active context=native`: proves the bridge gate fired; absent if the bridge subprocess died before it subscribed.
7. Per intercepted publish, `governed_ros2_bridge: ALLOW topic=/cmd_vel type=geometry_msgs/msg/Twist len=N (republished on real)`: per-message proof that the bridge POSTed to /v1/tool, got ALLOW, and republished on the real DDS domain.
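To check a captured run mechanically, the following POSIX sh sketch looks for the lines listed above. It is a hypothetical helper and not part of autonomy (`check_native_bridge_log` is an illustrative name). It checks only for the presence of each line, not their order. It fails on the REDUCED-GOVERNANCE fallback or on a missing PASS gate, then counts the per-message ALLOW lines.

```shell
# Hypothetical post-run check: feed it the combined stdout/stderr of a
# native --governed-bridge run on stdin. Fails on the REDUCED-GOVERNANCE
# fallback warning or any missing PASS gate; otherwise reports how many
# per-message ALLOW lines it saw. Line order is not checked.
check_native_bridge_log() {
    log=$(cat)
    case $log in
        *"REDUCED-GOVERNANCE native mode"*)
            echo "bridge not active: REDUCED-GOVERNANCE warning present" >&2
            return 1
            ;;
    esac
    for gate in ros2-telemetry-active ros2-stack-start ros2-policy-enforced \
        ros2-wal-recording governed-bridge-active; do
        case $log in
            *"PASS $gate context=native"*) ;;
            *)
                echo "missing gate: PASS $gate context=native" >&2
                return 1
                ;;
        esac
    done
    # grep -c prints 0 and exits 1 when nothing matches; keep going anyway.
    allows=$(printf '%s\n' "$log" | grep -c 'governed_ros2_bridge: ALLOW' || true)
    echo "gates OK; $allows ALLOW line(s)"
}
```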

Verify the WAL

autonomy ros2 run honors AUTONOMY_RUN_WAL_DIR (#1133 fix15) the same way autonomy run does. Without the env var, the WAL goes to an ephemeral /tmp/adk-ros2-wal-* that gets removed on exit — fine for ad-hoc invocations, not OK if you want audit retention.

```
# After the run completes, inspect the recorded decisions:
autonomy wal inspect --dir "$AUTONOMY_RUN_WAL_DIR" --kind autonomy.decision
```

You should see one entry per intercepted publish, with kind=tool.ros2.topic.publish and the bridge metadata carrying origin=governed_ros2_bridge. That’s the auditable record of the DDS governance call — if it’s absent but the stdout ALLOW line is present, your bridge is bypassing the runtime (POST failed or the URL was wrong — check the bridge stderr for the rate-limited “POST failed” line).

Native vs container: what governance you get

|  | container path | native + `--governed-bridge` | native (no bridge) |
| --- | --- | --- | --- |
| DDS message interception | yes (bridge inside container) | yes (bridge as host process) | no |
| WAL audit of each publish | yes | yes | no |
| Namespace + cgroup isolation | yes | no | no |
| seccomp profile | yes (`--seccomp-profile`) | no (`ErrHardeningRequiresContainer`) | no |
| LD_PRELOAD shim | yes | no (host-privilege blast radius) | no |
| `tool.ros2.reduced_governance_accepted` audit frame | not emitted | emitted (no-container acknowledgment) | emitted |

The native + bridge path is the right call when the workload sandboxing layer doesn’t matter for your threat model (you trust the workload binary) but you DO want message-layer governance. If the workload itself is untrusted, the container path is still required for the syscall / fs / network isolation.

Common native-path failure modes

- `governed_ros2_bridge: command not found`: the installer didn’t put it on PATH. Either source install/setup.bash from where you extracted the tarball, or move the binary to /usr/local/bin or ~/.local/bin. Confirm with which governed_ros2_bridge.
- `ros2: governed bridge exited before signaling ready`: the bridge crashed during start (most often: missing libfastrtps.so because ROS 2 isn’t apt-installed). Run governed_ros2_bridge --help manually; if THAT fails, the install is incomplete. Fix the ROS install (apt install ros-<distro>-ros-base) and retry.
- Bridge starts but publishes never appear on the real domain: the --agent-domain and --real-domain IDs match (the bridge needs TWO distinct domains to subscribe and republish across them). Pick two different unused IDs (any pair in 0–101 works on default FastDDS).
- Bridge POSTs fail with connection refused: AutoRuntime didn’t come up. Without AutoRuntime the bridge has no decision channel, and the runtime errors with ErrGovernedBridgeNeedsAutoRuntime before any side effect. Confirm you used autonomy ros2 run (not autonomy run); only the former starts the in-process tool server.

Production hardening

- Use the bridge for the topics you intend to govern, not for all of them. Direct-node POSTs to /v1/tool (no bridge) are also governed and can carry typed envelopes, so split your fleet’s topics between the two paths according to typed-policy needs.
- Stage the bridge behind a non-default policy bundle pinned via --policy <ref>. The embedded embedded:ros2-bridge-demo policy is for demos.
- Layer SROS 2 / DDS-Security as defense-in-depth via --bridge-keystore + --bridge-enclave + --workload-enclave (the three flags are all-or-nothing, enforced before any side effect). Provision the keystore with autonomy ros2 keystore init/mint/permissions. The end-to-end procedure and bypass-resistance verification are in the SROS 2 runbook and SROS 2 quickstart.
- Monitor the bridge container’s stderr for the rate-limited “POST failed” lines (#942 4-E.c: one line per topic per second, not per message). A sustained burst means the runtime listener died or the bridge can’t reach 127.0.0.1; correlate with autonomy wal status.
- QoS by direction. The bridge subscribes on the agent domain with BEST_EFFORT and republishes on the real domain with RELIABLE (VOLATILE durability, depth 10). DDS’s Request-vs-Offered rule: a subscriber’s requested reliability must be <= the publisher’s offered reliability, else they discover each other but exchange no data. BEST_EFFORT on the subscription lets the bridge intercept a workload publishing at either reliability (a RELIABLE subscription would silently never see a BEST_EFFORT workload); RELIABLE on the republisher reaches a subscriber requesting either RELIABLE (e.g. Isaac Sim’s Carter on /cmd_vel) or BEST_EFFORT. (rclcpp::QoS(depth) already defaults to RELIABLE, so this only flips the subscription and makes both explicit.)
- When you see governed ALLOW lines but the robot doesn’t move. The verdict reached the bridge and it republished locally, but the real subscriber isn’t receiving. Check, in order: (1) the real subscriber’s QoS, via ros2 topic info -v /cmd_vel on the real domain, since a durability mismatch (e.g. a TRANSIENT_LOCAL/latched subscriber vs the bridge’s VOLATILE) blocks data; (2) that the bridge’s real-domain participant is actually discovered by the real subscriber across the container boundary, which is the likely remaining cause when QoS is compatible and both are on the same domain (tracked in #1230).
- DDS interop note. Container-to-container DDS via --network host --ipc host works reliably (it’s what autonomy demo ros2-bridge uses), and a plain ros2 topic pub from another such container reaches a host subscriber. If the bridge’s republisher still isn’t received by a subscriber in a different container / on the host even with compatible QoS and matching domain, it is a FastDDS discovery/transport boundary issue (shared-memory segment visibility, or locator announcement across the container edge). Keep the bridge and the real subscriber reachable over the same transport (shared /dev/shm via --ipc host, or a UDP-capable interface), and rule out loopback-only discovery (lo has no MULTICAST by default). This is the open half of #1230.
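The Request-vs-Offered rule quoted above can be checked mechanically when comparing the endpoints shown by ros2 topic info -v. The following POSIX sh checker is hypothetical (`qos_rank` and `qos_compatible` are illustrative names). It covers only the two policies this page discusses, compares like with like (reliability against reliability, durability against durability), and ignores deadline, liveliness, and other policies.

```shell
# Hypothetical checker for the DDS Request-vs-Offered rule: data flows only
# if the subscriber's requested level is <= the publisher's offered level.
# Compare like with like: reliability (BEST_EFFORT < RELIABLE) or
# durability (VOLATILE < TRANSIENT_LOCAL). Other policies are out of scope.
qos_rank() {
    case $1 in
        BEST_EFFORT | VOLATILE) echo 0 ;;
        RELIABLE | TRANSIENT_LOCAL) echo 1 ;;
        *) echo "unknown QoS value: $1" >&2; return 1 ;;
    esac
}

# Usage: qos_compatible <requested-by-subscriber> <offered-by-publisher>
qos_compatible() {
    req=$(qos_rank "$1") || return 2
    off=$(qos_rank "$2") || return 2
    [ "$req" -le "$off" ]
}
```

Applied to the bridge, `qos_compatible BEST_EFFORT <offered>` passes for either workload reliability, and `qos_compatible <requested> RELIABLE` passes for either real-side subscriber. `qos_compatible TRANSIENT_LOCAL VOLATILE` fails, which matches the latched-subscriber durability mismatch above.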

Reference

- runtime/ros2/runner.go: RunGoverned, RunOptions.GovernedBridge, and the ErrGovernedBridgeNeedsAutoRuntime/ErrSameDomain/ErrBridgeExitedBeforeReady/ErrRuntimeURLRequired sentinels.
- runtime/ros2bridge/bridge_process.go: BridgeProcess.Run, env contract, IPCMode=host/NetworkMode=host wiring.
- runtime/ros2bridge/bridge.go: BridgeOriginRouter, BridgeRoutableKinds, the spoof-resistance contract.
- demo/ros2-runtime/ros2_ws/src/governed_ros2_bridge/: C++ rclcpp source.
- Tutorial walkthrough: ROS 2 Governed Bridge Quickstart.
- Launch-level governance reference: ROS 2 Governance.