ROS 2 SROS 2 / DDS-Security for the Governed Bridge¶

Audience: operators turning on SROS 2 / DDS-Security as a defense-in-depth layer on top of the application-level governed bridge. With the bridge alone, isolation between the agent and the real DDS domain relies only on the two sides using different ROS_DOMAIN_ID values: an adversary with code execution on the agent can call rclcpp::InitOptions::set_domain_id() and join the real domain ungoverned. SROS 2 makes that adversary’s traffic invisible to secured participants even if they pick the right domain ID: each participant must present a per-identity certificate that chains to the keystore CA AND a signed permissions document granting the topic it wants to publish, both enforced at the DDS layer.

This page tells you how to provision the keystore, attach it to the bridge, recover from common errors, and verify the bypass-resistance claim end-to-end.

Walking through it first? Start at ROS 2 SROS 2 Quickstart; it runs autonomy ros2 keystore init / mint / permissions end-to-end and proves bypass-resistance via the in-tree regression test. The runbook below assumes you already understand the application-layer bridge from its runbook.

Prerequisites¶

ros-humble-ros-base installed on the operator’s host (provides ros2 security create_keystore / create_enclave / create_permission). Keystore provisioning is a host operation — the durable secret material (CA private key, per-node identity keys) lives on the operator’s filesystem, not inside an ephemeral container.
openssl on PATH. Used for the multi-domain governance/permissions re-sign path; it ships with most standard Linux installs. The in-CI regression test (TestResignPermissions_ProducesVerifiableSMIME) checks that the produced signatures verify against the keystore CA in both directions.
An understanding of the application-layer bridge — what it does, why agent ≠ real, the --governed-bridge flag. SROS 2 is defense-in-depth on the bridge, not a standalone substitute. See ROS 2 Governed Bridge runbook first.

Mental model¶

| Layer | Enforcement | What attacks it stops |
| --- | --- | --- |
| Bridge (app-level) | Per-message POST to `/v1/tool`; policy decides allow/deny | Benign agent publishes; bad-actor agent that obeys `ROS_DOMAIN_ID` isolation |
| SROS 2 (this page) | Per-identity cert + signed permissions; rcl + DDS-Security reject mismatches at participant join + datawriter creation | Adversary that escapes `ROS_DOMAIN_ID` isolation via `rclcpp::InitOptions::set_domain_id()` |

The two layers compose: the bridge mediates allowed traffic per-message; SROS 2 ensures only credentialed participants can even attempt to publish on the real domain in the first place.

Procedure¶

1. Pick the DDS domains your enclaves will cover¶

SROS 2 has TWO files that hardcode DDS domain IDs (sros2 generates both for domain 0 by default):

governance.xml — keystore-wide; <domain_rule> lists which DDS domain IDs are governable at all. A participant joining a domain not listed here is rejected with Could not find domain X in governance (code: 141).
permissions.xml — per-enclave; <allow_rule>/<domains> lists the domain each grant applies to. A participant on a domain not granted is rejected with Not found a rule allowing to use the domain_id.

autonomy ros2 keystore init --domain rewrites governance.xml and re-signs it; autonomy ros2 keystore permissions --domain does the same for an enclave's permissions.xml (repeat --domain for multi-domain grants). The bridge runs on both the agent and real domains in one process, so list BOTH domains in BOTH files.
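To see which domain IDs a given governance.xml or permissions.xml actually lists, a quick grep is usually enough. The sketch below assumes the files use the `<domains><id>N</id></domains>` form that sros2 generates; it does not handle `<id_range>` elements, and `list_domain_ids` is a hypothetical helper, not part of the autonomy CLI.

```shell
# Hypothetical helper: print the unique DDS domain IDs listed in a
# governance.xml or permissions.xml, one per line, sorted numerically.
# ASSUMPTION: domains appear as <id>N</id>; <id_range> is not handled.
list_domain_ids() {
    grep -o '<id>[0-9][0-9]*</id>' "$1" | tr -cd '0-9\n' | sort -nu
}

# Demo against a throwaway file shaped like a two-domain governance rule.
demo_xml=$(mktemp)
cat > "$demo_xml" <<'EOF'
<dds><domain_access_rules><domain_rule>
  <domains><id>42</id><id>99</id></domains>
</domain_rule></domain_access_rules></dds>
EOF
list_domain_ids "$demo_xml"
```

Run it against both `$KEYSTORE/enclaves/governance.xml` and the bridge enclave's permissions.xml; for the bridge, both should list the agent and the real domain.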

2. Provision the keystore¶

# Keystore root — operator filesystem; treat as secret material.
KEYSTORE=/var/lib/autonomyops/ros2-keystore

# Step 1: create the keystore + governance.xml covering both bridge domains.
autonomy ros2 keystore init "$KEYSTORE" --domain 42 --domain 99

# Step 2: mint per-identity enclaves.
#   - bridge runs on both domains under ONE enclave (ROS_SECURITY_ENCLAVE_OVERRIDE)
#   - each workload subprocess gets its OWN enclave (defense-in-depth:
#     a compromised workload can't impersonate the bridge to publish on real)
autonomy ros2 keystore mint --keystore "$KEYSTORE" /governed_ros2_bridge_real
autonomy ros2 keystore mint --keystore "$KEYSTORE" /demo_robot/arm_controller

# Step 3: synthesize permissions XML per enclave.
#   - bridge enclave: BOTH domains (covers both rclcpp::Context), both
#     directions on every workload topic (bridge subs on agent, pubs on real)
#   - quote the topic lists so the shell never glob-expands the trailing *
autonomy ros2 keystore permissions /governed_ros2_bridge_real \
    --keystore "$KEYSTORE" \
    --domain 42 --domain 99 \
    --publish   '/cmd_vel,/cmd_vel/*' \
    --subscribe '/cmd_vel,/cmd_vel/*'

#   - workload enclave: ONE domain (agent), publishes only what the workload
#     legitimately produces; subscribes to what the bridge republishes back
autonomy ros2 keystore permissions /demo_robot/arm_controller \
    --keystore "$KEYSTORE" \
    --domain 99 \
    --publish  /cmd_vel \
    --subscribe /cmd_vel

Preferred: --from-bundle (#938 3-C.1) — instead of re-typing --publish/--subscribe lists that have to stay in sync with the bundle’s Rego rules, point at the bundle and let the command read the ros2_topics:{publish,subscribe} block out of its manifest.json (schema v1.4+). One source of truth — when the bundle changes its declared surface, the permissions follow without operator edits:
# Bridge enclave covering both domains, topic list resolved from the
# demo bundle's manifest.
autonomy ros2 keystore permissions /governed_ros2_bridge_real \
    --keystore "$KEYSTORE" \
    --domain 42 --domain 99 \
    --from-bundle demo/bundles/ros2-bridge.tar
--from-bundle is mutually exclusive with --publish/--subscribe — mixing would silently widen the bundle’s declared surface, defeating the point of using the bundle as source of truth. Accepts either a .tar bundle (autonomy bundle pull output) or a directory containing manifest.json (the demo-bundle shape). Bundles minted at schema_version < 1.4 don’t carry the block — fall back to explicit --publish/--subscribe until the manifest is bumped.

Layout under $KEYSTORE after all three steps:

$KEYSTORE/
├── public/          ← CA cert (operator-readable, fleet-distributable)
│   ├── ca.cert.pem                ← the single self-signed CA cert
│   ├── identity_ca.cert.pem       → symlink to ca.cert.pem
│   └── permissions_ca.cert.pem    → symlink to ca.cert.pem
├── private/         ← CA key (treat as secret material)
│   ├── ca.key.pem                 ← the single self-signed CA key
│   ├── identity_ca.key.pem        → symlink to ca.key.pem
│   └── permissions_ca.key.pem     → symlink to ca.key.pem
└── enclaves/
    ├── governance.xml              ← rewritten to allow domains 42 + 99
    ├── governance.p7s              ← re-signed via openssl smime
    ├── governed_ros2_bridge_real/
    │   ├── cert.pem                ← bridge identity cert
    │   ├── key.pem                 ← bridge identity key
    │   ├── permissions.xml         ← topics × both domains
    │   └── permissions.p7s         ← signed via openssl smime
    └── demo_robot/arm_controller/
        ├── cert.pem
        ├── key.pem
        ├── permissions.xml         ← topics × one domain
        └── permissions.p7s

One CA, two roles. autonomy ros2 keystore init mints a single self-signed CA (ca.cert.pem / ca.key.pem) and exposes it under both the identity_ca.* and permissions_ca.* names via symlinks — matching sros2’s own create_keystore layout. Both the identity certs (per-enclave cert.pem) and the permissions signatures (permissions.p7s) chain to that one CA. Point verification/re-signing tooling at either name; they resolve to the same key material.
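The "signatures chain to the one CA" claim can be checked by hand with plain openssl. The sketch below builds a throwaway CA, signs a stand-in permissions document, and verifies it. The exact signing options the autonomy CLI uses are not stated here, so treat the `openssl smime -sign` line as an assumption; `verify_p7s` is a hypothetical helper.

```shell
# Throwaway workspace; nothing here touches a real keystore.
work=$(mktemp -d)

# 1. Self-signed CA standing in for ca.cert.pem / ca.key.pem.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
    -subj "/CN=demo-keystore-ca" \
    -keyout "$work/ca.key.pem" -out "$work/ca.cert.pem" 2>/dev/null

# 2. Sign a stand-in permissions document as S/MIME (assumed options).
printf '<dds><permissions/></dds>\n' > "$work/permissions.xml"
openssl smime -sign -text -in "$work/permissions.xml" \
    -signer "$work/ca.cert.pem" -inkey "$work/ca.key.pem" \
    -out "$work/permissions.p7s"

# 3. Hypothetical helper: succeed only if the .p7s verifies against the CA.
verify_p7s() {
    openssl smime -verify -in "$1" -CAfile "$2" -out /dev/null 2>/dev/null
}
verify_p7s "$work/permissions.p7s" "$work/ca.cert.pem" && echo "signature OK"
```

Against a real keystore, the same verify call would take an enclave's permissions.p7s and `$KEYSTORE/public/permissions_ca.cert.pem` (or identity_ca.cert.pem, since both resolve to the same certificate).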

2a. Re-running, verifying, and re-provisioning (#1241)¶

The keystore lifecycle is repeatable:

init is idempotent. Re-running autonomy ros2 keystore init "$KEYSTORE" --domain … on an existing keystore reuses it — the CA is left untouched and only the --domain governance is re-applied. mint and permissions are likewise safe to re-run. So a rebuild/CI flow no longer needs to rm -rf the keystore first.
Verify completeness before launch. autonomy ros2 keystore verify --keystore "$KEYSTORE" reports, per component, whether the keystore is complete: CA, governance, and every enclave’s cert.pem / key.pem / permissions.xml / permissions.p7s. It also catches the most common trap — an enclave minted but whose permissions step never ran: a fresh enclave carries only the sros2 default permissions (DDS domain 0), so on a keystore whose governance covers 42/99 it fails closed. verify flags that as INCOMPLETE — permissions don't cover domain(s) [42 99] and exits non-zero, so it fits a pre-launch gate:
```
autonomy ros2 keystore verify --keystore "$KEYSTORE" || exit 1
```
Re-provisioning the CA is a FLEET-WIDE RESTART, not a live refresh. init --force regenerates the CA + all identity material. Every participant still running against the OLD keystore will FAIL the DDS-Security handshake against the new material (old and new certs don’t chain to the same CA), silently splitting the graph. Only use --force when you can restart every participant in the graph. To add an enclave or update a topic surface, do NOT --force — just mint / permissions again (idempotent, CA preserved).

2b. Provision a whole graph from a manifest + distribute per-node views (#1244)¶

Steps 1–2 above are per-enclave hand-scripting. For a multi-node graph — and especially for real key isolation — declare the whole graph in one manifest and let provision mint, grant, verify, and write a minimal per-node view. This matters because a single shared keystore lets any node read every node’s private key; each node should get only its own material.

# graph.yaml
keystore: /var/lib/autonomyops/ros2-keystore
domains: [42, 99]
enclaves:
  - name: /governed_ros2_bridge_real
    domains: [42, 99]
    publish:   [/cmd_vel, "/cmd_vel/*"]
    subscribe: [/cmd_vel, "/cmd_vel/*"]
  - name: /perception
    domains: [99]
    from_bundle: demo/bundles/perception.tar   # topic surface from the bundle
views:
  - node: perception-container
    enclave: /perception
    out: dist/views/perception

autonomy ros2 keystore provision --manifest graph.yaml

provision inits the keystore (idempotent), mints + permissions each enclave (reusing those commands, from_bundle included), verifies the master, then writes each declared view. A view under out: contains the public CA + governance + that one enclave — and deliberately NOT the CA private key or any other enclave — so the /cmd_vel-capable bridge key never lands in the perception container. Re-running is idempotent (already-minted enclaves are reused); --dry-run prints the plan without touching disk. Each written view is re-verified before it’s declared done.

Distribute dist/views/perception/ to the perception node and point its ROS_SECURITY_KEYSTORE at it (or use autonomy ros2 secure-env / secure-run against the view) — the node joins the graph with only its own key.
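Before shipping a view, it is worth confirming that it really is minimal. The sketch below is a hypothetical check, not a replacement for the re-verification provision already performs. It assumes a view mirrors the keystore layout (public/, private/, enclaves/), which this page does not spell out.

```shell
# Hypothetical check: succeed only if a view carries no CA private key and
# exactly one enclave identity key.
# ASSUMPTION: a view mirrors the keystore layout shown in step 2.
view_is_minimal() {
    view=$1
    # A CA key under private/ means the view leaks signing material.
    if [ -e "$view/private/ca.key.pem" ]; then
        echo "FAIL: CA private key present in $view" >&2
        return 1
    fi
    # Each enclave directory holds one key.pem; a minimal view has exactly one.
    keys=$(find "$view/enclaves" -name key.pem 2>/dev/null | wc -l | tr -d ' ')
    if [ "$keys" -ne 1 ]; then
        echo "FAIL: expected 1 enclave key, found $keys" >&2
        return 1
    fi
    echo "OK: $view is a single-enclave view"
}

# Demo against a throwaway directory shaped like a perception view.
demo_view=$(mktemp -d)
mkdir -p "$demo_view/public" "$demo_view/enclaves/perception"
touch "$demo_view/public/ca.cert.pem" "$demo_view/enclaves/governance.p7s" \
      "$demo_view/enclaves/perception/key.pem"
view_is_minimal "$demo_view"
```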

3. Wire the keystore into the bridge launch¶

Three flags (--bridge-keystore, --bridge-enclave, --workload-enclave) attach the keystore; they are available on both autonomy ros2 run (paid) and autonomy run (CE):

autonomy run \
    --image ghcr.io/autonomyops/adk-ros2-runtime:latest \
    --governed-bridge \
    --agent-domain 99 --real-domain 42 \
    --bridge-topics '/cmd_vel:std_msgs/msg/String' \
    --bridge-keystore /var/lib/autonomyops/ros2-keystore \
    --bridge-enclave  /governed_ros2_bridge_real \
    --workload-enclave /demo_robot/arm_controller \
    ros2.launch launch demo_robot arm_demo.launch.py

What happens, in order:

1. Pre-flight: RunGoverned validates the SROS 2 triple. All three of --bridge-keystore, --bridge-enclave, and --workload-enclave must be set together (else ErrSecurityIncomplete) AND --governed-bridge must be true (else ErrSecurityNeedsGovernedBridge). Failures here surface BEFORE the tool server starts, before the bridge spawns, and before the workload dispatches: fail-closed up front.
2. The runtime binds /v1/tool to a random 127.0.0.1:<port> (same as the non-SROS-2 bridge flow).
3. The bridge container is spawned with:
   - ROS_SECURITY_KEYSTORE=<keystore>
   - ROS_SECURITY_ENABLE=true
   - ROS_SECURITY_STRATEGY=Enforce (NOT Permissive; Permissive would log and allow an unenrolled participant, defeating defense-in-depth)
   - ROS_SECURITY_ENCLAVE_OVERRIDE=/governed_ros2_bridge_real
   - Keystore bind-mounted read-only at the same host path so the in-container ROS_SECURITY_KEYSTORE value resolves
4. The launched workload subprocess gets the same env shape, but with ROS_SECURITY_ENCLAVE_OVERRIDE=/demo_robot/arm_controller, an intentionally separate identity so a compromised workload can’t impersonate the bridge.
5. The workload container also gets the keystore bind-mounted read-only.
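The pre-flight gate is easy to reason about as a small decision function. The sketch below is illustrative only: the real checks live in RunGoverned, `check_security_flags` is a hypothetical name, and the order in which the two errors are evaluated is an assumption.

```shell
# Sketch of the all-or-nothing gate.
# Arguments: keystore, bridge enclave, workload enclave, governed-bridge flag.
# ASSUMPTION: the incomplete-triple check runs before the bridge check.
check_security_flags() {
    keystore=$1; bridge_enclave=$2; workload_enclave=$3; governed=$4
    set_count=0
    for value in "$keystore" "$bridge_enclave" "$workload_enclave"; do
        [ -n "$value" ] && set_count=$((set_count + 1))
    done
    case $set_count in
        0) echo "ok: security disabled"; return 0 ;;
        3) ;;  # fully specified; continue to the bridge check
        *) echo "ErrSecurityIncomplete"; return 1 ;;
    esac
    if [ "$governed" != "true" ]; then
        echo "ErrSecurityNeedsGovernedBridge"; return 1
    fi
    echo "ok: security enabled"
}

check_security_flags /ks /bridge /workload true
check_security_flags /ks ""      /workload true  || true
check_security_flags /ks /bridge /workload false || true
```

The three calls cover a complete triple, a missing --bridge-enclave, and a complete triple without --governed-bridge.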

3a. Large image topics: `--large-data` (perception robots)¶

Under Enforce, DDS-Security encrypts every fragment, and the default FastDDS SHM segment + socket buffers are too small to hold multi-MB image frames — so camera topics (RGB ~6 MB, depth ~8 MB) are dropped almost entirely (~0.1 fps), while small topics (/cmd_vel, /tf) are fine. The secure path looks frozen for perception robots out of the box.

Add --large-data to point the bridge and the launched workload at a tuned FastDDS transport profile (64 MB SHM segment, 16 MB socket buffers, ASYNCHRONOUS publish). In HIL this restored a secured 1080p rgb8@30Hz stream from 0.1 → 22.8 fps (228×):

autonomy ros2 run \
    --image ghcr.io/autonomyops/adk-ros2-runtime:latest \
    --governed-bridge --large-data \
    --agent-domain 99 --real-domain 42 \
    --bridge-topics '/camera/image:sensor_msgs/msg/Image' \
    --bridge-keystore /var/lib/autonomyops/ros2-keystore \
    --bridge-enclave /governed_ros2_bridge_real \
    --workload-enclave /demo_robot/perception \
    launch demo_robot perception.launch.py

--large-data sets FASTRTPS_DEFAULT_PROFILES_FILE to /opt/autonomyops/fastdds-large-data.xml, which is baked into the adk-ros2-runtime image. Both ends need the profile — the flag applies it to the bridge and the workload together; a peer outside the launch (e.g. an Isaac Sim publisher) must point at the same profile via FASTRTPS_DEFAULT_PROFILES_FILE. On the native path (--force-native), install the profile at that path (it ships at demo/ros2-runtime/fastdds-large-data.xml) or set the env var yourself.
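The frame sizes quoted in this section follow from simple arithmetic, which also shows how few frames fit in the tuned 64 MB segment. This is a back-of-the-envelope sketch; the 4-bytes-per-pixel depth format is an assumption, since the page only gives the ~8 MB figure.

```shell
# Bytes per frame = width * height * bytes-per-pixel.
frame_bytes() { echo $(( $1 * $2 * $3 )); }

rgb=$(frame_bytes 1920 1080 3)     # rgb8: 3 bytes per pixel
depth=$(frame_bytes 1920 1080 4)   # ASSUMPTION: 32-bit depth, 4 bytes per pixel
echo "rgb8 frame:  $rgb bytes"
echo "depth frame: $depth bytes"

# Sustained payload at 30 Hz, in whole MB per second (1 MB = 1,000,000 bytes).
echo "rgb8 @ 30 Hz: $(( rgb * 30 / 1000000 )) MB/s"

# Whole rgb8 frames that fit in a 64 MB (64 * 1024 * 1024 bytes) SHM segment.
echo "rgb8 frames per 64 MB segment: $(( 64 * 1024 * 1024 / rgb ))"
```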

4. Confirm the loop is closed¶

In a second terminal, watch the bridge container’s stderr:

docker logs -f $(docker ps -q --filter ancestor=ghcr.io/autonomyops/adk-ros2-runtime:latest)

You should see:

[INFO] [rcl]: Found security directory: /var/lib/autonomyops/ros2-keystore/enclaves/governed_ros2_bridge_real
[INFO] [rcl]: Found security directory: /var/lib/autonomyops/ros2-keystore/enclaves/governed_ros2_bridge_real
governed_ros2_bridge: ready  agent_domain=99  real_domain=42  topics=/cmd_vel:std_msgs/msg/String  runtime_url=http://127.0.0.1:<port>

Two “Found security directory” lines (one per rclcpp::Context: agent + real) are normal. The ready line means both contexts successfully created their participants AND created their pub/sub on /cmd_vel AFTER passing the DDS-Security validation gates. The application-layer /v1/tool POST loop continues as before — SROS 2 doesn’t change the message flow, only the participant admission.
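For a scripted bring-up, the same visual check can be run against a captured log. The sketch below is a hypothetical helper; it assumes the log lines appear exactly as in the expected output above.

```shell
# Hypothetical helper: succeed only if a captured bridge log shows two
# "Found security directory" lines (one per context) and the ready line.
bridge_log_ready() {
    log=$1
    # grep -c prints 0 and exits non-zero when nothing matches; keep the count.
    found=$(grep -c 'Found security directory' "$log" || true)
    if [ "$found" -lt 2 ]; then
        echo "only $found security-directory line(s); expected 2" >&2
        return 1
    fi
    grep -q '^governed_ros2_bridge: ready' "$log"
}

# Demo against a throwaway log shaped like the expected output.
demo_log=$(mktemp)
cat > "$demo_log" <<'EOF'
[INFO] [rcl]: Found security directory: /var/lib/autonomyops/ros2-keystore/enclaves/governed_ros2_bridge_real
[INFO] [rcl]: Found security directory: /var/lib/autonomyops/ros2-keystore/enclaves/governed_ros2_bridge_real
governed_ros2_bridge: ready  agent_domain=99  real_domain=42
EOF
bridge_log_ready "$demo_log" && echo "bridge admitted on both domains"
```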

5. Diagnose silent failures up front: `ros2 doctor` + run preflight (#1242)¶

The hardest secured-graph failures are silent: participants discover each other but deliver zero messages, with no error anywhere. The classic trap is FastDDS version skew — e.g. Isaac Sim’s bundled ROS 2 ships FastDDS 2.6.10 while the host stack is 2.6.11; under Enforce the DDS-Security handshake never completes across that boundary, so traffic never reaches the secured graph and nothing logs an error. (The fix there is ISAAC_USE_SYSTEM_ROS=1 to point Isaac at the host’s ROS 2.)

Run the doctor before bring-up to catch these as actionable lines instead of a frozen demo:

autonomy ros2 doctor --keystore /var/lib/autonomyops/ros2-keystore --large-data

It checks, from pure filesystem + env inspection (no ros2/docker needed):

SROS 2 security env — ROS_SECURITY_ENABLE/STRATEGY/KEYSTORE are coherent (Enforce + a keystore, not Permissive or keystore-less).
SROS 2 keystore — the keystore is structurally complete + signed (same check as autonomy ros2 keystore verify).
RMW implementation — RMW_IMPLEMENTATION is rmw_fastrtps_cpp (DDS-Security in the governed stack is FastDDS-specific).
FastDDS version — reports the locally-sourced FastDDS version so you can match it across every participant; a peer on a different version silently fails the Enforce handshake.
Large-data transport — the large-data FastDDS profile is installed so encrypted image topics aren’t silently dropped (see 3a).

autonomy ros2 doctor --output json emits a stable schema for CI gating; the command exits non-zero if any check FAILs (SKIPs for unconfigured checks don’t count).

The same checks run automatically as a preflight before every autonomy ros2 run --governed-bridge dispatch: advisory problems (version-skew heads-up, RMW, large-data profile) print as preflight WARN lines, and a demonstrably incomplete keystore fails the run closed — rather than starting a graph that can only deliver zero messages. Pass --skip-preflight to dispatch anyway.
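To make the "coherent env" idea concrete, here is a minimal sketch of the kind of inspection involved. It is not the doctor's implementation: `check_security_env` is a hypothetical name, and treating RMW_IMPLEMENTATION as advisory is an assumption borrowed from the preflight behaviour described above.

```shell
# Sketch of an env coherence check. Reads the current environment and
# prints one line per problem found.
check_security_env() {
    status=0
    if [ "${ROS_SECURITY_ENABLE:-}" != "true" ]; then
        echo "FAIL: ROS_SECURITY_ENABLE is not 'true'"; status=1
    fi
    if [ "${ROS_SECURITY_STRATEGY:-}" != "Enforce" ]; then
        echo "FAIL: ROS_SECURITY_STRATEGY is not 'Enforce'"; status=1
    fi
    if [ ! -d "${ROS_SECURITY_KEYSTORE:-/nonexistent}" ]; then
        echo "FAIL: ROS_SECURITY_KEYSTORE is not a directory"; status=1
    fi
    # ASSUMPTION: advisory only, so it does not change the exit status.
    if [ "${RMW_IMPLEMENTATION:-}" != "rmw_fastrtps_cpp" ]; then
        echo "WARN: RMW_IMPLEMENTATION is not rmw_fastrtps_cpp"
    fi
    [ "$status" -eq 0 ] && echo "PASS: security env is coherent"
    return "$status"
}

check_security_env || echo "fix the FAIL lines above before bring-up"
```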

6. Launch individual nodes under an enclave: `secure-env` / `secure-run` (#1245)¶

Every node in the secured graph needs the same four env vars hand-injected to join the SROS 2 partition — ROS_SECURITY_ENABLE / _STRATEGY / _KEYSTORE / _ENCLAVE_OVERRIDE — plus, for perception nodes, the large-data profile. Doing that by hand per node is error-prone: an eval-timing slip can bring a node up silently unsecured (Enforce never applied, no error). Two commands make it a one-liner and fail closed on that trap — they verify the keystore + enclave are complete BEFORE emitting env or launching, so a node never starts against a keystore that would leave it unsecured.

secure-env prints the env block (for env $(...), eval "$(... -o shell)", or docker run -e ...):

env $(autonomy ros2 secure-env \
    --keystore /var/lib/autonomyops/ros2-keystore \
    --enclave /demo_robot/arm_controller) \
  ros2 run demo_nodes_cpp talker

secure-run is the thin wrapper — it verifies, injects the env, then execs the command (separate it with -- so the command keeps its own flags):

autonomy ros2 secure-run \
    --keystore /var/lib/autonomyops/ros2-keystore \
    --enclave /demo_robot/arm_controller --large-data \
    -- ros2 run demo_nodes_cpp talker

Both refuse to proceed if the keystore is incomplete or the named enclave is missing / not fully provisioned — the “is this node actually secured?” question answered before launch, not via tr '\0' '\n' < /proc/<pid>/environ. secure-run runs the command natively (the subprocess inherits the injected env); compose it with autonomy ros2 run --governed-bridge for per-message governance.
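The verify-then-emit behaviour can be pictured as follows. This is a sketch of the idea, not the secure-env implementation: `emit_secure_env` is a hypothetical function, and the real command's output format and completeness checks may differ.

```shell
# Sketch of verify-then-emit: print the four SROS 2 variables only if the
# enclave directory holds all four files from the keystore layout.
emit_secure_env() {
    keystore=$1; enclave=$2
    dir="$keystore/enclaves$enclave"   # /a/b maps to <keystore>/enclaves/a/b
    for f in cert.pem key.pem permissions.xml permissions.p7s; do
        if [ ! -f "$dir/$f" ]; then
            echo "refusing to emit env: missing $dir/$f" >&2
            return 1
        fi
    done
    echo "ROS_SECURITY_ENABLE=true"
    echo "ROS_SECURITY_STRATEGY=Enforce"
    echo "ROS_SECURITY_KEYSTORE=$keystore"
    echo "ROS_SECURITY_ENCLAVE_OVERRIDE=$enclave"
}

# Demo: an enclave directory with no material in it fails closed.
demo_ks=$(mktemp -d)
mkdir -p "$demo_ks/enclaves/demo_robot/arm_controller"
emit_secure_env "$demo_ks" /demo_robot/arm_controller \
    || echo "fail-closed: nothing emitted"
```

Because nothing is printed on failure, a caller using `env $(...)` starts the node with no security variables only if it ignores the non-zero exit status, which is the trap the real commands are designed to close.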

Troubleshooting¶

`Could not find domain X in governance (code: 141)`¶

The bridge’s participant is trying to join a DDS domain that isn’t listed in <keystore>/enclaves/governance.xml. sros2’s default create_keystore only covers domain 0. You probably ran autonomy ros2 keystore init without --domain flags.

Fix: re-init the keystore covering both domains:

autonomy ros2 keystore init "$KEYSTORE" --domain 42 --domain 99

(This is idempotent on the CA material — it only rewrites the governance.xml + re-signs the governance.p7s.)

`Not found a rule allowing to use the domain_id`¶

The participant is on a domain governance allows, but the enclave’s permissions.xml doesn’t grant access on that domain. Re-run autonomy ros2 keystore permissions with both --domain flags for any enclave the bridge uses across multiple domains:

autonomy ros2 keystore permissions /governed_ros2_bridge_real \
    --keystore "$KEYSTORE" \
    --domain 42 --domain 99 \
    --publish   '/cmd_vel,/cmd_vel/*' \
    --subscribe '/cmd_vel,/cmd_vel/*'

`rt/<topic> topic not found in allow rule (check_create_datawriter)`¶

The enclave has permissions for some topics but not the one the bridge tries to publish or subscribe on. ROS 2 mangles topic names before they reach DDS: /cmd_vel becomes rt/cmd_vel on the wire, which is why the error names rt/<topic>. Verify your --publish / --subscribe lists cover BOTH directions the bridge needs (it subscribes on agent and publishes on real), then re-run autonomy ros2 keystore permissions with the missing topic.
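A small sketch makes the name mapping and the pattern matching visible. Both helpers are hypothetical, and shell `case` globbing is only an approximation of the matching DDS-Security applies to topic expressions. It also shows why the examples on this page grant both `/cmd_vel` and `/cmd_vel/*`.

```shell
# ROS topic name -> DDS topic name: ROS 2 prepends "rt" to the topic path.
dds_topic_name() { echo "rt$1"; }

# Succeed if DDS topic $1 matches any of the remaining glob patterns.
# ASSUMPTION: shell globbing stands in for the DDS-side pattern matching.
topic_allowed() {
    topic=$1; shift
    for pattern in "$@"; do
        case $topic in
            $pattern) return 0 ;;
        esac
    done
    return 1
}

dds_topic_name /cmd_vel
topic_allowed "$(dds_topic_name /cmd_vel)" 'rt/cmd_vel' 'rt/cmd_vel/*' \
    && echo "/cmd_vel: allowed"
# The wildcard alone does not cover the bare topic.
topic_allowed "$(dds_topic_name /cmd_vel)" 'rt/cmd_vel/*' \
    || echo "/cmd_vel: denied when only rt/cmd_vel/* is granted"
```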

`participant denied by default rule (code: 145)`¶

The participant’s identity certificate’s subject_name doesn’t match any <grant> in the permissions.xml. Almost always a stale or mismatched ROS_SECURITY_ENCLAVE_OVERRIDE env vs the enclave the operator minted. Check that the enclave name passed to --bridge-enclave exactly matches the name passed to autonomy ros2 keystore mint.

`ErrSecurityIncomplete` from RunGoverned¶

Half-configured security flags. The SROS 2 triple (--bridge-keystore + --bridge-enclave + --workload-enclave) is all-or-nothing — any partial set is rejected before any side effect. Either pass all three or none.

`ErrSecurityNeedsGovernedBridge` from RunGoverned¶

You passed SROS 2 flags without --governed-bridge. SROS 2 is defense-in-depth on the bridge, not a standalone substitute. Wiring DDS-Security on the workload subprocess without a bridge actually mediating publishes would govern nothing — the runner refuses this combination explicitly.

Subscriber side: `xmlrpc.client.Fault: !rclpy.ok()`¶

ros2 topic echo / ros2 node list use rclpy (Python) which has a known separate SROS 2 integration issue on Humble that doesn’t affect the bridge’s C++ (rclcpp) path. The bridge itself works; only the Python CLI tooling is broken. For verification, use a C++ subscriber or check the bridge’s log lines directly.

Verifying bypass-resistance¶

The in-tree regression test TestBypassResistance_RogueCannotPublishToSecuredSubscriber proves the load-bearing claim end-to-end on the host:

# from the repo root
source /opt/ros/humble/setup.bash
go test ./cmd/autonomy/commands/... \
    -run TestBypassResistance_RogueCannotPublishToSecuredSubscriber -v

What it does:

Provisions a fresh keystore via the production autonomy ros2 keystore CLI (no shortcuts).
Starts a CREDENTIALED publisher (positive control) and asserts the secured subscriber DOES receive its message within 5s — proves the secured side is a functioning DDS-Security participant, not a dead process. Without this gate, “no rogue data” could mean “no participant”.
Starts an UNCREDENTIALED (rogue) publisher on the same domain WITHOUT any ROS_SECURITY_* env. Asserts the secured subscriber does NOT receive its message — proves DDS-Security rejected the rogue at the discovery / participant-match layer.
Three liveness gates (signal(0) probes) at each phase boundary ensure the subscriber stays alive through both phases. Any failure dumps the sub’s full log so an operator can debug WHY it died.

The test passes on hosts with ros-humble-ros-base installed and skips cleanly on CI runners without ROS 2.
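The signal(0) liveness probe used by the test has a direct shell equivalent, `kill -0`, which delivers no signal and only reports whether the process can still be signalled. A minimal sketch, with `sleep` standing in for the secured subscriber:

```shell
# Signal-0 probe: succeeds while the PID exists and can be signalled.
is_alive() { kill -0 "$1" 2>/dev/null; }

sleep 30 &                      # stand-in for the secured subscriber
sub_pid=$!
is_alive "$sub_pid" && echo "subscriber alive"

kill "$sub_pid"
wait "$sub_pid" 2>/dev/null || true   # reap it; exit status is the signal
is_alive "$sub_pid" || echo "subscriber gone"
```

Probing at each phase boundary is what lets "no rogue message received" be read as "rejected" rather than "the subscriber had already died".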

Reference¶

cmd/autonomy/commands/ros2_keystore.go — init + mint CLIs + the rewriteGovernanceDomains helper.
cmd/autonomy/commands/ros2_keystore_permissions.go — permissions CLI, multi-domain merge, openssl re-sign.
runtime/ros2bridge/bridge_process.go — BridgeProcess.Keystore + EnclaveName fields, ROS_SECURITY_* env injection, keystore bind-mount.
runtime/ros2/runner.go — RunOptions.BridgeKeystore / BridgeEnclave / WorkloadEnclave fields and the all-or-nothing-plus-bridge-required gates.
SROS 2 Quickstart — walkthrough of the provisioning + launch flow with the in-tree regression test.
ROS 2 Governed Bridge runbook — the application-layer bridge SROS 2 layers on top of.
