ROS 2 SROS 2 / DDS-Security for the Governed Bridge

Audience: operators enabling SROS 2 / DDS-Security as a defense-in-depth layer on top of the application-level governed bridge. With the bridge alone, isolation between the agent and real DDS domains rests only on the two sides using different ROS_DOMAIN_ID values. An adversary with code execution on the agent can call rclcpp::InitOptions::set_domain_id() to select the real domain and join it ungoverned. SROS 2 keeps that adversary's traffic out even if they pick the right domain ID. Each participant must present an identity certificate that chains to the keystore CA AND a signed permissions document granting the topics it wants to use, and both checks are enforced at the DDS layer.

This page tells you how to provision the keystore, attach it to the bridge, recover from common errors, and verify the bypass-resistance claim end-to-end.

First time through? Start with the ROS 2 SROS 2 Quickstart. It runs autonomy ros2 keystore init / mint / permissions end-to-end and proves bypass-resistance via the in-tree regression test. The runbook below assumes you already understand the application-layer bridge from its own runbook.

Prerequisites

  • ros-humble-ros-base installed on the operator’s host (provides ros2 security create_keystore / create_enclave / create_permission). Keystore provisioning is a host operation — the durable secret material (CA private key, per-node identity keys) lives on the operator’s filesystem, not inside an ephemeral container.

  • openssl on PATH. It is used on the multi-domain governance/permissions re-sign path and ships with most standard Linux distributions. The in-CI regression test (TestResignPermissions_ProducesVerifiableSMIME) checks that the produced signatures verify against the keystore CA in both directions.

  • An understanding of the application-layer bridge: what it does, why agent ≠ real, and the --governed-bridge flag. SROS 2 is defense-in-depth on the bridge, not a standalone substitute. Read the ROS 2 Governed Bridge runbook first.

Mental model

| Layer | Enforcement | What attacks it stops |
| --- | --- | --- |
| Bridge (app-level) | Per-message POST to /v1/tool; policy decides allow/deny | Disallowed publishes from a benign agent, or from a bad-actor agent that obeys ROS_DOMAIN_ID isolation |
| SROS 2 (this page) | Per-identity cert + signed permissions; rcl + DDS-Security reject mismatches at participant join and datawriter creation | An adversary that escapes ROS_DOMAIN_ID isolation via rclcpp::InitOptions::set_domain_id() |

The two layers compose: the bridge mediates allowed traffic per-message; SROS 2 ensures only credentialed participants can even attempt to publish on the real domain in the first place.

Procedure

1. Pick the DDS domains your enclaves will cover

SROS 2 has TWO files that hardcode DDS domain IDs (sros2 generates both for domain 0 by default):

  • governance.xml — keystore-wide; <domain_rule> lists which DDS domain IDs are governable at all. A participant joining a domain not listed here is rejected with Could not find domain X in governance (code: 141).

  • permissions.xml — per-enclave; <allow_rule>/<domains> lists the domain each grant applies to. A participant on a domain not granted is rejected with Not found a rule allowing to use the domain_id.

autonomy ros2 keystore init --domain rewrites governance.xml and re-signs it; autonomy ros2 keystore permissions --domain does the same for an enclave's permissions.xml. Both flags are repeatable for multi-domain setups. The bridge runs on both the agent and real domains in one process, so list BOTH domains in BOTH files.
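Both files express domain coverage with the same <domains> element from the DDS-Security XML schema, one <id> child per domain. The Go sketch below only illustrates what the multi-domain rewrite has to emit for the bridge's two domains. It is not the autonomy CLI's code, the helper name domainsFragment is hypothetical, and it ignores the <id_range> form that the schema also allows.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// domains models the <domains> element used inside governance.xml
// (<domain_rule>) and permissions.xml (<allow_rule>). Sketch only: the
// schema also allows <id_range> children, which are omitted here.
type domains struct {
	XMLName xml.Name `xml:"domains"`
	IDs     []int    `xml:"id"`
}

// domainsFragment (hypothetical helper) renders one <id> per DDS domain ID.
func domainsFragment(ids ...int) (string, error) {
	out, err := xml.MarshalIndent(domains{IDs: ids}, "", "  ")
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	frag, err := domainsFragment(42, 99) // real + agent domains
	if err != nil {
		panic(err)
	}
	fmt.Println(frag)
}
```

Running it prints a <domains> element containing <id>42</id> and <id>99</id>. The bridge needs that shape in governance.xml and in its own enclave's permissions.xml.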

2. Provision the keystore

# Keystore root — operator filesystem; treat as secret material.
KEYSTORE=/var/lib/autonomyops/ros2-keystore

# Step 1: create the keystore + governance.xml covering both bridge domains.
autonomy ros2 keystore init "$KEYSTORE" --domain 42 --domain 99

# Step 2: mint per-identity enclaves.
#   - bridge runs on both domains under ONE enclave (ROS_SECURITY_ENCLAVE_OVERRIDE)
#   - each workload subprocess gets its OWN enclave (defense-in-depth:
#     a compromised workload can't impersonate the bridge to publish on real)
autonomy ros2 keystore mint --keystore "$KEYSTORE" /governed_ros2_bridge_real
autonomy ros2 keystore mint --keystore "$KEYSTORE" /demo_robot/arm_controller

# Step 3: synthesize permissions XML per enclave.
#   - bridge enclave: BOTH domains (covers both rclcpp::Context), both
#     directions on every workload topic (bridge subs on agent, pubs on real)
autonomy ros2 keystore permissions /governed_ros2_bridge_real \
    --keystore "$KEYSTORE" \
    --domain 42 --domain 99 \
    --publish  '/cmd_vel,/cmd_vel/*' \
    --subscribe '/cmd_vel,/cmd_vel/*'

#   - workload enclave: ONE domain (agent), publishes only what the workload
#     legitimately produces; subscribes to what the bridge republishes back
autonomy ros2 keystore permissions /demo_robot/arm_controller \
    --keystore "$KEYSTORE" \
    --domain 99 \
    --publish  /cmd_vel \
    --subscribe /cmd_vel

Preferred: --from-bundle (#938 3-C.1). Instead of re-typing --publish/--subscribe lists that must stay in sync with the bundle’s Rego rules, point the command at the bundle and let it read the ros2_topics:{publish,subscribe} block from the bundle’s manifest.json (schema v1.4+). That leaves one source of truth: when the bundle changes its declared surface, the permissions follow without operator edits:

# Bridge enclave covering both domains, topic list resolved from the
# demo bundle's manifest.
autonomy ros2 keystore permissions /governed_ros2_bridge_real \
    --keystore "$KEYSTORE" \
    --domain 42 --domain 99 \
    --from-bundle demo/bundles/ros2-bridge.tar

--from-bundle is mutually exclusive with --publish/--subscribe — mixing would silently widen the bundle’s declared surface, defeating the point of using the bundle as source of truth. Accepts either a .tar bundle (autonomy bundle pull output) or a directory containing manifest.json (the demo-bundle shape). Bundles minted at schema_version < 1.4 don’t carry the block — fall back to explicit --publish/--subscribe until the manifest is bumped.
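The resolution rules above (mutual exclusivity with explicit lists, and a 1.4 schema floor) can be sketched in Go. The sketch assumes manifest.json carries a schema_version string such as "1.4" and a ros2_topics object with publish and subscribe arrays. Those key names are inferred from this runbook, not taken from a published schema, and topicsFromManifest is a hypothetical helper, not the CLI's implementation.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// ros2Topics and manifest model only the fields this sketch needs. The JSON
// key names are assumptions drawn from the runbook text.
type ros2Topics struct {
	Publish   []string `json:"publish"`
	Subscribe []string `json:"subscribe"`
}

type manifest struct {
	SchemaVersion string      `json:"schema_version"`
	ROS2Topics    *ros2Topics `json:"ros2_topics"`
}

// atLeast reports whether a "major.minor[.patch]" version is >= major.minor.
func atLeast(v string, major, minor int) bool {
	parts := strings.Split(v, ".")
	if len(parts) < 2 {
		return false
	}
	gotMajor, err1 := strconv.Atoi(parts[0])
	gotMinor, err2 := strconv.Atoi(parts[1])
	if err1 != nil || err2 != nil {
		return false
	}
	return gotMajor > major || (gotMajor == major && gotMinor >= minor)
}

// topicsFromManifest (hypothetical) resolves the publish/subscribe surface
// from manifest.json bytes. It refuses explicit lists alongside the bundle,
// so the bundle's declared surface can never be silently widened.
func topicsFromManifest(raw []byte, explicitPub, explicitSub []string) (ros2Topics, error) {
	if len(explicitPub) > 0 || len(explicitSub) > 0 {
		return ros2Topics{}, errors.New("--from-bundle cannot be combined with --publish/--subscribe")
	}
	var m manifest
	if err := json.Unmarshal(raw, &m); err != nil {
		return ros2Topics{}, fmt.Errorf("parse manifest.json: %w", err)
	}
	if !atLeast(m.SchemaVersion, 1, 4) || m.ROS2Topics == nil {
		return ros2Topics{}, fmt.Errorf("schema_version %q carries no ros2_topics block; pass --publish/--subscribe instead", m.SchemaVersion)
	}
	return *m.ROS2Topics, nil
}

func main() {
	raw := []byte(`{"schema_version":"1.4","ros2_topics":{"publish":["/cmd_vel"],"subscribe":["/cmd_vel"]}}`)
	topics, err := topicsFromManifest(raw, nil, nil)
	if err != nil {
		panic(err)
	}
	fmt.Println("publish:", topics.Publish, "subscribe:", topics.Subscribe)
}
```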

Layout under $KEYSTORE after all three steps:

$KEYSTORE/
├── public/          ← CA certs (operator-readable, fleet-distributable)
│   ├── identity_ca.cert.pem
│   └── permissions_ca.cert.pem
├── private/         ← CA keys (treat as secret material)
│   ├── identity_ca.key.pem
│   └── permissions_ca.key.pem
└── enclaves/
    ├── governance.xml              ← rewritten to allow domains 42 + 99
    ├── governance.p7s              ← re-signed via openssl smime
    ├── governed_ros2_bridge_real/
    │   ├── cert.pem                ← bridge identity cert
    │   ├── key.pem                 ← bridge identity key
    │   ├── permissions.xml         ← topics × both domains
    │   └── permissions.p7s         ← signed via openssl smime
    └── demo_robot/arm_controller/
        ├── cert.pem
        ├── key.pem
        ├── permissions.xml         ← topics × one domain
        └── permissions.p7s

3. Wire the keystore into the bridge launch

Both autonomy ros2 run (paid) and autonomy run (CE) accept three new flags (--bridge-keystore, --bridge-enclave, --workload-enclave):

autonomy run \
    --image ghcr.io/autonomyops/adk-ros2-runtime:latest \
    --governed-bridge \
    --agent-domain 99 --real-domain 42 \
    --bridge-topics '/cmd_vel:std_msgs/msg/String' \
    --bridge-keystore /var/lib/autonomyops/ros2-keystore \
    --bridge-enclave  /governed_ros2_bridge_real \
    --workload-enclave /demo_robot/arm_controller \
    ros2.launch launch demo_robot arm_demo.launch.py

What happens, in order:

  1. Pre-flight: RunGoverned validates the SROS 2 triple — all three of --bridge-keystore, --bridge-enclave, --workload-enclave must be set together (else ErrSecurityIncomplete) AND --governed-bridge must be true (else ErrSecurityNeedsGovernedBridge). Failures here surface BEFORE the tool server starts, before the bridge spawns, before the workload dispatches — fail-closed up front.

  2. The runtime binds /v1/tool to a random 127.0.0.1:<port> (same as the non-SROS-2 bridge flow).

  3. The bridge container is spawned with:

    • ROS_SECURITY_KEYSTORE=<keystore>

    • ROS_SECURITY_ENABLE=true

    • ROS_SECURITY_STRATEGY=Enforce (NOT Permissive — Permissive would log+allow an unenrolled participant, defeating defense-in-depth)

    • ROS_SECURITY_ENCLAVE_OVERRIDE=/governed_ros2_bridge_real

    • Keystore bind-mounted read-only at the same host path so the in-container ROS_SECURITY_KEYSTORE value resolves

  4. The launched workload subprocess gets the same env shape, but with ROS_SECURITY_ENCLAVE_OVERRIDE=/demo_robot/arm_controller — intentionally separate identity so a compromised workload can’t impersonate the bridge.

  5. Workload container also gets the keystore bind-mounted read-only.

4. Confirm the loop is closed

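In a second terminal, follow the bridge container’s output (docker logs streams both stdout and stderr). docker logs accepts exactly one container, so if the filter below matches more than one (for example, another container started from the same image), look up the bridge container’s ID with docker ps and pass it directly: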

docker logs -f $(docker ps -q --filter ancestor=ghcr.io/autonomyops/adk-ros2-runtime:latest)

You should see:

[INFO] [rcl]: Found security directory: /var/lib/autonomyops/ros2-keystore/enclaves/governed_ros2_bridge_real
[INFO] [rcl]: Found security directory: /var/lib/autonomyops/ros2-keystore/enclaves/governed_ros2_bridge_real
governed_ros2_bridge: ready  agent_domain=99  real_domain=42  topics=/cmd_vel:std_msgs/msg/String  runtime_url=http://127.0.0.1:<port>

Two “Found security directory” lines (one per rclcpp::Context: agent + real) are normal. The ready line means both contexts successfully created their participants AND created their pub/sub on /cmd_vel AFTER passing the DDS-Security validation gates. The application-layer /v1/tool POST loop continues as before — SROS 2 doesn’t change the message flow, only the participant admission.
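To apply that check to captured output, you can use a small log scanner. The Go sketch below counts the "Found security directory" lines and parses the key=value fields of the ready line. The line formats come from the sample above and are assumptions, not a stable interface. checkBridgeLog and readyFields are hypothetical helpers.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

// readyFields extracts key=value pairs from the bridge's ready line.
// Format assumed from the sample log in this runbook.
func readyFields(line string) (map[string]string, bool) {
	const marker = "governed_ros2_bridge: ready"
	i := strings.Index(line, marker)
	if i < 0 {
		return nil, false
	}
	fields := map[string]string{}
	for _, tok := range strings.Fields(line[i+len(marker):]) {
		if k, v, ok := strings.Cut(tok, "="); ok {
			fields[k] = v
		}
	}
	return fields, true
}

// checkBridgeLog expects one security-directory line per rclcpp::Context
// (agent + real) and a ready line that reports the expected domains.
func checkBridgeLog(logText, agentDomain, realDomain string) error {
	found := 0
	var ready map[string]string
	sc := bufio.NewScanner(strings.NewReader(logText))
	for sc.Scan() {
		line := sc.Text()
		if strings.Contains(line, "Found security directory:") {
			found++
		}
		if f, ok := readyFields(line); ok {
			ready = f
		}
	}
	if err := sc.Err(); err != nil {
		return err
	}
	switch {
	case found != 2:
		return fmt.Errorf("want 2 security directory lines, got %d", found)
	case ready == nil:
		return fmt.Errorf("no ready line")
	case ready["agent_domain"] != agentDomain || ready["real_domain"] != realDomain:
		return fmt.Errorf("unexpected domains: agent=%s real=%s", ready["agent_domain"], ready["real_domain"])
	}
	return nil
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: checklog <agent_domain> <real_domain> < bridge.log")
		os.Exit(2)
	}
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if err := checkBridgeLog(string(data), os.Args[1], os.Args[2]); err != nil {
		fmt.Fprintln(os.Stderr, "FAIL:", err)
		os.Exit(1)
	}
	fmt.Println("OK: both contexts found their enclave and the bridge reported ready")
}
```

For example, pipe docker logs <bridge-container-id> 2>&1 into the program with arguments 99 42.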

Troubleshooting

Could not find domain X in governance (code: 141)

The bridge’s participant is trying to join a DDS domain that isn’t listed in <keystore>/enclaves/governance.xml. sros2’s default create_keystore only covers domain 0. You probably ran autonomy ros2 keystore init without --domain flags.

Fix: re-init the keystore covering both domains:

autonomy ros2 keystore init "$KEYSTORE" --domain 42 --domain 99

(Re-running init leaves the CA material untouched: it only rewrites governance.xml and re-signs governance.p7s.)
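To confirm a domain is covered before relaunching, you can parse governance.xml directly. Below is a minimal Go sketch. It assumes the standard DDS-Security layout (<dds><domain_access_rules><domain_rule><domains>) with <id> and <id_range> entries, reads coverage only, and does not verify governance.p7s. parseGovernance and coversDomain are hypothetical helpers.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"os"
	"strconv"
)

// idRange models <id_range>; the schema lets either bound be omitted.
type idRange struct {
	Min *int `xml:"min"`
	Max *int `xml:"max"`
}

type domainSet struct {
	IDs    []int     `xml:"id"`
	Ranges []idRange `xml:"id_range"`
}

// governance models only the domain coverage of a governance document.
type governance struct {
	Rules []struct {
		Domains domainSet `xml:"domains"`
	} `xml:"domain_access_rules>domain_rule"`
}

func parseGovernance(data []byte) (governance, error) {
	var g governance
	err := xml.Unmarshal(data, &g)
	return g, err
}

// coversDomain reports whether any domain_rule lists the domain ID.
func coversDomain(g governance, id int) bool {
	for _, rule := range g.Rules {
		for _, d := range rule.Domains.IDs {
			if d == id {
				return true
			}
		}
		for _, r := range rule.Domains.Ranges {
			lo := 0 // absent <min>: domain IDs start at 0
			if r.Min != nil {
				lo = *r.Min
			}
			if id >= lo && (r.Max == nil || id <= *r.Max) {
				return true
			}
		}
	}
	return false
}

func main() {
	if len(os.Args) < 3 {
		fmt.Fprintln(os.Stderr, "usage: checkgov <path/to/governance.xml> <domain-id>...")
		os.Exit(2)
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	g, err := parseGovernance(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, "parse:", err)
		os.Exit(1)
	}
	status := 0
	for _, arg := range os.Args[2:] {
		id, err := strconv.Atoi(arg)
		if err != nil {
			fmt.Fprintln(os.Stderr, "bad domain id:", arg)
			os.Exit(2)
		}
		if coversDomain(g, id) {
			fmt.Printf("domain %d: covered\n", id)
		} else {
			fmt.Printf("domain %d: NOT covered (expect code 141)\n", id)
			status = 1
		}
	}
	os.Exit(status)
}
```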

Not found a rule allowing to use the domain_id

The participant is on a domain governance allows, but the enclave’s permissions.xml doesn’t grant access on that domain. Re-run autonomy ros2 keystore permissions with both --domain flags for any enclave the bridge uses across multiple domains:

autonomy ros2 keystore permissions /governed_ros2_bridge_real \
    --keystore "$KEYSTORE" \
    --domain 42 --domain 99 \
    --publish  '/cmd_vel,/cmd_vel/*' \
    --subscribe '/cmd_vel,/cmd_vel/*'

rt/<topic> topic not found in allow rule (check_create_datawriter)

The enclave has permissions for some topics but not the one the bridge tries to publish/subscribe on. SROS 2 mangles ROS topic names: /cmd_vel becomes rt/cmd_vel on the wire. Verify your --publish / --subscribe lists cover BOTH directions the bridge needs (it subscribes on agent + publishes on real). Re-run autonomy ros2 keystore permissions with the missing topic.
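Both failures in this section come down to two checks against an enclave’s permissions.xml: is the domain granted, and does any <topic> expression match the mangled name. You can run both checks offline. The Go sketch below makes stated simplifications. It assumes fully qualified ROS topic names, ignores <id_range>, deny rules, partitions, and validity dates, and approximates the spec’s fnmatch-style matching with Go’s path.Match, whose * does not cross /. parsePermissions and mayUse are hypothetical helpers.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"os"
	"path"
	"strconv"
	"strings"
)

type topicList struct {
	Topics []string `xml:"topics>topic"`
}

type allowRule struct {
	Domains   []int       `xml:"domains>id"`
	Publish   []topicList `xml:"publish"`
	Subscribe []topicList `xml:"subscribe"`
}

// permissions models only the allow rules of a permissions document.
type permissions struct {
	Grants []struct {
		Rules []allowRule `xml:"allow_rule"`
	} `xml:"permissions>grant"`
}

func parsePermissions(data []byte) (permissions, error) {
	var p permissions
	err := xml.Unmarshal(data, &p)
	return p, err
}

func containsInt(xs []int, x int) bool {
	for _, v := range xs {
		if v == x {
			return true
		}
	}
	return false
}

// mayUse reports whether any allow rule grants publish (or subscribe) on
// rosTopic in the given domain. ROS topic /cmd_vel is rt/cmd_vel on the wire.
func mayUse(p permissions, domain int, rosTopic string, publish bool) (bool, error) {
	want := "rt" + rosTopic // assumes a fully qualified name such as /cmd_vel
	for _, g := range p.Grants {
		for _, r := range g.Rules {
			if !containsInt(r.Domains, domain) {
				continue
			}
			lists := r.Subscribe
			if publish {
				lists = r.Publish
			}
			for _, l := range lists {
				for _, pattern := range l.Topics {
					ok, err := path.Match(strings.TrimSpace(pattern), want)
					if err != nil {
						return false, fmt.Errorf("bad topic expression %q: %w", pattern, err)
					}
					if ok {
						return true, nil
					}
				}
			}
		}
	}
	return false, nil
}

func main() {
	if len(os.Args) != 5 || (os.Args[4] != "pub" && os.Args[4] != "sub") {
		fmt.Fprintln(os.Stderr, "usage: checkperm <permissions.xml> <domain> <ros-topic> pub|sub")
		os.Exit(2)
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	domain, err := strconv.Atoi(os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, "bad domain:", os.Args[2])
		os.Exit(2)
	}
	p, err := parsePermissions(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, "parse:", err)
		os.Exit(1)
	}
	ok, err := mayUse(p, domain, os.Args[3], os.Args[4] == "pub")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("allowed:", ok)
}
```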

participant denied by default rule (code: 145)

The subject_name of the participant’s identity certificate doesn’t match any <grant> in permissions.xml. The cause is almost always a stale or mismatched ROS_SECURITY_ENCLAVE_OVERRIDE value relative to the enclave the operator minted. Check that the enclave name passed to --bridge-enclave exactly matches the name passed to autonomy ros2 keystore mint.

ErrSecurityIncomplete from RunGoverned

Half-configured security flags. The SROS 2 triple (--bridge-keystore + --bridge-enclave + --workload-enclave) is all-or-nothing — any partial set is rejected before any side effect. Either pass all three or none.

ErrSecurityNeedsGovernedBridge from RunGoverned

You passed SROS 2 flags without --governed-bridge. SROS 2 is defense-in-depth on the bridge, not a standalone substitute. Wiring DDS-Security on the workload subprocess without a bridge actually mediating publishes would govern nothing — the runner refuses this combination explicitly.

Subscriber side: xmlrpc.client.Fault: !rclpy.ok()

ros2 topic echo and ros2 node list use rclpy (Python), which has a separate, known SROS 2 integration issue on Humble. That issue doesn’t affect the bridge’s C++ (rclcpp) path: the bridge itself works, and only the Python CLI tooling is broken. For verification, use a C++ subscriber or check the bridge’s log lines directly.

Verifying bypass-resistance

The in-tree regression test TestBypassResistance_RogueCannotPublishToSecuredSubscriber proves the load-bearing claim end-to-end on the host:

# from the repo root
source /opt/ros/humble/setup.bash
go test ./cmd/autonomy/commands/... \
    -run TestBypassResistance_RogueCannotPublishToSecuredSubscriber -v

What it does:

  1. Provisions a fresh keystore via the production autonomy ros2 keystore CLI (no shortcuts).

  2. Starts a CREDENTIALED publisher (positive control) and asserts the secured subscriber DOES receive its message within 5s — proves the secured side is a functioning DDS-Security participant, not a dead process. Without this gate, “no rogue data” could mean “no participant”.

  3. Starts an UNCREDENTIALED (rogue) publisher on the same domain WITHOUT any ROS_SECURITY_* env. Asserts the secured subscriber does NOT receive its message — proves DDS-Security rejected the rogue at the discovery / participant-match layer.

  4. Three liveness gates (signal(0) probes) at each phase boundary ensure the subscriber stays alive through both phases. Any failure dumps the subscriber’s full log so an operator can debug WHY it died.

Test passes on hosts with ros-humble-ros-base installed; skips cleanly on CI runners without ROS 2.

Reference