ROS 2 SROS 2 / DDS-Security for the Governed Bridge¶
Audience: operators turning on SROS 2 / DDS-Security as a defense-in-depth
layer on top of the application-level governed bridge. With the bridge alone,
isolation between the agent and real DDS domains rests only on the two using
different ROS_DOMAIN_ID values. An adversary with code execution on the agent
can call rclcpp::InitOptions::set_domain_id() with the real domain's ID and
join it ungoverned. SROS 2 makes that adversary's traffic invisible to secured
participants even if they pick the right domain ID. Each participant must
present a per-identity certificate issued by the keystore CA AND a signed
permissions document granting the topics it wants to publish, and the DDS
layer enforces both.
This page tells you how to provision the keystore, attach it to the bridge, recover from common errors, and verify the bypass-resistance claim end-to-end.
Want a guided walkthrough first? Start with the ROS 2 SROS 2 Quickstart. It runs
autonomy ros2 keystore init / mint / permissions end-to-end and proves bypass-resistance via the in-tree regression test. The runbook below assumes you already understand the application-layer bridge from its own runbook.
Prerequisites¶
- ros-humble-ros-base installed on the operator’s host (provides ros2 security create_keystore / create_enclave / create_permission). Keystore provisioning is a host operation: the durable secret material (CA private keys, per-node identity keys) lives on the operator’s filesystem, not inside an ephemeral container.
- openssl on PATH. It is used for the multi-domain governance/permissions re-sign path and ships with most standard Linux installs. The in-CI regression test (TestResignPermissions_ProducesVerifiableSMIME) pins that the produced signatures verify against the keystore CA in both directions.
- An understanding of the application-layer bridge: what it does, why agent ≠ real, and the --governed-bridge flag. SROS 2 is defense-in-depth on the bridge, not a standalone substitute. See the ROS 2 Governed Bridge runbook first.
Mental model¶
| Layer | Enforcement | What attacks it stops |
|---|---|---|
| Bridge (app-level) | Per-message POST to … | Benign agent publishes; bad-actor agent that obeys … |
| SROS 2 (this page) | Per-identity cert + signed permissions; rcl + DDS-Security reject mismatches at participant join + datawriter creation | Adversary that escapes … |
The two layers compose. The bridge mediates allowed traffic message by message, and SROS 2 ensures that only credentialed participants can publish on the real domain at all.
Procedure¶
1. Pick the DDS domains your enclaves will cover¶
SROS 2 has TWO files that hardcode DDS domain IDs (sros2 generates both for domain 0 by default):
- governance.xml: keystore-wide. Its <domain_rule> lists which DDS domain IDs are governable at all. A participant joining a domain not listed here is rejected with Could not find domain X in governance (code: 141).
- permissions.xml: per-enclave. Its <allow_rule>/<domains> lists the domains each grant applies to. A participant on a domain that is not granted is rejected with Not found a rule allowing to use the domain_id.
autonomy ros2 keystore init --domain rewrites governance.xml + re-signs;
autonomy ros2 keystore permissions --domain does the same for permissions
(repeatable for multi-domain). For the bridge (which runs on both agent and
real domains in one process), use BOTH domains on BOTH files.
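The following minimal Go sketch makes the multi-domain rule concrete. It renders the <domains> element that both files carry, with each DDS domain ID as its own <id> child. It is hypothetical. The name domainsXML and its sort/de-dup behavior are assumptions for illustration, not the repo's rewriteGovernanceDomains helper. The 0 to 232 bound follows the ROS 2 domain-ID guidance rather than anything this tool documents.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"sort"
)

// domainsElem mirrors the DDS-Security <domains> element: one <id> per domain.
type domainsElem struct {
	XMLName xml.Name `xml:"domains"`
	IDs     []int    `xml:"id"`
}

// domainsXML renders a sorted, de-duplicated <domains> block.
// Hypothetical helper; the 232 ceiling follows the ROS 2 domain-ID docs.
func domainsXML(ids []int) (string, error) {
	seen := map[int]bool{}
	var uniq []int
	for _, id := range ids {
		if id < 0 || id > 232 {
			return "", fmt.Errorf("domain ID %d out of range 0..232", id)
		}
		if !seen[id] {
			seen[id] = true
			uniq = append(uniq, id)
		}
	}
	if len(uniq) == 0 {
		return "", fmt.Errorf("at least one domain ID is required")
	}
	sort.Ints(uniq)
	out, err := xml.MarshalIndent(domainsElem{IDs: uniq}, "", "  ")
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	// Both bridge domains, as passed via repeated --domain flags.
	s, err := domainsXML([]int{99, 42})
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

The same ID list has to land in governance.xml (once, keystore-wide) and in the bridge enclave's permissions.xml. That is why both commands take the repeated --domain flag.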
2. Provision the keystore¶
# Keystore root — operator filesystem; treat as secret material.
KEYSTORE=/var/lib/autonomyops/ros2-keystore
# Step 1: create the keystore + governance.xml covering both bridge domains.
autonomy ros2 keystore init "$KEYSTORE" --domain 42 --domain 99
# Step 2: mint per-identity enclaves.
# - bridge runs on both domains under ONE enclave (ROS_SECURITY_ENCLAVE_OVERRIDE)
# - each workload subprocess gets its OWN enclave (defense-in-depth:
# a compromised workload can't impersonate the bridge to publish on real)
autonomy ros2 keystore mint --keystore "$KEYSTORE" /governed_ros2_bridge_real
autonomy ros2 keystore mint --keystore "$KEYSTORE" /demo_robot/arm_controller
# Step 3: synthesize permissions XML per enclave.
# - bridge enclave: BOTH domains (covers both rclcpp::Contexts), both
# directions on every workload topic (bridge subs on agent, pubs on real)
autonomy ros2 keystore permissions /governed_ros2_bridge_real \
--keystore "$KEYSTORE" \
--domain 42 --domain 99 \
--publish /cmd_vel,/cmd_vel/* \
--subscribe /cmd_vel,/cmd_vel/*
# - workload enclave: ONE domain (agent), publishes only what the workload
# legitimately produces; subscribes to what the bridge republishes back
autonomy ros2 keystore permissions /demo_robot/arm_controller \
--keystore "$KEYSTORE" \
--domain 99 \
--publish /cmd_vel \
--subscribe /cmd_vel
Preferred: --from-bundle (#938 3-C.1). Instead of re-typing --publish/--subscribe lists that have to stay in sync with the bundle’s Rego rules, point at the bundle and let the command read the ros2_topics: {publish, subscribe} block out of its manifest.json (schema v1.4+). That gives one source of truth: when the bundle changes its declared surface, the permissions follow without operator edits:
# Bridge enclave covering both domains, topic list resolved from the
# demo bundle's manifest.
autonomy ros2 keystore permissions /governed_ros2_bridge_real \
--keystore "$KEYSTORE" \
--domain 42 --domain 99 \
--from-bundle demo/bundles/ros2-bridge.tar
--from-bundle is mutually exclusive with --publish/--subscribe. Mixing them would silently widen the bundle’s declared surface, which defeats the point of using the bundle as the source of truth. It accepts either a .tar bundle (autonomy bundle pull output) or a directory containing manifest.json (the demo-bundle shape). Bundles minted at schema_version < 1.4 don’t carry the ros2_topics block. For those, fall back to explicit --publish/--subscribe until the manifest is bumped.
Layout under $KEYSTORE after all three steps:
$KEYSTORE/
├── public/ ← CA certs (operator-readable, fleet-distributable)
│ ├── identity_ca.cert.pem
│ └── permissions_ca.cert.pem
├── private/ ← CA keys (treat as secret material)
│ ├── identity_ca.key.pem
│ └── permissions_ca.key.pem
└── enclaves/
├── governance.xml ← rewritten to allow domains 42 + 99
├── governance.p7s ← re-signed via openssl smime
├── governed_ros2_bridge_real/
│ ├── cert.pem ← bridge identity cert
│ ├── key.pem ← bridge identity key
│ ├── permissions.xml ← topics × both domains
│ └── permissions.p7s ← signed via openssl smime
└── demo_robot/arm_controller/
├── cert.pem
├── key.pem
├── permissions.xml ← topics × one domain
└── permissions.p7s
3. Wire the keystore into the bridge launch¶
Three new flags on both autonomy ros2 run (paid) and autonomy run (CE):
autonomy run \
--image ghcr.io/autonomyops/adk-ros2-runtime:latest \
--governed-bridge \
--agent-domain 99 --real-domain 42 \
--bridge-topics '/cmd_vel:std_msgs/msg/String' \
--bridge-keystore /var/lib/autonomyops/ros2-keystore \
--bridge-enclave /governed_ros2_bridge_real \
--workload-enclave /demo_robot/arm_controller \
ros2.launch launch demo_robot arm_demo.launch.py
What happens, in order:
1. Pre-flight: RunGoverned validates the SROS 2 triple. All three of --bridge-keystore, --bridge-enclave, and --workload-enclave must be set together (else ErrSecurityIncomplete), AND --governed-bridge must be true (else ErrSecurityNeedsGovernedBridge). Failures here surface BEFORE the tool server starts, before the bridge spawns, and before the workload dispatches: fail-closed up front.
2. The runtime binds /v1/tool to a random 127.0.0.1:<port> (same as the non-SROS-2 bridge flow).
3. The bridge container is spawned with:
   - ROS_SECURITY_KEYSTORE=<keystore>
   - ROS_SECURITY_ENABLE=true
   - ROS_SECURITY_STRATEGY=Enforce (NOT Permissive: Permissive would log and allow an unenrolled participant, defeating defense-in-depth)
   - ROS_SECURITY_ENCLAVE_OVERRIDE=/governed_ros2_bridge_real
   - the keystore bind-mounted read-only at the same host path, so the in-container ROS_SECURITY_KEYSTORE value resolves
4. The launched workload subprocess gets the same env shape, but with ROS_SECURITY_ENCLAVE_OVERRIDE=/demo_robot/arm_controller. This is an intentionally separate identity, so a compromised workload can’t impersonate the bridge. The workload container also gets the keystore bind-mounted read-only.
4. Confirm the loop is closed¶
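In a second terminal, watch the bridge container’s stderr. docker logs accepts a single container, so if the filter below matches more than one, pass the bridge container’s ID explicitly: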
docker logs -f $(docker ps -q --filter ancestor=ghcr.io/autonomyops/adk-ros2-runtime:latest)
You should see:
[INFO] [rcl]: Found security directory: /var/lib/autonomyops/ros2-keystore/enclaves/governed_ros2_bridge_real
[INFO] [rcl]: Found security directory: /var/lib/autonomyops/ros2-keystore/enclaves/governed_ros2_bridge_real
governed_ros2_bridge: ready agent_domain=99 real_domain=42 topics=/cmd_vel:std_msgs/msg/String runtime_url=http://127.0.0.1:<port>
Two “Found security directory” lines (one per rclcpp::Context: agent + real)
are normal. The ready line means both contexts successfully created their
participants AND created their pub/sub on /cmd_vel AFTER passing the
DDS-Security validation gates. The application-layer /v1/tool POST loop
continues as before — SROS 2 doesn’t change the message flow, only the
participant admission.
Troubleshooting¶
Could not find domain X in governance (code: 141)¶
The bridge’s participant is trying to join a DDS domain that isn’t listed in
<keystore>/enclaves/governance.xml. sros2’s default create_keystore only
covers domain 0. You probably ran autonomy ros2 keystore init without
--domain flags.
Fix: re-init the keystore covering both domains:
autonomy ros2 keystore init "$KEYSTORE" --domain 42 --domain 99
(This is idempotent on the CA material — it only rewrites the governance.xml + re-signs the governance.p7s.)
Not found a rule allowing to use the domain_id¶
The participant is on a domain governance allows, but the enclave’s
permissions.xml doesn’t grant access on that domain. Re-run
autonomy ros2 keystore permissions with both --domain flags for any
enclave the bridge uses across multiple domains:
autonomy ros2 keystore permissions /governed_ros2_bridge_real \
--keystore "$KEYSTORE" \
--domain 42 --domain 99 \
--publish /cmd_vel,/cmd_vel/* \
--subscribe /cmd_vel,/cmd_vel/*
rt/<topic> topic not found in allow rule (check_create_datawriter)¶
The enclave has permissions for some topics but not the one the bridge tries
to publish or subscribe on. The ROS 2 middleware mangles ROS topic names on the
wire, and DDS-Security matches permissions against the mangled name: /cmd_vel
becomes rt/cmd_vel. Verify that your --publish / --subscribe lists cover BOTH
directions the bridge needs (it subscribes on agent and publishes on real).
Re-run autonomy ros2 keystore permissions with the missing topic.
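A small Go sketch can check the mangling and the both-directions requirement. It covers only plain topics, since services and actions use other prefixes, and only exact names. ddsTopicName and missingGrants are hypothetical helpers. /odom is an illustrative extra topic, not one the bridge above is configured for.

```go
package main

import (
	"fmt"
	"strings"
)

// ddsTopicName applies the ROS 2 topic prefix: a fully qualified ROS topic
// such as /cmd_vel appears on the DDS wire as rt/cmd_vel.
func ddsTopicName(rosTopic string) (string, error) {
	if !strings.HasPrefix(rosTopic, "/") {
		return "", fmt.Errorf("%q is not fully qualified; expand namespaces first", rosTopic)
	}
	return "rt" + rosTopic, nil
}

// missingGrants lists required DDS topic names absent from an exact-match
// grant list. Real permissions may use wildcards (e.g. rt/cmd_vel/*), which
// this sketch does not evaluate.
func missingGrants(required, granted []string) []string {
	have := make(map[string]bool, len(granted))
	for _, g := range granted {
		have[g] = true
	}
	var missing []string
	for _, r := range required {
		if !have[r] {
			missing = append(missing, r)
		}
	}
	return missing
}

func main() {
	// The bridge subscribes on agent and publishes on real, so each bridged
	// topic must appear in BOTH the publish and the subscribe grants.
	bridged := []string{"/cmd_vel", "/odom"} // /odom is hypothetical
	var need []string
	for _, t := range bridged {
		n, err := ddsTopicName(t)
		if err != nil {
			panic(err)
		}
		need = append(need, n)
	}
	grantedPublish := []string{"rt/cmd_vel"} // hypothetical grant list
	fmt.Println("missing publish grants:", missingGrants(need, grantedPublish))
}
```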
participant denied by default rule (code: 145)¶
The participant’s identity certificate’s subject_name doesn’t match any
<grant> in the permissions.xml. Almost always a stale or mismatched
ROS_SECURITY_ENCLAVE_OVERRIDE env vs the enclave the operator minted.
Check that the enclave name passed to --bridge-enclave exactly matches
the name passed to autonomy ros2 keystore mint.
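To catch this mismatch before launch, compare the enclave name with the grant subjects in that enclave's permissions.xml. This Go sketch assumes two sros2 conventions: the identity certificate's subject is CN=<enclave name>, and permissions.xml uses the standard <dds><permissions><grant><subject_name> nesting. checkGrantSubject is a hypothetical helper. A fuller check would read the subject from cert.pem instead of assuming it.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// permissionsDoc models just the <grant>/<subject_name> parts of a
// DDS-Security permissions.xml (root element <dds>).
type permissionsDoc struct {
	Grants []struct {
		Name        string `xml:"name,attr"`
		SubjectName string `xml:"subject_name"`
	} `xml:"permissions>grant"`
}

// checkGrantSubject verifies that some grant's subject_name is CN=<enclave>,
// assuming the identity cert was minted with the enclave name as its CN.
func checkGrantSubject(permissionsXML []byte, enclave string) error {
	var doc permissionsDoc
	if err := xml.Unmarshal(permissionsXML, &doc); err != nil {
		return fmt.Errorf("parse permissions.xml: %w", err)
	}
	want := "CN=" + enclave
	var seen []string
	for _, g := range doc.Grants {
		subject := strings.TrimSpace(g.SubjectName)
		if subject == want {
			return nil
		}
		seen = append(seen, subject)
	}
	return fmt.Errorf("no grant for %q; subjects present: %v", want, seen)
}

func main() {
	ks, enclave := "/var/lib/autonomyops/ros2-keystore", "/governed_ros2_bridge_real"
	data, err := os.ReadFile(filepath.Join(ks, "enclaves", enclave, "permissions.xml"))
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
	if err := checkGrantSubject(data, enclave); err != nil {
		fmt.Println("mismatch:", err)
		os.Exit(1)
	}
	fmt.Println("grant subject matches", enclave)
}
```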
ErrSecurityIncomplete from RunGoverned¶
Half-configured security flags. The SROS 2 triple
(--bridge-keystore + --bridge-enclave + --workload-enclave) is
all-or-nothing — any partial set is rejected before any side effect. Either
pass all three or none.
ErrSecurityNeedsGovernedBridge from RunGoverned¶
You passed SROS 2 flags without --governed-bridge. SROS 2 is
defense-in-depth on the bridge, not a standalone substitute. Wiring
DDS-Security on the workload subprocess without a bridge actually mediating
publishes would govern nothing — the runner refuses this combination
explicitly.
Subscriber side: xmlrpc.client.Fault: !rclpy.ok()¶
ros2 topic echo and ros2 node list use rclpy (Python), which has a separate,
known SROS 2 integration issue on Humble that doesn’t affect the bridge’s
C++ (rclcpp) path. The bridge itself works; only the Python CLI tooling
is broken. For verification, use a C++ subscriber or check the bridge’s
log lines directly.
Verifying bypass-resistance¶
The in-tree regression test
TestBypassResistance_RogueCannotPublishToSecuredSubscriber proves the
load-bearing claim end-to-end on the host:
# from the repo root
source /opt/ros/humble/setup.bash
go test ./cmd/autonomy/commands/... \
-run TestBypassResistance_RogueCannotPublishToSecuredSubscriber -v
What it does:
1. Provisions a fresh keystore via the production autonomy ros2 keystore CLI (no shortcuts).
2. Starts a CREDENTIALED publisher (positive control) and asserts the secured subscriber DOES receive its message within 5s. This proves the secured side is a functioning DDS-Security participant, not a dead process. Without this gate, “no rogue data” could mean “no participant”.
3. Starts an UNCREDENTIALED (rogue) publisher on the same domain WITHOUT any ROS_SECURITY_* env, and asserts the secured subscriber does NOT receive its message. This proves DDS-Security rejected the rogue at the discovery / participant-match layer.
4. Three liveness gates (signal(0) probes) at each phase boundary ensure the subscriber stays alive through both phases. Any failure dumps the subscriber’s full log so an operator can debug WHY it died.
Test passes on hosts with ros-humble-ros-base installed; skips cleanly
on CI runners without ROS 2.
Reference¶
- cmd/autonomy/commands/ros2_keystore.go: init + mint CLIs and the rewriteGovernanceDomains helper.
- cmd/autonomy/commands/ros2_keystore_permissions.go: permissions CLI, multi-domain merge, openssl re-sign.
- runtime/ros2bridge/bridge_process.go: BridgeProcess.Keystore + EnclaveName fields, ROS_SECURITY_* env injection, keystore bind-mount.
- runtime/ros2/runner.go: RunOptions.BridgeKeystore / BridgeEnclave / WorkloadEnclave fields and the all-or-nothing-plus-bridge-required gates.
- SROS 2 Quickstart: walkthrough of the provisioning + launch flow with the in-tree regression test.
- ROS 2 Governed Bridge runbook: the application-layer bridge that SROS 2 layers on top of.