# NVIDIA GPU Integration
The nvidia-demo bundle demonstrates the AutonomyOps governance model for
GPU-accelerated inference workloads. It does not perform real CUDA compute; it
validates that the governance and attribution rules are enforced correctly before
you integrate a real inference runtime.
## What the demo covers

- Container runtime GPU access via CDI (`--device nvidia.com/gpu=all`)
- `tool.infer.*` attribution enforcement: inference calls require a non-empty `model_id` or they are rejected
- The `nvidia_safety.rego` policy and its mapping to Python governance logic
- Self-test mode (`--check`) for CI and post-deployment smoke tests
## Hardware targets

| Platform | Runtime | Notes |
|---|---|---|
| Jetson Orin (aarch64) | NVIDIA container runtime with CDI (JetPack 6) | GPU probe expects `/dev/nvmap` |
| x86_64 desktop GPU | NVIDIA container runtime (`--runtime=nvidia`) | GPU probe expects `/dev/nvidiactl` and `/dev/nvidia0` |
| No GPU (CI / dev) | Default Docker runtime | Self-test passes; GPU probe is advisory |
## Prerequisites

- Docker 24+ with the NVIDIA container runtime installed (`/etc/docker/daemon.json` must include `"default-runtime": "nvidia"`, or pass `--runtime=nvidia` explicitly)
- For Jetson Orin: JetPack 6 with CDI support enabled
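For reference, the `daemon.json` fragment that makes the NVIDIA runtime the default looks like the following (the `runtimes` entry shown is the stock path installed by the NVIDIA container toolkit; adjust it if your installation differs):

```json
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
```

Restart the Docker daemon after editing this file.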
## Build the image

```bash
# From repo root — builds on the python:3.11-slim base; arm64 pulls automatically on Orin
docker build -t autonomyops/nvidia-demo:local demo/nvidia-demo/
```
## Self-test (CI and smoke)

```bash
docker run --rm --runtime=nvidia autonomyops/nvidia-demo:local --check
```

Expected output:

```text
[self-test] model_id present → accepted (HTTP 200) ✓
[self-test] model_id absent → rejected (HTTP 400) ✓
[self-test] GPU probe: /dev/nvmap found
[self-test] PASS
```

On a host without an NVIDIA runtime or GPU devices, the GPU probe line reads:

```text
[self-test] GPU probe: none of /dev/nvmap, /dev/nvidiactl, /dev/nvidia0 present (advisory)
```

The self-test still exits 0 because GPU access is advisory; the governance logic does not depend on GPU availability.
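Because the result is carried entirely in the exit code, the self-test slots into CI directly. A minimal pytest sketch, assuming Docker is available on the runner and the image was built as `autonomyops/nvidia-demo:local` (the test itself is illustrative, not part of the demo):

```python
import subprocess

def test_nvidia_demo_self_test():
    """The demo's --check mode exits 0 when the governance checks pass.

    --runtime=nvidia is omitted on purpose: the GPU probe is advisory,
    so this test also passes on GPU-less CI runners.
    """
    result = subprocess.run(
        ["docker", "run", "--rm", "autonomyops/nvidia-demo:local", "--check"],
        capture_output=True,
        text=True,
    )
    assert result.returncode == 0, result.stdout + result.stderr
```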
## HTTP server mode

```bash
docker run --rm --runtime=nvidia -p 8080:8080 autonomyops/nvidia-demo:local
```

### Allowed inference call

```bash
curl -s -X POST http://localhost:8080/infer \
  -H "Content-Type: application/json" \
  -d '{"kind":"tool.infer.classify","params":{"model_id":"sonic-v1","input":"..."}}'
```

Expected:

```json
{"result": "inference accepted", "model_id": "sonic-v1", "node": "sonic-v1"}
```
### Rejected call (no model_id)

```bash
curl -s -X POST http://localhost:8080/infer \
  -H "Content-Type: application/json" \
  -d '{"kind":"tool.infer.classify","params":{}}'
```

Expected (HTTP 400):

```json
{"error": "model_id required — unattributed inference rejected by governance policy"}
```
The rejection mirrors the allow rule in `nvidia_safety.rego`:

```rego
allow if {
    startswith(input.kind, "tool.infer.")
    input.params.model_id != ""
}
```

Inference calls that do not carry a `model_id` are unattributable to a specific
model version and are always denied, regardless of other parameters.
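The Python side of the demo enforces the same invariant before any request is accepted. A minimal sketch of that check, assuming nothing beyond the request fields shown above (`is_allowed` is an illustrative name; the real check lives in `_handle_infer` in `infer_server.py`):

```python
def is_allowed(kind: str, params: dict) -> bool:
    """Mirror of the Rego allow rule for tool.infer.* kinds.

    Denies any inference request whose model_id is missing or empty,
    so every accepted call is attributable to a model version.
    """
    if not kind.startswith("tool.infer."):
        return False
    return bool(params.get("model_id"))
```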
## Bundle policy

The nvidia-demo bundle (`demo/bundles/nvidia.tar`) carries
`demo/bundles/nvidia/policies/nvidia_safety.rego`.
Policy summary:

| Kind | Condition |
|---|---|
| `tool.infer.*` | `model_id` non-empty |
| | Unconditional (node start/stop/health) |
| | Unconditional (WAL drain, OTLP export) |
| | Unconditional (health probes) |
| | Always denied |
| anything else | Denied (fail-closed default) |

The exact kind strings for each rule are in the Rego policy source; use `--show-policy` in the next section to view them.
## Inspecting the bundle

```bash
autonomy bundle inspect demo/bundles/nvidia.tar --local

# With policy text
autonomy bundle inspect demo/bundles/nvidia.tar --local --show-policy
```

Key manifest fields:

```yaml
name: nvidia-demo
version: 0.1.0
channel: dev
context: container
entrypoint: docker run --rm --runtime=nvidia --device nvidia.com/gpu=all ghcr.io/autonomyops/nvidia-demo:latest --check
```
## CDI device access

On modern NVIDIA container runtime installations, GPU devices are exposed via CDI (Container Device Interface):

```bash
# Check CDI devices available on the host
nvidia-ctk cdi list

# Run with all visible GPUs via CDI
docker run --rm --device nvidia.com/gpu=all autonomyops/nvidia-demo:local --check

# Run with a specific GPU by index
docker run --rm --device nvidia.com/gpu=0 autonomyops/nvidia-demo:local --check
```

On Jetson Orin the device path is `/dev/nvmap`; on discrete GPU hosts it is
`/dev/nvidiactl` and `/dev/nvidia0`. The `infer_server.py` probe checks all
three paths and reports advisory results.
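The probe reduces to a few existence checks on those device nodes. A sketch of the logic, assuming nothing beyond the paths listed above (`gpu_probe` is an illustrative name for what `infer_server.py` does):

```python
import os

GPU_DEVICE_PATHS = ("/dev/nvmap", "/dev/nvidiactl", "/dev/nvidia0")

def gpu_probe() -> str:
    """Report which GPU device nodes are visible; advisory only.

    /dev/nvmap appears on Jetson Orin; /dev/nvidiactl and /dev/nvidia0
    appear on discrete GPU hosts. Absence never fails the self-test.
    """
    found = [p for p in GPU_DEVICE_PATHS if os.path.exists(p)]
    if found:
        return f"GPU probe: {', '.join(found)} found"
    return f"GPU probe: none of {', '.join(GPU_DEVICE_PATHS)} present (advisory)"
```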
## Multi-arch images

The `demo/nvidia-demo/Dockerfile` uses `python:3.11-slim`, which has both
`linux/amd64` and `linux/arm64` variants. For Jetson Orin (aarch64) builds:

```bash
# Build and push a multi-arch manifest
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --tag ghcr.io/autonomyops/nvidia-demo:latest \
  --push demo/nvidia-demo/
```

The CI release pipeline builds both architectures using `docker buildx` with
QEMU for the cross-platform layer. See `build/demo-multiarch-images` for the
full Makefile targets.
## Adapting for a production inference runtime

The demo stub is intentionally minimal — no real CUDA compute is performed. To replace it with a real runtime:

1. Replace `infer_server.py` with your model serving code. Retain the `_handle_infer` attribution check, or call the AutonomyOps runtime endpoint (`/v1/tool`) for each inference request (see the sketch after this list).
2. Update `model_id` routing so each request carries the exact model version string that matches your audit requirements.
3. Update `nvidia_safety.rego` if your inference kinds deviate from `tool.infer.*`; add package/version constraints as needed.
4. Add a CUDA base layer to the Dockerfile:

   ```dockerfile
   FROM nvcr.io/nvidia/cuda:12.3.0-runtime-ubuntu22.04
   # … add your inference server …
   ```
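For the runtime-endpoint variant in step 1, the forwarding wrapper is small. A hedged sketch, assuming the runtime listens on a local port and `/v1/tool` accepts the same `{"kind", "params"}` body as the demo's `/infer` route; the address, port, and response handling below are assumptions to adapt, not a documented contract:

```python
import json
import urllib.request

RUNTIME_URL = "http://localhost:9000/v1/tool"  # assumed address; use your runtime's

def governed_infer(model_id: str, payload: str) -> dict:
    """Ask the AutonomyOps runtime to authorize and record one inference call.

    Every request carries a model_id so the call stays attributable;
    a governance rejection surfaces as an HTTP error from the runtime.
    """
    body = {"kind": "tool.infer.classify",
            "params": {"model_id": model_id, "input": payload}}
    req = urllib.request.Request(
        RUNTIME_URL,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```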
Rebuild and verify the bundle before activating on edge nodes:

```bash
autonomy bundle push my-inference.tar \
  registry.example.com/my-inference:v1.0.0 \
  --key demo/keys/cosign.key

autonomy bundle verify registry.example.com/my-inference:v1.0.0 \
  --pub-key demo/keys/cosign.pub
```
## See also

- Bundle Workflows — pull, inspect, stage, activate
- ROS2 Governance — policy model and dual-path execution
- Hardware Adaptation — Jetson Orin HIL setup
- `demo/nvidia-demo/` — Dockerfile and inference server source
- `demo/bundles/nvidia/` — manifest and Rego policy source