NVIDIA GPU Integration

The nvidia-demo bundle demonstrates the AutonomyOps governance model for GPU-accelerated inference workloads. It does not perform real CUDA compute; it validates that the governance and attribution rules are enforced correctly before you integrate a real inference runtime.

What the demo covers

  • Container runtime GPU access via CDI (--device nvidia.com/gpu=all)

  • tool.infer.* attribution enforcement: inference calls require a non-empty model_id or they are rejected

  • The nvidia_safety.rego policy and its mapping to Python governance logic

  • Self-test mode (--check) for CI and post-deployment smoke tests

Hardware targets

Platform                 Runtime                     Notes
Jetson Orin (aarch64)    nvidia container runtime    /dev/nvmap exposed via CDI
x86_64 desktop GPU       nvidia container runtime    /dev/nvidiactl, /dev/nvidia0
No GPU (CI / dev)        --runtime=runc              Self-test passes; GPU probe is advisory

Prerequisites

  • Docker 24+ with the NVIDIA container runtime installed (/etc/docker/daemon.json must include "default-runtime": "nvidia", or pass --runtime=nvidia explicitly; see the example after this list)

  • For Jetson Orin: JetPack 6 with CDI support enabled
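
For reference, a typical /etc/docker/daemon.json that registers the NVIDIA runtime and makes it the default looks like the following (the binary path may differ by installation):

{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}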

Build the image

# From the repo root. Builds on the python:3.11-slim base; the arm64 variant is pulled automatically on Orin.
docker build -t autonomyops/nvidia-demo:local demo/nvidia-demo/

Self-test (CI and smoke)

docker run --rm --runtime=nvidia autonomyops/nvidia-demo:local --check

Expected output:

[self-test] model_id present → accepted (HTTP 200) ✓
[self-test] model_id absent  → rejected (HTTP 400) ✓
[self-test] GPU probe: /dev/nvmap found
[self-test] PASS

On a host without an NVIDIA runtime or GPU devices, the GPU probe line reads:

[self-test] GPU probe: none of /dev/nvmap, /dev/nvidiactl, /dev/nvidia0 present (advisory)

The self-test still exits 0 because GPU access is advisory; the governance logic does not depend on GPU availability.
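
The probe itself is a simple existence check over the known device paths. A minimal sketch of the logic, with the helper name assumed for illustration (the actual implementation lives in infer_server.py):

import os

# Device paths probed by the demo: Jetson Orin exposes /dev/nvmap;
# discrete GPU hosts expose /dev/nvidiactl and /dev/nvidia0.
GPU_DEVICE_PATHS = ("/dev/nvmap", "/dev/nvidiactl", "/dev/nvidia0")

def gpu_probe() -> str:
    """Report which GPU device nodes are visible. Advisory only: the
    result never affects the self-test exit code."""
    found = [p for p in GPU_DEVICE_PATHS if os.path.exists(p)]
    if found:
        return f"GPU probe: {', '.join(found)} found"
    return "GPU probe: none of /dev/nvmap, /dev/nvidiactl, /dev/nvidia0 present (advisory)"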

HTTP server mode

docker run --rm --runtime=nvidia -p 8080:8080 autonomyops/nvidia-demo:local

Allowed inference call

curl -s -X POST http://localhost:8080/infer \
    -H "Content-Type: application/json" \
    -d '{"kind":"tool.infer.classify","params":{"model_id":"sonic-v1","input":"..."}}'

Expected:

{"result": "inference accepted", "model_id": "sonic-v1", "node": "sonic-v1"}

Rejected call (no model_id)

curl -s -X POST http://localhost:8080/infer \
    -H "Content-Type: application/json" \
    -d '{"kind":"tool.infer.classify","params":{}}'

Expected (HTTP 400):

{"error": "model_id required — unattributed inference rejected by governance policy"}

This mirrors the allow rule in nvidia_safety.rego:

allow if {
    startswith(input.kind, "tool.infer.")
    input.params.model_id != ""
}

Inference calls that do not carry a model_id cannot be attributed to a specific model version and are always denied, regardless of other parameters.
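
The same rule maps onto the demo's Python governance logic. A minimal sketch, with the function name assumed for illustration; note that an absent model_id and an empty one are both rejected, matching the undefined-reference and empty-string cases in Rego:

def infer_allowed(kind: str, params: dict) -> bool:
    # Mirrors nvidia_safety.rego: tool.infer.* kinds must carry a
    # non-empty model_id; absent and empty string both fail.
    return kind.startswith("tool.infer.") and bool(params.get("model_id"))

assert infer_allowed("tool.infer.classify", {"model_id": "sonic-v1"})      # HTTP 200
assert not infer_allowed("tool.infer.classify", {})                        # HTTP 400
assert not infer_allowed("tool.infer.classify", {"model_id": ""})          # HTTP 400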

Bundle policy

The nvidia-demo bundle (demo/bundles/nvidia.tar) carries demo/bundles/nvidia/policies/nvidia_safety.rego.

Allowed actions:

Kind             Condition
tool.infer.*     model_id must be non-empty
lifecycle.*      Unconditional (node start/stop/health)
telemetry.emit   Unconditional (WAL drain, OTLP export)
tool.echo        Unconditional (health probes)
tool.shell       Always denied
anything else    Denied (fail-closed default)
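
Expressed as Python governance logic, the whole table reduces to one fail-closed dispatch. A sketch, not the bundle's actual source (the authoritative policy is nvidia_safety.rego):

def action_allowed(kind: str, params: dict) -> bool:
    if kind == "tool.shell":
        return False                           # always denied
    if kind.startswith("tool.infer."):
        return bool(params.get("model_id"))    # attribution required
    if kind.startswith("lifecycle."):
        return True                            # node start/stop/health
    if kind in ("telemetry.emit", "tool.echo"):
        return True                            # WAL drain, OTLP export, health probes
    return False                               # fail-closed default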

Inspecting the bundle

autonomy bundle inspect demo/bundles/nvidia.tar --local

# With policy text
autonomy bundle inspect demo/bundles/nvidia.tar --local --show-policy

Key manifest fields:

name:       nvidia-demo
version:    0.1.0
channel:    dev
context:    container
entrypoint: docker run --rm --runtime=nvidia --device nvidia.com/gpu=all
            ghcr.io/autonomyops/nvidia-demo:latest --check

CDI device access

On modern NVIDIA container runtime installations, GPU devices are exposed via the Container Device Interface (CDI):

# Check CDI devices available on the host
nvidia-ctk cdi list

# Run with all visible GPUs via CDI
docker run --rm --device nvidia.com/gpu=all autonomyops/nvidia-demo:local --check

# Run with a specific GPU by index
docker run --rm --device nvidia.com/gpu=0 autonomyops/nvidia-demo:local --check

On Jetson Orin the device path is /dev/nvmap; on discrete GPU hosts the paths are /dev/nvidiactl and /dev/nvidia0. The infer_server.py probe checks all three paths and reports the result as advisory.

Multi-arch images

The demo/nvidia-demo/Dockerfile uses python:3.11-slim, which provides both linux/amd64 and linux/arm64 variants. For Jetson Orin (aarch64) builds:

# Build and push a multi-arch manifest
docker buildx build \
    --platform linux/amd64,linux/arm64 \
    --tag ghcr.io/autonomyops/nvidia-demo:latest \
    --push demo/nvidia-demo/

The CI release pipeline builds both architectures with docker buildx, using QEMU emulation for the non-native platform. See build/demo-multiarch-images for the full Makefile targets.

Adapting for a production inference runtime

The demo stub is intentionally minimal — no real CUDA compute is performed. To replace it with a real runtime:

  1. Replace infer_server.py with your model serving code. Retain the _handle_infer attribution check, or call the AutonomyOps runtime endpoint (/v1/tool) for each inference request (see the sketch after this list).

  2. Update model_id routing so each request carries the exact model version string that matches your audit requirements.

  3. Update nvidia_safety.rego if your inference kinds deviate from tool.infer.*; add package/version constraints as needed.

  4. Add CUDA base layer to the Dockerfile:

    FROM nvcr.io/nvidia/cuda:12.3.0-runtime-ubuntu22.04
    # … add your inference server …
    
  5. Rebuild and verify the bundle before activating on edge nodes:

    autonomy bundle push my-inference.tar \
        registry.example.com/my-inference:v1.0.0 \
        --key demo/keys/cosign.key
    autonomy bundle verify registry.example.com/my-inference:v1.0.0 \
        --pub-key demo/keys/cosign.pub
    
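For step 1, a hypothetical sketch of delegating the attribution check to the runtime's /v1/tool endpoint. The endpoint path comes from step 1 above; the runtime address, request shape, and function name are assumptions:

import json
import urllib.error
import urllib.request

RUNTIME_URL = "http://127.0.0.1:9000/v1/tool"  # assumed local runtime address

def governance_check(kind: str, params: dict) -> bool:
    """Ask the AutonomyOps runtime whether this call is allowed.
    A 2xx response means the policy accepted it."""
    body = json.dumps({"kind": kind, "params": params}).encode()
    req = urllib.request.Request(
        RUNTIME_URL, data=body, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(req, timeout=2.0) as resp:
            return 200 <= resp.status < 300
    except urllib.error.URLError:
        # Fail closed: HTTP 400 (unattributed inference) and an
        # unreachable runtime are both treated as a denial.
        return False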

See also

  • Bundle Workflows — pull, inspect, stage, activate

  • ROS2 Governance — policy model and dual-path execution

  • Hardware Adaptation — Jetson Orin HIL setup

  • demo/nvidia-demo/ — Dockerfile and inference server source

  • demo/bundles/nvidia/ — manifest and Rego policy source