GPUs example

We will follow NVIDIA's guide to deploy the gpu-operator into a Claudie-built Kubernetes cluster. If you decide to use a different cloud provider, make sure you meet the requirements listed in the prerequisites before continuing.

AWS GPU Example

In this example we will be using AWS as our provider. AWS GPU instances (like g4dn.xlarge) come with GPUs attached, so no additional machineSpec configuration is needed:

apiVersion: claudie.io/v1beta1
kind: InputManifest
metadata:
  name: aws-gpu-example
  labels:
    app.kubernetes.io/part-of: claudie
spec:
  providers:
    - name: aws-1
      providerType: aws
      secretRef:
        name: aws-secret
        namespace: secrets

  nodePools:
    dynamic:
    - name: control-aws
      providerSpec:
        name: aws-1
        region: eu-central-1
        zone: eu-central-1a
      count: 1
      serverType: t3.medium
      # AMI ID of an Ubuntu 24.04 image.
      # AMI IDs are region-specific; update this value to match the region.
      image: ami-07eef52105e8a2059

    - name: gpu-aws
      providerSpec:
        name: aws-1
        region: eu-central-1
        zone: eu-central-1a
      count: 2
      serverType: g4dn.xlarge
      # AMI ID of an Ubuntu 24.04 image.
      # AMI IDs are region-specific; update this value to match the region.
      image: ami-07eef52105e8a2059
      storageDiskSize: 50

  kubernetes:
    clusters:
      - name: gpu-example
        version: v1.34.0
        network: 172.16.2.0/24
        pools:
          control:
            - control-aws
          compute:
            - gpu-aws
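Because the AMI ID differs per region, it can help to look it up rather than copy it. The following is a minimal sketch, assuming the AWS CLI is installed and configured. The function name latest_ubuntu_2404_ami is hypothetical, 099720109477 is Canonical's AWS account ID, and the image name pattern is an assumption that may need adjusting.

```shell
# Print the ID of the newest Canonical Ubuntu 24.04 (amd64) AMI in a region.
# The name filter below is an assumed pattern for Ubuntu 24.04 server images.
latest_ubuntu_2404_ami() {
  region="${1:-eu-central-1}"
  aws ec2 describe-images \
    --region "$region" \
    --owners 099720109477 \
    --filters "Name=name,Values=ubuntu/images/hvm-ssd-gp3/ubuntu-noble-24.04-amd64-server-*" \
    --query 'sort_by(Images, &CreationDate)[-1].ImageId' \
    --output text
}
```

Calling latest_ubuntu_2404_ami eu-central-1 should print a single AMI ID that can be used as the image value of both nodepools.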

GCP GPU Example

For GCP, you must explicitly specify the GPU type and count using the machineSpec block. GCP requires both nvidiaGpuCount and nvidiaGpuType to attach GPUs to instances:

apiVersion: claudie.io/v1beta1
kind: InputManifest
metadata:
  name: gcp-gpu-example
  labels:
    app.kubernetes.io/part-of: claudie
spec:
  providers:
    - name: gcp-1
      providerType: gcp
      # GCP Spot VM support is available from claudie-config v0.11.4+
      templates:
        repository: "https://github.com/berops/claudie-config"
        tag: v0.11.4
        path: "templates/terraformer/gcp"
      secretRef:
        name: gcp-secret
        namespace: secrets

  nodePools:
    dynamic:
    - name: control-gcp
      providerSpec:
        name: gcp-1
        region: us-central1
        zone: us-central1-a
      count: 1
      serverType: e2-medium
      image: ubuntu-2404-noble-amd64-v20251001

    - name: gpu-gcp
      providerSpec:
        name: gcp-1
        region: us-central1
        zone: us-central1-a
      count: 2
      # Use n1-standard machine types for GPU attachment
      serverType: n1-standard-4
      image: ubuntu-2404-noble-amd64-v20251001
      storageDiskSize: 50
      # GPU configuration required for GCP
      machineSpec:
        nvidiaGpuCount: 1
        nvidiaGpuType: nvidia-tesla-t4

  kubernetes:
    clusters:
      - name: gpu-example
        version: v1.34.0
        network: 172.16.2.0/24
        pools:
          control:
            - control-gcp
          compute:
            - gpu-gcp

GCP GPU Requirements

The nvidiaGpuType field is required when nvidiaGpuCount > 0 for GCP providers.
Available GPU types vary by zone. Check the GCP documentation on GPU regions and zones for availability.
Common GPU types: nvidia-tesla-t4, nvidia-tesla-v100, nvidia-tesla-a100, nvidia-l4.
GPU instances cannot be live migrated, so they are terminated during maintenance events.
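Since availability varies by zone, the GPU types offered in a given zone can also be listed from the command line. This is a minimal sketch, assuming the gcloud CLI is installed and authenticated; the function name gcp_gpu_types is hypothetical.

```shell
# List the accelerator (GPU) types offered in a GCP zone.
gcp_gpu_types() {
  zone="${1:-us-central1-a}"
  gcloud compute accelerator-types list --filter="zone:${zone}"
}
```

For example, gcp_gpu_types us-central1-a should include nvidia-tesla-t4 if that type is offered in the zone used by the manifest above.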

GCP Spot GPU Inference Example (Autoscaled)

GCP Spot VMs offer 60–91% cost savings over on-demand pricing, in exchange for possible reclamation with about 30 seconds of notice. Combined with GPU attachment and scale-from-zero autoscaling, this is a common pattern for cost-effective GPU inference: the nodepool scales up when work arrives and back down to zero when idle.

To request spot nodes, set spot: true on a GCP dynamic nodepool. Spot is only supported on worker (compute) nodepools and is rejected by the webhook on control-plane nodepools or unsupported providers. Claudie automatically applies the label claudie.io/spot=true and the taint claudie.io/spot=true:NoSchedule to every node in the pool, so only pods with a matching toleration are scheduled there.

apiVersion: claudie.io/v1beta1
kind: InputManifest
metadata:
  name: gcp-spot-gpu-autoscaled
  labels:
    app.kubernetes.io/part-of: claudie
spec:
  providers:
    - name: gcp-1
      providerType: gcp
      # GCP Spot VM support is available from claudie-config v0.11.4+
      templates:
        repository: "https://github.com/berops/claudie-config"
        tag: v0.11.4
        path: "templates/terraformer/gcp"
      secretRef:
        name: gcp-secret
        namespace: secrets

  nodePools:
    dynamic:
    - name: control-gcp
      providerSpec:
        name: gcp-1
        region: us-central1
        zone: us-central1-a
      count: 1
      serverType: e2-medium
      image: ubuntu-2404-noble-amd64-v20251001

    - name: spot-gpu-workers
      providerSpec:
        name: gcp-1
        region: us-central1
        zone: us-central1-a
      # Use autoscaler instead of a fixed count; scales to zero when idle.
      autoscaler:
        min: 0
        max: 4
      serverType: n1-standard-4
      image: ubuntu-2404-noble-amd64-v20251001
      storageDiskSize: 50
      machineSpec:
        nvidiaGpuCount: 1
        nvidiaGpuType: nvidia-tesla-t4
      # Use GCP Spot VMs: significant cost savings for interruptible inference workloads.
      spot: true

  kubernetes:
    clusters:
      - name: spot-gpu-cluster
        version: v1.34.0
        network: 172.16.4.0/24
        pools:
          control:
            - control-gcp
          compute:
            - spot-gpu-workers
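Once the autoscaler has created at least one node, the label and taint that Claudie applies to spot nodes can be checked with kubectl. This is a minimal sketch, assuming kubectl points at the spot-gpu-cluster cluster; the function name spot_nodes is hypothetical.

```shell
# List nodes that carry the spot label, together with their taint keys.
spot_nodes() {
  kubectl get nodes -l claudie.io/spot=true \
    -o 'custom-columns=NAME:.metadata.name,TAINTS:.spec.taints[*].key'
}
```

With min: 0 the list is empty until a pending pod triggers a scale-up, so an empty result does not by itself indicate a problem.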

Spot reclamation

GCP may reclaim spot instances with approximately 30 seconds of notice. Design workloads on spot nodepools to handle abrupt termination gracefully (e.g. checkpoint frequently, use job restart policies).

Pods that need to run on this nodepool must include both a spot toleration (at the pod spec level) and a GPU resource request (under spec.containers[]):

apiVersion: v1
kind: Pod
metadata:
  name: inference
spec:
  # Tolerate the spot taint so the pod is allowed onto spot nodes.
  tolerations:
    - key: claudie.io/spot
      operator: Equal
      value: "true"
      effect: NoSchedule
  containers:
    - name: inference
      image: my-inference:latest
      # Request a GPU so the scheduler (and the autoscaler) place this on the GPU pool.
      resources:
        limits:
          nvidia.com/gpu: 1

GPU Operator on spot nodepools

The spot taint claudie.io/spot=true:NoSchedule also keeps the NVIDIA GPU Operator components off spot nodes unless they tolerate it. When installing the operator, add a toleration for claudie.io/spot so its driver, device-plugin and toolkit daemonsets schedule on spot GPU nodes (otherwise nvidia.com/gpu is never advertised). For example, with Helm:

helm install gpu-operator nvidia/gpu-operator -n gpu-operator --create-namespace \
  --set-json 'daemonsets.tolerations=[{"key":"nvidia.com/gpu","operator":"Exists","effect":"NoSchedule"},{"key":"claudie.io/spot","operator":"Exists","effect":"NoSchedule"}]'
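To confirm that the tolerations reached the operator's daemonsets, one option is to print the toleration keys of each daemonset. This is a minimal sketch, assuming the operator is installed in the gpu-operator namespace; the function name operator_daemonset_tolerations is hypothetical.

```shell
# Show the toleration keys configured on each daemonset in gpu-operator.
operator_daemonset_tolerations() {
  kubectl get daemonsets -n gpu-operator \
    -o 'custom-columns=NAME:.metadata.name,TOLERATIONS:.spec.template.spec.tolerations[*].key'
}
```

Each GPU-related daemonset should list claudie.io/spot among its keys; if one does not, its pods stay off the spot nodes.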

Exoscale GPU Example

For Exoscale, GPU instances have the GPU built into the instance type (like AWS), so no additional machineSpec configuration is needed. Simply use a GPU instance type such as gpu2.small as the serverType:

apiVersion: claudie.io/v1beta1
kind: InputManifest
metadata:
  name: exoscale-gpu-example
  labels:
    app.kubernetes.io/part-of: claudie
spec:
  providers:
    - name: exoscale-1
      providerType: exoscale
      # Exoscale templates are supported from claudie-config v0.9.18+
      templates:
        repository: "https://github.com/berops/claudie-config"
        tag: v0.9.18
        path: "templates/terraformer/exoscale"
      secretRef:
        name: exoscale-secret
        namespace: secrets

  nodePools:
    dynamic:
    - name: control-exo
      providerSpec:
        name: exoscale-1
        region: ch-gva-2
      count: 1
      serverType: standard.medium
      image: "Linux Ubuntu 24.04 LTS 64-bit"

    - name: gpu-exo
      providerSpec:
        name: exoscale-1
        region: at-vie-1
      count: 1
      serverType: gpu2.small
      image: "Linux Ubuntu 24.04 LTS 64-bit"
      storageDiskSize: 50

  kubernetes:
    clusters:
      - name: gpu-example
        version: v1.34.0
        network: 172.16.2.0/24
        pools:
          control:
            - control-exo
          compute:
            - gpu-exo

Exoscale GPU Requirements

GPU instance types require account authorization from Exoscale. Contact Exoscale support to enable GPU quota.
Available GPU types and zones may change. List current offerings with exo compute instance-type list --verbose | grep -i gpu or check the Exoscale pricing page.

Deploying the GPU Operator

After the InputManifest has been successfully built by Claudie, deploy the gpu-operator to the gpu-example Kubernetes cluster.
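The commands in this section need a kubeconfig for the newly built cluster. The following is a minimal sketch of one way to retrieve it, run against the management cluster where Claudie itself is deployed. It assumes Claudie stores the kubeconfig in a secret in the claudie namespace labeled claudie.io/output=kubeconfig and claudie.io/cluster=<cluster name>; verify these labels against your Claudie version. The function name fetch_kubeconfig is hypothetical.

```shell
# Print the decoded kubeconfig of a Claudie-built cluster.
# The namespace and label selectors are assumptions about Claudie's output secrets.
fetch_kubeconfig() {
  cluster="${1:-gpu-example}"
  kubectl get secrets -n claudie \
    -l "claudie.io/output=kubeconfig,claudie.io/cluster=${cluster}" \
    -o jsonpath='{.items[0].data.kubeconfig}' | base64 -d
}
```

A typical use would be fetch_kubeconfig gpu-example > gpu-example-kubeconfig.yaml, followed by export KUBECONFIG=gpu-example-kubeconfig.yaml before running the steps below.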

Create a namespace for the gpu-operator.

kubectl create ns gpu-operator

kubectl label --overwrite ns gpu-operator pod-security.kubernetes.io/enforce=privileged

Add the NVIDIA Helm repository.

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
    && helm repo update

Install the operator.

helm install --wait --generate-name \
    -n gpu-operator --create-namespace \
    nvidia/gpu-operator --set cdi.enabled=true --set cdi.nriPluginEnabled=true

Claudie overwrites /etc/containerd/config.toml on every reconciliation loop. To avoid conflicts with these overwrites, the install command above enables the CDI and NRI plugins, which let the operator bypass the Claudie-reconciled /etc/containerd/config.toml.

Wait for the pods in the gpu-operator namespace to become ready. The pod list should look similar to the following:

NAME                                                              READY   STATUS      RESTARTS      AGE
gpu-feature-discovery-4lrbz                                       1/1     Running     0              10m
gpu-feature-discovery-5x88d                                       1/1     Running     0              10m
gpu-operator-1708080094-node-feature-discovery-gc-84ff8f47tn7cd   1/1     Running     0              10m
gpu-operator-1708080094-node-feature-discovery-master-757c27tm6   1/1     Running     0              10m
gpu-operator-1708080094-node-feature-discovery-worker-495z2       1/1     Running     0              10m
gpu-operator-1708080094-node-feature-discovery-worker-n8fl6       1/1     Running     0              10m
gpu-operator-1708080094-node-feature-discovery-worker-znsk4       1/1     Running     0              10m
gpu-operator-6dfb9bd487-2gxzr                                     1/1     Running     0              10m
nvidia-container-toolkit-daemonset-jnqwn                          1/1     Running     0              10m
nvidia-container-toolkit-daemonset-x9t56                          1/1     Running     0              10m
nvidia-cuda-validator-l4w85                                       0/1     Completed   0              10m
nvidia-cuda-validator-lqxhq                                       0/1     Completed   0              10m
nvidia-dcgm-exporter-l9nzt                                        1/1     Running     0              10m
nvidia-dcgm-exporter-q7c2x                                        1/1     Running     0              10m
nvidia-device-plugin-daemonset-dbjjl                              1/1     Running     0              10m
nvidia-device-plugin-daemonset-x5kfs                              1/1     Running     0              10m
nvidia-driver-daemonset-dcq4g                                     1/1     Running     0              10m
nvidia-driver-daemonset-sjjlb                                     1/1     Running     0              10m
nvidia-operator-validator-jbc7r                                   1/1     Running     0              10m
nvidia-operator-validator-q59mc                                   1/1     Running     0              10m
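Rather than polling the pod list by hand, the wait can be scripted. This is a minimal sketch that waits for the three GPU daemonsets seen in the listing above and then prints the pods; the function name wait_for_gpu_operator and the 10 minute timeout are assumptions. It deliberately avoids waiting on all pods, because the nvidia-cuda-validator pods finish as Completed and never report Ready.

```shell
# Wait until the driver, toolkit and device-plugin daemonsets have rolled out,
# then list the pods in the gpu-operator namespace.
wait_for_gpu_operator() {
  for ds in nvidia-driver-daemonset \
            nvidia-container-toolkit-daemonset \
            nvidia-device-plugin-daemonset; do
    kubectl rollout status "daemonset/${ds}" -n gpu-operator --timeout=10m || return 1
  done
  kubectl get pods -n gpu-operator
}
```

The function returns a non-zero status as soon as one rollout times out, which makes it usable as a gate in a setup script.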

Once all pods are ready, verify that the nodes advertise their GPUs:

kubectl get nodes -o json | jq -r '.items[] | {name:.metadata.name, gpus:.status.capacity."nvidia.com/gpu"}'
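If jq is not available, roughly the same information can be printed with kubectl alone. This is a minimal sketch using custom columns; the function name gpu_capacity is hypothetical.

```shell
# Print each node's name and its nvidia.com/gpu capacity without jq.
# The dot in "nvidia.com" is escaped so it is not read as a path separator.
gpu_capacity() {
  kubectl get nodes \
    -o 'custom-columns=NAME:.metadata.name,GPUS:.status.capacity.nvidia\.com/gpu'
}
```

Nodes without a GPU, such as the control-plane node, show <none> in the GPUS column.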

Deploy an example manifest that uses one of the available GPUs from the worker nodes.

apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vectoradd
      image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04"
      resources:
        limits:
          nvidia.com/gpu: 1
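Applying the manifest and waiting for it to finish can be sketched as follows, assuming the manifest above was saved as cuda-vectoradd.yaml. The file name, the function name run_cuda_sample and the 5 minute timeout are all hypothetical choices.

```shell
# Apply the sample pod and block until it reaches the Succeeded phase.
run_cuda_sample() {
  manifest="${1:-cuda-vectoradd.yaml}"
  kubectl apply -f "$manifest" &&
    kubectl wait --for=jsonpath='{.status.phase}'=Succeeded \
      pod/cuda-vectoradd --timeout=5m
}
```

If the wait times out, kubectl describe pod cuda-vectoradd usually shows whether the pod is still pending on an nvidia.com/gpu resource.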

The pod's logs should show output similar to the following:

kubectl logs cuda-vectoradd
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done