Install Nvidia drivers + CUDA on Debian 12 (bookworm) + nvidia-smi + ollama and Docker

[Image: Tux trying to figure out which version to install]

When I set out to install NVIDIA drivers alongside CUDA on my machine, I regularly run into version mismatches that complicate the whole process. One of the most important tools in this ecosystem is nvidia-smi, which reports GPU processes and resource consumption, and its packaged version tends to lag behind the latest NVIDIA drivers. As of April 2025, for example, the drivers are already at version 570 while nvidia-smi is still at 560. Blindly installing the latest drivers without checking this can leave nvidia-smi inoperable, so I’ve learned to navigate these version dependencies carefully to end up with a working installation and fully usable GPU resources.
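When the driver and the tooling drift apart, nvidia-smi tends to fail outright rather than just report stale data; the classic symptom is an NVML error along these lines (the exact wording varies between versions):

# nvidia-smi
Failed to initialize NVML: Driver/library version mismatch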

Keep Debian 12 from installing packages that are too fresh

On the latest Debian (12 — Bookworm), the standard NVIDIA drivers typically install only up to version 535. However, when I install the cuda-keyring to set up for the CUDA driver later, the official NVIDIA repository gets added to my sources, prompting my system to suggest an update to the latest drivers (currently version 570). While there’s nothing inherently wrong with this, and I can proceed with the installation—including the latest CUDA drivers—my nvidia-smi tool, which is at version 560, will end up being outdated.
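Before changing anything, I like to check what is already on the system and what apt would currently install. A quick look along these lines is usually enough (package names may differ slightly on your setup):

# dpkg -l | grep -i nvidia
# apt-cache policy nvidia-driver

The first command lists every NVIDIA-related package that is installed; the second shows the installed and candidate versions together with the repositories they come from.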

To prevent this issue, I need to configure my system to avoid installing certain packages with version numbers higher than 560. After experimenting with the various packages involved in the installation process, I’ve found the necessary steps to ensure a functioning system with the most up-to-date drivers and tools.

Create a policy for all nvidia-related drivers

# nano /etc/apt/preferences.d/nvidia-drivers

You can name this file however you want, then put these rules into it:

Package: *nvidia*
Pin: version 560.35.05-1
Pin-Priority: 1001

Package: cuda-drivers*
Pin: version 560.35.05-1
Pin-Priority: 1001

Package: libcuda*
Pin: version 560.35.05-1
Pin-Priority: 1001

Package: libxnvctrl*
Pin: version 560.35.05-1
Pin-Priority: 1001

Package: libnv*
Pin: version 560.35.05-1
Pin-Priority: 1001

As you can see, I’ve pinned all packages containing the “nvidia” string to version 560.35.05-1; a pin priority above 1000 makes apt stick to that version even when a newer one is available in the NVIDIA repository. I’ve applied the same approach to the other relevant packages, but since I know those start with specific strings, there’s no need to use a wildcard for the first character(s).

# apt update

We need to run apt update to put the policy into effect.
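To double-check that the pin is being honoured, you can ask apt what it would install now (a quick sanity check; once the NVIDIA repository from the steps below is in your sources, 560.35.05-1 should show up there with the 1001 priority from our preferences file):

# apt-cache policy nvidia-driver nvidia-smi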

Install Nvidia drivers and tools

# apt install nvidia-driver nvidia-smi
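Once this finishes (and after a reboot, so the freshly built kernel module is actually the one loaded), a quick sanity check can look like this:

# lsmod | grep nvidia
# nvidia-smi

The first line confirms the kernel module is loaded, and nvidia-smi should report the pinned 560.35.05 driver.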

Install CUDA drivers

# wget https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/cuda-keyring_1.1-1_all.deb
# dpkg -i cuda-keyring_1.1-1_all.deb
# apt-get update
# apt-get -y install cuda-toolkit-12-8
# apt-get install -y cuda-drivers
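The toolkit itself ends up under /usr/local/cuda-12.8 (usually with a /usr/local/cuda symlink as well), and its binaries are not on the PATH by default. If you want nvcc available in your shell, something along these lines in your ~/.bashrc does it (a sketch; adjust the path to the toolkit version you actually installed):

export PATH=/usr/local/cuda-12.8/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.8/lib64:$LD_LIBRARY_PATH

After opening a new shell, nvcc --version should report CUDA 12.8.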

The installation process should go smoothly. Once everything is complete, reboot your machine and enjoy!

+1 extra: Want NVIDIA and CUDA in Docker?

Luckily, this does not depend much on our hacks above. Just install the Nvidia Container Toolkit and test docker’s access to the GPU.

# curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
# apt-get update
# apt-get install -y nvidia-container-toolkit

Next, we need to register the NVIDIA runtime with Docker by adding the following definitions to Docker’s daemon.json (note: the file might not exist yet, so just create it):

# nano /etc/docker/daemon.json

Then, add the following lines:

{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia"
}
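As an aside, the Container Toolkit also ships an nvidia-ctk helper that can write the runtime entry into daemon.json for you, so you don’t have to edit the file by hand (I still add the "default-runtime" key myself when I want every container to use the NVIDIA runtime by default):

# nvidia-ctk runtime configure --runtime=docker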

After saving daemon.json (or letting nvidia-ctk write it for you), restart Docker:

# systemctl restart docker
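After the restart, Docker should list the new runtime; I like to confirm this before firing up a container:

# docker info | grep -i runtime

You should see nvidia among the runtimes, and as the default runtime if you added the "default-runtime" key.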

Now test a simple CUDA container by running nvidia-smi inside it:

# docker run --rm --gpus all nvidia/cuda:12.6.3-base-ubuntu22.04 nvidia-smi
Tue Apr  8 06:11:34 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.05              Driver Version: 560.35.05      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100 80GB PCIe          On  |   00000000:17:00.0 Off |                    0 |
| N/A   31C    P0             44W /  300W |      14MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+

If you see the same as above, you are good to go 🙂

Ollama container with WebUI

Here, I just provide you with a docker-compose.yml file that does the job for you 🙂

services:
  ollama:
    container_name: ollama
    hostname: ollama
    image: ollama/ollama:latest
    restart: unless-stopped
    # ports:
    #   - "11434:11434"
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
      - CUDA_VISIBLE_DEVICES=0
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    volumes:
      # use your own path for permanent storage (e.g., models)
      - /docker/ollama/data:/root/.ollama
    runtime: nvidia
    # command: ollama serve

  ollama-webui:
    container_name: ollama-webui
    hostname: ollama-webui
    restart: unless-stopped
    image: ghcr.io/open-webui/open-webui:latest
    depends_on:
      - ollama
    ports:
      - 3000:8080
    volumes:
      - /docker/ollama/open-webui-data:/app/backend/data
    environment:
      # Open WebUI reads the Ollama endpoint from OLLAMA_BASE_URL
      - OLLAMA_BASE_URL=http://ollama:11434
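Bring the stack up and pull a model into the ollama container (the model name below is just an example, use whatever you like); the WebUI is then reachable on port 3000 as mapped above:

# docker compose up -d
# docker exec -it ollama ollama pull llama3.2

Open http://localhost:3000 in your browser and start chatting 🙂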
