Install Nvidia drivers + CUDA on Debian 12 (bookworm) + nvidia-smi + ollama and Docker

[Image: Tux trying to figure out which version to install]

When I set out to install NVIDIA drivers alongside CUDA on my machine, I regularly run into version mismatches that complicate the whole process. One of the most important tools in this ecosystem is nvidia-smi, which reports GPU processes and resource consumption, and its packaged version tends to lag behind the latest NVIDIA drivers. As of April 2025, for example, the drivers are already at version 570 while nvidia-smi is still at 560. Blindly installing the latest drivers without checking this can leave nvidia-smi inoperable, so I’ve learned to navigate these version dependencies carefully to end up with a working installation and fully usable GPU resources.
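When the driver and the tooling drift apart, nvidia-smi tends to fail outright rather than just report stale data; the classic symptom is an NVML error along these lines (the exact wording varies between versions):

# nvidia-smi
Failed to initialize NVML: Driver/library version mismatch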

Keep Debian 12 from installing packages that are too fresh

On the latest Debian (12 — Bookworm), the standard NVIDIA drivers typically install only up to version 535. However, when I install the cuda-keyring to set up for the CUDA driver later, the official NVIDIA repository gets added to my sources, prompting my system to suggest an update to the latest drivers (currently version 570). While there’s nothing inherently wrong with this, and I can proceed with the installation—including the latest CUDA drivers—my nvidia-smi tool, which is at version 560, will end up being outdated.
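Before changing anything, I like to check what is already on the system and what apt would currently install. A quick look along these lines is usually enough (package names may differ slightly on your setup):

# dpkg -l | grep -i nvidia
# apt-cache policy nvidia-driver

The first command lists every NVIDIA-related package that is installed; the second shows the installed and candidate versions together with the repositories they come from.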

To prevent this issue, I need to configure my system to avoid installing certain packages with version numbers higher than 560. After experimenting with the various packages involved in the installation process, I’ve found the necessary steps to ensure a functioning system with the most up-to-date drivers and tools.

Create a policy for all nvidia-related drivers

# nano /etc/apt/preferences.d/nvidia-drivers

You can name this file however you want, then put these rules into it:

Package: *nvidia*
Pin: version 560.35.05-1
Pin-Priority: 1001

Package: cuda-drivers*
Pin: version 560.35.05-1
Pin-Priority: 1001

Package: libcuda*
Pin: version 560.35.05-1
Pin-Priority: 1001

Package: libxnvctrl*
Pin: version 560.35.05-1
Pin-Priority: 1001

Package: libnv*
Pin: version 560.35.05-1
Pin-Priority: 1001

As you can see, I’ve pinned all packages containing the “nvidia” string to version 560.35.05-1; a pin priority above 1000 makes apt stick to that version even when a newer one is available in the NVIDIA repository. I’ve applied the same approach to the other relevant packages, but since I know those start with specific strings, there’s no need to use a wildcard for the first character(s).

# apt update

We need to run apt update to put the policy into effect.
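To double-check that the pin is being honoured, you can ask apt what it would install now (a quick sanity check; once the NVIDIA repository from the steps below is in your sources, 560.35.05-1 should show up there with the 1001 priority from our preferences file):

# apt-cache policy nvidia-driver nvidia-smi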

Install Nvidia drivers and tools

# apt install nvidia-driver nvidia-smi
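Once this finishes (and after a reboot, so the freshly built kernel module is actually the one loaded), a quick sanity check can look like this:

# lsmod | grep nvidia
# nvidia-smi

The first line confirms the kernel module is loaded, and nvidia-smi should report the pinned 560.35.05 driver.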

Install CUDA drivers

# wget https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/cuda-keyring_1.1-1_all.deb
# dpkg -i cuda-keyring_1.1-1_all.deb
# apt-get update
# apt-get -y install cuda-toolkit-12-8
# apt-get install -y cuda-drivers
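The toolkit itself ends up under /usr/local/cuda-12.8 (usually with a /usr/local/cuda symlink as well), and its binaries are not on the PATH by default. If you want nvcc available in your shell, something along these lines in your ~/.bashrc does it (a sketch; adjust the path to the toolkit version you actually installed):

export PATH=/usr/local/cuda-12.8/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.8/lib64:$LD_LIBRARY_PATH

After opening a new shell, nvcc --version should report CUDA 12.8.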

The installation process should go smoothly. Once everything is complete, reboot your machine and enjoy!

+1 extra: Want NVIDIA and CUDA in Docker?

Luckily, this does not depend much on our hacks above. Just install the Nvidia Container Toolkit and test docker’s access to the GPU.

# curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
# apt-get update
# apt-get install -y nvidia-container-toolkit

Next, we need to register the NVIDIA runtime with Docker by adding the following definitions to Docker’s daemon.json (note: the file might not exist yet, so just create it):

# nano /etc/docker/daemon.json

Then, add the following lines:

{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia"
}
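As an aside, the Container Toolkit also ships an nvidia-ctk helper that can write the runtime entry into daemon.json for you, so you don’t have to edit the file by hand (I still add the "default-runtime" key myself when I want every container to use the NVIDIA runtime by default):

# nvidia-ctk runtime configure --runtime=docker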

After saving daemon.json (or letting nvidia-ctk write it for you), restart Docker:

# systemctl restart docker
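After the restart, Docker should list the new runtime; I like to confirm this before firing up a container:

# docker info | grep -i runtime

You should see nvidia among the runtimes, and as the default runtime if you added the "default-runtime" key.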

Now test a simple CUDA container by running nvidia-smi inside it:

# docker run --rm --gpus all nvidia/cuda:12.6.3-base-ubuntu22.04 nvidia-smi
Tue Apr  8 06:11:34 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.05              Driver Version: 560.35.05      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100 80GB PCIe          On  |   00000000:17:00.0 Off |                    0 |
| N/A   31C    P0             44W /  300W |      14MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+

If you see the same as above, you are good to go 🙂

Ollama container with WebUI

Here, I just provide you with a docker-compose.yml file that does the job for you 🙂

services:
  ollama:
    container_name: ollama
    hostname: ollama
    image: ollama/ollama:latest
    restart: unless-stopped
    # ports:
    #   - "11434:11434"
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
      - CUDA_VISIBLE_DEVICES=0
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    volumes:
      # use your own path for permanent storage (e.g., models)
      - /docker/ollama/data:/root/.ollama
    runtime: nvidia
    # command: ollama serve

  ollama-webui:
    container_name: ollama-webui
    hostname: ollama-webui
    restart: unless-stopped
    image: ghcr.io/open-webui/open-webui:latest
    depends_on:
      - ollama
    ports:
      - 3000:8080
    volumes:
      - /docker/ollama/open-webui-data:/app/backend/data
    environment:
      # Open WebUI reads the Ollama endpoint from OLLAMA_BASE_URL
      - OLLAMA_BASE_URL=http://ollama:11434
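Bring the stack up and pull a model into the ollama container (the model name below is just an example, use whatever you like); the WebUI is then reachable on port 3000 as mapped above:

# docker compose up -d
# docker exec -it ollama ollama pull llama3.2

Open http://localhost:3000 in your browser and start chatting 🙂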
