1. PVE driver

Install on the PVE host:

1. Comment out the enterprise repository and add the no-subscription repository

vim /etc/apt/sources.list.d/pve-enterprise.list

# deb https://enterprise.proxmox.com/debian/pve bookworm pve-enterprise

deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription

Then run:

apt update && apt install build-essential pve-headers-$(uname -r)
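The repository edit above can also be scripted instead of done by hand in vim. The following is only a sketch: switch_pve_repo is a hypothetical helper name, it assumes GNU sed (as shipped with Debian), and it writes the no-subscription line into the same file the notes edit.

```shell
# Hypothetical helper: switch_pve_repo LIST_FILE
# Comments out active pve-enterprise entries and appends the
# pve-no-subscription entry if it is not already present.
switch_pve_repo() {
    list_file=$1
    no_sub='deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription'
    # Prefix every uncommented enterprise entry with "# ".
    sed -i '/^deb .*pve-enterprise/ s/^/# /' "$list_file"
    # Append the no-subscription entry only once (exact whole-line match).
    grep -qxF "$no_sub" "$list_file" || printf '%s\n' "$no_sub" >> "$list_file"
}
```

It would be called as switch_pve_repo /etc/apt/sources.list.d/pve-enterprise.list, and running it twice leaves the file unchanged the second time.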

2. Disable nouveau

# Blacklist nouveau by adding the line: blacklist nouveau
vim /etc/modprobe.d/blacklist.conf
# Rebuild the initramfs so the change takes effect
update-initramfs -u
# Reboot
reboot
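A small sketch of the same step without an editor, plus a check to run after the reboot. Both function names are hypothetical; the default path is the file used above.

```shell
# Hypothetical helper: append "blacklist nouveau" once to the given file.
blacklist_nouveau() {
    conf_file=${1:-/etc/modprobe.d/blacklist.conf}
    touch "$conf_file"
    grep -qxF 'blacklist nouveau' "$conf_file" \
        || printf 'blacklist nouveau\n' >> "$conf_file"
}

# Hypothetical helper: reads lsmod output on stdin and succeeds
# only if the nouveau module appears in it.
nouveau_loaded() {
    grep -q '^nouveau[[:space:]]'
}
```

After rebooting, lsmod | nouveau_loaded && echo "nouveau is still loaded" gives a quick indication of whether the blacklist took effect.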

3. Install the CUDA Toolkit

Follow the official NVIDIA installation guide step by step.

Verify the driver:

nvidia-smi

Verify CUDA:

nvcc --version

If the nvcc command is not found, add /usr/local/cuda/bin to PATH:

export PATH=/usr/local/cuda/bin:$PATH
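The export above only affects the current shell, and repeating it prepends the directory again each time. A minimal sketch of an idempotent variant follows; add_cuda_to_path is a hypothetical name, and placing it in a shell startup file such as ~/.bashrc is an assumption about how the setting would be persisted.

```shell
# Hypothetical helper: prepend the CUDA bin directory to PATH
# only if it is not already listed there.
add_cuda_to_path() {
    cuda_bin=${1:-/usr/local/cuda/bin}
    case ":$PATH:" in
        *":$cuda_bin:"*) ;;              # already present, nothing to do
        *) PATH="$cuda_bin:$PATH" ;;
    esac
    export PATH
}
```

Calling add_cuda_to_path with no argument uses /usr/local/cuda/bin.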

Check the device nodes:

ls -la /dev/dri/

Output of ls -la /dev/dri/:

total 0
drwxr-xr-x  3 root root        120 Apr 27 21:02 .
drwxr-xr-x 23 root root       5760 Apr 28 01:00 ..
drwxr-xr-x  2 root root        100 Apr 27 21:02 by-path
crw-rw----  1 root video  226,   0 Apr 27 19:56 card0
crw-rw----  1 root video  226,   1 Apr 27 21:02 card1
crw-rw----  1 root render 226, 128 Apr 27 21:02 renderD128

ls -la /dev/dri/by-path/

Output of ls -la /dev/dri/by-path/:

total 0
drwxr-xr-x 2 root root 100 Apr 27 21:02 .
drwxr-xr-x 3 root root 120 Apr 27 21:02 ..
lrwxrwxrwx 1 root root   8 Apr 27 21:02 pci-0000:81:00.0-card -> ../card1
lrwxrwxrwx 1 root root  13 Apr 27 21:02 pci-0000:81:00.0-render -> ../renderD128
lrwxrwxrwx 1 root root   8 Apr 27 19:56 pci-0000:c2:00.0-card -> ../card0

ls -la /dev/nvidia*

Output of ls -la /dev/nvidia*:

crw-rw-rw- 1 root root 511,   0 Apr 28 16:07 /dev/nvidia-uvm
crw-rw-rw- 1 root root 511,   1 Apr 28 16:07 /dev/nvidia-uvm-tools
crw-rw-rw- 1 root root 195, 254 Apr 28 16:05 /dev/nvidia-modeset
crw-rw-rw- 1 root root 195,   0 Apr 28 16:05 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Apr 28 16:05 /dev/nvidiactl

/dev/nvidia-caps:
total 0
cr--r--r-- 1 root root 237, 2 Apr 28 16:07 nvidia-cap2
cr-------- 1 root root 237, 1 Apr 28 16:07 nvidia-cap1
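
In these listings, the value before the comma (195, 511, 226 and so on) is the major device number, which the container configuration refers to. As a sketch, a hypothetical helper char_majors can extract the distinct majors from ls output rather than reading them by eye:

```shell
# Hypothetical helper: reads `ls -l` style output on stdin and prints
# the distinct major numbers of all character devices, sorted numerically.
char_majors() {
    # Character devices start with "c"; field 5 is "MAJOR," in ls output.
    awk '/^c/ { sub(/,$/, "", $5); print $5 }' | sort -un
}
```

For example: ls -la /dev/nvidia* /dev/dri/ | char_majors. These numbers are assigned dynamically for some devices, so they should be read from the actual host rather than copied from these notes.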

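2. LXC driver

Edit the LXC container's configuration file (replace xxx with the container ID):

vim /etc/pve/lxc/xxx.conf

Add the following:

Here 195 is the major device number from the ls -la /dev/nvidia* output above; the other numbers are obtained the same way (for example 511 for nvidia-uvm and 226 for /dev/dri).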

lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 511:* rwm
lxc.cgroup2.devices.allow: c 237:* rwm
lxc.cgroup2.devices.allow: c 226:* rwm
lxc.cgroup2.devices.allow: c 226:128 rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-caps dev/nvidia-caps none bind,optional,create=dir
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
lxc.mount.entry: /dev/dri/renderD128 dev/dri/renderD128 none bind,optional,create=file
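
Because the major numbers can differ between hosts, the lines above can be generated rather than typed. This is only a sketch: lxc_allow_rules and lxc_mount_entries are hypothetical helpers, and they cover device files only (directories such as /dev/dri still need create=dir entries written by hand).

```shell
# Hypothetical helper: print one cgroup2 allow rule per major number.
lxc_allow_rules() {
    for major in "$@"; do
        printf 'lxc.cgroup2.devices.allow: c %s:* rwm\n' "$major"
    done
}

# Hypothetical helper: print a bind-mount entry per device file.
# The mount target is the same path without the leading slash,
# i.e. relative to the container's root filesystem.
lxc_mount_entries() {
    for dev in "$@"; do
        printf 'lxc.mount.entry: %s %s none bind,optional,create=file\n' \
            "$dev" "${dev#/}"
    done
}
```

For example, lxc_allow_rules 195 511 226 followed by lxc_mount_entries /dev/nvidia0 /dev/nvidiactl prints lines that can be reviewed and then appended to the container's .conf file.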

Restart the LXC container.
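
Once the container is back up, it may be worth confirming from inside it that the bind mounts produced the expected nodes before installing anything. A minimal sketch, assuming the node names from the host listing earlier; missing_gpu_nodes is a hypothetical helper:

```shell
# Hypothetical helper: print every expected NVIDIA device node that is
# absent under the given directory (default /dev). No output means all
# nodes are present.
missing_gpu_nodes() {
    root=${1:-/dev}
    for node in nvidia0 nvidiactl nvidia-uvm nvidia-uvm-tools nvidia-modeset; do
        [ -e "$root/$node" ] || printf '%s\n' "$root/$node"
    done
}
```

Any path it prints usually points to a missing or mistyped lxc.mount.entry line for that node.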

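Install the driver inside the LXC container:
Install the CUDA Toolkit directly, as on the host. Use the same driver version as on the host, since the container shares the host's kernel module.

Verify the driver: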

nvidia-smi

Verify CUDA:

git clone https://github.com/NVIDIA/cuda-samples.git
cd cuda-samples/Samples/1_Utilities/deviceQuery/
make
./deviceQuery

3. Using Docker inside the LXC container

Install the NVIDIA Container Toolkit.
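
After installing the toolkit, Docker is normally pointed at the NVIDIA runtime with nvidia-ctk runtime configure --runtime=docker followed by a restart of the Docker service. Inside an LXC container it is also commonly reported that the no-cgroups option in /etc/nvidia-container-runtime/config.toml has to be set to true, because the container cannot manage device cgroups itself. Treat that as an assumption to verify on your setup. A sketch of the edit, with enable_no_cgroups as a hypothetical helper and GNU sed assumed:

```shell
# Hypothetical helper: set "no-cgroups = true" in the NVIDIA container
# runtime config, whether the existing line is commented out or not.
enable_no_cgroups() {
    cfg=${1:-/etc/nvidia-container-runtime/config.toml}
    sed -i 's/^#\{0,1\}[[:space:]]*no-cgroups[[:space:]]*=.*/no-cgroups = true/' "$cfg"
}
```

A quick end-to-end check afterwards is docker run --rm --gpus all ubuntu nvidia-smi, which should print the same GPU table as nvidia-smi in the container itself.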