My question
I'm trying to write a script that kills off my graphical environment and frees up the GPU so a VM can use it.
One caveat: the VM image lives on an NTFS partition (my Steam drive) that I also want to keep accessible inside the VM. Hence all the wrestling with permissions; that part of the script works and has been tested. Only freeing up the GPU refuses to work. The host does lose its TTY3 output just before the script fails (the monitor goes into standby), so I have to stay close by.
Relevant software and hardware I'm using
Arch Linux, a Ryzen 7 3000-series X CPU, and an NVIDIA RTX 2070 Super with the nvidia-open packages
What I've found so far:
[INFO] === Starting GPU passthrough VM ===
[INFO] Stopping sddm.service to free GPU
[INFO] Killing GPU-using processes
[INFO] Killed PID 844 using GPU
[INFO] Killed PID 877 using GPU
[INFO] Killed PID 966 using GPU
[INFO] Killed PID 1011 using GPU
[INFO] Killed PID 1240 using GPU
[INFO] Killed PID 1241 using GPU
[INFO] Killed PID 1373 using GPU
[INFO] Killed PID 1501 using GPU
[INFO] Killed PID 1780 using GPU
[INFO] Killed PID 2001 using GPU
[INFO] Stopping nvidia-persistenced service
[INFO] Stopped nvidia-persistenced successfully
[INFO] Attempting to remove module nvidia_drm
[INFO] Removed module nvidia_drm successfully
[INFO] Attempting to remove module nvidia_modeset
[INFO] Removed module nvidia_modeset successfully
[INFO] Attempting to remove module nvidia_uvm
[INFO] Removed module nvidia_uvm successfully
[INFO] Attempting to remove module nvidia
[INFO] Removed module nvidia successfully
[INFO] Processing device 0000:08:00.0 for unbinding and vfio-pci binding
[INFO] Attempting to unbind 0000:08:00.0 from current driver <--- This is where it goes wrong!
[INFO] Restoring system state before unbinding
[INFO] Preparing to remount /run/media/Daan/Opslagmonster with uid=1000, gid=1000 and permissions...
[INFO] Successfully unmounted /run/media/Daan/Opslagmonster
[INFO] Mounted /dev/sda2 at /run/media/Daan/Opslagmonster with user permissions
[INFO] [DEBUG] Listing image file ACL and permissions:
-rwxrwxrwx 1 Daan Daan 107390828544 May 28 14:15 /run/media/Daan/Opslagmonster/vm/images/windows-gaming.qcow2
# file: run/media/Daan/Opslagmonster/vm/images/windows-gaming.qcow2
# owner: Daan
# group: Daan
user::rwx
group::rwx
other::rwx
[INFO] Reloading NVIDIA modules
[INFO] Loaded module nvidia
[INFO] Loaded module nvidia_uvm
[INFO] Loaded module nvidia_modeset
[INFO] Loaded module nvidia_drm
[INFO] Starting sddm
[INFO] plasmashell not running, attempting manual launch...
[INFO] plasmashell launched successfully
[INFO] Restoring system state before unbinding
Any idea why modprobe -r "$driver_name" fails here?
Is it better to adapt a ready-made script than to reinvent the wheel myself?
Or am I more likely to get blood from a stone than to get the nvidia driver to cooperate?
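(A quick diagnostic sketch, separate from the script: what still pins the nvidia module can be checked from another TTY or over SSH right before the unbind step. It assumes a standard sysfs/kmod layout.)

```shell
#!/bin/sh
# Diagnostic sketch (assumes a standard sysfs layout): report what
# still pins the nvidia kernel module right before the unbind step.

if [ -r /sys/module/nvidia/refcnt ]; then
    # non-zero refcount means something still uses the module
    status="nvidia refcount: $(cat /sys/module/nvidia/refcnt)"
    # the fourth lsmod column lists the holders (e.g. nvidia_modeset)
    holders=$(lsmod | awk '$1 == "nvidia" {print $4}')
    [ -n "$holders" ] && status="$status, held by: $holders"
else
    status="nvidia module not loaded"
fi
echo "$status"
```

If the refcount is non-zero with no holder modules listed, a userspace process (often nvidia-persistenced or a lingering compositor) still has a device node open.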
code:
#!/bin/bash
set -e
LOGFILE="/var/log/gameVm.log"
log() {
    echo "[INFO] $1" | tee -a "$LOGFILE"
}
warn() {
    # Log to file and echo to stderr. (Redirecting the echo itself to
    # stderr before the pipe would leave tee with nothing to read.)
    echo "[WARN] $1" | tee -a "$LOGFILE" >&2
}
err() {
    echo "[ERROR] $1" | tee -a "$LOGFILE" >&2
}
trap 'restore_before_unbinding' ERR
VM_NAME="windows-gaming"
ISO_PATH="/run/media/Daan/Opslagmonster/Downloads/Software/win10/Win10_22H2_Dutch_x64v1.iso"
DISK_IMAGE="/run/media/Daan/Opslagmonster/vm/images/windows-gaming.qcow2"
OVMF_CODE="/usr/share/edk2/x64/OVMF_CODE.4m.fd"
OVMF_VARS="/usr/share/edk2/x64/OVMF_VARS.4m.fd"
MOUNT_DEVICE="/dev/sda2"
MOUNT_POINT="/run/media/Daan/Opslagmonster"
USER_UID=1000
USER_GID=1000
VM_UID=958
VM_GID=958
SHARE_PATH="$MOUNT_POINT"
SHARE_NAME="opslagmonster"
set_perms_recursive() {
    local path="$1"
    local user="$2"
    log "Starting set_perms_recursive for user $user on path $path"
    while [ "$path" != "/" ] && [ -n "$path" ]; do
        log "Setting execute permission ACL for user $user on $path"
        if setfacl -m "u:$user:x" "$path"; then
            log "Successfully setfacl for $user on $path"
        else
            warn "Failed to setfacl for $user on $path"
        fi
        path=$(dirname "$path")
    done
    log "Finished set_perms_recursive for user $user"
}
remount_drive_for_vm() {
    log "Preparing to remount $MOUNT_POINT with uid=$VM_UID, gid=$VM_GID and permissions..."
    if umount "$MOUNT_POINT"; then
        log "Successfully unmounted $MOUNT_POINT"
    else
        warn "Failed to unmount $MOUNT_POINT (might not be mounted)"
    fi
    if mount -o "uid=$VM_UID,gid=$VM_GID" "$MOUNT_DEVICE" "$MOUNT_POINT"; then
        log "Mounted $MOUNT_DEVICE at $MOUNT_POINT with VM user permissions"
    else
        err "Failed to mount $MOUNT_DEVICE at $MOUNT_POINT"
    fi
    set_perms_recursive "$DISK_IMAGE" libvirt-qemu
    set_perms_recursive "$DISK_IMAGE" Daan
    log "Setting ACLs for $DISK_IMAGE"
    # Note: on an NTFS (ntfs-3g) mount these setfacl/chown/chmod calls are
    # likely no-ops or fail; effective ownership comes from the uid=/gid=
    # mount options above, which matches the rwxrwxrwx listing in the log.
    setfacl -m u:libvirt-qemu:rwX "$DISK_IMAGE"
    setfacl -m u:Daan:rwX "$DISK_IMAGE"
    chown libvirt-qemu:kvm "$DISK_IMAGE"
    chmod 660 "$DISK_IMAGE"
    log "[DEBUG] Listing image file ACL and permissions:"
    ls -l "$DISK_IMAGE" | tee -a "$LOGFILE"
    getfacl "$DISK_IMAGE" | tee -a "$LOGFILE"
}
remount_drive_for_user() {
    log "Preparing to remount $MOUNT_POINT with uid=$USER_UID, gid=$USER_GID and permissions..."
    if umount "$MOUNT_POINT"; then
        log "Successfully unmounted $MOUNT_POINT"
    else
        warn "Failed to unmount $MOUNT_POINT (might not be mounted)"
    fi
    if mount -o "uid=$USER_UID,gid=$USER_GID" "$MOUNT_DEVICE" "$MOUNT_POINT"; then
        log "Mounted $MOUNT_DEVICE at $MOUNT_POINT with user permissions"
    else
        err "Failed to mount $MOUNT_DEVICE at $MOUNT_POINT"
    fi
    log "[DEBUG] Listing image file ACL and permissions:"
    ls -l "$DISK_IMAGE" | tee -a "$LOGFILE"
    getfacl "$DISK_IMAGE" | tee -a "$LOGFILE"
}
reload_modules() {
    log "Reloading NVIDIA modules"
    for mod in nvidia nvidia_uvm nvidia_modeset nvidia_drm; do
        if modprobe "$mod"; then
            log "Loaded module $mod"
        else
            warn "Failed to load module $mod"
        fi
    done
}
restore_before_unbinding() {
    log "Restoring system state before unbinding"
    remount_drive_for_user
    reload_modules
    log "Starting sddm"
    systemctl start sddm
    if ! pgrep -x plasmashell > /dev/null; then
        log "plasmashell not running, attempting manual launch..."
        # A backgrounded command always reports success to "if", so launch
        # first and then verify that the process actually came up.
        sudo -u daan DISPLAY=:0 XDG_SESSION_TYPE=wayland /usr/bin/plasmashell &
        sleep 1
        if pgrep -x plasmashell > /dev/null; then
            log "plasmashell launched successfully"
        else
            warn "Failed to launch plasmashell manually"
        fi
    else
        log "plasmashell already running"
    fi
}
unbind_device() {
    local dev="$1"
    local unbind_path="/sys/bus/pci/devices/$dev/driver/unbind"
    log "Attempting to unbind $dev from current driver"
    if [ ! -e "/sys/bus/pci/devices/$dev/driver" ]; then
        warn "No driver bound to $dev"
        restore_before_unbinding
        return 1
    fi
    if command -v nvidia-smi >/dev/null 2>&1 && [[ "$dev" =~ ^0000:08:00 ]]; then
        log "Checking NVIDIA GPU compute processes for $dev"
        local pids
        pids=$(nvidia-smi --query-compute-apps=pid --format=csv,noheader 2>/dev/null)
        if [ -n "$pids" ]; then
            log "GPU is currently in use by PIDs: $pids"
            log "Attempting to kill GPU-using processes gracefully"
            for pid in $pids; do
                kill -15 "$pid" 2>/dev/null && log "Sent SIGTERM to PID $pid"
            done
            local wait_time=0
            while [ $wait_time -lt 5 ]; do
                sleep 1
                wait_time=$((wait_time + 1))
                pids=$(nvidia-smi --query-compute-apps=pid --format=csv,noheader 2>/dev/null)
                [ -z "$pids" ] && break
            done
            if [ -n "$pids" ]; then
                log "Processes still running after SIGTERM, sending SIGKILL"
                for pid in $pids; do
                    kill -9 "$pid" 2>/dev/null && log "Killed PID $pid with SIGKILL"
                done
                sleep 1
            else
                log "All GPU processes terminated gracefully"
            fi
        else
            log "No GPU processes detected by nvidia-smi"
        fi
    fi
    local driver_path
    driver_path=$(readlink -f "/sys/bus/pci/devices/$dev/driver")
    local driver_name
    driver_name=$(basename "$driver_path")
    log "Current driver for $dev is $driver_name"
    if lsmod | grep -q "^${driver_name}"; then
        local usage
        usage=$(lsmod | awk -v mod="$driver_name" '$1 == mod {print $3}')
        if [ "$usage" -gt 0 ]; then
            warn "Driver $driver_name is in use by $usage modules. Attempting process cleanup."
            log "Modules depending on $driver_name:"
            lsmod | awk -v mod="$driver_name" '$1 == mod {print $4}' | tr ',' ' ' | while read -r dep; do
                [ -n "$dep" ] && log " - $dep"
            done
            if command -v fuser >/dev/null 2>&1; then
                log "Checking /dev/nvidia* usage with fuser"
                for node in /dev/nvidia*; do
                    [ -e "$node" ] || continue
                    # fuser -v writes its table to stderr, not stdout, so
                    # capture stderr or the loop below never sees anything.
                    fuser_output=$(fuser -v "$node" 2>&1)
                    [ -n "$fuser_output" ] || continue
                    log "Processes using $node:"
                    echo "$fuser_output" | tail -n +2 | while read -r line; do
                        pid=$(echo "$line" | awk '{print $2}')
                        if [ -n "$pid" ]; then
                            ps -p "$pid" -o pid,user,cmd --no-headers | while read -r proc; do
                                log " - $proc"
                                kill -9 "$pid" && log "Killed PID $pid using $node"
                            done
                        fi
                    done
                done
            else
                warn "fuser not found; cannot check /dev/nvidia* usage"
            fi
            if lsmod | grep -q "^${driver_name}"; then
                log "Attempting to unload kernel module $driver_name"
                if modprobe -r "$driver_name"; then
                    log "Successfully unloaded $driver_name"
                else
                    err "Failed to unload $driver_name after cleanup"
                    restore_before_unbinding
                    return 1
                fi
            fi
        fi
    fi
    if [ -w "$unbind_path" ]; then
        if echo "$dev" > "$unbind_path"; then
            log "Unbound $dev successfully"
        else
            warn "Unbind of $dev failed (write error)"
            restore_before_unbinding
            return 1
        fi
    else
        warn "Unbind path $unbind_path not writable or missing"
        restore_before_unbinding
        return 1
    fi
    return 0
}
unload_nvidia() {
    log "Stopping sddm.service to free GPU"
    systemctl stop sddm
    log "Killing GPU-using processes"
    # Parse the nvidia-smi process table for graphics ("G") clients;
    # --query-compute-apps only reports compute processes.
    pids=$(nvidia-smi | awk '/[0-9]+ +G/ {print $5}')
    if [ -z "$pids" ]; then
        log "No GPU-using processes found by nvidia-smi"
    else
        for pid in $pids; do
            if kill -9 "$pid" 2>/dev/null; then
                log "Killed PID $pid using GPU"
            else
                warn "Failed to kill PID $pid using GPU"
            fi
        done
    fi
    sleep 10
    log "Stopping nvidia-persistenced service"
    if sudo systemctl stop nvidia-persistenced; then
        log "Stopped nvidia-persistenced successfully"
    else
        warn "Failed to stop nvidia-persistenced"
    fi
    for mod in nvidia_drm nvidia_modeset nvidia_uvm nvidia; do
        log "Attempting to remove module $mod"
        if sudo rmmod "$mod"; then
            log "Removed module $mod successfully"
        else
            warn "Failed to remove module $mod"
        fi
    done
    GPU_PCI_IDS=("0000:08:00.0" "0000:08:00.1" "0000:08:00.2" "0000:08:00.3")
    for dev in "${GPU_PCI_IDS[@]}"; do
        log "Processing device $dev for unbinding and vfio-pci binding"
        if ! unbind_device "$dev"; then
            err "Failed to unbind $dev"
            restore_before_unbinding
            exit 1
        fi
        echo "vfio-pci" > "/sys/bus/pci/devices/$dev/driver_override"
        log "Set driver override to vfio-pci for $dev"
        echo "$dev" > "/sys/bus/pci/drivers/vfio-pci/bind"
        log "Bound $dev to vfio-pci"
    done
    for mod in nvidia_drm nvidia_modeset nvidia_uvm nvidia; do
        if lsmod | grep -q "$mod"; then
            log "Unloading $mod"
            if ! modprobe -r "$mod"; then
                warn "Failed to unload $mod"
                if [[ "$mod" == "nvidia_drm" ]]; then
                    log "Attempting runtime override (may be unstable)..."
                    echo 0 > /sys/class/vtconsole/vtcon1/bind
                    log "Unbinding virtual console from framebuffer"
                    sleep 1
                    if ! modprobe -r "$mod"; then
                        err "Still failed to unload $mod after override. Aborting."
                        restore_before_unbinding
                        exit 1
                    fi
                    log "Unloaded $mod successfully after override"
                else
                    err "Failed to unload $mod. Aborting."
                    restore_before_unbinding
                    exit 1
                fi
            else
                log "Unloaded $mod successfully"
            fi
        fi
    done
    if lspci -nnk | grep -A 2 "08:00.0" | grep -q "Kernel driver in use: nvidia"; then
        err "GPU still bound to NVIDIA driver. Aborting."
        restore_before_unbinding
        exit 1
    else
        log "GPU successfully unbound from NVIDIA driver"
    fi
}
reload_nvidia() {
    GPU_PCI_IDS=("0000:08:00.0" "0000:08:00.1" "0000:08:00.2" "0000:08:00.3")
    # Load the modules first: /sys/bus/pci/drivers/nvidia only exists
    # once the nvidia driver is loaded, so binding before loading fails.
    for mod in nvidia nvidia_uvm nvidia_modeset nvidia_drm; do
        if modprobe "$mod"; then
            log "Loaded module $mod"
        else
            warn "Failed to load module $mod"
        fi
    done
    for dev in "${GPU_PCI_IDS[@]}"; do
        if [ -e "/sys/bus/pci/devices/$dev/driver" ]; then
            log "Unbinding $dev from vfio-pci"
            if echo "$dev" > "/sys/bus/pci/devices/$dev/driver/unbind"; then
                log "Unbound $dev from vfio-pci"
            else
                warn "Failed to unbind $dev from vfio-pci"
            fi
        fi
        echo "nvidia" > "/sys/bus/pci/devices/$dev/driver_override"
        log "Set driver override to nvidia for $dev"
        echo "$dev" > "/sys/bus/pci/drivers/nvidia/bind"
        log "Bound $dev to nvidia driver"
    done
    log "Starting nvidia-persistenced service"
    if systemctl start nvidia-persistenced; then
        log "Started nvidia-persistenced successfully"
    else
        warn "Failed to start nvidia-persistenced"
    fi
    log "Starting sddm"
    systemctl start sddm
    if ! pgrep -x plasmashell > /dev/null; then
        log "plasmashell not running, attempting manual launch..."
        # Backgrounding always reports success; verify with pgrep instead.
        sudo -u daan DISPLAY=:0 XDG_SESSION_TYPE=wayland /usr/bin/plasmashell &
        sleep 1
        if pgrep -x plasmashell > /dev/null; then
            log "plasmashell launched successfully"
        else
            warn "Failed to launch plasmashell manually"
        fi
    else
        log "plasmashell already running"
    fi
}
start_vm() {
    log "Starting VM $VM_NAME"
    sudo -u daan /usr/bin/virt-install \
        --name "$VM_NAME" \
        --ram 30720 \
        --vcpus 12,sockets=1,cores=6,threads=2 \
        --cpu host-passthrough \
        --os-variant win10-2k19 \
        --virt-type kvm \
        --hvm \
        --accelerate \
        --graphics spice,listen=127.0.0.1,port=5930 \
        --soundhw hda \
        --video qxl \
        --cdrom "$ISO_PATH" \
        --disk path="$DISK_IMAGE",format=qcow2,bus=virtio,cache=none \
        --host-device 08:00.0 \
        --host-device 08:00.1 \
        --host-device 08:00.2 \
        --host-device 08:00.3 \
        --network bridge=virbr0,model=virtio \
        --boot menu=on \
        --noautoconsole
    log "VM $VM_NAME started (or exited virt-install)"
}
if [ "$1" == "start" ]; then
    log "=== Starting GPU passthrough VM ==="
    unload_nvidia
    remount_drive_for_vm
    start_vm
    reload_nvidia
    remount_drive_for_user
    log "=== Done ==="
else
    echo "Usage: $0 start"
fi
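After unload_nvidia succeeds, a small verification sketch (assuming the same PCI addresses as in the script) can be run over SSH to confirm each GPU function really ended up on vfio-pci:

```shell
#!/bin/sh
# Verification sketch: print the driver currently bound to each GPU
# function. Assumes the PCI addresses used in the script above.

result=""
for dev in 0000:08:00.0 0000:08:00.1 0000:08:00.2 0000:08:00.3; do
    link="/sys/bus/pci/devices/$dev/driver"
    if [ -L "$link" ]; then
        # the driver symlink points at e.g. .../drivers/vfio-pci
        drv=$(basename "$(readlink -f "$link")")
    else
        drv="none"   # device absent or no driver bound
    fi
    result="$result$dev -> $drv\n"
done
printf '%b' "$result"
```

Every line should read "-> vfio-pci" before starting the VM; a "-> nvidia" line means the unbind silently failed for that function.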
[ Edited 15% by switchboy on 28-05-2025 22:34 ]