My question
I'm trying to write a script that kills my graphical environment and frees up the GPU so a VM can use it.
As a caveat, the VM image lives on an NTFS partition (my Steam drive) that I also want to be accessible inside the VM. Hence all the fighting with permissions; that part of the script works and has been tested. Only freeing up the GPU refuses to work. The host does lose its TTY3 output just before the script fails (the monitor goes into standby), so I have to stay close by.
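Side note on the permissions part: as far as I know, with ntfs-3g the ownership and mode of every file on the volume are decided at mount time by the uid=/gid=/umask= options, and per-file chmod/setfacl generally don't persist unless the volume was mounted with POSIX ACL support. A hypothetical fstab-style line for this drive (device and mount point taken from the script below, options are illustrative):

```shell
# Hypothetical /etc/fstab line for the Steam drive; with ntfs-3g the
# uid=/gid=/umask= mount options determine the effective permissions
# for the entire NTFS volume:
# /dev/sda2  /run/media/Daan/Opslagmonster  ntfs-3g  uid=1000,gid=1000,umask=022  0  0
```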
Relevant software and hardware I'm using
Arch Linux, a Ryzen 7 3000-series X CPU, and an NVIDIA RTX 2070 Super with the nvidia-open packages
code:
#!/bin/bash
set -e

LOGFILE="/var/log/gameVm.log"

log() {
    echo "[INFO] $1" | tee -a "$LOGFILE"
}

# Log to the file first, then redirect to stderr; redirecting the echo
# itself before the pipe would leave tee with nothing to write.
warn() {
    echo "[WARN] $1" | tee -a "$LOGFILE" >&2
}

err() {
    echo "[ERROR] $1" | tee -a "$LOGFILE" >&2
}

# With set -e this trap also fires when a command or function returns
# non-zero, so restore_before_unbinding can end up running twice (once
# from an explicit call, once from the trap).
trap 'restore_before_unbinding' ERR

VM_NAME="windows-gaming"
ISO_PATH="/run/media/Daan/Opslagmonster/Downloads/Software/win10/Win10_22H2_Dutch_x64v1.iso"
DISK_IMAGE="/run/media/Daan/Opslagmonster/vm/images/windows-gaming.qcow2"
OVMF_CODE="/usr/share/edk2/x64/OVMF_CODE.4m.fd"
OVMF_VARS="/usr/share/edk2/x64/OVMF_VARS.4m.fd"
MOUNT_DEVICE="/dev/sda2"
MOUNT_POINT="/run/media/Daan/Opslagmonster"
USER_UID=1000
USER_GID=1000
VM_UID=958
VM_GID=958
SHARE_PATH="$MOUNT_POINT"
SHARE_NAME="opslagmonster"

# Grant a user execute (traverse) permission on every directory from the
# given path up to the filesystem root.
set_perms_recursive() {
    local path="$1"
    local user="$2"
    log "Starting set_perms_recursive for user $user on path $path"
    while [ "$path" != "/" ] && [ -n "$path" ]; do
        log "Setting execute permission ACL for user $user on $path"
        if setfacl -m u:"$user":x "$path"; then
            log "Successfully setfacl for $user on $path"
        else
            warn "Failed to setfacl for $user on $path"
        fi
        path=$(dirname "$path")
    done
    log "Finished set_perms_recursive for user $user"
}

remount_drive_for_vm() {
    log "Preparing to remount $MOUNT_POINT with uid=$VM_UID, gid=$VM_GID and permissions..."
    if umount "$MOUNT_POINT"; then
        log "Successfully unmounted $MOUNT_POINT"
    else
        warn "Failed to unmount $MOUNT_POINT (might not be mounted)"
    fi
    if mount -o uid=$VM_UID,gid=$VM_GID "$MOUNT_DEVICE" "$MOUNT_POINT"; then
        log "Mounted $MOUNT_DEVICE at $MOUNT_POINT with VM user permissions"
    else
        err "Failed to mount $MOUNT_DEVICE at $MOUNT_POINT"
    fi
    set_perms_recursive "$DISK_IMAGE" libvirt-qemu
    set_perms_recursive "$DISK_IMAGE" Daan
    log "Setting ACLs for $DISK_IMAGE"
    setfacl -m u:libvirt-qemu:rwX "$DISK_IMAGE"
    setfacl -m u:Daan:rwX "$DISK_IMAGE"
    chown libvirt-qemu:kvm "$DISK_IMAGE"
    chmod 660 "$DISK_IMAGE"
    log "[DEBUG] Listing image file ACL and permissions:"
    ls -l "$DISK_IMAGE" | tee -a "$LOGFILE"
    getfacl "$DISK_IMAGE" | tee -a "$LOGFILE"
}

remount_drive_for_user() {
    log "Preparing to remount $MOUNT_POINT with uid=$USER_UID, gid=$USER_GID and permissions..."
    if umount "$MOUNT_POINT"; then
        log "Successfully unmounted $MOUNT_POINT"
    else
        warn "Failed to unmount $MOUNT_POINT (might not be mounted)"
    fi
    if mount -o uid=$USER_UID,gid=$USER_GID "$MOUNT_DEVICE" "$MOUNT_POINT"; then
        log "Mounted $MOUNT_DEVICE at $MOUNT_POINT with user permissions"
    else
        err "Failed to mount $MOUNT_DEVICE at $MOUNT_POINT"
    fi
    log "[DEBUG] Listing image file ACL and permissions:"
    ls -l "$DISK_IMAGE" | tee -a "$LOGFILE"
    getfacl "$DISK_IMAGE" | tee -a "$LOGFILE"
}

reload_modules() {
    log "Reloading NVIDIA modules"
    for mod in nvidia nvidia_uvm nvidia_modeset nvidia_drm; do
        if modprobe "$mod"; then
            log "Loaded module $mod"
        else
            warn "Failed to load module $mod"
        fi
    done
}

restore_before_unbinding() {
    log "Restoring system state before unbinding"
    remount_drive_for_user
    reload_modules
    log "Starting sddm"
    systemctl start sddm
    if ! pgrep -x plasmashell > /dev/null; then
        log "plasmashell not running, attempting manual launch..."
        # Launch in the background, then verify it actually came up;
        # testing the "&" itself would always report success.
        sudo -u daan DISPLAY=:0 XDG_SESSION_TYPE=wayland /usr/bin/plasmashell &
        sleep 2
        if pgrep -x plasmashell > /dev/null; then
            log "plasmashell launched successfully"
        else
            warn "Failed to launch plasmashell manually"
        fi
    else
        log "plasmashell already running"
    fi
}

unbind_device() {
    local dev="$1"
    local unbind_path="/sys/bus/pci/devices/$dev/driver/unbind"
    log "Attempting to unbind $dev from current driver"
    # If the nvidia modules were already removed, this driver symlink no
    # longer exists and the function bails out here.
    if [ ! -e "/sys/bus/pci/devices/$dev/driver" ]; then
        warn "No driver bound to $dev"
        restore_before_unbinding
        return 1
    fi
    # Note: running nvidia-smi can load the nvidia module again (via
    # nvidia-modprobe) if it is not currently loaded.
    if command -v nvidia-smi >/dev/null 2>&1 && [[ "$dev" =~ ^0000:08:00 ]]; then
        log "Checking NVIDIA GPU compute processes for $dev"
        local pids
        pids=$(nvidia-smi --query-compute-apps=pid --format=csv,noheader 2>/dev/null)
        if [ -n "$pids" ]; then
            log "GPU is currently in use by PIDs: $pids"
            log "Attempting to kill GPU-using processes gracefully"
            for pid in $pids; do
                kill -15 "$pid" 2>/dev/null && log "Sent SIGTERM to PID $pid"
            done
            local wait_time=0
            while [ $wait_time -lt 5 ]; do
                sleep 1
                wait_time=$((wait_time + 1))
                pids=$(nvidia-smi --query-compute-apps=pid --format=csv,noheader 2>/dev/null)
                [ -z "$pids" ] && break
            done
            if [ -n "$pids" ]; then
                log "Processes still running after SIGTERM, sending SIGKILL"
                for pid in $pids; do
                    kill -9 "$pid" 2>/dev/null && log "Killed PID $pid with SIGKILL"
                done
                sleep 1
            else
                log "All GPU processes terminated gracefully"
            fi
        else
            log "No GPU processes detected by nvidia-smi"
        fi
    fi
    local driver_path
    driver_path=$(readlink -f "/sys/bus/pci/devices/$dev/driver")
    local driver_name
    driver_name=$(basename "$driver_path")
    log "Current driver for $dev is $driver_name"
    if lsmod | grep -q "^${driver_name}"; then
        local usage
        usage=$(lsmod | awk -v mod="$driver_name" '$1 == mod {print $3}')
        if [ "$usage" -gt 0 ]; then
            warn "Driver $driver_name is in use by $usage modules. Attempting process cleanup."
            log "Modules depending on $driver_name:"
            lsmod | awk -v mod="$driver_name" '$1 == mod {print $4}' | tr ',' ' ' | while read -r dep; do
                [ -n "$dep" ] && log " - $dep"
            done
            if command -v fuser >/dev/null 2>&1; then
                log "Checking /dev/nvidia* usage with fuser"
                for node in /dev/nvidia*; do
                    [ -e "$node" ] || continue
                    fuser_output=$(fuser -v "$node" 2>/dev/null)
                    [ -n "$fuser_output" ] || continue
                    log "Processes using $node:"
                    echo "$fuser_output" | tail -n +2 | while read -r line; do
                        pid=$(echo "$line" | awk '{print $2}')
                        if [ -n "$pid" ]; then
                            ps -p "$pid" -o pid,user,cmd --no-headers | while read -r proc; do
                                log " - $proc"
                                kill -9 "$pid" && log "Killed PID $pid using $node"
                            done
                        fi
                    done
                done
            else
                warn "fuser not found; cannot check /dev/nvidia* usage"
            fi
            if lsmod | grep -q "^${driver_name}"; then
                log "Attempting to unload kernel module $driver_name"
                if modprobe -r "$driver_name"; then
                    log "Successfully unloaded $driver_name"
                else
                    err "Failed to unload $driver_name after cleanup"
                    restore_before_unbinding
                    return 1
                fi
            fi
        fi
    fi
    if [ -w "$unbind_path" ]; then
        if echo "$dev" > "$unbind_path"; then
            log "Unbound $dev successfully"
        else
            warn "Unbind of $dev failed (write error)"
            restore_before_unbinding
            return 1
        fi
    else
        warn "Unbind path $unbind_path not writable or missing"
        restore_before_unbinding
        return 1
    fi
    return 0
}

unload_nvidia() {
    log "Stopping sddm.service to free GPU"
    systemctl stop sddm
    log "Killing GPU-using processes"
    pids=$(nvidia-smi | awk '/[0-9]+ +G/ {print $5}')
    if [ -z "$pids" ]; then
        log "No GPU-using processes found by nvidia-smi"
    else
        for pid in $pids; do
            if kill -9 "$pid" 2>/dev/null; then
                log "Killed PID $pid using GPU"
            else
                warn "Failed to kill PID $pid using GPU"
            fi
        done
    fi
    sleep 10
    log "Stopping nvidia-persistenced service"
    if sudo systemctl stop nvidia-persistenced; then
        log "Stopped nvidia-persistenced successfully"
    else
        warn "Failed to stop nvidia-persistenced"
    fi
    for mod in nvidia_drm nvidia_modeset nvidia_uvm nvidia; do
        log "Attempting to remove module $mod"
        if sudo rmmod "$mod"; then
            log "Removed module $mod successfully"
        else
            warn "Failed to remove module $mod"
        fi
    done
    GPU_PCI_IDS=("0000:08:00.0" "0000:08:00.1" "0000:08:00.2" "0000:08:00.3")
    for dev in "${GPU_PCI_IDS[@]}"; do
        log "Processing device $dev for unbinding and vfio-pci binding"
        if ! unbind_device "$dev"; then
            err "Failed to unbind $dev"
            restore_before_unbinding
            exit 1
        fi
        echo "vfio-pci" > "/sys/bus/pci/devices/$dev/driver_override"
        log "Set driver override to vfio-pci for $dev"
        echo "$dev" > "/sys/bus/pci/drivers/vfio-pci/bind"
        log "Bound $dev to vfio-pci"
    done
    for mod in nvidia_drm nvidia_modeset nvidia_uvm nvidia; do
        if lsmod | grep -q "$mod"; then
            log "Unloading $mod"
            if ! modprobe -r "$mod"; then
                warn "Failed to unload $mod"
                if [[ "$mod" == "nvidia_drm" ]]; then
                    log "Attempting runtime override (may be unstable)..."
                    echo 0 > /sys/class/vtconsole/vtcon1/bind
                    log "Unbinding virtual console from framebuffer"
                    sleep 1
                    if ! modprobe -r "$mod"; then
                        err "Still failed to unload $mod after override. Aborting."
                        restore_before_unbinding
                        exit 1
                    fi
                else
                    err "Failed to unload $mod. Aborting."
                    restore_before_unbinding
                    exit 1
                fi
            else
                log "Unloaded $mod successfully"
            fi
        fi
    done
    if lspci -nnk | grep -A 2 "08:00.0" | grep -q "Kernel driver in use: nvidia"; then
        err "GPU still bound to NVIDIA driver. Aborting."
        restore_before_unbinding
        exit 1
    else
        log "GPU successfully unbound from NVIDIA driver"
    fi
}

reload_nvidia() {
    GPU_PCI_IDS=("0000:08:00.0" "0000:08:00.1" "0000:08:00.2" "0000:08:00.3")
    for dev in "${GPU_PCI_IDS[@]}"; do
        if [ -e "/sys/bus/pci/devices/$dev/driver" ]; then
            log "Unbinding $dev from vfio-pci"
            if echo "$dev" > "/sys/bus/pci/devices/$dev/driver/unbind"; then
                log "Unbound $dev from vfio-pci"
            else
                warn "Failed to unbind $dev from vfio-pci"
            fi
        fi
        echo "nvidia" > "/sys/bus/pci/devices/$dev/driver_override"
        log "Set driver override to nvidia for $dev"
        echo "$dev" > "/sys/bus/pci/drivers/nvidia/bind"
        log "Bound $dev to nvidia driver"
    done
    for mod in nvidia nvidia_uvm nvidia_modeset nvidia_drm; do
        if modprobe "$mod"; then
            log "Loaded module $mod"
        else
            warn "Failed to load module $mod"
        fi
    done
    log "Starting nvidia-persistenced service"
    if systemctl start nvidia-persistenced; then
        log "Started nvidia-persistenced successfully"
    else
        warn "Failed to start nvidia-persistenced"
    fi
    log "Starting sddm"
    systemctl start sddm
    if ! pgrep -x plasmashell > /dev/null; then
        log "plasmashell not running, attempting manual launch..."
        sudo -u daan DISPLAY=:0 XDG_SESSION_TYPE=wayland /usr/bin/plasmashell &
        sleep 2
        if pgrep -x plasmashell > /dev/null; then
            log "plasmashell launched successfully"
        else
            warn "Failed to launch plasmashell manually"
        fi
    else
        log "plasmashell already running"
    fi
}

start_vm() {
    log "Starting VM $VM_NAME"
    sudo -u daan /usr/bin/virt-install \
        --name "$VM_NAME" \
        --ram 30720 \
        --vcpus 12,sockets=1,cores=6,threads=2 \
        --cpu host-passthrough \
        --os-variant win10-2k19 \
        --virt-type kvm \
        --hvm \
        --accelerate \
        --graphics spice,listen=127.0.0.1,port=5930 \
        --soundhw hda \
        --video qxl \
        --cdrom "$ISO_PATH" \
        --disk path="$DISK_IMAGE",format=qcow2,bus=virtio,cache=none \
        --host-device 08:00.0 \
        --host-device 08:00.1 \
        --host-device 08:00.2 \
        --host-device 08:00.3 \
        --network bridge=virbr0,model=virtio \
        --boot menu=on \
        --noautoconsole
    log "VM $VM_NAME started (or exited virt-install)"
}

if [ "$1" == "start" ]; then
    log "=== Starting GPU passthrough VM ==="
    unload_nvidia
    remount_drive_for_vm
    start_vm
    reload_nvidia
    remount_drive_for_user
    log "=== Done ==="
else
    echo "Usage: $0 start"
fi
What I've found so far:
[INFO] === Starting GPU passthrough VM ===
[INFO] Stopping sddm.service to free GPU
[INFO] Killing GPU-using processes
[INFO] Killed PID 844 using GPU
[INFO] Killed PID 877 using GPU
[INFO] Killed PID 966 using GPU
[INFO] Killed PID 1011 using GPU
[INFO] Killed PID 1240 using GPU
[INFO] Killed PID 1241 using GPU
[INFO] Killed PID 1373 using GPU
[INFO] Killed PID 1501 using GPU
[INFO] Killed PID 1780 using GPU
[INFO] Killed PID 2001 using GPU
[INFO] Stopping nvidia-persistenced service
[INFO] Stopped nvidia-persistenced successfully
[INFO] Attempting to remove module nvidia_drm
[INFO] Removed module nvidia_drm successfully
[INFO] Attempting to remove module nvidia_modeset
[INFO] Removed module nvidia_modeset successfully
[INFO] Attempting to remove module nvidia_uvm
[INFO] Removed module nvidia_uvm successfully
[INFO] Attempting to remove module nvidia
[INFO] Removed module nvidia successfully
[INFO] Processing device 0000:08:00.0 for unbinding and vfio-pci binding
[INFO] Attempting to unbind 0000:08:00.0 from current driver <--- This is where it goes wrong!
[INFO] Restoring system state before unbinding
[INFO] Preparing to remount /run/media/Daan/Opslagmonster with uid=1000, gid=1000 and permissions...
[INFO] Successfully unmounted /run/media/Daan/Opslagmonster
[INFO] Mounted /dev/sda2 at /run/media/Daan/Opslagmonster with user permissions
[INFO] [DEBUG] Listing image file ACL and permissions:
-rwxrwxrwx 1 Daan Daan 107390828544 May 28 14:15 /run/media/Daan/Opslagmonster/vm/images/windows-gaming.qcow2
# file: run/media/Daan/Opslagmonster/vm/images/windows-gaming.qcow2
# owner: Daan
# group: Daan
user::rwx
group::rwx
other::rwx
[INFO] Reloading NVIDIA modules
[INFO] Loaded module nvidia
[INFO] Loaded module nvidia_uvm
[INFO] Loaded module nvidia_modeset
[INFO] Loaded module nvidia_drm
[INFO] Starting sddm
[INFO] plasmashell not running, attempting manual launch...
[INFO] plasmashell launched successfully
[INFO] Restoring system state before unbinding
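Incidentally, "Restoring system state before unbinding" appearing twice looks consistent with the cleanup running twice: unbind_device calls restore_before_unbinding itself and then returns 1, after which the ERR trap fires on that non-zero status and calls it again. A minimal sketch of that behaviour in plain bash (no hardware involved):

```shell
#!/bin/bash
# Demonstrates an ERR trap firing on a function's non-zero return status,
# on top of the cleanup the function already performed itself.
set +e                      # keep running after the failure
count=0
restore() { count=$((count + 1)); }
trap 'restore' ERR
step() {
    restore                 # explicit cleanup inside the function
    return 1                # non-zero status propagates to the caller
}
step                        # ERR trap fires here and calls restore again
echo "$count"               # prints 2
```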
Any idea why modprobe -r "$driver_name" fails here?
Is it better to adapt a ready-made script than to try to reinvent the wheel myself?
Or am I more likely to get blood from a stone than to get the NVIDIA driver to cooperate?
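On the modprobe -r question: when it fails, the lsmod reference count (third column) and holder list (fourth column) usually show why. The snippet below parses a made-up lsmod line, since the real one needs the machine; the commented commands are what I would run on the host itself:

```shell
# Commands to run on the host when "modprobe -r nvidia" fails (need root):
#   lsmod | grep '^nvidia'          # refcount + modules still holding it
#   cat /sys/module/nvidia/refcnt   # raw reference count
#   fuser -v /dev/nvidia*           # processes holding the device nodes
#   lspci -nnk -s 08:00.0           # shows "Kernel driver in use: ..."
#
# Extracting the interesting columns, shown on a hypothetical lsmod line:
lsmod_line="nvidia 62652416 2 nvidia_uvm,nvidia_modeset"
refcnt=$(echo "$lsmod_line" | awk '{print $3}')
holders=$(echo "$lsmod_line" | awk '{print $4}')
echo "refcount=$refcnt holders=$holders"
```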

[ Edited 15% by switchboy on 28-05-2025 22:34 ]