System with 1 GPU (NVIDIA) passthrough to a VM (Windows 10)

switchboy

Topic starter
My question
I am trying to write a script that kills my graphical environment and frees the GPU so that a VM can use it.

One caveat: the VM image lives on an NTFS partition (my Steam drive) that I also want to be accessible inside the VM. Hence all the wrestling with permissions; that part of the script works and has been tested. Only releasing the GPU refuses to work. The host does lose its TTY3 output just before the script fails (the monitor goes into standby), so I must be close.

Relevant software and hardware I use

Arch Linux, a Ryzen 7 3000-series X CPU and an NVIDIA 2070 Super with the nvidia-open packages

code:
#!/bin/bash

set -e

LOGFILE="/var/log/gameVm.log"

log() {
  echo "[INFO] $1" | tee -a "$LOGFILE"
}

warn() {
  echo "[WARN] $1" >&2 | tee -a "$LOGFILE" >&2
}

err() {
  echo "[ERROR] $1" >&2 | tee -a "$LOGFILE" >&2
}
trap 'restore_before_unbinding' ERR

VM_NAME="windows-gaming"
ISO_PATH="/run/media/Daan/Opslagmonster/Downloads/Software/win10/Win10_22H2_Dutch_x64v1.iso"
DISK_IMAGE="/run/media/Daan/Opslagmonster/vm/images/windows-gaming.qcow2"
OVMF_CODE="/usr/share/edk2/x64/OVMF_CODE.4m.fd"
OVMF_VARS="/usr/share/edk2/x64/OVMF_VARS.4m.fd"
MOUNT_DEVICE="/dev/sda2"
MOUNT_POINT="/run/media/Daan/Opslagmonster"
USER_UID=1000
USER_GID=1000
VM_UID=958
VM_GID=958
SHARE_PATH="$MOUNT_POINT"
SHARE_NAME="opslagmonster"

set_perms_recursive() {
  local path="$1"
  local user="$2"
  
  log "Starting set_perms_recursive for user $user on path $path"
  while [ "$path" != "/" ] && [ -n "$path" ]; do
    log "Setting execute permission ACL for user $user on $path"
    if setfacl -m u:$user:x "$path"; then
      log "Successfully setfacl for $user on $path"
    else
      warn "Failed to setfacl for $user on $path"
    fi
    path=$(dirname "$path")
  done
  log "Finished set_perms_recursive for user $user"
}

remount_drive_for_vm() {
  log "Preparing to remount $MOUNT_POINT with uid=$VM_UID, gid=$VM_GID and permissions..."
  if umount "$MOUNT_POINT"; then
    log "Successfully unmounted $MOUNT_POINT"
  else
    warn "Failed to unmount $MOUNT_POINT (might not be mounted)"
  fi
  if mount -o uid=$VM_UID,gid=$VM_GID "$MOUNT_DEVICE" "$MOUNT_POINT"; then
    log "Mounted $MOUNT_DEVICE at $MOUNT_POINT with VM user permissions"
  else
    err "Failed to mount $MOUNT_DEVICE at $MOUNT_POINT"
  fi
  set_perms_recursive "$DISK_IMAGE" libvirt-qemu
  set_perms_recursive "$DISK_IMAGE" Daan
  log "Setting ACLs for $DISK_IMAGE"
  setfacl -m u:libvirt-qemu:rwX "$DISK_IMAGE"
  setfacl -m u:Daan:rwX "$DISK_IMAGE"
  chown libvirt-qemu:kvm "$DISK_IMAGE"
  chmod 660 "$DISK_IMAGE"
  log "[DEBUG] Listing image file ACL and permissions:"
  ls -l "$DISK_IMAGE" | tee -a "$LOGFILE"
  getfacl "$DISK_IMAGE" | tee -a "$LOGFILE"
}

remount_drive_for_user() {
  log "Preparing to remount $MOUNT_POINT with uid=$USER_UID, gid=$USER_GID and permissions..."
  if umount "$MOUNT_POINT"; then
    log "Successfully unmounted $MOUNT_POINT"
  else
    warn "Failed to unmount $MOUNT_POINT (might not be mounted)"
  fi
  if mount -o uid=$USER_UID,gid=$USER_GID "$MOUNT_DEVICE" "$MOUNT_POINT"; then
    log "Mounted $MOUNT_DEVICE at $MOUNT_POINT with user permissions"
  else
    err "Failed to mount $MOUNT_DEVICE at $MOUNT_POINT"
  fi
  log "[DEBUG] Listing image file ACL and permissions:"
  ls -l "$DISK_IMAGE" | tee -a "$LOGFILE"
  getfacl "$DISK_IMAGE" | tee -a "$LOGFILE"
}

reload_modules(){
  log "Reloading NVIDIA modules"
  for mod in nvidia nvidia_uvm nvidia_modeset nvidia_drm; do
    if modprobe $mod; then
      log "Loaded module $mod"
    else
      warn "Failed to load module $mod"
    fi
  done
}

restore_before_unbinding(){
  log "Restoring system state before unbinding"
  remount_drive_for_user
  reload_modules
  log "Starting sddm"
  systemctl start sddm
  if ! pgrep -x plasmashell > /dev/null; then
    log "plasmashell not running, attempting manual launch..."
    if sudo -u daan DISPLAY=:0 XDG_SESSION_TYPE=wayland /usr/bin/plasmashell & then
      log "plasmashell launched successfully"
    else
      warn "Failed to launch plasmashell manually"
    fi
  else
    log "plasmashell already running"
  fi
}


unbind_device() {
  local dev="$1"
  local unbind_path="/sys/bus/pci/devices/$dev/driver/unbind"

  log "Attempting to unbind $dev from current driver"

  if [ ! -e "/sys/bus/pci/devices/$dev/driver" ]; then
    warn "No driver bound to $dev"
    restore_before_unbinding
    return 1
  fi

  if command -v nvidia-smi >/dev/null 2>&1 && [[ "$dev" =~ ^0000:08:00 ]]; then
    log "Checking NVIDIA GPU compute processes for $dev"
    local pids
    pids=$(nvidia-smi --query-compute-apps=pid --format=csv,noheader 2>/dev/null)
    if [ -n "$pids" ]; then
      log "GPU is currently in use by PIDs: $pids"
      log "Attempting to kill GPU-using processes gracefully"
      for pid in $pids; do
        kill -15 "$pid" 2>/dev/null && log "Sent SIGTERM to PID $pid"
      done
      local wait_time=0
      while [ $wait_time -lt 5 ]; do
        sleep 1
        wait_time=$((wait_time + 1))
        pids=$(nvidia-smi --query-compute-apps=pid --format=csv,noheader 2>/dev/null)
        [ -z "$pids" ] && break
      done
      if [ -n "$pids" ]; then
        log "Processes still running after SIGTERM, sending SIGKILL"
        for pid in $pids; do
          kill -9 "$pid" 2>/dev/null && log "Killed PID $pid with SIGKILL"
        done
        sleep 1
      else
        log "All GPU processes terminated gracefully"
      fi
    else
      log "No GPU processes detected by nvidia-smi"
    fi
  fi

  local driver_path
  driver_path=$(readlink -f "/sys/bus/pci/devices/$dev/driver")
  local driver_name
  driver_name=$(basename "$driver_path")
  log "Current driver for $dev is $driver_name"

  if lsmod | grep -q "^${driver_name}"; then
    local usage
    usage=$(lsmod | awk -v mod="$driver_name" '$1 == mod {print $3}')
    if [ "$usage" -gt 0 ]; then
      warn "Driver $driver_name is in use by $usage modules. Attempting process cleanup."

      log "Modules depending on $driver_name:"
      lsmod | awk -v mod="$driver_name" '$1 == mod {print $4}' | tr ',' ' ' | while read -r dep; do
        [ -n "$dep" ] && log " - $dep"
      done

      if command -v fuser >/dev/null 2>&1; then
        log "Checking /dev/nvidia* usage with fuser"
        for node in /dev/nvidia*; do
          [ -e "$node" ] || continue
          fuser_output=$(fuser -v "$node" 2>/dev/null)
          [ -n "$fuser_output" ] || continue
          log "Processes using $node:"
          echo "$fuser_output" | tail -n +2 | while read -r line; do
            pid=$(echo "$line" | awk '{print $2}')
            if [ -n "$pid" ]; then
              ps -p "$pid" -o pid,user,cmd --no-headers | while read -r proc; do
                log " - $proc"
                kill -9 "$pid" && log "Killed PID $pid using $node"
              done
            fi
          done
        done
      else
        warn "fuser not found; cannot check /dev/nvidia* usage"
      fi

      if lsmod | grep -q "^${driver_name}"; then
        log "Attempting to unload kernel module $driver_name"
        if modprobe -r "$driver_name"; then
          log "Successfully unloaded $driver_name"
        else
          err "Failed to unload $driver_name after cleanup"
          restore_before_unbinding
          return 1
        fi
      fi
    fi
  fi

  if [ -w "$unbind_path" ]; then
    if echo "$dev" > "$unbind_path"; then
      log "Unbound $dev successfully"
    else
      warn "Unbind of $dev failed (write error)"
      restore_before_unbinding
      return 1
    fi
  else
    warn "Unbind path $unbind_path not writable or missing"
    restore_before_unbinding
    return 1
  fi

  return 0
}


unload_nvidia() {
  log "Stopping sddm.service to free GPU"
  systemctl stop sddm

  log "Killing GPU-using processes"
  pids=$(nvidia-smi | awk '/[0-9]+ +G/ {print $5}')
  if [ -z "$pids" ]; then
    log "No GPU-using processes found by nvidia-smi"
  else
    for pid in $pids; do
      if kill -9 "$pid" 2>/dev/null; then
        log "Killed PID $pid using GPU"
      else
        warn "Failed to kill PID $pid using GPU"
      fi
    done
  fi

  sleep 10
  log "Stopping nvidia-persistenced service"
  if sudo systemctl stop nvidia-persistenced; then
    log "Stopped nvidia-persistenced successfully"
  else
    warn "Failed to stop nvidia-persistenced"
  fi

  for mod in nvidia_drm nvidia_modeset nvidia_uvm nvidia; do
    log "Attempting to remove module $mod"
    if sudo rmmod "$mod"; then
      log "Removed module $mod successfully"
    else
      warn "Failed to remove module $mod"
    fi
  done

  GPU_PCI_IDS=("0000:08:00.0" "0000:08:00.1" "0000:08:00.2" "0000:08:00.3")
  for dev in "${GPU_PCI_IDS[@]}"; do
    log "Processing device $dev for unbinding and vfio-pci binding"
    if ! unbind_device "$dev"; then
      err "Failed to unbind $dev"
      restore_before_unbinding
      exit 1
    fi
    echo "vfio-pci" > "/sys/bus/pci/devices/$dev/driver_override"
    log "Set driver override to vfio-pci for $dev"
    echo "$dev" > "/sys/bus/pci/drivers/vfio-pci/bind"
    log "Bound $dev to vfio-pci"
  done

  for mod in nvidia_drm nvidia_modeset nvidia_uvm nvidia; do
    if lsmod | grep -q "$mod"; then
      log "Unloading $mod"
      if ! modprobe -r "$mod"; then
        warn "Failed to unload $mod"
        if [[ "$mod" == "nvidia_drm" ]]; then
          log "Attempting runtime override (may be unstable)..."
          echo 0 > /sys/class/vtconsole/vtcon1/bind
          log "Unbinding virtual console from framebuffer"
          sleep 1
          if ! modprobe -r "$mod"; then
            err "Still failed to unload $mod after override. Aborting."
            restore_before_unbinding
            exit 1
          fi
        else
          err "Failed to unload $mod. Aborting."
          restore_before_unbinding
          exit 1
        fi
      else
        log "Unloaded $mod successfully"
      fi
    fi
  done

  if lspci -nnk | grep -A 2 "08:00.0" | grep -q "Kernel driver in use: nvidia"; then
    err "GPU still bound to NVIDIA driver. Aborting."
    restore_before_unbinding
    exit 1
  else
    log "GPU successfully unbound from NVIDIA driver"
  fi
}



reload_nvidia() {
  GPU_PCI_IDS=("0000:08:00.0" "0000:08:00.1" "0000:08:00.2" "0000:08:00.3")
  for dev in "${GPU_PCI_IDS[@]}"; do
    if [ -e "/sys/bus/pci/devices/$dev/driver" ]; then
      log "Unbinding $dev from vfio-pci"
      if echo "$dev" > "/sys/bus/pci/devices/$dev/driver/unbind"; then
        log "Unbound $dev from vfio-pci"
      else
        warn "Failed to unbind $dev from vfio-pci"
      fi
    fi
    echo "nvidia" > "/sys/bus/pci/devices/$dev/driver_override"
    log "Set driver override to nvidia for $dev"
    echo "$dev" > "/sys/bus/pci/drivers/nvidia/bind"
    log "Bound $dev to nvidia driver"
  done

  for mod in nvidia nvidia_uvm nvidia_modeset nvidia_drm; do
    if modprobe "$mod"; then
      log "Loaded module $mod"
    else
      warn "Failed to load module $mod"
    fi
  done

  log "Starting nvidia-persistenced service"
  if systemctl start nvidia-persistenced; then
    log "Started nvidia-persistenced successfully"
  else
    warn "Failed to start nvidia-persistenced"
  fi

  log "Starting sddm"
  systemctl start sddm

  if ! pgrep -x plasmashell > /dev/null; then
    log "plasmashell not running, attempting manual launch..."
    if sudo -u daan DISPLAY=:0 XDG_SESSION_TYPE=wayland /usr/bin/plasmashell & then
      log "plasmashell launched successfully"
    else
      warn "Failed to launch plasmashell manually"
    fi
  else
    log "plasmashell already running"
  fi
}


start_vm() {
  log "Starting VM $VM_NAME"
  sudo -u daan /usr/bin/virt-install \
    --name "$VM_NAME" \
    --ram 30720 \
    --vcpus 12,sockets=1,cores=6,threads=2 \
    --cpu host-passthrough \
    --os-variant win10-2k19 \
    --virt-type kvm \
    --hvm \
    --accelerate \
    --graphics spice,listen=127.0.0.1,port=5930 \
    --soundhw hda \
    --video qxl \
    --cdrom "$ISO_PATH" \
    --disk path="$DISK_IMAGE",format=qcow2,bus=virtio,cache=none \
    --host-device 08:00.0 \
    --host-device 08:00.1 \
    --host-device 08:00.2 \
    --host-device 08:00.3 \
    --network bridge=virbr0,model=virtio \
    --boot menu=on \
    --noautoconsole
  log "VM $VM_NAME started (or exited virt-install)"
}

if [ "$1" == "start" ]; then
  log "=== Starting GPU passthrough VM ==="
  unload_nvidia
  remount_drive_for_vm
  start_vm
  reload_nvidia
  remount_drive_for_user
  log "=== Done ==="
else
  echo "Usage: $0 start"
fi
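
A side note on the `trap 'restore_before_unbinding' ERR` line near the top of the script: in bash an ERR trap is not inherited by shell functions unless errtrace is enabled (`set -E` or `set -o errtrace`). With only `set -e`, the trap is therefore not active for an unguarded command that fails inside a function such as `unload_nvidia`. A small self-contained demonstration of the errtrace variant follows; the function names are hypothetical and `false` stands in for any failing command.

```bash
#!/bin/bash
# Sketch: with errtrace (-E) the ERR trap also fires for failures inside functions.
demo_with_errtrace() {
  bash -c '
    set -eE                        # errexit plus errtrace
    trap "echo restore-called" ERR # stands in for restore_before_unbinding
    failing_step() { false; }      # stands in for any unguarded failing command
    failing_step
    echo "not reached"
  '
}
```

Calling `demo_with_errtrace` prints `restore-called` and the inner shell exits with a non-zero status before reaching the final `echo`.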


What I have found so far:
[INFO] === Starting GPU passthrough VM ===
[INFO] Stopping sddm.service to free GPU
[INFO] Killing GPU-using processes
[INFO] Killed PID 844 using GPU
[INFO] Killed PID 877 using GPU
[INFO] Killed PID 966 using GPU
[INFO] Killed PID 1011 using GPU
[INFO] Killed PID 1240 using GPU
[INFO] Killed PID 1241 using GPU
[INFO] Killed PID 1373 using GPU
[INFO] Killed PID 1501 using GPU
[INFO] Killed PID 1780 using GPU
[INFO] Killed PID 2001 using GPU
[INFO] Stopping nvidia-persistenced service
[INFO] Stopped nvidia-persistenced successfully
[INFO] Attempting to remove module nvidia_drm
[INFO] Removed module nvidia_drm successfully
[INFO] Attempting to remove module nvidia_modeset
[INFO] Removed module nvidia_modeset successfully
[INFO] Attempting to remove module nvidia_uvm
[INFO] Removed module nvidia_uvm successfully
[INFO] Attempting to remove module nvidia
[INFO] Removed module nvidia successfully
[INFO] Processing device 0000:08:00.0 for unbinding and vfio-pci binding
[INFO] Attempting to unbind 0000:08:00.0 from current driver <--- This is where it goes wrong! :(
[INFO] Restoring system state before unbinding
[INFO] Preparing to remount /run/media/Daan/Opslagmonster with uid=1000, gid=1000 and permissions...
[INFO] Successfully unmounted /run/media/Daan/Opslagmonster
[INFO] Mounted /dev/sda2 at /run/media/Daan/Opslagmonster with user permissions
[INFO] [DEBUG] Listing image file ACL and permissions:
-rwxrwxrwx 1 Daan Daan 107390828544 May 28 14:15 /run/media/Daan/Opslagmonster/vm/images/windows-gaming.qcow2
# file: run/media/Daan/Opslagmonster/vm/images/windows-gaming.qcow2
# owner: Daan
# group: Daan
user::rwx
group::rwx
other::rwx

[INFO] Reloading NVIDIA modules
[INFO] Loaded module nvidia
[INFO] Loaded module nvidia_uvm
[INFO] Loaded module nvidia_modeset
[INFO] Loaded module nvidia_drm
[INFO] Starting sddm
[INFO] plasmashell not running, attempting manual launch...
[INFO] plasmashell launched successfully
[INFO] Restoring system state before unbinding
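
Assuming this excerpt comes from the log file, it is worth noting that it contains no [WARN] or [ERROR] lines, even though the script calls `warn` and `err` on exactly this path. In `echo "[WARN] $1" >&2 | tee -a "$LOGFILE" >&2`, the first `>&2` sends the output of `echo` to stderr before it can enter the pipe, so `tee` receives nothing and the message never reaches the file. A minimal sketch of helpers that write to both the log and the terminal (the `LOGFILE` default below is a placeholder so the snippet runs without root, not the path from the script):

```bash
#!/bin/bash
# Sketch: log helpers that append to the log file AND print to the terminal.
# LOGFILE default is a placeholder, not the path used in the original script.
LOGFILE="${LOGFILE:-/tmp/gameVm-example.log}"

log()  { echo "[INFO] $1"  | tee -a "$LOGFILE"; }
# echo feeds the pipe; tee appends to the file, and only tee's terminal copy
# is redirected to stderr.
warn() { echo "[WARN] $1"  | tee -a "$LOGFILE" >&2; }
err()  { echo "[ERROR] $1" | tee -a "$LOGFILE" >&2; }
```

With helpers like these, a message such as "No driver bound to 0000:08:00.0" would show up in the log right before the restore step, which makes the failing branch much easier to spot.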


Any idea why modprobe -r "$driver_name" fails here?

Is it better to adapt a ready-made script than to try to reinvent the wheel myself?
Or will I sooner get blood from a stone than get the NVIDIA driver to cooperate? :X
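
One possible reading of the log, offered as a hypothesis rather than a confirmed diagnosis: `modprobe -r` is never reached. By the time `unbind_device` runs, `rmmod nvidia` has already succeeded, and unloading a driver module also detaches its devices, so `/sys/bus/pci/devices/0000:08:00.0/driver` no longer exists. The function treats that as a failure, calls `restore_before_unbinding` and returns 1. Below is a sketch of an unbind helper that treats "no driver bound" as success. The function name and the optional `sysfs_root` parameter (there only so the logic can be tried against a fake directory tree) are hypothetical.

```bash
#!/bin/bash
# Sketch: unbind a PCI device only if a driver is currently bound.
# Usage: unbind_if_bound <pci-address> [sysfs_root]
# sysfs_root defaults to /sys; it is a hypothetical hook for dry runs.
unbind_if_bound() {
  local dev="$1"
  local sysfs_root="${2:-/sys}"
  local driver_link="$sysfs_root/bus/pci/devices/$dev/driver"

  # No driver entry means the device is already free, which is the goal.
  if [ ! -e "$driver_link" ]; then
    echo "No driver bound to $dev, nothing to unbind"
    return 0
  fi

  # Writing the address to <driver>/unbind asks the kernel to detach it.
  if echo "$dev" > "$driver_link/unbind"; then
    echo "Unbound $dev"
  else
    echo "Unbind of $dev failed" >&2
    return 1
  fi
}
```

With this behaviour the loop in `unload_nvidia` would continue to the `driver_override` and `vfio-pci/bind` steps for 08:00.0. Whether the remaining functions (08:00.1 to 08:00.3) then unbind cleanly is not something the log shows.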

[ 15% edited by switchboy on 28-05-2025 22:34 ]

All replies


Hero of Time

Moderator LNX

You would be better off specifying in your GRUB config that the card is reserved for passthrough. Then nothing on the system will touch the card at all. If you also set the VM to auto-start on system boot with the passthrough in its config, you will quickly know whether it works: either you get a picture with Windows or you don't.
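
As a rough sketch of what such a boot-time reservation could look like: the `vfio-pci` module accepts an `ids=` list of vendor:device pairs, which can be given on the kernel command line as `vfio-pci.ids=...`. The helper below (a hypothetical name) builds that parameter from `lspci -nn` output. The actual IDs have to come from the machine itself, and on Arch `vfio-pci` typically also needs to load before the NVIDIA modules, for example via the initramfs.

```bash
#!/bin/bash
# Sketch: turn `lspci -nn` lines (read from stdin) into a vfio-pci.ids=
# kernel parameter. The function name is hypothetical.
vfio_ids_param() {
  local ids
  # Each lspci -nn line carries one [vendor:device] pair such as [10de:1e84];
  # class codes like [0300] contain no colon and are not matched.
  ids=$(grep -oiE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' | tr -d '[]' | sort -u | paste -sd, -)
  [ -n "$ids" ] || return 1
  echo "vfio-pci.ids=$ids"
}

# Usage on the host (not run here):
#   lspci -nn -s 08:00 | vfio_ids_param
# The result would be appended to GRUB_CMDLINE_LINUX_DEFAULT in
# /etc/default/grub, followed by: grub-mkconfig -o /boot/grub/grub.cfg
```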

The whole point is that you don't want any driver, or anything else, to be loaded for the card at all. After that it is still an open question whether the card will actually work. When I tried this a few years ago, I got the infamous error 10: unable to start device. They may well have fixed that in a newer driver; I think I saw something like that come by on news sites.

As for why I think it fails: you are killing processes, and that does not always ensure that system handles are released as well.
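
One way to check this before removing the modules, sketched under the assumption that the usage count in `/sys/module/<name>/refcnt` reflects the handles still being held: poll it until it drops to zero instead of sleeping for a fixed time. The function name and the optional `sysfs_root` argument are hypothetical.

```bash
#!/bin/bash
# Sketch: wait until a kernel module's reference count reaches 0.
# Usage: wait_for_module_idle <module> <timeout_seconds> [sysfs_root]
wait_for_module_idle() {
  local mod="$1" timeout="$2" sysfs_root="${3:-/sys}"
  local refcnt_file="$sysfs_root/module/$mod/refcnt"
  local waited=0 refcnt

  # Module not loaded at all: nothing can be holding it.
  [ -e "$refcnt_file" ] || return 0

  while :; do
    refcnt=$(cat "$refcnt_file")
    [ "$refcnt" -eq 0 ] && return 0
    [ "$waited" -ge "$timeout" ] && break
    sleep 1
    waited=$((waited + 1))
  done

  echo "$mod still has $refcnt user(s) after ${timeout}s" >&2
  return 1
}
```

A call such as `wait_for_module_idle nvidia_drm 10` before the `rmmod` loop would show whether something is still holding the module after the processes were killed.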

switchboy

Topic starter
That is possible, but I had actually hoped to hand the GPU over more or less on the fly and then start a Wayland session again afterwards. Needing a reboot defeats the whole point of a VM. In that case I might just as well, or perhaps even better, set up dual boot.
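
For on-the-fly switching of this kind, libvirt can do the unbind and rebind itself through `virsh nodedev-detach` and `virsh nodedev-reattach`, which take node device names of the form `pci_0000_08_00_0`. A minimal sketch of the name conversion plus the calls follows. The helper name is hypothetical, and whether the nvidia-open driver lets go of the card cleanly at runtime is not established in this thread.

```bash
#!/bin/bash
# Sketch: convert a PCI address (0000:08:00.0) to a libvirt node device
# name (pci_0000_08_00_0). The helper name is hypothetical.
pci_to_nodedev() {
  local addr="$1"
  # Replace every ':' and '.' with '_' and add the pci_ prefix.
  echo "pci_${addr//[:.]/_}"
}

# Usage on the host (not run here; needs root and a stopped display manager):
#   for dev in 0000:08:00.0 0000:08:00.1 0000:08:00.2 0000:08:00.3; do
#     virsh nodedev-detach "$(pci_to_nodedev "$dev")"
#   done
#   ... run the VM ...
#   for dev in 0000:08:00.0 0000:08:00.1 0000:08:00.2 0000:08:00.3; do
#     virsh nodedev-reattach "$(pci_to_nodedev "$dev")"
#   done
```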

Hero of Time

Moderator LNX

What else do you do that requires the video card on your host? A second video card for Linux would be much simpler to set up.
