I’m currently having issues unbinding the GPU driver on my notebook which is running Fedora 30 and has Bumblebee installed and working. It used to work fine in this configuration and I can’t remember having changed anything related to bumblebee, my kernel parameters or my GPU drivers since.
I’m not sure when the issue started because I only just noticed it. It could easily have been there for over 3 months now.
Here’s some info on my system (kernel params, lspci output): system info · GitHub
My issue is that as soon as I run the following:
sudo optirun bash -c "echo '0000:01:00.0' > '/sys/bus/pci/devices/0000:01:00.0/driver/unbind'"
the bash process never exits. As soon as I hit Ctrl+C to cancel, the process uses 100% of the CPU core it is running on. Killing the process has no effect. I can’t even shut down normally anymore when that happens; I always have to hold the power button for 5 seconds to force the machine off.
(Overall CPU usage shows about 12%, because I have 8 cores and 100/8 = ~12.)
I think in the past it used to work just fine by running it without optirun:
sudo bash -c "echo '0000:01:00.0' > '/sys/bus/pci/devices/0000:01:00.0/driver/unbind'"
This now results in the following though:
bash: /sys/bus/pci/devices/0000:01:00.0/driver/unbind: No such file or directory
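A possible reading of this error (an assumption, not something confirmed above): the `driver` entry under a PCI device in sysfs is a symlink that only exists while a driver is bound, so if bbswitch has the card powered off and the nvidia module is not loaded, there is nothing to unbind. A minimal sketch for checking that first; `bound_driver` and the `SYSFS_ROOT` override are hypothetical names used only for illustration:

```shell
#!/usr/bin/env bash
# Print the name of the driver currently bound to a PCI device, or "none".
# SYSFS_ROOT is a hypothetical override so the function can be tried
# against a fake directory tree; on a real system it defaults to /sys.
bound_driver() {
    local addr="$1"
    local link="${SYSFS_ROOT:-/sys}/bus/pci/devices/${addr}/driver"
    if [ -L "${link}" ]; then
        # The symlink target ends in the driver name, e.g. .../drivers/nvidia
        basename "$(readlink "${link}")"
    else
        echo "none"
    fi
}

# Example use (commented out: needs root and touches real hardware):
# if [ "$(bound_driver 0000:01:00.0)" != "none" ]; then
#     echo '0000:01:00.0' | sudo tee /sys/bus/pci/devices/0000:01:00.0/driver/unbind
# fi
```

If this prints `none` without optirun, the "No such file or directory" message would simply mean that no driver holds the device at that moment.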
After running it with optirun (which, as I said, results in a process “freeze”), the following lines were added to my dmesg output:
[ 4922.022633] bbswitch: enabling discrete graphics
[ 4922.525022] IPMI message handler: version 39.2
[ 4922.526962] ipmi device interface
[ 4922.579341] nvidia: module license 'NVIDIA' taints kernel.
[ 4922.579344] Disabling lock debugging due to kernel taint
[ 4922.591888] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[ 4922.794587] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 440.44 Sun Dec 8 03:38:56 UTC 2019
[ 4922.809585] nvidia-uvm: Loaded the UVM driver, major device number 235.
[ 4923.759096] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 440.44 Sun Dec 8 03:29:48 UTC 2019
[ 4923.957104] NVRM: Attempting to remove minor device 0 with non-zero usage count!
[ 4923.957106] ------------[ cut here ]------------
[ 4923.957258] WARNING: CPU: 1 PID: 3940 at /tmp/akmodsbuild.hOemv1NH/BUILD/nvidia-kmod-440.44/_kmod_build_5.3.15-200.fc30.x86_64/nvidia/nv-pci.c:560 nv_pci_remove+0x343/0x370 [nvidia]
[ 4923.957261] Modules linked in: nvidia_modeset(POE) nvidia_uvm(OE) nvidia(POE) ipmi_devintf ipmi_msghandler ccm xt_nat veth nf_conntrack_netlink xt_addrtype br_netfilter rfcomm xt_CHECKSUM xt_MASQUERADE tun bridge stp llc ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set overlay nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter cmac bnep bbswitch(OE) sunrpc vfat fat intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp iwlmvm coretemp kvm_intel mac80211 uvcvideo kvm raid0 snd_hda_codec_realtek libarc4 snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio videobuf2_vmalloc snd_hda_intel iwlwifi btusb videobuf2_memops snd_soc_rt5640 irqbypass iTCO_wdt snd_hda_codec btrtl intel_cstate mei_hdcp btbcm intel_uncore
[ 4923.957290] videobuf2_v4l2 btintel videobuf2_common iTCO_vendor_support snd_soc_rl6231 bluetooth snd_soc_core videodev intel_rapl_perf cfg80211 snd_hda_core asus_wmi snd_compress input_polldev sparse_keymap snd_hwdep ac97_bus snd_pcm_dmaengine snd_seq i2c_i801 mc acpi_als rtsx_pci_ms ecdh_generic snd_seq_device lpc_ich kfifo_buf mei_me memstick rfkill ecc mei snd_pcm industrialio snd_timer snd soundcore acpi_pad ip_tables dm_crypt i915 rtsx_pci_sdmmc mmc_core crct10dif_pclmul i2c_algo_bit crc32_pclmul drm_kms_helper crc32c_intel mxm_wmi drm ghash_clmulni_intel serio_raw rtsx_pci r8169 video wmi fuse
[ 4923.957313] CPU: 1 PID: 3940 Comm: bash Tainted: P W OE 5.3.15-200.fc30.x86_64 #1
[ 4923.957314] Hardware name: GIGABYTE P35V4/P35V4, BIOS FD0B 11/06/2017
[ 4923.957470] RIP: 0010:nv_pci_remove+0x343/0x370 [nvidia]
[ 4923.957473] Code: 4c 0b c3 eb 9f 41 8b 94 24 70 04 00 00 48 c7 c6 70 34 1b c2 bf 04 00 00 00 e8 89 86 00 00 48 c7 c7 b8 34 1b c2 e8 5b 4a 0c c3 <0f> 0b e8 d6 8c 00 00 eb f9 4c 89 e6 48 89 ef e8 59 7b 75 00 e9 23
[ 4923.957475] RSP: 0018:ffffb9c40aa37dd8 EFLAGS: 00010246
[ 4923.957477] RAX: 0000000000000024 RBX: ffff9aef53a8a000 RCX: 0000000000000006
[ 4923.957478] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff9aef56a57900
[ 4923.957479] RBP: ffff9aed4b7bb008 R08: ffffb9c40aa37c95 R09: 00000000000004e1
[ 4923.957481] R10: ffffb9c40aa37c90 R11: ffffb9c40aa37c95 R12: ffff9aeef9796000
[ 4923.957482] R13: ffff9aef53a8a000 R14: ffffb9c40aa37f00 R15: ffff9aef4a8bcaa0
[ 4923.957484] FS: 00007f4d9b03b740(0000) GS:ffff9aef56a40000(0000) knlGS:0000000000000000
[ 4923.957486] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4923.957487] CR2: 0000561908dc38c8 CR3: 00000002197e0004 CR4: 00000000003606e0
[ 4923.957488] Call Trace:
[ 4923.957497] pci_device_remove+0x3b/0xa0
[ 4923.957501] device_release_driver_internal+0xd8/0x1b0
[ 4923.957504] unbind_store+0xef/0x120
[ 4923.957508] kernfs_fop_write+0x10e/0x190
[ 4923.957511] vfs_write+0xb6/0x1a0
[ 4923.957514] ksys_write+0x5f/0xe0
[ 4923.957518] do_syscall_64+0x5f/0x1a0
[ 4923.957522] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 4923.957524] RIP: 0033:0x7f4d9b6d3218
[ 4923.957526] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 45 83 0d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 60 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89
[ 4923.957527] RSP: 002b:00007fff5c274058 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 4923.957529] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007f4d9b6d3218
[ 4923.957530] RDX: 000000000000000d RSI: 0000561908dc28c0 RDI: 0000000000000001
[ 4923.957532] RBP: 0000561908dc28c0 R08: 0000561908dc28c0 R09: 000000000000000a
[ 4923.957533] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000d
[ 4923.957534] R13: 00007f4d9b7a76c0 R14: 000000000000000d R15: 00007f4d9b7a2800
[ 4923.957536] ---[ end trace ad6a1a5d61f73b3d ]---
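The NVRM line "Attempting to remove minor device 0 with non-zero usage count!" followed by the warning in `nv_pci_remove` suggests that something still had the device open when the unbind was written. Under optirun, Bumblebee's secondary X server would be an obvious candidate, but that is an assumption and not something the log confirms. A rough sketch for checking who is using the device before unbinding; the function names and the `SYSFS_ROOT` override are hypothetical, and note that the module reference count is related to, but not the same thing as, the per-device usage count NVRM complains about:

```shell
#!/usr/bin/env bash
# Print the reference count of a loaded kernel module, or "not loaded".
# SYSFS_ROOT is a hypothetical override for trying this on a fake tree.
module_refcnt() {
    local mod="$1"
    local f="${SYSFS_ROOT:-/sys}/module/${mod}/refcnt"
    if [ -r "${f}" ]; then
        cat "${f}"
    else
        echo "not loaded"
    fi
}

# List processes that have any NVIDIA device node open (requires lsof).
nvidia_users() {
    local nodes=(/dev/nvidia*)
    # If the glob matched nothing, the literal pattern is left in place.
    if [ ! -e "${nodes[0]}" ]; then
        echo "no /dev/nvidia* nodes"
        return 0
    fi
    sudo lsof "${nodes[@]}" 2>/dev/null || echo "no open handles found"
}

# Example use:
# module_refcnt nvidia
# nvidia_users
```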
Any ideas how I can fix that?
This is how I’m trying to rebind my dGPU in order to switch from the nvidia driver to the vfio driver:
DGPU_PCI_ADDRESS="01:00.0"
fedora@linux:~$ echo "> Retrieving and parsing dGPU IDs..."
> Retrieving and parsing dGPU IDs...
fedora@linux:~$ DGPU_IDS=$(sudo ${OPTIRUN_PREFIX}lspci -n -s "${DGPU_PCI_ADDRESS}" | grep -oP "\w+:\w+" | tail -1)
fedora@linux:~$ DGPU_VENDOR_ID=$(echo "${DGPU_IDS}" | cut -d ":" -f1)
fedora@linux:~$ DGPU_DEVICE_ID=$(echo "${DGPU_IDS}" | cut -d ":" -f2)
fedora@linux:~$ echo "> DGPU_IDS: $DGPU_IDS"
> DGPU_IDS: 10de:13d7
fedora@linux:~$ echo "> DGPU_VENDOR_ID: $DGPU_VENDOR_ID"
> DGPU_VENDOR_ID: 10de
fedora@linux:~$ echo "> DGPU_DEVICE_ID: $DGPU_DEVICE_ID"
> DGPU_DEVICE_ID: 13d7
echo "> Unbinding dGPU nvidia driver..."
> Unbinding dGPU nvidia driver...
sudo optirun bash -c "echo '0000:${DGPU_PCI_ADDRESS}' > '/sys/bus/pci/devices/0000:${DGPU_PCI_ADDRESS}/driver/unbind'"
# This line is never reached because it gets stuck in the previous line:
echo "> Binding dGPU to VFIO driver..."
sudo bash -c "echo '${DGPU_VENDOR_ID} ${DGPU_DEVICE_ID}' > '/sys/bus/pci/drivers/vfio-pci/new_id'"
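As a side note on the ID parsing step: the vendor and device IDs can also be read from the `vendor` and `device` files that sysfs exposes for each PCI device, which avoids parsing `lspci` output. A minimal sketch, assuming those files are readable while the card is in its current power state; `pci_ids` and `SYSFS_ROOT` are hypothetical names for illustration:

```shell
#!/usr/bin/env bash
# Print "<vendor> <device>" for a PCI address, in the format that
# /sys/bus/pci/drivers/<driver>/new_id expects (hex, no "0x" prefix).
# SYSFS_ROOT is a hypothetical override for trying this on a fake tree.
pci_ids() {
    local addr="$1"
    local dev="${SYSFS_ROOT:-/sys}/bus/pci/devices/${addr}"
    local vendor device
    vendor=$(cat "${dev}/vendor") || return 1
    device=$(cat "${dev}/device") || return 1
    # sysfs reports the IDs as e.g. 0x10de; strip the leading "0x".
    echo "${vendor#0x} ${device#0x}"
}

# Example use:
# pci_ids 0000:01:00.0
```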
(By the way, I need this in order to pass the dGPU through to a VM, and as I said, this used to work just fine.)
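For comparison, the kernel also offers a per-device `driver_override` file plus `drivers_probe` as an alternative to `new_id`, which targets one PCI address instead of every device with a given vendor/device ID. The sketch below is not the script I am using and would not by itself avoid a hang in the unbind step; it only shows the shape of that approach. It assumes the `vfio-pci` module is already loaded and that it is run as root; `rebind_to_vfio` and `SYSFS_ROOT` are hypothetical names:

```shell
#!/usr/bin/env bash
# Rebind a single PCI device to vfio-pci using driver_override.
# Must run as root on a real system, with vfio-pci loaded beforehand
# (e.g. via "modprobe vfio-pci").
# SYSFS_ROOT is a hypothetical override for trying this on a fake tree.
rebind_to_vfio() {
    local addr="$1"
    local root="${SYSFS_ROOT:-/sys}"
    local dev="${root}/bus/pci/devices/${addr}"

    # Tell the PCI core that only vfio-pci may claim this device.
    echo "vfio-pci" > "${dev}/driver_override"

    # Release the current driver, if one is bound.
    if [ -e "${dev}/driver/unbind" ]; then
        echo "${addr}" > "${dev}/driver/unbind"
    fi

    # Ask the PCI core to probe the device again, honoring the override.
    echo "${addr}" > "${root}/bus/pci/drivers_probe"
}

# Example use (as root):
# rebind_to_vfio 0000:01:00.0
```

Writing an empty line to `driver_override` afterwards clears the override, so the device can be handed back to its normal driver.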