Regular crashing and failed boots under Kernels 6.1.x on Fedora 37 KDE Spin

On Fedora 37’s KDE spin (Wayland) with kernels 6.1.5 through 6.1.9, the system regularly goes to a black screen after I enter my LUKS password. Once that happens, pressing Escape or any other key combination does nothing, and I have to force a reboot by holding down the power button.

With my NVIDIA eGPU as primary, I’d say it boots to the desktop less than 50% of the time. Even then, it frequently hangs shortly after boot. Additionally, opening Electron apps frequently causes the system to hang.

With my Intel iGPU as primary, my system always hangs, either going to a black screen before boot completes or hanging shortly after boot.

I did not have these problems on kernel 6.0.7, which shipped with the distro, but going back to it does not seem viable because the kernel headers needed to build the NVIDIA driver aren’t in the repos. How should I go about debugging and solving these problems?

Thanks!

Let's see if we can track down the problem.

Please post the output of the following two commands:
dnf list installed '*nvidia*' 'kernel*'
inxi -Fzxx
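If it is easier to paste one file than two terminal dumps, the two commands can be captured together. This is only a minimal sketch: run_section and collect_report are hypothetical helper names, and the report path is an assumption you can change.

```shell
#!/bin/sh
# Sketch only: helper names and the default report path are illustrative.

# Run one command and print its output under a header naming the command.
run_section() {
    printf '===== %s =====\n' "$*"
    "$@" 2>&1 || printf '(command failed or is not installed: %s)\n' "$1"
}

# Write the output of both requested commands into a single text file.
collect_report() {
    report_file="${1:-$HOME/fedora-report.txt}"
    {
        run_section dnf list installed '*nvidia*' 'kernel*'
        run_section inxi -Fzxx
    } > "$report_file"
    printf 'Report written to %s\n' "$report_file"
}

# Usage: collect_report "$HOME/fedora-report.txt"
```

The globs are quoted so the shell passes them to dnf unchanged instead of expanding them against files in the current directory.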

Also, please verify that the BIOS on your laptop is at the latest version available.

While we track this down, please do not use any Electron apps so we can rule out one possibility. Because they are third-party software, using them only makes the problem harder to isolate.

Hi Jeff, thanks for the quick reply. I double-checked that my laptop is on its latest BIOS version. Here is the output of dnf list installed '*nvidia*' kernel*:

Installed Packages
kernel.x86_64                       6.0.7-301.fc37         @fedora              
kernel.x86_64                       6.1.8-200.fc37         @updates             
kernel.x86_64                       6.1.9-200.fc37         @updates             
kernel-core.x86_64                  6.0.7-301.fc37         @fedora              
kernel-core.x86_64                  6.1.8-200.fc37         @updates             
kernel-core.x86_64                  6.1.9-200.fc37         @updates             
kernel-devel.x86_64                 6.1.7-200.fc37         @updates             
kernel-devel.x86_64                 6.1.8-200.fc37         @updates             
kernel-devel.x86_64                 6.1.9-200.fc37         @updates             
kernel-devel-matched.x86_64         6.1.9-200.fc37         @updates             
kernel-headers.x86_64               6.1.5-200.fc37         @updates             
kernel-modules.x86_64               6.0.7-301.fc37         @fedora              
kernel-modules.x86_64               6.1.8-200.fc37         @updates             
kernel-modules.x86_64               6.1.9-200.fc37         @updates             
kernel-modules-extra.x86_64         6.1.8-200.fc37         @updates             
kernel-modules-extra.x86_64         6.1.9-200.fc37         @updates             
kernel-srpm-macros.noarch           1.0-15.fc37            @fedora              
kmod-nvidia-latest-dkms.x86_64      3:525.85.12-1.fc37     @cuda-fedora37-x86_64
nvidia-driver.x86_64                3:525.85.12-1.fc37     @cuda-fedora37-x86_64
nvidia-driver-NVML.x86_64           3:525.85.12-1.fc37     @cuda-fedora37-x86_64
nvidia-driver-NvFBCOpenGL.x86_64    3:525.85.12-1.fc37     @cuda-fedora37-x86_64
nvidia-driver-cuda.x86_64           3:525.85.12-1.fc37     @cuda-fedora37-x86_64
nvidia-driver-cuda-libs.x86_64      3:525.85.12-1.fc37     @cuda-fedora37-x86_64
nvidia-driver-devel.x86_64          3:525.85.12-1.fc37     @cuda-fedora37-x86_64
nvidia-driver-libs.x86_64           3:525.85.12-1.fc37     @cuda-fedora37-x86_64
nvidia-kmod-common.noarch           3:525.85.12-1.fc37     @cuda-fedora37-x86_64
nvidia-libXNVCtrl.x86_64            3:525.85.12-1.fc37     @cuda-fedora37-x86_64
nvidia-libXNVCtrl-devel.x86_64      3:525.85.12-1.fc37     @cuda-fedora37-x86_64
nvidia-modprobe.x86_64              3:525.85.12-1.fc37     @cuda-fedora37-x86_64
nvidia-persistenced.x86_64          3:525.85.12-1.fc37     @cuda-fedora37-x86_64
nvidia-settings.x86_64              3:525.85.12-1.fc37     @cuda-fedora37-x86_64
nvidia-xconfig.x86_64               3:525.85.12-1.fc37     @cuda-fedora37-x86_64

Here is the output of inxi -Fzxx. I’m not sure if this is related, but running this command had a tendency to also hang the system.

System:
  Kernel: 6.1.9-200.fc37.x86_64 arch: x86_64 bits: 64 compiler: gcc
    v: 2.38-25.fc37 Desktop: KDE Plasma v: 5.26.5 tk: Qt v: 5.15.8
    wm: kwin_wayland dm: SDDM Distro: Fedora release 37 (Thirty Seven)
Machine:
  Type: Laptop System: Dell product: XPS 15 9500 v: N/A
    serial: <superuser required> Chassis: type: 10 serial: <superuser required>
  Mobo: Dell model: 0RDX6T v: A00 serial: <superuser required> UEFI: Dell
    v: 1.19.0 date: 09/06/2022
Battery:
  ID-1: BAT0 charge: 48.6 Wh (69.4%) condition: 70.0/84.3 Wh (83.0%)
    volts: 12.3 min: 11.4 model: SMP DELL 70N2F95 serial: <filter>
    status: not charging
  Device-1: wacom_battery_0 model: Wacom Intuos S serial: N/A charge: 0%
    status: N/A
CPU:
  Info: 8-core model: Intel Core i7-10875H bits: 64 type: MT MCP
    arch: Comet Lake rev: 2 cache: L1: 512 KiB L2: 2 MiB L3: 16 MiB
  Speed (MHz): avg: 3099 high: 3602 min/max: 800/5100 cores: 1: 3600 2: 3599
    3: 3600 4: 3205 5: 3600 6: 3598 7: 3580 8: 3601 9: 2300 10: 3599 11: 2300
    12: 3602 13: 2300 14: 2511 15: 2300 16: 2300 bogomips: 73598
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
Graphics:
  Device-1: Intel CometLake-H GT2 [UHD Graphics] vendor: Dell driver: i915
    v: kernel arch: Gen-9.5 ports: active: none off: eDP-1 empty: DP-1,DP-2,DP-3
    bus-ID: 00:02.0 chip-ID: 8086:9bc4
  Device-2: NVIDIA TU117M [GeForce GTX 1650 Ti Mobile] vendor: Dell
    driver: vfio-pci v: N/A arch: Turing pcie: speed: 8 GT/s lanes: 8
    bus-ID: 01:00.0 chip-ID: 10de:1f95
  Device-3: NVIDIA GA106 [GeForce RTX 3060 Lite Hash Rate] vendor: eVga.com.
    driver: nvidia v: 525.85.12 arch: Ampere pcie: speed: 8 GT/s lanes: 4 ports:
    active: none off: DP-4,HDMI-A-1 empty: DP-5,DP-6 bus-ID: 08:00.0
    chip-ID: 10de:2504
  Device-4: Realtek Integrated_Webcam_HD type: USB driver: uvcvideo
    bus-ID: 1-11:3 chip-ID: 0bda:5510
  Device-5: MacroSilicon USB Video type: USB
    driver: hid-generic,snd-usb-audio,usbhid,uvcvideo bus-ID: 5-4:4
    chip-ID: 534d:2109
  Display: wayland server: X.org v: 1.20.14 with: Xwayland v: 22.1.7
    compositor: kwin_wayland driver: X: loaded: modesetting,nvidia
    unloaded: fbdev,nouveau,vesa alternate: nv dri: iris
    gpu: i915,vfio-pci,nvidia d-rect: 4480x1440 display-ID: 0
  Monitor-1: DP-4 pos: primary,left res: 2560x1440 size: N/A
  Monitor-2: HDMI-A-1 pos: right res: 1920x1080 size: N/A
  API: OpenGL v: 4.6.0 NVIDIA 525.85.12 renderer: NVIDIA GeForce RTX
    3060/PCIe/SSE2 direct render: Yes
Audio:
  Device-1: Intel Comet Lake PCH cAVS vendor: Dell driver: snd_hda_intel
    v: kernel bus-ID: 5-2:2 bus-ID: 00:1f.3 chip-ID: 0930:0414
    chip-ID: 8086:06c8
  Device-2: NVIDIA GA106 High Definition Audio vendor: eVga.com.
    driver: snd_hda_intel v: kernel pcie: speed: 8 GT/s lanes: 4 bus-ID: 08:00.1
    chip-ID: 10de:228e
  Device-3: Toshiba Thunderbolt3 Dock Audio type: USB
    driver: hid-generic,snd-usb-audio,usbhid
  Device-4: Apple USB-C to 3.5mm Headphone Jack Adapter type: USB
    driver: hid-generic,snd-usb-audio,usbhid bus-ID: 5-3:3 chip-ID: 05ac:110a
  Device-5: MacroSilicon USB Video type: USB
    driver: hid-generic,snd-usb-audio,usbhid,uvcvideo bus-ID: 5-4:4
    chip-ID: 534d:2109
  Sound API: ALSA v: k6.1.9-200.fc37.x86_64 running: yes
  Sound Server-1: PulseAudio v: 16.1 running: no
  Sound Server-2: PipeWire v: 0.3.65 running: yes
Network:
  Device-1: Intel Comet Lake PCH CNVi WiFi vendor: Rivet Networks
    driver: iwlwifi v: kernel bus-ID: 00:14.3 chip-ID: 8086:06f0
  IF: wlp0s20f3 state: up mac: <filter>
  Device-2: Intel I210 Gigabit Network vendor: Toshiba driver: igb v: kernel
    pcie: speed: 2.5 GT/s lanes: 1 port: 3000 bus-ID: 3c:00.0 chip-ID: 8086:1533
  IF: enp60s0 state: up speed: 1000 Mbps duplex: full mac: <filter>
  IF-ID-1: tailscale0 state: unknown speed: -1 duplex: full mac: N/A
Bluetooth:
  Device-1: Intel AX201 Bluetooth type: USB driver: btusb v: 0.8
    bus-ID: 1-14:4 chip-ID: 8087:0026
  Report: rfkill ID: hci0 rfk-id: 0 state: up address: see --recommends
Drives:
  Local Storage: total: 13.64 TiB used: 9.7 TiB (71.1%)
  ID-1: /dev/nvme0n1 vendor: Smart Modular Tech. model: SHGP31-1000GM-2
    size: 931.51 GiB speed: 31.6 Gb/s lanes: 4 serial: <filter> temp: 50.9 C
  ID-2: /dev/nvme1n1 vendor: Smart Modular Tech. model: SHGP31-2000GM
    size: 1.82 TiB speed: 31.6 Gb/s lanes: 4 serial: <filter> temp: 49.9 C
  ID-3: /dev/sda type: USB vendor: Western Digital model: WD120EMFZ-11A6JA0
    size: 10.91 TiB serial: <filter>
Partition:
  ID-1: / size: 1.82 TiB used: 170.42 GiB (9.2%) fs: btrfs dev: /dev/dm-0
    mapped: luks-7c8bb3c7-057d-4a60-b204-b23155e88758
  ID-2: /boot size: 973.4 MiB used: 280.7 MiB (28.8%) fs: ext4
    dev: /dev/nvme1n1p2
  ID-3: /boot/efi size: 598.8 MiB used: 46.5 MiB (7.8%) fs: vfat
    dev: /dev/nvme1n1p1
  ID-4: /home size: 1.82 TiB used: 170.42 GiB (9.2%) fs: btrfs
    dev: /dev/dm-0 mapped: luks-7c8bb3c7-057d-4a60-b204-b23155e88758
Swap:
  ID-1: swap-1 type: zram size: 8 GiB used: 0 KiB (0.0%) priority: 100
    dev: /dev/zram0
Sensors:
  Src: /sys System Temperatures: cpu: 70.0 C pch: 58.0 C mobo: 50.0 C
  Fan Speeds (RPM): fan-1: 3239 fan-2: 3513
  Power: 12v: N/A 5v: 5 3.3v: N/A vbat: N/A
Info:
  Processes: 442 Uptime: 2m Memory: 62.43 GiB used: 3.87 GiB (6.2%)
  Init: systemd v: 251 target: graphical (5) default: graphical Compilers:
  gcc: 12.2.1 Packages: pm: rpm pkgs: N/A note: see --rpm pm: flatpak pkgs: 36
  Shell: Bash v: 5.2.15 running-in: konsole inxi: 3.3.24

Looking at the nvidia drivers I would venture that the major portion of your issues is located there.
Every nvidia package you have was installed from the cuda-fedora37-x86_64 repo. I have seen numerous individuals that have reported problems with software from the cuda repo.

I would suggest that you seriously consider removing those drivers and instead installing the nvidia drivers, including cuda, from the rpmfusion repo. Very few seem to report errors with nvidia drivers installed from rpmfusion.
I do not see the nvidia-gpu-firmware package in that list and it is required for the nvidia GPU. The output of inxi shows the nvidia GTX 1650 GPU is using the vfio-pci driver and not the nvidia driver, though it does show the nvidia driver for the RTX 3060 card. That laptop appears to have one intel iGPU and two nvidia dGPUs.

The following steps should accomplish this, should you decide to try it.

  1. Enable the rpmfusion-nonfree-nvidia-driver repo through the GNOME Software center.

  2. dnf remove '*nvidia*' which will remove all the installed nvidia packages.

  3. dnf install nvidia-gpu-firmware and dnf reinstall linux-firmware to retrieve the missing firmware package for the nvidia GPUs and any other missing firmware there may be.

  4. dnf install akmod-nvidia xorg-x11-drv-nvidia-cuda --disablerepo=cuda-fedora37-x86_64 which should pull in the needed packages from rpmfusion along with any necessary dependencies.

  5. Allow at least 5 minutes after the above install completes, then verify that the modules have built properly with dnf list installed 'kmod-nvidia*', which should show the kmod-nvidia package for the running kernel once the build has completed.

  6. Once the modules have been built properly reboot and then verify the modules have loaded properly with lsmod | grep nvidia. The return of about 5 lines should confirm the drivers are properly loaded and functional.

  7. Finally, the output of inxi -Gxx should show the nvidia driver in use and nvidia-smi should show details about the graphics.
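Steps 2 through 7 above can be sketched as a small shell script. Treat it as an outline under stated assumptions rather than a tested procedure: switch_to_rpmfusion and count_nvidia_modules are hypothetical helper names, the dnf commands are the ones from the steps and need root, and the repo name cuda-fedora37-x86_64 is taken from the package list posted earlier.

```shell
#!/bin/sh
# Sketch only: function names are illustrative; the dnf commands come from steps 2-4.

# Steps 2-4: replace the CUDA-repo packages with the RPM Fusion ones (run as root).
switch_to_rpmfusion() {
    dnf remove '*nvidia*' &&
    dnf install nvidia-gpu-firmware &&
    dnf reinstall linux-firmware &&
    dnf install akmod-nvidia xorg-x11-drv-nvidia-cuda \
        --disablerepo=cuda-fedora37-x86_64
}

# Step 6: count modules whose *name* starts with "nvidia" in lsmod-style input
# read from stdin. "lsmod | grep nvidia" also matches the "Used by" column, so
# its line count can be higher than the number printed here.
count_nvidia_modules() {
    awk '$1 ~ /^nvidia/ { n++ } END { print n + 0 }'
}

# Typical order of use:
#   switch_to_rpmfusion
#   dnf list installed 'kmod-nvidia*'   # step 5, repeat until the package appears
#   (reboot)
#   lsmod | count_nvidia_modules        # step 6
#   inxi -Gxx; nvidia-smi               # step 7
```

Chaining the dnf commands with && stops the sequence at the first failure, so a failed removal does not lead to a half-installed driver set.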

I hope this helps and that you may consider switching over from the cuda repo to the rpmfusion repo.

Okay, thanks for the tips! I will give these a try. Here are a couple of things for reference regarding my setup:

  • The reason I used the NVIDIA CUDA repo’s drivers instead of the rpmfusion ones is that the rpmfusion ones seemed to never detect my external 3060 and I had some issues getting CUDA to work nicely. This may have been fixed though, so I will try rpmfusion again.
  • The dGPU (1650 Ti) is using vfio-pci intentionally to prevent the NVIDIA driver from setting it as primary, otherwise I can’t use the external GPU for rendering. The driver wouldn’t respect any configuration pointing to the eGPU as primary so this was the easiest option to effectively disable the dGPU.
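For reference, the binding of the dGPU can be checked from a terminal. A minimal sketch: driver_in_use is a hypothetical helper name, and 10de:1f95 is the chip-ID that inxi reports for the 1650 Ti above.

```shell
#!/bin/sh
# Sketch only: the helper name is illustrative.

# Print the driver named on the "Kernel driver in use:" line of lspci -k output
# read from stdin.
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use:[[:space:]]*//p'
}

# Usage (the vendor:device ID is the 1650 Ti's chip-ID from the inxi output):
#   lspci -nnk -d 10de:1f95 | driver_in_use
```

If the setup described above is in effect, this should print vfio-pci rather than nvidia or nouveau.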

Thanks again, I will report back on how RPMFusion works this time!

Installing the RPMFusion drivers appears to have completed successfully. The system is at least able to boot now with the NVIDIA eGPU plugged in, but the issues still persist: occasional black screens, and sometimes freezes when running inxi -Fzxx.

The problem is much worse on the iGPU, which still freezes during boot.
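One way to look for a cause is to read the kernel log from the boot that hung. This is a minimal sketch, assuming the journal is stored persistently and the relevant messages were written to disk before the forced power-off; gpu_lines is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: the helper name is illustrative.

# Keep only log lines read from stdin that mention the graphics stack
# (i915, nvidia/NVRM, drm). Returns success even when nothing matches.
gpu_lines() {
    grep -Ei 'i915|nvidia|nvrm|drm' || true
}

# Usage, right after rebooting from a hang:
#   journalctl --list-boots                        # confirm the previous boot was logged
#   journalctl -k -b -1 --no-pager | gpu_lines     # kernel messages of the previous boot
#   journalctl -b -1 -p err --no-pager             # everything at priority "err" or worse
```

An empty result does not prove the graphics drivers are innocent, since a hard hang can stop logging before anything useful is recorded.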

I have a similar issue, but with no NVIDIA GPU; I have an AMD APU. Fedora hangs while booting 6.1 but works fine on 6.0. I stopped and disabled the NetworkManager service and was able to boot into 6.1, but if I then start NetworkManager I get no WiFi. I guess it’s a driver issue. The WiFi card I have is a Realtek RTL8723DE.
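To narrow down whether the WiFi driver is involved, it may help to see which kernel driver the card is bound to under each kernel. A minimal sketch, assuming the card shows up in lspci as a "Network controller"; wifi_block is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: the helper name is illustrative.

# From lspci -k style input on stdin, print each "Network controller" entry
# together with its indented detail lines (driver in use, kernel modules).
wifi_block() {
    awk '/^[^[:space:]]/ { show = /[Nn]etwork controller/ } show'
}

# Usage:
#   lspci -nnk | wifi_block
#   systemctl disable --now NetworkManager   # the workaround described above (run as root)
#   systemctl enable --now NetworkManager    # undo it again
```

Comparing the "Kernel driver in use" line between 6.0 and 6.1 would show whether a different module is being picked up.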