Nvidia driver not working, preventing Fedora from booting - but only sometimes


I’ve just switched from running Arch Linux on my PC to running Fedora 38, and I’m now seeing some really weird behaviour from my graphics card driver (the Nvidia driver). My graphics card is an RTX 3080. I had to jump through some (expected) hoops to get Fedora to install at all (adding nomodeset nouveau.nomodeset=1 to the grub command line). After that, it installed fine.

I’ve installed the proprietary Nvidia driver through rpmfusion (akmod-nvidia xorg-x11-drv-nvidia-cuda). Once it is installed, if I don’t reboot, the system sometimes freezes after a few minutes (no errors found using journalctl). However, if I then press the reset button of my PC, it reboots successfully to SDDM, I can log in, and everything works fine. As long as I only reboot after that, it keeps working fine. As soon as I shut down and turn the PC back on, though, it stops booting (black screen, boot not listed in journalctl) and I have to add nomodeset and change nvidia_drm.modeset from 1 to 0. It would boot without the latter change, but then there’s other weird behaviour: the boot screen shows up from time to time, and it sometimes crashes or shows a black screen.

When it is working, it’s working great. Games run fine and there are no problems. I’m out of ideas as to what the cause might be. I never had any issues like this running Arch with the proprietary Nvidia driver, SDDM, KDE and Wayland. It would always boot with Nvidia DRM enabled, with or without Plymouth.

It is not confirmed that the Nvidia driver is the issue here. It’s just an educated guess on my part, based on all the points listed above.

Additional note: I’ve got secure-boot enabled and the Nvidia driver is signed and loaded without any issues. The problems occur both with secure-boot turned on and off. They also occur with all 3 boot options (kernel 6.4, 6.4 and rescue) and with a different boot loader (e.g. rEFInd).

Does anyone here have ideas on what I could try or what might be the problem here?

This sounds like it may be caused by the kernel command line options used during boot.
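As a rough aid for checking which value a given option has, here is a small POSIX shell sketch. The function name karg_value is made up for this example. It treats dashes and underscores in the parameter name as equivalent (so nvidia-drm.modeset and nvidia_drm.modeset match each other), lets the last occurrence win, and does not handle quoted values containing spaces.

```shell
# Print the value of kernel parameter $1 found in the command line string $2.
# If $2 is not given, the running kernel's /proc/cmdline is used.
# Exit status is 0 if the parameter is present, 1 otherwise.
# karg_value is a hypothetical helper name, not part of any package.
karg_value() (
    set -f    # do not expand glob characters while splitting into words
    want=$(printf '%s' "$1" | tr '-' '_')
    cmdline=${2-$(cat /proc/cmdline)}
    found=1
    value=
    for word in $cmdline; do
        # Normalise only the name part (before the first '=').
        name=$(printf '%s' "${word%%=*}" | tr '-' '_')
        if [ "$name" = "$want" ]; then
            found=0
            case $word in
                *=*) value=${word#*=} ;;   # name=value
                *)   value= ;;             # bare flag such as "quiet"
            esac
        fi
    done
    [ "$found" -eq 0 ] && printf '%s\n' "$value"
    exit "$found"
)
```

For example, karg_value nvidia_drm.modeset prints the value from the running kernel, and karg_value nomodeset || echo "nomodeset not set" reports a missing flag.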

Please post the output of the following commands:
cat /proc/cmdline
cat /etc/default/grub
inxi -Fzxx
dnf list installed \*nvidia\*
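If it is easier, the four commands above could be wrapped in a small shell sketch like the one below, so that everything ends up in one report. The function names run_diag and collect_diagnostics and the output file name are only placeholders for this example. A tool that is not installed is reported as skipped instead of aborting the run.

```shell
# Run one diagnostic command and print a header followed by its output.
# If the tool is not installed, note that and carry on.
run_diag() {
    printf '===== %s =====\n' "$*"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1
    else
        printf '(%s not found, skipped)\n' "$1"
    fi
    printf '\n'
}

# The four commands requested in this thread, in the same order.
collect_diagnostics() {
    run_diag cat /proc/cmdline
    run_diag cat /etc/default/grub
    run_diag inxi -Fzxx
    run_diag dnf list installed '*nvidia*'
}
```

Calling collect_diagnostics > nvidia-report.txt would then write all of the output to a single file that can be pasted here.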

Hi Jeff,

thanks for the quick reply. Here’s the output of the commands:

cat /proc/cmdline

BOOT_IMAGE=(hd4,gpt2)/vmlinuz-6.4.12-200.fc38.x86_64 root=UUID=a0015b0e-e1e6-4c96-ba9a-dbc4c0f2b930 ro rootflags=subvol=root00 rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1 rhgb quiet
cat /etc/default/grub

GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_CMDLINE_LINUX="rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1 rhgb quiet"
inxi -Fzxx
  Kernel: 6.4.12-200.fc38.x86_64 arch: x86_64 bits: 64 compiler: gcc
    v: 2.39-9.fc38 Desktop: KDE Plasma v: 5.27.7 tk: Qt v: 5.15.10
    wm: kwin_wayland dm: SDDM Distro: Fedora release 38 (Thirty Eight)
  Type: Desktop Mobo: Micro-Star model: MPG Z490 GAMING CARBON WIFI (MS-7C73)
    v: 1.0 serial: <superuser required> UEFI: American Megatrends v: 1.70
    date: 02/04/2021
  Info: 8-core model: Intel Core i7-10700K bits: 64 type: MT MCP
    arch: Comet Lake rev: 5 cache: L1: 512 KiB L2: 2 MiB L3: 16 MiB
  Speed (MHz): avg: 4881 high: 4918 min/max: 800/12000 cores: 1: 4900
    2: 4893 3: 4900 4: 4901 5: 4918 6: 4589 7: 4904 8: 4900 9: 4901 10: 4900
    11: 4900 12: 4900 13: 4901 14: 4900 15: 4900 16: 4900 bogomips: 121596
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
  Device-1: Intel CometLake-S GT2 [UHD Graphics 630] vendor: Micro-Star MSI
    driver: i915 v: kernel arch: Gen-9.5 ports: active: none empty: DP-1,
    HDMI-A-1, HDMI-A-2, HDMI-A-3 bus-ID: 00:02.0 chip-ID: 8086:9bc5
  Device-2: NVIDIA GA102 [GeForce RTX 3080] vendor: Micro-Star MSI
    driver: nvidia v: 535.98 arch: Ampere pcie: speed: 5 GT/s lanes: 16 ports:
    active: none off: DP-2,DP-3,HDMI-A-4 empty: DP-4 bus-ID: 01:00.0
    chip-ID: 10de:2206
  Device-3: Logitech BRIO Ultra HD Webcam driver: snd-usb-audio,uvcvideo
    type: USB rev: 2.1 speed: 480 Mb/s lanes: 1 bus-ID: 1-3.3:10
    chip-ID: 046d:085e
  Display: wayland server: X.org v: 1.20.14 with: Xwayland v: 22.1.9
    compositor: kwin_wayland driver: X: loaded: modesetting,nvidia
    unloaded: fbdev,nouveau,vesa alternate: nv dri: iris gpu: i915,nvidia
    d-rect: 6400x2520 display-ID: 0
  Monitor-1: DP-2 pos: top-center res: 2560x1440 size: N/A
  Monitor-2: DP-3 pos: bottom-l res: 1920x1080 size: N/A
  Monitor-3: HDMI-A-4 pos: bottom-r res: 1920x1080 size: N/A
  API: OpenGL v: 4.6.0 NVIDIA 535.98 renderer: NVIDIA GeForce RTX
    3080/PCIe/SSE2 direct-render: Yes
  Device-1: Intel Comet Lake PCH cAVS vendor: Micro-Star MSI
    driver: snd_hda_intel v: kernel bus-ID: 00:1f.3 chip-ID: 8086:06c8
  Device-2: NVIDIA GA102 High Definition Audio vendor: Micro-Star MSI
    driver: snd_hda_intel v: kernel pcie: speed: 8 GT/s lanes: 16
    bus-ID: 01:00.1 chip-ID: 10de:1aef
  Device-3: Logitech BRIO Ultra HD Webcam driver: snd-usb-audio,uvcvideo
    type: USB rev: 2.1 speed: 480 Mb/s lanes: 1 bus-ID: 1-3.3:10
    chip-ID: 046d:085e
  Device-4: Yamaha AG06/AG03 driver: snd-usb-audio type: USB rev: 2.0
    speed: 480 Mb/s lanes: 1 bus-ID: 1-4.3.1:14 chip-ID: 0499:170d
  Device-5: Corsair ST100 Headset Output
    driver: hid-generic,snd-usb-audio,usbhid type: USB rev: 1.1 speed: 12 Mb/s
    lanes: 1 bus-ID: 1-4.4.1:18 chip-ID: 1b1c:0a32
  API: ALSA v: k6.4.12-200.fc38.x86_64 status: kernel-api
  Server-1: PipeWire v: 0.3.78 status: active with: 1: pipewire-pulse
    status: active 2: wireplumber status: active 3: pipewire-alsa type: plugin
    4: pw-jack type: plugin
  Device-1: Intel Comet Lake PCH CNVi WiFi driver: iwlwifi v: kernel
    bus-ID: 00:14.3 chip-ID: 8086:06f0
  IF: wlo1 state: up mac: <filter>
  Device-2: Realtek RTL8125 2.5GbE vendor: Micro-Star MSI driver: r8169
    v: kernel pcie: speed: 5 GT/s lanes: 1 port: 3000 bus-ID: 6f:00.0
    chip-ID: 10ec:8125
  IF: enp111s0 state: down mac: <filter>
  Device-1: Intel AX201 Bluetooth driver: btusb v: 0.8 type: USB rev: 2.0
    speed: 12 Mb/s lanes: 1 bus-ID: 1-14:20 chip-ID: 8087:0026
  Report: btmgmt ID: hci0 rfk-id: 0 state: up address: <filter> bt-v: 5.2
    lmp-v: 11
  Local Storage: total: 8.59 TiB used: 2.31 TiB (26.8%)
  ID-1: /dev/nvme0n1 vendor: Corsair model: Force MP510 size: 894.25 GiB
    speed: 31.6 Gb/s lanes: 4 serial: <filter> temp: 40.9 C
  ID-2: /dev/nvme1n1 vendor: Samsung model: SSD 970 EVO 1TB size: 931.51 GiB
    speed: 31.6 Gb/s lanes: 4 serial: <filter> temp: 38.9 C
  ID-3: /dev/sda vendor: Samsung model: SSD 860 EVO 1TB size: 931.51 GiB
    speed: 6.0 Gb/s serial: <filter>
  ID-4: /dev/sdb vendor: Samsung model: SSD 870 QVO 4TB size: 3.64 TiB
    speed: 6.0 Gb/s serial: <filter>
  ID-5: /dev/sdc vendor: Seagate model: ST2000DM008-2FR102 size: 1.82 TiB
    speed: 6.0 Gb/s serial: <filter>
  ID-6: /dev/sdd vendor: SanDisk model: SSD PLUS 480GB size: 447.14 GiB
    speed: 6.0 Gb/s serial: <filter>
  ID-1: / size: 929.93 GiB used: 36.66 GiB (3.9%) fs: btrfs
    dev: /dev/nvme1n1p3
  ID-2: /boot size: 973.4 MiB used: 245.4 MiB (25.2%) fs: ext4
    dev: /dev/nvme1n1p2
  ID-3: /boot/efi size: 598.8 MiB used: 18.6 MiB (3.1%) fs: vfat
    dev: /dev/nvme1n1p1
  ID-4: /home size: 929.93 GiB used: 36.66 GiB (3.9%) fs: btrfs
    dev: /dev/nvme1n1p3
  ID-1: swap-1 type: zram size: 8 GiB used: 0 KiB (0.0%) priority: 100
    dev: /dev/zram0
  System Temperatures: cpu: 43.0 C pch: 51.0 C mobo: N/A
  Fan Speeds (rpm): N/A
  Processes: 478 Uptime: 11m Memory: total: 64 GiB note: est.
  available: 62.51 GiB used: 6.42 GiB (10.3%) Init: systemd v: 253
  target: graphical (5) default: graphical Compilers: gcc: 13.2.1 Packages:
  pm: rpm pkgs: N/A note: see --rpm pm: flatpak pkgs: 13 Shell: fish v: 3.6.1
  running-in: konsole inxi: 3.3.29
dnf list installed \*nvidia\*
Installed Packages
akmod-nvidia.x86_64                                                           3:535.98-1.fc38                                     @rpmfusion-nonfree-updates
kmod-nvidia-6.4.12-200.fc38.x86_64.x86_64                                     3:535.98-1.fc38                                     @@commandline             
nvidia-persistenced.x86_64                                                    3:535.98-1.fc38                                     @rpmfusion-nonfree-updates
nvidia-settings.x86_64                                                        3:535.98-1.fc38                                     @rpmfusion-nonfree-updates
xorg-x11-drv-nvidia.x86_64                                                    3:535.98-2.fc38                                     @rpmfusion-nonfree-updates
xorg-x11-drv-nvidia-cuda.x86_64                                               3:535.98-2.fc38                                     @rpmfusion-nonfree-updates
xorg-x11-drv-nvidia-cuda-libs.i686                                            3:535.98-2.fc38                                     @rpmfusion-nonfree-updates
xorg-x11-drv-nvidia-cuda-libs.x86_64                                          3:535.98-2.fc38                                     @rpmfusion-nonfree-updates
xorg-x11-drv-nvidia-kmodsrc.x86_64                                            3:535.98-2.fc38                                     @rpmfusion-nonfree-updates
xorg-x11-drv-nvidia-libs.i686                                                 3:535.98-2.fc38                                     @rpmfusion-nonfree-updates
xorg-x11-drv-nvidia-libs.x86_64                                               3:535.98-2.fc38                                     @rpmfusion-nonfree-updates
xorg-x11-drv-nvidia-power.x86_64                                              3:535.98-2.fc38                                     @rpmfusion-nonfree-updates

You seem to be missing one very critical package: nvidia-gpu-firmware does not appear in that list of installed packages.

I would run sudo dnf install nvidia-gpu-firmware followed by sudo dnf reinstall linux-firmware to ensure that all the firmware packages are up to date, then reboot.

This fix should solve the problem.
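To double-check the package list before and after the install, something like the following sketch could be used. It reads dnf list installed style output on standard input and succeeds if the named package is listed for any architecture. The function name has_package is invented for this example.

```shell
# Succeed if a package named $1 appears in `dnf list installed` style
# output read from stdin. The trailing architecture suffix (.x86_64,
# .i686, .noarch, ...) is stripped from the first column before comparing.
# has_package is a hypothetical helper name.
has_package() {
    awk -v pkg="$1" '
        { n = $1; sub(/\.[^.]+$/, "", n); if (n == pkg) found = 1 }
        END { exit !found }
    '
}
```

Usage would be along the lines of dnf list installed '*nvidia*' | has_package nvidia-gpu-firmware || echo "nvidia-gpu-firmware is missing".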

It seems to work reliably now, thanks.