Fedora 40: Nvidia driver works only on random boots with kernel 6.8.9-300.fc40.x86_64; with 6.8.10 or 6.8.11 it doesn’t work at all

As the title says, the Nvidia driver works only on random boots with kernel 6.8.9-300.fc40.x86_64 or 6.9.4; with the latest kernels, such as 6.8.10 or 6.8.11, it doesn’t work at all. It’s not consistent: I have to reboot until nvidia-smi shows running processes. The last kernel that worked reliably was 6.6.2-201 on F39, where the driver came up on every boot, but on F40 I can’t use that kernel anymore. I’ve opened this thread so that Fedora knows the problem persists in F40, too.

[drm:nv_drm_register_drm_device [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to register device
[drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice
fedora kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed!
❯ nvidia-smi
Sat Jun 15 12:33:59 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3070 ...    Off |   00000000:01:00.0 Off |                  N/A |
| N/A   43C    P8             11W /  115W |       1MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
❯ lspci | grep NVIDIA
0000:01:00.0 VGA compatible controller: NVIDIA Corporation GA104M [Geforce RTX 3070 Ti Laptop GPU] (rev a1)
0000:01:00.1 Audio device: NVIDIA Corporation GA104 High Definition Audio Controller (rev a1)
❯  sudo dmesg | grep -i nvidia\\\|nvrm
[sudo] password for vnm_rzv: 
[    0.000000] Command line: BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.8.9-300.fc40.x86_64 root=UUID=e281fdfb-17d3-4104-904b-8d787dacd632 ro rootflags=subvol=root rd.driver.blacklist=nouveau modprobe.blacklist=nouveau initcall_blacklist=simpledrm_platform_driver_init rhgb quiet initcall_blacklist=simpledrm_platform_driver_init nvidia-drm.modeset=1 rd.driver.blacklist=nouveau modprobe.blacklist=nouveau
[    0.043870] Kernel command line: BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.8.9-300.fc40.x86_64 root=UUID=e281fdfb-17d3-4104-904b-8d787dacd632 ro rootflags=subvol=root rd.driver.blacklist=nouveau modprobe.blacklist=nouveau initcall_blacklist=simpledrm_platform_driver_init rhgb quiet initcall_blacklist=simpledrm_platform_driver_init nvidia-drm.modeset=1 rd.driver.blacklist=nouveau modprobe.blacklist=nouveau
[    7.420335] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input23
[    7.420417] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input24
[    7.420549] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input25
[    7.420591] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input26
[    8.091362] nvidia: loading out-of-tree module taints kernel.
[    8.091367] nvidia: module license 'NVIDIA' taints kernel.
[    8.091369] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[    8.091369] nvidia: module license taints kernel.
[    8.217964] nvidia-nvlink: Nvlink Core is being initialized, major device number 510
[    8.218747] nvidia 0000:01:00.0: enabling device (0000 -> 0003)
[    8.218892] nvidia 0000:01:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
[    8.266490] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  550.90.07  Fri May 31 09:35:42 UTC 2024
[    8.322021] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
[    8.389983] nvidia-uvm: Loaded the UVM driver, major device number 508.
[    8.427117] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  550.90.07  Fri May 31 09:30:47 UTC 2024
[    8.431627] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[    8.869549] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x24:0x72:1556)
[    8.869573] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[    8.869628] [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice
[    8.869752] [drm:nv_drm_register_drm_device [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to register device
❯ uname -a
Linux fedora 6.8.9-300.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Thu May  2 18:59:06 UTC 2024 x86_64 GNU/Linux

Results when the driver is running as expected:

❯ nvidia-smi
Sat Jun 15 12:46:02 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3070 ...    Off |   00000000:01:00.0 Off |                  N/A |
| N/A   49C    P8             12W /  115W |       7MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      3892      G   /usr/bin/gnome-shell                            3MiB |
+-----------------------------------------------------------------------------------------+
❯  sudo dmesg | grep -i nvidia\\\|nvrm
[sudo] password for vnm_rzv: 
[    0.000000] Command line: BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.8.9-300.fc40.x86_64 root=UUID=e281fdfb-17d3-4104-904b-8d787dacd632 ro rootflags=subvol=root rd.driver.blacklist=nouveau modprobe.blacklist=nouveau initcall_blacklist=simpledrm_platform_driver_init rhgb quiet initcall_blacklist=simpledrm_platform_driver_init nvidia-drm.modeset=1 rd.driver.blacklist=nouveau modprobe.blacklist=nouveau
[    0.043966] Kernel command line: BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.8.9-300.fc40.x86_64 root=UUID=e281fdfb-17d3-4104-904b-8d787dacd632 ro rootflags=subvol=root rd.driver.blacklist=nouveau modprobe.blacklist=nouveau initcall_blacklist=simpledrm_platform_driver_init rhgb quiet initcall_blacklist=simpledrm_platform_driver_init nvidia-drm.modeset=1 rd.driver.blacklist=nouveau modprobe.blacklist=nouveau
[    7.324268] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input23
[    7.324311] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input24
[    7.324342] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input25
[    7.324377] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input26
[    7.997756] nvidia: loading out-of-tree module taints kernel.
[    7.997761] nvidia: module license 'NVIDIA' taints kernel.
[    7.997763] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[    7.997764] nvidia: module license taints kernel.
[    8.116668] nvidia-nvlink: Nvlink Core is being initialized, major device number 510
[    8.117222] nvidia 0000:01:00.0: enabling device (0000 -> 0003)
[    8.117329] nvidia 0000:01:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
[    8.168591] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  550.90.07  Fri May 31 09:35:42 UTC 2024
[    8.221563] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
[    8.639915] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x24:0x72:1556)
[    8.639946] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[    8.640198] nvidia-uvm: Loaded the UVM driver, major device number 508.
[    8.676849] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  550.90.07  Fri May 31 09:30:47 UTC 2024
[    8.681115] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[   10.108689] nvidia-modeset: WARNING: GPU:0: Unable to read EDID for display device DP-4
[   10.117914] nvidia-modeset: WARNING: GPU:0: Unable to read EDID for display device DP-4
[   10.118552] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 1
[   10.118804] nvidia 0000:01:00.0: [drm] Cannot find any crtc or sizes

Thank you for your time!

This may be a fallacy. nvidia-smi only shows processes if there are applications running that use the GPU. You did not say whether this is a laptop or a desktop, nor whether you are running X11 or Wayland.

Please post the output of inxi -Fzxx so we can see more info about the system.

It’s a laptop and I’m using Wayland. gnome-shell should be listed as a GPU process from the start. The Nvidia GPU works as expected only when the drm errors from my first post (the first quote) are absent, and gnome-shell appearing in the nvidia-smi process list is a sign that there were no drm errors and the GPU is working.

❯ inxi -Fzxx
System:
  Kernel: 6.8.9-300.fc40.x86_64 arch: x86_64 bits: 64 compiler: gcc
    v: 2.41-34.fc40
  Desktop: GNOME v: 46.2 tk: GTK v: 3.24.42 wm: gnome-shell dm: GDM
    Distro: Fedora Linux 40 (Workstation Edition)
Machine:
  Type: Laptop System: LENOVO product: 82RF v: Legion 5 Pro 16IAH7H
    serial: <superuser required> Chassis: type: 10 v: Legion 5 Pro 16IAH7H
    serial: <superuser required>
  Mobo: LENOVO model: LNVNB161216 v: NO DPK serial: <superuser required>
    part-nu: LENOVO_MT_82RF_BU_idea_FM_Legion 5 Pro 16IAH7H UEFI: LENOVO
    v: J2CN40WW date: 04/15/2022
Battery:
  ID-1: BAT0 charge: 57.0 Wh (78.0%) condition: 73.1/80.0 Wh (91.3%)
    volts: 15.9 min: 15.4 model: Celxpert L21C4PC1 serial: <filter>
    status: not charging
  Device-1: hid-dc:2c:26:0d:5c:ca-battery model: Keychron K2 serial: N/A
    charge: N/A status: discharging
CPU:
  Info: 14-core (6-mt/8-st) model: 12th Gen Intel Core i9-12900H bits: 64
    type: MST AMCP arch: Alder Lake rev: 3 cache: L1: 1.2 MiB L2: 11.5 MiB
    L3: 24 MiB
  Speed (MHz): avg: 719 high: 1361 min/max: 400/4900:5000:3800 cores:
    1: 1361 2: 1211 3: 1015 4: 400 5: 1323 6: 400 7: 400 8: 1152 9: 1300 10: 619
    11: 400 12: 400 13: 400 14: 400 15: 1140 16: 860 17: 400 18: 400 19: 400
    20: 400 bogomips: 116736
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
Graphics:
  Device-1: Intel Alder Lake-P GT2 [Iris Xe Graphics] vendor: Lenovo
    driver: i915 v: kernel arch: Gen-12.2 ports: active: eDP-1 empty: DP-1,DP-2
    bus-ID: 0000:00:02.0 chip-ID: 8086:46a6
  Device-2: NVIDIA GA104M [Geforce RTX 3070 Ti Laptop GPU] vendor: Lenovo
    driver: nvidia v: 550.90.07 arch: Ampere bus-ID: 0000:01:00.0
    chip-ID: 10de:24e0
  Display: wayland server: X.org v: 1.20.14 with: Xwayland v: 24.1.0
    compositor: gnome-shell driver: X: loaded: modesetting,nvidia
    unloaded: fbdev,nouveau,vesa alternate: nv dri: iris gpu: i915
    display-ID: 0
  Monitor-1: eDP-1 model: BOE Display 0x0a1f res: 2560x1600 dpi: 189
    diag: 406mm (16")
  API: OpenGL v: 4.6 vendor: intel mesa v: 24.1.2 glx-v: 1.4 es-v: 3.2
    direct-render: yes renderer: Mesa Intel Graphics (ADL GT2)
    device-ID: 8086:46a6 display-ID: :0.0
  API: EGL Message: EGL data requires eglinfo. Check --recommends.
Audio:
  Device-1: Intel Alder Lake PCH-P High Definition Audio vendor: Lenovo
    driver: snd_hda_intel v: kernel bus-ID: 0000:00:1f.3 chip-ID: 8086:51c8
  Device-2: NVIDIA GA104 High Definition Audio vendor: Lenovo
    driver: snd_hda_intel v: kernel bus-ID: 0000:01:00.1 chip-ID: 10de:228b
  API: ALSA v: k6.8.9-300.fc40.x86_64 status: kernel-api
  Server-1: PipeWire v: 1.0.7 status: active with: 1: pipewire-pulse
    status: active 2: wireplumber status: active 3: pipewire-alsa type: plugin
    4: pw-jack type: plugin
Network:
  Device-1: Intel Alder Lake-P PCH CNVi WiFi driver: iwlwifi v: kernel
    bus-ID: 0000:00:14.3 chip-ID: 8086:51f0
  IF: wlp0s20f3 state: up mac: <filter>
  Device-2: Realtek RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet
    vendor: Lenovo driver: r8169 v: kernel port: 3000 bus-ID: 0000:32:00.0
    chip-ID: 10ec:8168
  IF: enp50s0 state: down mac: <filter>
Bluetooth:
  Device-1: Intel AX211 Bluetooth driver: btusb v: 0.8 type: USB rev: 2.0
    speed: 12 Mb/s lanes: 1 bus-ID: 3-10:6 chip-ID: 8087:0033
  Report: btmgmt ID: hci0 rfk-id: 3 state: up address: <filter> bt-v: 5.3
    lmp-v: 12
RAID:
  Hardware-1: Intel Volume Management Device NVMe RAID Controller driver: vmd
    v: 0.6 bus-ID: 0000:00:0e.0 chip-ID: 8086:467f
Drives:
  Local Storage: total: 1.86 TiB used: 927.48 GiB (48.6%)
  ID-1: /dev/nvme0n1 vendor: Micron model: MTFDKBA1T0TFH size: 953.87 GiB
    speed: 63.2 Gb/s lanes: 4 serial: <filter> temp: 46.9 C
  ID-2: /dev/nvme1n1 vendor: Micron model: MTFDKBA1T0TFH size: 953.87 GiB
    speed: 63.2 Gb/s lanes: 4 serial: <filter> temp: 48.9 C
Partition:
  ID-1: / size: 952.28 GiB used: 737.66 GiB (77.5%) fs: btrfs
    dev: /dev/nvme1n1p3
  ID-2: /boot size: 973.4 MiB used: 404.6 MiB (41.6%) fs: ext4
    dev: /dev/nvme1n1p2
  ID-3: /boot/efi size: 598.8 MiB used: 23.2 MiB (3.9%) fs: vfat
    dev: /dev/nvme1n1p1
  ID-4: /home size: 952.28 GiB used: 737.66 GiB (77.5%) fs: btrfs
    dev: /dev/nvme1n1p3
Swap:
  ID-1: swap-1 type: zram size: 8 GiB used: 0 KiB (0.0%) priority: 100
    dev: /dev/zram0
Sensors:
  System Temperatures: cpu: 45.0 C mobo: N/A
  Fan Speeds (rpm): N/A
Info:
  Memory: total: 32 GiB note: est. available: 31.07 GiB used: 9.2 GiB (29.6%)
  Processes: 497 Power: uptime: 20m wakeups: 0 Init: systemd v: 255
    target: graphical (5) default: graphical
  Packages: pm: flatpak pkgs: 44 Compilers: gcc: 14.1.1 Shell: Zsh v: 5.9
    running-in: gnome-terminal inxi: 3.3.34

This is so annoying! I spend 15 minutes rebooting to get a session where Nvidia works; maybe the next one will be the one…

You’re on a laptop? Do you know how PRIME works? The GPU is only used when an application requests it or is explicitly set up to use the discrete GPU.

Go ahead and test the GPU with software such as Blender, Inkscape, Krita or Steam.

As @computersavvy says:

[Two screenshots from 2024-06-21]

In this example Blender automatically launches with the GPU, Krita has the option to do so.
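If you prefer a terminal check, here is a minimal sketch. It assumes NVIDIA's PRIME render offload environment variables (`__NV_PRIME_RENDER_OFFLOAD=1` and `__GLX_VENDOR_LIBRARY_NAME=nvidia`, as described in the NVIDIA driver README) and that `glxinfo` is installed; `run_on_nvidia` is a hypothetical helper name.

```shell
# Hypothetical helper: run a command with NVIDIA PRIME render offload
# requested for GLX. The two variables are NVIDIA's offload switches.
run_on_nvidia() {
    __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia "$@"
}
```

For example, `run_on_nvidia glxinfo | grep "OpenGL renderer"` should name the GeForce GPU on a boot where the driver initialized; on a boot with the RmInitAdapter errors it would presumably fail or not report the Nvidia GPU.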

Yes, I know about PRIME; this is related to my earlier thread, F39 kernels problem.

The Nvidia GPU works exactly as I said in the description: only on random boots with that specific kernel. When the drm errors I mentioned before appear, I don’t get the option to launch apps with the discrete card; they launch only on the integrated card.

[drm:nv_drm_register_drm_device [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to register device
[drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice
fedora kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed!

I wonder what is on the kernel command line at boot.
Please post the output of cat /proc/cmdline.

I am now in a working Nvidia session, but I don’t think it matters for the outcome of this command.

❯ cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.8.9-300.fc40.x86_64 root=UUID=e281fdfb-17d3-4104-904b-8d787dacd632 ro rootflags=subvol=root rd.driver.blacklist=nouveau modprobe.blacklist=nouveau initcall_blacklist=simpledrm_platform_driver_init rhgb quiet initcall_blacklist=simpledrm_platform_driver_init nvidia-drm.modeset=1 rd.driver.blacklist=nouveau modprobe.blacklist=nouveau

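In that command line, the options rd.driver.blacklist=nouveau, modprobe.blacklist=nouveau and initcall_blacklist=simpledrm_platform_driver_init appear both before and after rhgb quiet, so each of them is passed twice.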

I suggest that you completely remove the part preceding rhgb quiet, as well as the initcall_blacklist=simpledrm_platform_driver_init following it. When done, you should have only nvidia-drm.modeset=1 rd.driver.blacklist=nouveau modprobe.blacklist=nouveau following rhgb quiet.

Do this by editing the /etc/default/grub file and removing the unwanted parts from the line that begins with GRUB_CMDLINE_LINUX=.
Once that change has been made and the file saved, run sudo grub2-mkconfig -o /boot/grub2/grub.cfg and then reboot.
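Duplicated options like these can also be spotted mechanically. A small sketch (the function name `list_duplicate_args` is hypothetical) that prints every argument occurring more than once in a kernel command line, defaulting to /proc/cmdline:

```shell
# Print every kernel command-line argument that occurs more than once.
# Takes the command line as $1; falls back to /proc/cmdline when none is given.
list_duplicate_args() {
    local cmdline=${1:-$(cat /proc/cmdline)}
    # One argument per line, drop empty lines, then keep only repeated ones.
    printf '%s\n' "$cmdline" | tr -s ' ' '\n' | sed '/^$/d' | LC_ALL=C sort | uniq -d
}
```

Run against the command line posted above, it lists initcall_blacklist=simpledrm_platform_driver_init, modprobe.blacklist=nouveau and rd.driver.blacklist=nouveau; after the suggested cleanup it should print nothing.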

Let me get this straight; it should look like this:
before:

GRUB_CMDLINE_LINUX="rd.driver.blacklist=nouveau modprobe.blacklist=nouveau initcall_blacklist=simpledrm_platform_driver_init rhgb quiet initcall_blacklist=simpledrm_platform_driver_init nvidia-drm.modeset=1 rd.driver.blacklist=nouveau modprobe.blacklist=nouveau"

after:

GRUB_CMDLINE_LINUX="rhgb quiet nvidia-drm.modeset=1 rd.driver.blacklist=nouveau modprobe.blacklist=nouveau"

?

Later edit:
I’ve done it like this, but no luck.

❯ cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.8.9-300.fc40.x86_64 root=UUID=e281fdfb-17d3-4104-904b-8d787dacd632 ro rootflags=subvol=root rhgb quiet nvidia-drm.modeset=1 rd.driver.blacklist=nouveau modprobe.blacklist=nouveau

Those errors are still present. Should I change the GRUB_CMDLINE_LINUX as it was before or should I let it like this?

Note that reporting a problem here can get you help identifying and fixing the issue.

Also note that reporting a problem here does not report it to the developers where a bug may be identified and repaired. Bugs should be reported at bugzilla.redhat.com so the developers are able to 1. know there is an issue and 2. attempt a fix.

Can you now post the errors you get after changing the kernel command line? (Yes, you should leave it as it is now.)

I understand. I already filed a ticket on Bugzilla when I started my last thread, but I got no response.

These are from the current session, where Nvidia works as expected. I’ll edit this post and add the errors from a session where it doesn’t work the next time that happens.

❯ sudo dmesg | grep -i nvidia\\\|nvrm
[sudo] password for vnm_rzv: 
[    0.000000] Command line: BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.8.9-300.fc40.x86_64 root=UUID=e281fdfb-17d3-4104-904b-8d787dacd632 ro rootflags=subvol=root rhgb quiet nvidia-drm.modeset=1 rd.driver.blacklist=nouveau modprobe.blacklist=nouveau
[    0.044430] Kernel command line: BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.8.9-300.fc40.x86_64 root=UUID=e281fdfb-17d3-4104-904b-8d787dacd632 ro rootflags=subvol=root rhgb quiet nvidia-drm.modeset=1 rd.driver.blacklist=nouveau modprobe.blacklist=nouveau
[    7.145188] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input21
[    7.145228] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input22
[    7.145259] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input23
[    7.145291] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input24
[    7.813617] nvidia: loading out-of-tree module taints kernel.
[    7.813622] nvidia: module license 'NVIDIA' taints kernel.
[    7.813624] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[    7.813625] nvidia: module license taints kernel.
[    8.116363] nvidia-nvlink: Nvlink Core is being initialized, major device number 510
[    8.117303] nvidia 0000:01:00.0: enabling device (0000 -> 0003)
[    8.117498] nvidia 0000:01:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
[    8.165581] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  550.90.07  Fri May 31 09:35:42 UTC 2024
[    8.220874] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
[    8.289102] nvidia-uvm: Loaded the UVM driver, major device number 508.
[    8.327592] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  550.90.07  Fri May 31 09:30:47 UTC 2024
[    8.332488] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[    9.820348] nvidia-modeset: WARNING: GPU:0: Unable to read EDID for display device DP-4
[    9.829432] nvidia-modeset: WARNING: GPU:0: Unable to read EDID for display device DP-4
[    9.830055] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 1
[    9.830268] nvidia 0000:01:00.0: [drm] Cannot find any crtc or sizes

Those errors are still present. Should I change the GRUB_CMDLINE_LINUX as it was before or should I let it like this?

Here I was referring to these errors:

[drm:nv_drm_register_drm_device [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to register device
[drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice
fedora kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed!
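Because the only way to tell a good boot from a bad one so far is to reboot and check, a small sketch like the following could classify a boot's kernel log using the messages quoted in this thread. It treats `Initialized nvidia-drm` as the sign of a good boot, since the working log earlier in the thread still contains an `RmInitAdapter failed` line; the function name `nvidia_boot_status` is hypothetical.

```shell
# Classify one boot's kernel log as "ok", "failed" or "unknown" for nvidia-drm.
# Reads log text on stdin. "Initialized nvidia-drm" wins, because a working
# boot in this thread also logged an RmInitAdapter failure.
nvidia_boot_status() {
    awk '
        /RmInitAdapter failed/ || /Failed to allocate NvKmsKapiDevice/ { failed = 1 }
        /Initialized nvidia-drm/ { initialized = 1 }
        END {
            if (initialized)  print "ok"
            else if (failed)  print "failed"
            else              print "unknown"
        }'
}
```

For example, `sudo dmesg | nvidia_boot_status` checks the current boot, and `sudo journalctl -k -b -1 | nvidia_boot_status` checks the previous one, assuming the journal keeps previous boots.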

I’ve upgraded the kernel and it’s the same; these are the errors:

❯ uname -a
Linux fedora 6.9.6-200.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Jun 21 15:48:21 UTC 2024 x86_64 GNU/Linux
❯ sudo dmesg | grep -i nvidia\\\|nvrm
[sudo] password for vnm_rzv: 
[    0.000000] Command line: BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.9.6-200.fc40.x86_64 root=UUID=e281fdfb-17d3-4104-904b-8d787dacd632 ro rootflags=subvol=root rhgb quiet nvidia-drm.modeset=1 rd.driver.blacklist=nouveau modprobe.blacklist=nouveau
[    0.049641] Kernel command line: BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.9.6-200.fc40.x86_64 root=UUID=e281fdfb-17d3-4104-904b-8d787dacd632 ro rootflags=subvol=root rhgb quiet nvidia-drm.modeset=1 rd.driver.blacklist=nouveau modprobe.blacklist=nouveau
[    5.173877] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input23
[    5.173934] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input24
[    5.173964] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input25
[    5.173999] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input26
[    5.851195] nvidia: loading out-of-tree module taints kernel.
[    5.851200] nvidia: module license 'NVIDIA' taints kernel.
[    5.851202] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[    5.851203] nvidia: module license taints kernel.
[    5.977861] nvidia-nvlink: Nvlink Core is being initialized, major device number 510
[    5.978692] nvidia 0000:01:00.0: enabling device (0000 -> 0003)
[    5.978806] nvidia 0000:01:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
[    6.023554] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  550.90.07  Fri May 31 09:35:42 UTC 2024
[    6.076860] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
[    6.147602] nvidia-uvm: Loaded the UVM driver, major device number 508.
[    6.185592] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  550.90.07  Fri May 31 09:30:47 UTC 2024
**[    6.190079] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver**
**[    6.626347] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x24:0x72:1556)**
**[    6.626371] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0**
**[    6.626528] [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice**
**[    6.626614] [drm:nv_drm_register_drm_device [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to register device**

With the 555.58.02 driver it looks even worse: even the 6.8.9 kernel no longer works on random boots, or at least it hasn’t happened so far.

Random boot success suggests there may be a race condition between Nvidia and Intel driver loading.

Remove the nvidia-drm modeset option:

sudo grubby --update-kernel=ALL --remove-args='nvidia-drm.modeset=1'

To speed up loading of the Nvidia modules, try:

sudo dracut -fvv --add-drivers " nvidia nvidia-drm nvidia-modeset nvidia-uvm "

Then reboot into the same kernel.
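After running the dracut command, one way to confirm that the modules actually landed in the initramfs is to inspect it with `lsinitrd` (shipped with dracut). The sketch below assumes the module files are named like `nvidia-drm.ko`, possibly compressed (`.ko.xz`, `.ko.zst`), and also accepts `_` in place of `-` in case a build names them differently; `missing_nvidia_modules` is a hypothetical name.

```shell
# Print each Nvidia kernel module that does NOT appear in an initramfs listing.
# Reads `lsinitrd` output on stdin; prints nothing when all four are present.
missing_nvidia_modules() {
    local listing mod pattern
    listing=$(cat)
    for mod in nvidia nvidia-drm nvidia-modeset nvidia-uvm; do
        # Allow "-" or "_" in the file name, e.g. nvidia-drm.ko or nvidia_drm.ko.
        pattern=$(printf '%s' "$mod" | sed 's/-/[-_]/g')
        # Match ".../<module>.ko", optionally followed by a compression suffix.
        if ! printf '%s\n' "$listing" | grep -Eq "/${pattern}\.ko(\.[a-z]+)?\$"; then
            echo "$mod"
        fi
    done
}
```

Usage: `sudo lsinitrd | missing_nvidia_modules` inspects the initramfs of the running kernel; empty output means all four modules were found.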

I did as you said, and on this reboot, it works as expected:

❯ nvidia-smi
Tue Jul  9 17:25:33 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.58.02              Driver Version: 555.58.02      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3070 ...    Off |   00000000:01:00.0 Off |                  N/A |
| N/A   52C    P8             13W /  115W |       9MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      4057      G   /usr/bin/gnome-shell                            3MiB |
+-----------------------------------------------------------------------------------------+

I will need to test more to see whether it is consistent. Thank you for the idea, though!
I will update after some testing.

I’ve tested the kernel that was running when I used your commands, and Nvidia worked as expected on every boot. But now I’ve installed a newer kernel and the errors have reappeared. Do I need to rerun the commands for every new kernel, and can you explain what the commands do?

Thank you!

The dracut command I gave only rebuilds the initramfs of the running kernel; it overrides the dracut blacklist for the nvidia modules in /usr/lib/dracut/dracut.conf.d/99-nvidia-dracut.conf.
You will need to rerun the command after each kernel update.
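Until the underlying cause is fixed, the rebuild could be repeated for every installed kernel in one go. This is a hypothetical sketch (`rebuild_nvidia_initramfs_all`, `DRY_RUN` and `MODULES_DIR` are names made up for illustration). It assumes each directory under /lib/modules belongs to an installed kernel and uses dracut's `--kver` option to target each one, with the same driver list as the dracut command earlier in the thread (minus the `-vv` verbosity).

```shell
# Hypothetical helper: rebuild the initramfs of every kernel found under
# /lib/modules with the Nvidia modules added.
# DRY_RUN=1 prints the commands instead of running them; MODULES_DIR overrides
# the directory scanned (both are knobs invented for this sketch).
rebuild_nvidia_initramfs_all() {
    local modules_dir=${MODULES_DIR:-/lib/modules}
    local drivers=" nvidia nvidia-drm nvidia-modeset nvidia-uvm "
    local dir kver
    for dir in "$modules_dir"/*/; do
        [ -d "$dir" ] || continue          # glob matched nothing
        kver=$(basename "$dir")
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "dracut -f --kver $kver --add-drivers \"$drivers\""
        else
            sudo dracut -f --kver "$kver" --add-drivers "$drivers"
        fi
    done
}
```

Running `DRY_RUN=1 rebuild_nvidia_initramfs_all` first shows which kernel versions would be rebuilt; a leftover /lib/modules directory without a matching kernel would make dracut fail for that entry.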

I’ve rerun the commands and now it’s working as expected. Do you have any idea why this happens on my system? Thank you! I was really close to switching distros, and that would have been a shame because I really like Fedora!

The issue was caused by the Intel driver loading before the Nvidia one; adding the Nvidia modules to the initramfs lets them load before the Intel driver.
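To see this ordering on a particular boot, one could pull the relevant timestamps out of `dmesg`. A sketch, assuming the usual `[seconds.micros]` dmesg prefix and the DRM core message `Initialized <driver> ...` (visible in this thread for nvidia-drm; the i915 line is assumed to follow the same pattern); the function name `drm_init_order` is hypothetical.

```shell
# Print, in log order, when i915 and nvidia-drm reached key DRM milestones.
# Reads plain `dmesg` output (lines starting with "[   seconds.micros]") on stdin.
drm_init_order() {
    awk '
        match($0, /^\[ *[0-9]+\.[0-9]+\]/) {
            ts = substr($0, 2, RLENGTH - 2)   # text between the brackets
            sub(/^ +/, "", ts)                # drop the left padding
            if ($0 ~ /Initialized i915 /)               print ts, "i915 initialized"
            if ($0 ~ /\[nvidia-drm\].*Loading driver/)  print ts, "nvidia-drm loading"
            if ($0 ~ /Initialized nvidia-drm /)         print ts, "nvidia-drm initialized"
        }'
}
```

Usage: `sudo dmesg | drm_init_order`. On a failing boot one would expect the "nvidia-drm initialized" line to be missing.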
