6.4.4 nvidia driver 535.54.03 doesn't work after kernel update

I am using a Lenovo Legion 5 Pro 16ITH6 with an NVIDIA RTX 3050 Mobile GPU.
Upgrading from kernel 6.3.12 to 6.4.4 caused the NVIDIA driver (nvidia-powerd.service) to fail at boot. It still works fine when I boot into 6.3.12, although booting takes a long time (about 2 or 3 minutes).

Output of “systemctl status nvidia-powerd.service” after booting into 6.4.4:

× nvidia-powerd.service - nvidia-powerd service
     Loaded: loaded (/usr/lib/systemd/system/nvidia-powerd.service; enabled; preset: enabled)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf
     Active: failed (Result: exit-code) since Sun 2023-07-23 16:36:48 +0330; 1min 43s ago
    Process: 1138 ExecStart=/usr/bin/nvidia-powerd (code=exited, status=1/FAILURE)
   Main PID: 1138 (code=exited, status=1/FAILURE)
        CPU: 8ms

Jul 23 16:36:48 fedora systemd[1]: Starting nvidia-powerd.service - nvidia-powerd service...
Jul 23 16:36:48 fedora /usr/bin/nvidia-powerd[1138]: nvidia-powerd version:1.0(build 1)
Jul 23 16:36:48 fedora /usr/bin/nvidia-powerd[1138]: Allocate client failed 38
Jul 23 16:36:48 fedora /usr/bin/nvidia-powerd[1138]: Failed to initialize RM Client
Jul 23 16:36:48 fedora systemd[1]: nvidia-powerd.service: Main process exited, code=exited, status=1/FAILURE
Jul 23 16:36:48 fedora systemd[1]: nvidia-powerd.service: Failed with result 'exit-code'.
Jul 23 16:36:48 fedora systemd[1]: Failed to start nvidia-powerd.service - nvidia-powerd service.

nvidia-smi output in 6.4.4:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

I think this may be a bug, so I reported it on Bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=2224839

Hello @goldenhat ,
Whenever the kernel updates, the nvidia drivers always lag behind and break. When I was using nvidia hardware, I would usually just use the open-source nvidia driver built into the kernel (nouveau), since it gets updated with every kernel update. I have not used nvidia hardware for some time, though, so newer cards may not be as well supported by the generic open-source kernel driver.

The error may be the result of rebooting too quickly after a kernel update, which can interrupt the akmods rebuild of the drivers.
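
One way to check whether a rebuild actually left a module behind for the kernel you are booted into is sketched below. Treat it as a rough check under stated assumptions, not an official tool: it assumes modules are installed somewhere under /lib/modules/<kernel release>/ and that the file name starts with nvidia.ko (for example nvidia.ko.xz). nvidia_module_built is a made-up helper name, not part of akmods.

```shell
#!/bin/sh
# Sketch: check whether an nvidia kernel module exists for a given kernel.
# Assumption: modules live under /lib/modules/<kernel release>/ and the
# module file name starts with "nvidia.ko" (e.g. nvidia.ko.xz).

nvidia_module_built() {
    # $1 = kernel release (e.g. the output of `uname -r`)
    # $2 = modules root, defaults to /lib/modules (hypothetical override for testing)
    kver="$1"
    root="${2:-/lib/modules}"
    # grep -q . succeeds only if find printed at least one matching path
    find "$root/$kver" -name 'nvidia.ko*' 2>/dev/null | grep -q .
}

if nvidia_module_built "$(uname -r)"; then
    echo "nvidia module found for $(uname -r)"
else
    echo "no nvidia module for $(uname -r); the akmods build may not have finished"
fi
```

If this reports no module for the new kernel, that would be consistent with an interrupted or failed build, though it does not prove it.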

The easiest fix I have found is simply:

  1. sudo dnf remove kmod-nvidia-6.X.X* to remove the failing drivers (use the kernel version that failed, in this case 6.4.4).
  2. Then sudo akmods --force to rebuild and reinstall the driver for the running kernel.
  3. Wait at least 5 minutes for that step to complete properly, then reboot.

This seems to work in almost all cases.
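
For anyone who prefers to see the steps as a script, here is a minimal sketch. It uses only the two commands given above; the run and rebuild_nvidia_kmod helpers and the DRY_RUN variable are names made up for illustration. By default it only prints what it would do, so nothing is removed unless you set DRY_RUN=0.

```shell
#!/bin/sh
# Sketch of the steps above. DRY_RUN=1 (the default here) only prints the
# commands; set DRY_RUN=0 to actually run them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command in dry-run mode, otherwise execute it as given
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

rebuild_nvidia_kmod() {
    # $1 = kernel version whose driver package failed, e.g. 6.4.4
    kver="$1"
    # Step 1: remove the failing kmod package for that kernel
    run sudo dnf remove "kmod-nvidia-${kver}*"
    # Step 2: force akmods to rebuild and reinstall the driver
    run sudo akmods --force
    # Step 3: the reboot is deliberately left to the user
    echo "wait for akmods to finish, then reboot"
}

rebuild_nvidia_kmod "${1:-6.4.4}"
```

The reboot is left out on purpose, since the whole point of step 3 is to give the build time to finish first.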
