Nvidia's driver 560 is bricking distros

Hello, I am having a serious problem. I had to quit Pop_OS! because it updated to Nvidia’s driver 560 and that ruined the entire experience on my laptop. Games crash and applications take 3 times as long to open (Firefox takes about 85 seconds instead of the usual 10 s with 555). The keyboard and right-click begin to malfunction when a heavy app is loaded, like Boxes, VS Code or World of Warcraft, and sometimes even with just the notepad: keys like ‘n’, ‘b’, Caps Lock and ‘F1’ constantly get either written twice or ignored. None of this happened before, and it does not happen in my dual-boot Windows, so it’s not dust or a hardware problem. My Caps Lock behavior will also switch (light on = lowercase), but I think I fixed that one. It is incredibly annoying and unbearable.

The worst part of all is that there is no way to downgrade anymore, both in Pop_OS! and Fedora now. I don’t know why they do this, but I tried a lot of things: runfile method (using CLI mode after a reboot), akm, and everything said in this post: Rolling back rpmfusion Nvidia drivers from 560 to 555 - #11 by caferino

Is this it? Is there no way out?

Specs:
OS: Fedora Linux 40 (Workstation Edition)
Host: HP ENVY m7 Notebook
Kernel: 6.11.4-201.fc40.x86_64
Packages: 2727 (rpm), 56 (flatpak)
Shell: bash 5.2.26
Resolution: 1920x1080
DE: GNOME 46.6
WM: Mutter
WM Theme: Adwaita
Theme: Nordic-darker-v40 [GTK2/3]
Icons: Adwaita [GTK2/3]
Terminal: gnome-terminal
CPU: Intel i7-7500U (4) @ 3.500GHz
GPU: Intel HD Graphics 620, NVIDIA GeForce 940MX
Memory: 2032MiB / 15869MiB

AKMODS doesn’t work either, I get these:

caferino@192:~$ sudo /usr/sbin/akmods --force
Checking kmods exist for 6.10.12-200.fc40.x86_64 [ OK ]
Building and installing nvidia-kmod [FAILED]
Building rpms failed; see /var/cache/akmods/nvidia/550.67-1-for-6.10.12-200.fc40.x86_64.failed.log for details

Hint: Some kmods were ignored or failed to build or install.
You can try to rebuild and install them by by calling
‘/usr/sbin/akmods --force’ as root.

Checking kmods exist for 6.11.4-201.fc40.x86_64 [ OK ]
Building and installing nvidia-kmod [FAILED]
Building rpms failed; see /var/cache/akmods/nvidia/550.67-1-for-6.11.4-201.fc40.x86_64.failed.log for details

Hint: Some kmods were ignored or failed to build or install.
You can try to rebuild and install them by by calling
‘/usr/sbin/akmods --force’ as root.

caferino@192:~$ modinfo -F version nvidia
modinfo: ERROR: Module nvidia not found.
caferino@192:~$ dnf list installed '*nvidia*'
Installed Packages
akmod-nvidia.x86_64 3:550.67-1.fc40 @rpmfusion-nonfree
nvidia-modprobe.x86_64 3:550.67-1.fc40 @rpmfusion-nonfree
nvidia-settings.x86_64 3:550.67-1.fc40 @rpmfusion-nonfree
xorg-x11-drv-nvidia.x86_64 3:550.67-1.fc40 @rpmfusion-nonfree
xorg-x11-drv-nvidia-cuda-libs.x86_64 3:550.67-1.fc40 @rpmfusion-nonfree
xorg-x11-drv-nvidia-kmodsrc.x86_64 3:550.67-1.fc40 @rpmfusion-nonfree
xorg-x11-drv-nvidia-libs.i686 3:550.67-1.fc40 @rpmfusion-nonfree
xorg-x11-drv-nvidia-libs.x86_64 3:550.67-1.fc40 @rpmfusion-nonfree
xorg-x11-drv-nvidia-power.x86_64 3:550.67-1.fc40 @rpmfusion-nonfree
caferino@192:~$

Do you have logs that show a problem?
Look in the system journal for kernel panics, etc.
Look in the user journal for gnome reporting errors.
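
A minimal sketch of those two checks, assuming a systemd journal and a bash-like shell. The function names are made up for illustration. `-k` restricts output to kernel messages, `-p` sets the minimum priority, and `--user` reads the per-user journal, which is usually where GNOME session messages end up.

```shell
# Kernel messages at priority "err" or worse from the current boot.
kernel_errors() {
    journalctl -b 0 -k -p err --no-pager
}

# Messages at priority "warning" or worse from the current user's
# journal for the current boot (session services such as gnome-shell).
session_warnings() {
    journalctl --user -b 0 -p warning --no-pager
}
```

Running `kernel_errors | less` first is usually the shorter read; `session_warnings` can be much longer.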

If your hardware has no other issues (check the logs), then the problem is with your notebook, not distros being bricked.
Fixing that may well need Nvidia to make a fix to the drivers.

I am not tech-savvy enough to understand the logs. The ones given by akmods talk a lot about missing stuff, like this one:

/tmp/akmodsbuild.kGIrftDI/BUILD/nvidia-kmod-550.67/_kmod_build_6.10.12-200.fc40.x86_64/common/inc/nv-linux.h: In function ‘nv_vmap’:
/tmp/akmodsbuild.kGIrftDI/BUILD/nvidia-kmod-550.67/_kmod_build_6.10.12-200.fc40.x86_64/common/inc/nv-linux.h:674:51: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
674 | NV_MEMDBG_ADD(ptr, page_count * PAGE_SIZE);

There are hundreds of those. I feel so lost.
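
Lines like the one quoted are compiler warnings, which on their own do not stop a build. A rough way to find what actually failed is to search the log for errors only. This is a sketch: `build_errors` is a made-up helper name, and the patterns assume the usual gcc (`error:`) and make (`*** ... Error N`) message formats.

```shell
# Print only the lines of a build log that look like real failures,
# prefixed with their line numbers, skipping the warning noise.
#   $1 = path to the build log
build_errors() {
    grep -nE 'error:|\*\*\* .*Error [0-9]+' "$1"
}
```

For example: `build_errors /var/cache/akmods/nvidia/550.67-1-for-6.11.4-201.fc40.x86_64.failed.log`. The handful of lines it prints are the ones worth pasting into the thread.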

You may get help with the old drivers.

But it’s likely that it is easier to fix forward, e.g. fix 560.

If you run with 560 and check the logs, maybe there is something that can be done to help.

I gave the logs a shot, but they are massive, especially GNOME’s (they started from Sep 28, with about 1,000 core dump lines and about 5,000 lines per day, all red; I couldn’t stand holding Enter for so long, it would easily take hours). It’s hard for me to write this post: my Caps Lock got inverted, the letters ‘n’ and ‘b’ require me to press them about 5 times until they get written, and more. I tried to run WoW with kernel 6.11 and 560 again, but it’s just impossible; right-click doesn’t work at all. I first tried 6.10 with 560 out of curiosity, and it was stable for the first 5 minutes, but then it went downhill very quickly and behaved just the same.

I posted the logs from “journalctl -b” here:

Usually there are many blocks repeating similar messages. A big part of the success of Linux is due to the principle that “given enough eyeballs, all problems are shallow”. Many Nvidia problems are caused by Nvidia firmware that conflicts with recent kernels, so they will be present across multiple Linux distros. journalctl collects massive amounts of data and has many “filter” options to help select the relevant records. This is a lot of work. You can check whether users on Fedora and other distros have reported similar problems with journalctl records, and then see if they match what your system produces.

Using pastebin does not allow others to find your journalctl records with web searches, so you are relying on others to post excerpts as searchable text.

It is relatively simple to limit the logs to only one boot period and to trim even that down to relevant portions.
journalctl -b 0 gives only the logs since the last boot
Adding filters such as -g (grep), -p (priority) or --since (a time, such as --since -5min) can trim the output significantly and remove a lot of unnecessary verbiage from the logs. Once the data shown on screen is reasonably small, it is much easier to read and to identify the relevant parts.
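
As a sketch of how those filters combine (the function name `recent_matches` and its defaults are made up for illustration; the flags themselves are standard journalctl options):

```shell
# Show entries from the current boot that match a pattern, at a given
# priority or worse, limited to a recent time window.
#   $1 = pattern for -g        (default: nvidia)
#   $2 = priority for -p       (default: warning)
#   $3 = time for --since      (default: -5min, i.e. the last 5 minutes)
recent_matches() {
    journalctl -b 0 --no-pager \
        -g "${1:-nvidia}" \
        -p "${2:-warning}" \
        --since "${3:--5min}"
}
```

Calling `recent_matches` right after a crash shows only the last five minutes of nvidia-related warnings; `recent_matches drm err -1h` widens the window and narrows the priority.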

The journalctl log might redundantly spew the same line over and over but unfortunately the timestamp makes a lot of the similar lines look different. So you could try stripping off the timestamp and pipe the result into “uniq”

journalctl -b 0 --no-pager --no-hostname | awk '{ for (i = 4; i <= NF; i++) printf "%s ",$i;print "" }' | uniq > mylog.log

uniq makes sure that consecutive duplicate lines only show one time.

Then you can try filtering that even more with ‘-g nvidia’

journalctl -b 0 --no-pager --no-hostname -g "nvidia"| awk '{ for (i = 4; i <= NF; i++) printf "%s ",$i;print "" }' | uniq > mylog.log

This might help make a much smaller log dump by:

  1. limiting it to one boot with “-b 0”
  2. limiting lines to those containing ‘nvidia’ via the -g option
  3. squashing repeats
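
Since uniq only squashes repeats that are adjacent, a variation is to sort the stripped lines first and count them, so the noisiest messages float to the top. This is a different technique from the pipeline above (sort, then `uniq -c`), shown only as a sketch; `count_repeats` is a made-up name, and it assumes input produced with `--no-hostname` so that the first three fields are the timestamp.

```shell
# Read journalctl short-format lines on stdin, blank out the three
# timestamp fields (month, day, time), then count identical messages
# and list the most frequent first.
count_repeats() {
    awk '{ $1 = $2 = $3 = ""; sub(/^ +/, ""); print }' \
        | sort | uniq -c | sort -rn
}
```

Usage: `journalctl -b 0 --no-pager --no-hostname -g nvidia | count_repeats | head -n 20`. Note that the original order of the messages is lost, so this is for spotting repeats, not for reading events in sequence.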

If someone knows a slick built-in way to tell journalctl to drop the timestamp, please reply, because I hate resorting to that awk hack.
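
One built-in option that may cover this: journalctl’s `-o cat` output mode prints only the message text of each entry, with no timestamp. The trade-off is that it also drops the process name (`kernel:`, `gnome-shell[...]:`), so it is not an exact replacement for the awk step. A sketch, with a made-up function name:

```shell
# Message text only for the current boot, filtered by a pattern,
# with consecutive duplicate messages squashed.
#   $1 = pattern for -g (default: nvidia)
messages_only() {
    journalctl -b 0 --no-pager -o cat -g "${1:-nvidia}" | uniq
}
```

Usage: `messages_only > mylog.log`, or `messages_only drm` for a different pattern.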


Sorry for the late reply, I got distracted. Today I tried to run World of Warcraft with nvidia-470, but it said the 3D accelerator card is not supported, so I installed 560 back and decided to play the game until it crashed, to see what the logs could say. (It has become completely unplayable: it crashes at around 10 minutes of gameplay, most likely after that time when starting combat with anything, or when simply being idle after doing stuff like gathering or fighting early on.) I ran the command you suggested and I got this:

audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg=‘unit=nvidia-powerd comm=“systemd” exe=“/usr/lib/systemd/systemd” hostname=? addr=? terminal=? res=success’
systemd[1]: Started nvidia-powerd.service - nvidia-powerd service.
/usr/bin/nvidia-powerd[934]: nvidia-powerd version:1.0(build 1)
systemd[1]: nvidia-powerd.service: Deactivated successfully.
audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg=‘unit=nvidia-powerd comm=“systemd” exe=“/usr/lib/systemd/systemd” hostname=? addr=? terminal=? res=success’
audit[14314]: SOFTWARE_UPDATE pid=14314 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 msg=‘op=install sw=“kmod-nvidia-6.11.5-300.fc41.x86_64-3:560.35.03-1.fc41.x86_64” sw_type=rpm key_enforce=0 gpg_res=0 root_dir=“/” comm=“dnf” exe=“/usr/bin/dnf5” hostname=? addr=? terminal=? res=success’
akmods[910]: Building and installing nvidia-kmod[ OK ]
kernel: nvidia: loading out-of-tree module taints kernel.
kernel: nvidia: module license ‘NVIDIA’ taints kernel.
kernel: nvidia: module license taints kernel.
kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 511
kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module 560.35.03 Fri Aug 16 21:39:15 UTC 2024
kernel: nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
kernel: nvidia-uvm: Loaded the UVM driver, major device number 509.
kernel: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 560.35.03 Fri Aug 16 21:21:48 UTC 2024
kernel: [drm] [nvidia-drm] [GPU ID 0x00000400] Loading driver
kernel: [drm] Initialized nvidia-drm 0.0.0 for 0000:04:00.0 on minor 0
kernel: nvidia 0000:04:00.0: [drm] No compatible format found
kernel: nvidia 0000:04:00.0: [drm] Cannot find any crtc or sizes
systemd[1]: nvidia-fallback.service - Fallback to nouveau as nvidia did not load was skipped because of an unmet condition check (ConditionPathExists=!/sys/module/nvidia).
gnome-shell[14417]: Added device ‘/dev/dri/card0’ (nvidia-drm) using atomic mode setting.
/usr/libexec/gdm-x-session[15115]: (II) Applying OutputClass “nvidia” to /dev/dri/card0
/usr/libexec/gdm-x-session[15115]: loading driver: nvidia
/usr/libexec/gdm-x-session[15115]: (==) Matched nvidia as autoconfigured driver 0
/usr/libexec/gdm-x-session[15115]: (II) LoadModule: “nvidia”
/usr/libexec/gdm-x-session[15115]: (II) Loading /usr/lib64/xorg/modules/drivers/nvidia_drv.so
/usr/libexec/gdm-x-session[15115]: (II) Module nvidia: vendor=“NVIDIA Corporation”
/usr/libexec/gdm-x-session[15115]: (II) NVIDIA dlloader X Driver 560.35.03 Fri Aug 16 21:25:43 UTC 2024
/usr/libexec/gdm-x-session[15115]: (II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
/usr/libexec/gdm-x-session[15115]: (==) NVIDIA(G0): Depth 24, (==) framebuffer bpp 32
/usr/libexec/gdm-x-session[15115]: (==) NVIDIA(G0): RGB weight 888
/usr/libexec/gdm-x-session[15115]: (==) NVIDIA(G0): Default visual is TrueColor
/usr/libexec/gdm-x-session[15115]: (==) NVIDIA(G0): Using gamma correction (1.0, 1.0, 1.0)
/usr/libexec/gdm-x-session[15115]: (**) Option “AllowNVIDIAGpuScreens”
/usr/libexec/gdm-x-session[15115]: (II) Applying OutputClass “nvidia” options to /dev/dri/card0
/usr/libexec/gdm-x-session[15115]: (**) NVIDIA(G0): Option “SLI” “Auto”
/usr/libexec/gdm-x-session[15115]: (**) NVIDIA(G0): Option “BaseMosaic” “on”
/usr/libexec/gdm-x-session[15115]: (**) NVIDIA(G0): Option “AllowEmptyInitialConfiguration”
/usr/libexec/gdm-x-session[15115]: (WW) NVIDIA(G0): Invalid SLI option: ‘Auto’; using single GPU rendering.
/usr/libexec/gdm-x-session[15115]: (WW) NVIDIA(G0): Base Mosaic is available only on screen 0. Disabling Base
/usr/libexec/gdm-x-session[15115]: (WW) NVIDIA(G0): Mosaic.
/usr/libexec/gdm-x-session[15115]: (**) NVIDIA(G0): Enabling 2D acceleration
/usr/libexec/gdm-x-session[15115]: (II) Loading sub module “glxserver_nvidia”
/usr/libexec/gdm-x-session[15115]: (II) LoadModule: “glxserver_nvidia”
/usr/libexec/gdm-x-session[15115]: (II) Loading /usr/lib64/xorg/modules/extensions/libglxserver_nvidia.so
/usr/libexec/gdm-x-session[15115]: (II) Module glxserver_nvidia: vendor=“NVIDIA Corporation”
/usr/libexec/gdm-x-session[15115]: (II) NVIDIA GLX Module 560.35.03 Fri Aug 16 21:27:48 UTC 2024
/usr/libexec/gdm-x-session[15115]: (II) NVIDIA: The X server supports PRIME Render Offload.
/usr/libexec/gdm-x-session[15115]: (II) NVIDIA(G0): NVIDIA GPU NVIDIA GeForce 940MX (GM108-A) at PCI:4:0:0
/usr/libexec/gdm-x-session[15115]: (II) NVIDIA(G0): (GPU-0)
/usr/libexec/gdm-x-session[15115]: (--) NVIDIA(G0): Memory: 2097152 kBytes
/usr/libexec/gdm-x-session[15115]: (--) NVIDIA(G0): VideoBIOS: 82.08.59.00.7d
/usr/libexec/gdm-x-session[15115]: (II) NVIDIA(G0): Detected PCI Express Link width: 4X
/usr/libexec/gdm-x-session[15115]: (II) NVIDIA(G0): Validated MetaModes:
/usr/libexec/gdm-x-session[15115]: (II) NVIDIA(G0): “NULL”
/usr/libexec/gdm-x-session[15115]: (II) NVIDIA(G0): Virtual screen size determined to be 640 x 480
/usr/libexec/gdm-x-session[15115]: (WW) NVIDIA(G0): Unable to get display device for DPI computation.
/usr/libexec/gdm-x-session[15115]: (==) NVIDIA(G0): DPI set to (75, 75); computed from built-in default
/usr/libexec/gdm-x-session[15115]: (WW) NVIDIA: Failed to bind sideband socket to
/usr/libexec/gdm-x-session[15115]: (WW) NVIDIA: ‘/var/run/nvidia-xdriver-4eeec4f3’ Permission denied
/usr/libexec/gdm-x-session[15115]: (II) NVIDIA: Reserving 6144.00 MB of virtual memory for indirect memory
/usr/libexec/gdm-x-session[15115]: (II) NVIDIA: access.
/usr/libexec/gdm-x-session[15115]: (II) NVIDIA(G0): ACPI: failed to connect to the ACPI event daemon; the daemon
/usr/libexec/gdm-x-session[15115]: (II) NVIDIA(G0): may not be running or the “AcpidSocketPath” X
/usr/libexec/gdm-x-session[15115]: (II) NVIDIA(G0): configuration option may not be set correctly. When the
/usr/libexec/gdm-x-session[15115]: (II) NVIDIA(G0): ACPI event daemon is available, the NVIDIA X driver will
/usr/libexec/gdm-x-session[15115]: (II) NVIDIA(G0): try to use it to receive ACPI event notifications. For
/usr/libexec/gdm-x-session[15115]: (II) NVIDIA(G0): details, please see the “ConnectToAcpid” and
/usr/libexec/gdm-x-session[15115]: (II) NVIDIA(G0): “AcpidSocketPath” X configuration options in Appendix B: X
/usr/libexec/gdm-x-session[15115]: (II) NVIDIA(G0): Config Options in the README.
/usr/libexec/gdm-x-session[15115]: (II) NVIDIA(G0): Setting mode “NULL”
/usr/libexec/gdm-x-session[15115]: (==) NVIDIA(G0): Disabling shared memory pixmaps
/usr/libexec/gdm-x-session[15115]: (==) NVIDIA(G0): Backing store enabled
/usr/libexec/gdm-x-session[15115]: (==) NVIDIA(G0): Silken mouse enabled
/usr/libexec/gdm-x-session[15115]: (==) NVIDIA(G0): DPMS enabled
/usr/libexec/gdm-x-session[15115]: (II) NVIDIA(G0): [DRI2] Setup complete
/usr/libexec/gdm-x-session[15115]: (II) NVIDIA(G0): [DRI2] VDPAU driver: nvidia
systemd[15071]: Started app-gnome-nvidia\x2dsettings\x2duser-15484.scope - Application launched by gnome-session-binary.
systemd[15071]: app-gnome-nvidia\x2dsettings\x2duser-15484.scope: Consumed 374ms CPU time, 136M memory peak.
net.lutris.Lutris.desktop[17205]: 2024-11-01 21:53:28,346: NVIDIA Corporation GM108M [GeForce 940MX] (10de:134d 103c:81d4 nvidia) Driver 560.35.03

Also, in Lutris itself, I get dozens of these logs while playing. I asked in Lutris’ Discord to see if they might know what’s up. From what I read in them, they feel related to the cause: I’ve been assuming this whole problem has something to do with loading shaders/graphics from cache, because while playing it feels like something is happening there, but I am not tech-savvy enough to confirm this. I will keep digging and learning, and see if something more useful comes up:

1593.504:07e8:08f0:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Pipeline cache marked dirty. Flush is scheduled.
1594.505:07e8:08f0:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Flushing disk cache (wakeup counter since last flush = 71). It seems like application has stopped creating new PSOs for the time being.
1608.880:07e8:08f0:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Pipeline cache marked dirty. Flush is scheduled.
1609.892:07e8:08f0:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Flushing disk cache (wakeup counter since last flush = 18). It seems like application has stopped creating new PSOs for the time being.
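
Those four lines are tagged `info` and appear to be routine: they describe vkd3d-proton writing its pipeline cache to disk, not a failure. To look for actual problems in a saved game log, one option is to keep only the lines with a more serious tag. This is a sketch: `wine_problems` is a made-up name, the `err`, `warn` and `fixme` tags are assumed from the usual Wine/vkd3d-proton log format, and the log file is whatever the Lutris output was saved to.

```shell
# Print only the lines of a Wine/vkd3d-proton log whose class tag is
# err, warn or fixme, dropping info-level chatter.
#   $1 = path to the saved log file
wine_problems() {
    grep -E ':(err|warn|fixme):' "$1"
}
```

If this prints nothing around the time of a crash, the cause is more likely to show up in the system journal than in the game log.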

The problem of my right-click not sticking when I drag the camera around is still there and has gotten worse. It felt fixed after installing Fedora 41, but then it came back. It seems to happen a lot less when the driver is freshly installed (I noticed that while troubleshooting), but it slowly builds up until it becomes constant, in about 30 minutes, which is why I assume it has something to do with the cache or its size, but I am not sure yet.

While reading the journal’s logs, I find some entries weird: the virtual screen size being 480p while my screen is actually running everything at 720p, some permission denied problems that I might need to run chown somewhere for (not sure), the random NULL modes and values… I don’t know how to dissect this. It feels like I need to know a lot of lore about how these things work, and it’s intimidating, but I will try. I have been dealing with these issues for the past 2 years, and I want to help fix them however I can.