Downgrading nvidia driver from 535

I’m trying to downgrade my nvidia driver from 535, since it is causing regular stuttering and frame drop issues that weren’t there before.

Steps I’ve taken:

  • Remove current driver: sudo dnf remove \*nvidia\* --exclude nvidia-gpu-firmware, then reboot
  • Lock out the problematic version: sudo dnf install 'dnf-command(versionlock)', then sudo dnf versionlock exclude akmod-nvidia-535.113.01-1.fc38
  • Install drivers following the normal instructions, verifying that all packages are nvidia 530.x and versionlocking anything that tries to install a different version, as above
  • Reboot

I was surprised that despite everything apparently working, I’m still on nouveau after a reboot. lsmod | grep nouveau shows it loaded, and lsmod | grep nvidia shows nothing. All expected packages are installed:

 $ dnf list --installed | grep nvidia
akmod-nvidia.x86_64                                       3:530.41.03-1.fc38                     @rpmfusion-nonfree                               
libva-nvidia-driver.x86_64                                0.0.10-3.fc38                          @updates                                         
nvidia-gpu-firmware.noarch                                20230919-1.fc38                        @updates                                         
nvidia-persistenced.x86_64                                3:530.41.03-1.fc38                     @rpmfusion-nonfree                               
nvidia-settings.x86_64                                    3:530.41.03-1.fc38                     @rpmfusion-nonfree                               
xorg-x11-drv-nvidia.x86_64                                3:530.41.03-1.fc38                     @rpmfusion-nonfree                               
xorg-x11-drv-nvidia-cuda.x86_64                           3:530.41.03-1.fc38                     @rpmfusion-nonfree                               
xorg-x11-drv-nvidia-cuda-libs.i686                        3:530.41.03-1.fc38                     @rpmfusion-nonfree                               
xorg-x11-drv-nvidia-cuda-libs.x86_64                      3:530.41.03-1.fc38                     @rpmfusion-nonfree                               
xorg-x11-drv-nvidia-kmodsrc.x86_64                        3:530.41.03-1.fc38                     @rpmfusion-nonfree                               
xorg-x11-drv-nvidia-libs.i686                             3:530.41.03-1.fc38                     @rpmfusion-nonfree                               
xorg-x11-drv-nvidia-libs.x86_64                           3:530.41.03-1.fc38                     @rpmfusion-nonfree                               
xorg-x11-drv-nvidia-power.x86_64                          3:530.41.03-1.fc38                     @rpmfusion-nonfree
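
A quick way to double-check a list like the one above is to print the distinct versions of the RPM Fusion packages; a clean downgrade should leave exactly one. This is only a rough sketch: the function name is made up for illustration, and it assumes each package sits on a single line in the three-column NAME / VERSION / @REPO layout shown above (dnf can wrap very long names onto a second line).

```shell
# Hypothetical helper: print the distinct versions of installed nvidia
# packages that came from an RPM Fusion repo.
# Reads `dnf list --installed`-style lines on stdin: NAME  VERSION  @REPO
rpmfusion_nvidia_versions() {
    awk '$1 ~ /nvidia/ && $3 ~ /^@rpmfusion/ { print $2 }' | sort -u
}

# Usage on a live system (not run here):
#   dnf list --installed | rpmfusion_nvidia_versions
```

If more than one line comes back, some package was pulled in at a different driver version.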

So I tried manually triggering the akmod build, but it fails:

 $ sudo akmods
Checking kmods exist for 6.5.8-200.fc38.x86_64             [  OK  ]
Building and installing nvidia-kmod                        [FAILED]
Building rpms failed; see /var/cache/akmods/nvidia/530.41.03-1-for-6.5.8-200.fc38.x86_64.failed.log for details

Hint: Some kmods were ignored or failed to build or install.
You can try to rebuild and install them by by calling
'/usr/sbin/akmods --force' as root.

The log file for the failed build is here. The relevant error seems to be many variations on this one:

2023/10/29 18:57:13 akmodsbuild: /tmp/akmodsbuild.M8wOQ9Bg/BUILD/nvidia-kmod-530.41.03/_kmod_build_6.5.8-200.fc38.x86_64/common/inc/nv-mm.h:88:16: error: too many arguments to function 'get_user_pages'
2023/10/29 18:57:13 akmodsbuild:    88 |         return get_user_pages(current, current->mm, start, nr_pages, write,
2023/10/29 18:57:13 akmodsbuild:       |                ^~~~~~~~~~~~~~
2023/10/29 18:57:13 akmodsbuild: ./include/linux/mm.h:2430:6: note: declared here
2023/10/29 18:57:13 akmodsbuild:  2430 | long get_user_pages(unsigned long start, unsigned long nr_pages,
2023/10/29 18:57:13 akmodsbuild:       |      ^~~~~~~~~~~~~~

My reading of this is that the akmod for version 530.41.03 is failing to build against my kernel version (since the function in the error seems to be from a kernel header).
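
To see whether the log really is many variations of one problem, it can help to collapse it down to the distinct error messages. A minimal sketch, assuming the log keeps the "path:line:col: error: message" layout shown in the excerpt above; the function name is invented for illustration.

```shell
# Hypothetical helper: count the distinct compiler errors in an akmods
# failure log.
# $1: path to the *.failed.log file named in the akmods output.
summarize_build_errors() {
    grep -E ': error: ' "$1" \
        | sed -E 's/^.*: error: //' \
        | sort | uniq -c | sort -rn
}

# Usage (not run here):
#   summarize_build_errors /var/cache/akmods/nvidia/530.41.03-1-for-6.5.8-200.fc38.x86_64.failed.log
```

Each output line is a count followed by the message with the timestamp and file location stripped, so a single kernel API mismatch hit from many source files shows up as one line with a large count.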

Can anybody help me finish this downgrade? Is it just not possible to downgrade without also downgrading my kernel, and if so, how can I figure out which kernel version to revert to?

System info:

  • Fedora 38 Workstation, GNOME, X11
  • Kernel: 6.5.8-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Oct 20 15:53:48 UTC 2023 x86_64 GNU/Linux
  • GPU: Nvidia 3070Ti (desktop)

I would guess you will also need to use an older kernel.
Otherwise you will need to patch the nvidia glue code to match the changed way of calling get_user_pages.
You could look at the latest driver to see what is required.
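
To see what the running kernel actually expects, you could print the declaration straight from the kernel headers and compare it with the call in nv-mm.h. This is only a sketch: the function name is made up, and the header path in the usage comment assumes the kernel-devel package for the running kernel is installed in the usual Fedora location.

```shell
# Hypothetical helper: print a function declaration (plus the two lines
# that follow it) from a kernel header, with line numbers.
# $1: function name, $2: path to the header file.
show_kernel_decl() {
    grep -n -A2 -E "^long $1\(" "$2"
}

# Usage (assumed path, not run here):
#   show_kernel_decl get_user_pages "/usr/src/kernels/$(uname -r)/include/linux/mm.h"
```

In the build log above, the driver passes current and current->mm as the first two arguments, while the header expects the first argument to be unsigned long start, so the two clearly disagree.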

What’s the best way of me finding the source of what’s being compiled?

Any idea how I might find which kernel version each is targeted for?

The name of the .c file will be in the akmodsbuild log.

From your pastebin I see this:

2023/10/29 18:57:13 akmodsbuild: In file included from /tmp/akmodsbuild.M8wOQ9Bg/BUILD/nvidia-kmod-530.41.03/_kmod_build_6.5.8-200.fc38.x86_64/common/inc/nv-pgprot.h:30,
2023/10/29 18:57:13 akmodsbuild:                  from /tmp/akmodsbuild.M8wOQ9Bg/BUILD/nvidia-kmod-530.41.03/_kmod_build_6.5.8-200.fc38.x86_64/common/inc/nv-linux.h:33,
2023/10/29 18:57:13 akmodsbuild:                  from /tmp/akmodsbuild.M8wOQ9Bg/BUILD/nvidia-kmod-530.41.03/_kmod_build_6.5.8-200.fc38.x86_64/nvidia/nv-nano-timer.c:31:
2023/10/29 18:57:13 akmodsbuild: ./include/linux/mm.h:2430:35: note: expected 'long unsigned int' but argument is of type 'struct task_struct *'
2023/10/29 18:57:13 akmodsbuild:  2430 | long get_user_pages(unsigned long start, unsigned long nr_pages,
2023/10/29 18:57:13 akmodsbuild:       |                     ~~~~~~~~~~~~~~^~~~~

So it should be in nv-nano-timer.c:31

This shows the system did not build and install the kmod-nvidia package for the currently installed nvidia packages (530.41.03).

Try that as sudo akmods --force and if it still fails then try
sudo akmods --kernels $(uname -r) --force

The same suggestion was at the bottom of what you posted for the failed akmods run.

I should have said, I tried akmods --force and it failed with exactly the same compilation error. Wouldn’t the default kernel for akmods be the current one anyway?

I meant where can I find the repo that contains the source code, not which file within the code is failing.

On my system I find

# find /usr -name 'nvidia-kmod*'
/usr/share/nvidia-kmod-535.113.01
/usr/share/nvidia-kmod-535.113.01/nvidia-kmod-535.113.01-x86_64.tar.xz
/usr/src/akmods/nvidia-kmod.latest
/usr/src/akmods/nvidia-kmod-535.113.01-1.fc39.src.rpm

and 

ls -l /usr/src/akmods/
total 240
-rw-r--r--. 1 root root 90454 Sep 21 19:00  nvidia-kmod-535.113.01-1.fc39.src.rpm
lrwxrwxrwx. 1 root root    37 Sep 21 19:00  nvidia-kmod.latest -> nvidia-kmod-535.113.01-1.fc39.src.rpm

Those files come from installing akmod-nvidia from rpmfusion.

# dnf provides /usr/src/akmods/nvidia-kmod-535.113.01-1.fc39.src.rpm
Last metadata expiration check: 0:08:46 ago on Mon 30 Oct 2023 01:16:52 PM CDT.
akmod-nvidia-3:535.113.01-1.fc39.x86_64 : Akmod package for nvidia kernel module(s)
Repo        : @System
Matched from:
Filename    : /usr/src/akmods/nvidia-kmod-535.113.01-1.fc39.src.rpm

akmod-nvidia-3:535.113.01-1.fc39.x86_64 : Akmod package for nvidia kernel module(s)
Repo        : rpmfusion-nonfree
Matched from:
Filename    : /usr/src/akmods/nvidia-kmod-535.113.01-1.fc39.src.rpm

akmod-nvidia-3:535.113.01-1.fc39.x86_64 : Akmod package for nvidia kernel module(s)
Repo        : rpmfusion-nonfree-nvidia-driver
Matched from:
Filename    : /usr/src/akmods/nvidia-kmod-535.113.01-1.fc39.src.rpm
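
If the goal is just to read the code that akmods compiles, the tarball under /usr/share/nvidia-kmod-<version>/ from the find output above can be unpacked into a scratch directory and searched. A rough sketch, assuming that tarball is what the build uses; the function name is invented for illustration, and it relies on GNU tar detecting the compression automatically.

```shell
# Hypothetical helper: unpack a source tarball into a temporary directory
# and list every line that mentions a given symbol.
# $1: path to the tarball, $2: symbol to search for.
find_symbol_in_tarball() {
    [ -r "$1" ] || { echo "cannot read $1" >&2; return 1; }
    workdir=$(mktemp -d) || return 1
    tar -xf "$1" -C "$workdir" || return 1
    grep -rn -- "$2" "$workdir"
}

# Usage (path taken from the find output above, not run here):
#   find_symbol_in_tarball /usr/share/nvidia-kmod-535.113.01/nvidia-kmod-535.113.01-x86_64.tar.xz get_user_pages
```

The scratch directory is left in place so you can keep reading the unpacked files afterwards.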