F39, kernel 6.6.11-200, nvidia driver 545.29.06-2, akmod-nvidia failed with segfault

Secure Boot is disabled, and the packages are from rpmfusion-nonfree-nvidia-driver.

make[2]: *** [/usr/src/kernels/6.6.11-200.fc39.x86_64/Makefile:1931: /tmp/akmodsbuild.EheP8HGJ/BUILD/nvidia-kmod-545.29.06/_kmod_build_6.6.11-200.fc39.x86_64] Segmentation fault (core dumped)
make[1]: *** [Makefile:246: __sub-make] Error 2
make[1]: Leaving directory '/usr/src/kernels/6.6.11-200.fc39.x86_64'
make: *** [Makefile:82: modules] Error 2
make: INTERNAL: Exiting with 1 jobserver tokens available; should be 24!
error: Bad exit status from /var/tmp/rpm-tmp.sw9SNU (%build)

Packages I’ve installed:

$ dnf list installed | grep nvidia
akmod-nvidia.x86_64                                  3:545.29.06-2.fc39                @rpmfusion-nonfree-nvidia-driver
nvidia-gpu-firmware.noarch                           20231211-1.fc39                   @updates                        
nvidia-modprobe.x86_64                               3:545.29.06-1.fc39                @rpmfusion-nonfree-nvidia-driver
nvidia-settings.x86_64                               3:545.29.06-1.fc39                @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia.x86_64                           3:545.29.06-2.fc39                @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-cuda-libs.x86_64                 3:545.29.06-2.fc39                @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-kmodsrc.x86_64                   3:545.29.06-2.fc39                @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-libs.x86_64                      3:545.29.06-2.fc39                @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-power.x86_64                     3:545.29.06-2.fc39                @rpmfusion-nonfree-nvidia-driver

There are no other error messages. Any ideas?

This build worked for me when I updated just now. Try running the driver build again to see whether the problem persists.

sudo akmods --rebuild --force
Checking kmods exist for 6.6.11-200.fc39.x86_64            [  OK  ]
Building and installing nvidia-kmod                        [  OK  ]

If this fails, let us know and we can help debug why.

For reference this is what I have installed.

sudo dnf list installed '*nvidia*'
Installed Packages
akmod-nvidia.x86_64                                       3:545.29.06-2.fc39                  @rpmfusion-nonfree-updates
kmod-nvidia-6.6.11-200.fc39.x86_64.x86_64                 3:545.29.06-2.fc39                  @@commandline
kmod-nvidia-6.6.8-200.fc39.x86_64.x86_64                  3:545.29.06-2.fc39                  @@commandline
kmod-nvidia-6.6.9-200.fc39.x86_64.x86_64                  3:545.29.06-2.fc39                  @@commandline
libva-nvidia-driver.x86_64                                0.0.11-1.fc39                       @updates
nvidia-modprobe.x86_64                                    3:545.29.06-1.fc39                  @rpmfusion-nonfree-updates
nvidia-settings.x86_64                                    3:545.29.06-1.fc39                  @rpmfusion-nonfree-updates
xorg-x11-drv-nvidia.x86_64                                3:545.29.06-2.fc39                  @rpmfusion-nonfree-updates
xorg-x11-drv-nvidia-cuda-libs.x86_64                      3:545.29.06-2.fc39                  @rpmfusion-nonfree-updates
xorg-x11-drv-nvidia-kmodsrc.x86_64                        3:545.29.06-2.fc39                  @rpmfusion-nonfree-updates
xorg-x11-drv-nvidia-libs.i686                             3:545.29.06-2.fc39                  @rpmfusion-nonfree-updates
xorg-x11-drv-nvidia-libs.x86_64                           3:545.29.06-2.fc39                  @rpmfusion-nonfree-updates
xorg-x11-drv-nvidia-power.x86_64                          3:545.29.06-2.fc39                  @rpmfusion-nonfree-updates

And for kernel:

sudo dnf list installed '*kernel*'
Installed Packages
abrt-addon-kerneloops.x86_64                                     2.17.1-3.fc39                                  @fedora
kernel.x86_64                                                    6.6.8-200.fc39                                 @updates
kernel.x86_64                                                    6.6.9-200.fc39                                 @updates
kernel.x86_64                                                    6.6.11-200.fc39                                @updates
kernel-core.x86_64                                               6.6.8-200.fc39                                 @updates
kernel-core.x86_64                                               6.6.9-200.fc39                                 @updates
kernel-core.x86_64                                               6.6.11-200.fc39                                @updates
kernel-devel.x86_64                                              6.6.8-200.fc39                                 @updates
kernel-devel.x86_64                                              6.6.9-200.fc39                                 @updates
kernel-devel.x86_64                                              6.6.11-200.fc39                                @updates
kernel-devel-matched.x86_64                                      6.6.11-200.fc39                                @updates
kernel-headers.x86_64                                            6.6.3-200.fc39                                 @updates
kernel-modules.x86_64                                            6.6.8-200.fc39                                 @updates
kernel-modules.x86_64                                            6.6.9-200.fc39                                 @updates
kernel-modules.x86_64                                            6.6.11-200.fc39                                @updates
kernel-modules-core.x86_64                                       6.6.8-200.fc39                                 @updates
kernel-modules-core.x86_64                                       6.6.9-200.fc39                                 @updates
kernel-modules-core.x86_64                                       6.6.11-200.fc39                                @updates
kernel-modules-extra.x86_64                                      6.6.8-200.fc39                                 @updates
kernel-modules-extra.x86_64                                      6.6.9-200.fc39                                 @updates
kernel-modules-extra.x86_64                                      6.6.11-200.fc39                                @updates
kernel-srpm-macros.noarch                                        1.0-20.fc39                                    @fedora
libreport-plugin-kerneloops.x86_64                               2.17.11-3.fc39                                 @fedora

I’m getting the same segfault this time, but the logs look different:

BUILD/nvidia-kmod-545.29.06/_kmod_build_6.6.11-200.fc39.x86_64/nvidia-uvm/uvm_ampere.o /tmp/akmodsbuild.7I6ftkMT/BUILD/nvidia-kmod-545.29.06/_kmod_build_6.6.11-200.fc39.x86_64/nvidia-uvm/uvm_ampere.c  
2024/01/17 08:22:04 akmodsbuild: /tmp/ccmtW8WF.s: Assembler messages:
2024/01/17 08:22:04 akmodsbuild: /tmp/ccmtW8WF.s:33841: Internal error (Segmentation fault).
2024/01/17 08:22:04 akmodsbuild: Please report this bug.
2024/01/17 08:22:04 akmodsbuild: make[3]: *** [scripts/Makefile.build:243: /tmp/akmodsbuild.7I6ftkMT/BUILD/nvidia-kmod-545.29.06/_kmod_build_6.6.11-200.fc39.x86_64/nvidia-uvm/uvm_maxwell_access_counter_buffer.o] Error 1
2024/01/17 08:22:04 akmodsbuild: make[3]: *** Waiting for unfinished jobs....
2024/01/17 08:22:04 akmodsbuild: make[2]: *** [/usr/src/kernels/6.6.11-200.fc39.x86_64/Makefile:1931: /tmp/akmodsbuild.7I6ftkMT/BUILD/nvidia-kmod-545.29.06/_kmod_build_6.6.11-200.fc39.x86_64] Error 2
2024/01/17 08:22:04 akmodsbuild: make[1]: *** [Makefile:246: __sub-make] Error 2
2024/01/17 08:22:04 akmodsbuild: make[1]: Leaving directory '/usr/src/kernels/6.6.11-200.fc39.x86_64'
2024/01/17 08:22:04 akmodsbuild: make: *** [Makefile:82: modules] Error 2
2024/01/17 08:22:04 akmodsbuild: error: Bad exit status from /var/tmp/rpm-tmp.FveJin (%build)
2024/01/17 08:22:04 akmodsbuild: 
2024/01/17 08:22:04 akmodsbuild: RPM build errors:
2024/01/17 08:22:04 akmodsbuild:     Bad exit status from /var/tmp/rpm-tmp.FveJin (%build)
2024/01/17 08:22:04 akmodsbuild: 
2024/01/17 08:22:04 akmods: Building rpms failed; see /var/cache/akmods/nvidia/545.29.06-2-for-6.6.11-200.fc39.x86_64.failed.log for details

Nvidia packages:

$ sudo dnf list installed '*nvidia*'
Installed Packages
akmod-nvidia.x86_64                                   3:545.29.06-2.fc39                  @rpmfusion-nonfree-nvidia-driver
nvidia-gpu-firmware.noarch                            20231211-1.fc39                     @updates                        
nvidia-modprobe.x86_64                                3:545.29.06-1.fc39                  @rpmfusion-nonfree-nvidia-driver
nvidia-settings.x86_64                                3:545.29.06-1.fc39                  @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia.x86_64                            3:545.29.06-2.fc39                  @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-cuda-libs.x86_64                  3:545.29.06-2.fc39                  @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-kmodsrc.x86_64                    3:545.29.06-2.fc39                  @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-libs.x86_64                       3:545.29.06-2.fc39                  @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-power.x86_64                      3:545.29.06-2.fc39                  @rpmfusion-nonfree-nvidia-driver

Kernel packages:

$ sudo dnf list installed '*kernel*'
Installed Packages
abrt-addon-kerneloops.x86_64                                      2.17.1-3.fc39                                   @fedora 
kernel.x86_64                                                     6.6.11-200.fc39                                 @updates
kernel-core.x86_64                                                6.6.11-200.fc39                                 @updates
kernel-devel.x86_64                                               6.6.11-200.fc39                                 @updates
kernel-devel-matched.x86_64                                       6.6.11-200.fc39                                 @updates
kernel-headers.x86_64                                             6.6.3-200.fc39                                  @updates
kernel-modules.x86_64                                             6.6.11-200.fc39                                 @updates
kernel-modules-core.x86_64                                        6.6.11-200.fc39                                 @updates
kernel-modules-extra.x86_64                                       6.6.11-200.fc39                                 @updates
kernel-srpm-macros.noarch                                         1.0-20.fc39                                     @fedora 
libreport-plugin-kerneloops.x86_64                                2.17.11-3.fc39                                  @fedora 

The GPU is recognized:

$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation Raptor Lake-S GT1 [UHD Graphics 770] (rev 04)
01:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1)

Can you look at the dmesg output? I wonder if there are clues about the SEGV there.

dmesg | grep -i SEGV returns nothing.

I did see a few errors related to USB, Wi-Fi, and Bluetooth devices:

[    2.336417] usb 1-13: device descriptor read/64, error -71
...
[    3.758352] usb 1-13: device not accepting address 7, error -71
...
[    3.758618] usb usb1-port13: unable to enumerate USB device
...
[    6.201387] iwlwifi 0000:00:14.3: WRT: Invalid buffer destination
...
[    7.532028] Bluetooth: hci0: Malformed MSFT vendor event: 0x02

Since the nvidia driver didn’t build, I’m on nouveau, and it’s not happy either: I get random black screens when I log in to or out of the DE. These are the errors I see when a black screen happens:

[   18.898990] nouveau 0000:01:00.0: gr: TRAP ch 1 [05ffc8e000 kwin_wayland[7051]]
[   18.898995] nouveau 0000:01:00.0: gr: DISPATCH 80000001 [INJECTED_BUNDLE_ERROR]
[   18.899072] nouveau 0000:01:00.0: gr: TRAP ch 1 [05ffc8e000 kwin_wayland[7051]]
[   18.899076] nouveau 0000:01:00.0: gr: DISPATCH 80000001 [INJECTED_BUNDLE_ERROR]
...
[   20.900573] nouveau 0000:01:00.0: gr: failed to construct context
[   20.900624] nouveau 0000:01:00.0: fifo:c00000:0001:[kwin_wayland[7051]] ectx 0[gr]: -110
[   20.900626] nouveau 0000:01:00.0: fifo:c00000:0001:0001:[kwin_wayland[7051]] vctx 0[gr]: -110

Someone told me I may have the wrong repo: I have rpmfusion-nonfree-nvidia-driver but not rpmfusion-nonfree and rpmfusion-nonfree-updates. After adding those repos and disabling the nvidia-driver one, I still can’t compile the driver; it fails with the same segfault. I tried both the 545 and the 535 drivers. It may be related to my gcc toolchain…

2024/01/17 13:22:57 akmodsbuild: gcc: internal compiler error: Segmentation fault signal terminated program cc1
2024/01/17 13:22:57 akmodsbuild: Please submit a full bug report, with preprocessed source.
2024/01/17 13:22:57 akmodsbuild: See <http://bugzilla.redhat.com/bugzilla> for instructions.
2024/01/17 13:22:57 akmodsbuild: make[3]: *** [scripts/Makefile.build:243: /tmp/akmodsbuild.h3mva63h/BUILD/nvidia-kmod-535.129.03/_kmod_build_6.6.11-200.fc39.x86_64/nvidia/nv-p2p.o] Error 4
2024/01/17 13:22:57 akmodsbuild: The bug is not reproducible, so it is likely a hardware or OS problem.
2024/01/17 13:22:57 akmodsbuild: make[3]: *** [scripts/Makefile.build:243: /tmp/akmodsbuild.h3mva63h/BUILD/nvidia-kmod-535.129.03/_kmod_build_6.6.11-200.fc39.x86_64/nvidia/os-registry.o] Error 1
2024/01/17 13:22:57 akmodsbuild: make[2]: *** [/usr/src/kernels/6.6.11-200.fc39.x86_64/Makefile:1931: /tmp/akmodsbuild.h3mva63h/BUILD/nvidia-kmod-535.129.03/_kmod_build_6.6.11-200.fc39.x86_64] Error 2
2024/01/17 13:22:57 akmodsbuild: make[1]: *** [Makefile:246: __sub-make] Error 2
2024/01/17 13:22:57 akmodsbuild: make[1]: Leaving directory '/usr/src/kernels/6.6.11-200.fc39.x86_64'
2

Is this from the /var/cache/akmods/nvidia/545.29.06-2-for-6.6.11-200.fc39.x86_64.log file?

What are the lines before the crash of the gcc?

I think we have to debug why the compiler breaks, which may have nothing to do with nvidia drivers.

Does gcc work with other sources? Does your toolchain differ in any way from the current (updated) Fedora GCC packages?

GCC Bugzilla 61821 explains that the “bug is not reproducible” message is misleading for Red Hat toolchains.

That advice about having the wrong repo was mistaken.
The files in both locations are supposed to be the same, and the two repos happily exist side by side.

The rpmfusion-nonfree repos provide additional software besides the drivers but the driver files are the same in rpmfusion-nonfree-updates and rpmfusion-nonfree-nvidia-driver.

As Barry notes, this may be due to the compiler (or hardware) and not the nvidia drivers.

Is this from the /var/cache/akmods/nvidia/545.29.06-2-for-6.6.11-200.fc39.x86_64.log file?

Yes

What are the lines before the crash of the gcc?

The lines before the crash were standard make output. I reinstalled Fedora, but I get the same compile error even after the reinstall.

It can’t be a hardware issue: I switched from Ubuntu, where I had no problem compiling both the 535 and 545 drivers.

It may be a compiler issue, but I had the same problem after reinstalling Fedora, so it has to be something related to the distro.

Those lines are the context for the error. Without that context the error message is next to meaningless. Please post them.

Also try to compile a simple C hello world program to see if that works.
