Random nouveau driver (?) crashes. Fedora 32 KDE spin

Tower computer with an Asus Z390-A motherboard and this graphics card:

$ /sbin/lspci | grep -e VGA
01:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti] (rev a1)

Sometimes my monitor flickers, but at other times the monitor goes black, the fans start revving and the computer doesn’t respond to any input.

Very irritating and makes me worry about my hardware. I have been using Fedora for two days, and this has happened both days. I am unsure as to whether this is a nouveau driver issue, or a kernel issue. I have also installed the bits from the former Fedora Jam, but it doesn’t look like a real time kernel was installed, so it probably isn’t related to that. Help to track down this issue would be appreciated as I am losing work whenever I try to recover from the crash.

I should probably add that I have been using PCLinuxOS until two days ago, so while this may not be Fedora specific, I have never experienced this before with Linux.

Here are my system specs for all eventualities:
Operating System: Fedora 32
KDE Plasma Version: 5.18.5
KDE Frameworks Version: 5.68.0
Qt Version: 5.13.2
Kernel Version: 5.6.10-300.fc32.x86_64
OS Type: 64-bit
Processors: 16 × Intel® Core™ i9-9900KS CPU @ 4.00GHz
Memory: 62.7 GiB of RAM


That’s not surprising with relatively new graphics cards and nouveau. Have you tried using the nVidia drivers from rpmfusion?

https://rpmfusion.org/Configuration
https://rpmfusion.org/Howto/NVIDIA


I will give it a try.

I got a message at boot that there is an nvidia module missing and that the system was falling back to nouveau… Any idea what that is about?

If for whatever reason the kernel module failed to build, there should be a log file in /var/cache/akmods/nvidia/. What does it say?

There is a /var/cache/akmods/akmods.log:

2020/05/12 15:32:37 akmods: Checking kmods exist for 5.6.10-300.fc32.x86_64
2020/05/12 15:32:37 akmods: Files needed for building modules against kernel
2020/05/12 15:32:37 akmods: 5.6.10-300.fc32.x86_64 could not be found as the following
2020/05/12 15:32:37 akmods: directories are missing:
2020/05/12 15:32:37 akmods: /usr/src/kernels/5.6.10-300.fc32.x86_64/
2020/05/12 15:32:37 akmods: /lib/modules/5.6.10-300.fc32.x86_64/build/
2020/05/12 15:32:37 akmods: Is the correct kernel-devel package installed?
2020/05/12 15:34:28 akmods: Checking kmods exist for 5.6.10-300.fc32.x86_64
2020/05/12 15:34:28 akmods: Files needed for building modules against kernel
2020/05/12 15:34:28 akmods: 5.6.10-300.fc32.x86_64 could not be found as the following
2020/05/12 15:34:28 akmods: directories are missing:
2020/05/12 15:34:28 akmods: /usr/src/kernels/5.6.10-300.fc32.x86_64/
2020/05/12 15:34:28 akmods: /lib/modules/5.6.10-300.fc32.x86_64/build/
2020/05/12 15:34:28 akmods: Is the correct kernel-devel package installed?

My kernel-devel package is 5.6.11, but my kernel is 5.6.10. Strange.
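
A quick way to confirm what the log is complaining about is to test for the two directories it names. This is only a minimal sketch: `missing_build_dirs` is a made-up helper name, and its second argument (an alternate root directory) exists purely so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# Print the akmods build directories that are missing for a kernel release.
# Usage: missing_build_dirs [release] [root]
#   release defaults to the running kernel (uname -r)
#   root is a hypothetical testing aid; it defaults to the real filesystem
missing_build_dirs() {
    release="${1:-$(uname -r)}"
    root="${2:-}"
    status=0
    # These are the two paths named in the akmods.log output above.
    for dir in "/usr/src/kernels/$release" "/lib/modules/$release/build"; do
        if [ ! -d "$root$dir" ]; then
            echo "missing: $dir"
            status=1
        fi
    done
    return "$status"
}

# Check the running kernel; a non-zero return means something is missing.
missing_build_dirs || echo "no matching kernel-devel for $(uname -r)?"
```

If either directory is reported missing, the installed kernel-devel version most likely does not match the kernel that is currently booted.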

Is it? If not, install it:
sudo dnf install kernel-devel

I’m not sure if installing the package will trigger a rebuild of the kernel module, but you can either reboot and wait for the package to be built during boot (should be a few seconds on your rig), or run
sudo akmods --force
and then reboot.

I replied as you were editing.

What does rpm -qa | grep kernel | sort give?

You can install the kernel-devel packages for all the kernels you have installed, so no matter which one you boot, you’ll always have an nVidia module built for that kernel.
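
As a rough sketch of that idea: a kernel-devel package that matches a given kernel can be requested by appending the kernel release to the package name. `devel_pkgs` below is a hypothetical helper, and the commented-out `rpm` query assumes installed kernels show up as `kernel-core` packages, which is worth checking on your own system first.

```shell
#!/bin/sh
# Turn kernel releases (one per line on stdin, in `uname -r` form) into the
# matching kernel-devel package specs. devel_pkgs is a hypothetical helper.
devel_pkgs() {
    while IFS= read -r release; do
        # Skip blank lines, prefix everything else with the package name.
        [ -n "$release" ] && echo "kernel-devel-$release"
    done
    return 0
}

# Package spec for the running kernel only:
uname -r | devel_pkgs

# All installed kernels (assumes Fedora's kernel-core package naming):
#   sudo dnf install $(rpm -q kernel-core --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n' | devel_pkgs)
```

Printing the specs first, rather than installing straight away, makes it easy to see what dnf will be asked for.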

I think it is working now. A new kernel version apparently landed today and I hadn’t checked, as I don’t expect kernels to land that often. The kernel-devel package was of course updated at the same time. So I ran sudo dnf update -y and whatever akmods scripting does in the background appears to have set up the nvidia driver properly. I ran

$ sudo dnf install akmod-nvidia
Last metadata expiration check: 0:24:07 ago on Tue 12 May 2020 03:45:16 PM EEST.
Package akmod-nvidia-3:440.82-1.fc32.x86_64 is already installed.
Dependencies resolved.
Nothing to do.
Complete!

and also looked at the CUDA. Everything seems fine (fingers crossed). Thanks for the quick replies and help.

This means that you have the akmod-nvidia package installed, not necessarily that the kernel module has been built for your current kernel. If it was built successfully, you should have another log file (or files) in /var/cache/akmods/nvidia/ saying so. If you have rebooted, check if the module is loaded. You can also (install and) run nvidia-settings.
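
Both checks can be scripted. The sketch below makes some assumptions: `akmods_log_ok` and `module_loaded` are made-up helper names, the log file path has to be supplied by you, and a successful build log is assumed to end with an "akmods: Successful." line. The loaded-module check reads /proc/modules, which is where the kernel lists loaded modules.

```shell
#!/bin/sh
# Succeeds if the last line of the given akmods log is the success marker.
# Pass the path of the log you want to inspect; no path is assumed here.
akmods_log_ok() {
    tail -n 1 "$1" | grep -q 'akmods: Successful\.'
}

# Succeeds if a module with this exact name is listed as loaded.
# The second argument (hypothetical, for testing) overrides /proc/modules.
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}"
}

# Report whether the nvidia module is loaded right now.
if module_loaded nvidia 2>/dev/null; then
    echo "nvidia module is loaded"
else
    echo "nvidia module is not loaded"
fi
```

The trailing space in the pattern keeps `nvidia` from also matching related modules such as `nvidia_drm`.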

Yup, looks OK. nvidia-settings seems to just open CUDA. The log is now there and ends with:

[code]Installed:
kmod-nvidia-5.6.11-300.fc32.x86_64-3:440.82-1.fc32.x86_64

Complete!
2020/05/12 16:07:52 akmods: Successful.[/code]

Huh? Is this not what you are seeing?

Hopefully we’ve solved your stability issues, otherwise we’ll have to look elsewhere.

P.S.: Love the avatar
