jacek
(Jacek Pliszka)
January 19, 2024, 9:40pm
1
Ever since I switched to Wayland, I have been getting occasional gnome-shell crashes.
Some weeks there are none, other weeks 5 or 6 or more. Today I had 2 such crashes.
Process 3208 (gnome-shell) of user 1000 dumped core.
0x00007fe9f80e6b98 pushbuf_kref (libdrm_nouveau.so.2 + 0x5b98)
0x00007fe9f80e72ec pushbuf_validate (libdrm_nouveau.so.2 + 0x62ec)
0x00007fe9ce48ac79 nvc0_flush (nouveau_dri.so + 0xa8ac79)
Does anyone know how I should proceed to have a chance of getting this fixed in the future?
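For a bug report it usually helps to know which libraries the crashing frames belong to. The following is only a sketch: it assumes the backtrace comes from coredumpctl and has frames in the "symbol (module + 0xoffset)" form quoted above, and the helper name backtrace_modules is made up for illustration.

```shell
# Hypothetical helper: list the distinct modules named in a backtrace.
# Reads frames such as "0x... pushbuf_kref (libdrm_nouveau.so.2 + 0x5b98)"
# from stdin and prints each module name once.
backtrace_modules() {
    sed -n 's/.*(\([^ )]*\) + 0x[0-9a-f]*).*/\1/p' | sort -u
}

# Possible usage (assumes systemd-coredump kept the dump):
#   coredumpctl info gnome-shell | backtrace_modules
```

For the three frames in this post it would print libdrm_nouveau.so.2 and nouveau_dri.so, which points at libdrm and Mesa rather than at gnome-shell itself.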
chrisawi
(Chris Williams)
January 19, 2024, 9:50pm
2
What GPU model do you have?
You can file a bug against nouveau: MesaDrivers · freedesktop.org
Depending on your nvidia GPU model, you may want to consider using the proprietary nvidia driver.
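To answer the GPU model question, something like the sketch below might help. It assumes the usual layout of lspci -k output (a device line followed by an indented "Kernel driver in use:" line); the function name gpu_drivers is invented here and is not part of any tool.

```shell
# Hypothetical helper: print each GPU from "lspci -k" output together
# with the kernel driver bound to it. Reads lspci -k text on stdin.
gpu_drivers() {
    awk '
        # Any non-indented line starts a new PCI device: forget the last one.
        /^[^ \t]/ { dev = "" }
        # Remember the device only if it is a graphics device.
        /^[^ \t].*(VGA compatible controller|3D controller|Display controller)/ {
            sub(/^[^ ]+ /, "")   # drop the PCI slot prefix such as "01:00.0 "
            dev = $0
        }
        # Pair the remembered GPU with its driver line.
        /Kernel driver in use:/ && dev != "" {
            print dev " -> " $NF
            dev = ""
        }
    '
}

# Possible usage:
#   lspci -k | gpu_drivers
```

On a hybrid laptop this should list both the integrated and the discrete GPU, each with its driver, which is the information a nouveau bug report would need.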
jacek
(Jacek Pliszka)
January 20, 2024, 2:41pm
3
Thank you for the response:
Device-1: Intel CoffeeLake-H GT2 [UHD Graphics 630] driver: i915 v: kernel
Device-2: NVIDIA TU117GLM [Quadro T1000 Mobile] driver: nouveau v: kernel
Device-3: Cheng Uei Precision Industry (Foxlink) HP Wide Vision HD
The proprietary driver is too problematic for me. I have Secure Boot and a few other things enabled, and using the Nvidia drivers was too much hassle…
chrisawi
(Chris Williams)
January 20, 2024, 4:54pm
4
You could maybe disable the nvidia GPU and just run on the Intel IGP.
The good news is that your card is Turing-based, so it should be supported by nvidia’s new firmware-heavy open kernel module, and thus nouveau (with NVK) should be able to properly drive the hardware in the future using that same firmware.
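One way nouveau is sometimes kept from loading on Fedora is with kernel arguments set through grubby. The sketch below only prints the commands so they can be reviewed first. It is an assumption that this suits this laptop: it has not been tested here, and a GPU left without any driver may stay powered on. The function and variable names are made up for illustration.

```shell
# Kernel arguments that stop nouveau from loading, both in the
# initramfs (rd.driver.blacklist) and afterwards (modprobe.blacklist).
NOUVEAU_ARGS="rd.driver.blacklist=nouveau modprobe.blacklist=nouveau"

# Print the grubby command instead of running it, so it can be checked first.
nouveau_blacklist_cmd() {
    printf 'sudo grubby --update-kernel=ALL --args="%s"\n' "$NOUVEAU_ARGS"
}

# Print the command that undoes the change again.
nouveau_restore_cmd() {
    printf 'sudo grubby --update-kernel=ALL --remove-args="%s"\n' "$NOUVEAU_ARGS"
}

nouveau_blacklist_cmd
nouveau_restore_cmd
```

After a reboot, lsmod | grep nouveau printing nothing would suggest the module is no longer loaded.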
jacek
(Jacek Pliszka)
March 24, 2024, 12:12pm
5
I think I am running on Intel. I'm not sure why nouveau gets loaded. How do you suggest I disable it?
Also, the crash is now worse: everything dies and the screen freezes completely:
BUG: kernel NULL pointer dereference, address: 0000000000000008
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 9 PID: 2615 Comm: gnome-shell Tainted: G O 6.7.9-200.fc39.x86_64 #1
Hardware name: HP HP ZBook 15 G6/860F, BIOS R92 Ver. 01.20.01 06/30/2022
RIP: 0010:gp100_vmm_pgt_mem+0xbb/0x170 [nouveau]
Code: 8b 46 58 48 01 c2 48 09 c3 49 89 56 58 45 01 e5 41 0f b7 47 12 49 8b 7f 08 89 da 42 8d 2c e0 48 8b 47 08 41 83 c4 01 48 89 ee <48> 8b 40 08 ff d0 0f 1f 00 49 8b 7f 08 48 89 d9 48 8d 75 04 48 c1
RSP: 0000:ffffa45c0305f850 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 00000000000c8001 RCX: 0000000000000001
RDX: 00000000000c8001 RSI: 0000000000000030 RDI: ffff95a79757f280
RBP: 0000000000000030 R08: ffffa45c0305faa8 R09: 0000000000000004
R10: ffff95a7970c9c60 R11: ffff95a78d7d8c00 R12: 0000000000000007
R13: 000000000000000a R14: ffffa45c0305faa8 R15: ffff95a797580a80
FS: 00007f131c5aa640(0000) GS:ffff95b2cd640000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000010f370005 CR4: 00000000003706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
? __die+0x23/0x70
? page_fault_oops+0x171/0x4e0
? exc_page_fault+0x7f/0x180
? asm_exc_page_fault+0x26/0x30
? gp100_vmm_pgt_mem+0xbb/0x170 [nouveau]
nvkm_vmm_iter.isra.0+0x2f7/0x890 [nouveau]
? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau]
? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
nvkm_vmm_ptes_get_map+0xb1/0xf0 [nouveau]
? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau]
? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
nvkm_vmm_map_locked+0x219/0x390 [nouveau]
nvkm_vmm_map+0x89/0xe0 [nouveau]
nvkm_vram_map+0x5a/0x80 [nouveau]
gf100_mem_map+0xc7/0x170 [nouveau]
nvkm_umem_map+0x69/0x100 [nouveau]
nvkm_ioctl_map+0x7e/0xf0 [nouveau]
nvkm_ioctl+0x10b/0x250 [nouveau]
nvif_object_map_handle+0xc8/0x180 [nouveau]
nouveau_ttm_io_mem_reserve+0x189/0x2e0 [nouveau]
ttm_bo_vm_fault_reserved+0xa7/0x3b0 [ttm]
? mmap_region+0x716/0x960
nouveau_ttm_fault+0x69/0xa0 [nouveau]
__do_fault+0x30/0x130
do_fault+0x7e/0x460
__handle_mm_fault+0x782/0xdb0
handle_mm_fault+0x17f/0x360
do_user_addr_fault+0x1e2/0x670
exc_page_fault+0x7f/0x180
asm_exc_page_fault+0x26/0x30
RIP: 0033:0x7f1321b7cc07
jacek
(Jacek Pliszka)
March 24, 2024, 12:15pm
6
glxinfo | grep 'OpenGL renderer'
OpenGL renderer string: Mesa Intel(R) UHD Graphics 630 (CFL GT2)
The nouveau driver does not properly support the nvidia [Quadro T1000 Mobile] GPU, so it can trigger crashes. The same would be true if no driver were installed or if you managed to disable the device.
The fix is relatively simple, but it does require you to work at the command line a bit.
First, install akmods with sudo dnf install akmods.
Follow the steps shown in the file /usr/share/doc/akmods/README.secureboot so the system is prepared to sign the nvidia modules when they are installed. This will require using sudo with each command listed there.
Ensure the rpmfusion-nonfree-nvidia-driver repo is enabled by running dnf repolist and verifying that the repo is shown in the list. If it is not, enable it in the GNOME Software app through the third-party repos list (hamburger menu at the top right).
Install the nvidia drivers with sudo dnf install akmod-nvidia xorg-x11-drv-nvidia-cuda.
Wait about 5 minutes for akmods to finish building the kernel module, then reboot. The nvidia drivers should now load even when Secure Boot is enabled.
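The steps above can be collected into a small script. This is a sketch, not a tested installer: it runs in dry-run mode by default and only prints what it would do, the Secure Boot key setup stays a manual step taken from the README, and the names step, nvidia_install_steps and DRY_RUN are made up for illustration.

```shell
#!/bin/sh
# Dry-run by default: each step is printed, not executed.
# Set DRY_RUN=0 to really run the commands (needs sudo).
DRY_RUN="${DRY_RUN:-1}"

# Print a command in dry-run mode, otherwise execute it.
step() {
    if [ "$DRY_RUN" = "1" ]; then
        printf 'would run: %s\n' "$*"
    else
        "$@"
    fi
}

nvidia_install_steps() {
    step sudo dnf install akmods
    # Manual step: read this file and follow its Secure Boot instructions.
    step less /usr/share/doc/akmods/README.secureboot
    # Check that rpmfusion-nonfree-nvidia-driver appears in the list.
    step dnf repolist
    step sudo dnf install akmod-nvidia xorg-x11-drv-nvidia-cuda
}

nvidia_install_steps
```

Printing the plan first makes it easy to stop before anything is installed, which may matter if an earlier attempt with the proprietary driver went badly.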
jacek
(Jacek Pliszka)
July 8, 2024, 1:58pm
8
OK, I upgraded to Mesa 24.1.2 a week ago and had another crash today.
So it looks like the Mesa fix did not solve everything.
I am hesitant to move to the nvidia drivers, as I had issues with them in the past: on boot (I have strict Secure Boot) and with Wayland.
jacek
(Jacek Pliszka)
July 8, 2024, 5:35pm
10
Thank you for the links, but I will not use the nvidia drivers in the near future. I lost too many days on that a couple of years ago.
1 or 2 crashes per month seems to be the lesser evil compared with what I had with the nvidia drivers.
I probably need to move to the nouveau forum then.
I don’t know what happened a couple of years ago, but if you will not use the RPM Fusion drivers from the repos, then maybe you can wait for NVK.
jacek
(Jacek Pliszka)
July 16, 2024, 10:12am
12
Actually, it looks like the situation got even worse after the Fedora and Mesa upgrades. I used to get about one crash per month; now I have had 3 crashes in the last 8 days alone… though the errors are different now:
kernel: nouveau 0000:01:00.0: gsp: mmu fault queued
kernel: nouveau 0000:01:00.0: gsp: rc engn:00000001 chid:16 type:31 scope:1 part:233
kernel: nouveau 0000:01:00.0: fifo:001001:0002:0010:[gnome-shell[6760]] errored - disabling channel
kernel: nouveau 0000:01:00.0: systemd-logind[6187]: channel 16 killed!
gnome-shell[6760]: nouveau: kernel rejected pushbuf: No such device
gnome-shell[6760]: nouveau: ch16: krec 0 pushes 1 bufs 2 relocs 0
gnome-shell[6760]: nouveau: ch16: buf 00000000 0000000e 00000004 00000004 00000000 0x7fe1eaf4f000 0x710000 0x80000
gnome-shell[6760]: nouveau: ch16: buf 00000001 00000006 00000004 00000000 00000004 0x7fe2324d1000 0x21c000 0x1000
gnome-shell[6760]: nouveau: ch16: psh 00000000 0000000028 00000000a8
gnome-shell[6760]: nouveau: 0x20056080
gnome-shell[6760]: nouveau: 0x000000e6
gnome-shell[6760]: nouveau: 0x00000000
gnome-shell[6760]: nouveau: 0x00000040
gnome-shell[6760]: nouveau: 0x00000001
gnome-shell[6760]: nouveau: 0x00000000
gnome-shell[6760]: nouveau: 0x20046086
gnome-shell[6760]: nouveau: 0x00000780
gnome-shell[6760]: nouveau: 0x000004b0
gnome-shell[6760]: nouveau: 0x00000000
gnome-shell[6760]: nouveau: 0x04400000
gnome-shell[6760]: nouveau: 0x800060ae
gnome-shell[6760]: nouveau: 0x2002608c
gnome-shell[6760]: nouveau: 0x000000df
gnome-shell[6760]: nouveau: 0x00000001
gnome-shell[6760]: nouveau: 0x20056091
gnome-shell[6760]: nouveau: 0x00001e00
gnome-shell[6760]: nouveau: 0x00000780
gnome-shell[6760]: nouveau: 0x000004b0
gnome-shell[6760]: nouveau: 0x00000000
gnome-shell[6760]: nouveau: 0x05800000
gnome-shell[6760]: nouveau: 0x80006223
audit[6760]: ANOM_ABEND auid=1000 uid=1000 gid=1000 ses=2 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 pid=6760 comm="gnome-shell" exe="/usr/bin/gnome-shell" sig=11 res=1
gnome-shell[6760]: nouveau: 0x2004622c
gnome-shell[6760]: nouveau: 0x00000000
gnome-shell[6760]: nouveau: 0x00000000
gnome-shell[6760]: nouveau: 0x00000780
gnome-shell[6760]: nouveau: 0x000004b0
gnome-shell[6760]: nouveau: 0x20046230
gnome-shell[6760]: nouveau: 0x00000000
gnome-shell[6760]: nouveau: 0x00000001
gnome-shell[6760]: nouveau: 0x00000000
gnome-shell[6760]: nouveau: 0x00000001
gnome-shell[6760]: nouveau: kernel rejected pushbuf: No such device
gnome-shell[6760]: nouveau: ch16: krec 0 pushes 1 bufs 0 relocs 0
kernel: show_signal_msg: 45 callbacks suppressed
kernel: gnome-shell[6760]: segfault at 560800000000 ip 00007fe20e1c0cd7 sp 00007ffeb4ada310 error 4 in nouveau_dri.so[7fe20d816000+1864000] likely on CPU 6 (core 0, socket 0)
kernel: Code: 00 00 90 41 8b 06 48 8b 75 b0 4c 8d 05 a2 e9 ee 00 48 8d 15 43 4f f5 00 49 8b 5e 08 4d 8b 66 10 48 8d 04 80 45 8b 0e 8b 4d bc <4c> 8b 14 c6 4a 8d 04 23 48 8b 3d c2 31 bd 01 be 02 00 00 00 4d 8b
systemd[1]: Created slice system-systemd\x2dcoredump.slice - Slice /system/systemd-coredump.
audit: BPF prog-id=89 op=LOAD
audit: BPF prog-id=90 op=LOAD
audit: BPF prog-id=91 op=LOAD
systemd[1]: Started systemd-coredump@0-18533-0.service - Process Core Dump (PID 18533/UID 0).
audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@0-18533-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
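Most of a log like the one above is the pushbuf hex dump, so a bug report may be easier to read with only the key events pulled out. The filter below is a sketch that assumes the message wording shown in this post; the function name nouveau_events is made up, and the pattern may need adjusting for other kernel or Mesa versions.

```shell
# Hypothetical helper: keep only the main nouveau failure events from
# journal text on stdin and drop the per-word pushbuf hex dump lines.
nouveau_events() {
    grep -E 'nouveau .*(mmu fault|errored|channel [0-9]+ killed)|kernel rejected pushbuf|segfault at'
}

# Possible usage:
#   journalctl --since "8 days ago" --no-pager | nouveau_events
# Count the events instead of listing them:
#   journalctl --since "8 days ago" --no-pager | nouveau_events | grep -c .
```

Comparing the timestamps of the matching lines should make it clearer whether each gnome-shell crash follows a GPU fault or happens on its own.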
gnwiii
(George N. White III)
July 16, 2024, 11:18am
13
Nvidia hardware on Linux has been a huge time sink, and nouveau does not confer immunity. On systems with Nvidia I generally install both – the RPM Fusion howto has a section on switching – and switch when a problem appears.