Kernel-6.8.4-100 Fedora 38 x86_64 w/dual amdgpu issues on Alienware m17 R5 AMD

Hi,

My Alienware m17 R5 AMD laptop got the kernel-6.8.4-100 update a few days ago.

Ever since then, gnome-shell has either crashed or frozen to the point where I have to hard reset the laptop. Also, I am getting GLX errors (No GL Visuals) from xscreensaver.

These issues happen after I leave the laptop idle for some time. Everything works as expected after a reboot, but when I come back to the laptop later, I find “No GL Visuals” from xscreensaver, and gnome-shell has either crashed, stopped responding to mouse/keyboard input, or takes a very long time (many minutes) to respond.

Reverting back to kernel-6.7.11-100 seems to restore things on the Alienware laptop.

My other two desktops, which also have AMD RX PCIe cards (RX 6600/6600 XT), seem to be fine: I don’t have any xscreensaver errors and gnome-shell hasn’t crashed or frozen. But my laptop is my daily driver.

I used to have to run an amdgpu kernel option (sg_display=1), but I removed it for kernel-6.8.4-100 and it made no difference. I still have it removed.

I also tried out amdgpu.runpm=0 with kernel-6.8.4-100 but this made no difference either.
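When toggling options like these, it can help to confirm what the loaded driver actually ended up with. A minimal sketch, assuming the amdgpu module is loaded and exports its parameters under /sys/module/amdgpu/parameters (the helper name `read_module_params` is made up for illustration):

```python
from pathlib import Path


def read_module_params(module, names, sys_module_dir="/sys/module"):
    """Return {parameter name: current value, or None if it cannot be read}."""
    params_dir = Path(sys_module_dir) / module / "parameters"
    values = {}
    for name in names:
        try:
            values[name] = (params_dir / name).read_text().strip()
        except OSError:
            # Module not loaded, parameter not exported, or not readable.
            values[name] = None
    return values


if __name__ == "__main__":
    # The two options mentioned in this thread.
    for name, value in read_module_params("amdgpu", ["sg_display", "runpm"]).items():
        print(f"amdgpu.{name} = {value}")
```

A value of None here only means the file could not be read; it does not say anything about the driver default.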

Are there amdgpu issues with kernel-6.8.4?

This description seems to indicate the issue may be directly related to the screensaver.
Have you tried either uninstalling or disabling the screensaver to see if that might be the cause?

Sure, could be, but my desktops, which have discrete amdgpu cards (RX 6600/6600 XT) and the same Fedora/desktop setup/configuration, appear to be running OK. I have only done a quick test (login, load Evolution and Firefox, wait for screensaver, wake it up and check) with them because it’s my laptop that I rely on. My AMD laptop has dual GPUs: APU and dGPU. I am forcing GNOME to use the dGPU by using the eDP port and closing the lid (which the APU drives). I have had to use amdgpu kernel options in the past to get this all-AMD laptop to work correctly…

The error from xscreensaver is that the GLX screen saver that was running OK (I can manually activate it) now can’t initialize the GLX APIs. Whenever I see this error, I also find evidence of gnome-shell crashing, or gnome-shell takes minutes to respond to my input. And glxgears didn’t run either…

Everything works fine with a reboot back to kernel-6.7.11-100.

I have turned off xscreensaver to see what happens.

Here’s the stuff that screams out in journalctl:

Apr 07 01:57:54 gnome-shell[4483]: Creating pipes for GWakeup: Too many open files
Apr 07 01:58:05 systemd-coredump[71348]: [🡕] Process 4483 (gnome-shell) of user 1001 dumped core.
Stack trace of thread 71298:
                                                         #0  0x00007fe5a9f4bc3a _ZN2js3jit14JSJitFrameIterppEv (libmozjs-102.so.0 + 0x74bc3a)
                                                         #1  0x00007fe5a9a062f8 _ZN2js9FrameIter18settleOnActivationEv (libmozjs-102.so.0 + 0x2062f8)
                                                         #2  0x00007fe5a9a47fd7 _ZN2js13DumpBacktraceEP9JSContextRNS_14GenericPrinterE (libmozjs-102.so.0 + 0x247fd7)
                                                         #3  0x00007fe5a9a483ea _ZN2js13DumpBacktraceEP9JSContextP8_IO_FILE (libmozjs-102.so.0 + 0x2483ea)
                                                         #4  0x00007fe5ab42bd86 gjs_dumpstack (libgjs.so.0 + 0xa8d86)
                                                         #5  0x000055be3bc889a7 dump_gjs_stack_on_signal_handler (gnome-shell + 0x49a7)
                                                         #6  0x00007fe5aae5fbb0 __restore_rt (libc.so.6 + 0x3dbb0)
                                                         #7  0x00007fe5ab545b8f g_log_structured_array (libglib-2.0.so.0 + 0x61b8f)
                                                         #8  0x00007fe5ab545e7c g_log_default_handler (libglib-2.0.so.0 + 0x61e7c)
                                                         #9  0x00007fe5ab546120 g_logv (libglib-2.0.so.0 + 0x62120)
                                                         #10 0x00007fe5ab546403 g_log (libglib-2.0.so.0 + 0x62403)
                                                         #11 0x00007fe5ab59897a g_wakeup_new (libglib-2.0.so.0 + 0xb497a)
                                                         #12 0x00007fe5ab67ca2b g_cancellable_make_pollfd (libgio-2.0.so.0 + 0x4fa2b)
                                                         #13 0x00007fe5ab6cecff g_socket_condition_timed_wait (libgio-2.0.so.0 + 0xa1cff)
                                                         #14 0x00007fe5ab6cf086 g_socket_receive_with_timeout (libgio-2.0.so.0 + 0xa2086)
                                                         #15 0x00007fe5ab6af111 g_input_stream_read (libgio-2.0.so.0 + 0x82111)
                                                         #16 0x00007fe59fde8c62 g_tls_connection_gnutls_pull_func (libgiognutls.so + 0xdc62)
                                                         #17 0x00007fe5a8448875 _gnutls_io_read_buffered (libgnutls.so.30 + 0x48875)
                                                         #18 0x00007fe5a843d667 _gnutls_recv_in_buffers (libgnutls.so.30 + 0x3d667)
                                                         #19 0x00007fe5a844a065 _gnutls_handshake_io_recv_int (libgnutls.so.30 + 0x4a065)
                                                         #20 0x00007fe5a844c831 _gnutls_recv_handshake (libgnutls.so.30 + 0x4c831)
                                                         #21 0x00007fe5a84502ee gnutls_handshake (libgnutls.so.30 + 0x502ee)
                                                         #22 0x00007fe59fdedd86 g_tls_connection_gnutls_handshake_thread_handshake (libgiognutls.so + 0x12d86)
                                                         #23 0x00007fe59fdf122f handshake_thread (libgiognutls.so + 0x1622f)
                                                         #24 0x00007fe59fdf13e3 async_handshake_thread (libgiognutls.so + 0x163e3)
                                                         #25 0x00007fe5ab6e2f84 g_task_thread_pool_thread (libgio-2.0.so.0 + 0xb5f84)
                                                         #26 0x00007fe5ab571112 g_thread_pool_thread_proxy.lto_priv.0 (libglib-2.0.so.0 + 0x8d112)
                                                         #27 0x00007fe5ab56e9f3 g_thread_proxy (libglib-2.0.so.0 + 0x8a9f3)
                                                         #28 0x00007fe5aaeae947 start_thread (libc.so.6 + 0x8c947)
                                                         #29 0x00007fe5aaf34970 __clone3 (libc.so.6 + 0x112970)

Here’s the stack trace for the main thread:

Stack trace of thread 4483:
                                                         #0  0x00007fe5a9e0f732 _ZN2js3jit15DoTrialInliningEP9JSContextPNS0_13BaselineFrameE (libmozjs-102.so.0 + 0x60f732)
                                                         #1  0x000030542485afca n/a (n/a + 0x0)
                                                         #2  0x00003054255b7555 n/a (n/a + 0x0)
                                                         #3  0x000055be49347bb8 n/a (n/a + 0x0)
                                                         #4  0x000030542556af21 n/a (n/a + 0x0)
                                                         #5  0x000055be43cdc8d8 n/a (n/a + 0x0)
                                                         #6  0x00003054255b7555 n/a (n/a + 0x0)
                                                         #7  0x000055be3fd69330 n/a (n/a + 0x0)
                                                         #8  0x000030542485856a n/a (n/a + 0x0)
                                                         #9  0x00007fe5a9f571e4 _ZL8EnterJitP9JSContextRN2js8RunStateEPh (libmozjs-102.so.0 + 0x7571e4)
                                                         #10 0x00007fe5a994edb9 _ZN2js9RunScriptEP9JSContextRNS_8RunStateE (libmozjs-102.so.0 + 0x14edb9)
                                                         #11 0x00007fe5a994f208 _ZN2js23InternalCallOrConstructEP9JSContextRKN2JS8CallArgsENS_14MaybeConstructENS_10CallReasonE (libmozjs-102.so.0 + 0x14f208)
                                                         #12 0x00007fe5a994f604 _ZN2js4CallEP9JSContextN2JS6HandleINS2_5ValueEEES5_RKNS_13AnyInvokeArgsENS2_13MutableHandleIS4_EENS_10CallReasonE (libmozjs-102.so.0 + 0x14f604)
                                                         #13 0x00007fe5a9974156 _ZN2js4CallEP9JSContextN2JS6HandleINS2_5ValueEEES5_S5_NS2_13MutableHandleIS4_EE (libmozjs-102.so.0 + 0x174156)
                                                         #14 0x00007fe5a9a65acb _ZL18PromiseReactionJobP9JSContextjPN2JS5ValueE (libmozjs-102.so.0 + 0x265acb)
                                                         #15 0x00007fe5a994f140 _ZN2js23InternalCallOrConstructEP9JSContextRKN2JS8CallArgsENS_14MaybeConstructENS_10CallReasonE (libmozjs-102.so.0 + 0x14f140)
                                                         #16 0x00007fe5a994f604 _ZN2js4CallEP9JSContextN2JS6HandleINS2_5ValueEEES5_RKNS_13AnyInvokeArgsENS2_13MutableHandleIS4_EENS_10CallReasonE (libmozjs-102.so.0 + 0x14f604)
                                                         #17 0x00007fe5a99d1620 _ZN2JS4CallEP9JSContextNS_6HandleINS_5ValueEEES4_RKNS_16HandleValueArrayENS_13MutableHandleIS3_EE (libmozjs-102.so.0 + 0x1d1620)
                                                         #18 0x00007fe5ab413e25 _ZN17GjsContextPrivate17run_jobs_fallibleEv.localalias (libgjs.so.0 + 0x90e25)
                                                         #19 0x00007fe5ab4141b8 _ZN17GjsContextPrivate7runJobsEP9JSContext (libgjs.so.0 + 0x911b8)
                                                         #20 0x00007fe5ab423afe _ZN3Gjs20PromiseJobDispatcher6SourceUlP8_GSourcePFiPvES4_E_4_FUNES3_S6_S4_.lto_priv.0 (libgjs.so.0 + 0xa0afe)
                                                         #21 0x00007fe5ab5404fc g_main_context_dispatch (libglib-2.0.so.0 + 0x5c4fc)
                                                         #22 0x00007fe5ab59e6b8 g_main_context_iterate.isra.0 (libglib-2.0.so.0 + 0xba6b8)
                                                         #23 0x00007fe5ab53faff g_main_loop_run (libglib-2.0.so.0 + 0x5baff)
                                                         #24 0x00007fe5ab0d54ca meta_context_run_main_loop (libmutter-12.so.0 + 0xd54ca)
                                                         #25 0x000055be3bc87fb7 main (gnome-shell + 0x3fb7)
                                                         #26 0x00007fe5aae49b8a __libc_start_call_main (libc.so.6 + 0x27b8a)
                                                         #27 0x00007fe5aae49c4b __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x27c4b)
                                                         #28 0x000055be3bc88295 _start (gnome-shell + 0x4295)

Here’s the abrt-notification (which I deleted by accident):

Apr 07 01:58:07 abrt-notification[71676]: [🡕] Process 4483 (gnome-shell) crashed in js::jit::JSJitFrameIter::operator++()()

I also see a lot of:

Apr 07 03:27:52 gnome-shell[71491]: glibtop(c=71491): [WARNING] Could not open /etc/mtab: Too many open files

I did check ulimit, and I don’t know why this message is written, because there aren’t too many open files.
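One thing worth keeping in mind: `ulimit -n` in a terminal reports the limit of that shell, which may differ from the limit gnome-shell itself runs with. The per-process numbers are in /proc/<pid>/limits and /proc/<pid>/fd. A rough sketch of comparing the two (function names are made up for illustration; the `proc_root` argument is only there so the code can be pointed at a test directory):

```python
import os


def open_fd_count(pid, proc_root="/proc"):
    """Count the entries in /proc/<pid>/fd, one per open file descriptor."""
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


def parse_soft_nofile(limits_text):
    """Extract the soft 'Max open files' limit from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Fields: Max, open, files, <soft>, <hard>, <units>
            soft = line.split()[3]
            return None if soft == "unlimited" else int(soft)
    return None


def fd_usage(pid, proc_root="/proc"):
    """Return (open descriptors, soft limit) for one process."""
    with open(os.path.join(proc_root, str(pid), "limits")) as f:
        limit = parse_soft_nofile(f.read())
    return open_fd_count(pid, proc_root), limit
```

Calling `fd_usage(4483)` (using the gnome-shell PID from the journal above as an example) a few times over several hours should show whether the count is creeping toward the limit. It has to run as the same user as gnome-shell, or as root.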

Those gnome-shell messages didn’t appear until kernel-6.8.4-100.

With no applications running, I was able to get no video/graphics issues (it ran as expected, sharp, no flickers, etc) for 10 hours, so I think my issue is with gnome-shell crashing so frequently. Why gnome-shell doesn’t crash as much in kernel-6.7.11-100 is a mystery to me, but it is crashing quite a bit for me on this AMD Advantage laptop with kernel-6.8.4-100. I sync my home/settings to my desktops. Other than not using my desktops as much, I can’t explain why I don’t get the same issue on the desktops. They use amdgpu also (RX 6600 PCIe cards).

I did more digging into this issue and I think it’s a kernel regression.

I had to blacklist a built-in kernel module: dell_smm_hwmon. There were changes merged for the kernel 6.8 series that made this module work differently than it did in kernel 6.7.

The basic issue: gnome-shell leaks file handles for files in /sys and /proc at some point, usually hours, after it’s started. I have a sysprof capture which shows that kernel time accounted for more than half of the total time. Digging into the kernel part of the capture, I found calls being made into the dell_smm_hwmon kernel module that were leaking file handles in /sys and /proc at a rate that caused gnome-shell to crash, and then restart itself, multiple times per hour. It was running out of file handles because of kernel calls being made to dell_smm_hwmon.
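For anyone who wants to check for the same kind of leak without a sysprof capture, a breakdown of where a process’s open descriptors point can be enough to see /sys or /proc entries piling up. A minimal sketch, assuming a Linux /proc layout (the helper names are hypothetical, not part of any tool mentioned here):

```python
import os
from collections import Counter


def classify_target(target):
    """Reduce an fd link target to a coarse bucket.

    Paths become their top-level directory ('/sys', '/proc', '/home', ...);
    non-path targets such as 'socket:[123]' become the part before the colon.
    """
    if target.startswith("/"):
        return "/" + target.split("/")[1]
    return target.split(":", 1)[0]


def fd_targets_by_kind(pid, proc_root="/proc"):
    """Count the open descriptors of a process per bucket."""
    counts = Counter()
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        counts[classify_target(target)] += 1
    return counts
```

If the '/sys' and '/proc' buckets keep growing between two calls made an hour apart while the others stay flat, that matches the pattern described above.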

I did take a quick peek at this module’s source and I don’t know how it would be applicable to an Alienware laptop, or whether the changes being made would work on/cover an Alienware AMD Advantage laptop. I must have the Dell “chip/interface” that dell_smm_hwmon claims to support, but I think it should be blacklisted. I don’t think there’s a way to disable fan control in the Alienware BIOS. It seems like this module is intended more for the mainline business Dell laptops. So I blacklisted it and have had no issues for 24 hours now.

My kernel-6.8.4-100 grub boot args:

GRUB_CMDLINE_LINUX="rhgb quiet LANG=en_US.UTF.8 iommu=pt module_blacklist=dell_smm_hwmon"

dmesg confirmation:

[    0.000000] Command line: BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.8.4-100.fc38.x86_64 root=UUID=963fa5ce-f647-480c-a9f0-face7ff48b7a ro rhgb quiet LANG=en_US.UTF.8 iommu=pt module_blacklist=dell_smm_hwmon
[    0.613663] Kernel command line: BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.8.4-100.fc38.x86_64 root=UUID=963fa5ce-f647-480c-a9f0-face7ff48b7a ro rhgb quiet LANG=en_US.UTF.8 iommu=pt module_blacklist=dell_smm_hwmon
[   14.771456] Module dell_smm_hwmon is blacklisted
[   14.832738] dell_smbios: Unable to run on non-Dell system
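The same check can be scripted instead of grepping dmesg. A small sketch that pulls the module_blacklist= entries out of a kernel command line (the function name is made up; on a live system the string would come from /proc/cmdline, here it is the command line quoted in the dmesg output above):

```python
def blacklisted_modules(cmdline):
    """Return the module names listed in module_blacklist= options."""
    modules = []
    for token in cmdline.split():
        if token.startswith("module_blacklist="):
            # The option takes a comma-separated list of module names.
            value = token.split("=", 1)[1]
            modules.extend(name for name in value.split(",") if name)
    return modules


if __name__ == "__main__":
    # Command line copied from the dmesg output in this post.
    cmdline = (
        "BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.8.4-100.fc38.x86_64 "
        "root=UUID=963fa5ce-f647-480c-a9f0-face7ff48b7a ro rhgb quiet "
        "LANG=en_US.UTF.8 iommu=pt module_blacklist=dell_smm_hwmon"
    )
    print(blacklisted_modules(cmdline))
```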

I can’t explain why the version of dell_smm_hwmon isn’t different between kernels, but its functionality is definitely different:

$ modinfo /lib/modules/6.7.11-100.fc38.x86_64/kernel/drivers/hwmon/dell-smm-hwmon.ko.xz
filename:       /lib/modules/6.7.11-100.fc38.x86_64/kernel/drivers/hwmon/dell-smm-hwmon.ko.xz
alias:          i8k
license:        GPL
description:    Dell laptop SMM BIOS hwmon driver
author:         Pali Rohár <pali@kernel.org>
author:         Massimo Dal Zotto (dz@debian.org)
rhelversion:    9.99

$ modinfo /lib/modules/6.7.11-100.fc38.x86_64/kernel/drivers/hwmon/dell-smm-hwmon.ko.xz
filename:       /lib/modules/6.7.11-100.fc38.x86_64/kernel/drivers/hwmon/dell-smm-hwmon.ko.xz
alias:          i8k
license:        GPL
description:    Dell laptop SMM BIOS hwmon driver
author:         Pali Rohár <pali@kernel.org>
author:         Massimo Dal Zotto (dz@debian.org)
rhelversion:    9.99

There are other dell_* kernel modules I am considering blocking as well:

dell_wmi_sysman
dell_laptop
dell_rbtn
dell_smbios
dell_smo8800
dell_wmi_aio
dell_wmi_ddv
dell_wmi_descriptor
dell_wmi_led
dell_wm

Most of these are not loading for me, though. And I don’t have an “airplane” (dell_rbtn) button/switch (like many Dell laptops do), but the module is loaded anyway:

$ lsmod | grep dell
dell_wmi_descriptor    20480  0
dell_rbtn              20480  0
rfkill                 40960  8 bluetooth,dell_rbtn,cfg80211
wmi                    36864  4 video,alienware_wmi,wmi_bmof,dell_wmi_descriptor
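To compare a candidate list like the one above against what is actually loaded, the lsmod output can be parsed rather than read by eye. A minimal sketch that works on lsmod-style text (helper names are made up; the sample is the output quoted above):

```python
def parse_lsmod(text):
    """Parse lsmod-style lines into {module: {size, use_count, used_by}}."""
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] == "Module":
            continue  # blank line or the 'Module Size Used by' header
        modules[fields[0]] = {
            "size": int(fields[1]),
            "use_count": int(fields[2]),
            "used_by": fields[3].split(",") if len(fields) > 3 else [],
        }
    return modules


def loaded_with_prefix(modules, prefix):
    """Names of loaded modules starting with the given prefix, sorted."""
    return sorted(name for name in modules if name.startswith(prefix))


if __name__ == "__main__":
    sample = """\
dell_wmi_descriptor    20480  0
dell_rbtn              20480  0
rfkill                 40960  8 bluetooth,dell_rbtn,cfg80211
wmi                    36864  4 video,alienware_wmi,wmi_bmof,dell_wmi_descriptor
"""
    print(loaded_with_prefix(parse_lsmod(sample), "dell_"))
```

Note that grepping for "dell" also matches rfkill and wmi, because dell modules appear in their "used by" column; filtering on the module name avoids that.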

Gnome-shell has been running without issue on kernel-6.8.4-100 for 24 hours. No fd leaks, no memory leaks, and no segfaults.