I did more digging into this issue and I think it’s a kernel regression.
I had to blacklist a built-in kernel module: dell_smm_hwmon. There were changes merged for the kernel 6.8 series that made this module work differently than it did in kernel 6.7.
The basic issue: gnome-shell leaks file handles on files in /sys and /proc at some point, usually hours, after it starts. I have a sysprof capture showing that kernel time accounted for more than half of the total time. Digging into the kernel part of the call tree in that capture, I found calls being made into the dell_smm_hwmon kernel module, which was leaking file handles in /sys and /proc fast enough that gnome-shell crashed and restarted itself multiple times per hour. It was running out of file handles because of the kernel calls being made to dell_smm_hwmon.
I did take a quick peek at the module source, and I don’t know how it would be applicable to an Alienware laptop, or whether the changes being made would work on or cover an Alienware AMD Advantage laptop. I must have the Dell “chip/interface” that dell_smm_hwmon claims to support, but I think it should be blacklisted. I don’t think there’s a way to disable fan control in the Alienware BIOS. It seems like this module is intended more for the mainline business Dell laptops. So I blacklisted it, and I have had no issues for 24 hours now.
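For anyone who wants to watch for the same symptom, here is a minimal sketch that counts a process's open file descriptors by listing /proc/<pid>/fd. The function name count_fds and its optional second argument (an alternate proc root, only there to make it easy to test) are my own additions, not something from sysprof or gnome-shell.

```shell
# Count the open file descriptors of a process by listing <proc_root>/<pid>/fd.
# The proc root defaults to /proc; the override is a hypothetical convenience.
count_fds() {
    pid="$1"
    proc_root="${2:-/proc}"
    # A missing directory (process gone) simply yields a count of 0.
    ls -1 "$proc_root/$pid/fd" 2>/dev/null | wc -l
}

# Possible usage: log the count once a minute for the oldest gnome-shell process.
# while sleep 60; do
#     echo "$(date +%T) $(count_fds "$(pgrep -o -x gnome-shell)")"
# done
```

A count that climbs steadily toward the limit reported by `ulimit -n` would be consistent with the leak described above.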
My kernel-6.8.4-100 grub boot args:
GRUB_CMDLINE_LINUX="rhgb quiet LANG=en_US.UTF.8 iommu=pt module_blacklist=dell_smm_hwmon"
dmesg confirmation:
[ 0.000000] Command line: BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.8.4-100.fc38.x86_64 root=UUID=963fa5ce-f647-480c-a9f0-face7ff48b7a ro rhgb quiet LANG=en_US.UTF.8 iommu=pt module_blacklist=dell_smm_hwmon
[ 0.613663] Kernel command line: BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.8.4-100.fc38.x86_64 root=UUID=963fa5ce-f647-480c-a9f0-face7ff48b7a ro rhgb quiet LANG=en_US.UTF.8 iommu=pt module_blacklist=dell_smm_hwmon
[ 14.771456] Module dell_smm_hwmon is blacklisted
[ 14.832738] dell_smbios: Unable to run on non-Dell system
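The same check can be scripted instead of reading dmesg by eye. This is only a sketch: is_blacklisted is a helper name I made up, and it takes the kernel command line as a string argument so it can be fed from /proc/cmdline. It assumes module_blacklist= holds a comma-separated list of module names.

```shell
# Succeed if MODULE is listed in a module_blacklist= argument of CMDLINE.
# Usage: is_blacklisted MODULE CMDLINE
is_blacklisted() {
    module="$1"
    cmdline="$2"
    # Intentionally unquoted: split the command line on whitespace.
    for arg in $cmdline; do
        case "$arg" in
            module_blacklist=*)
                list="${arg#module_blacklist=}"
                # Wrap in commas so only whole names match.
                case ",$list," in
                    *",$module,"*) return 0 ;;
                esac
                ;;
        esac
    done
    return 1
}

# Possible usage on the running system:
# is_blacklisted dell_smm_hwmon "$(cat /proc/cmdline)" && echo "blacklisted"
```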
I can’t explain why the version of dell_smm_hwmon isn’t different between kernels, but its functionality is definitely different:
$ modinfo /lib/modules/6.7.11-100.fc38.x86_64/kernel/drivers/hwmon/dell-smm-hwmon.ko.xz
filename: /lib/modules/6.7.11-100.fc38.x86_64/kernel/drivers/hwmon/dell-smm-hwmon.ko.xz
alias: i8k
license: GPL
description: Dell laptop SMM BIOS hwmon driver
author: Pali Rohár <pali@kernel.org>
author: Massimo Dal Zotto (dz@debian.org)
rhelversion: 9.99
$ modinfo /lib/modules/6.7.11-100.fc38.x86_64/kernel/drivers/hwmon/dell-smm-hwmon.ko.xz
filename: /lib/modules/6.7.11-100.fc38.x86_64/kernel/drivers/hwmon/dell-smm-hwmon.ko.xz
alias: i8k
license: GPL
description: Dell laptop SMM BIOS hwmon driver
author: Pali Rohár <pali@kernel.org>
author: Massimo Dal Zotto (dz@debian.org)
rhelversion: 9.99
There are other dell_* kernel modules I am considering blocking as well:
dell_wmi_sysman
dell_laptop
dell_rbtn
dell_smbios
dell_smo8800
dell_wmi_aio
dell_wmi_ddv
dell_wmi_descriptor
dell_wmi_led
dell_wm
Most of these are not loading for me, though. I don’t have an “airplane” button/switch (dell_rbtn) like many Dell laptops do, but that module is loaded:
$ lsmod | grep dell
dell_wmi_descriptor 20480 0
dell_rbtn 20480 0
rfkill 40960 8 bluetooth,dell_rbtn,cfg80211
wmi 36864 4 video,alienware_wmi,wmi_bmof,dell_wmi_descriptor
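Note that `grep dell` also matches rfkill and wmi above, because dell modules appear in their "used by" column. A small sketch that prints only the modules whose own name starts with dell_ (loaded_dell_modules is a name I made up; it reads lsmod or /proc/modules style text on stdin, where the module name is the first column):

```shell
# Print loaded modules whose own name begins with "dell_".
# Reads lsmod or /proc/modules style text from stdin.
loaded_dell_modules() {
    awk '$1 ~ /^dell_/ { print $1 }'
}

# Possible usage:
# loaded_dell_modules < /proc/modules
```

Against the lsmod output above, this would list just dell_wmi_descriptor and dell_rbtn.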
gnome-shell has been running without issue on kernel-6.8.4-100 for 24 hours: no fd leaks, no memory leaks, and no segfaults.