How to identify the culprit when a laptop is in a soft lockup

Hi,

Recently my laptop has been hitting soft lockups at seemingly random times: sometimes 10 minutes into a session, sometimes not for hours. top shows a kworker (events_unbound) using 100% of one CPU thread. However, every troubleshooting method I’ve found online (dmesg, perf) needs sudo, and nothing new can be launched once the lockup starts.

This happens regardless of WM/DE; it’s happened in GNOME, COSMIC, and Hyprland for me.

The sequence of events usually goes:
Wi-Fi drops → new apps and certain CLI tools (sudo, btop) can’t be started → already-running apps stop working (can’t run echo, man, top) → I have to hard restart with the power button

Possibly Relevant Information

I’m new, so please let me know if anything else is needed.

Common fixes

I’ve downgraded from 6.10 to 6.8, which had worked for me in the past. The issue still happens.

I’ve run the full memtest86 test suite for 4 passes with no errors.

Running dmesg | grep kworker or journalctl -b -0 | grep kworker shows nothing. However, I can’t run either command during or after the soft lockup.
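A possible follow-up to the commands above, offered as a sketch rather than a confirmed fix: -b -0 selects the current boot, so after a forced restart any lockup messages would sit under the previous boot (-b -1), and only if the journal is persistent and journald managed to write them before the hang. The helper name filter_lockups and its pattern list are assumptions chosen for illustration; the patterns cover common kernel watchdog, hung-task, and RCU stall wording, not just "kworker".

```shell
# Hypothetical helper: keep only kernel log lines that look like
# lockup, hung-task, or RCU stall reports. Reads log text on stdin.
filter_lockups() {
  grep -iE 'soft lockup|hard lockup|hung task|blocked for more than|rcu.*stall'
}

# Usage after a forced restart (previous boot, kernel messages only):
#   journalctl -k -b -1 --no-pager | filter_lockups
```

If this prints nothing, that does not rule out a lockup; the messages may simply never have reached the disk.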

Fpaste output

=== fpaste 0.5.0.0 System Information ===
* OS Release (lsb_release -ds):
     "Fedora Linux 40 (Workstation Edition)"
     
* CPU Model (grep 'model name' /proc/cpuinfo | awk -F: '{print $2}' | uniq -c |
     sed -re 's/^ +//' ):
     12  12th Gen Intel(R) Core(TM) i7-1250U
     
* 64-bit Support (grep -q ' lm ' /proc/cpuinfo && echo Yes || echo No):
     Yes
     
* Hardware Virtualization Support (grep -Eq '(vmx|svm)' /proc/cpuinfo && echo Yes || echo No):
     Yes
     
* Kernel (uname -r):
     6.8.5-301.fc40.x86_64
     
* Kernel cmdline (cat /proc/cmdline):
     BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.8.5-301.fc40.x86_64 root=UUID=820fde66-6130-4b54-afa4-ac8155a7da50 ro rootflags=subvol=root rhgb quiet
     
* Desktop(s) Running (without results: "ps -eo comm= | grep -E '(gnome-session|startkde|startactive|xfce.?-session|fluxbox|blackbox|hackedbox|ratpoison|enlightenment|icewm-session|od-session|wmaker|wmx|openbox-lxde|openbox-gnome-session|openbox-kde-session|mwm|e16|fvwm|xmonad|sugar-session|mate-session|lxqt-session|cinnamon|lxdm-session|awesome|phosh|sway|Hyperland)' "):
     N/A

* Desktop(s) Installed (ls -m /usr/share/{xsessions,wayland-sessions}/ | sed 's/\.desktop//g' ):
     /usr/share/wayland-sessions/:
     cosmic, gnome-classic, gnome-classic-wayland,
     gnome, gnome-wayland, hyprland
     
     /usr/share/xsessions/:
     gnome-classic, gnome-classic-xorg, gnome,
     gnome-xorg
     
* Session Type (env | grep 'XDG_SESSION_TYPE' | sed 's/.*=//' ):
     wayland
     
* SELinux Status (sestatus):
     SELinux status:                 enabled
     SELinuxfs mount:                /sys/fs/selinux
     SELinux root directory:         /etc/selinux
     Loaded policy name:             targeted
     Current mode:                   enforcing
     Mode from config file:          enforcing
     Policy MLS status:              enabled
     Policy deny_unknown status:     allowed
     Memory protection checking:     actual (secure)
     Max kernel policy version:      33
     
* SELinux Errors (selinuxenabled && journalctl --no-hostname --since yesterday |grep avc: | grep -Eo comm="[^ ]+" | sort |uniq -c |sort -rn):
          34 comm="plymouthd"
     
* Memory usage (free -hm):
                    total        used        free      shared  buff/cache   available
     Mem:            30Gi       4.4Gi        22Gi       1.0Gi       5.6Gi        26Gi
     Swap:          8.0Gi          0B       8.0Gi
     
* ZRAM usage (zramctl --output-all):
     NAME       DISKSIZE DATA COMPR ALGORITHM STREAMS ZERO-PAGES TOTAL MEM-LIMIT MEM-USED MIGRATED MOUNTPOINT
     /dev/zram0       8G   4K   80B lzo-rle        12          0   12K        0B      12K       0B [SWAP]
     
* Load average (uptime):
      09:51:46 up 13:13,  1 user,  load average: 1.10, 1.00, 1.18
     
* Pressure Stall Information (grep -R . /proc/pressure/):
     /proc/pressure/io:some avg10=0.12 avg60=0.31 avg300=0.15 total=13095683
     /proc/pressure/io:full avg10=0.12 avg60=0.31 avg300=0.15 total=12148003
     /proc/pressure/cpu:some avg10=0.00 avg60=0.00 avg300=0.00 total=15139003
     /proc/pressure/cpu:full avg10=0.00 avg60=0.00 avg300=0.00 total=0
     /proc/pressure/irq:full avg10=0.00 avg60=0.00 avg300=0.00 total=12119615
     /proc/pressure/memory:some avg10=0.00 avg60=0.00 avg300=0.00 total=31
     /proc/pressure/memory:full avg10=0.00 avg60=0.00 avg300=0.00 total=31
     
* Top 5 CPU hogs (ps axuScnh | awk '$2!=19919' | sort -rnk3 | head -5):
         1000   16893 24.7  1.6 12330068 543588 tty2  Sl+  09:39   3:04 firefox
         1000   18148  9.2  1.0 3201088 336144 tty2   Sl+  09:40   1:03 Isolated Web Co
         1000   17079  7.2  1.2 20104864 408220 tty2  Sl+  09:39   0:53 WebExtensions
         1000    2071  6.8  0.0  30180 15928 ?        Ss   Oct09  54:02 systemd
         1000   15972  6.2  0.6 2828060 209976 tty2   Sl+  09:39   0:47 cosmic-comp
     
* Top 5 Memory hogs (ps axuScnh | sort -rnk4 | head -5):
         1000   16893 24.7  1.6 12330068 543588 tty2  Sl+  09:39   3:04 firefox
         1000   18299  4.3  1.5 3446804 510764 tty2   Sl+  09:40   0:28 Isolated Web Co
         1000   17079  7.2  1.2 20104864 408220 tty2  Sl+  09:39   0:53 WebExtensions
         1000   18148  9.2  1.0 3201088 336144 tty2   Sl+  09:40   1:03 Isolated Web Co
         1000   19228  3.2  0.8 2954404 266572 tty2   Sl+  09:48   0:05 Isolated Web Co
     
* block devices (lsblk -o NAME,FSTYPE,SIZE,FSUSE%,MOUNTPOINT,UUID,MIN-IO,SCHED,DISC-GRAN,MODEL):
     NAME        FSTYPE   SIZE FSUSE% MOUNTPOINT UUID                                 MIN-IO SCHED DISC-GRAN MODEL
     zram0                  8G        [SWAP]                                            4096              4K 
     nvme0n1            953.9G                                                           512 none       512B 3460 NVMe Micron 1024GB
     ├─nvme0n1p1 vfat     600M     3% /boot/efi  FB9E-A352                               512 none       512B 
     ├─nvme0n1p2 ext4       1G    37% /boot      b92c3910-73f6-4d90-b260-24bec5297757    512 none       512B 
     └─nvme0n1p3 btrfs  952.3G     2% /home      820fde66-6130-4b54-afa4-ac8155a7da50    512 none       512B 
     
* PCI devices (lspci -nn):
     00:00.0 Host bridge [0600]: Intel Corporation Alder Lake Host and DRAM Controller [8086:4602] (rev 06)
     00:02.0 VGA compatible controller [0300]: Intel Corporation Alder Lake-UP4 GT2 [Iris Xe Graphics] [8086:46aa] (rev 0c)
     00:04.0 Signal processing controller [1180]: Intel Corporation Alder Lake Innovation Platform Framework Processor Participant [8086:461d] (rev 06)
     00:05.0 Multimedia controller [0480]: Intel Corporation Alder Lake Imaging Signal Processor [8086:465d] (rev 06)
     00:06.0 PCI bridge [0604]: Intel Corporation 12th Gen Core Processor PCI Express x4 Controller #0 [8086:464d] (rev 06)
     00:07.0 PCI bridge [0604]: Intel Corporation Alder Lake-P Thunderbolt 4 PCI Express Root Port #0 [8086:466e] (rev 06)
     00:07.1 PCI bridge [0604]: Intel Corporation Alder Lake-P Thunderbolt 4 PCI Express Root Port #1 [8086:463f] (rev 06)
     00:08.0 System peripheral [0880]: Intel Corporation 12th Gen Core Processor Gaussian & Neural Accelerator [8086:464f] (rev 06)
     00:0d.0 USB controller [0c03]: Intel Corporation Alder Lake-P Thunderbolt 4 USB Controller [8086:461e] (rev 06)
     00:0d.2 USB controller [0c03]: Intel Corporation Alder Lake-P Thunderbolt 4 NHI #0 [8086:463e] (rev 06)
     00:12.0 Serial controller [0700]: Intel Corporation Alder Lake-P Integrated Sensor Hub [8086:51fc] (rev 01)
     00:14.0 USB controller [0c03]: Intel Corporation Alder Lake PCH USB 3.2 xHCI Host Controller [8086:51ed] (rev 01)
     00:14.2 RAM memory [0500]: Intel Corporation Alder Lake PCH Shared SRAM [8086:51ef] (rev 01)
     00:14.3 Network controller [0280]: Intel Corporation Alder Lake-P PCH CNVi WiFi [8086:51f0] (rev 01)
     00:15.0 Serial bus controller [0c80]: Intel Corporation Alder Lake PCH Serial IO I2C Controller #0 [8086:51e8] (rev 01)
     00:15.1 Serial bus controller [0c80]: Intel Corporation Alder Lake PCH Serial IO I2C Controller #1 [8086:51e9] (rev 01)
     00:16.0 Communication controller [0780]: Intel Corporation Alder Lake PCH HECI Controller [8086:51e0] (rev 01)
     00:1e.0 Communication controller [0780]: Intel Corporation Alder Lake PCH UART #0 [8086:51a8] (rev 01)
     00:1e.2 Serial bus controller [0c80]: Intel Corporation Alder Lake SPI Controller [8086:51aa] (rev 01)
     00:1e.3 Serial bus controller [0c80]: Intel Corporation Alder Lake SPI Controller [8086:51ab] (rev 01)
     00:1f.0 ISA bridge [0601]: Intel Corporation Alder Lake LPC Controller [8086:5187] (rev 01)
     00:1f.3 Multimedia audio controller [0401]: Intel Corporation Alder Lake Smart Sound Technology Audio Controller [8086:51cc] (rev 01)
     00:1f.4 SMBus [0c05]: Intel Corporation Alder Lake PCH-P SMBus Host Controller [8086:51a3] (rev 01)
     00:1f.5 Serial bus controller [0c80]: Intel Corporation Alder Lake-P PCH SPI Controller [8086:51a4] (rev 01)
     01:00.0 Non-Volatile memory controller [0108]: Micron Technology Inc 3460 NVMe SSD [1344:5414] (rev 01)
     
* USB devices (lsusb):
     Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
     Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
     Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
     Bus 003 Device 002: ID 27c6:63ac Shenzhen Goodix Technology Co.,Ltd. Goodix Fingerprint USB Device
     Bus 003 Device 003: ID 8086:0b63 Intel Corp. USB Bridge
     Bus 003 Device 004: ID 8087:0033 Intel Corp. AX211 Bluetooth
     Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
     
* PCI Video Card (lspci |  grep -i -E 'vga' | cut -b1-7 | xargs -i lspci -vnnks {} | grep -v "<access denied>"):
     00:02.0 VGA compatible controller [0300]: Intel Corporation Alder Lake-UP4 GT2 [Iris Xe Graphics] [8086:46aa] (rev 0c) (prog-if 00 [VGA controller])
     	DeviceName: Onboard - Video
     	Subsystem: Dell Device [1028:0b14]
     	Flags: bus master, fast devsel, latency 0, IRQ 152, IOMMU group 1
     	Memory at 6077000000 (64-bit, non-prefetchable) [size=16M]
     	Memory at 4000000000 (64-bit, prefetchable) [size=256M]
     	I/O ports at 3000 [size=64]
     	Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
     	Kernel driver in use: i915
     	Kernel modules: i915, xe
     
     
* GL Support (glxinfo -B | grep -E "OpenGL version|OpenGL renderer"):
     OpenGL renderer string: Mesa Intel(R) Graphics (ADL GT2)
     OpenGL version string: 4.6 (Compatibility Profile) Mesa 24.1.7
     
* DRM Information (journalctl -k -b --no-hostname | grep -o 'kernel:.*drm.*$' | cut -d ' ' -f 2- ):
     ACPI: bus type drm_connector registered
     [drm] Initialized simpledrm 1.0.0 20200625 for simple-framebuffer.0 on minor 0
     simple-framebuffer simple-framebuffer.0: [drm] fb0: simpledrmdrmfb frame buffer device
     i915 0000:00:02.0: [drm] VT-d active for gfx access
     i915 0000:00:02.0: [drm] Using Transparent Hugepages
     i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/adlp_dmc.bin (v2.20)
     i915 0000:00:02.0: [drm] GT0: GuC firmware i915/adlp_guc_70.bin version 70.20.0
     i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin version 7.9.3
     i915 0000:00:02.0: [drm] GT0: HuC: authenticated for all workloads
     i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
     i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
     i915 0000:00:02.0: [drm] GT0: GUC: RC enabled
     i915 0000:00:02.0: [drm] Protected Xe Path (PXP) protected content support initialized
     [drm] Initialized i915 1.6.0 20230929 for 0000:00:02.0 on minor 1
     fbcon: i915drmfb (fb0) is primary device
     i915 0000:00:02.0: [drm] fb0: i915drmfb frame buffer device
     i915 0000:00:02.0: [drm] Selective fetch area calculation failed in pipe A
     i915 0000:00:02.0: [drm] GT0: GuC firmware i915/adlp_guc_70.bin version 70.20.0
     i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin version 7.9.3
     i915 0000:00:02.0: [drm] GT0: HuC: authenticated for all workloads
     i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
     i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
     i915 0000:00:02.0: [drm] GT0: GUC: RC enabled
     i915 0000:00:02.0: [drm] GT0: GuC firmware i915/adlp_guc_70.bin version 70.20.0
     i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin version 7.9.3
     i915 0000:00:02.0: [drm] GT0: HuC: authenticated for all workloads
     i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
     i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
     i915 0000:00:02.0: [drm] GT0: GUC: RC enabled
     i915 0000:00:02.0: [drm] GT0: GuC firmware i915/adlp_guc_70.bin version 70.20.0
     i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin version 7.9.3
     i915 0000:00:02.0: [drm] GT0: HuC: authenticated for all workloads
     i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
     i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
     i915 0000:00:02.0: [drm] GT0: GUC: RC enabled
     
* Xorg modules (grep LoadModule /var/log/Xorg.0.log ~/.local/share/xorg/Xorg.0.log | cut -d \" -f 2 | xargs):
     
     
* Xorg errors (without results: "grep '^\[.*(EE)' /var/log/Xorg.0.log ~/.local/share/xorg/Xorg.0.log | cut -d ':' -f 2- "):
     N/A

* PCI Audio devices (lspci |  grep -i -E 'audio' | cut -b1-7 | xargs -i lspci -vnnks {} | grep -v "<access denied>"):
     00:1f.3 Multimedia audio controller [0401]: Intel Corporation Alder Lake Smart Sound Technology Audio Controller [8086:51cc] (rev 01)
     	Subsystem: Dell Device [1028:0b14]
     	Flags: bus master, fast devsel, latency 64, IRQ 187, IOMMU group 15
     	Memory at 6078190000 (64-bit, non-prefetchable) [size=16K]
     	Memory at 6078000000 (64-bit, non-prefetchable) [size=1M]
     	Kernel driver in use: sof-audio-pci-intel-tgl
     	Kernel modules: snd_hda_intel, snd_sof_pci_intel_tgl
     
     
* Audio devices (cat /proc/asound/cards):
      0 [sofsoundwire   ]: sof-soundwire - sof-soundwire
                           Intel Soundwire SOF
     
* User audio services (systemctl --user --no-pager status wireplumber pipewire* | sed "s/$(hostname)/ahost/"):
     ● wireplumber.service - Multimedia Service Session Manager
          Loaded: loaded (/usr/lib/systemd/user/wireplumber.service; enabled; preset: enabled)
         Drop-In: /usr/lib/systemd/user/service.d
                  └─10-timeout-abort.conf
          Active: active (running) since Wed 2024-10-09 20:39:08 CDT; 13h ago
        Main PID: 2280 (wireplumber)
           Tasks: 6 (limit: 37997)
          Memory: 5.2M (peak: 5.9M)
             CPU: 484ms
          CGroup: /user.slice/user-1000.slice/user@1000.service/session.slice/wireplumber.service
                  └─2280 /usr/bin/wireplumber
     
     Oct 09 20:39:08 ahost wireplumber[2280]: spa.alsa: pa_sw_volume_divide: Volume exceeds maximum allowed value and will be clipped. Please check your volume settings.
     Oct 09 20:39:08 ahost wireplumber[2280]: spa.alsa: pa_sw_volume_divide: Volume exceeds maximum allowed value and will be clipped. Please check your volume settings.
     Oct 09 20:39:08 ahost wireplumber[2280]: spa.alsa: pa_sw_volume_divide: Volume exceeds maximum allowed value and will be clipped. Please check your volume settings.
     Oct 10 09:38:54 ahost wireplumber[2280]: m-dbus-connection: <WpDBusConnection:0x55e9d682be20> DBus connection closed: Underlying GIOStream returned 0 bytes on an async read
     Oct 10 09:38:54 ahost wireplumber[2280]: m-dbus-connection: <WpDBusConnection:0x55e9d682be20> Trying to reconnect after core sync
     Oct 10 09:39:02 ahost wireplumber[2280]: spa.bluez5.midi: org.bluez.GattManager1.RegisterApplication() failed: GDBus.Error:org.bluez.Error.AlreadyExists: Already Exists
     Oct 10 09:39:02 ahost wireplumber[2280]: spa.alsa: pa_sw_volume_divide: Volume exceeds maximum allowed value and will be clipped. Please check your volume settings.
     Oct 10 09:39:02 ahost wireplumber[2280]: spa.alsa: pa_sw_volume_divide: Volume exceeds maximum allowed value and will be clipped. Please check your volume settings.
     Oct 10 09:39:02 ahost wireplumber[2280]: spa.alsa: pa_sw_volume_divide: Volume exceeds maximum allowed value and will be clipped. Please check your volume settings.
     Oct 10 09:39:02 ahost wireplumber[2280]: spa.alsa: pa_sw_volume_divide: Volume exceeds maximum allowed value and will be clipped. Please check your volume settings.
     
     ● pipewire-pulse.service - PipeWire PulseAudio
          Loaded: loaded (/usr/lib/systemd/user/pipewire-pulse.service; disabled; preset: disabled)
         Drop-In: /usr/lib/systemd/user/service.d
                  └─10-timeout-abort.conf
          Active: active (running) since Wed 2024-10-09 20:39:10 CDT; 13h ago
     TriggeredBy: ● pipewire-pulse.socket
        Main PID: 2870 (pipewire-pulse)
           Tasks: 3 (limit: 37997)
          Memory: 8.6M (peak: 13.0M)
             CPU: 347ms
          CGroup: /user.slice/user-1000.slice/user@1000.service/session.slice/pipewire-pulse.service
                  └─2870 /usr/bin/pipewire-pulse
     
     Oct 09 20:39:10 ahost systemd[2071]: Started pipewire-pulse.service - PipeWire PulseAudio.
     
     ● pipewire.socket - PipeWire Multimedia System Sockets
          Loaded: loaded (/usr/lib/systemd/user/pipewire.socket; enabled; preset: enabled)
          Active: active (running) since Wed 2024-10-09 20:39:07 CDT; 13h ago
        Triggers: ● pipewire.service
          Listen: /run/user/1000/pipewire-0 (Stream)
                  /run/user/1000/pipewire-0-manager (Stream)
          CGroup: /user.slice/user-1000.slice/user@1000.service/app.slice/pipewire.socket
     
     Oct 09 20:39:07 ahost systemd[2071]: Listening on pipewire.socket - PipeWire Multimedia System Sockets.
     
     ● pipewire.service - PipeWire Multimedia Service
          Loaded: loaded (/usr/lib/systemd/user/pipewire.service; disabled; preset: disabled)
         Drop-In: /usr/lib/systemd/user/pipewire.service.d
                  └─00-uresourced.conf
                  /usr/lib/systemd/user/service.d
                  └─10-timeout-abort.conf
          Active: active (running) since Wed 2024-10-09 20:39:08 CDT; 13h ago
     TriggeredBy: ● pipewire.socket
        Main PID: 2278 (pipewire)
           Tasks: 3 (limit: 37997)
          Memory: 7.2M (peak: 8.3M)
             CPU: 438ms
          CGroup: /user.slice/user-1000.slice/user@1000.service/session.slice/pipewire.service
                  └─2278 /usr/bin/pipewire
     
     Oct 09 20:39:08 ahost systemd[2071]: Started pipewire.service - PipeWire Multimedia Service.
     
     ● pipewire-pulse.socket - PipeWire PulseAudio
          Loaded: loaded (/usr/lib/systemd/user/pipewire-pulse.socket; enabled; preset: enabled)
          Active: active (running) since Wed 2024-10-09 20:39:07 CDT; 13h ago
        Triggers: ● pipewire-pulse.service
          Listen: /run/user/1000/pulse/native (Stream)
          CGroup: /user.slice/user-1000.slice/user@1000.service/app.slice/pipewire-pulse.socket
     
     Oct 09 20:39:07 ahost systemd[2071]: Listening on pipewire-pulse.socket - PipeWire PulseAudio.
     
* PCI Network devices (lspci |  grep -i -E 'net' | cut -b1-7 | xargs -i lspci -vnnks {} | grep -v "<access denied>"):
     00:14.3 Network controller [0280]: Intel Corporation Alder Lake-P PCH CNVi WiFi [8086:51f0] (rev 01)
     	Subsystem: Intel Corporation Dual Band Wi-Fi 6E(802.11ax) AX211 160MHz 2x2 [Garfield Peak] [8086:4090]
     	Flags: bus master, fast devsel, latency 0, IRQ 16, IOMMU group 11
     	Memory at 6078194000 (64-bit, non-prefetchable) [size=16K]
     	Kernel driver in use: iwlwifi
     	Kernel modules: iwlwifi
     
     
* Network status (ip -br addr | awk '{print $1" " $2}' | column -t):
     lo         UNKNOWN
     wlp0s20f3  UP
     
* Kernel buffer tail (journalctl --no-hostname -k --lines 50):
     Oct 10 09:49:38 kernel: wlp0s20f3: authenticated
     Oct 10 09:49:38 kernel: wlp0s20f3: associate with 54:d7:e3:a8:ea:10 (try 1/3)
     Oct 10 09:49:38 kernel: wlp0s20f3: RX ReassocResp from 54:d7:e3:a8:ea:10 (capab=0x11 status=0 aid=19)
     Oct 10 09:49:38 kernel: wlp0s20f3: associated
     Oct 10 09:49:38 kernel: wlp0s20f3: deauthenticating from 54:d7:e3:a8:ea:10 by local choice (Reason: 13=INVALID_IE)
     Oct 10 09:49:39 kernel: wlp0s20f3: authenticate with 54:d7:e3:a9:b5:80 (local address=42:ec:66:fa:70:bf)
     Oct 10 09:49:39 kernel: wlp0s20f3: send auth to 54:d7:e3:a9:b5:80 (try 1/3)
     Oct 10 09:49:39 kernel: wlp0s20f3: authenticated
     Oct 10 09:49:39 kernel: wlp0s20f3: associate with 54:d7:e3:a9:b5:80 (try 1/3)
     Oct 10 09:49:39 kernel: wlp0s20f3: RX AssocResp from 54:d7:e3:a9:b5:80 (capab=0x511 status=34 aid=0)
     Oct 10 09:49:39 kernel: wlp0s20f3: 54:d7:e3:a9:b5:80 denied association (code=34)
     Oct 10 09:49:42 kernel: wlp0s20f3: authenticate with 54:d7:e3:a8:ea:20 (local address=42:ec:66:fa:70:bf)
     Oct 10 09:49:42 kernel: wlp0s20f3: send auth to 54:d7:e3:a8:ea:20 (try 1/3)
     Oct 10 09:49:42 kernel: wlp0s20f3: authenticated
     Oct 10 09:49:42 kernel: wlp0s20f3: associate with 54:d7:e3:a8:ea:20 (try 1/3)
     Oct 10 09:49:42 kernel: wlp0s20f3: RX AssocResp from 54:d7:e3:a8:ea:20 (capab=0x1011 status=0 aid=1)
     Oct 10 09:49:42 kernel: wlp0s20f3: associated
     Oct 10 09:49:43 kernel: wlp0s20f3: Limiting TX power to 30 (30 - 0) dBm as advertised by 54:d7:e3:a8:ea:20
     Oct 10 09:50:17 kernel: wlp0s20f3: disconnect from AP 54:d7:e3:a8:ea:20 for new auth to 54:d7:e3:a8:ea:10
     Oct 10 09:50:17 kernel: wlp0s20f3: authenticate with 54:d7:e3:a8:ea:10 (local address=42:ec:66:fa:70:bf)
     Oct 10 09:50:17 kernel: wlp0s20f3: send auth to 54:d7:e3:a8:ea:10 (try 1/3)
     Oct 10 09:50:17 kernel: wlp0s20f3: authenticated
     Oct 10 09:50:17 kernel: wlp0s20f3: associate with 54:d7:e3:a8:ea:10 (try 1/3)
     Oct 10 09:50:17 kernel: wlp0s20f3: RX ReassocResp from 54:d7:e3:a8:ea:10 (capab=0x511 status=34 aid=0)
     Oct 10 09:50:17 kernel: wlp0s20f3: 54:d7:e3:a8:ea:10 denied association (code=34)
     Oct 10 09:50:20 kernel: wlp0s20f3: 80 MHz not supported, disabling VHT
     Oct 10 09:50:20 kernel: wlp0s20f3: authenticate with 54:d7:e3:a9:6b:c0 (local address=42:ec:66:fa:70:bf)
     Oct 10 09:50:20 kernel: wlp0s20f3: send auth to 54:d7:e3:a9:6b:c0 (try 1/3)
     Oct 10 09:50:20 kernel: wlp0s20f3: authenticated
     Oct 10 09:50:20 kernel: wlp0s20f3: associate with 54:d7:e3:a9:6b:c0 (try 1/3)
     Oct 10 09:50:20 kernel: wlp0s20f3: RX AssocResp from 54:d7:e3:a9:6b:c0 (capab=0x1431 status=0 aid=4)
     Oct 10 09:50:20 kernel: wlp0s20f3: associated
     Oct 10 09:50:21 kernel: wlp0s20f3: Limiting TX power to 36 (36 - 0) dBm as advertised by 54:d7:e3:a9:6b:c0
     Oct 10 09:50:34 kernel: usb 3-3: reset full-speed USB device number 2 using xhci_hcd
     Oct 10 09:51:31 kernel: wlp0s20f3: disconnect from AP 54:d7:e3:a9:6b:c0 for new auth to 54:d7:e3:a8:ea:10
     Oct 10 09:51:31 kernel: wlp0s20f3: authenticate with 54:d7:e3:a8:ea:10 (local address=42:ec:66:fa:70:bf)
     Oct 10 09:51:31 kernel: wlp0s20f3: send auth to 54:d7:e3:a8:ea:10 (try 1/3)
     Oct 10 09:51:31 kernel: wlp0s20f3: authenticated
     Oct 10 09:51:31 kernel: wlp0s20f3: associate with 54:d7:e3:a8:ea:10 (try 1/3)
     Oct 10 09:51:31 kernel: wlp0s20f3: associate with 54:d7:e3:a8:ea:10 (try 2/3)
     Oct 10 09:51:32 kernel: wlp0s20f3: associate with 54:d7:e3:a8:ea:10 (try 3/3)
     Oct 10 09:51:32 kernel: wlp0s20f3: association with 54:d7:e3:a8:ea:10 timed out
     Oct 10 09:51:35 kernel: wlp0s20f3: 80 MHz not supported, disabling VHT
     Oct 10 09:51:35 kernel: wlp0s20f3: authenticate with 54:d7:e3:a8:ea:00 (local address=42:ec:66:fa:70:bf)
     Oct 10 09:51:35 kernel: wlp0s20f3: send auth to 54:d7:e3:a8:ea:00 (try 1/3)
     Oct 10 09:51:35 kernel: wlp0s20f3: authenticated
     Oct 10 09:51:35 kernel: wlp0s20f3: associate with 54:d7:e3:a8:ea:00 (try 1/3)
     Oct 10 09:51:35 kernel: wlp0s20f3: RX AssocResp from 54:d7:e3:a8:ea:00 (capab=0x1431 status=0 aid=1)
     Oct 10 09:51:35 kernel: wlp0s20f3: associated
     Oct 10 09:51:35 kernel: wlp0s20f3: Limiting TX power to 36 (36 - 0) dBm as advertised by 54:d7:e3:a8:ea:00
     
* Last few reboots (last -x -n10 reboot runlevel):
     runlevel (to lvl 5)   6.8.5-301.fc40.x Wed Oct  9 20:39   still running
     reboot   system boot  6.8.5-301.fc40.x Wed Oct  9 20:38   still running
     runlevel (to lvl 5)   6.8.5-301.fc40.x Wed Oct  9 15:22 - 15:24  (00:01)
     reboot   system boot  6.8.5-301.fc40.x Wed Oct  9 15:21 - 15:24  (00:02)
     runlevel (to lvl 5)   6.8.5-301.fc40.x Wed Oct  9 14:31 - 15:22  (00:51)
     reboot   system boot  6.8.5-301.fc40.x Wed Oct  9 14:30 - crash  (00:51)
     runlevel (to lvl 5)   6.8.5-301.fc40.x Wed Oct  9 13:05 - 14:31  (01:25)
     reboot   system boot  6.8.5-301.fc40.x Wed Oct  9 13:05 - crash  (01:25)
     runlevel (to lvl 5)   6.10.12-200.fc40 Wed Oct  9 12:59 - 13:04  (00:05)
     reboot   system boot  6.10.12-200.fc40 Wed Oct  9 12:59 - 13:04  (00:05)
     
     wtmp begins Sun Sep 29 10:24:43 2024
     
* DNF Repositories (dnf repolist):
     repo id                                                             repo name
     code                                                                Visual Studio Code
     copr:copr.fedorainfracloud.org:alternateved:keyd                    Copr repo for keyd owned by alternateved
     copr:copr.fedorainfracloud.org:atim:lazygit                         Copr repo for lazygit owned by atim
     copr:copr.fedorainfracloud.org:errornointernet:packages             Copr repo for packages owned by errornointernet
     copr:copr.fedorainfracloud.org:phracek:PyCharm                      Copr repo for PyCharm owned by phracek
     copr:copr.fedorainfracloud.org:ryanabx:cosmic-epoch                 Copr repo for cosmic-epoch owned by ryanabx
     copr:copr.fedorainfracloud.org:solopasha:hyprland                   Copr repo for hyprland owned by solopasha
     copr:copr.fedorainfracloud.org:tofik:nwg-shell                      Copr repo for nwg-shell owned by tofik
     copr:copr.fedorainfracloud.org:wezfurlong:wezterm-nightly           Copr repo for wezterm-nightly owned by wezfurlong
     coprdep:copr.fedorainfracloud.org:erikreider:SwayNotificationCenter Copr copr.fedorainfracloud.org/tofik/nwg-shell runtime dependency #3 - erikreider/SwayNotificationCenter
     coprdep:copr.fedorainfracloud.org:mochaa:gtk-session-lock           Copr copr.fedorainfracloud.org/tofik/nwg-shell runtime dependency #2 - mochaa/gtk-session-lock
     coprdep:copr.fedorainfracloud.org:tofik:sway                        Copr copr.fedorainfracloud.org/tofik/nwg-shell runtime dependency #1 - tofik/sway
     fedora                                                              Fedora 40 - x86_64
     fedora-cisco-openh264                                               Fedora 40 openh264 (From Cisco) - x86_64
     google-chrome                                                       google-chrome
     mullvad-stable                                                      Mullvad VPN
     rpmfusion-free                                                      RPM Fusion for Fedora 40 - Free
     rpmfusion-free-updates                                              RPM Fusion for Fedora 40 - Free - Updates
     rpmfusion-nonfree                                                   RPM Fusion for Fedora 40 - Nonfree
     rpmfusion-nonfree-nvidia-driver                                     RPM Fusion for Fedora 40 - Nonfree - NVIDIA Driver
     rpmfusion-nonfree-steam                                             RPM Fusion for Fedora 40 - Nonfree - Steam
     rpmfusion-nonfree-updates                                           RPM Fusion for Fedora 40 - Nonfree - Updates
     updates                                                             Fedora 40 - x86_64 - Updates
     
* DNF Extras (dnf -C list extras):
     Last metadata expiration check: 9:53:28 ago on Wed 09 Oct 2024 11:58:20 PM CDT.
     
* Last 20 packages installed (rpm -qa --nodigest --nosignature --last | head -20):
     cosmic-osd-1.0.0~alpha.2^git20241009.c6fda40-1.fc40.x86_64 Thu 10 Oct 2024 12:18:23 AM CDT
     cosmic-greeter-1.0.0~alpha.2^git20241009.6ba26db-1.fc40.x86_64 Thu 10 Oct 2024 12:18:23 AM CDT
     cosmic-desktop-1.0.0~alpha.2^20241010-1.fc40.noarch Thu 10 Oct 2024 12:18:23 AM CDT
     cosmic-bg-1.0.0~alpha.2^git20241009.fd44edf-1.fc40.x86_64 Thu 10 Oct 2024 12:18:23 AM CDT
     wezterm-20241007_103714_ed430415-0.x86_64     Wed 09 Oct 2024 12:23:33 PM CDT
     yt-dlp-zsh-completion-2024.09.27-1.fc40.noarch Wed 09 Oct 2024 12:23:32 PM CDT
     yt-dlp-fish-completion-2024.09.27-1.fc40.noarch Wed 09 Oct 2024 12:23:32 PM CDT
     yt-dlp-bash-completion-2024.09.27-1.fc40.noarch Wed 09 Oct 2024 12:23:32 PM CDT
     webkitgtk6.0-2.46.1-1.fc40.x86_64             Wed 09 Oct 2024 12:23:32 PM CDT
     virtualbox-guest-additions-7.1.2-1.fc40.x86_64 Wed 09 Oct 2024 12:23:32 PM CDT
     python3-boto3-1.35.29-1.fc40.noarch           Wed 09 Oct 2024 12:23:32 PM CDT
     perl-Module-CoreList-5.20240920-1.fc40.noarch Wed 09 Oct 2024 12:23:32 PM CDT
     webkit2gtk4.1-2.46.1-1.fc40.x86_64            Wed 09 Oct 2024 12:23:31 PM CDT
     hyprland-0.44.1-1.fc40.x86_64                 Wed 09 Oct 2024 12:23:31 PM CDT
     containers-common-extra-0.60.4-1.fc40.noarch  Wed 09 Oct 2024 12:23:31 PM CDT
     cosmic-term-1.0.0~alpha.2^git20241007.aa1824d-1.fc40.x86_64 Wed 09 Oct 2024 12:23:30 PM CDT
     cosmic-store-1.0.0~alpha.2^git20241008.95dba6e-1.fc40.x86_64 Wed 09 Oct 2024 12:23:30 PM CDT
     cosmic-edit-1.0.0~alpha.2^git20241007.5546ed9-1.fc40.x86_64 Wed 09 Oct 2024 12:23:30 PM CDT
     containers-common-0.60.4-1.fc40.noarch        Wed 09 Oct 2024 12:23:30 PM CDT
     aquamarine-0.4.2-1.fc40.x86_64                Wed 09 Oct 2024 12:23:30 PM CDT
     
* EFI boot manager output (efibootmgr -v):
     BootCurrent: 0002
     Timeout: 5 seconds
     BootOrder: 0002,0000,0001
     Boot0000* UEFI 3460 NVMe Micron 1024GB 2227393FF417 1	HD(1,GPT,0f91eedf-590b-4aed-a627-c9116a174383,0x800,0x12c000)/\EFI\Boot\BootX64.efi{auto_created_boot_option}
           dp: 04 01 2a 00 01 00 00 00 00 08 00 00 00 00 00 00 00 c0 12 00 00 00 00 00 df ee 91 0f 0b 59 ed 4a a6 27 c9 11 6a 17 43 83 02 02 / 04 04 30 00 5c 00 45 00 46 00 49 00 5c 00 42 00 6f 00 6f 00 74 00 5c 00 42 00 6f 00 6f 00 74 00 58 00 36 00 34 00 2e 00 65 00 66 00 69 00 00 00 / 7f ff 04 00
         data: 4e ac 08 81 11 9f 59 4d 85 0e e2 1a 52 2c 59 b2
     Boot0001* UEFI HTTPs Boot	PciRoot(0x0)/Pci(0x1f,0x6)/MAC(000000000000,0)/IPv4(0.0.0.0,0,DHCP,0.0.0.0,0.0.0.0,0.0.0.0)/Uri(){auto_created_boot_option}
           dp: 02 01 0c 00 d0 41 03 0a 00 00 00 00 / 01 01 06 00 06 1f / 03 0b 25 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 / 03 0c 1b 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 / 03 18 04 00 / 7f ff 04 00
         data: 4e ac 08 81 11 9f 59 4d 85 0e e2 1a 52 2c 59 b2
     Boot0002* Fedora	HD(1,GPT,0f91eedf-590b-4aed-a627-c9116a174383,0x800,0x12c000)/\EFI\fedora\shimx64.efi
           dp: 04 01 2a 00 01 00 00 00 00 08 00 00 00 00 00 00 00 c0 12 00 00 00 00 00 df ee 91 0f 0b 59 ed 4a a6 27 c9 11 6a 17 43 83 02 02 / 04 04 34 00 5c 00 45 00 46 00 49 00 5c 00 66 00 65 00 64 00 6f 00 72 00 61 00 5c 00 73 00 68 00 69 00 6d 00 78 00 36 00 34 00 2e 00 65 00 66 00 69 00 00 00 / 7f ff 04 00

Maybe it is losing access to its storage? Can you clone your storage to another device and test running on that? (It might not be a bad idea to have a backup anyway.)

There have been other reports of kworker soaking up CPU time and making the system feel like it’s running at ~1 frame/second. Those reports, on both Fedora and Arch Linux, seem to be tied to the amdgpu driver.

On Fedora, I downgraded to 6.10.9 and that seemed to mitigate the issue. I have since upgraded to 6.10.13 from updates-testing, because it includes other fixes for the amdgpu driver.

Interesting that it’s happening with Intel integrated graphics here…

Hi, just some updates: I was finally able to access the logs by running

sudo -i

preemptively, so I was able to get my dmesg output. I think I’ve traced it down to my Intel wifi drivers. If anyone has insights, I’d love to know; otherwise, I’ve opened Bugzilla reports for this on Red Hat and Kernel. The weirdest thing is that it only happens in certain buildings at my university, which makes it a lot easier to repro now.
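For anyone else stuck at the point where sudo no longer starts during the lockup, here is a minimal sketch of capturing the kernel log ahead of time. It assumes a util-linux dmesg that supports --follow and a journal that is stored persistently; the LOG path and both function names are hypothetical, chosen only for illustration.

```shell
# Sketch: start kernel-log capture *before* the lockup, while sudo still works.
# LOG is a hypothetical path; any location on local storage will do.
LOG="${LOG:-$HOME/lockup-dmesg.log}"

capture_kernel_log() {
    # --follow keeps dmesg running and prints new kernel messages as they
    # arrive; tee shows them on screen and appends them to the file.
    sudo dmesg --follow | tee -a "$LOG"
}

previous_boot_kernel_log() {
    # After a forced restart, the journal may still hold the kernel messages
    # from the boot that locked up (-k: kernel only, -b -1: previous boot).
    journalctl -k -b -1 --no-pager
}
```

Run capture_kernel_log in a spare terminal at the start of a session and leave it open. Whether the last lines reach the disk before a hard power-off is not guaranteed, so previous_boot_kernel_log is worth checking as well.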

Dmesg output:


watchdog: BUG: soft lockup - CPU#5 stuck for 78s! [kworker/u24:6:35898]
Modules linked in: wireguard curve25519_x86_64 libcurve25519_generic ip6_udp_tunnel udp_tunnel rfcomm snd_seq_dummy snd_hrtimer nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib ip_set qrtr bnep uinput snd_ctl_led snd_soc_sof_sdw snd_soc_intel_hda_dsp_common sunrpc snd_sof_probes snd_soc_intel_sof_maxim_common snd_soc_rt715_sdca snd_soc_rt1316_sdw snd_hda_codec_hdmi regmap_sdw_mbq regmap_sdw snd_soc_dmic snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel snd_sof_intel_hda_mlink binfmt_misc soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match vfat snd_soc_acpi fat soundwire_generic_allocation soundwire_bus intel_uncore_frequency intel_uncore_frequency_common x86_pkg_temp_thermal snd_soc_core intel_powerclamp iwlmvm coretemp snd_compress ac97_bus snd_pcm_dmaengine kvm_intel snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec mac80211 kvm
 snd_hda_core snd_hwdep irqbypass snd_seq libarc4 rapl btusb snd_seq_device processor_thermal_device_pci spi_nor hid_sensor_als dell_laptop iTCO_wdt btrtl intel_cstate intel_pmc_bxt mei_hdcp mei_pxp mtd spi_ljca gpio_ljca i2c_ljca iTCO_vendor_support intel_rapl_msr dell_wmi iwlwifi intel_uncore snd_pcm btintel hid_sensor_trigger processor_thermal_device dell_wmi_ddv pcspkr btbcm processor_thermal_wt_hint dell_smbios hid_sensor_iio_common snd_timer btmtk processor_thermal_rfim dcdbas industrialio_triggered_buffer cfg80211 bluetooth dell_smm_hwmon dell_wmi_sysman firmware_attributes_class ledtrig_audio dell_wmi_descriptor wmi_bmof usb_ljca mei_me snd spi_intel_pci processor_thermal_rapl kfifo_buf spi_intel i2c_i801 industrialio mei rfkill intel_rapl_common soundcore idma64 i2c_smbus processor_thermal_wt_req thunderbolt igen6_edac processor_thermal_power_floor processor_thermal_mbox int3403_thermal intel_skl_int3472_tps68470 tps68470_regulator int340x_thermal_zone intel_pmc_core clk_tps68470 intel_vsec
 nft_reject_inet pmt_telemetry intel_hid int3400_thermal nf_reject_ipv4 pmt_class intel_skl_int3472_discrete acpi_thermal_rel sparse_keymap acpi_pad acpi_tad nf_reject_ipv6 joydev nft_reject nft_masq nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables i2c_dev loop nfnetlink zram xe drm_ttm_helper gpu_sched drm_suballoc_helper drm_gpuvm drm_exec hid_sensor_hub intel_ishtp_hid i915 crct10dif_pclmul crc32_pclmul crc32c_intel i2c_algo_bit polyval_clmulni drm_buddy polyval_generic ttm nvme ghash_clmulni_intel nvme_core drm_display_helper ucsi_acpi sha512_ssse3 video hid_multitouch typec_ucsi intel_ish_ipc sha256_ssse3 spi_pxa2xx_platform sha1_ssse3 typec dw_dmac cec intel_ishtp nvme_auth i2c_hid_acpi i2c_hid wmi pinctrl_tigerlake serio_raw ip6_tables ip_tables fuse
CPU: 5 PID: 35898 Comm: kworker/u24:6 Tainted: G             L     6.8.5-301.fc40.x86_64 #1
Hardware name: Dell Inc. XPS 9315/00KRKP, BIOS 1.23.0 08/08/2024
Workqueue: events_unbound cfg80211_wiphy_work [cfg80211]
RIP: 0010:iwl_mvm_scan_umac_v14_and_above+0x4f3/0xde0 [iwlmvm]
Code: 54 24 30 4c 89 54 24 38 4c 89 44 24 40 eb 0f 83 c6 01 40 0f b6 c6 39 e8 0f 83 fa 00 00 00 40 0f b6 c6 48 8d 04 80 49 8d 3c 86 <44> 39 7f 04 75 df 0f b6 47 11 3c 80 74 15 0f b6 14 24 80 fa 80 0f
RSP: 0018:ffffb64c405a3848 EFLAGS: 00000297
RAX: 000000000000002d RBX: ffff9feac1fcd000 RCX: 0000000000000000
RDX: ffff9feac1fcd46d RSI: 00000000cbac4409 RDI: ffff9fede2e60324
RBP: 000000000000011e R08: 0000000000000002 R09: 0000000000000000
R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000003
R13: 0000000000000000 R14: ffff9fede2e60270 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff9ff22f740000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc961bdc000 CR3: 000000029b422000 CR4: 0000000000f50ef0
PKRU: 55555554
Call Trace:
 <IRQ>
 ? watchdog_timer_fn+0x1ea/0x270
 ? __pfx_watchdog_timer_fn+0x10/0x10
 ? __hrtimer_run_queues+0x12f/0x2a0
 ? hrtimer_interrupt+0xf8/0x230
 ? __sysvec_apic_timer_interrupt+0x4a/0x140
 ? sysvec_apic_timer_interrupt+0x6d/0x90
 </IRQ>
 <TASK>
 ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
 ? iwl_mvm_scan_umac_v14_and_above+0x4f3/0xde0 [iwlmvm]
 ? iwl_mvm_scan_umac_v14_and_above+0x443/0xde0 [iwlmvm]
 iwl_mvm_reg_scan_start+0x3e7/0x660 [iwlmvm]
 iwl_mvm_mac_hw_scan+0x4e/0x70 [iwlmvm]
 drv_hw_scan+0x9f/0x150 [mac80211]
 __ieee80211_start_scan+0x296/0x750 [mac80211]
 ? cfg80211_scan_6ghz+0x3f2/0xef0 [cfg80211]
 rdev_scan+0x25/0xd0 [cfg80211]
 cfg80211_scan_6ghz+0x48b/0xef0 [cfg80211]
 ? ttwu_do_activate+0x64/0x220
 ? try_to_wake_up+0x233/0x670
 ___cfg80211_scan_done+0x1e3/0x250 [cfg80211]
 cfg80211_wiphy_work+0xab/0xe0 [cfg80211]
 process_one_work+0x16d/0x330
 worker_thread+0x273/0x3c0
 ? __pfx_worker_thread+0x10/0x10
 kthread+0xe5/0x120
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x31/0x50
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1b/0x30
 </TASK>

This makes me think it is actually a network issue.

In the past I had a failing wifi card. When it disconnected while the PC was expecting a response from the internet, it would cause a hard lock and timeout on a CPU, which led to crashes. Nothing in the logs showed the cause, and I finally fixed it by replacing the wifi card (after already replacing the motherboard and CPU, since CPU lock and timeouts were what the abrt and crash logs reported). Once the network was working properly, the crashes were totally eliminated.

In my case, what finally pointed to the cause was the wifi repeatedly disconnecting from and reconnecting to the AP, as shown in both the dmesg and journalctl logs.

Yeah, it doesn’t crash when wifi is off.

Is there any way to tell whether it’s due to my wifi card or the drivers? dmesg shows the CPU stuck in a specific function, iwl_mvm_scan_umac_v14_and_above, so I’m assuming that the driver is the culprit. I submitted a bugzilla, but I’m not sure if there are any ways to mitigate this issue while I wait for a response.

Does either dmesg or journalctl show repeated disconnect & reconnect with the AP in those buildings where the problem occurs?

Do the logs shown with the abrt notifications show any details? Those types of errors should provide crash dumps, oops dumps, or similar to look at.

Haven’t checked journalctl, but dmesg shows nothing about connects and disconnects. It only warns that a soft lockup was detected, followed by a stack trace that points to a function. I attached my dmesg below; it basically repeats itself every 16 seconds.

My logs also show nothing at all. I think I’ll just submit a Red Hat Bugzilla as well and hope for a patch.

[55337.971903] watchdog: BUG: soft lockup - CPU#5 stuck for 130s! [kworker/u24:6:35898]
[55337.971910] Modules linked in: wireguard curve25519_x86_64 libcurve25519_generic ip6_udp_tunnel udp_tunnel rfcomm snd_seq_dummy snd_hrtimer nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib ip_set qrtr bnep uinput snd_ctl_led snd_soc_sof_sdw snd_soc_intel_hda_dsp_common sunrpc snd_sof_probes snd_soc_intel_sof_maxim_common snd_soc_rt715_sdca snd_soc_rt1316_sdw snd_hda_codec_hdmi regmap_sdw_mbq regmap_sdw snd_soc_dmic snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel snd_sof_intel_hda_mlink binfmt_misc soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match vfat snd_soc_acpi fat soundwire_generic_allocation soundwire_bus intel_uncore_frequency intel_uncore_frequency_common x86_pkg_temp_thermal snd_soc_core intel_powerclamp iwlmvm coretemp snd_compress ac97_bus snd_pcm_dmaengine kvm_intel snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec mac80211 kvm
[55337.971941]  snd_hda_core snd_hwdep irqbypass snd_seq libarc4 rapl btusb snd_seq_device processor_thermal_device_pci spi_nor hid_sensor_als dell_laptop iTCO_wdt btrtl intel_cstate intel_pmc_bxt mei_hdcp mei_pxp mtd spi_ljca gpio_ljca i2c_ljca iTCO_vendor_support intel_rapl_msr dell_wmi iwlwifi intel_uncore snd_pcm btintel hid_sensor_trigger processor_thermal_device dell_wmi_ddv pcspkr btbcm processor_thermal_wt_hint dell_smbios hid_sensor_iio_common snd_timer btmtk processor_thermal_rfim dcdbas industrialio_triggered_buffer cfg80211 bluetooth dell_smm_hwmon dell_wmi_sysman firmware_attributes_class ledtrig_audio dell_wmi_descriptor wmi_bmof usb_ljca mei_me snd spi_intel_pci processor_thermal_rapl kfifo_buf spi_intel i2c_i801 industrialio mei rfkill intel_rapl_common soundcore idma64 i2c_smbus processor_thermal_wt_req thunderbolt igen6_edac processor_thermal_power_floor processor_thermal_mbox int3403_thermal intel_skl_int3472_tps68470 tps68470_regulator int340x_thermal_zone intel_pmc_core clk_tps68470 intel_vsec
[55337.971976]  nft_reject_inet pmt_telemetry intel_hid int3400_thermal nf_reject_ipv4 pmt_class intel_skl_int3472_discrete acpi_thermal_rel sparse_keymap acpi_pad acpi_tad nf_reject_ipv6 joydev nft_reject nft_masq nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables i2c_dev loop nfnetlink zram xe drm_ttm_helper gpu_sched drm_suballoc_helper drm_gpuvm drm_exec hid_sensor_hub intel_ishtp_hid i915 crct10dif_pclmul crc32_pclmul crc32c_intel i2c_algo_bit polyval_clmulni drm_buddy polyval_generic ttm nvme ghash_clmulni_intel nvme_core drm_display_helper ucsi_acpi sha512_ssse3 video hid_multitouch typec_ucsi intel_ish_ipc sha256_ssse3 spi_pxa2xx_platform sha1_ssse3 typec dw_dmac cec intel_ishtp nvme_auth i2c_hid_acpi i2c_hid wmi pinctrl_tigerlake serio_raw ip6_tables ip_tables fuse
[55337.972008] CPU: 5 PID: 35898 Comm: kworker/u24:6 Tainted: G             L     6.8.5-301.fc40.x86_64 #1
[55337.972010] Hardware name: Dell Inc. XPS 9315/00KRKP, BIOS 1.23.0 08/08/2024
[55337.972012] Workqueue: events_unbound cfg80211_wiphy_work [cfg80211]
[55337.972071] RIP: 0010:iwl_mvm_scan_umac_v14_and_above+0x4e7/0xde0 [iwlmvm]
[55337.972094] Code: db c6 44 24 08 00 c6 04 24 80 48 89 54 24 30 4c 89 54 24 38 4c 89 44 24 40 eb 0f 83 c6 01 40 0f b6 c6 39 e8 0f 83 fa 00 00 00 <40> 0f b6 c6 48 8d 04 80 49 8d 3c 86 44 39 7f 04 75 df 0f b6 47 11
[55337.972096] RSP: 0018:ffffb64c405a3848 EFLAGS: 00000297
[55337.972097] RAX: 000000000000006c RBX: ffff9feac1fcd000 RCX: 0000000000000000
[55337.972098] RDX: ffff9feac1fcd46d RSI: 000000007760476c RDI: ffff9fede2e60acc
[55337.972099] RBP: 000000000000011e R08: 0000000000000002 R09: 0000000000000000
[55337.972100] R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000003
[55337.972100] R13: 0000000000000000 R14: ffff9fede2e60270 R15: 0000000000000000
[55337.972101] FS:  0000000000000000(0000) GS:ffff9ff22f740000(0000) knlGS:0000000000000000
[55337.972102] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[55337.972103] CR2: 00007fc961bdc000 CR3: 000000029b422000 CR4: 0000000000f50ef0
[55337.972104] PKRU: 55555554
[55337.972105] Call Trace:
[55337.972107]  <IRQ>
[55337.972108]  ? watchdog_timer_fn+0x1ea/0x270
[55337.972112]  ? __pfx_watchdog_timer_fn+0x10/0x10
[55337.972114]  ? __hrtimer_run_queues+0x12f/0x2a0
[55337.972117]  ? hrtimer_interrupt+0xf8/0x230
[55337.972119]  ? __sysvec_apic_timer_interrupt+0x4a/0x140
[55337.972122]  ? sysvec_apic_timer_interrupt+0x6d/0x90
[55337.972125]  </IRQ>
[55337.972126]  <TASK>
[55337.972126]  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
[55337.972131]  ? iwl_mvm_scan_umac_v14_and_above+0x4e7/0xde0 [iwlmvm]
[55337.972146]  ? iwl_mvm_scan_umac_v14_and_above+0x443/0xde0 [iwlmvm]
[55337.972160]  iwl_mvm_reg_scan_start+0x3e7/0x660 [iwlmvm]
[55337.972179]  iwl_mvm_mac_hw_scan+0x4e/0x70 [iwlmvm]
[55337.972196]  drv_hw_scan+0x9f/0x150 [mac80211]
[55337.972264]  __ieee80211_start_scan+0x296/0x750 [mac80211]
[55337.972301]  ? cfg80211_scan_6ghz+0x3f2/0xef0 [cfg80211]
[55337.972337]  rdev_scan+0x25/0xd0 [cfg80211]
[55337.972372]  cfg80211_scan_6ghz+0x48b/0xef0 [cfg80211]
[55337.972407]  ? ttwu_do_activate+0x64/0x220
[55337.972409]  ? try_to_wake_up+0x233/0x670
[55337.972411]  ___cfg80211_scan_done+0x1e3/0x250 [cfg80211]
[55337.972446]  cfg80211_wiphy_work+0xab/0xe0 [cfg80211]
[55337.972479]  process_one_work+0x16d/0x330
[55337.972481]  worker_thread+0x273/0x3c0
[55337.972484]  ? __pfx_worker_thread+0x10/0x10
[55337.972485]  kthread+0xe5/0x120
[55337.972487]  ? __pfx_kthread+0x10/0x10
[55337.972488]  ret_from_fork+0x31/0x50
[55337.972491]  ? __pfx_kthread+0x10/0x10
[55337.972492]  ret_from_fork_asm+0x1b/0x30
[55337.972495]  </TASK>
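
Since the same report repeats over and over, it can help to boil the log down to which CPU was stuck, for how long, and in which function. A rough sketch, assuming the log lines look like the ones above; summarize_lockups is a made-up helper name, not an existing tool.

```shell
# Sketch: condense soft-lockup reports from kernel log text read on stdin.
summarize_lockups() {
    # Keep only the watchdog header line and the RIP line of each report,
    # then rewrite each into a short summary.
    grep -E 'soft lockup|RIP: 0010:' |
        sed -E \
            -e 's/.*soft lockup - (CPU#[0-9]+) stuck for ([0-9]+)s! \[([^]]+)\].*/\1 stuck \2s in \3/' \
            -e 's/.*RIP: 0010:([A-Za-z0-9_.]+)\+.*\[([a-z0-9_]+)\].*/  at \1 [\2]/'
}
```

Typical use would be `sudo dmesg | summarize_lockups`. For the report above this gives `CPU#5 stuck 130s in kworker/u24:6:35898` followed by `  at iwl_mvm_scan_umac_v14_and_above [iwlmvm]`, which makes it easy to see whether every repeat is stuck in the same function.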

Those messages are exactly what I was seeing when my system was crashing as a result of a failing wifi card.

Soft lockups, with the CPU stuck for long enough to cause the crash. Note that even though the first line you posted shows kworker, the lines that follow are all network related, specifically wifi.

The way I understood the issue with my system was that the kernel made a request via the network, and the dropped link then prevented a timely reply, so the CPU was locked pending a reply that could not be received.

The kernel you show in that post is 6.8.5, which is from April, so I wonder if this is happening on a live media boot or if the system just has not been updated since it was installed?

A quick and inexpensive test:
Get a USB wifi dongle and use it instead of the current wifi. If that fixes the problem, then you know it is the wifi card. If it does not, then the problem is still network related but not the card.
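
To confirm that the connection really goes through the dongle and not the built-in card during such a test, it helps to see which kernel driver sits behind each wireless interface. A minimal sketch that reads sysfs; list_wifi_drivers is a hypothetical helper, and its optional argument exists only so it can be pointed at a different directory.

```shell
# Sketch: print "<interface> <driver>" for every wireless network interface.
list_wifi_drivers() {
    # $1: directory holding the network interfaces (defaults to sysfs)
    net_dir="${1:-/sys/class/net}"
    for dev in "$net_dir"/*; do
        # Interfaces managed by cfg80211 expose a phy80211 entry;
        # skip everything else (ethernet, loopback, ...).
        [ -e "$dev/phy80211" ] || continue
        # device/driver is a symlink to the bound driver's directory.
        driver=$(basename "$(readlink -f "$dev/device/driver")")
        printf '%s %s\n' "$(basename "$dev")" "$driver"
    done
}
```

Calling list_wifi_drivers with no arguments should list the built-in Intel card with the iwlwifi driver, while a dongle would be expected to appear with a different driver name, depending on its chipset.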


I’m going to get a wifi card and test it out; however, I’m still not convinced that it is a hardware issue rather than a software issue, since I can see the stack trace.

I tried downgrading to see if the newer kernels introduced a regression, but it happens on both 6.8 and 6.10.

I’m going to try out a dongle. I agree with you that it is a wifi card, but I’m not sure if it’s the wifi card failing (hardware), or a bug in the kernel, and I’m not sure how to test it unless I’m using the same wifi card as in my computer.

Since try_to_wake_up is in the call stack, it might be a power management problem. Interestingly, the driver appears to show that power management is disabled by default (at least on 6.10.11).

$ modinfo -p iwlwifi | grep power
power_save:enable WiFi power management (default: disable) (bool)
power_level:default power save level (range from 1 - 5, default: 1) (int)
$ uname -r
6.10.11-100.fc39.x86_64
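
The modinfo output only describes the parameters and their defaults. A small sketch for checking what the running system is actually using, assuming the iwlwifi module is loaded and the iw tool is installed; the function name and the wlp0s20f3 interface name are placeholders.

```shell
# Sketch: show the wifi power-save state of the running system.
show_wifi_power_state() {
    # $1: wireless interface name (placeholder default; adjust to your system)
    iface="${1:-wlp0s20f3}"
    param=/sys/module/iwlwifi/parameters/power_save
    if [ -r "$param" ]; then
        # Value the loaded module was started with (Y or N)
        printf 'iwlwifi power_save parameter: %s\n' "$(cat "$param")"
    else
        printf 'iwlwifi power_save parameter: not available\n'
    fi
    if command -v iw >/dev/null 2>&1; then
        # Per-interface power-save state as seen by cfg80211
        iw dev "$iface" get power_save || true
    fi
}
```

The two values can differ, because userspace (NetworkManager, for example) may toggle per-interface power save independently of the module parameter.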

Were it me, I might try turning off (or turning down) power management in the BIOS.

Every power option available right now relates only to the CPU or to AC/battery. Do you think power-profiles-daemon could be a cause? I know I have these issues regardless of the power mode I’m on.

I once had some flaky Dell computers that would hang if the BIOS was set to “deeper” sleep modes. I forget how they are labeled (S0, S3?), but I was able to fix them by changing that BIOS setting so they did not attempt to go into the deeper power save modes.

Another thing you might want to look into is updating your firmware.

I’ll update after the weekend when I am able to go back into the problematic buildings.

Firmware is updated to latest versions according to fwupd.

I’ll also try out the Ubuntu LTS kernel and the newest 6.11 stable to see if anything gets fixed.

I don’t know that it means anything or if it could be related to your specific problem in any way, but when I searched kernel.org for “XPS 9315”, this curious comment came up:

… the Linux [hardware] descriptions were buggy so use Windows definition instead. …

-- [PATCH 0/2] Ignore bad graph port nodes on Dell XPS 9315

There are some ACPI (Advanced Configuration and Power Interface) related changes in that patch. (But again, I don’t know enough about the code base to say whether any of what was changed could have anything to do with this problem. It just looks a little suspicious.)

This is resolved, thanks all for the help. It ended up being something in the kernel drivers, and the patch works for me.

https://bugzilla.kernel.org/show_bug.cgi?id=219375
