Hard hangs while typing after resume from suspend

I’ve run Fedora KDE spin on a Thinkpad T460 laptop (8GB, Intel® Core™ i5-6300U, Intel HD Graphics 520) with external monitor and Logitech wireless USB mouse for 7 years, the last few atop Wayland. In that time suspend has worked reasonably well; sometimes the laptop comes out of suspend “on its own” within minutes of entering suspend (web search suggests this might be a USB issue).

But for a few weeks now, I resume after choosing Sleep in KDE, and soon in the middle of typing the system hangs. No cursor motion, keyboard unresponsive, lid close does nothing; if the fan was on it stays on. I can only do a hard power-off. Upon powering on, there is no journald information about the hang, typically no journald entries from around the hang, and then lots of boot logging with no obvious smoking gun. One non-standard thing about my setup is I mount an NTFS partition read-write with documents and Firefox and Thunderbird app data (not kernel, not root, not home directories) using the ntfs-3g userspace driver; maybe that FUSE driver hangs hard, maybe when KDE’s baloo indexer reindexes some file on NTFS? But this is pure conjecture, and doesn’t explain why it never hangs until after I suspend then resume.

I almost always run konsole, Firefox and Thunderbird for Wayland, and KeePassXC. The hang has occurred while typing in the terminal, typing in Firefox, and once just typing my password at the lock screen. It’s usually within a few minutes of resuming. I don’t think it has ever hung while I was just browsing and scrolling. It doesn’t seem to be a hardware problem, since I’ve never had a hang after power-on, only after resume from suspend.

Any suggestions on how to debug this? I can’t find a Fedora guide to debugging suspend-resume. Is Ubuntu’s guide appropriate? It suggests enabling /sys/power/pm_trace. Others suggest disabling the kernel i915 Intel graphics driver. The wrinkle here is that suspend and resume both seem to work, but shortly after resume I reliably experience the hard hang. So maybe I need to consult a “debugging input hangs” guide. Thanks! :hugs:
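One low-effort starting point, sketched here under assumptions: after the forced power-off, dump the tail of the kernel log from the boot that hung. The helper below only builds and runs a `journalctl` command. The function names and the 200-line default are illustrative, and it assumes the journal is stored persistently so that the previous boot is still available.

```python
import subprocess

def previous_boot_kernel_log_cmd(boot_offset=-1, tail=200):
    """Build a journalctl command for the kernel log of an earlier boot.

    boot_offset=-1 selects the boot before the current one, i.e. the
    boot that ended in the hang. Both defaults are illustrative.
    """
    return [
        "journalctl",
        "-k",                        # kernel messages only
        f"--boot={boot_offset}",     # which boot to read
        "-n", str(tail),             # only the last `tail` lines
        "--no-pager",
    ]

def read_previous_boot_kernel_log(boot_offset=-1, tail=200):
    """Run the command and return its output as text.

    Reading the system journal may require root or suitable group
    membership; on failure this simply returns whatever was printed.
    """
    cmd = previous_boot_kernel_log_cmd(boot_offset, tail)
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout
```

Calling `print(read_previous_boot_kernel_log())` right after rebooting shows the last kernel messages that made it to disk before the hang. If the output is empty, the hang may have stopped journald from flushing, which matches the "no entries from around the hang" symptom described above.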

I’m working on a Thinkpad X270 and I’m experiencing the same. Similar for other Thinkpad notebooks (also X270s) in my family.

Behavior is always the same: during typing - and only during typing - the machine locks up and becomes fatally unresponsive. I hadn’t noticed that this only occurs after a resume from suspend, but you’re correct; the machine does not freeze when freshly rebooted.

I’ve also noticed the upper mouse buttons (the ones with the red stripe) becoming unresponsive after a resume from suspend, sometimes. In these cases I need to either unload/reload the drivers or just suspend/resume again.

I have not yet been able to find out what the cause is.

This is what the dmesg from before the crash shows.

Mai 25 00:30:34 localhost.localdomain kernel: usb 1-8: device not accepting address 16, error -71
Mai 25 00:30:34 localhost.localdomain kernel: usb 1-8: WARN: invalid context state for evaluate context command.
Mai 25 00:30:34 localhost.localdomain kernel: usb usb1-port8: unable to enumerate USB device
Mai 25 00:30:35 localhost.localdomain kernel: usb 1-8: new full-speed USB device number 17 using xhci_hcd
Mai 25 00:30:36 localhost.localdomain kernel: usb 1-8: device descriptor read/64, error -71
Mai 25 00:30:36 localhost.localdomain kernel: usb 1-8: device descriptor read/64, error -71
Mai 25 00:30:36 localhost.localdomain kernel: usb 1-8: new full-speed USB device number 18 using xhci_hcd
Mai 25 00:30:36 localhost.localdomain kernel: usb 1-8: device descriptor read/64, error -71
Mai 25 00:30:36 localhost.localdomain kernel: usb 1-8: device descriptor read/64, error -71
Mai 25 00:30:36 localhost.localdomain kernel: usb usb1-port8: attempt power cycle
Mai 25 00:30:37 localhost.localdomain kernel: usb 1-8: new full-speed USB device number 19 using xhci_hcd
Mai 25 00:30:37 localhost.localdomain kernel: usb 1-8: Device not responding to setup address.
Mai 25 00:30:37 localhost.localdomain kernel: usb 1-8: Device not responding to setup address.
Mai 25 00:30:37 localhost.localdomain kernel: usb 1-8: device not accepting address 19, error -71
Mai 25 00:30:37 localhost.localdomain kernel: usb 1-8: WARN: invalid context state for evaluate context command.
Mai 25 00:30:37 localhost.localdomain kernel: usb 1-8: new full-speed USB device number 20 using xhci_hcd
Mai 25 00:30:37 localhost.localdomain kernel: usb 1-8: Device not responding to setup address.
Mai 25 00:30:37 localhost.localdomain kernel: watchdog: BUG: soft lockup - CPU#1 stuck for 134s! [kworker/1:4:266]

[...]

Mai 25 00:34:05 localhost.localdomain kernel: watchdog: BUG: soft lockup - CPU#1 stuck for 328s! [kworker/1:4:266]
Mai 25 00:34:05 localhost.localdomain kernel: Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 >
Mai 25 00:34:05 localhost.localdomain kernel:  snd_hda_codec_generic tps6598x typec intel_rapl_common snd_soc_acpi iwlmvm snd_soc_core intel_pmc_core_pltdrv snd_compress intel_pmc_core ac97_>
Mai 25 00:34:05 localhost.localdomain kernel:  rtsx_pci sha256_ssse3 sha1_ssse3 cec nvme_auth video wmi serio_raw scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables fuse
Mai 25 00:34:05 localhost.localdomain kernel: CPU: 1 PID: 266 Comm: kworker/1:4 Tainted: G        W    L     6.8.9-300.fc40.x86_64 #1
Mai 25 00:34:05 localhost.localdomain kernel: Hardware name: LENOVO 20K5S46403/20K5S46403, BIOS R0IET67W (1.45 ) 02/22/2022
Mai 25 00:34:05 localhost.localdomain kernel: Workqueue: events linkwatch_event
Mai 25 00:34:05 localhost.localdomain kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x72/0x2d0
Mai 25 00:34:05 localhost.localdomain kernel: Code: 77 79 f0 0f ba 2b 08 0f 92 c2 8b 03 0f b6 d2 c1 e2 08 30 e4 09 d0 3d ff 00 00 00 77 55 85 c0 74 10 0f b6 03 84 c0 74 09 f3 90 <0f> b6 03 8>
Mai 25 00:34:05 localhost.localdomain kernel: RSP: 0000:ffffa51b0045fba0 EFLAGS: 00000202
Mai 25 00:34:05 localhost.localdomain kernel: RAX: 0000000000000001 RBX: ffff9705cb57b428 RCX: 0000000000000000
Mai 25 00:34:05 localhost.localdomain kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9705cb57b428
Mai 25 00:34:05 localhost.localdomain kernel: RBP: ffff9705c21fd138 R08: 0000000000000000 R09: 0000000000000000
Mai 25 00:34:05 localhost.localdomain kernel: R10: ffff9705c21fd200 R11: 0000000000000010 R12: ffff9705cb57b428
Mai 25 00:34:05 localhost.localdomain kernel: R13: ffff9705c21fd000 R14: ffffa51b0045fc97 R15: ffffa51b0045fcb8
Mai 25 00:34:05 localhost.localdomain kernel: FS:  0000000000000000(0000) GS:ffff970cd1680000(0000) knlGS:0000000000000000
Mai 25 00:34:05 localhost.localdomain kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mai 25 00:34:05 localhost.localdomain kernel: CR2: 0000561675db5028 CR3: 0000000105296002 CR4: 00000000003706f0
Mai 25 00:34:05 localhost.localdomain kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mai 25 00:34:05 localhost.localdomain kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mai 25 00:34:05 localhost.localdomain kernel: Call Trace:
Mai 25 00:34:05 localhost.localdomain kernel:  <IRQ>
Mai 25 00:34:05 localhost.localdomain kernel:  ? watchdog_timer_fn+0x1ea/0x270
Mai 25 00:34:05 localhost.localdomain kernel:  ? __pfx_watchdog_timer_fn+0x10/0x10
Mai 25 00:34:05 localhost.localdomain kernel:  ? __hrtimer_run_queues+0x113/0x280
Mai 25 00:34:05 localhost.localdomain kernel:  ? ktime_get_update_offsets_now+0x49/0x110
Mai 25 00:34:05 localhost.localdomain kernel:  ? hrtimer_interrupt+0xf8/0x230
Mai 25 00:34:05 localhost.localdomain kernel:  ? __sysvec_apic_timer_interrupt+0x4a/0x140
Mai 25 00:34:05 localhost.localdomain kernel:  ? sysvec_apic_timer_interrupt+0x6d/0x90
Mai 25 00:34:05 localhost.localdomain kernel:  </IRQ>
Mai 25 00:34:05 localhost.localdomain kernel:  <TASK>
Mai 25 00:34:05 localhost.localdomain kernel:  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
Mai 25 00:34:05 localhost.localdomain kernel:  ? native_queued_spin_lock_slowpath+0x72/0x2d0
Mai 25 00:34:05 localhost.localdomain kernel:  _raw_spin_lock+0x29/0x30
Mai 25 00:34:05 localhost.localdomain kernel:  e1000e_get_stats64+0x22/0x120 [e1000e]
Mai 25 00:34:05 localhost.localdomain kernel:  dev_get_stats+0x65/0x150
Mai 25 00:34:05 localhost.localdomain kernel:  ? __nla_reserve+0x3c/0x50
Mai 25 00:34:05 localhost.localdomain kernel:  rtnl_fill_stats+0x3b/0x130
Mai 25 00:34:05 localhost.localdomain kernel:  rtnl_fill_ifinfo+0x83d/0x1550
Mai 25 00:34:05 localhost.localdomain kernel:  ? __kmalloc_node_track_caller+0x33a/0x4d0
Mai 25 00:34:05 localhost.localdomain kernel:  ? __alloc_skb+0x8a/0x1a0
Mai 25 00:34:05 localhost.localdomain kernel:  rtmsg_ifinfo_build_skb+0xae/0x110
Mai 25 00:34:05 localhost.localdomain kernel:  rtmsg_ifinfo+0x3c/0x90
Mai 25 00:34:05 localhost.localdomain kernel:  netdev_state_change+0x89/0x90
Mai 25 00:34:05 localhost.localdomain kernel:  linkwatch_do_dev+0x4f/0x60
Mai 25 00:34:05 localhost.localdomain kernel:  __linkwatch_run_queue+0xdf/0x220
Mai 25 00:34:05 localhost.localdomain kernel:  linkwatch_event+0x31/0x40
Mai 25 00:34:05 localhost.localdomain kernel:  process_one_work+0x16f/0x330
Mai 25 00:34:05 localhost.localdomain kernel:  worker_thread+0x273/0x3c0
Mai 25 00:34:05 localhost.localdomain kernel:  ? __pfx_worker_thread+0x10/0x10
Mai 25 00:34:05 localhost.localdomain kernel:  kthread+0xe5/0x120
Mai 25 00:34:05 localhost.localdomain kernel:  ? __pfx_kthread+0x10/0x10
Mai 25 00:34:05 localhost.localdomain kernel:  ret_from_fork+0x31/0x50
Mai 25 00:34:05 localhost.localdomain kernel:  ? __pfx_kthread+0x10/0x10
Mai 25 00:34:05 localhost.localdomain kernel:  ret_from_fork_asm+0x1b/0x30
Mai 25 00:34:05 localhost.localdomain kernel:  </TASK>
Mai 25 00:34:33 localhost.localdomain kernel: watchdog: BUG: soft lockup - CPU#1 stuck for 354s! [kworker/1:4:266]
Mai 25 00:34:33 localhost.localdomain kernel: Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 >
Mai 25 00:34:33 localhost.localdomain kernel:  snd_hda_codec_generic tps6598x typec intel_rapl_common snd_soc_acpi iwlmvm snd_soc_core intel_pmc_core_pltdrv snd_compress intel_pmc_core ac97_>
Mai 25 00:34:33 localhost.localdomain kernel:  rtsx_pci sha256_ssse3 sha1_ssse3 cec nvme_auth video wmi serio_raw scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables fuse
Mai 25 00:34:33 localhost.localdomain kernel: CPU: 1 PID: 266 Comm: kworker/1:4 Tainted: G        W    L     6.8.9-300.fc40.x86_64 #1
Mai 25 00:34:33 localhost.localdomain kernel: Hardware name: LENOVO 20K5S46403/20K5S46403, BIOS R0IET67W (1.45 ) 02/22/2022
Mai 25 00:34:33 localhost.localdomain kernel: Workqueue: events linkwatch_event
Mai 25 00:34:33 localhost.localdomain kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x72/0x2d0
Mai 25 00:34:33 localhost.localdomain kernel: Code: 77 79 f0 0f ba 2b 08 0f 92 c2 8b 03 0f b6 d2 c1 e2 08 30 e4 09 d0 3d ff 00 00 00 77 55 85 c0 74 10 0f b6 03 84 c0 74 09 f3 90 <0f> b6 03 8>
Mai 25 00:34:33 localhost.localdomain kernel: RSP: 0000:ffffa51b0045fba0 EFLAGS: 00000202
Mai 25 00:34:33 localhost.localdomain kernel: RAX: 0000000000000001 RBX: ffff9705cb57b428 RCX: 0000000000000000
Mai 25 00:34:33 localhost.localdomain kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9705cb57b428
Mai 25 00:34:33 localhost.localdomain kernel: RBP: ffff9705c21fd138 R08: 0000000000000000 R09: 0000000000000000
Mai 25 00:34:33 localhost.localdomain kernel: R10: ffff9705c21fd200 R11: 0000000000000010 R12: ffff9705cb57b428
Mai 25 00:34:33 localhost.localdomain kernel: R13: ffff9705c21fd000 R14: ffffa51b0045fc97 R15: ffffa51b0045fcb8

We do not have enough info, but this seems to be the source of the crash.
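To make the pattern in the log easier to see, here is a minimal sketch that groups the `soft lockup` reports by CPU and task. The function name and the regular expression are my own, written against the message format shown above; they are not part of any kernel or systemd tooling.

```python
import re
from collections import defaultdict

# Matches e.g. "watchdog: BUG: soft lockup - CPU#1 stuck for 134s! [kworker/1:4:266]"
LOCKUP_RE = re.compile(
    r"watchdog: BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<task>.+):(?P<pid>\d+)\]"
)

def summarize_soft_lockups(log_text):
    """Return {(cpu, task, pid): [stuck_seconds, ...]} in order of appearance."""
    summary = defaultdict(list)
    for line in log_text.splitlines():
        m = LOCKUP_RE.search(line)
        if m:
            key = (int(m["cpu"]), m["task"], int(m["pid"]))
            summary[key].append(int(m["secs"]))
    return dict(summary)

# The three lockup headers from the log above.
sample = """\
Mai 25 00:30:37 localhost.localdomain kernel: watchdog: BUG: soft lockup - CPU#1 stuck for 134s! [kworker/1:4:266]
Mai 25 00:34:05 localhost.localdomain kernel: watchdog: BUG: soft lockup - CPU#1 stuck for 328s! [kworker/1:4:266]
Mai 25 00:34:33 localhost.localdomain kernel: watchdog: BUG: soft lockup - CPU#1 stuck for 354s! [kworker/1:4:266]
"""
print(summarize_soft_lockups(sample))
# {(1, 'kworker/1:4', 266): [134, 328, 354]}
```

All three reports name the same worker on the same CPU with a growing duration, which suggests a single task that never got unstuck rather than several separate stalls.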

Look through journalctl -xe and see if it shows anything more. It could produce useful info.

I found something, but I doubt that it is related. This has to do with wifi.

Mai 25 00:33:43 abrt-notification[3374]: [🡕] System encountered a non-fatal error in iwl_mvm_rx_umac_scan_complete_notif()
░░ Subject: ABRT has detected a non-fatal system error
░░ Defined-By: ABRT
░░ Support: https://bugzilla.redhat.com/
░░ Documentation: man:abrt(1)
░░ 
░░ WARNING: CPU: 0 PID: 9847 at drivers/net/wireless/intel/iwlwifi/mvm/scan.c:3158 iwl_mvm_rx_umac_scan_complete_notif+0x1f8/0x210 [iwlmvm] [iwlmvm]
░░ 
░░ Use the abrt command-line tool for further analysis or to report
░░ the problem to the appropriate support site.

I’ll try to see if I can get anything out of abrt.

Thanks for chiming in! I had given up and just stopped using standby/suspend :frowning_face:. I looked in the logs I saved back in October 2023 and there were no usb .* (WARN|Device not responding) or soft lockup messages. I did have a Logitech F310 USB gamepad plugged in back then but I’m pretty sure I tried unplugging it. Did you do anything to get more log output?
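A check along those lines could be sketched as follows, for anyone who wants to run the same search over their own saved logs. The function name is hypothetical and the pattern is just the one quoted above; it is not an exhaustive list of suspicious kernel messages.

```python
import re

# USB warnings / unresponsive devices, or any soft lockup report.
SUSPECT_RE = re.compile(r"usb .* (WARN|Device not responding)|soft lockup")

def suspect_lines(log_text):
    """Return the lines of a saved log that match the suspect pattern."""
    return [line for line in log_text.splitlines() if SUSPECT_RE.search(line)]
```

For a log saved to a file, `suspect_lines(open(path).read())` returns the matching lines, and an empty list means none of these messages were recorded.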

I never use the Thinkpad’s mouse buttons above the trackpad, or its “mouse nipple.”

One time in May 2024, processes on my Thinkpad started hanging and there were messages like

May 04 18:41:20 fedlaptop kernel: watchdog: BUG: soft lockup - CPU#2 stuck for 708s! [kworker/2:0:173352]

in journalctl. But I was able to continue in some windows, it wasn’t an immediate hard total lockup while typing. I don’t think this happened since.

I’ll start using standby/suspend again. If anyone has guidance to get more potentially useful kernel logging, thanks in advance!

… and my Thinkpad running KDE hasn’t hung in 10 or so suspend/resume cycles! :partying_face:

Maybe software improvements since September 2023 fixed the hard hangs. Also, in the interim I switched from GRUB 2 to systemd-boot (losing secure boot).