In the last few days I’ve started having strange kernel issues. The first was about four days ago: an exception (IIRC a page fault) in a kernel thread that left it stuck in a spin loop. After updating and rebooting, I’ve had three panics with messages about a clobbered stack canary in `__x64_sys_poll`. The machine is currently running memtest, so the information I can provide is a bit limited at the moment, but here’s an overview:
- OS: Fedora 42, kernel is whatever is currently in `updates-testing`
- The system is about five months old
- CPU: Intel Core Ultra 9 285K
- Memory: 4x Crucial CP32G56C46U5. It was running on a 5200 MT/s XMP profile, but the issue has persisted after disabling XMP.
- Motherboard: ASUS PRIME Z890M-PLUS WIFI. I tried updating to the latest firmware and the issue still persists.
- GPU: AMD Radeon RX 7800 XT
Every occurrence has been whilst playing Kerbal Space Program in Wine, but the system had never previously had issues either running KSP or running other workloads, so I’m not sure what to make of that. I’m not aware of any obvious trigger; the last time it happened was after a couple of hours of running KSP.
Things I’ve tried:
- Updating to the latest kernel
- Digging through the forums and the Red Hat and kernel Bugzillas for reports of similar issues, without finding anything
- Running memtest. No errors were reported in the full pass that I ran yesterday, nor so far in the one-and-a-half passes of the run that’s currently in progress
- Updating the BIOS
- Disabling XMP
Theories:
- Genuine stack-overflow bug in the kernel. It seems unlikely that I alone would be experiencing a serious bug in one of the most used syscalls in the kernel.
- Canary overwritten by an exploitation attempt. This also seems unlikely, since there are surely simpler ways to exploit a single-user Linux system.
- Hardware issue. Given the newness of the system and that I’m more used to working with older components, it seems pretty likely that I’ve misinstalled or misconfigured something that’s leading to memory corruption. But it seems very odd that it’s been running perfectly fine for months, that the issue that manifests is specifically a clobbered canary in `poll`, and that memtest isn’t showing any issues.
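One cheap check on the "always in `poll`" observation is to tally which function each stack-protector panic actually names. Below is a minimal sketch, assuming the panic text was captured somewhere readable (for example pstore, netconsole, or a transcribed photo) and that it uses the kernel's usual "stack-protector: Kernel stack is corrupted in:" wording. The function name `tally_canary_panics` and the sample log lines are hypothetical, made up for illustration, and not taken from this machine.

```python
import re
from collections import Counter

# Matches the kernel's stack-protector panic line, e.g.
# "... stack-protector: Kernel stack is corrupted in: __x64_sys_poll+0xf7/0x100"
PANIC_RE = re.compile(r"stack-protector: Kernel stack is corrupted in: (\S+)")


def tally_canary_panics(log_text):
    """Count stack-protector panics per reporting function (offset stripped)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = PANIC_RE.search(line)
        if match:
            # "__x64_sys_poll+0xf7/0x100" -> "__x64_sys_poll"
            symbol = match.group(1).split("+", 1)[0]
            counts[symbol] += 1
    return counts


# Hypothetical sample lines for illustration only; offsets and timestamps are invented.
SAMPLE_LOG = """\
[ 7201.113] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: __x64_sys_poll+0xf7/0x100
[ 7201.114] CPU: 3 PID: 4242 Comm: example
[ 5120.001] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: __x64_sys_poll+0xf7/0x100
"""

if __name__ == "__main__":
    for symbol, count in tally_canary_panics(SAMPLE_LOG).most_common():
        print(f"{symbol}: {count}")
```

If every panic names the same function, that is at least weak evidence against purely random memory corruption, which would be expected to hit varying code paths.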
I’m at a loss for how to go about further diagnostics. Unfortunately semester starts next week, so a half-dozen day-long memtest runs after swapping out DIMMs or reseating components are not really ideal. Any assistance in narrowing down the issue would be greatly appreciated.