Stack-protector: Kernel stack is corrupted in __x64_sys_poll

In the last couple of days I’ve started having strange kernel issues. The first was about four days ago: an exception (IIRC a page fault) in a kernel thread that left it stuck in a spin loop. After updating and rebooting, I’ve had three panics reporting a clobbered stack canary in __x64_sys_poll. The machine is currently running memtest, so the information I can provide is a bit limited at the moment, but here’s an overview:

  • OS: Fedora 42, kernel is whatever is currently in updates-testing
  • The system is about five months old
  • CPU: Intel Core Ultra 9 285k
  • Memory: 4x Crucial CP32G56C46U5. It was running on a 5200MT/s XMP profile, but the issue has persisted after disabling XMP
  • Motherboard: ASUS PRIME Z890M-PLUS WIFI. I tried updating to the latest firmware and the issue still persists.
  • GPU: AMD Radeon RX 7800 XT

Every occurrence has been while playing Kerbal Space Program in Wine, but the system has never previously had issues running KSP or other workloads, so I’m not sure what to make of that. I’m not aware of any obvious trigger; the last time it happened was after a couple of hours of running KSP.

Things I’ve tried:

  • Updating to the latest kernel
  • Searching the forums and the Red Hat and kernel Bugzillas for similar reports, but I haven’t found anything that matches
  • Running memtest: no errors reported in the full pass I ran yesterday, and none so far in the one-and-a-half passes of the run currently in progress
  • Updating the BIOS
  • Disabling XMP

Theories:

  1. A genuine stack buffer overflow bug in the kernel. It seems unlikely that I alone would be hitting a serious bug in one of the most heavily used syscalls in the kernel.
  2. Canary overwritten by an exploitation attempt. This also seems unlikely, since there are surely simpler ways to exploit a single-user Linux system.
  3. Hardware issue. Given how new the system is, and that I’m more used to working with older components, it seems fairly likely that I’ve misinstalled or misconfigured something that’s causing memory corruption. But it seems very odd that the machine ran perfectly fine for months, that the symptom is specifically a clobbered canary in poll, and that memtest isn’t showing any errors.

I’m at a loss for how to go about further diagnostics. Unfortunately semester starts next week, so a half-dozen day-long memtest runs after swapping out DIMMs or reseating components is not really ideal. Any assistance in narrowing down the issue would be greatly appreciated.

Does this mean you were over-clocking the memory?

It is possible that the code that detected the problem is not the code that caused the problem.

Yes, although it’s not like I was manually dialing in values to push beyond manufacturer specifications. The RAM kits are labelled on the box as operating at 5600MT/s, and the XMP profiles that allow such speeds are stored in on-DIMM ROMs which are automatically applied by the BIOS.

That’s true, but I don’t understand your point. Does that hint at something I should be trying?

A bug in component X can corrupt memory and crash component Y.
So the answer to “Y is used by everybody, so why has no one else seen the problem?” can simply be that nobody else has X on their system.

Does that help?

I understood what you meant. It just sounded like I was meant to glean something about how to proceed from your observation, and I wasn’t sure what that was.

Since the issue is reproducible without XMP, I believe at this point you should file a bug with Fedora. The Fedora kernel maintainers would be able to help you nail this down much faster (they usually aren’t on the forums afaict).

It would also help to list the kernel versions you have tried, and whether any of them did not exhibit the problem, given that you said this started recently.


If you want to stress test your system further, stressapptest, mprime and y-cruncher are great tools for exercising both RAM and CPU. For stability testing it’s recommended to use a wide range of tools, since some faults show up in one tool but not another.