Plasmashell unresponsive in hard loop

Copied from top:

2683 user   20   0 4871172 364608 214168 R 100.3   0.7  14:39.10 plasmashell

Brand-new MSI Tomahawk X870E with an AMD Ryzen 9 9950X (it’s fast), with Fedora 42 installed from a Live USB stick. It seemed fine until yesterday, when I upgraded to the latest packages via Discover. (I put the machine together yesterday, installed Windows 11 for the usual benchmarks, then Fedora as the “preferred” OS. For all its faults, Windows 11 does provide a pretty solid basis for comparison and troubleshooting.)

In more than 50% of logins, plasmashell is unresponsive: there’s no panel and the widgets are not animated. It looks like plasmashell has a race condition or a runaway spinlock; as the top output above shows, it’s burning a full core. Fedora 42 itself is alive and functional, but Plasma is not (I’m using the machine in this state right now).

I can believe there’s a concurrency issue in there that the speed of this machine exposes. I have installed F42 on other machines (an AMD 7950X and a couple of Alder Lake laptops) and they’re mostly behaving themselves. Everything seemed fine with the older F42 on the Live USB, but a) those packages were older and there were a ton of upgrades (700+), and b) booting off USB slows everything down.

I did the usual due diligence (searched here, googled, and looked in Red Hat Bugzilla) without finding anything salient. I will probably raise a bug, but it would be interesting to know if others are seeing similar problems. (I have other problems with this config on F42 that I’ll be raising shortly; sad to say, I have to compare with Windows 11 on the same machine first.)

I’m encountering this Plasmashell crash due to some ISO C++ API changes. Maybe it’s related?

Fix from the thread:

For anyone finding this: a quick workaround was to set the animation speed to 1/2 or 3/4.

Thanks. I don’t believe it’s the same problem: plasmashell locks hard (it’s not 100% reproducible, hence likely a timing/concurrency bug). I don’t get the option to do anything with it in this state; nothing in the Plasma UI is responsive, and there’s no crash to speak of.

Is KDE working long enough to start two terminals and run sudo journalctl -f in one and journalctl --user -f in the other?

If that does not work, start them in a pair of virtual consoles (Ctrl-Alt-F4, Ctrl-Alt-F5).

If so, do you see errors when it hangs?
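journalctl can already narrow things down by itself (for example, -p warning limits output to warnings and worse), but when the live streams are noisy it can help to save the session’s journal and pick out only what plasmashell and KWin logged. The following is a minimal sketch, not a standard tool: it assumes Plasma 6 on Wayland (process names plasmashell and kwin_wayland) and a dump produced with journalctl --user -b -o json, which writes one JSON object per line.

```python
# Sketch: list warnings (or worse) from plasmashell and KWin in a saved journal dump.
# Assumptions: Plasma 6 on Wayland (process names "plasmashell" and "kwin_wayland"),
# and a dump produced with something like:  journalctl --user -b -o json > user.json
import json
import sys
from datetime import datetime, timezone

WATCHED = {"plasmashell", "kwin_wayland"}
MAX_PRIORITY = 4  # syslog levels: 0 = emerg ... 4 = warning ... 7 = debug


def message_text(raw) -> str:
    """MESSAGE is a string, or a list of byte values when it is not valid UTF-8."""
    if isinstance(raw, list):
        return bytes(raw).decode("utf-8", errors="replace")
    return raw or ""


def interesting(lines):
    """Yield (timestamp, process, message) for warnings from the watched processes."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        comm = entry.get("_COMM") or entry.get("SYSLOG_IDENTIFIER", "")
        if comm not in WATCHED or int(entry.get("PRIORITY", 6)) > MAX_PRIORITY:
            continue
        usec = int(entry.get("__REALTIME_TIMESTAMP", 0))  # microseconds since the epoch
        ts = datetime.fromtimestamp(usec / 1_000_000, tz=timezone.utc)
        yield ts, comm, message_text(entry.get("MESSAGE"))


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as dump:
        for ts, comm, msg in interesting(dump):
            print(f"{ts:%H:%M:%S} UTC {comm}: {msg}")
```

Run it as python3 plasma_warnings.py user.json (the script name is hypothetical). Timestamps are printed in UTC, so they will be offset from journalctl’s local-time display.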

Alas, I have bigger problems: I tried flashing the BIOS to fix a NIC issue (on both Windows and Linux), and now the board is bricked.

Otherwise the system was working fine; plasmashell was just stuck in a hard loop somewhere, taking up 100% of one CPU thread. At one point I tried making a config change in Dolphin, which tried to do some UI task and hung too. Konsole worked perfectly.

Hopefully I will have the machine back up and running, but I have to replace or RMA the motherboard first. FWIW, I don’t believe these hardware problems were related to the plasmashell hang: I’ve been having Plasma hang issues on other machines, but it seems that this one being faster than the others made the plasmashell problem manifest more frequently and reproducibly. I’ll come back here if/when I get the machine going again.

OK, I’m back online - had to get a new motherboard and RMA the old one!

I just updated to the latest packages and rebooted: plasmashell is in a tight loop again, so it wasn’t anything to do with the motherboard (unless it’s pathological). The system is otherwise operational (I’m using it now). The two journalctl instances aren’t showing me anything useful that I can see. The only thing in journalctl --user -f that looks even semi-significant to me is the following:

May 22 17:59:35 fedora konsole[4602]: QLayout: Cannot add a null widget to QHBoxLayout/
May 22 17:59:35 fedora systemd[2161]: Started app-org.kde.konsole-4602.scope.
May 22 17:59:37 fedora konsole[4602]: qt.qpa.wayland: Creating a popup with a parent, QWidgetWindow(0x5590bfd0c4a0, name="MainWindow#1Window") which does not match the current topmost grabbing popup, QWidgetWindow(0x5590c01ae190, name="session-popup-menuWindow") With some shell surface protocols, this is not allowed. The wayland QPA plugin is currently handling it by setting the parent to the topmost grabbing popup. Note, however, that this may cause positioning errors and popups closing unxpectedly. Please fix the transient parent of the popup.

Everything appears to be running correctly in Windows 11 booted off the same PCIe5 M.2 SSD.

Hmph. I just killed plasmashell with kill -9, and it restarted and is now behaving normally. It seems there’s a pretty fatal race condition at startup, on this speedy system at any rate. It’s a fresh install of Fedora 42 from a Live USB. I can’t see anything interesting in dmesg or journalctl -b.
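Until the underlying bug is found, the kill-and-restart workaround can be made a little more deliberate by first confirming that plasmashell really is pinning a core. The following is a minimal sketch under stated assumptions: it reads CPU time from Linux’s /proc/&lt;pid&gt;/stat, treats 90% of one core over 5 seconds as "stuck" (an arbitrary threshold), and, only when given --restart, assumes that Plasma 6 runs the shell as the systemd user unit plasma-plasmashell. None of this comes from the thread itself.

```python
# Sketch: check whether plasmashell is pinning a core, optionally restarting it.
# Assumptions: Linux /proc; a 90% threshold over 5 s; and (only with --restart)
# that Plasma 6 runs the shell as the systemd user unit "plasma-plasmashell".
import os
import subprocess
import sys
import time

CLK_TCK = os.sysconf("SC_CLK_TCK")  # kernel clock ticks per second


def cpu_ticks(pid: int) -> int:
    """utime + stime of a process, in clock ticks, from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # comm is in parentheses and may contain spaces: split after the last ')'.
    rest = data[data.rfind(")") + 2:].split()  # rest[0] is field 3 (state)
    return int(rest[11]) + int(rest[12])  # fields 14 (utime) and 15 (stime)


def cpu_percent(before: int, after: int, seconds: float) -> float:
    """Usage over an interval; 100.0 means one fully busy core, as in top."""
    return 100.0 * (after - before) / (CLK_TCK * seconds)


def find_pids(name: str) -> list[int]:
    """PIDs whose /proc/<pid>/comm equals name."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            pass  # the process exited while we were scanning
    return pids


def main(interval: float = 5.0, threshold: float = 90.0) -> None:
    for pid in find_pids("plasmashell"):
        try:
            before = cpu_ticks(pid)
            time.sleep(interval)
            usage = cpu_percent(before, cpu_ticks(pid), interval)
        except OSError:
            continue  # plasmashell went away during the sample
        print(f"plasmashell {pid}: {usage:.1f}% CPU over {interval:.0f} s")
        if usage >= threshold and "--restart" in sys.argv:
            subprocess.run(["systemctl", "--user", "restart", "plasma-plasmashell"], check=False)


if __name__ == "__main__":
    main()
```

Run it from Konsole or a virtual console after logging in; without --restart it only reports. Restarting through systemd rather than kill -9 is a design choice here, on the assumption that the unit exists; if it does not, the report alone still tells you whether the hang is the CPU-bound kind seen in top.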

I’ve done the usual due diligence but don’t see anyone else reporting this specific issue with a recent version of Fedora 42/KDE. (I just updated to the very latest packages, and that broke Wi-Fi again; it has a separate problem caused by a regression in the ath12k driver, and I have to downgrade the driver to a version from March if I want Wi-Fi.)

That sounds similar to what I’ve seen with GNOME: Gnome does not allow login

Login sometimes works and sometimes sits at a frozen screen. I’ve had it since F38 or F39 and still saw it in F41; the only fix I found (and I still can’t figure out why it works) was to boot in Legacy/CSM mode instead of UEFI.

I’d try a different boot type (if UEFI, use Legacy/CSM).

Are you using anything semi-exotic or more exotic, hardware-wise? (UHD 630, so maybe not.) I have quite a few issues with plasmashell: it usually gets into a state where I eventually have to reboot. (To be fair, I don’t know for a fact that the issue is plasmashell; it could be some other component, with plasmashell getting the blame.)

I’m mostly using an i7-12700 (XPS) laptop, which is no slouch, on its iGPU (it has a 3050, but Fedora doesn’t seem to be able to use it). Since I upgraded that machine to Fedora 42, plasmashell has been faulting fairly reproducibly when I mouse over the panel, which I’ve set to auto-hide to conserve screen real estate.

Now I have this new X870E Tomahawk motherboard with an AMD Ryzen 9 9950X, which is hands down the fastest machine I have, and in virtually 100% of cases when I boot and log into F42, plasmashell sits in a tight loop. (It has well over a hundred threads, at least now that it’s operational again after I killed it and it restarted.) It has all the hallmarks of a timing bug that only shows up on a faster processor/system. That said, there must be quite a few people in the community running faster or more capable hardware, in which case I’d expect to see more instances of this; Intel 13th and 14th gen i9s and the Core Ultra 9 285K are arguably faster in single-threaded workloads than anything AMD has, short of X3D parts. I don’t have one of those. I’m assuming it’s a processor/synchronization issue, but it could be memory/cache-related or even I/O. I’m not sure how to go about debugging it other than siccing gdb on it. I’m more of a WinDbg expert than a gdb one (I have to keep a cheat sheet close at hand to use gdb).
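With well over a hundred threads, the first debugging step is usually to find which one is spinning and then grab all thread stacks while the hang is in progress. top -H -p &lt;pid&gt; gives the per-thread view interactively; the sketch below does the same from /proc and prints a gdb command (gdb -p attaches to the running process, -batch exits and detaches afterwards, and "thread apply all bt" prints every thread’s backtrace). Assumptions: Linux /proc, gdb installed, permission to ptrace the target (same user, or sudo), and debug symbols available (for example through Fedora’s debuginfod support) if you want readable frames. Attaching briefly pauses plasmashell.

```python
# Sketch: find which plasmashell thread is burning CPU, then print a gdb command
# that dumps every thread's backtrace. Assumptions: Linux /proc, gdb installed,
# and permission to ptrace the target (run as the same user, or with sudo).
import os
import shlex
import sys
import time


def thread_ticks(pid: int) -> dict[int, tuple[str, int]]:
    """Map TID -> (thread name, utime + stime ticks) for every thread of pid."""
    result = {}
    for tid in os.listdir(f"/proc/{pid}/task"):
        try:
            with open(f"/proc/{pid}/task/{tid}/stat") as f:
                data = f.read()
        except OSError:
            continue  # the thread exited during the scan
        name = data[data.find("(") + 1:data.rfind(")")]
        rest = data[data.rfind(")") + 2:].split()  # rest[0] is field 3 (state)
        result[int(tid)] = (name, int(rest[11]) + int(rest[12]))
    return result


def busiest_threads(pid: int, seconds: float = 2.0, top: int = 5) -> list[tuple[int, str, float]]:
    """Sample twice; return (tid, name, %CPU of one core) for the busiest threads."""
    clk = os.sysconf("SC_CLK_TCK")
    before = thread_ticks(pid)
    time.sleep(seconds)
    after = thread_ticks(pid)
    rows = [
        (tid, name, 100.0 * (ticks - before[tid][1]) / (clk * seconds))
        for tid, (name, ticks) in after.items()
        if tid in before  # skip threads created mid-sample
    ]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:top]


def gdb_backtrace_cmd(pid: int) -> list[str]:
    """gdb attaches, prints all thread stacks, and detaches when the batch run ends."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


if __name__ == "__main__":
    target = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    for tid, name, pct in busiest_threads(target):
        print(f"{tid:>8}  {name:<16} {pct:6.1f}%")
    print(shlex.join(gdb_backtrace_cmd(target)))
```

Usage: python3 spinning_threads.py $(pidof plasmashell) (the script name is hypothetical). The TID at the top of the list is the one to look for in gdb’s output; a stack that stays the same across two or three runs points at a lock or loop, while one that changes each time suggests busy event processing.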