High CPU usage from kwin_wayland

I’m on Fedora 41 using KDE. Yesterday when I started my desktop, it booted into an emergency shell. I burned a live disk, chrooted into my system, and set the root password. Then I rebooted, logged in to the emergency shell, and, if I remember the order of events correctly, found that /lib/modules was missing and reinstalled all of the kernel* packages, which repopulated the directory.

I rebooted again and didn’t get dropped into an emergency shell, but the Plymouth boot screen was much larger than normal, like it was set for a lower resolution and stretched out to fit my monitor. KDE runs extremely slowly and my CPU usage is almost at 100% on all cores, with ~93% of each core used by /usr/bin/kwin_wayland. If I switch to a TTY, the CPU usage drops to a normal idle state (1-3%).

I looked through dmesg and journalctl and the probable cause seems to be this portion of dmesg:

[    5.706491] [drm] amdgpu kernel modesetting enabled.
[    5.706682] amdgpu: Virtual CRAT table created for CPU
[    5.706693] amdgpu: Topology: Add CPU node
[    5.706880] [drm] initializing kernel modesetting (IP DISCOVERY 0x1002:0x7480 0x148C:0x2421 0xCF).
[    5.706891] [drm] register mmio base: 0xFBB00000
[    5.706892] [drm] register mmio size: 1048576
[    5.710084] [drm] add ip block number 0 <soc21_common>
[    5.710086] [drm] add ip block number 1 <gmc_v11_0>
[    5.710088] [drm] add ip block number 2 <ih_v6_0>
[    5.710089] [drm] add ip block number 3 <psp>
[    5.710090] [drm] add ip block number 4 <smu>
[    5.710091] [drm] add ip block number 5 <dm>
[    5.710092] [drm] add ip block number 6 <gfx_v11_0>
[    5.710093] [drm] add ip block number 7 <sdma_v6_0>
[    5.710094] [drm] add ip block number 8 <vcn_v4_0>
[    5.710095] [drm] add ip block number 9 <jpeg_v4_0>
[    5.710096] [drm] add ip block number 10 <mes_v11_0>
[    5.721843] [drm] BIOS signature incorrect 0 0
[    5.721854] amdgpu 0000:03:00.0: No more image in the PCI ROM
[    5.721872] amdgpu 0000:03:00.0: amdgpu: Fetched VBIOS from ROM BAR
[    5.721874] amdgpu: ATOM BIOS: 113-EXT85100-001
[    5.721906] amdgpu 0000:03:00.0: Direct firmware load for amdgpu/psp_13_0_7_sos.bin failed with error -2
[    5.721908] [drm:amdgpu_device_init.cold [amdgpu]] ERROR early_init of IP block <psp> failed -19
[    5.722431] amdgpu 0000:03:00.0: Direct firmware load for amdgpu/smu_13_0_7.bin failed with error -2
[    5.722433] [drm:amdgpu_device_init.cold [amdgpu]] ERROR early_init of IP block <smu> failed -19
[    5.722801] amdgpu 0000:03:00.0: Direct firmware load for amdgpu/dcn_3_2_1_dmcub.bin failed with error -2
[    5.722803] [drm:amdgpu_device_init.cold [amdgpu]] ERROR early_init of IP block <dm> failed -19
[    5.723174] amdgpu 0000:03:00.0: Direct firmware load for amdgpu/gc_11_0_2_pfp.bin failed with error -2
[    5.723175] [drm:amdgpu_device_init.cold [amdgpu]] ERROR early_init of IP block <gfx_v11_0> failed -19
[    5.723557] amdgpu 0000:03:00.0: Direct firmware load for amdgpu/sdma_6_0_2.bin failed with error -2
[    5.723561] [drm:amdgpu_device_init.cold [amdgpu]] ERROR early_init of IP block <sdma_v6_0> failed -19
[    5.723931] amdgpu 0000:03:00.0: Direct firmware load for amdgpu/vcn_4_0_4.bin failed with error -2
[    5.723933] [drm:amdgpu_device_init.cold [amdgpu]] ERROR early_init of IP block <vcn_v4_0> failed -19
[    5.724305] amdgpu 0000:03:00.0: Direct firmware load for amdgpu/gc_11_0_2_mes_2.bin failed with error -2
[    5.724307] amdgpu 0000:03:00.0: amdgpu: try to fall back to gc_11_0_2_mes.bin
[    5.724324] amdgpu 0000:03:00.0: Direct firmware load for amdgpu/gc_11_0_2_mes.bin failed with error -2
[    5.724325] [drm:amdgpu_device_init.cold [amdgpu]] ERROR early_init of IP block <mes_v11_0> failed -19
[    5.724685] amdgpu 0000:03:00.0: amdgpu: Fatal error during GPU init
[    5.724691] amdgpu 0000:03:00.0: amdgpu: amdgpu: finishing device.
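
For reference, error -2 in the "Direct firmware load" lines is ENOENT (file not found) and -19 is ENODEV. A minimal sketch for pulling just the firmware names out of a log like the one above; the function name `failed_firmware` is made up for illustration, and it only assumes the message wording shown in this dmesg excerpt:

```shell
#!/bin/sh
# List the firmware files the kernel reported it could not load.
# Reads dmesg-style text on stdin and prints one firmware name per line.
# failed_firmware is a hypothetical helper name, not an existing tool.
failed_firmware() {
    sed -n 's/.*Direct firmware load for \([^ ]*\) failed with error.*/\1/p' | sort -u
}

# Possible usage on the affected system:
#   sudo dmesg | failed_firmware
```

The resulting list (psp, smu, dcn, gc, sdma, vcn, mes blobs) gives a concrete set of files to look for on disk and in the initramfs.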

I tried various combinations of reinstalling the firmware and kernel packages until I finally reinstalled every package on the system with dnf reinstall $(rpm -qa --qf="%{N}-%{V}\n" | sort) --skip-unavailable as root, but I still get this behavior. The amdgpu module is loaded and the amdgpu driver is shown in the output of lsinitrd. The graphics card is a Radeon RX 7600.

Possible causes of this issue: I performed an upgrade on 10/15 and, on 10/16, installed a piece of legacy software, WordPerfect 8.1 for Linux, using the Fedora script from “Installing WordPerfect 8.1 for Linux on a distro current in or after 2019”. I don’t remember if I rebooted between those events, but the first entry in my dnf history from my attempts to fix the problem is from 10/17.

For completeness, I tried installing Sway. In Sway, the CPU is at idle, but I have no hardware acceleration if I try to play a video in a browser. Also, in KDE, my brightness control usually has a title for the monitor connected via DisplayPort that says something like “Acer XV370”, but it currently says “Unknown-1”.

I’ve also taken the most recent update for Fedora 41, which had kernel-6.11.4-300 and associated packages in it, but the problem persists even on the new kernel.

Any ideas/info needed?

I wonder if you installed a bunch of debug kernels by mistake? Those are only meant for use by developers. Depending on what all you’ve done, it is possible that you’ve got your system into a state that may require a reinstall to correct. FWIW, I would recommend doing a reinstall, then, before you do too much tinkering, create a snapshot of your installation. If you are judicious about creating snapshots before you make significant changes to your system, then you will have an easy way out if you accidentally shoot yourself in the foot (which Linux will let you do!)

No, when I say reinstall, I really mean reinstall, not install. The kernel packages currently on the system (rpm -qa | grep ^kernel) are:

kernel-core-6.11.3-300.fc41.x86_64
kernel-modules-core-6.11.3-300.fc41.x86_64
kernel-modules-6.11.3-300.fc41.x86_64
kernel-6.11.3-300.fc41.x86_64
kernel-modules-extra-6.11.3-300.fc41.x86_64
kernel-headers-6.11.3-300.fc41.x86_64
kernel-srpm-macros-1.0-24.fc41.noarch
kernel-modules-core-6.11.4-300.fc41.x86_64
kernel-core-6.11.4-300.fc41.x86_64
kernel-modules-6.11.4-300.fc41.x86_64
kernel-6.11.4-300.fc41.x86_64
kernel-modules-extra-6.11.4-300.fc41.x86_64

If I try to install kernel-debug-core via dnf reinstall, I get a message saying the package is available but not installed, and dnf exits.

You can try regenerating the plymouth theme or switching to a different one.

sudo plymouth-set-default-theme spinner -R

You might need to install the va-drivers from RPM Fusion to get the hardware acceleration to work: https://rpmfusion.org/Howto/Multimedia

Use vainfo to see what codecs are available for hardware acceleration.

I’m pretty sure the issue has something to do with the AMD firmware being loaded. Here are the amdgpu dmesg lines from a boot on a live DVD:

[    9.611884] [drm] amdgpu kernel modesetting enabled.
[    9.612102] amdgpu: Virtual CRAT table created for CPU
[    9.612123] amdgpu: Topology: Add CPU node
[    9.628765] amdgpu 0000:03:00.0: No more image in the PCI ROM
[    9.628788] amdgpu 0000:03:00.0: amdgpu: Fetched VBIOS from ROM BAR
[    9.628791] amdgpu: ATOM BIOS: 113-EXT85100-001
[    9.685024] amdgpu 0000:03:00.0: amdgpu: CP RS64 enable
[    9.734006] amdgpu 0000:03:00.0: vgaarb: deactivate vga console
[    9.734674] amdgpu 0000:03:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[    9.734743] amdgpu 0000:03:00.0: amdgpu: VRAM: 8176M 0x0000008000000000 - 0x00000081FEFFFFFF (8176M used)
[    9.734747] amdgpu 0000:03:00.0: amdgpu: GART: 512M 0x00007FFF00000000 - 0x00007FFF1FFFFFFF
[    9.734913] [drm] amdgpu: 8176M of VRAM memory ready
[    9.734915] [drm] amdgpu: 30093M of GTT memory ready.
[    9.795093] amdgpu 0000:03:00.0: amdgpu: reserve 0x1300000 from 0x81fc000000 for PSP TMR
[    9.888901] amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
[    9.896580] amdgpu 0000:03:00.0: amdgpu: RAP: optional rap ta ucode is not available
[    9.896582] amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[    9.896663] amdgpu 0000:03:00.0: amdgpu: smu driver if version = 0x00000035, smu fw if version = 0x00000040, smu fw program = 0, smu fw version = 0x00525c00 (82.92.0)
[    9.896666] amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched
[    9.940339] amdgpu 0000:03:00.0: amdgpu: SMU is initialized successfully!
[   10.216075] amdgpu: HMM registered 8176MB device memory
[   10.223177] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[   10.223190] kfd kfd: amdgpu: Total number of KFD nodes to be created: 1
[   10.223230] amdgpu: Virtual CRAT table created for GPU
[   10.223347] amdgpu: Topology: Add dGPU node [0x7480:0x1002]
[   10.223350] kfd kfd: amdgpu: added device 1002:7480
[   10.223363] amdgpu 0000:03:00.0: amdgpu: SE 2, SH per SE 2, CU per SH 8, active_cu_number 32
[   10.223367] amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[   10.223368] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[   10.223369] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[   10.223370] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[   10.223372] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[   10.223373] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[   10.223374] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[   10.223375] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[   10.223376] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[   10.223377] amdgpu 0000:03:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[   10.223378] amdgpu 0000:03:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[   10.223379] amdgpu 0000:03:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[   10.223381] amdgpu 0000:03:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
[   10.223382] amdgpu 0000:03:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 14 on hub 0
[   10.229627] amdgpu 0000:03:00.0: amdgpu: Using BACO for runtime pm
[   10.230501] [drm] Initialized amdgpu 3.58.0 for 0000:03:00.0 on minor 0
[   10.240176] fbcon: amdgpudrmfb (fb0) is primary device
[   10.370309] amdgpu 0000:03:00.0: [drm] fb0: amdgpudrmfb frame buffer device
[   75.624091] snd_hda_intel 0000:03:00.1: bound 0000:03:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])

Installing mesa-va-drivers-freeworld is all I’ve done to get access to the extra video acceleration entrypoints (the output from vainfo changed to show that more entrypoints were available). If there is more to it, maybe someone else will be able to help.

Alright, just in case anything helpful is provided by vainfo, here’s the output on the live DVD (videos play smoothly, hardware acceleration seems to be working):

Trying display: wayland
libva info: VA-API version 1.22.0
libva info: Trying to open /usr/lib64/dri-nonfree/radeonsi_drv_video.so
libva info: Trying to open /usr/lib64/dri-freeworld/radeonsi_drv_video.so
libva info: Trying to open /usr/lib64/dri/radeonsi_drv_video.so
libva info: Found init function __vaDriverInit_1_22
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.22 (libva 2.22.0)
vainfo: Driver version: Mesa Gallium driver 24.2.0 for AMD Radeon RX 7600 (radeonsi, navi33, LLVM 18.1.8, DRM 3.58, 6.11.0-0.rc5.43.fc41.x86_64)
vainfo: Supported profile and entrypoints
      VAProfileJPEGBaseline           : VAEntrypointVLD
      VAProfileVP9Profile0            : VAEntrypointVLD
      VAProfileVP9Profile2            : VAEntrypointVLD
      VAProfileAV1Profile0            : VAEntrypointVLD
      VAProfileAV1Profile0            : VAEntrypointEncSlice
      VAProfileNone                   : VAEntrypointVideoProc

Output on the installed system:

Trying display: wayland
libva info: VA-API version 1.22.0
libva info: Trying to open /usr/lib64/dri-nonfree/simpledrm_drv_video.so
libva info: Trying to open /usr/lib64/dri-freeworld/simpledrm_drv_video.so
libva info: Trying to open /usr/lib64/dri/simpledrm_drv_video.so
libva info: va_openDriver() returns -1
vaInitialize failed with error code -1 (unknown libva error),exit

It looks like radeonsi_drv_video.so is provided by mesa-va-drivers. I checked and that package is installed at the current available version (24.2.4-1).

Apparently it is running on the simpledrm driver. Do you have any kernel parameters set to blacklist the amdgpu driver? It might alternatively be done via a config file in /etc/modprobe.d or /etc/dracut.conf.d.
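
The command-line part of that check can be sketched as a small filter. This is only an illustration: `check_cmdline` is a made-up name, and the list of options it looks for (nomodeset, amdgpu.modeset=0, and the modprobe.blacklist, rd.driver.blacklist and module_blacklist parameters) is an assumption about what would be relevant here, not an exhaustive list:

```shell
#!/bin/sh
# Print any kernel command-line options that would keep amdgpu from driving the card.
# Reads a command line on stdin; prints matching options, one per line.
# Exits non-zero (and prints nothing) when no such option is present.
# check_cmdline is a hypothetical helper name.
check_cmdline() {
    tr ' ' '\n' |
        grep -E '^(nomodeset|amdgpu\.modeset=0|(modprobe\.blacklist|rd\.driver\.blacklist|module_blacklist)=.*amdgpu.*)$'
}

# Possible usage:
#   check_cmdline < /proc/cmdline || echo "nothing on the command line blocks amdgpu"
#   grep -rn amdgpu /etc/modprobe.d /etc/dracut.conf.d
```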

If not, maybe you just need to regenerate your initramfs. I’d need to see the output from sudo uname -r and sudo ls /boot to tell you the exact command to do that.

Nope, here’s cat /proc/cmdline:

BOOT_IMAGE=(hd9,gpt2)/vmlinuz-6.11.4-300.fc41.x86_64 root=UUID=<snip> ro rootflags=subvol=fedora_newroot rhgb quiet

There’s nothing that mentions amdgpu in /etc/modprobe.d or /etc/dracut.conf.d.

6.11.4-300.fc41.x86_64

config-6.11.3-300.fc41.x86_64  grub2                                                    loader                             symvers-6.11.4-300.fc41.x86_64.xz                  vmlinuz-6.11.3-300.fc41.x86_64
config-6.11.4-300.fc41.x86_64  initramfs-0-rescue-64324b5198794b96bbd256d543e9176a.img  lost+found                         System.map-6.11.3-300.fc41.x86_64                  vmlinuz-6.11.4-300.fc41.x86_64
efi                            initramfs-6.11.3-300.fc41.x86_64.img                     memtest86+x64.efi                  System.map-6.11.4-300.fc41.x86_64
extlinux                       initramfs-6.11.4-300.fc41.x86_64.img                     symvers-6.11.3-300.fc41.x86_64.xz  vmlinuz-0-rescue-64324b5198794b96bbd256d543e9176a

OK. Try running sudo dracut -f /boot/initramfs-6.11.4-300.fc41.x86_64.img and reboot. Then run that vainfo command again and see if it reports that it is using the right driver.
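
If amdgpu is loaded from the initramfs, its firmware generally has to be inside the image as well, so it may be worth comparing the firmware names from the dmesg errors against the image contents. A rough sketch, assuming the listing comes from lsinitrd saved to a file; `missing_from_initramfs` and both file names are made up for illustration:

```shell
#!/bin/sh
# Report firmware names that do not appear in an initramfs listing.
# $1 = file with firmware names, one per line (e.g. amdgpu/smu_13_0_7.bin)
# $2 = file with an initramfs listing (e.g. saved output of lsinitrd)
# A compressed copy such as smu_13_0_7.bin.xz also counts as present.
# missing_from_initramfs is a hypothetical helper name.
missing_from_initramfs() {
    while IFS= read -r fw; do
        [ -n "$fw" ] || continue
        grep -qF "firmware/$fw" "$2" || printf '%s\n' "$fw"
    done < "$1"
}

# Possible usage:
#   sudo lsinitrd /boot/initramfs-6.11.4-300.fc41.x86_64.img > listing.txt
#   printf 'amdgpu/psp_13_0_7_sos.bin\namdgpu/smu_13_0_7.bin\n' > names.txt
#   missing_from_initramfs names.txt listing.txt
```

No output would mean every listed firmware file is present in the image in some form.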

You can also use lspci -k as a more generic way to see what drivers are being used for which devices in your system.
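
As a sketch of what to look for in that output: a device that lists a candidate kernel module but has no "Kernel driver in use" line is one where the driver did not bind. The awk filter below is an illustration only (`unbound_devices` is a made-up name) and assumes the usual lspci -k layout of an unindented device line followed by indented detail lines:

```shell
#!/bin/sh
# Print devices that list a kernel module but have no driver bound.
# Reads lspci -k output on stdin; prints "<slot> -> <modules>" per device.
# unbound_devices is a hypothetical helper name.
unbound_devices() {
    awk '
        function report() { if (dev != "" && mods != "" && !bound) print dev " -> " mods }
        /^[0-9a-f]/             { report(); dev = $1; mods = ""; bound = 0; next }
        /Kernel driver in use:/ { bound = 1 }
        /Kernel modules:/       { sub(/.*Kernel modules: */, ""); mods = $0 }
        END                     { report() }
    '
}

# Possible usage:
#   lspci -k | unbound_devices
```

Some devices legitimately have a module available without using it, so a hit is a hint rather than proof of a problem.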

Just tried that; the output of vainfo is the same, still failing to find a simpledrm driver.

03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 33 [Radeon RX 7600/7600 XT/7600M XT/7600S/7700S / PRO W7600] (rev cf)
        Subsystem: Tul Corporation / PowerColor Device 2421
        Kernel modules: amdgpu

Hmm, I’m not sure why it isn’t loading the amdgpu driver. Does dmesg | grep amdgpu give any clues?

For comparison, this is what my lspci -k output shows:

00:01.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Stoney [Radeon R2/R3/R4/R5 Graphics] (rev da)
	Subsystem: Lenovo Device 364f
	Kernel driver in use: amdgpu
	Kernel modules: amdgpu

Only the failures when loading the firmware (edit: pasting the entire output of that section of dmesg instead of just the lines with amdgpu):

[    6.979981] [drm] amdgpu kernel modesetting enabled.
[    6.980191] amdgpu: Virtual CRAT table created for CPU
[    6.980203] amdgpu: Topology: Add CPU node
[    6.980450] [drm] initializing kernel modesetting (IP DISCOVERY 0x1002:0x7480 0x148C:0x2421 0xCF).
[    6.980472] [drm] register mmio base: 0xFBB00000
[    6.980475] [drm] register mmio size: 1048576
[    6.983735] [drm] add ip block number 0 <soc21_common>
[    6.983737] [drm] add ip block number 1 <gmc_v11_0>
[    6.983739] [drm] add ip block number 2 <ih_v6_0>
[    6.983740] [drm] add ip block number 3 <psp>
[    6.983741] [drm] add ip block number 4 <smu>
[    6.983743] [drm] add ip block number 5 <dm>
[    6.983744] [drm] add ip block number 6 <gfx_v11_0>
[    6.983745] [drm] add ip block number 7 <sdma_v6_0>
[    6.983747] [drm] add ip block number 8 <vcn_v4_0>
[    6.983748] [drm] add ip block number 9 <jpeg_v4_0>
[    6.983749] [drm] add ip block number 10 <mes_v11_0>
[    6.996169] [drm] BIOS signature incorrect 0 0
[    6.996185] amdgpu 0000:03:00.0: No more image in the PCI ROM
[    6.996209] amdgpu 0000:03:00.0: amdgpu: Fetched VBIOS from ROM BAR
[    6.996213] amdgpu: ATOM BIOS: 113-EXT85100-001
[    6.996257] amdgpu 0000:03:00.0: Direct firmware load for amdgpu/psp_13_0_7_sos.bin failed with error -2
[    6.996261] [drm:amdgpu_device_init.cold [amdgpu]] *ERROR* early_init of IP block <psp> failed -19
[    6.997005] amdgpu 0000:03:00.0: Direct firmware load for amdgpu/smu_13_0_7.bin failed with error -2
[    6.997010] [drm:amdgpu_device_init.cold [amdgpu]] *ERROR* early_init of IP block <smu> failed -19
[    6.997663] amdgpu 0000:03:00.0: Direct firmware load for amdgpu/dcn_3_2_1_dmcub.bin failed with error -2
[    6.997666] [drm:amdgpu_device_init.cold [amdgpu]] *ERROR* early_init of IP block <dm> failed -19
[    6.998243] amdgpu 0000:03:00.0: Direct firmware load for amdgpu/gc_11_0_2_pfp.bin failed with error -2
[    6.998246] [drm:amdgpu_device_init.cold [amdgpu]] *ERROR* early_init of IP block <gfx_v11_0> failed -19
[    6.998840] amdgpu 0000:03:00.0: Direct firmware load for amdgpu/sdma_6_0_2.bin failed with error -2
[    6.998845] [drm:amdgpu_device_init.cold [amdgpu]] *ERROR* early_init of IP block <sdma_v6_0> failed -19
[    6.999423] amdgpu 0000:03:00.0: Direct firmware load for amdgpu/vcn_4_0_4.bin failed with error -2
[    6.999426] [drm:amdgpu_device_init.cold [amdgpu]] *ERROR* early_init of IP block <vcn_v4_0> failed -19
[    7.000015] amdgpu 0000:03:00.0: Direct firmware load for amdgpu/gc_11_0_2_mes_2.bin failed with error -2
[    7.000018] amdgpu 0000:03:00.0: amdgpu: try to fall back to gc_11_0_2_mes.bin
[    7.000044] amdgpu 0000:03:00.0: Direct firmware load for amdgpu/gc_11_0_2_mes.bin failed with error -2
[    7.000046] [drm:amdgpu_device_init.cold [amdgpu]] *ERROR* early_init of IP block <mes_v11_0> failed -19
[    7.000687] amdgpu 0000:03:00.0: amdgpu: Fatal error during GPU init
[    7.000692] amdgpu 0000:03:00.0: amdgpu: amdgpu: finishing device.

I’ve checked /usr/lib/firmware/amdgpu and these files are all there.
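
One thing that might be worth ruling out, given that /lib/modules went missing earlier: the kernel's firmware loader searches under /lib/firmware (first /lib/firmware/updates/$(uname -r), then /lib/firmware/updates, /lib/firmware/$(uname -r), and /lib/firmware), not under /usr/lib/firmware directly. The sketch below looks a file up along those paths. `find_firmware` and its optional root-prefix argument are made up for illustration, and the .zst/.xz suffixes are included on the assumption that the firmware may be shipped compressed:

```shell
#!/bin/sh
# Look for a firmware file along the kernel's default search path.
# $1 = firmware name, e.g. amdgpu/psp_13_0_7_sos.bin
# $2 = optional root prefix (hypothetical argument, handy from a live disk or chroot)
# Prints the first match and returns 0; returns 1 if nothing is found.
# find_firmware is a hypothetical helper name.
find_firmware() {
    name=$1
    root=${2:-}
    kver=$(uname -r)
    for dir in "updates/$kver" updates "$kver" ""; do
        for suffix in "" .zst .xz; do
            path="$root/lib/firmware/${dir:+$dir/}$name$suffix"
            if [ -e "$path" ]; then
                printf '%s\n' "$path"
                return 0
            fi
        done
    done
    printf 'not found: %s\n' "$name" >&2
    return 1
}

# Possible usage:
#   find_firmware amdgpu/psp_13_0_7_sos.bin
#   readlink -f /lib    # on Fedora this normally resolves to /usr/lib
```

If the files exist under /usr/lib/firmware but this lookup fails, that would point at /lib itself rather than at the firmware package.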

Does rpm -V amd-gpu-firmware show any output? (It shouldn’t unless some of the files are corrupt or otherwise altered.)

No (extending response for Discourse)

All I can find about “early_init of IP block <whatever> failed” is that some people with AMD cards have been seeing that since kernel 6.8.

I guess Fedora 41 shipped with a newer kernel than that. How about trying dnf downgrade --releasever=40 kernel* to install an older kernel on your system? Then try booting it and see if it works.

All I’m seeing available as a downgrade from dnf is 6.11.3-200 (for releasever=40) and 6.11.3-100 (for releasever=39). I did try booting from my existing older kernels and recovery kernel when this issue initially happened before I resorted to the live DVD, and none of those fixed the problem. I can check and see what the kernel version is on the live system, but I suspect it’s later than 6.8.

How about if you add --repo=fedora as a parameter to dnf?

That shows 6.8.5-301 for the kernel, still worth a shot. I’ll try installing it and update the post after rebooting.

Update: no change. The GPU firmware still fails to load and vainfo still tries to load simpledrm.
