Fedora workstation hangs every time with kernel 6.14.8 (kernel 6.14.6 still works properly)

hello, i just updated fedora to the new 6.14.8 kernel, but every time i boot into it the computer hangs after about one minute of use

this “should” be output from last boot but maybe someone can help me to provide better logs with journalctl

as of now i am using old 6.14.6 kernel and i don’t have any problems at all

thank you

The logs you attached are of a boot of 6.14.6, so I assume it is not a frozen one. We indeed need logs of a frozen boot. (I assume with “hangs” you mean frozen permanently, right?)

I suggest booting into 6.14.8 and waiting until it freezes, then waiting another minute or two. If I understood it right, your system is then completely frozen and you need to force a reboot. If so, do that (so force the reboot after it has been frozen for a minute or two) :classic_smiley: Then you can boot into the working 6.14.6 kernel and get the logs of the frozen boot that was immediately before the current one. You can select that boot with --boot=-1 (-1 → current boot minus 1 = the one before the current). Therefore, please get the following logs and provide both the same way you provided the earlier logs:
sudo journalctl -r -k --boot=-1 --no-hostname
and
sudo journalctl -r --boot=-1 --no-hostname
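If you prefer to save both logs to files in one go, here is a minimal Python sketch of the two commands above. The helper names `journal_cmd` and `save_journal` and the output file names are just my assumptions for illustration; the journalctl options are exactly the ones listed above. Run it with sudo so the full system journal is readable.

```python
import subprocess


def journal_cmd(kernel_only, boot=-1):
    """Build the journalctl argument list for one of the two requested logs."""
    cmd = ["journalctl", "-r"]           # -r: newest entries first
    if kernel_only:
        cmd.append("-k")                 # -k: kernel messages only
    cmd += [f"--boot={boot}", "--no-hostname"]
    return cmd


def save_journal(path, kernel_only, boot=-1):
    """Run journalctl and write its output to the given file (hypothetical helper)."""
    with open(path, "w") as out:
        subprocess.run(journal_cmd(kernel_only, boot), stdout=out, check=True)


if __name__ == "__main__":
    # File names are arbitrary examples.
    save_journal("journal-kernel-boot-1.txt", kernel_only=True)
    save_journal("journal-full-boot-1.txt", kernel_only=False)
```

The boot offset is a parameter, so the same helper can fetch an older boot (for example `boot=-2`) if you rebooted once more in between.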

Also please provide the following information so that I have that at a glance:

  • output of cat /proc/sys/kernel/tainted
  • output of lscpu | grep odel
  • output of dnf repolist
  • What Fedora variant do you use? Workstation, KDE Spin, Silverblue, Kinoite?
  • Do you have nvidia graphics?

Also, let us know if you have third party software or so installed, and what you have modified/customized on your system.

hello, will do that, thank you

in the meantime, my system is a full amd one and i am using fedora workstation 42 (as stated in the title)

no modifications indeed, i do only have cdemu installed that installs a kernel dkms (or whatever is called) but it’s been installed from literally ages

will have reports soon anyway :slight_smile:

ok, let’s start, this is

sudo journalctl -r -k --boot=-1 --no-hostname

and this is

sudo journalctl -r --boot=-1 --no-hostname

i’ll post the rest in the meantime :slight_smile:

cat /proc/sys/kernel/tainted
12288
lscpu | grep odel
Model name:                           AMD Ryzen 7 5700X 8-Core Processor
Model:                                33
dnf repolist
repo id                                        repo name                          
copr:copr.fedorainfracloud.org:agonie:Refine   Copr repo for Refine owned by agoni
copr:copr.fedorainfracloud.org:phracek:PyCharm Copr repo for PyCharm owned by phra
copr:copr.fedorainfracloud.org:rok:cdemu       Copr repo for cdemu owned by rok   
fedora                                         Fedora 42 - x86_64                 
fedora-cisco-openh264                          Fedora 42 openh264 (From Cisco) - x
google-chrome                                  google-chrome                      
opera                                          Opera packages                     
rpmfusion-free                                 RPM Fusion for Fedora 42 - Free    
rpmfusion-free-updates                         RPM Fusion for Fedora 42 - Free - U
rpmfusion-nonfree                              RPM Fusion for Fedora 42 - Nonfree 
rpmfusion-nonfree-nvidia-driver                RPM Fusion for Fedora 42 - Nonfree 
rpmfusion-nonfree-steam                        RPM Fusion for Fedora 42 - Nonfree 
rpmfusion-nonfree-updates                      RPM Fusion for Fedora 42 - Nonfree 
rpmsphere-noarch                               RPM Sphere - Noarch                
updates                                        Fedora 42 - x86_64 - Updates       
vivaldi 

i suppose it should be everything

First of all, you have several modifications in terms of third-party repos, not just cdemu.

I consider rpmfusion quite normal, I am not sure about google-chrome (I think that can be enabled by default today), but the others are definitely not considered in our QA/testing.

A major question is why you have nvidia repos enabled if you have no nvidia card. Are you sure you have no nvidia graphics available? You might give us the lspci output, and also sudo dnf list --installed '*vidia*' (quoted, so the shell does not expand the pattern) and glxinfo | grep evice. I just want to verify that there is no nvidia. However, given your logs, I presume for now there is no nvidia driver involved but only amdgpu, which indeed seems to be indicated by the logs (if that proves right, I suggest disabling the nvidia repos!).

Your taint value of 12288 (= 4096 + 8192) implies two conditions your kernel has recognized. Both can be linked to the problem, but do not have to be:
4096 - An out-of-tree module has been loaded.
8192 - An unsigned module has been loaded in a kernel supporting module signature.
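To show how the value from /proc/sys/kernel/tainted breaks down into these flags, here is a small Python sketch. The function name `decode_taint` and the `TAINT_FLAGS` table are my own for illustration, and the table only covers the two bits relevant here (bit 12 and bit 13); other bits are reported without a description.

```python
# Only the two taint bits discussed here; the table is intentionally incomplete.
TAINT_FLAGS = {
    12: "an out-of-tree module has been loaded",
    13: "an unsigned module has been loaded",
}


def decode_taint(value):
    """Split a kernel taint value into (bit, decimal value, description) tuples."""
    flags = []
    bit = 0
    while value >> bit:
        if (value >> bit) & 1:
            description = TAINT_FLAGS.get(bit, "not covered by this sketch")
            flags.append((bit, 1 << bit, description))
        bit += 1
    return flags


if __name__ == "__main__":
    for bit, number, description in decode_taint(12288):
        print(f"bit {bit}: {number} - {description}")
```

A value of 0 yields an empty list, which is the untainted state.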

This means your kernel has (a) modification(s), likely one (or more) that is not considered in our QA/testing, and for now I presume it is not nvidia. That could be the cdemu. It might have caused both of these taints, not sure for now.

However, given your logs, I would keep your kernel taints in mind, but I think that what you experience is a known issue (actually, it is two issues that are sometimes easy to confuse) that is currently in assessment by AMD. I assume that because of this:

mag 28 19:28:11 kernel: amdgpu 0000:0c:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
mag 28 19:28:11 kernel: amdgpu 0000:0c:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
mag 28 19:28:08 kernel: amdgpu 0000:0c:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
mag 28 19:28:08 kernel: amdgpu 0000:0c:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
mag 28 19:28:07 kernel: amdgpu 0000:0c:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
mag 28 19:28:07 kernel: amdgpu 0000:0c:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
mag 28 19:28:07 kernel: amdgpu 0000:0c:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
mag 28 19:28:07 kernel: amdgpu 0000:0c:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
mag 28 19:28:07 kernel: amdgpu 0000:0c:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
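If you want to check another boot for the same signature, a rough Python sketch like the following could filter saved journal output for these entries. The regular expression is only derived from the lines quoted above, and the name `dmcub_errors` is hypothetical; other kernels may word the message differently.

```python
import re

# Pattern derived from the log lines quoted above (an assumption, not an official format).
DMCUB_RE = re.compile(
    r"amdgpu (?P<pci>[0-9a-f:.]+): \[drm\] \*ERROR\* .*DMCUB error"
)


def dmcub_errors(lines):
    """Return the PCI address for every line that reports a DMCUB error."""
    found = []
    for line in lines:
        match = DMCUB_RE.search(line)
        if match:
            found.append(match.group("pci"))
    return found


if __name__ == "__main__":
    import sys

    # Usage: journalctl -k --boot=-1 --no-hostname | python3 this_script.py
    hits = dmcub_errors(sys.stdin)
    print(f"{len(hits)} DMCUB error entries, devices: {sorted(set(hits))}")
```

A boot without the freeze should report zero entries if the assumption about this signature holds.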

You might read my recent 5 posts from another topic beginning here and maybe also the posts of other people there about these two AMD issue cases (keep in mind that in a few posts, people mixed up other issues with these two) to get an overview: the two issues of AMD upstream (#4141 and #4238; links are in my mentioned posts) manifest in a comparable way as you describe it. The logs are comparable, and several people have experienced the issue after 6.14.6 but not with 6.14.6.

However, your logs contain several error entries that are not yet known from others. For now, I would assume that these log entries / differences are linked to your modifications of your kernel. But given what you have in common with the others, especially the log entries with which the problem starts, I would assume for now that you have one of these two issues.

If you want to know for sure, you might try to revert the changes to get your kernel back to taint level 0 and see whether the error logs change. But I think for now, this is not important: I expect you have one of the two issues, or both. So you might also skip that for now to save time…

I would tackle your case individually once the other two issues of AMD are solved in a future kernel if the issue then remains in your case. Till then, I suggest to:

  1. read through the topic I mentioned and through the two AMD issue tickets, as they already mention possibilities to temporarily mitigate the problem until it can be finally solved by a future kernel. What temporarily solves the issue for many is the kernel parameter amdgpu.dcdebugmask=0x10, which disables panel self-refresh (PSR): keep in mind that this increases the power use of your system, so you should not use it permanently, but it might be a mitigation until the problem is finally solved. Beyond that, see the tickets, where people exchange more about mitigations.
  2. check early for new updates of the kernel: each time a new kernel is there, remove whatever mitigation you have in place (e.g., amdgpu.dcdebugmask=0x10), and then check if the then-new kernel works again without mitigation
  3. keep watching the AMD tickets: if the tickets get solved in a new kernel, and this new kernel does not work for you (or still needs the mitigation), then let us know here to get deeper into your case!!
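For step 2, it helps to verify whether the mitigation is actually active in the running boot before you judge a new kernel. The following Python sketch checks a kernel command line string for the parameter. The function names are hypothetical, and the parser is deliberately simple (it does not handle quoted values that contain spaces). On Fedora, the parameter is commonly added with sudo grubby --update-kernel=ALL --args="amdgpu.dcdebugmask=0x10" and removed again with the --remove-args option; please double check this against the grubby man page on your system.

```python
def cmdline_params(cmdline):
    """Parse a kernel command line into a dict; flags without '=' map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def psr_mitigation_active(cmdline):
    """True if amdgpu.dcdebugmask=0x10 is present on the given command line."""
    return cmdline_params(cmdline).get("amdgpu.dcdebugmask") == "0x10"


if __name__ == "__main__":
    # /proc/cmdline holds the command line of the currently running kernel.
    with open("/proc/cmdline") as handle:
        active = psr_mitigation_active(handle.read())
    print("PSR mitigation active" if active else "PSR mitigation not active")
```

If it reports the mitigation as still active after you removed it, you are most likely looking at a boot entry that was not updated.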

Let’s hope for the best, these AMD issues are currently a mess since 6.14 :frowning:

But in the meantime, you might also check what I mentioned above about nvidia - check for sure if there is something of nvidia contained or installed, and if not, remove the repo.

One thing about nvidia: it is normal in any case that one package with nvidia in its name is installed: nvidia-gpu-firmware.noarch 20250509-1.fc42 updates → so if that is the only output about nvidia given the commands I mentioned above (the dnf command in this case), then it is ok and you should be able to just remove the nvidia repo… But if you don’t know how you came to have the nvidia repo enabled, I would indeed check the lspci and glxinfo | grep evice if something of nvidia is there :open_mouth: Even if inactive, just to know about it

hello, i will read everything after dinner, but first let me clarify a little

  • refine was enabled with copr months ago, and it was working fine, we all know what refine is
  • opera and vivaldi are their official repos and i’ve enabled them like ten years ago
  • rpmfusion is well, rpmfusion

all the other repos, including the nvidia one, are enabled by checking the “third party repos” slider in the fedora configuration, and i’ve enabled them to install steam and, of course, to keep steam updated

i could try removing cdemu but there’s nothing wrong or weird with other repos

(and anyway, cdemu too “should” be fine, i also mentioned it in an article i wrote for the fedora magazine)

:slight_smile:

Are you using the rok/cdemu Copr? Do you use an AppIndicator library? Are you using X11? It is not unknown for copr packages to be out of sync with recent kernel updates until the author has time to make required changes.

i am using the rok copr and using wayland

unfortunately i don’t know what the appindicator library is, but if it’s the one for displaying icons on the top bar then no, my system is 98% vanilla…

EDIT: let me say that your command for searching for nvidia things only returns me the firmware package so that’s ok, and the third command you gave me only returns my amd radeon 6600

i’ll look for the other things and update this message in the meantime

EDIT 2: i don’t have any modification to the kernel and cdemu is the only one so i do assume those taints come from there, but if you want i can try removing them

as already said cdemu + the gui client were taken from the rok copr

EDIT 3: i perfectly know how the nvidia repo got enabled, it enables in the initial system configuration where there is a prompt asking you if you want enable third party repos, it enables exactly the google chrome one, the steam one, the python one and the nvidia one. I enabled them for steam only but since they never gave me any problem in years i have never disabled them

EDIT 4: let me point out in fact that 6.14 has been a nightmare for my gpu and seems that latest kernel (i think 6.14.6) solved things after months

anyway i don’t mind as i am sure i will solve the problem one way or another

You are not alone with that. Most issues have been solved, but two remain open (likely even three, but the last is just a kernel warning, potentially without any measurable impact). It is possible that the issue you experience now has been there since the beginning of 6.14 but was not provoked in your case for some time, for whatever reason. I think 6.14.6 is what most have as their last working kernel, but there are also people who experienced this issue already at 6.14.2. Let’s hope it’s done soon.

Your decision. You can try to identify whether the last log entries are linked to that, but even if not, I would still stick with the assumption that you have one or both of the two AMD issues, given the existing positive correlations and the initial log entries that the kernel issues when the problem occurs. For now, if I were you, I would just stick with what I suggested above: check the tickets I mentioned for mitigations until they are solved, and if the mitigations work, I would consider that more telling than the appearance of the later log entries (I do not exclude that the last entries are linked to something around Workstation)… Check mitigations, regularly check new kernels without the mitigations to see if the issue is solved, and follow the AMD issue tickets to see when they are solved. If the latter does not solve the issue for you, then we have to start investigating further: that would indeed start with removing your changes step by step, beginning with cdemu. But given the correlations, I think we will not come to that.

I suggest to disable or remove the nvidia repo.

let me also point out, i have an openmandriva gnome installation on a secondary ssd, which is at 6.15.0 already and it’s working flawlessly so i will try the mitigations but i do assume that 6.15 will solve my issue

(for the record, i installed openmandriva last month and it was with 6.15 already, the rc versions, so i don’t know how 6.14 performed here)

Could be solved earlier: 6.14.9 is released from the kernel community, and it contains several bug fixes in amdgpu including some from the handler of the two mentioned AMD issues. But I indeed agree that a future kernel will solve the issue. The worst case would be that some recent AMD functions of 6.14 will be reverted if the issue cannot be identified soon. But I hope it gets just solved.

6.14.9 is not yet in the testing pipeline of Fedora, but given today’s release of 6.14.9, I hope it is in our testing later today and available in stable updates in a few days - if all works out perfectly it could be quicker.

But we are indeed already in the transition to 6.15: a first build already exists, but the transition to the next mainline kernel takes more time, so it can take a while until it becomes available (a lot of testing is done before a new mainline kernel is released to the whole community - it sometimes reaches *.*.2 or *.*.3 before one gets pushed to the stable repos). So my guess about the AMD issues is 6.14.9 or 6.14.10.

Hello, writing to inform people that 6.14.9 perfectly solved the problem, i had no freezes at all finally :slight_smile:

The hang no longer reproduces with kernel 6.14.9-300.fc42.x86_64.