Since the kernel was updated to 6.15 together with the latest Linux firmware packages, I have had many screen-related issues.
I followed some topics here and waited for the firmware/kernel updates: 6.15.4 resolved some display issues but not the screen freeze, and 6.15.5 did not resolve the screen freeze either.
I have an “AMD Radeon RX 7900XTX” GPU and 2 screens:
main screen: 2560x1440 at 240 Hz
secondary screen: 1920x1080 at 60 Hz
Here is the behavior:
If I start the computer with only the secondary screen, no problem.
If I start the computer with the main screen connected, the PC freezes when SDDM opens (black screen with cursor).
If I start with only the secondary screen and connect the main screen at the SDDM login, before opening the session: the PC freezes.
If I start with only the secondary screen and connect the main screen after opening the session: the PC freezes, unless the main screen is configured to 60 Hz only.
So there are likely still some issues with kernel 6.15 and/or the amdgpu Linux firmware.
Regenerating the initramfs with this command does not work (maybe because I am on an ostree-based system):
$ sudo dracut -f
dracut[F]: Can't write to /boot/efi/ac2864e6a98d4ff8885b5993ba29498f/6.15.5-200.fc42.x86_64: Directory /boot/efi/ac2864e6a98d4ff8885b5993ba29498f/6.15.5-200.fc42.x86_64 does not exist or is not accessible.
Could you provide the logs of a broken/frozen boot? Ideally also the logs of two different variants of the problem, since you described several ways of causing the issue.
You can provoke the issue, then boot again and collect the logs of the broken boot, which is then the previous (not the current) boot. To get the previous boot’s logs, use sudo journalctl -r -k --no-hostname --boot=-1 (the -1 means the current boot minus one). Then provoke the problem in a different way, reboot again, and collect those logs as well.
In both cases, please provide the full logs. If you want, you can additionally add the last 30 seconds of sudo journalctl -r --no-hostname --boot=-1 (this is without -k, so it contains much more, which is why the last 30 seconds should be enough). Feel free to anonymize MAC addresses or UUIDs if you consider them private.
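For the anonymizing step, a minimal Python sketch could look like the following. It is only an illustration: the function name redact and the placeholders <MAC> and <UUID> are assumptions, and the patterns cover only colon- or dash-separated MAC addresses and dashed 8-4-4-4-12 UUIDs, so undashed IDs (such as a 32-character machine ID in a path) are left untouched.

```python
import re

# Six hex pairs separated by ':' or '-', e.g. aa:bb:cc:dd:ee:ff
MAC_RE = re.compile(r"\b(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}\b")

# Canonical dashed UUID layout: 8-4-4-4-12 hex digits
UUID_RE = re.compile(
    r"\b[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-"
    r"[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12}\b"
)


def redact(text: str) -> str:
    """Replace MAC addresses and dashed UUIDs with fixed placeholders."""
    text = UUID_RE.sub("<UUID>", text)
    return MAC_RE.sub("<MAC>", text)
```

PCI addresses such as 0000:03:00.0 and timestamps do not match either pattern, so the amdgpu lines stay readable after redaction.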
Keep in mind that at least one of the AMD issues that have been occurring for some weeks (by now I think even a few months) is still being worked on, though there are indications that 6.16 might have solved it. It’s a complex one. But the point is, it is not forgotten: the AMD maintainer is working on it. Maybe your logs indicate whether that is what you are experiencing (it is possible that in your hardware/software constellation it was dormant for a long time and not provoked before 6.15.4/5, so it could be the same issue that has been under evaluation for some time even if you have not experienced it before). Let’s see what your logs say.
I also had issues with the 6.15.x kernels with a Radeon RX 7800 XT:
6.15.3-200.fc42: Severe artifacts (half screen gray/static) progressing to complete crash
6.15.4-200.fc42: White/gray bars
6.15.5-200.fc42: White/gray bars
6.15.6-200.fc42: White/gray bars progressing to complete crash with white/gray screen
I reported the bug on bugzilla.
I rebuilt the kernel 6.14.11 initramfs with my current firmware: no artifacts, everything works wonderfully.
When I boot any kernel from 6.15.3 onward, artifacts are present, and I get crashes on 6.15.6.
This seems to show that the regression comes from the kernel, or at least that it does not come from the firmware.
The full logs without screen freeze (journalctl -r -k --no-hostname --boot=0): main screen not connected at start, login through SDDM, then connection of the main screen (the logs cover all steps).
Focusing on the amdgpu error:
$ journalctl -r --no-hostname --boot=-1 -g amdgpu
juil. 14 13:40:26 sddm-helper-start-wayland[2350]: "kwin_wayland_drm: Pageflip timed out! This is a bug in the amdgpu kernel driver\nkwin_wayland_drm: Please report this at https://gitlab.freedesktop.org/drm/amd/-/issues\n"
juil. 14 13:40:25 sddm-helper-start-wayland[2350]: "kwin_wayland_drm: Pageflip timed out! This is a bug in the amdgpu kernel driver\nkwin_wayland_drm: Please report this at https://gitlab.freedesktop.org/drm/amd/-/issues\nkwin_wayland_drm: With the output of 'sudo dmesg' and 'journalctl --user-unit plasma-kwin_wayland --boot 0'\n"
juil. 14 13:40:24 sddm-helper-start-wayland[2350]: "kwin_wayland_drm: Pageflip timed out! This is a bug in the amdgpu kernel driver\nkwin_wayland_drm: Please report this at https://gitlab.freedesktop.org/drm/amd/-/issues\nkwin_wayland_drm: With the output of 'sudo dmesg' and 'journalctl --user-unit plasma-kwin_wayland --boot 0'\n"
juil. 14 13:40:23 sddm-helper-start-wayland[2350]: "kwin_wayland_drm: Pageflip timed out! This is a bug in the amdgpu kernel driver\nkwin_wayland_drm: Please report this at https://gitlab.freedesktop.org/drm/amd/-/issues\nkwin_wayland_drm: With the output of 'sudo dmesg' and 'journalctl --user-unit plasma-kwin_wayland --boot 0'\n"
juil. 14 13:40:22 sddm-helper-start-wayland[2350]: "kwin_wayland_drm: Pageflip timed out! This is a bug in the amdgpu kernel driver\nkwin_wayland_drm: Please report this at https://gitlab.freedesktop.org/drm/amd/-/issues\nkwin_wayland_drm: With the output of 'sudo dmesg' and 'journalctl --user-unit plasma-kwin_wayland --boot 0'\n"
juil. 14 13:40:21 sddm-helper-start-wayland[2350]: "kwin_wayland_drm: Pageflip timed out! This is a bug in the amdgpu kernel driver\nkwin_wayland_drm: Please report this at https://gitlab.freedesktop.org/drm/amd/-/issues\nkwin_wayland_drm: With the output of 'sudo dmesg' and 'journalctl --user-unit plasma-kwin_wayland --boot 0'\n"
juil. 14 13:40:20 kernel: amdgpu 0000:03:00.0: amdgpu: Failed to disable gfxoff!
juil. 14 13:40:20 kernel: amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:41 param:0x00000000 message:DisallowGfxOff?
juil. 14 13:40:20 sddm-helper-start-wayland[2350]: "kwin_wayland_drm: Pageflip timed out! This is a bug in the amdgpu kernel driver\nkwin_wayland_drm: Please report this at https://gitlab.freedesktop.org/drm/amd/-/issues\nkwin_wayland_drm: With the output of 'sudo dmesg' and 'journalctl --user-unit plasma-kwin_wayland --boot 0'\n"
juil. 14 13:40:19 sddm-helper-start-wayland[2350]: "kwin_wayland_drm: Pageflip timed out! This is a bug in the amdgpu kernel driver\nkwin_wayland_drm: Please report this at https://gitlab.freedesktop.org/drm/amd/-/issues\nkwin_wayland_drm: With the output of 'sudo dmesg' and 'journalctl --user-unit plasma-kwin_wayland --boot 0'\n"
juil. 14 13:40:18 sddm-helper-start-wayland[2350]: "kwin_wayland_drm: Pageflip timed out! This is a bug in the amdgpu kernel driver\nkwin_wayland_drm: Please report this at https://gitlab.freedesktop.org/drm/amd/-/issues\nkwin_wayland_drm: With the output of 'sudo dmesg' and 'journalctl --user-unit plasma-kwin_wayland --boot 0'\n"
juil. 14 13:40:17 sddm-helper-start-wayland[2350]: "kwin_wayland_drm: Pageflip timed out! This is a bug in the amdgpu kernel driver\nkwin_wayland_drm: Please report this at https://gitlab.freedesktop.org/drm/amd/-/issues\nkwin_wayland_drm: With the output of 'sudo dmesg' and 'journalctl --user-unit plasma-kwin_wayland --boot 0'\n"
juil. 14 13:40:16 sddm-helper-start-wayland[2350]: "kwin_wayland_drm: Pageflip timed out! This is a bug in the amdgpu kernel driver\nkwin_wayland_drm: Please report this at https://gitlab.freedesktop.org/drm/amd/-/issues\nkwin_wayland_drm: With the output of 'sudo dmesg' and 'journalctl --user-unit plasma-kwin_wayland --boot 0'\n"
juil. 14 13:40:15 sddm-helper-start-wayland[2350]: "kwin_wayland_drm: Pageflip timed out! This is a bug in the amdgpu kernel driver\nkwin_wayland_drm: Please report this at https://gitlab.freedesktop.org/drm/amd/-/issues\nkwin_wayland_drm: With the output of 'sudo dmesg' and 'journalctl --user-unit plasma-kwin_wayland --boot 0'\n"
juil. 14 13:40:15 kernel: amdgpu 0000:03:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
juil. 14 13:40:14 kernel: amdgpu 0000:03:00.0: amdgpu: Failed to export SMU metrics table!
juil. 14 13:40:14 kernel: amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:18 param:0x00000005 message:TransferTableSmu2Dram?
juil. 14 13:40:14 kernel: amdgpu 0000:03:00.0: amdgpu: Failed to disable gfxoff!
juil. 14 13:40:14 kernel: amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:41 param:0x00000000 message:DisallowGfxOff?
juil. 14 13:40:14 sddm-helper-start-wayland[2350]: "kwin_wayland_drm: Pageflip timed out! This is a bug in the amdgpu kernel driver\nkwin_wayland_drm: Please report this at https://gitlab.freedesktop.org/drm/amd/-/issues\nkwin_wayland_drm: With the output of 'sudo dmesg' and 'journalctl --user-unit plasma-kwin_wayland --boot 0'\n"
juil. 14 13:40:13 kernel: amdgpu 0000:03:00.0: amdgpu: Failed to export SMU metrics table!
juil. 14 13:40:13 kernel: amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:18 param:0x00000005 message:TransferTableSmu2Dram?
juil. 14 13:40:13 sddm-helper-start-wayland[2350]: "kwin_wayland_drm: Pageflip timed out! This is a bug in the amdgpu kernel driver\nkwin_wayland_drm: Please report this at https://gitlab.freedesktop.org/drm/amd/-/issues\nkwin_wayland_drm: With the output of 'sudo dmesg' and 'journalctl --user-unit plasma-kwin_wayland --boot 0'\n"
juil. 14 13:40:12 sddm-helper-start-wayland[2350]: "kwin_wayland_drm: Pageflip timed out! This is a bug in the amdgpu kernel driver\nkwin_wayland_drm: Please report this at https://gitlab.freedesktop.org/drm/amd/-/issues\nkwin_wayland_drm: With the output of 'sudo dmesg' and 'journalctl --user-unit plasma-kwin_wayland --boot 0'\n"
juil. 14 13:40:11 kernel: amdgpu 0000:03:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
juil. 14 13:40:11 sddm-helper-start-wayland[2350]: "kwin_wayland_drm: Pageflip timed out! This is a bug in the amdgpu kernel driver\nkwin_wayland_drm: Please report this at https://gitlab.freedesktop.org/drm/amd/-/issues\nkwin_wayland_drm: With the output of 'sudo dmesg' and 'journalctl --user-unit plasma-kwin_wayland --boot 0'\n"
juil. 14 13:40:10 kernel: amdgpu 0000:03:00.0: amdgpu: Failed to disable gfxoff!
juil. 14 13:40:10 kernel: amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000024 SMN_C2PMSG_82:0x00000001
juil. 14 13:40:07 kernel: [drm:parse_edid_cea.constprop.0.isra.0 [amdgpu]] *ERROR* EDID CEA parser failed
juil. 14 13:40:07 kernel: amdgpu 0000:03:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
juil. 14 13:40:05 kernel: amdgpu 0000:03:00.0: amdgpu: Failed to get current clock freq!
juil. 14 13:40:05 kernel: amdgpu 0000:03:00.0: amdgpu: Failed to export SMU metrics table!
juil. 14 13:40:05 kernel: amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000024 SMN_C2PMSG_82:0x00000001
juil. 14 13:39:59 kernel: amdgpu 0000:03:00.0: amdgpu: Failed to disable gfxoff!
juil. 14 13:39:59 kernel: amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000024 SMN_C2PMSG_82:0x00000001
juil. 14 13:39:54 kernel: amdgpu 0000:03:00.0: amdgpu: Failed to get current clock freq!
juil. 14 13:39:54 kernel: amdgpu 0000:03:00.0: amdgpu: Failed to export SMU metrics table!
juil. 14 13:39:54 kernel: amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000024 SMN_C2PMSG_82:0x00000001
juil. 14 13:39:52 kernel: amdgpu 0000:03:00.0: amdgpu: Dumping IP State
juil. 14 13:39:48 kernel: amdgpu 0000:03:00.0: amdgpu: (-62) failed to disable fullscreen 3D power profile mode
juil. 14 13:39:48 kernel: amdgpu 0000:03:00.0: amdgpu: Failed to set workload mask 0x00000001
juil. 14 13:39:48 kernel: amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000024 SMN_C2PMSG_82:0x00000001
juil. 14 13:39:32 kernel: snd_hda_intel 0000:03:00.1: bound 0000:03:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
juil. 14 13:39:32 kernel: snd_hda_intel 0000:6d:00.1: bound 0000:6d:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: [drm] Cannot find any crtc or sizes
juil. 14 15:39:02 kernel: [drm] Initialized amdgpu 3.63.0 for 0000:6d:00.0 on minor 0
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: [drm] Registered 4 planes with drm panic
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: Runtime PM not available
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 8
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 8
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 8
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 8
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: ring sdma0 uses VM inv eng 13 on hub 0
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 12 on hub 0
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 5 on hub 0
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 4 on hub 0
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: ring gfx_0.1.0 uses VM inv eng 1 on hub 0
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: SE 1, SH per SE 1, CU per SH 2, active_cu_number 2
juil. 14 15:39:02 kernel: kfd kfd: amdgpu: added device 1002:164e
juil. 14 15:39:02 kernel: amdgpu: Topology: Add dGPU node [0x164e:0x1002]
juil. 14 15:39:02 kernel: amdgpu: Virtual CRAT table created for GPU
juil. 14 15:39:02 kernel: kfd kfd: amdgpu: Total number of KFD nodes to be created: 1
juil. 14 15:39:02 kernel: kfd kfd: amdgpu: Allocated 3969056 bytes on gart
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: SMU is initialized successfully!
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: RAP: optional rap ta ucode is not available
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: RAS: optional ras ta ucode is not available
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: reserve 0xa00000 from 0xf41e000000 for PSP TMR
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: Found VCN firmware Version ENC: 1.33 DEC: 4 VEP: 0 Revision: 6
juil. 14 15:39:02 kernel: [drm] amdgpu: 96179M of GTT memory ready.
juil. 14 15:39:02 kernel: [drm] amdgpu: 512M of VRAM memory ready
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: GART: 1024M 0x0000000000000000 - 0x000000003FFFFFFF
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: VRAM: 512M 0x000000F400000000 - 0x000000F41FFFFFFF (512M used)
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
juil. 14 15:39:02 kernel: amdgpu: ATOM BIOS: 102-RAPHAEL-008
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: Fetched VBIOS from VFCT
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: detected ip block number 9 <jpeg_v3_0>
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: detected ip block number 8 <vcn_v3_0>
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: detected ip block number 7 <sdma_v5_2>
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: detected ip block number 6 <gfx_v10_0>
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: detected ip block number 5 <dm>
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: detected ip block number 4 <smu>
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: detected ip block number 3 <psp>
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: detected ip block number 2 <navi10_ih>
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: detected ip block number 1 <gmc_v10_0>
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: amdgpu: detected ip block number 0 <nv_common>
juil. 14 15:39:02 kernel: amdgpu 0000:6d:00.0: enabling device (0006 -> 0007)
juil. 14 15:39:02 kernel: amdgpu 0000:03:00.0: [drm] fb0: amdgpudrmfb frame buffer device
juil. 14 15:39:02 kernel: fbcon: amdgpudrmfb (fb0) is primary device
juil. 14 15:39:02 kernel: [drm] Initialized amdgpu 3.63.0 for 0000:03:00.0 on minor 1
juil. 14 15:39:02 kernel: amdgpu 0000:03:00.0: [drm] Registered 4 planes with drm panic
juil. 14 15:39:02 kernel: amdgpu 0000:03:00.0: amdgpu: Using BACO for runtime pm
juil. 14 15:39:02 kernel: amdgpu 0000:03:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 14 on hub 0
juil. 14 15:39:02 kernel: amdgpu 0000:03:00.0: amdgpu: ring jpeg_dec uses VM inv eng 4 on hub 8
juil. 14 15:39:02 kernel: amdgpu 0000:03:00.0: amdgpu: ring vcn_unified_1 uses VM inv eng 1 on hub 8
juil. 14 15:39:02 kernel: amdgpu 0000:03:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
juil. 14 15:39:02 kernel: amdgpu 0000:03:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
juil. 14 15:39:02 kernel: amdgpu 0000:03:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
juil. 14 15:39:02 kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
juil. 14 15:39:02 kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
juil. 14 15:39:02 kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
juil. 14 15:39:02 kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
juil. 14 15:39:02 kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
juil. 14 15:39:02 kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
juil. 14 15:39:02 kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
juil. 14 15:39:02 kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
juil. 14 15:39:02 kernel: amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
juil. 14 15:39:02 kernel: amdgpu 0000:03:00.0: amdgpu: SE 6, SH per SE 2, CU per SH 8, active_cu_number 96
juil. 14 15:39:02 kernel: kfd kfd: amdgpu: added device 1002:744c
juil. 14 15:39:02 kernel: amdgpu: Topology: Add dGPU node [0x744c:0x1002]
juil. 14 15:39:02 kernel: amdgpu: Virtual CRAT table created for GPU
juil. 14 15:39:02 kernel: kfd kfd: amdgpu: Total number of KFD nodes to be created: 1
juil. 14 15:39:02 kernel: kfd kfd: amdgpu: Allocated 3969056 bytes on gart
juil. 14 15:39:02 kernel: amdgpu: HMM registered 24560MB device memory
juil. 14 15:39:02 kernel: amdgpu 0000:03:00.0: amdgpu: SMU is initialized successfully!
juil. 14 15:39:01 kernel: amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched
juil. 14 15:39:01 kernel: amdgpu 0000:03:00.0: amdgpu: smu driver if version = 0x0000003d, smu fw if version = 0x00000040, smu fw program = 0, smu fw version = 0x004e8100 (78.129.0)
juil. 14 15:39:01 kernel: amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
juil. 14 15:39:01 kernel: amdgpu 0000:03:00.0: amdgpu: RAP: optional rap ta ucode is not available
juil. 14 15:39:01 kernel: amdgpu 0000:03:00.0: amdgpu: reserve 0x1300000 from 0x85fc000000 for PSP TMR
juil. 14 15:39:01 kernel: amdgpu 0000:03:00.0: amdgpu: Found VCN firmware Version ENC: 1.24 DEC: 9 VEP: 0 Revision: 11
juil. 14 15:39:01 kernel: amdgpu 0000:03:00.0: amdgpu: Found VCN firmware Version ENC: 1.24 DEC: 9 VEP: 0 Revision: 11
juil. 14 15:39:01 kernel: [drm] amdgpu: 96179M of GTT memory ready.
juil. 14 15:39:01 kernel: [drm] amdgpu: 24560M of VRAM memory ready
juil. 14 15:39:01 kernel: amdgpu 0000:03:00.0: amdgpu: GART: 512M 0x00007FFF00000000 - 0x00007FFF1FFFFFFF
juil. 14 15:39:01 kernel: amdgpu 0000:03:00.0: amdgpu: VRAM: 24560M 0x0000008000000000 - 0x00000085FEFFFFFF (24560M used)
juil. 14 15:39:01 kernel: amdgpu 0000:03:00.0: amdgpu: SRAM ECC is not presented.
juil. 14 15:39:01 kernel: amdgpu 0000:03:00.0: amdgpu: MEM ECC is not presented.
juil. 14 15:39:01 kernel: amdgpu 0000:03:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
juil. 14 15:39:01 kernel: amdgpu 0000:03:00.0: vgaarb: deactivate vga console
juil. 14 15:39:01 kernel: amdgpu 0000:03:00.0: amdgpu: CP RS64 enable
juil. 14 15:39:01 kernel: amdgpu: ATOM BIOS: 113-4E4710U-T4Y
juil. 14 15:39:01 kernel: amdgpu 0000:03:00.0: amdgpu: Fetched VBIOS from VFCT
juil. 14 15:39:01 kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 10 <mes_v11_0>
juil. 14 15:39:01 kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 9 <jpeg_v4_0>
juil. 14 15:39:01 kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 8 <vcn_v4_0>
juil. 14 15:39:01 kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 7 <sdma_v6_0>
juil. 14 15:39:01 kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 6 <gfx_v11_0>
juil. 14 15:39:01 kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 5 <dm>
juil. 14 15:39:01 kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 4 <smu>
juil. 14 15:39:01 kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 3 <psp>
juil. 14 15:39:01 kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 2 <ih_v6_0>
juil. 14 15:39:01 kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 1 <gmc_v11_0>
juil. 14 15:39:01 kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 0 <soc21_common>
juil. 14 15:39:01 kernel: amdgpu 0000:03:00.0: enabling device (0006 -> 0007)
juil. 14 15:39:01 kernel: amdgpu: Topology: Add CPU node
juil. 14 15:39:01 kernel: amdgpu: Virtual CRAT table created for CPU
juil. 14 15:39:01 kernel: amdgpu: ATPX version 1, functions 0x00000000
juil. 14 15:39:01 kernel: amdgpu: vga_switcheroo: detected switching method \_SB_.PCI0.GP17.VGA_.ATPX handle
juil. 14 15:39:01 kernel: [drm] amdgpu kernel modesetting enabled.
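Because the excerpt above repeats the same few messages many times, a small tally can make the distinct errors easier to spot. The following is a minimal Python sketch and not part of the original post: the function name summarize and the 80-character grouping prefix are assumptions for illustration, and the line pattern only targets the short journal format shown here (locale-dependent month token, day, time, source, message).

```python
import re
from collections import Counter

# Matches e.g. "juil. 14 13:40:20 kernel: <message>".
# The month token depends on the locale, so any non-space run is accepted.
LINE_RE = re.compile(r"^\S+\s+\d+\s+\d{2}:\d{2}:\d{2}\s+([^:\s]+):\s+(.*)$")


def summarize(log_text: str, prefix_len: int = 80) -> Counter:
    """Count log lines per (source, first prefix_len characters of message)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        # Drop a trailing "[pid]" so restarts of a process group together.
        source = re.sub(r"\[\d+\]$", "", match.group(1))
        counts[(source, match.group(2)[:prefix_len])] += 1
    return counts
```

Calling summarize(text).most_common() on a saved log lists the most frequent messages first. Grouping by a message prefix is a deliberate simplification: lines that differ only in their tail are counted together.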
Not -g amdgpu, but -k. With -g amdgpu you presume that a) the issue is amdgpu-related, which you should not do even if the “indication” in the logs points in that direction (I have seen wifi driver issues that manifested as amdgpu error logs), and b) even if the issue is amdgpu-related, all indicative log lines mention amdgpu. You should presume neither at the beginning.
Please provide the logs as asked for if you want further support.
Supplement: OK, I have them now, thanks for the update. But for the second set of logs, are you sure --boot=0 is the affected boot? So you have the issue but can still access the console?
So, that is not the #4141 I referred to earlier. I cannot exclude that it is the same issue manifesting differently on your system because of a different hardware/software constellation, but for now I would assume it is something new, and I cannot find an existing bug ticket containing any of your error logs…
I suggest opening a new ticket upstream: https://gitlab.freedesktop.org/drm/amd/-/issues (the maintainer may merge your ticket into another one, but that decision should be made by them).
Please describe the issue, add the kernel logs (the journalctl -r -k --no-hostname --boot=<number of broken boot> logs), and verify whether earlier kernels are still free of the issue or are now affected too, to make sure it is not a firmware package or something else that triggers it (if all kernels are now affected, let us know here before creating an upstream ticket). State exactly which kernels you have tested in the current state of the system (that is, with today’s update state), which was the last kernel that works without the issue in that state, and which is the first kernel that shows it.
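To keep the log commands consistent across several broken boots, the invocation can be assembled from the boot offset. This is a minimal Python sketch under stated assumptions: the helper name journalctl_cmd is hypothetical, and it only builds the argument list from the flags already used in this thread (-r, -k, --no-hostname, --boot=N); it does not run anything.

```python
import shlex


def journalctl_cmd(boot_offset: int, kernel_only: bool = True) -> list:
    """Build the journalctl call for one boot.

    boot_offset: 0 is the current boot, -1 the previous one, -2 the one before.
    kernel_only: add -k to restrict the output to kernel messages.
    """
    cmd = ["sudo", "journalctl", "-r"]
    if kernel_only:
        cmd.append("-k")
    cmd += ["--no-hostname", f"--boot={boot_offset}"]
    return cmd


if __name__ == "__main__":
    # Print the commands for the two most recent earlier boots.
    for offset in (-1, -2):
        print(shlex.join(journalctl_cmd(offset)))
```

The returned list can be passed to subprocess.run(cmd, capture_output=True, text=True) to save the output of each boot to its own file before attaching it to the ticket.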
Beyond that, are you able to test upstream kernels, that is, kernels that are still in development and that carry the risk of breaking your system? This is very useful information and of the highest value to the maintainer (it can save a lot of time and resources in solving your bug), but these kernels can be dangerous and bring a risk of data loss (so you should have backups of everything important, plus the time to spare for reinstalling Fedora). That outcome is unlikely, but it is a risk with upstream kernels not yet tested for Fedora, so do this only if you are sure about it.
Ah, sorry, you have Kinoite. So yes, you have to translate the “normal” Fedora dnf commands to rpm-ostree. I cannot help with that, as I have no experience with Kinoite.
Indeed, but it would be good to test the kernels in between, or do you know for sure at which one it started? So 6.15.3 was the first affected one, and you had 6.14.11 before that? (6.15.3 was the first 6.15 in Fedora stable, while 6.15.1/2 were used only by testers.) The goal is to find out at which kernel it started. Especially if you cannot test upstream kernels, that might be very relevant information.
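Once a few kernels have been tested, the last working and first broken version can be read off mechanically. The following Python sketch is only an illustration: the names version_key and boundary are hypothetical, the parsing assumes Fedora-style strings such as 6.15.3-200.fc42 (a missing release number is treated as 0), and the sample results are the ones reported for the RX 7800 XT earlier in this topic.

```python
import re


def version_key(version: str) -> tuple:
    """Turn '6.15.2-201.fc42' into (6, 15, 2, 201) so versions sort correctly."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)(?:-(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized kernel version: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def boundary(results: dict) -> tuple:
    """Return (last_good, first_bad) from a mapping of version -> works (bool)."""
    ordered = sorted(results, key=version_key)
    first_bad = next((v for v in ordered if not results[v]), None)
    last_good = None
    for version in ordered:
        if version == first_bad:
            break
        last_good = version  # every entry before the first bad one works
    return last_good, first_bad


if __name__ == "__main__":
    reported = {
        "6.14.11": True,
        "6.15.3-200.fc42": False,
        "6.15.4-200.fc42": False,
        "6.15.5-200.fc42": False,
        "6.15.6-200.fc42": False,
    }
    print(boundary(reported))  # ('6.14.11', '6.15.3-200.fc42')
```

Any kernel that lies between the two returned versions and has not been tested yet is the next candidate to try.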
It would be useful if you posted the link to the upstream ticket here, so that this Fedora topic is linked with it and others with the same issue can find it and provide their information directly in the right place (rather than opening new tickets).
That also helps to see whether the issue manifests differently on different systems, which can help to solve it.
6.15.0 is available only in koji for f43, but it would likely boot in f42. However, if you know that 6.15.1 always works and 6.15.2 is the first broken kernel, and you can reproduce the issue 100% of the time, you already have the necessary information and should provide it in the ticket.
Since the Fedora kernel is not identical to mainline, you might give the maintainer the links to the respective kernels so that they can review the included patches if they want: kernel-6.15.1-200.fc42, kernel-6.15.2-201.fc42, kernel-6.15.3-200.fc42. It would also be interesting to know whether the issue already occurs with kernel-6.15.2-200.fc42 (that is 6.15.2-200 rather than 6.15.2-201; you have already tested the second build of 6.15.2).
If the issue was introduced between 6.15.2-200 and -201, the origin might be identified easily, given the small difference between the two:
* Fri Jun 13 2025 Justin M. Forbes <jforbes@fedoraproject.org> [6.15.2-0]
- wifi: ath12k: support MLO as well if single_chip_mlo_support flag is set (Baochen Qiang)
- wifi: ath12k: use fw_features only when it is valid (Baochen Qiang)
- wifi: ath12k: introduce ath12k_fw_feature_supported() (Baochen Qiang)
- aarch64: Switch TI_SCI_CLK and TI_SCI_PM_DOMAINS symbols to built-in (Peter Robinson)
- redhat/configs: fedora: set some qcom clk, icc, and pinctrl drivers to built in (Brian Masney)
That said, I expect the patches introduced by -201 to be unrelated, so I expect 200 to behave like 201. You might still test it to be sure; as mentioned, I have already seen a wifi driver bug cause an amdgpu error.
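The comparison of two build changelogs can also be done mechanically. This is a minimal Python sketch, assuming both changelogs have already been saved as plain text; the function name new_entries is hypothetical, and only lines starting with "- " are treated as entries, matching the excerpt above.

```python
def new_entries(old_changelog: str, new_changelog: str) -> list:
    """Return entries ('- ...' lines) that appear only in the newer changelog."""

    def entries(text: str) -> list:
        stripped = (line.strip() for line in text.splitlines())
        return [line for line in stripped if line.startswith("- ")]

    seen = set(entries(old_changelog))
    # Keep the order of the newer changelog so the result reads like the original.
    return [entry for entry in entries(new_changelog) if entry not in seen]
```

An empty result would mean that the two builds carry the same changelog entries; a short result like the five lines above is what makes a bug introduced between two builds comparatively easy to narrow down.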
OK, that is really useful information. Provide it to the maintainer, and add my extract about the difference between 200 and 201 (the issue might be in the patches of 201, but we cannot say whether they contain the bug or merely trigger it).
In this case, I suggest also filing a bug ticket against our Fedora kernel; please link each ticket to the other. In all cases provide all the information, but I expect the 200/201 difference to be paramount.
You already have the AMD issue tracker link. Here is the one for the Fedora-specific Bugzilla: https://bugzilla.redhat.com (use your normal Fedora account).
→ In our Bugzilla, please report the bug against the product “Fedora” and the component “kernel”. Once you have chosen the latter, a template is added to the bug ticket. Please follow the template and make sure everything is filled in.
You might also add a link to this Discourse topic, so that the Fedora or AMD maintainers can see what is going on here, e.g., if more people show up with this issue.