Fedora 43: GNOME desktop crashing periodically since kernel ~6.17.8

Hi,

I’m struggling with a temperamental system. To stay focused I’ll keep to one issue, but I can share more details as needed. Since kernel 6.17.8 or 6.17.9, gnome-shell crashes every 10-20 minutes, often right after startup. Visually, my desktop freezes for 5-10 seconds, then I am returned to the login screen. I can log in to a fresh session, where the cycle repeats.

I am relatively new to running Linux desktops. I have updated my system to the best of my ability, installing all available driver updates and updating my BIOS. I have tried different combinations of my displays in case there is an interaction with my using both HDMI and DisplayPort (it’s just what I have available). I have tried dropping to a single monitor. I have tried using GNOME Classic. Nothing seems to change how the problem presents, and I am out of ideas. Please let me know if I can provide additional details; any hints on next steps are appreciated.

My system info and logs follow:

os-release:

NAME="Fedora Linux"
VERSION="43 (Workstation Edition)"
RELEASE_TYPE=stable
ID=fedora
VERSION_ID=43
VERSION_CODENAME=""
PRETTY_NAME="Fedora Linux 43 (Workstation Edition)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:43"
DEFAULT_HOSTNAME="fedora"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f43/"
SUPPORT_URL="https://ask.fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=43
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=43
SUPPORT_END=2026-12-02
VARIANT="Workstation Edition"
VARIANT_ID=workstation

inxi -Fzxx:

System:
  Kernel: 6.17.12-300.fc43.x86_64 arch: x86_64 bits: 64 compiler: gcc
    v: 15.2.1
  Desktop: GNOME v: 49.2 tk: GTK v: 3.24.51 wm: gnome-shell dm: GDM
    Distro: Fedora Linux 43 (Workstation Edition)
Machine:
  Type: Desktop Mobo: Gigabyte model: X870E AORUS MASTER v: x.x
    serial: <superuser required> UEFI: American Megatrends LLC. v: F10
    date: 12/12/2025
CPU:
  Info: 8-core model: AMD Ryzen 7 9700X bits: 64 type: MT MCP arch: Zen 5
    rev: 0 cache: L1: 640 KiB L2: 8 MiB L3: 32 MiB
  Speed (MHz): avg: 2653 min/max: 605/5582 boost: enabled cores: 1: 2653
    2: 2653 3: 2653 4: 2653 5: 2653 6: 2653 7: 2653 8: 2653 9: 2653 10: 2653
    11: 2653 12: 2653 13: 2653 14: 2653 15: 2653 16: 2653 bogomips: 121599
  Flags-basic: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a
    ssse3 svm
Graphics:
  Device-1: NVIDIA AD104 [GeForce RTX 4070 SUPER] vendor: ASUSTeK
    driver: nvidia v: 580.119.02 arch: Lovelace pcie: speed: 2.5 GT/s lanes: 16
    ports: active: DP-3 empty: DP-4,DP-5,HDMI-A-3 bus-ID: 01:00.0
    chip-ID: 10de:2783
  Device-2: Advanced Micro Devices [AMD/ATI] Granite Ridge [Radeon Graphics]
    vendor: Gigabyte driver: amdgpu v: kernel arch: RDNA-2 pcie: speed: 16 GT/s
    lanes: 16 ports: active: HDMI-A-2 empty: DP-1, DP-2, HDMI-A-1, Writeback-1
    bus-ID: 78:00.0 chip-ID: 1002:13c0 temp: 37.0 C
  Display: wayland server: Xwayland v: 24.1.9 compositor: gnome-shell
    driver: gpu: amdgpu,nv_platform,nvidia,nvidia-nvswitch display-ID: 0
  Monitor-1: DP-3 model: Lenovo T24v-20 res: 1920x1080 dpi: 93
    diag: 604mm (23.8")
  Monitor-2: HDMI-A-2 model: Dell S2715H res: 1920x1080 dpi: 82
    diag: 686mm (27")
  API: OpenGL v: 4.6.0 vendor: nvidia v: 580.119.02 glx-v: 1.4
    direct-render: yes renderer: NVIDIA GeForce RTX 4070 SUPER/PCIe/SSE2
    display-ID: :0.0
  API: EGL Message: EGL data requires eglinfo. Check --recommends.
  Info: Tools: api: glxinfo gpu: nvidia-settings x11: xdriinfo, xdpyinfo,
    xprop, xrandr
Audio:
  Device-1: NVIDIA AD104 High Definition Audio vendor: ASUSTeK
    driver: snd_hda_intel v: kernel pcie: speed: 16 GT/s lanes: 16
    bus-ID: 01:00.1 chip-ID: 10de:22bc
  Device-2: Advanced Micro Devices [AMD/ATI] Radeon High Definition Audio
    driver: snd_hda_intel v: kernel pcie: speed: 16 GT/s lanes: 16
    bus-ID: 78:00.1 chip-ID: 1002:1640
  Device-3: Advanced Micro Devices [AMD] Ryzen HD Audio vendor: Gigabyte
    driver: snd_hda_intel v: kernel pcie: speed: 16 GT/s lanes: 16
    bus-ID: 78:00.6 chip-ID: 1022:15e3
  API: ALSA v: k6.17.12-300.fc43.x86_64 status: kernel-api
  Server-1: JACK v: 1.9.22 status: off
  Server-2: PipeWire v: 1.4.9 status: active with: 1: pipewire-pulse
    status: active 2: wireplumber status: active 3: pipewire-alsa type: plugin
    4: pw-jack type: plugin
Network:
  Device-1: Qualcomm WCN785x Wi-Fi 7 320MHz 2x2 [FastConnect 7800]
    vendor: Foxconn driver: ath12k_pci v: N/A pcie: speed: 8 GT/s lanes: 1
    bus-ID: 0c:00.0 chip-ID: 17cb:1107
  IF: wlp12s0 state: down mac: <filter>
  Device-2: Realtek RTL8126 5GbE vendor: Gigabyte driver: r8169 v: kernel
    pcie: speed: 8 GT/s lanes: 1 port: e000 bus-ID: 0d:00.0 chip-ID: 10ec:8126
  IF: enp13s0 state: down mac: <filter>
  IF-ID-1: enp118s0u2 state: unknown speed: -1 duplex: unknown mac: <filter>
Bluetooth:
  Device-1: Foxconn / Hon Hai driver: btusb v: 0.8 type: USB rev: 1.1
    speed: 12 Mb/s lanes: 1 bus-ID: 1-10:6 chip-ID: 0489:e10d
  Report: btmgmt ID: hci0 rfk-id: 0 state: down bt-service: enabled,running
    rfk-block: hardware: no software: yes address: <filter> bt-v: 5.3 lmp-v: 12
  Device-2: Samsung Galaxy series misc. (tethering mode) driver: rndis_host
    v: kernel type: USB rev: 2.1 speed: 480 Mb/s lanes: 1 bus-ID: 5-2:5
    chip-ID: 04e8:6863
Drives:
  Local Storage: total: 1.37 TiB used: 34.3 GiB (2.4%)
  ID-1: /dev/sda vendor: Samsung model: SSD 850 EVO 1TB size: 931.51 GiB
    speed: 6.0 Gb/s serial: <filter>
  ID-2: /dev/sdb vendor: HGST (Hitachi) model: HTS545050A7E680
    size: 465.76 GiB speed: 6.0 Gb/s serial: <filter>
  ID-3: /dev/sdc vendor: Kingston model: DataTraveler 3.0 size: 7.33 GiB
    type: USB rev: 3.0 spd: 5 Gb/s lanes: 1 serial: <filter>
Partition:
  ID-1: / size: 211.31 GiB used: 33.47 GiB (15.8%) fs: btrfs dev: /dev/dm-0
    mapped: luks-a373551d-555e-409f-9cb5-811ad722ecd2
  ID-2: /boot size: 973.4 MiB used: 769.7 MiB (79.1%) fs: ext4
    dev: /dev/sda5
  ID-3: /boot/efi size: 96 MiB used: 44.5 MiB (46.4%) fs: vfat
    dev: /dev/sda3
  ID-4: /home size: 211.31 GiB used: 33.47 GiB (15.8%) fs: btrfs
    dev: /dev/dm-0 mapped: luks-a373551d-555e-409f-9cb5-811ad722ecd2
Swap:
  ID-1: swap-1 type: zram size: 8 GiB used: 0 KiB (0.0%) priority: 100
    dev: /dev/zram0
Sensors:
  System Temperatures: cpu: 40.1 C mobo: N/A gpu: amdgpu temp: 37.0 C
  Fan Speeds (rpm): N/A
Info:
  Memory: total: 32 GiB note: est. available: 30.45 GiB used: 4.31 GiB (14.2%)
  Processes: 465 Power: uptime: 34m wakeups: 0 Init: systemd v: 258
    default: graphical
  Packages: pm: rpm pkgs: N/A note: see --rpm Compilers: gcc: 15.2.1
    Shell: Bash v: 5.3.0 running-in: ptyxis-agent inxi: 3.3.39

Here is a pastebin of the journalctl output; you can see the gnome-shell core dump at Dec 30 22:02:37: jctl-crash-12-30-22-02 (Pastebin.com)

Here is a shorter dmesg excerpt from the same time frame. I’ve tried to highlight lines that may be interesting:

[ 3259.438003] amdgpu 0000:78:00.0: amdgpu: Dumping IP State
[ 3259.439050] amdgpu 0000:78:00.0: amdgpu: Dumping IP State Completed
[ 3259.439106] amdgpu 0000:78:00.0: amdgpu: [drm] AMDGPU device coredump file has been created
[ 3259.439107] amdgpu 0000:78:00.0: amdgpu: [drm] Check your /sys/class/drm/card1/device/devcoredump/data
[ 3259.439108] amdgpu 0000:78:00.0: amdgpu: ring gfx_0.1.0 timeout, signaled seq=23367, emitted seq=23369   <---- ERROR?
[ 3259.439110] amdgpu 0000:78:00.0: amdgpu:  Process gnome-shell pid 21782 thread gnome-shel:cs0 pid 21826
[ 3259.439111] amdgpu 0000:78:00.0: amdgpu: Starting gfx_0.1.0 ring reset
[ 3259.615223] amdgpu 0000:78:00.0: amdgpu: Ring gfx_0.1.0 reset failed
[ 3259.615224] amdgpu 0000:78:00.0: amdgpu: GPU reset begin!
[ 3259.672490] amdgpu 0000:78:00.0: amdgpu: MODE2 reset
[ 3259.679491] amdgpu 0000:78:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 3259.679576] [drm] PCIE GART of 1024M enabled (table at 0x000000F41FC00000).
[ 3259.679589] amdgpu 0000:78:00.0: amdgpu: PSP is resuming...
[ 3259.701132] amdgpu 0000:78:00.0: amdgpu: reserve 0xa00000 from 0xf41e000000 for PSP TMR
[ 3259.900536] amdgpu 0000:78:00.0: amdgpu: RAS: optional ras ta ucode is not available
[ 3259.906231] amdgpu 0000:78:00.0: amdgpu: RAP: optional rap ta ucode is not available
[ 3259.906232] amdgpu 0000:78:00.0: amdgpu: SECUREDISPLAY: optional securedisplay ta ucode is not available
[ 3259.906234] amdgpu 0000:78:00.0: amdgpu: SMU is resuming...
[ 3259.906932] amdgpu 0000:78:00.0: amdgpu: SMU is resumed successfully!
[ 3259.907135] amdgpu 0000:78:00.0: amdgpu: kiq ring mec 2 pipe 1 q 0
[ 3259.910437] amdgpu 0000:78:00.0: amdgpu: [drm] DMUB hardware initialized: version=0x05002C00
[ 3259.946504] amdgpu 0000:78:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 3259.946506] amdgpu 0000:78:00.0: amdgpu: ring gfx_0.1.0 uses VM inv eng 1 on hub 0
[ 3259.946506] amdgpu 0000:78:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 4 on hub 0
[ 3259.946507] amdgpu 0000:78:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 5 on hub 0
[ 3259.946508] amdgpu 0000:78:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[ 3259.946508] amdgpu 0000:78:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[ 3259.946508] amdgpu 0000:78:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[ 3259.946509] amdgpu 0000:78:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[ 3259.946509] amdgpu 0000:78:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[ 3259.946510] amdgpu 0000:78:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[ 3259.946510] amdgpu 0000:78:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 12 on hub 0
[ 3259.946511] amdgpu 0000:78:00.0: amdgpu: ring sdma0 uses VM inv eng 13 on hub 0
[ 3259.946511] amdgpu 0000:78:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 8
[ 3259.946512] amdgpu 0000:78:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 8
[ 3259.946512] amdgpu 0000:78:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 8
[ 3259.946513] amdgpu 0000:78:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 8
[ 3259.948207] amdgpu 0000:78:00.0: amdgpu: GPU reset(3) succeeded!
[ 3259.948221] amdgpu 0000:78:00.0: [drm] device wedged, but recovered through reset
[ 3260.005330] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!   # <-------- ERROR 
[ 3261.250885] rfkill: input handler enabled
[ 3262.242167] Lockdown: systemd-logind: hibernation is restricted; see man kernel_lockdown.7
[ 3262.303728] rfkill: input handler disabled

Welcome to Fedora @also-hat-1827

Have you tried deactivating all extensions in the Extensions app?
Then use the system for a while to see if the issue still appears. If it disappears, you can re-enable one extension roughly every 30 minutes to see whether the error comes back.

I also saw that you have two GPUs. Which GPU is active when you get the issue?

Looks like a crash in the amdgpu driver, which takes down the rest of the system as it fails to reinitialise cleanly.

As noted by @ilikelinux, does the crash disappear if you switch to using only the nvidia card?


@anothermindbomb, as they mentioned, they are quite new to Linux … you might have to give some further instructions on how to do that. However, the first step would be the extensions test :wink:, so we can rule out issues on that side.

Thank you @ilikelinux and @anothermindbomb for the help.

It seems I do not have any extensions enabled. I have not installed any, so what is shown below presumably came with the Fedora install; they are all in /usr/share/gnome-shell/extensions/

gnome-extensions list -d | grep -e 'Enabled' -e '^[a-z]'

apps-menu@gnome-shell-extensions.gcampax.github.com
  Enabled: No
background-logo@fedorahosted.org
  Enabled: No
launch-new-instance@gnome-shell-extensions.gcampax.github.com
  Enabled: No
places-menu@gnome-shell-extensions.gcampax.github.com
  Enabled: No
window-list@gnome-shell-extensions.gcampax.github.com
  Enabled: No
appindicatorsupport@rgcjonas.gmail.com
  Enabled: No
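If the list gets long, scanning the `Enabled:` lines by eye is tedious. Below is a minimal sketch of a filter that prints only the UUIDs of enabled extensions. It assumes the `gnome-extensions list -d` layout shown above (an unindented UUID line containing `@`, followed by indented detail lines such as `Enabled: Yes` or `Enabled: No`); `enabled_extensions` is a hypothetical helper name, not part of GNOME.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the UUID of every extension whose details say "Enabled: Yes".
# Assumes the `gnome-extensions list -d` layout shown above: an unindented UUID line
# (containing "@") followed by indented "Key: value" detail lines.
enabled_extensions() {
  awk '
    /^[^ \t]+@[^ \t]+$/ { uuid = $0 }          # start of a new extension block
    /^[ \t]+Enabled:[ \t]*Yes/ { print uuid }  # the current block is enabled
  '
}

# Usage (inside a GNOME session):
#   gnome-extensions list -d | enabled_extensions
# No output means no extension is enabled.
```

Keying on the `@` in the UUID line is a deliberate shortcut so that wrapped description text is less likely to be mistaken for a new extension.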

I will try disabling GPUs next. My BIOS has an option to disable the integrated graphics, which I think is what’s using the amdgpu driver. I had less luck finding an option to disable the Nvidia GPU, but I will see how it goes with the integrated graphics disabled first.

Alright: since disabling my integrated GPU I’ve logged about 6 hours of work and several restarts. GNOME has not crashed once in that time, and I’m not seeing any suspicious log entries.
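To double-check that the integrated GPU really is out of the picture after the BIOS change, one option is to confirm that the `amdgpu` kernel module is no longer loaded (with the device disabled there should normally be nothing for it to bind to). A minimal sketch: `module_loaded` is a hypothetical helper that reads `/proc/modules` (the file `lsmod` formats), with an optional path argument so it can be tried against a sample file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the named kernel module appears in /proc/modules.
# Each line of /proc/modules starts with the module name followed by a space.
# $1 = module name, $2 = optional modules file (defaults to /proc/modules).
module_loaded() {
  local name="$1" file="${2:-/proc/modules}"
  grep -q "^${name} " "$file"
}

# Usage:
#   if module_loaded amdgpu; then echo "amdgpu still loaded"; else echo "amdgpu not loaded"; fi
```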

I don’t really need 2 GPUs running, so this solves the issue for me. But I’m interested in continuing to troubleshoot the amdgpu driver, though I would need support to do so.

If no one can spare the time I understand; otherwise I’ll accept an answer in a day or two.

You’re correct in your annotation of the log where you marked “<---- ERROR?”: this is indeed where it starts.

Some commands were sent to the GPU and never finished; the sequence numbers show that (signaled seq=23367, emitted seq=23369). Why they never finished, I have no idea. Finding out would require a debug driver and some way of replicating the situation. Anyway, long story short: work was sent out to the GPU to be done, and it never came back.
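The sequence-number gap can be read straight off the `ring gfx_0.1.0 timeout` line: `emitted seq` is the last fence the driver queued on that ring and `signaled seq` is the last one the GPU reported as finished, so the difference is roughly how many submissions were still in flight when the timeout fired (23369 - 23367 = 2 here). A small sketch that extracts that gap from kernel log lines; `pending_fences` is a hypothetical helper, and it assumes the exact `signaled seq=N, emitted seq=M` wording shown in the dmesg excerpt.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each amdgpu ring timeout line on stdin, print the ring name
# and how many fences were emitted but never signaled (emitted - signaled).
# Assumes the message format shown in the dmesg excerpt earlier in this thread.
pending_fences() {
  sed -n 's/.*ring \([^ ]*\) timeout, signaled seq=\([0-9]*\), emitted seq=\([0-9]*\).*/\1 \2 \3/p' |
    while read -r ring signaled emitted; do
      echo "$ring $((emitted - signaled))"
    done
}

# Usage, against kernel messages from the previous boot (the one that crashed):
#   journalctl -k -b -1 | pending_fences
```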

The driver recognised this and kicked off a ring reset, which failed.

Things escalated, and the equivalent of “turn it off and on again” was started (the MODE2 reset). That apparently worked, according to the “GPU reset(3) succeeded!” message. The driver considers the device “wedged” (knackered) but recovered, so it’s all good… right?

Nope. When gnome-shell tries to submit more work after the “switch the card off and on again” that just happened, the submission is rejected: the driver no longer trusts the GPU context gnome-shell was using when the hang happened, so the command-submission ioctl (amdgpu_cs_ioctl) fails with ECANCELED (the -125 error code).

At this point it’s black screen time and a reboot.

Why this happens, no idea. I suspect it’ll take an amdgpu driver developer to poke around and work out why, but as we can’t replicate the fault reliably and on demand, that’s going to be much harder.
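If you do want to hand this to the amdgpu developers, the devcoredump the kernel log points at (`/sys/class/drm/card1/device/devcoredump/data`) is the kind of artefact they would want alongside a bug report. As far as I know, the kernel only keeps a devcoredump for a limited time (around five minutes by default) and reading it needs root, so it is worth copying it off soon after a crash. A minimal sketch; `save_devcoredump` is a hypothetical helper, and `card1` is just the name from this particular log (the card number can differ between boots or machines).

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a GPU devcoredump to a timestamped file before the kernel discards it.
# $1 = source (defaults to the card1 path printed in the dmesg excerpt; may differ per boot)
# $2 = destination directory (defaults to the current directory)
save_devcoredump() {
  local src="${1:-/sys/class/drm/card1/device/devcoredump/data}"
  local dest_dir="${2:-.}"
  if [ ! -r "$src" ]; then
    echo "no readable devcoredump at $src" >&2
    return 1
  fi
  local dest
  dest="$dest_dir/amdgpu-devcoredump-$(date +%Y%m%d-%H%M%S).txt"
  # cat reads until EOF, which is safer than relying on the size sysfs reports.
  cat "$src" > "$dest" && echo "$dest"
}

# Usage, from a root shell (e.g. after `sudo -i`), soon after a crash:
#   save_devcoredump
#   save_devcoredump /sys/class/drm/card0/device/devcoredump/data /root
```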

It’s a problem with amd-gpu-firmware; the last stable version is 20251021. You can downgrade it.

For example, in Fedora Silverblue:

sudo rpm-ostree deploy 4d99657dbc38dcc217256822760019a3fb19455e219ed279f61ca4658fa66fad

This command deploys the last stable version without that issue, but remember that the Nvidia drivers from RPM Fusion can’t be installed on this version.

He uses Workstation, so the downgrade command would be:

sudo dnf downgrade amd-gpu-firmware
Version   Stable            Testing
F43       20251125-1.fc43   20251021-1.fc43

Presumably the F43 testing build would revert some of these changes?!
Source: amd-gpu-firmware - Fedora Packages

  • Update to 20251021
    - Revert “update firmware for MT7922 WiFi device”
    - QCA: Update Bluetooth WCN6856 firmware 2.1.0-00653 to 2.1.0-00659
    - iwlwifi: add Bz/Fm and gl FW for core98-161 release
    - iwlwifi: update Bz/Hr and Bz/Gf firmwares for core98-161 release
    - iwlwifi: update ty/So/Ma firmwares for core98-161 release
    - iwlwifi: update cc/Qu/QuZ firmwares for core98-161 release
    - intel: qat: Fix missing link
    - amdgpu: DMCUB updates for various ASICs
    - nvidia: add generic bootloader for GSP-enabled systems
    - qcom: sync audioreach firmwares from v1.0.0 build
    - qcom: vpu: rename firmware binaries
    - Intel IPU7: Update product signed firmware binary
    - i915: DMC Xe2LPD v2.29 / Xe3LPD v2.32 / Xe3LPD_3002 v2.27
    - WHENCE: nvidia: rearrange GSP-RM firmware lines
    - Add ISH firmware file for Intel Pather Lake platform
    - Update firmware file for Intel Magnetar/BlazarU/BlazarI core
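Because `amd-gpu-firmware` versions are date stamps (YYYYMMDD), you can check whether the installed build is newer than the 20251021 build suggested in this thread before running the downgrade. A minimal sketch: `firmware_newer_than` is a hypothetical helper, and the cutoff is simply the version named above, not an official known-good marker.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if firmware version $1 is newer than cutoff $2.
# Only the leading 8-digit date (YYYYMMDD) is compared, so "20251125-1.fc43"-style strings work too.
firmware_newer_than() {
  local installed="${1:0:8}" cutoff="${2:0:8}"
  [ "$installed" -gt "$cutoff" ]
}

# Usage on Fedora Workstation:
#   ver=$(rpm -q --qf '%{VERSION}\n' amd-gpu-firmware)
#   if firmware_newer_than "$ver" 20251021; then
#     sudo dnf downgrade amd-gpu-firmware
#   fi
```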