AMD GPU crash while watching video | Mesa 24.1

I ran into this problem while watching a video in Celluloid (mpv-based). The GPU crashes, the system freezes, and then the HDMI signal disconnects and reconnects. The same thing happens when I play a video in VLC. It doesn’t happen when I play video in the browser with Microsoft Edge (I think because Edge uses a software video decoder?).
VLC and Celluloid didn’t give any useful information.
But this is what dmesg shows after the crash in Celluloid:

[   86.225209] amdgpu 0000:05:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:40 vmid:2 pasid:32773)
[   86.225215] amdgpu 0000:05:00.0: amdgpu:  for process celluloid pid 3744 thread celluloid:cs0 pid 3774)
[   86.225217] amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000ffffffe00000 from IH client 0x1b (UTCL2)
[   86.225222] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00240C50
[   86.225224] amdgpu 0000:05:00.0: amdgpu: 	 Faulty UTCL2 client ID: CPG (0x6)
[   86.225226] amdgpu 0000:05:00.0: amdgpu: 	 MORE_FAULTS: 0x0
[   86.225227] amdgpu 0000:05:00.0: amdgpu: 	 WALKER_ERROR: 0x0
[   86.225228] amdgpu 0000:05:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x5
[   86.225230] amdgpu 0000:05:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
[   86.225231] amdgpu 0000:05:00.0: amdgpu: 	 RW: 0x1
[   96.274883] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=9887, emitted seq=9889
[   96.275248] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process celluloid pid 3744 thread celluloid:cs0 pid 3774
[   96.275469] amdgpu 0000:05:00.0: amdgpu: GPU reset begin!
[   96.369722] amdgpu 0000:05:00.0: amdgpu: MODE2 reset
[   96.369804] amdgpu 0000:05:00.0: amdgpu: GPU reset succeeded, trying to resume
[   96.370015] [drm] PCIE GART of 1024M enabled.
[   96.370018] [drm] PTB located at 0x000000F47FC00000
[   96.370123] amdgpu 0000:05:00.0: amdgpu: PSP is resuming...
[   97.089189] amdgpu 0000:05:00.0: amdgpu: reserve 0x400000 from 0xf47f800000 for PSP TMR
[   97.378826] amdgpu 0000:05:00.0: amdgpu: RAS: optional ras ta ucode is not available
[   97.389689] amdgpu 0000:05:00.0: amdgpu: RAP: optional rap ta ucode is not available
[   97.389692] amdgpu 0000:05:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[   97.389696] amdgpu 0000:05:00.0: amdgpu: SMU is resuming...
[   97.389933] amdgpu 0000:05:00.0: amdgpu: SMU is resumed successfully!
[   97.390479] [drm] DMUB hardware initialized: version=0x01010028
[   97.599158] [drm] kiq ring mec 2 pipe 1 q 0
[   97.602214] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[   97.602257] [drm] JPEG decode initialized successfully.
[   97.602261] amdgpu 0000:05:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
[   97.602263] amdgpu 0000:05:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[   97.602265] amdgpu 0000:05:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[   97.602266] amdgpu 0000:05:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[   97.602268] amdgpu 0000:05:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[   97.602269] amdgpu 0000:05:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[   97.602270] amdgpu 0000:05:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[   97.602272] amdgpu 0000:05:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[   97.602273] amdgpu 0000:05:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[   97.602274] amdgpu 0000:05:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 11 on hub 0
[   97.602276] amdgpu 0000:05:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 8
[   97.602277] amdgpu 0000:05:00.0: amdgpu: ring vcn_dec uses VM inv eng 1 on hub 8
[   97.602279] amdgpu 0000:05:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 4 on hub 8
[   97.602280] amdgpu 0000:05:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 5 on hub 8
[   97.602281] amdgpu 0000:05:00.0: amdgpu: ring jpeg_dec uses VM inv eng 6 on hub 8
[   97.604100] amdgpu 0000:05:00.0: amdgpu: recover vram bo from shadow start
[   97.604103] amdgpu 0000:05:00.0: amdgpu: recover vram bo from shadow done
[   97.604128] amdgpu 0000:05:00.0: amdgpu: GPU reset(2) succeeded!
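
For anyone reading the fault lines above: the decoded fields can be reproduced from the raw VM_L2_PROTECTION_FAULT_STATUS value. This is only a minimal sketch. The bit layout in FIELDS is an assumption inferred from the values printed in this log (it is not copied from the driver source), and decode_fault_status is a hypothetical helper name.

```python
# Bit layout (shift, width) assumed from the log itself: it reproduces every
# decoded field printed for both faults in this thread. Treat it as a guess,
# not as the authoritative register definition.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),
    "RW": (18, 1),
    "VMID": (20, 4),
}


def decode_fault_status(status: int) -> dict[str, int]:
    """Split a raw fault status value into named bit fields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


if __name__ == "__main__":
    # The two raw values reported in dmesg (celluloid, then vlc).
    for raw in (0x00240C50, 0x00440C50):
        fields = decode_fault_status(raw)
        print(f"{raw:#010x}: " + ", ".join(f"{k}={v:#x}" for k, v in fields.items()))
```

Under that assumed layout, both values decode to the same permission fault from client CPG (0x6). Only the VMID differs (2 for celluloid, 4 for vlc), which matches the vmid printed in the page fault lines.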

After that, Celluloid plays my video, but I think it no longer uses hardware decoding (VAAPI / VDPAU).

When I then try VLC, the same thing happens:

[  157.956589] amdgpu 0000:05:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:40 vmid:4 pasid:32777)
[  157.956599] amdgpu 0000:05:00.0: amdgpu:  for process vlc pid 3936 thread vlc:cs0 pid 4048)
[  157.956602] amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000ffffffe00000 from IH client 0x1b (UTCL2)
[  157.956607] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00440C50
[  157.956609] amdgpu 0000:05:00.0: amdgpu: 	 Faulty UTCL2 client ID: CPG (0x6)
[  157.956612] amdgpu 0000:05:00.0: amdgpu: 	 MORE_FAULTS: 0x0
[  157.956614] amdgpu 0000:05:00.0: amdgpu: 	 WALKER_ERROR: 0x0
[  157.956615] amdgpu 0000:05:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x5
[  157.956617] amdgpu 0000:05:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
[  157.956619] amdgpu 0000:05:00.0: amdgpu: 	 RW: 0x1
[  168.467240] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=14073, emitted seq=14075
[  168.467601] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process vlc pid 3936 thread vlc:cs0 pid 4048
[  168.467859] amdgpu 0000:05:00.0: amdgpu: GPU reset begin!
[  168.566147] amdgpu 0000:05:00.0: amdgpu: MODE2 reset
[  168.566225] amdgpu 0000:05:00.0: amdgpu: GPU reset succeeded, trying to resume
[  168.566426] [drm] PCIE GART of 1024M enabled.
[  168.566429] [drm] PTB located at 0x000000F47FC00000
[  168.566464] amdgpu 0000:05:00.0: amdgpu: PSP is resuming...
[  169.286877] amdgpu 0000:05:00.0: amdgpu: reserve 0x400000 from 0xf47f800000 for PSP TMR
[  169.571494] amdgpu 0000:05:00.0: amdgpu: RAS: optional ras ta ucode is not available
[  169.581966] amdgpu 0000:05:00.0: amdgpu: RAP: optional rap ta ucode is not available
[  169.581969] amdgpu 0000:05:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[  169.581972] amdgpu 0000:05:00.0: amdgpu: SMU is resuming...
[  169.582195] amdgpu 0000:05:00.0: amdgpu: SMU is resumed successfully!
[  169.582637] [drm] DMUB hardware initialized: version=0x01010028
[  169.799338] [drm] kiq ring mec 2 pipe 1 q 0
[  169.801531] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[  169.801575] [drm] JPEG decode initialized successfully.
[  169.801578] amdgpu 0000:05:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
[  169.801580] amdgpu 0000:05:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[  169.801582] amdgpu 0000:05:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[  169.801584] amdgpu 0000:05:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[  169.801586] amdgpu 0000:05:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[  169.801587] amdgpu 0000:05:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[  169.801589] amdgpu 0000:05:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[  169.801590] amdgpu 0000:05:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[  169.801592] amdgpu 0000:05:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[  169.801594] amdgpu 0000:05:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 11 on hub 0
[  169.801595] amdgpu 0000:05:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 8
[  169.801597] amdgpu 0000:05:00.0: amdgpu: ring vcn_dec uses VM inv eng 1 on hub 8
[  169.801599] amdgpu 0000:05:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 4 on hub 8
[  169.801600] amdgpu 0000:05:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 5 on hub 8
[  169.801602] amdgpu 0000:05:00.0: amdgpu: ring jpeg_dec uses VM inv eng 6 on hub 8
[  169.803548] amdgpu 0000:05:00.0: amdgpu: recover vram bo from shadow start
[  169.803551] amdgpu 0000:05:00.0: amdgpu: recover vram bo from shadow done
[  169.803591] amdgpu 0000:05:00.0: amdgpu: GPU reset(4) succeeded!
[  169.803577] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
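
Both logs follow the same sequence: a page fault, a gfx ring timeout roughly 10 seconds later, then a GPU reset. A small sketch for pulling that timeline out of saved dmesg output is below. The function name gpu_events and the event labels are hypothetical, and the patterns only cover the messages that appear in this thread.

```python
import re

# Patterns for the three message types seen in this thread's dmesg output.
EVENT_PATTERNS = {
    "page_fault": re.compile(r"no-retry page fault"),
    "ring_timeout": re.compile(r"ring \w+ timeout"),
    "reset_done": re.compile(r"GPU reset\(\d+\) succeeded"),
}
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def gpu_events(dmesg_text: str) -> list[tuple[float, str]]:
    """Return (timestamp, event_name) pairs for amdgpu fault/reset lines."""
    events = []
    for line in dmesg_text.splitlines():
        stamp = TIMESTAMP.match(line)
        if not stamp:
            continue  # skip lines without a kernel timestamp
        for name, pattern in EVENT_PATTERNS.items():
            if pattern.search(line):
                events.append((float(stamp.group(1)), name))
                break
    return events


# Three lines copied from the celluloid crash above.
SAMPLE = """\
[   86.225209] amdgpu 0000:05:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:40 vmid:2 pasid:32773)
[   96.274883] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=9887, emitted seq=9889
[   97.604128] amdgpu 0000:05:00.0: amdgpu: GPU reset(2) succeeded!
"""

if __name__ == "__main__":
    for stamp, name in gpu_events(SAMPLE):
        print(f"{stamp:11.6f}  {name}")
```

To run it on a real log, the output of dmesg could be saved to a file and its contents passed to gpu_events instead of SAMPLE.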

But VLC wouldn’t open at all, even after I disabled hardware decoding in the input/output configuration.

After the GPU crash, my GPU utilization stays at 100% all the time.

It only goes back to normal after a restart, and then it crashes again as soon as I play a video.

This first happened while I was watching a movie at 1080p resolution with an unusually large file size (9.4 GB). Then it suddenly started happening with my other videos as well.
Currently, I can play videos flawlessly with ffplay (which doesn’t use hardware decoding, I guess).
I also learned that opening a video’s properties in Nautilus triggers the GPU crash too.

My System:

  • CPU/GPU: AMD Ryzen 5 5600G.
  • Motherboard: ASRock A520M-HVS.
  • OS: Fedora 40.
  • DE: GNOME on Wayland (also happens on X11).
  • Kernel: Linux 6.9.6-200.fc40.x86_64 (also happens when booting the previous kernel version).

I wanted to try a live ISO or another OS, but I lost my USB stick :sweat_smile:
Can anyone help me?

Can you please post the output of inxi -Fzxx so we have more details?

Do you have an example web page that shows this problem that I can test against?

inxi -Fzxx:

System:
  Kernel: 6.9.6-200.fc40.x86_64 arch: x86_64 bits: 64 compiler: gcc
    v: 2.41-37.fc40
  Console: pty pts/1 wm: gnome-shell DM: GDM Distro: Fedora Linux 40
    (Workstation Edition)
Machine:
  Type: Desktop Mobo: ASRock model: A520M-HVS serial: <filter> UEFI: American
    Megatrends LLC. v: L3.44 date: 02/27/2024
CPU:
  Info: 6-core model: AMD Ryzen 5 5600G with Radeon Graphics bits: 64
    type: MT MCP arch: Zen 3 rev: 0 cache: L1: 384 KiB L2: 3 MiB L3: 16 MiB
  Speed (MHz): avg: 832 high: 2994 min/max: 400/4464 cores: 1: 400 2: 400
    3: 2994 4: 400 5: 2992 6: 400 7: 400 8: 400 9: 400 10: 400 11: 400 12: 400
    bogomips: 93431
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
Graphics:
  Device-1: AMD Cezanne [Radeon Vega Series / Radeon Mobile Series]
    driver: amdgpu v: kernel arch: GCN-5 pcie: speed: 8 GT/s lanes: 16 ports:
    active: HDMI-A-2 empty: DP-1,HDMI-A-1 bus-ID: 05:00.0 chip-ID: 1002:1638
    temp: 37.0 C
  Display: server: X.Org v: 24.1 with: Xwayland v: 24.1.0
    compositor: gnome-shell driver: X: loaded: amdgpu
    unloaded: fbdev,modesetting,vesa dri: radeonsi gpu: amdgpu display-ID: :0
    screens: 1
  Screen-1: 0 s-res: 1920x1080 s-dpi: 96
  Monitor-1: HDMI-A-2 mapped: HDMI-2 model: HDMI res: 1920x1080 dpi: 92
    diag: 604mm (23.8")
  API: EGL v: 1.5 platforms: device: 0 drv: radeonsi device: 1 drv: swrast
    surfaceless: drv: radeonsi x11: drv: radeonsi inactive: gbm,wayland
  API: OpenGL v: 4.6 compat-v: 4.5 vendor: amd mesa v: 24.1.2 glx-v: 1.4
    direct-render: yes renderer: AMD Radeon Graphics (radeonsi renoir LLVM
    18.1.6 DRM 3.57 6.9.6-200.fc40.x86_64) device-ID: 1002:1638
  API: Vulkan v: 1.3.283 surfaces: xcb,xlib device: 0 type: integrated-gpu
    driver: N/A device-ID: 1002:1638 device: 1 type: cpu driver: N/A
    device-ID: 10005:0000
Audio:
  Device-1: AMD Renoir Radeon High Definition Audio driver: snd_hda_intel
    v: kernel pcie: speed: 8 GT/s lanes: 16 bus-ID: 05:00.1 chip-ID: 1002:1637
  Device-2: AMD Family 17h/19h HD Audio vendor: ASRock driver: snd_hda_intel
    v: kernel pcie: speed: 8 GT/s lanes: 16 bus-ID: 05:00.6 chip-ID: 1022:15e3
  API: ALSA v: k6.9.6-200.fc40.x86_64 status: kernel-api
  Server-1: JACK v: 1.9.22 status: off
  Server-2: PipeWire v: 1.0.7 status: n/a (root, process) with:
    1: pipewire-pulse status: active 2: wireplumber status: active
    3: pipewire-alsa type: plugin
Network:
  Device-1: Realtek RTL8188EE Wireless Network Adapter driver: rtl8188ee
    v: kernel pcie: speed: 2.5 GT/s lanes: 1 port: e000 bus-ID: 03:00.0
    chip-ID: 10ec:8179
  IF: wlp3s0 state: up mac: <filter>
  Device-2: Realtek RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet
    vendor: ASRock driver: r8169 v: kernel pcie: speed: 2.5 GT/s lanes: 1
    port: d000 bus-ID: 04:00.0 chip-ID: 10ec:8168
  IF: enp4s0 state: down mac: <filter>
  IF-ID-1: docker0 state: down mac: <filter>
Drives:
  Local Storage: total: 476.94 GiB used: 262.62 GiB (55.1%)
  ID-1: /dev/sda vendor: V-Gen model: 07SM22SCY512HY size: 476.94 GiB
    speed: 6.0 Gb/s serial: <filter> temp: 34 C
Partition:
  ID-1: / size: 467.89 GiB used: 262.59 GiB (56.1%) fs: ext4 dev: /dev/sda2
  ID-2: /boot/efi size: 511 MiB used: 26.2 MiB (5.1%) fs: vfat
    dev: /dev/sda1
Swap:
  ID-1: swap-1 type: zram size: 8 GiB used: 0 KiB (0.0%) priority: 100
    dev: /dev/zram0
Sensors:
  System Temperatures: cpu: 46.4 C mobo: N/A gpu: amdgpu temp: 37.0 C
  Fan Speeds (rpm): N/A
Info:
  Memory: total: 16 GiB note: est. available: 13.51 GiB used: 2.86 GiB (21.2%)
  Processes: 357 Power: uptime: 1m wakeups: 0 Init: systemd v: 255
    target: graphical (5) default: graphical
  Packages: pm: flatpak pkgs: 12 Compilers: clang: 18.1.6 gcc: 14.1.1
    Shell: fish v: 3.7.0 running-in: kgx inxi: 3.3.34

This problem happens when playing video from my storage disk with Celluloid/mpv/VLC. It doesn’t happen in the browser: in my browser (Edge), video plays perfectly.

Update: hardware-accelerated video decoding works perfectly in Firefox.

So I guess it only happens in Celluloid/mpv/VLC, and maybe in other video players as well (except ffplay).

Which Mesa version do you have installed? There is a possibly related bug, which has now been fixed in version 24.1.2-7.fc40, according to this comment.


24.1.2…

UPDATE:

I downgraded Mesa to 24.0.5-1.fc40.
And everything works perfectly…
So it’s a bug in Mesa then…

Maybe you are using the Mesa freeworld packages, which haven’t received the fix yet?

Temporarily swapping back to the Mesa drivers from the Fedora repos might be the solution.

Downgrading all Mesa packages to 24.0.5-1.fc40 fixed my problem, thanks for your insight…

Sure, happy it helped.

However, the current version of Mesa should already have the fix; it’s just not yet available in the RPM Fusion repos.

Oh, I see… Why was there no “conflict” when I upgraded that Fedora package?

I would have thought that if RPM Fusion doesn’t have the matching version yet when I upgrade my packages, the package isn’t supposed to update…

I don’t think there should be any conflict. The package names are different. You probably followed the instructions from RPM Fusion’s Multimedia page, and did the following:

sudo dnf swap mesa-va-drivers mesa-va-drivers-freeworld
sudo dnf swap mesa-vdpau-drivers mesa-vdpau-drivers-freeworld
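
Since the freeworld packages are updated separately from Fedora’s own Mesa packages, it may help to compare their installed version-release strings. The sketch below is only an illustration: differing_freeworld is a hypothetical helper, the choice of mesa-dri-drivers as the reference package is an assumption, and the package lines in SAMPLE are made-up input, not output from the original poster’s system. A differing release is only a hint that the two builds are out of step, not proof of a problem.

```python
def differing_freeworld(rpm_lines, reference="mesa-dri-drivers"):
    r"""List freeworld packages whose VERSION-RELEASE differs from `reference`.

    Each input line is 'NAME VERSION-RELEASE', e.g. as printed by:
      rpm -q --queryformat '%{NAME} %{VERSION}-%{RELEASE}\n' <packages>
    """
    # Map package name -> version-release, ignoring empty lines.
    installed = dict(line.split() for line in rpm_lines if line.strip())
    wanted = installed[reference]
    return [
        (name, evr)
        for name, evr in installed.items()
        if name.endswith("-freeworld") and evr != wanted
    ]


# Hypothetical input for illustration only.
SAMPLE = [
    "mesa-dri-drivers 24.1.2-7.fc40",
    "mesa-va-drivers-freeworld 24.1.2-1.fc40",
    "mesa-vdpau-drivers-freeworld 24.1.2-7.fc40",
]

if __name__ == "__main__":
    for name, evr in differing_freeworld(SAMPLE):
        print(f"{name} is at {evr}, which differs from mesa-dri-drivers")
```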

Same problem here: I can’t play video with Totem, it crashes every time. Weird, because I can use apps like DaVinci Resolve and OBS.

Please consider opening a new thread. This thread has a solution. If that does not work for you we would need more details.

The Fedora Mesa backports are responsible for this issue.
There is zero coordination between RPM Fusion and Mesa.

https://bugzilla.rpmfusion.org/show_bug.cgi?id=6986

https://download1.rpmfusion.org/free/fedora/updates/40/x86_64/repoview/mesa-va-drivers-freeworld.html
