AMDGPU page fault on kernel 6.17.9-300 – works on 6.17.8-300 (Fedora 43)

The AMD GPU raises a page fault (GCVM_L2_PROTECTION_FAULT_STATUS) when running Ollama or ComfyUI on kernel 6.17.9. The same hardware works fine on kernel 6.17.8.

Environment

  • Distro: Fedora 43 KDE Plasma
  • Working kernel: kernel.x86_64 6.17.8-300.fc43
  • Problematic kernel: kernel.x86_64 6.17.9-300.fc43
  • Processor: AMD Ryzen™ AI Max+ 395 - Strix Halo
  • Graphics Model: Radeon 8060S Graphics
  • Affected applications:
    • ComfyUI python script (PID 9205) – same fault as below
    • ollama (PID 2053) – same fault as below

Steps to reproduce

  • Boot kernel: kernel.x86_64 6.17.9-300.fc43
  • Run any GPU-using workload (e.g., ComfyUI or ollama serve).
  • Check journalctl: page-fault entries like the following appear.

dic 04 16:58:09 fedora kernel: [drm] pre_validate_dsc:1635 MST_DSC dsc precompute is not needed
dic 04 16:58:09 fedora kernel: snd_hda_intel 0000:c5:00.1: bound 0000:c5:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
dic 04 16:58:20 fedora kernel: amdgpu 0000:c5:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:153 vmid:8 pasid:32775)
dic 04 16:58:20 fedora kernel: amdgpu 0000:c5:00.0: amdgpu:  Process ollama pid 2053 thread ollama pid 2058
dic 04 16:58:20 fedora kernel: amdgpu 0000:c5:00.0: amdgpu:   in page starting at address 0x00007f7bdc149000 from client 10
dic 04 16:58:20 fedora kernel: amdgpu 0000:c5:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00800932
dic 04 16:58:20 fedora kernel: amdgpu 0000:c5:00.0: amdgpu:          Faulty UTCL2 client ID: CPF (0x4)
dic 04 16:58:20 fedora kernel: amdgpu 0000:c5:00.0: amdgpu:          MORE_FAULTS: 0x0
dic 04 16:58:20 fedora kernel: amdgpu 0000:c5:00.0: amdgpu:          WALKER_ERROR: 0x1
dic 04 16:58:20 fedora kernel: amdgpu 0000:c5:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
dic 04 16:58:20 fedora kernel: amdgpu 0000:c5:00.0: amdgpu:          MAPPING_ERROR: 0x1
dic 04 16:58:20 fedora kernel: amdgpu 0000:c5:00.0: amdgpu:          RW: 0x0

dic 04 17:24:16 fedora kernel: amdgpu 0000:c5:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:153 vmid:8 pasid:32801)
dic 04 17:24:16 fedora kernel: amdgpu 0000:c5:00.0: amdgpu:  Process python pid 9205 thread python pid 9205
dic 04 17:24:16 fedora kernel: amdgpu 0000:c5:00.0: amdgpu:   in page starting at address 0x00007fc602ce8000 from client 10
dic 04 17:24:16 fedora kernel: amdgpu 0000:c5:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00800932
dic 04 17:24:16 fedora kernel: amdgpu 0000:c5:00.0: amdgpu:          Faulty UTCL2 client ID: CPF (0x4)
dic 04 17:24:16 fedora kernel: amdgpu 0000:c5:00.0: amdgpu:          MORE_FAULTS: 0x0
dic 04 17:24:16 fedora kernel: amdgpu 0000:c5:00.0: amdgpu:          WALKER_ERROR: 0x1
dic 04 17:24:16 fedora kernel: amdgpu 0000:c5:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
dic 04 17:24:16 fedora kernel: amdgpu 0000:c5:00.0: amdgpu:          MAPPING_ERROR: 0x1
dic 04 17:24:16 fedora kernel: amdgpu 0000:c5:00.0: amdgpu:          RW: 0x0
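
For readers who want to see how the status value 0x00800932 relates to the decoded lines in the log, here is a minimal sketch. The bit layout is an assumption (MORE_FAULTS bit 0, WALKER_ERROR bits 1-3, PERMISSION_FAULTS bits 4-7, MAPPING_ERROR bit 8, client ID bits 9-17, RW bit 18, VMID bits 20-23), not taken from this thread; it does reproduce the field values the driver printed above, including vmid:8. The function name decode_fault_status is hypothetical.

```shell
#!/usr/bin/env bash
# Decode a GCVM_L2_PROTECTION_FAULT_STATUS value into its fields.
# ASSUMPTION: the bit positions below are an assumed layout, chosen because
# they reproduce the values in the log; check the amdgpu register headers
# for your GPU generation before relying on them.
decode_fault_status() {
    local status=$(( $1 ))   # accepts hex input such as 0x00800932
    printf 'MORE_FAULTS: 0x%x\n'       $(( status         & 0x1   ))
    printf 'WALKER_ERROR: 0x%x\n'      $(( (status >> 1)  & 0x7   ))
    printf 'PERMISSION_FAULTS: 0x%x\n' $(( (status >> 4)  & 0xf   ))
    printf 'MAPPING_ERROR: 0x%x\n'     $(( (status >> 8)  & 0x1   ))
    printf 'CLIENT_ID: 0x%x\n'         $(( (status >> 9)  & 0x1ff ))
    printf 'RW: 0x%x\n'                $(( (status >> 18) & 0x1   ))
    printf 'VMID: %d\n'                $(( (status >> 20) & 0xf   ))
}

# Value taken from the journal entries above.
decode_fault_status 0x00800932
```

Both faults carry the same status value, so the ollama and python crashes decode identically.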

Can you help me? Sorry, I’m trying to learn.
Thank you

This sounds similar, but not necessarily identical:

To confirm:

  1. Do you have version 20251125 of the linux-firmware package?
  2. Does the bug disappear simply when you boot your system into kernel 6.17.8 without changing anything else?

Yes, it is linux-firmware-20251125-1.fc43.noarch

Yes!

Thank you !

Interesting, so not necessarily the same as the linked issue, which needed a firmware downgrade. (That said, there’s no evidence that the person there tried reverting to kernel 6.17.8.)

If no one else has better ideas here, the best way forward might be to create a bug ticket in Red Hat Bugzilla, where Fedora bugs are tracked.


I have the same issue: a persistent crash (GCVM_L2_PROTECTION_FAULT_STATUS) with 6.17.9-300, and no amount of parameter tuning or environment variables would work around it.
I tried 6.18.0-65, but that had the same issue.

Downgrading to 6.17.8-300 was the only resolution after hours of troubleshooting.

/proc/cmdline:
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.17.8-300.fc43.x86_64 root=UUID=7f4809d2-a5f8-4dfb-9f56-85d6c80408e5 ro rootflags=subvol=root rhgb quiet iommu=soft amdgpu.gttsize=126976 ttm.pages_limit=32505856

linux-firmware-20251125-1.fc43

I created a bug ticket in Red Hat Bugzilla.

How can I keep kernel version 6.17.8-300.fc43.x86_64, despite updates?
I was thinking of using

sudo dnf versionlock add 6.17.8-300.fc43.x86_64

Is this okay? What do you recommend? Sorry, I’m just trying to learn :slightly_smiling_face:
Thank you !

versionlock doesn’t work with kernels. (Kernel packages - where you can have multiple versions installed at the same time - behave a bit differently from normal packages where only one version at a time is installed.) There’s some background in this thread.

However, you can rely on the fact that the kernel currently in use is never removed.

So imagine your running kernel is 6.17.8, your other installed kernels are 6.17.9 and 6.17.10, and the new kernel 6.17.11 has just become available.

If you ran an upgrade, then the system would remove 6.17.9 (not 6.17.8 which is the running kernel) and install 6.17.11.
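
The example above can be sketched as a small shell function. This is a simplified model of the behaviour described, not dnf's actual logic, and the helper name kernel_to_remove is hypothetical: it picks the oldest installed kernel that is not the running one.

```shell
#!/usr/bin/env bash
# Simplified model: given the running kernel and the installed kernels,
# print the one an upgrade would drop to make room for a new kernel.
# ASSUMPTION: illustrative only; dnf's real selection logic may differ.
kernel_to_remove() {
    local running=$1; shift
    # Sort versions oldest-first, skip the running kernel, take the first.
    printf '%s\n' "$@" | sort -V | grep -Fxv -- "$running" | head -n 1
}

# Running 6.17.8; installed 6.17.8, 6.17.9 and 6.17.10.
kernel_to_remove 6.17.8 6.17.8 6.17.9 6.17.10
```

On a real system, `uname -r` shows the running kernel and `rpm -q kernel-core` lists the installed ones, so you can check which kernel is protected before upgrading.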

Thank you very much.

Update for reference: I also tried 6.17.10-300, but it had the same issue.
Downgrading to 6.17.8-300 was the only resolution.

Hi folks,

I’ve run into this issue too. Based on the analysis in the DRM issue on freedesktop, these firmware files seem to be the root cause:

  • gc_11_5_1_mes_2.bin
  • gc_11_5_0_mes_2.bin
  • gc_10_3_6_rlc.bin
  • smu_14_0_3.bin

Unfortunately, Fedora 42, 43, and even Rawhide have not updated their firmware RPMs yet, so I used the following steps as a quick (and dirty :wink: ) workaround.

# git clone https://gitlab.com/kernel-firmware/linux-firmware
# cd linux-firmware/amdgpu
# cp gc_11_5_1_mes_2.bin gc_11_5_0_mes_2.bin gc_10_3_6_rlc.bin smu_14_0_3.bin   /lib/firmware/amdgpu
# sudo dracut --force

At least my server now runs without any GCVM_L2_PROTECTION_FAULT_STATUS:0x00800932 faults.
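
To check whether a workaround like this holds after a reboot, one option is to count the fault lines in the kernel log of the current boot. A minimal sketch; the helper name count_gpu_faults is hypothetical, and the sample line is copied from the journal output earlier in the thread.

```shell
#!/usr/bin/env bash
# Count kernel log lines that mention the GPU protection fault.
# Reads log text on stdin and prints the number of matching lines.
count_gpu_faults() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so mask the exit status to keep pipelines and scripts happy.
    grep -c 'GCVM_L2_PROTECTION_FAULT_STATUS' || true
}

# Demo on one sample line from the journal output above.
printf '%s\n' \
    'fedora kernel: amdgpu 0000:c5:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00800932' \
    | count_gpu_faults
```

On the affected machine, run `journalctl -k -b 0 --no-pager | count_gpu_faults` after starting a GPU workload; a count of 0 suggests the fault has not recurred in this boot.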