GPU hang: how could I investigate/fix this?

To close the loop here: after 3 days I can confidently say that disabling dynamic power management for amdgpu (amdgpu.dpm=0) solves the problem.
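
For anyone landing here later, a rough sketch of how the setting can be applied and checked. This assumes a Fedora-style system where `grubby` manages the kernel command line: something like `sudo grubby --update-kernel=ALL --args="amdgpu.dpm=0"` followed by a reboot should make it persistent. The helper name `has_dpm_disabled` below is made up for illustration; it only checks whether the parameter is present on the running kernel's command line.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given kernel command line
# contains the exact token amdgpu.dpm=0.
has_dpm_disabled() {
  case " $1 " in
    *" amdgpu.dpm=0 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Check the running kernel (only if /proc/cmdline is available).
if [ -r /proc/cmdline ]; then
  if has_dpm_disabled "$(cat /proc/cmdline)"; then
    echo "amdgpu.dpm=0 is on the kernel command line"
  else
    echo "amdgpu.dpm=0 is NOT on the kernel command line"
  fi
fi

# If the amdgpu module is loaded, the value it actually uses is exposed here.
if [ -r /sys/module/amdgpu/parameters/dpm ]; then
  echo "amdgpu dpm parameter: $(cat /sys/module/amdgpu/parameters/dpm)"
fi
```

Keep in mind that with dpm off the card cannot change its clocks dynamically, so treat this as a workaround rather than a proper fix.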

I have filed a kernel bug here: 2054948 – AMD Vega64 GPU Freeze when dynamic power management on. I still need to collect some more info for it, such as testing on the latest rawhide kernel.

Thank you everyone for your help!
