System restarts while playing games (AMD iGPU)

Hello! When I play games like Forza Horizon 4 or Genshin Impact, the system restarts after approximately 1 hour, though the exact time varies. When I inspected the logs, I found the following errors:

[    0.269526] thermal_sys: Registered thermal governor 'fair_share'
[    0.269526] thermal_sys: Registered thermal governor 'bang_bang'
[    0.269527] thermal_sys: Registered thermal governor 'step_wise'
[    0.269529] thermal_sys: Registered thermal governor 'user_space'
[    0.783730] RAS: Correctable Errors collector initialized.
[    7.119060] amdgpu 0000:09:00.0: amdgpu: psp gfx command LOAD_TA(0x1) failed and response status is (0x7)
[    7.210409] amdgpu 0000:09:00.0: amdgpu: [drm] Failed to setup vendor infoframe on connector HDMI-A-2: -22 
[    9.801351] MCE: In-kernel MCE decoding enabled.

My system uses an AMD Ryzen 5 3400G with no dedicated GPU, running the KDE Plasma spin. From searching online and asking chatbots, it appears to be a driver or firmware issue (but I'm not sure). Why do these errors and restarts occur? Is it a thermal issue or just a software problem? Is there a solution? Thanks.

What kind of CPU temperatures are you getting?

I used to have the Ryzen 5 2400G and (on Windows) it got pretty hot when playing a game that taxed both the CPU and iGPU.

I didn't check the temperature continuously, but while the game is idle (Genshin Impact running in the background, not being played) it stays at 80 degrees. I suspect it can go beyond that while playing.

Right, that does sound pretty hot if it’s already at 80 when it’s not fully loaded.

The spec says the max operating temperature is 95 degrees, so I wouldn’t be surprised if you were getting thermal shutdowns.
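To see how close the APU actually gets to that 95 degree limit, the temperatures can be read from the kernel's hwmon interface under /sys/class/hwmon, where each tempN_input file holds a value in millidegrees Celsius. Below is a minimal Python sketch under that assumption. The helper names read_hwmon_temps and headroom are hypothetical. On a Ryzen APU the CPU sensor typically shows up under the k10temp driver with the label Tctl, but sensor names can differ between systems.

```python
from __future__ import annotations

from pathlib import Path

# Maximum operating temperature quoted in the thread (degrees Celsius).
MAX_OPERATING_C = 95.0


def _read(path: Path) -> str | None:
    """Read a sysfs attribute, returning None if it is missing or unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def read_hwmon_temps(root: Path = Path("/sys/class/hwmon")) -> dict[str, float]:
    """Map "<driver>/<label>" to degrees Celsius for every hwmon temperature input.

    hwmon reports tempN_input in millidegrees Celsius; tempN_label is optional,
    so a sensor without a label falls back to its file stem (e.g. "temp1").
    """
    temps: dict[str, float] = {}
    for chip in sorted(root.glob("hwmon*")):
        driver = _read(chip / "name") or chip.name
        for inp in sorted(chip.glob("temp*_input")):
            stem = inp.name[: -len("_input")]  # "temp1_input" -> "temp1"
            label = _read(chip / f"{stem}_label") or stem
            raw = _read(inp)
            if raw is None or not raw.lstrip("-").isdigit():
                continue  # skip sensors that cannot be read right now
            temps[f"{driver}/{label}"] = int(raw) / 1000.0
    return temps


def headroom(temps: dict[str, float], limit: float = MAX_OPERATING_C) -> dict[str, float]:
    """Degrees left before each sensor reaches `limit` (negative means over it)."""
    return {key: round(limit - value, 1) for key, value in temps.items()}
```

For example, print(headroom(read_hwmon_temps())) shows the remaining margin for every sensor; calling it every few seconds during a session gives a rough log of how the margin shrinks.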

Yeah, but I can't be sure that heat is the main problem. I'll do a test run while monitoring the temps. But what about these two?

[ 7.119060] amdgpu 0000:09:00.0: amdgpu: psp gfx command LOAD_TA(0x1) failed and response status is (0x7)

[ 7.210409] amdgpu 0000:09:00.0: amdgpu: [drm] Failed to setup vendor infoframe on connector HDMI-A-2: -22

Those look like logs from early boot, so I’m not sure they are necessarily relevant to a crash 1 hour later.
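The bracketed number at the start of each kernel line is the time in seconds since boot, which is why 7.119060 points to early boot rather than to the moment of the crash. A small Python sketch of that check, assuming dmesg-style lines like the ones pasted above and treating "about 1 hour" as 3600 seconds for illustration; parse_dmesg_line and seconds_before_crash are hypothetical helper names.

```python
from __future__ import annotations

import re

# Kernel log lines look like "[    7.119060] amdgpu ...": the number in
# brackets is seconds since the kernel started.
DMESG_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")


def parse_dmesg_line(line: str) -> tuple[float, str] | None:
    """Split a kernel log line into (seconds since boot, message), or None."""
    match = DMESG_LINE.match(line.strip())
    if match is None:
        return None
    return float(match.group(1)), match.group(2)


def seconds_before_crash(line: str, crash_after_s: float) -> float | None:
    """How many seconds before a crash at `crash_after_s` the line was logged."""
    parsed = parse_dmesg_line(line)
    if parsed is None:
        return None
    return crash_after_s - parsed[0]
```

For the LOAD_TA line and a hypothetical crash at 3600 s, seconds_before_crash returns roughly 3593, i.e. the message was logged about an hour before the crash.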

But I don't have an AMD GPU anymore, so I don't know the details of what they mean.

It would be interesting if journalctl showed anything relevant happening just before the crash.

These errors are from after the system rebooted itself (so from the initial boot done by me). I went through journalctl after the crash, and I can assure you that this line was present. I can try again to see whether it crashes again and logs this message, and I'll post an update afterwards. But I surely need a better cooler, right? :laughing: This APU is hot.

I know :slight_smile: But what I mean is, it would be interesting to see if anything is logged right before the crash - rather than when the system reboots after the crash.

Right! The iGPU is not bad for an iGPU, but of course it has to share the same physical package with the CPU and it seems like you get some concentrated heat.

Thanks for your kind help! I will try again to see if I can get anything interesting. Can you give me some clues about what those logs look like?

I would suggest:

  • After a crash and restart, run sudo journalctl -b -1 to bring up the logs from the previous boot.
  • We’re interested in the last things that happened before crash, so just look at the last 50-100 lines to see if anything looks suspicious. Particularly anything that seems related to amdgpu or to some warning about temperatures.
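The same look at the tail of the previous boot can be scripted. A minimal Python sketch, assuming journalctl is on PATH and the script runs with enough privileges (e.g. via sudo) to read the system journal. It shells out to journalctl -b -1 -n 100 --no-pager and keeps lines matching a hypothetical keyword list; the helper names and keywords are illustrative, not an established tool, so the full tail is still worth reading by eye.

```python
from __future__ import annotations

import subprocess

# Hypothetical keyword list: amdgpu plus words a thermal or machine-check
# problem would plausibly mention. Adjust as needed.
KEYWORDS = ("amdgpu", "thermal", "temperature", "mce", "critical")


def suspicious_lines(lines: list[str], keywords=KEYWORDS, tail: int = 100) -> list[str]:
    """Return lines among the last `tail` that contain any keyword (case-insensitive)."""
    recent = lines[-tail:] if tail > 0 else []
    return [line for line in recent if any(k.lower() in line.lower() for k in keywords)]


def previous_boot_log(tail: int = 100) -> list[str]:
    """Last `tail` journal lines from the previous boot (journalctl -b -1 -n TAIL)."""
    result = subprocess.run(
        ["journalctl", "-b", "-1", "-n", str(tail), "--no-pager"],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if journalctl fails
    )
    return result.stdout.splitlines()
```

For example, print("\n".join(suspicious_lines(previous_boot_log()))) after a crash and restart.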

Got it. I’ll share if I find them. Thanks.

Sorry for the late reply, I'm on vacation now. Before leaving, though, I did a test run and the temps went up to 85 degrees in just story mode. They didn't go beyond that, and I didn't get a restart. I played for around 2 hours, and most of the time the restart happens within that time frame. So, you were right :+1:. There is a very high chance that the restart is caused by overheating, or even just by hitting the thermal threshold. It could also happen so fast that the system doesn't get time to log anything about it. Thanks for your help :slight_smile:.
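If the test is repeated with monitoring, recording (elapsed seconds, temperature) samples makes the peak and the remaining margin to the 95 degree maximum explicit. A minimal Python sketch under that assumption; summarize_run is a hypothetical helper, the 90 degree warning level is an arbitrary choice, and the time spent above it is only approximated from the sample spacing.

```python
from __future__ import annotations

# Maximum operating temperature quoted in the thread (degrees Celsius);
# the warning level below it is a hypothetical, arbitrary choice.
MAX_OPERATING_C = 95.0
WARN_C = 90.0


def summarize_run(samples: list[tuple[float, float]], warn_c: float = WARN_C) -> dict[str, float]:
    """Summarize (elapsed_seconds, degrees_C) samples from one gaming session.

    Time at or above `warn_c` is approximated by crediting each interval
    between consecutive samples to the temperature at its start.
    """
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)  # order by elapsed time
    peak = max(temp for _, temp in ordered)
    hot_seconds = 0.0
    for (t0, temp0), (t1, _) in zip(ordered, ordered[1:]):
        if temp0 >= warn_c:
            hot_seconds += t1 - t0
    return {
        "peak_c": peak,
        "headroom_c": MAX_OPERATING_C - peak,
        "seconds_at_or_above_warn": hot_seconds,
    }
```

Sampling every few seconds keeps the time-above-threshold estimate reasonably close; a coarser interval mainly blurs short spikes.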