Hardware error, possibly CPU issue?

Hello, I’m posting here to understand where I can start debugging this error. It happens randomly, without any clear cause. Hopefully it’s not my CPU dying.

CPU is a Ryzen 5 3600. Here’s the error:

mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1755643535 SOCKET 0 APIC 8 microcode 8701034
mce: [Hardware Error]: TSC 0 ADDR 1ffffc0a01e46 MISC d012000100000000 SYND 4d000000 IPID 500b000000000 
mce: [Hardware Error]: CPU 3: Machine Check: 0 Bank 5: bea0000000000108

Have you checked whether the system is overheating?

Random crashes may be due to bad RAM or power as well as an overheated CPU, and often first appear when the system is hot, so the first step, as suggested, is to check temperatures.
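One way to check temperatures on Linux without extra tools is to read the kernel's hwmon sysfs files, which report values in millidegrees Celsius. The sketch below assumes the sensors are exposed under `/sys/class/hwmon` (on Ryzen systems the CPU sensor typically shows up as `k10temp`, but that depends on the kernel and drivers loaded). The function name `read_hwmon_temps` is just illustrative.

```python
from pathlib import Path


def read_hwmon_temps(base="/sys/class/hwmon"):
    """Return {(chip, sensor): degrees_celsius} for every readable temp input.

    Assumes the standard hwmon layout: hwmonN/name, hwmonN/tempM_input
    (millidegrees Celsius) and an optional hwmonN/tempM_label.
    """
    temps = {}
    for chip_dir in sorted(Path(base).glob("hwmon*")):
        name_file = chip_dir / "name"
        chip = name_file.read_text().strip() if name_file.exists() else chip_dir.name
        for temp_file in sorted(chip_dir.glob("temp*_input")):
            try:
                millideg = int(temp_file.read_text().strip())
            except (OSError, ValueError):
                continue  # sensor not readable right now, skip it
            label_file = temp_file.with_name(temp_file.name.replace("_input", "_label"))
            label = label_file.read_text().strip() if label_file.exists() else temp_file.name
            temps[(chip, label)] = millideg / 1000.0
    return temps
```

Calling `read_hwmon_temps()` while the machine is under load and printing the result every few seconds should show whether the CPU is getting close to its thermal limit before a crash.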

If temperatures are OK, run the standalone memtest86+ overnight, preferably for a couple of nights. Bad power is commonly caused by failed electrolytic capacitors (look for bulging tops), but diagnosing it may need a repair shop with test equipment. Some vendors provide diagnostic software.

I’ll check for temperatures then.

As for RAM, I’ve had this problem with both the old RAM and the new RAM I installed recently, so it’s probably safe to say that’s not the cause.

If the CPU belongs to that family, try the MCE Ryzen Decoder for the AMD 17h (23) family: DimitriFourny/MCE-Ryzen-Decoder on GitHub.

python run.py 5 bea0000000000108
Bank: Execution Unit (EX)
Error: Watchdog Timeout error (WDT 0x0)
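For anyone who wants to see what is inside that status value, here is a minimal sketch that splits it into flag bits and error codes. It is not the decoder's code. The bit positions are assumptions taken from the x86 machine-check status register layout (VAL, OVER, UC, EN, MISCV, ADDRV, PCC) plus the AMD-specific TCC and SYNDV bits as I understand the Linux kernel defines them, and the names `FLAGS` and `decode_mca_status` are made up for this example.

```python
# Assumed bit positions in the 64-bit machine-check status value.
FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # reporting was enabled for this error
    59: "MISCV",  # MISC register is valid
    58: "ADDRV",  # ADDR register is valid
    57: "PCC",    # processor context may be corrupt
    55: "TCC",    # task context may be corrupt (AMD, assumed)
    53: "SYNDV",  # SYND register is valid (AMD, assumed)
}


def decode_mca_status(status):
    """Split a status value into set flags and error-code fields."""
    flags = [name for bit, name in FLAGS.items() if (status >> bit) & 1]
    return {
        "flags": flags,
        "error_code": status & 0xFFFF,                 # low 16 bits
        "extended_error_code": (status >> 16) & 0x3F,  # bits 21:16
    }


result = decode_mca_status(0xBEA0000000000108)
print(result["flags"])
print(hex(result["error_code"]), hex(result["extended_error_code"]))
```

For the value from the log, UC and PCC are both set, which means an uncorrected error with possibly corrupt processor context, and the extended error code is 0, consistent with the "WDT 0x0" in the decoder output.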

That seems to be the case. I’ll test over the next few days to see if this fix works for me.

Unfortunately that didn’t work; I’m still getting some random crashes.

After some investigation I’ve found more related logs that might help find the culprit:

set 07 10:12:53 fedora kernel: ccp 0000:0b:00.1: enabling device (0000 -> 0002)
set 07 10:12:53 fedora kernel: ccp 0000:0b:00.1: ccp: unable to access the device: you might be running a broken BIOS.
set 07 10:12:53 fedora kernel: ccp 0000:0b:00.1: psp enabled 

and

set 07 10:12:53 fedora kernel: x86/amd: Previous system reset reason [0x08000800]: an uncorrected error caused a data fabric sync flood event

These two show up right before the machine check error.
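The reset reason in that kernel line is a bitmask, so it can help to see which bits are actually set. The sketch below only lists the set bits. Tying bit 27 to the "data fabric sync flood" text is an assumption inferred from the log line itself, and the meaning of the other set bit is not given anywhere in the logs, so it is left unlabeled. `set_bits` and `KNOWN_REASONS` are names invented for this example.

```python
# Assumption: bit 27 corresponds to the message the kernel printed above.
KNOWN_REASONS = {
    27: "an uncorrected error caused a data fabric sync flood event",
}


def set_bits(value, width=32):
    """Return the positions of all bits set in value, lowest first."""
    return [bit for bit in range(width) if (value >> bit) & 1]


for bit in set_bits(0x08000800):
    print(bit, KNOWN_REASONS.get(bit, "(not identified in the log)"))
```

A sync flood recorded as the reset cause suggests the machine was reset by hardware after a fatal error, not by a software-initiated reboot.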


IOMMU is already disabled. I’ll check if disabling virtualization solves it.


Some news: disabling virtualization doesn’t change anything; the errors still show up.

Eventually my computer stopped working, so I took it to a nearby hardware store. They discovered that my GPU is basically dead, so in the end it wasn’t any of the things I suspected.

Thanks to everyone who helped me with this.

I’ll open a new discussion about which GPU model to buy.
