Hardware errors


I have been getting the following hardware errors. I asked GPT-OSS about them, and its answer worried me even more: it said it could be a hardware fault in the CPU, RAM, or chipset! Given current hardware prices, having to replace hardware sounds scary. Can someone help me understand these errors and figure out what is actually wrong?

Message from syslogd@systemname at May 23 21:17:25 ...
 kernel:[Hardware Error]: System Fatal error.

Message from syslogd@systemname at May 23 21:17:25 ...
 kernel:[Hardware Error]: CPU:1 (19:21:2) MC12_STATUS[Over|UE|MiscV|AddrV|PCC|SyndV|UECC|Deferred|Poison|Scrub]: 0xffffffffffffffff

Message from syslogd@systemname at May 23 21:17:25 ...
 kernel:[Hardware Error]: Error Addr: 0x0000000000000000

Message from syslogd@systemname at May 23 21:17:25 ...
 kernel:[Hardware Error]: IPID: 0x0000000000000000, Syndrome: 0x0000000000000000

Message from syslogd@systemname at May 23 21:17:25 ...
 kernel:[Hardware Error]: Bank 12 is reserved.

Message from syslogd@systemname at May 23 21:17:25 ...
 kernel:[Hardware Error]: cache level: L3/GEN, tx: RESV

Message from syslogd@systemname at May 26 08:10:54 ...
 kernel:[Hardware Error]: System Fatal error.

Message from syslogd@systemname at May 26 08:10:54 ...
 kernel:[Hardware Error]: CPU:1 (19:21:2) MC13_STATUS[Over|UE|MiscV|AddrV|PCC|SyndV|UECC|Deferred|Poison|Scrub]: 0xffffffffa6b2950b

Message from syslogd@systemname at May 26 08:10:54 ...
 kernel:[Hardware Error]: Error Addr: 0x0000000000000000

Message from syslogd@systemname at May 26 08:10:54 ...
 kernel:[Hardware Error]: IPID: 0x0000000000000000, Syndrome: 0x0000000000000000

Message from syslogd@systemname at May 26 08:10:54 ...
 kernel:[Hardware Error]: Bank 13 is reserved.

Message from syslogd@systemname at May 26 08:10:54 ...
 kernel:[Hardware Error]: cache level: L3/GEN, tx: GEN

Message from syslogd@systemname at May 26 18:38:57 ...
 kernel:[Hardware Error]: System Fatal error.

Message from syslogd@systemname at May 26 18:38:57 ...
 kernel:[Hardware Error]: CPU:1 (19:21:2) MC22_STATUS[Over|UE|MiscV|AddrV|PCC|SyndV|UECC|Deferred|Poison|Scrub]: 0xffffffffa4e13ca0

Message from syslogd@systemname at May 26 18:38:57 ...
 kernel:[Hardware Error]: Error Addr: 0x0000000000000000

Message from syslogd@systemname at May 26 18:38:57 ...
 kernel:[Hardware Error]: IPID: 0x0000000000000000, Syndrome: 0x0000000000000000

Message from syslogd@systemname at May 26 18:38:57 ...
 kernel:[Hardware Error]: Bank 22 is reserved.

Message from syslogd@systemname at May 26 18:38:57 ...
 kernel:[Hardware Error]: cache level: RESV, tx: INSN

Message from syslogd@systemname at Jun  3 23:28:58 ...
 kernel:[Hardware Error]: System Fatal error.

Message from syslogd@systemname at Jun  3 23:28:58 ...
 kernel:[Hardware Error]: CPU:1 (19:21:2) MC19_STATUS[Over|UE|MiscV|AddrV|PCC|SyndV|UECC|Deferred|Poison|Scrub]: 0xffffffff88d883e0

Message from syslogd@systemname at Jun  3 23:28:58 ...
 kernel:[Hardware Error]: Error Addr: 0x0000000000000000

Message from syslogd@systemname at Jun  3 23:28:58 ...
 kernel:[Hardware Error]: IPID: 0x0000000000000000, Syndrome: 0x0000000000000000

Message from syslogd@systemname at Jun  3 23:28:58 ...
 kernel:[Hardware Error]: Bank 19 is reserved.

Message from syslogd@systemname at Jun  3 23:28:58 ...
 kernel:[Hardware Error]: cache level: RESV, tx: INSN
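For what it's worth, the "cache level" and "tx" lines are the kernel decoding the low 16 bits (the error code) of each MCx_STATUS value. Here is a minimal Python sketch that reproduces that decode for the four values logged above. It assumes the masks and lookup tables as I read them in the Linux kernel's drivers/edac/mce_amd.c (amd_decode_err_code), so verify them against your own kernel version. The function name is hypothetical, and internal, bus and memory error codes are deliberately left out because the kernel prints extra fields for them.

```python
from typing import Optional

# Lookup tables as I read them in the Linux kernel's drivers/edac/mce_amd.c
# (assumption: verify against your kernel version).
LL_MSGS = ("RESV", "L1", "L2", "L3/GEN")   # cache level, error-code bits 1:0
TT_MSGS = ("INSN", "DATA", "GEN", "RESV")  # transaction type, bits 3:2


def decode_cache_tx(status: int) -> Optional[str]:
    """Rebuild the 'cache level: ..., tx: ...' line from an MCx_STATUS value.

    Only the low 16 bits (the error code) are used. Internal, bus and
    memory error codes get extra or different fields in the kernel, so
    this sketch returns None for them.
    """
    ec = status & 0xFFFF
    is_internal = (ec & 0xF4FF) == 0x0400
    is_bus = (ec & 0xF800) == 0x0800
    is_mem = (ec & 0xFF00) == 0x0100
    if is_internal or is_bus or is_mem:
        return None
    level = LL_MSGS[ec & 0x3]
    tx = TT_MSGS[(ec >> 2) & 0x3]
    return f"cache level: {level}, tx: {tx}"


# MCx_STATUS values copied from the four log entries above
for status in (0xFFFFFFFFFFFFFFFF, 0xFFFFFFFFA6B2950B,
               0xFFFFFFFFA4E13CA0, 0xFFFFFFFF88D883E0):
    print(hex(status), "->", decode_cache_tx(status))
# decodes, in order: L3/GEN + RESV, L3/GEN + GEN, RESV + INSN, RESV + INSN
```

Because the low 16 bits differ on every event, the decoded cache level and transaction type change too. That is consistent with the register holding arbitrary data rather than one repeating fault, though it does not prove it.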

The CPU is brand new (~2 months old); everything else is a few years old. I’ve never had any kind of hardware or stability issues, and my uptime is generally pretty high.

My system info:

# inxi -MmCG
Machine:
  Type: Desktop System: ASUS product: N/A v: N/A serial: N/A
  Mobo: ASUSTeK model: TUF GAMING B550M-PLUS (WI-FI) v: Rev X.0x
    serial: XXXXXXXX Firmware: UEFI vendor: American Megatrends v: 2806
    date: 10/27/2022
Memory:
  System RAM: total: 32 GiB available: 31.22 GiB used: 22.52 GiB (72.1%)
  Array-1: capacity: 128 GiB slots: 4 modules: 4 EC: None
  Device-1: DIMM_A1 type: DDR4 size: 8 GiB speed: 3200 MT/s
  Device-2: DIMM_A2 type: DDR4 size: 8 GiB speed: 3200 MT/s
  Device-3: DIMM_B1 type: DDR4 size: 8 GiB speed: 3200 MT/s
  Device-4: DIMM_B2 type: DDR4 size: 8 GiB speed: 3200 MT/s
CPU:
  Info: 8-core model: AMD Ryzen 7 5800X bits: 64 type: MT MCP cache: L2: 4 MiB
  Speed (MHz): avg: 3882 min/max: 556/4854 cores: 1: 3882 2: 3882 3: 3882
    4: 3882 5: 3882 6: 3882 7: 3882 8: 3882 9: 3882 10: 3882 11: 3882 12: 3882
    13: 3882 14: 3882 15: 3882 16: 3882
Graphics:
  Device-1: Intel DG2 [Arc A750] driver: i915 v: kernel
  Display: x11 server: X.Org v: 21.1.22 with: Xwayland v: 24.1.11 driver: X:
    loaded: modesetting dri: iris gpu: i915 resolution: 1: 1920x1080~60Hz
    2: 2560x1440~60Hz
  API: OpenGL v: 4.6 vendor: intel mesa v: 26.0.6 renderer: Mesa Intel Arc
    A750 Graphics (DG2)
  API: Vulkan v: 1.4.341 drivers: intel,llvmpipe surfaces: N/A
  API: EGL Message: EGL data requires eglinfo. Check --recommends.
  Info: Tools: api: glxinfo,vulkaninfo de: xfce4-display-settings
    gpu: corectrl, gputop, intel_gpu_top, lsgpu x11: xdriinfo, xdpyinfo, xprop,
    xrandr

I don’t know what those errors mean, but the first thing I’d probably do is update your system BIOS to the latest available version and then run a memory tester such as memtest86+ to check your RAM.

I would also recommend running some form of memtest. It might also be worth checking whether the system is more stable with fewer DIMMs and/or lower RAM clocks. If the CPU is the only new component, though, it could simply be a faulty CPU, and it might be worth exploring that possibility with AMD.

I got this response from another LLM (Kimi K2.5).

[…] In plain terms, the CPU’s built-in hardware error-detection logic logged a fatal error in one of its internal “banks” (Bank 9). However, the signature of this error is extremely suspicious, and it is far more likely to be a firmware glitch / phantom error than a real hardware failure.
[…]
Look at this line:
MC9_STATUS[…]: 0xffffffffffffffff
A status register of all Fs (111111… in binary) means every single error flag is set simultaneously,
[…]
then look at the next lines:

  • Error Addr: 0x0000000000000000
  • IPID: 0x0000000000000000
  • Syndrome: 0x0000000000000000

If AddrV and SyndV were truly valid, the address and syndrome should contain real data, not all zeros. Real hardware errors almost never produce all flags set to 1 with all data registers zeroed out. This pattern is the hallmark of reading a reserved/unimplemented bank or receiving garbage data from the firmware (AGESA/SMU).

The kernel confirms this by adding: Bank 9 is reserved. On your Zen 3 CPU, the kernel does not have a mapping for Bank 9, and the fact that it returns all Fs strongly suggests the register was not populated with a real error.
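The bracketed flag list comes from bits 40 to 62 of the same status register. As a rough check of the reasoning above, here is a Python sketch that decodes those flags. It assumes the AMD bit positions from the Linux kernel's arch/x86/include/asm/mce.h and the printing order in drivers/edac/mce_amd.c, and it skips the CE and TCC cases. looks_inconsistent() is a hypothetical heuristic, not a kernel or AMD check: it flags a status that claims a valid address and syndrome while the logged address, IPID and syndrome are all zero.

```python
from typing import List

# AMD MCA_STATUS bit positions as I read them in the Linux kernel's
# arch/x86/include/asm/mce.h (assumption: verify for your kernel/CPU).
# Order follows the bracketed list printed by drivers/edac/mce_amd.c.
FLAGS_BEFORE_ECC = [
    ("Over", 62),   # an earlier error was overwritten
    ("UE", 61),     # uncorrected error (sketch omits the "CE" case)
    ("MiscV", 59),  # MCA_MISC holds valid data
    ("AddrV", 58),  # MCA_ADDR holds valid data
    ("PCC", 57),    # processor context corrupt
    ("SyndV", 53),  # MCA_SYND holds valid data (TCC is skipped here)
]
FLAGS_AFTER_ECC = [
    ("Deferred", 44),
    ("Poison", 43),
    ("Scrub", 40),
]


def bit(status: int, n: int) -> bool:
    """Return True if bit n of the 64-bit status value is set."""
    return ((status >> n) & 1) == 1


def decode_flags(status: int) -> List[str]:
    """Rebuild the bracketed MCx_STATUS flag list from the set bits."""
    flags = [name for name, n in FLAGS_BEFORE_ECC if bit(status, n)]
    ecc = (status >> 45) & 0x3  # two-bit ECC field, bits 46:45
    if ecc:
        flags.append("CECC" if ecc == 2 else "UECC")
    flags += [name for name, n in FLAGS_AFTER_ECC if bit(status, n)]
    return flags


def looks_inconsistent(status: int, addr: int, ipid: int, synd: int) -> bool:
    """Hypothetical heuristic: AddrV and SyndV claim valid data, yet the
    logged address, IPID and syndrome registers are all zero."""
    return (bit(status, 58) and bit(status, 53)
            and addr == 0 and ipid == 0 and synd == 0)


# MCx_STATUS values from the four posted log entries (Error Addr, IPID
# and Syndrome were logged as zero every time).
LOGGED = [0xFFFFFFFFFFFFFFFF, 0xFFFFFFFFA6B2950B,
          0xFFFFFFFFA4E13CA0, 0xFFFFFFFF88D883E0]

for status in LOGGED:
    print(hex(status >> 32), "|".join(decode_flags(status)),
          looks_inconsistent(status, 0, 0, 0))
# each line: 0xffffffff Over|UE|MiscV|AddrV|PCC|SyndV|UECC|Deferred|Poison|Scrub True
```

In all four entries posted at the top of the thread (banks 12, 13, 19 and 22), the upper 32 bits are 0xffffffff, so every flag decodes as set even though only the first value is all Fs. The values differ only in the lower 32 bits.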

The reasoning above makes sense to me. I think I’ll follow what @digitalman suggested and see if that resolves the issue. Since I don’t have a way to reproduce the errors, I’ll probably have to wait a week or so to be confident the BIOS update fixed it.