Hello Fedora Community,
I am testing Fedora Server 41 on a Gigabyte MJ11-EC1 board with an AMD EPYC 3151 (also bought used). I separately bought 2 x 16 GB of used 2400 MT/s ECC RAM and tested it for over 40 hours with Memtest86+ from the GRUB boot menu: 15 passes without an error.
This is meant to become a small NAS with ZFS, so I am not sure whether I will keep using Fedora for it. For now I am testing with just a 275 GB SATA SSD.
The problem is that I am getting corrected ECC errors in the BMC, and Fedora Server sometimes crashes. Sometimes it crashes really fast, sometimes it takes a bit longer, under load or at idle. I have no clue what is happening.
So I tried setting mce=ignore_ce, which was probably not a good idea. Then I found mce=dont_log_ce.
With that option it seemed to run longer, but maybe I was just lucky: today, with the same option, it crashed 5 minutes after boot.
Luckily I have set up a kernel crash dump, and I have one from the 29th of December, because it does not crash completely every time.
This is the kexec-dmesg log, but I needed to cut out a lot of lines to be able to post it at all. I hope it is enough.
[Sun Dec 29 12:21:53 2024] Linux version 6.6.64-200.fc41.x86_64 (mockbuild@b5e5162930154f93adf20cf52866eea1) (gcc (GCC) 14.2.1 20240912 (Red Hat 14.2.1-3), GNU ld version 2.43.1-4.fc41) #1 SMP PREEMPT_DYNAMIC Mon Dec 9 14:52:02 UTC 2024
[Sun Dec 29 12:21:53 2024] Command line: elfcorehdr=0xaf000000 BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.6.64-200.fc41.x86_64 ro rootflags=subvol=root resume=UUID=f589f7dc-516d-49d7-b4a7-3f5f35e0ed3e rhgb irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 acpi_no_memhotplug transparent_hugepage=never nokaslr hest_disable novmcoredd cma=0 hugetlb_cma=0 pcie_ports=compat disable_cpu_apicid=0
[Sun Dec 29 12:21:53 2024] BIOS-provided physical RAM map:
[Sun Dec 29 12:21:53 2024] BIOS-e820: [mem 0x0000000000000000-0x0000000000000fff] reserved
[Sun Dec 29 12:21:53 2024] BIOS-e820: [mem 0x0000000000001000-0x000000000009ffff] usable
[Sun Dec 29 12:21:53 2024] BIOS-e820: [mem 0x00000000000a0000-0x00000000000fffff] reserved
[Sun Dec 29 12:21:53 2024] BIOS-e820: [mem 0x0000000076db0000-0x0000000076ffffff] reserved
[Sun Dec 29 12:21:53 2024] BIOS-e820: [mem 0x00000000af0e00b0-0x00000000ceffffff] usable
[Sun Dec 29 12:21:53 2024] BIOS-e820: [mem 0x00000000d55ae000-0x00000000d5605fff] reserved
[Sun Dec 29 12:21:53 2024] BIOS-e820: [mem 0x00000000d57a1000-0x00000000d57a1fff] reserved
[Sun Dec 29 12:21:53 2024] BIOS-e820: [mem 0x00000000d8bab000-0x00000000da073fff] reserved
[Sun Dec 29 12:21:53 2024] BIOS-e820: [mem 0x00000000da074000-0x00000000da08efff] ACPI data
[Sun Dec 29 12:21:53 2024] BIOS-e820: [mem 0x00000000da08f000-0x00000000da112fff] ACPI NVS
[Sun Dec 29 12:21:53 2024] BIOS-e820: [mem 0x00000000da113000-0x00000000da641fff] reserved
[Sun Dec 29 12:21:53 2024] BIOS-e820: [mem 0x00000000dc000000-0x00000000dfffffff] reserved
[Sun Dec 29 12:21:53 2024] BIOS-e820: [mem 0x00000000efff0000-0x00000000efff0fff] reserved
[Sun Dec 29 12:21:53 2024] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
[Sun Dec 29 12:21:53 2024] BIOS-e820: [mem 0x00000000fec10000-0x00000000fec10fff] reserved
[Sun Dec 29 12:21:53 2024] BIOS-e820: [mem 0x00000000fed00000-0x00000000fed00fff] reserved
[Sun Dec 29 12:21:53 2024] BIOS-e820: [mem 0x00000000fed40000-0x00000000fed44fff] reserved
[Sun Dec 29 12:21:53 2024] BIOS-e820: [mem 0x00000000fed80000-0x00000000fed8ffff] reserved
[Sun Dec 29 12:21:53 2024] BIOS-e820: [mem 0x00000000fedc0000-0x00000000fedc0fff] reserved
[Sun Dec 29 12:21:53 2024] BIOS-e820: [mem 0x00000000fedc2000-0x00000000fedc5fff] reserved
[Sun Dec 29 12:21:53 2024] BIOS-e820: [mem 0x00000000fedc7000-0x00000000fedc7fff] reserved
[Sun Dec 29 12:21:53 2024] BIOS-e820: [mem 0x00000000fedc9000-0x00000000fedcafff] reserved
[Sun Dec 29 12:21:53 2024] BIOS-e820: [mem 0x000000081f380000-0x000000081fffffff] reserved
[Sun Dec 29 12:21:53 2024] random: crng init done
[Sun Dec 29 12:21:53 2024] NX (Execute Disable) protection: active
[Sun Dec 29 12:21:53 2024] APIC: Static calls initialized
[Sun Dec 29 12:21:53 2024] e820: update [mem 0xceff8550-0xceff855f] usable ==> usable
[Sun Dec 29 12:21:53 2024] e820: update [mem 0xceff8530-0xceff854f] usable ==> usable
[Sun Dec 29 12:21:53 2024] e820: update [mem 0xceff84c0-0xceff852f] usable ==> usable
[Sun Dec 29 12:21:53 2024] extended physical RAM map:
[Sun Dec 29 12:21:53 2024] reserve setup_data: [mem 0x0000000000000000-0x0000000000000fff] reserved
[Sun Dec 29 12:21:53 2024] reserve setup_data: [mem 0x0000000000001000-0x000000000009ffff] usable
[Sun Dec 29 12:21:53 2024] reserve setup_data: [mem 0x00000000000a0000-0x00000000000fffff] reserved
[Sun Dec 29 12:21:53 2024] reserve setup_data: [mem 0x0000000076db0000-0x0000000076ffffff] reserved
[Sun Dec 29 12:21:53 2024] reserve setup_data: [mem 0x00000000af0e00b0-0x00000000ceff84bf] usable
[Sun Dec 29 12:21:53 2024] reserve setup_data: [mem 0x00000000ceff84c0-0x00000000ceff855f] usable
[Sun Dec 29 12:21:53 2024] reserve setup_data: [mem 0x00000000ceff8560-0x00000000ceffffff] usable
[Sun Dec 29 12:21:53 2024] reserve setup_data: [mem 0x00000000d55ae000-0x00000000d5605fff] reserved
[Sun Dec 29 12:21:53 2024] reserve setup_data: [mem 0x00000000d57a1000-0x00000000d57a1fff] reserved
[Sun Dec 29 12:21:53 2024] reserve setup_data: [mem 0x00000000d8bab000-0x00000000da073fff] reserved
[Sun Dec 29 12:21:53 2024] reserve setup_data: [mem 0x00000000da074000-0x00000000da08efff] ACPI data
[Sun Dec 29 12:21:53 2024] reserve setup_data: [mem 0x00000000da08f000-0x00000000da112fff] ACPI NVS
[Sun Dec 29 12:21:53 2024] reserve setup_data: [mem 0x00000000da113000-0x00000000da641fff] reserved
[Sun Dec 29 12:21:53 2024] reserve setup_data: [mem 0x00000000dc000000-0x00000000dfffffff] reserved
[Sun Dec 29 12:21:53 2024] reserve setup_data: [mem 0x00000000efff0000-0x00000000efff0fff] reserved
[Sun Dec 29 12:21:53 2024] reserve setup_data: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
[Sun Dec 29 12:21:53 2024] reserve setup_data: [mem 0x00000000fec10000-0x00000000fec10fff] reserved
[Sun Dec 29 12:21:53 2024] reserve setup_data: [mem 0x00000000fed00000-0x00000000fed00fff] reserved
[Sun Dec 29 12:21:53 2024] reserve setup_data: [mem 0x00000000fed40000-0x00000000fed44fff] reserved
[Sun Dec 29 12:21:53 2024] reserve setup_data: [mem 0x00000000fed80000-0x00000000fed8ffff] reserved
[Sun Dec 29 12:21:53 2024] reserve setup_data: [mem 0x00000000fedc0000-0x00000000fedc0fff] reserved
[Sun Dec 29 12:21:53 2024] reserve setup_data: [mem 0x00000000fedc2000-0x00000000fedc5fff] reserved
[Sun Dec 29 12:21:53 2024] reserve setup_data: [mem 0x00000000fedc7000-0x00000000fedc7fff] reserved
[Sun Dec 29 12:21:53 2024] reserve setup_data: [mem 0x00000000fedc9000-0x00000000fedcafff] reserved
[Sun Dec 29 12:21:53 2024] reserve setup_data: [mem 0x000000081f380000-0x000000081fffffff] reserved
[Sun Dec 29 12:21:53 2024] efi: EFI v2.6 by American Megatrends
[Sun Dec 29 12:21:53 2024] efi: TPMFinalLog=0xda0cd000 ACPI 2.0=0xda095000 ACPI=0xda095000 SMBIOS=0xda4e9000 SMBIOS 3.0=0xda4e8000 ESRT=0xd57a1798 MEMATTR=0xd4d6e018 MOKvar=0xda50c000 TPMEventLog=0xda089018
[Sun Dec 29 12:21:53 2024] efi: Remove mem00: MMIO range=[0xff000000-0xffffffff] (16MB) from e820 map
[Sun Dec 29 12:21:53 2024] e820: remove [mem 0xff000000-0xffffffff] reserved
[Sun Dec 29 12:21:53 2024] efi: Remove mem01: MMIO range=[0xfee00000-0xfeefffff] (1MB) from e820 map
[Sun Dec 29 12:21:53 2024] e820: remove [mem 0xfee00000-0xfeefffff] reserved
[Sun Dec 29 12:21:53 2024] efi: Not removing mem02: MMIO range=[0xfedc9000-0xfedcafff] (8KB) from e820 map
[Sun Dec 29 12:21:53 2024] efi: Not removing mem03: MMIO range=[0xfedc7000-0xfedc7fff] (4KB) from e820 map
[Sun Dec 29 12:21:53 2024] efi: Not removing mem04: MMIO range=[0xfedc2000-0xfedc5fff] (16KB) from e820 map
[Sun Dec 29 12:21:53 2024] efi: Not removing mem05: MMIO range=[0xfedc0000-0xfedc0fff] (4KB) from e820 map
[Sun Dec 29 12:21:53 2024] efi: Not removing mem06: MMIO range=[0xfed80000-0xfed8ffff] (64KB) from e820 map
[Sun Dec 29 12:21:53 2024] efi: Not removing mem07: MMIO range=[0xfed40000-0xfed44fff] (20KB) from e820 map
[Sun Dec 29 12:21:53 2024] efi: Not removing mem08: MMIO range=[0xfed00000-0xfed00fff] (4KB) from e820 map
[Sun Dec 29 12:21:53 2024] efi: Not removing mem09: MMIO range=[0xfec10000-0xfec10fff] (4KB) from e820 map
[Sun Dec 29 12:21:53 2024] efi: Not removing mem10: MMIO range=[0xfec00000-0xfec00fff] (4KB) from e820 map
[Sun Dec 29 12:21:53 2024] efi: Remove mem11: MMIO range=[0xfea00000-0xfeafffff] (1MB) from e820 map
[Sun Dec 29 12:21:53 2024] e820: remove [mem 0xfea00000-0xfeafffff] reserved
[Sun Dec 29 12:21:53 2024] efi: Not removing mem12: MMIO range=[0xefff0000-0xefff0fff] (4KB) from e820 map
[Sun Dec 29 12:21:53 2024] efi: Remove mem13: MMIO range=[0xeff00000-0xeff7ffff] (0MB) from e820 map
[Sun Dec 29 12:21:53 2024] e820: remove [mem 0xeff00000-0xeff7ffff] reserved
[Sun Dec 29 12:21:53 2024] secureboot: Secure boot disabled
[Sun Dec 29 12:21:53 2024] SMBIOS 3.2.1 present.
[Sun Dec 29 12:21:53 2024] DMI: GIGABYTE G431-MM0-OT/MJ11-EC1-OT, BIOS F09 09/14/2021
[Sun Dec 29 12:21:53 2024] tsc: Fast TSC calibration using PIT
[Sun Dec 29 12:21:53 2024] tsc: Detected 2699.882 MHz processor
[Sun Dec 29 12:21:53 2024] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[Sun Dec 29 12:21:53 2024] e820: remove [mem 0x000a0000-0x000fffff] usable
[Sun Dec 29 12:21:53 2024] last_pfn = 0xcf000 max_arch_pfn = 0x400000000
[Sun Dec 29 12:21:53 2024] total RAM covered: 3520M
[Sun Dec 29 12:21:53 2024] Found optimal setting for mtrr clean up
[Sun Dec 29 12:21:53 2024] gran_size: 64K chunk_size: 1G num_reg: 3 lose cover RAM: 0G
This was with SMT disabled, but it also happens with SMT enabled.
If I am not mistaken, this option should do it:
mce=dont_log_ce
I add it temporarily by editing the boot entry in GRUB,
because a corrected ECC error should not crash the system, or should it?
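To double-check which mce= option actually reached a given kernel, a minimal Python sketch like the one below could list the mce= values found on a command line. The function name mce_options and the shortened sample string are illustrative assumptions only; the sample is a trimmed copy of the command line shown in the log above.

```python
def mce_options(cmdline: str) -> list[str]:
    """Return the value of every mce= parameter on a kernel command line."""
    return [
        token.split("=", 1)[1]
        for token in cmdline.split()
        if token.startswith("mce=")
    ]


# Shortened copy of the "Command line:" entry from the log above (assumption:
# the trimmed parameters do not matter for this check).
sample = (
    "BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.6.64-200.fc41.x86_64 ro "
    "rootflags=subvol=root irqpoll nr_cpus=1 mce=off numa=off panic=10"
)

print(mce_options(sample))  # prints ['off']
```

On a running system the same function could be fed the contents of /proc/cmdline instead of the sample string, to see whether mce=dont_log_ce is really present after editing the entry in GRUB.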
As you can see, I am running the LTS kernel 6.6. Luckily I had it, because the system crashed once during a kernel 6.12 update, and since then I get a kernel panic with kernel 6.12. Maybe I should fix that as well, but I do not know how, and later 6.12 updates did not fix it.
I can post more information after making the post, because it seems I am limited as a new user.
Kind regards Kurogane1412