How to not crash Fedora Server 41 with corrected ECC errors

Sadly, I cannot afford that at the moment.

I thought I made that clear at the beginning.
But Linux comes to the rescue, it seems, so thanks to the Linux community for doing awesome things.

I tried putting the "bad stick" in A1 with memtest=7. It took a minute longer to boot, but after a short test there were no correctable errors in the BMC.

Then I put the "good stick" in A0 together with the "bad stick" in A1, again with memtest=7 set at GRUB for testing, and I am not getting any errors in a short test.

With some luck this will hopefully keep working for longer.

The original idea was to run TrueNAS on this board, but I do not like the Ubuntu base, so I could stick with Fedora Server 41 and build a NAS for non-critical data out of it. Later, when I can afford it, I will buy something better for the critical data that I do not want to lose.

:+1:

If the ECC errors continue to occur, I suggest removing the 'bad' DIMM and running with just one 16GB DIMM. For most systems, especially a NAS server at home, that would be more than adequate RAM until the system becomes heavily loaded.

I run Workstation on my daily driver, and this is what I see under normal operation with the GNOME desktop and GUI:

# free
               total        used        free      shared  buff/cache   available
Mem:        32775452     9206844     7056216      531328    17513776    23568608
Swap:        8388604           0     8388604

Only about 9G of RAM is in use in total, and no swap at all. Last booted 6 days ago.

With one DIMM you could monitor usage and know that it would be adequate until the system is really loading the RAM. Even with 2 DIMMs, the ECC errors would in most cases not show up until that portion of RAM on the 'bad' DIMM was being accessed.
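To make the "monitor usage" idea concrete, here is a minimal sketch that parses the text printed by `free` (default KiB units, English column headers assumed) and summarizes it. The function names `parse_free` and `memory_summary` are made up for this illustration.

```python
def parse_free(text):
    """Parse `free` output (KiB units) into {row: {column: value}}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    columns = lines[0].split()  # total, used, free, shared, buff/cache, available
    table = {}
    for ln in lines[1:]:
        label, *values = ln.split()
        # The Swap row has fewer columns; zip() simply stops at the shorter list.
        table[label.rstrip(":")] = dict(zip(columns, map(int, values)))
    return table


def memory_summary(text):
    """Return used GiB, available GiB and swap used (KiB) from `free` output."""
    table = parse_free(text)
    kib_per_gib = 1024 * 1024
    return {
        "used_gib": table["Mem"]["used"] / kib_per_gib,
        "available_gib": table["Mem"]["available"] / kib_per_gib,
        "swap_used_kib": table["Swap"]["used"],
    }
```

Feeding it the table from `free` on a schedule and watching `available_gib` and `swap_used_kib` should show whether a single 16GB DIMM is still enough: as long as available memory stays comfortably above zero and swap stays unused, the system is not short of RAM.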

Yes, that would be wiser, and I will probably do that.

Yesterday, after setting memtest=7 and running a 6-hour stress test, I got just one correctable ECC error, so much better. Today, with memtest=17, which took 2.5 min longer to boot, I also got just one correctable ECC error, at idle.
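Besides the BMC, the running kernel can also report corrected errors if an EDAC driver is loaded for the memory controller (an assumption; not every board has one). A minimal sketch that reads the per-controller counters from sysfs follows. The function name `read_edac_counts` is hypothetical, and the `root` parameter exists only so the function can be pointed at another directory.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce": n, "ue": n}} from the EDAC sysfs counters."""
    counts = {}
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        try:
            counts[mc.name] = {
                "ce": int((mc / "ce_count").read_text()),  # corrected errors
                "ue": int((mc / "ue_count").read_text()),  # uncorrected errors
            }
        except (OSError, ValueError):
            continue  # directory without readable counters, skip it
    return counts
```

An empty result means no EDAC memory controller was found, so the BMC log stays the only source. Comparing the `ce` value before and after a stress test gives the number of corrected errors for that run.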

For now I do not have a case or drives, so I will just be testing. Maybe I can map out the bad regions of the bad stick with memmap, if I can find the addresses at all. Memtest86+ started from GRUB cannot find them with the sticks in A1 and A0, but the memtest=N kernel parameter apparently can. I have no idea whether the addresses are stored anywhere so that I can reuse them for memmap.
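As far as I know, the early memtest does not save its results anywhere except the kernel log (`dmesg` or `journalctl -k`), where each failing range is reported on its own line. The sketch below assumes the line format `bad mem addr 0x<start> - 0x<end> reserved`, taken from my reading of `mm/memtest.c`, so check it against the real log before relying on it. It turns such lines into `memmap=<size>$<start>` arguments; the function names and the 4 KiB page size are assumptions for illustration.

```python
import re

# Assumed log format (from mm/memtest.c, verify against your own dmesg):
#   "<pattern> bad mem addr 0x<start> - 0x<end> reserved"
BAD_RANGE = re.compile(
    r"bad mem addr (0x[0-9a-fA-F]+) - (0x[0-9a-fA-F]+) reserved"
)


def bad_ranges(log_text, page=4096):
    """Return page-aligned, merged (start, end) ranges found in the log text."""
    aligned = []
    for m in BAD_RANGE.finditer(log_text):
        start, end = int(m.group(1), 16), int(m.group(2), 16)
        lo = start - start % page      # round start down to a page boundary
        hi = -(-end // page) * page    # round end up to a page boundary
        aligned.append((lo, hi))
    merged = []
    for lo, hi in sorted(aligned):
        if merged and lo <= merged[-1][1]:
            # Overlapping or adjacent range: extend the previous one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def memmap_args(ranges):
    """Format ranges as memmap=<size>$<start> kernel arguments (hex)."""
    return [f"memmap={hi - lo:#x}${lo:#x}" for lo, hi in ranges]
```

The resulting arguments would go on the kernel command line in place of memtest=N, which avoids the longer boot. Note that the `$` typically needs escaping in GRUB configuration files. Also, corrected ECC errors may never show up as memtest failures, because the pattern read back has already been corrected, so the log may contain no such lines at all.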