Today I was building OpenBLAS on my Fedora 31 machine, which has an i9-9900K CPU, and received multiple system failure notifications. They all said “The kernel log indicates that hardware errors were detected. This is most likely not a software problem”. My kernel version is 5.5.7-200.fc31.x86_64.
I searched for this issue for a while and found some people saying it may be caused by the CPU overheating. So I installed lm-sensors to monitor my CPU core temperatures and found that whenever the cores reached about 90 degrees Celsius during the OpenBLAS build, I got the system failure notifications again. So I suspect that my i9-9900K overheats on Fedora 31 in some cases. I also tried building OpenBLAS on a Fedora 31 machine with an i7 CPU, and the problem did not occur there.
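For anyone who wants to log the same readings during a build, here is a minimal Python sketch that pulls the per-core temperatures out of the `sensors` output. It assumes the usual coretemp layout with lines like `Core 0:  +90.0°C  (high = ..., crit = ...)`; the exact format can differ between lm-sensors versions and chips, and the function names are just ones I made up for illustration.

```python
import re
import subprocess

# Assumed line format (coretemp driver), e.g.:
#   Core 0:        +90.0°C  (high = +86.0°C, crit = +100.0°C)
CORE_LINE = re.compile(r"^(Core \d+):\s*\+?(-?\d+(?:\.\d+)?)\s*°?C")


def parse_core_temps(sensors_output):
    """Return {"Core N": temperature_in_celsius} for every per-core line."""
    temps = {}
    for line in sensors_output.splitlines():
        match = CORE_LINE.match(line.strip())
        if match:
            temps[match.group(1)] = float(match.group(2))
    return temps


def hottest_core(temps):
    """Return the (label, temperature) pair with the highest reading."""
    return max(temps.items(), key=lambda item: item[1])


def read_sensors():
    """Run the lm-sensors `sensors` command and return its text output."""
    result = subprocess.run(
        ["sensors"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

With lm-sensors installed, `hottest_core(parse_core_temps(read_sensors()))` gives the hottest core at that moment; calling it in a loop while the build runs shows how close to 90 degrees things get.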
I noticed that Fedora 32 will ship Intel’s thermal daemon (thermald) by default to improve CPU thermal management. How can I install thermald on Fedora 31, and will it lower the temperature of my CPU cores?
To add to the above questions: is the system overclocked, and did you apply the thermal paste properly?
A well-manufactured, well-cooled CPU shouldn’t be hitting thermal throttling temps even at sustained 100% load, no matter what OS and what program. I wouldn’t expect Intel CPUs to start erroring out at just 90°C, but that’s for sure a lifespan-shortening temperature. You’ll want to change something in your setup if it’s ever getting that hot.
Regarding thermald, I don’t know what it does. The only things I can think of that can be done on the OS side are starting throttling at lower temps, and ramping up fan speeds faster (on modern mobos for high-end CPUs the BIOS usually handles fan speed curves).
Hmm. Scythe is a good manufacturer. I don’t have an i9 personally. I’m getting up to high 60s/low 70s at sustained 100% load with air cooling on an 8c/16t Ryzen with the same TDP, but it could be that the 9900K truly does run so hot that it needs liquid cooling. That sort of talk is the reason I passed up the i9, tbf.
You could try the x264 or Prime95 benchmark and see how your numbers and temps compare to other people’s on the internet. If you have any doubts about the thermal paste, better to repaste the CPU and reseat the heatsink; that would be the cheapest way out. You could also underclock if this stability problem is important enough for you.
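If you want to try underclocking from the OS instead of the BIOS, one option is to cap the maximum frequency through the kernel’s cpufreq sysfs files (`cpuinfo_max_freq`, `cpuinfo_min_freq` and `scaling_max_freq`, all in kHz). Below is a rough Python sketch of that, not a tested recipe for your machine: the function names and the 80% figure are just my assumptions, writing the files needs root, and the cap does not survive a reboot.

```python
import glob

# Per-CPU cpufreq directories exposed by the Linux kernel.
CPUFREQ_GLOB = "/sys/devices/system/cpu/cpu[0-9]*/cpufreq"


def capped_khz(max_khz, min_khz, percent):
    """Return `percent` of the hardware maximum, never below the minimum."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range 1..100")
    return max(min_khz, max_khz * percent // 100)


def read_khz(path):
    """Read one integer kHz value from a cpufreq sysfs file."""
    with open(path) as handle:
        return int(handle.read().strip())


def apply_cap(percent, dry_run=True):
    """Compute the cap for every CPU; write it only when dry_run is False."""
    changes = []
    for cpu_dir in sorted(glob.glob(CPUFREQ_GLOB)):
        max_khz = read_khz(f"{cpu_dir}/cpuinfo_max_freq")
        min_khz = read_khz(f"{cpu_dir}/cpuinfo_min_freq")
        target = capped_khz(max_khz, min_khz, percent)
        changes.append((cpu_dir, target))
        if not dry_run:
            # Requires root privileges.
            with open(f"{cpu_dir}/scaling_max_freq", "w") as handle:
                handle.write(str(target))
    return changes
```

Calling `apply_cap(80)` only reports what it would set; `apply_cap(80, dry_run=False)` as root actually writes the cap. Rerun the build afterwards and check whether the temps and the hardware error notifications change.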