Since just before the upgrade to F32, my system has been behaving oddly. It doesn't simply hang or crash. Instead, after it has been running for several days, and usually while I'm away (which is less than 50% of the time these days), parts of the system start to disappear.
The first thing I usually notice is that BOINC isn't running, although occasionally something else goes first. The next thing to fail is the lock screen. Sometimes the monitor will not wake; when it does, I can interact with the lock screen, but unlocking fails. Then I try to log in via SSH. Sometimes I can't connect at all (no reply); other times the connection succeeds, or it authenticates but hangs while starting the shell. Many subsystems still seem to be working, though: no disk errors are reported after a reset, and sometimes I can initiate a shutdown by pressing the power button, although the shutdown has never completed. Using SysRq sometimes shows a message on the console saying that SysRq is disabled.
So it's only some things that fail, not the whole system. What is really weird is that no errors are recorded. That would make sense for a sudden crash or reset, but this isn't one. Other messages, such as those from cron, keep being logged for a while, but the journal itself seems to be one of the first casualties, too.
I've tried replacing all of the memory and the GPU. I've also been upgrading kernels, trying testing kernels, and even using a COPR with newer kernels from upstream. The problem persists. It looks a little like a hardware fault, but not entirely. For one thing, my hardware is fairly new. Also, the hardware is under no more stress when the problem occurs than during the several days it runs fine beforehand. That's unusual for a hardware problem, although I have seen hardware problems like that before.
I ran memtester for several loops, but I haven't tried memtest86. Someone suggested the memory controller. Since all of my memory modules are new, the problem can't be a particular bad spot in memory. Was memtester a good enough test, or should I still try memtest86?
Here is my boot log: https://paste.centos.org/view/afee842c