Fedora 37 random restart

I was running some CPU and RAM intensive tasks overnight and woke up to find that the system had restarted for some unknown reason in the middle of these tasks. I do this sort of thing all the time and have never experienced this. I looked at system logs of the previous boot by running sudo journalctl --boot=-1 and I don’t see anything notable to indicate what could have caused it. The output is too large to post here but the shortened version of the last two hours or so before the restart is like this:

May 03 04:12:27 joelsdesktop systemd[2191]: Started tracker-extract-3.service - Tracker metadata extractor.
May 03 04:12:39 joelsdesktop systemd[2191]: Starting tracker-extract-3.service - Tracker metadata extractor...
May 03 04:12:39 joelsdesktop systemd[2191]: Started tracker-extract-3.service - Tracker metadata extractor.
May 03 04:12:50 joelsdesktop systemd[1]: Starting dnf-makecache.service - dnf makecache...
May 03 04:12:51 joelsdesktop dnf[4087737]: Copr repo for PyCharm owned by phracek          5.8 kB/s | 2.1 kB     00:00
May 03 04:12:51 joelsdesktop dnf[4087737]: Fedora 37 - x86_64                               46 kB/s |  25 kB     00:00
May 03 04:12:52 joelsdesktop dnf[4087737]: Fedora 37 openh264 (From Cisco) - x86_64        2.2 kB/s | 989  B     00:00
May 03 04:12:52 joelsdesktop dnf[4087737]: Fedora Modular 37 - x86_64                       58 kB/s |  25 kB     00:00
May 03 04:12:53 joelsdesktop dnf[4087737]: Fedora 37 - x86_64 - Updates                     27 kB/s |  23 kB     00:00
May 03 04:12:55 joelsdesktop dnf[4087737]: Fedora 37 - x86_64 - Updates                    315 kB/s | 515 kB     00:01
May 03 04:12:56 joelsdesktop dnf[4087737]: Fedora Modular 37 - x86_64 - Updates             55 kB/s |  24 kB     00:00
May 03 04:12:56 joelsdesktop dnf[4087737]: google-chrome                                   3.8 kB/s | 1.3 kB     00:00
May 03 04:12:57 joelsdesktop dnf[4087737]: RPM Fusion for Fedora 37 - Free                 3.5 kB/s | 3.4 kB     00:00
May 03 04:12:57 joelsdesktop dnf[4087737]: RPM Fusion for Fedora 37 - Free - Updates       7.2 kB/s | 3.2 kB     00:00
May 03 04:12:58 joelsdesktop systemd[2191]: Starting tracker-extract-3.service - Tracker metadata extractor...
May 03 04:12:58 joelsdesktop systemd[2191]: Started tracker-extract-3.service - Tracker metadata extractor.
May 03 04:12:58 joelsdesktop dnf[4087737]: RPM Fusion for Fedora 37 - Nonfree               15 kB/s | 6.6 kB     00:00
May 03 04:12:59 joelsdesktop dnf[4087737]: RPM Fusion for Fedora 37 - Nonfree - NVIDIA Dri  10 kB/s | 6.3 kB     00:00
May 03 04:12:59 joelsdesktop dnf[4087737]: RPM Fusion for Fedora 37 - Nonfree - Steam       11 kB/s | 6.1 kB     00:00
May 03 04:13:00 joelsdesktop dnf[4087737]: RPM Fusion for Fedora 37 - Nonfree - Updates     11 kB/s | 6.1 kB     00:00
May 03 04:13:00 joelsdesktop dnf[4087737]: Visual Studio Code                              2.8 kB/s | 1.5 kB     00:00
May 03 04:13:01 joelsdesktop dnf[4087737]: Metadata cache created.
May 03 04:13:02 joelsdesktop systemd[1]: dnf-makecache.service: Deactivated successfully.
May 03 04:13:02 joelsdesktop systemd[1]: Finished dnf-makecache.service - dnf makecache.
May 03 04:13:02 joelsdesktop audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dnf-makecache comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 03 04:13:02 joelsdesktop audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dnf-makecache comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 03 04:13:02 joelsdesktop systemd[1]: dnf-makecache.service: Consumed 3.160s CPU time.
May 03 04:13:12 joelsdesktop systemd[2191]: Starting tracker-extract-3.service - Tracker metadata extractor...
May 03 04:13:12 joelsdesktop systemd[2191]: Started tracker-extract-3.service - Tracker metadata extractor.
May 03 04:13:28 joelsdesktop systemd[2191]: Starting tracker-extract-3.service - Tracker metadata extractor...

...

May 03 04:41:24 joelsdesktop systemd[2191]: Started tracker-extract-3.service - Tracker metadata extractor.
May 03 04:41:36 joelsdesktop systemd[2191]: Starting tracker-extract-3.service - Tracker metadata extractor...
May 03 04:41:36 joelsdesktop systemd[2191]: Started tracker-extract-3.service - Tracker metadata extractor.
May 03 04:41:42 joelsdesktop cupsd[1591]: REQUEST localhost - - "POST / HTTP/1.1" 200 183 Renew-Subscription successful-ok
May 03 04:41:48 joelsdesktop systemd[2191]: Starting tracker-extract-3.service - Tracker metadata extractor...
May 03 04:41:48 joelsdesktop systemd[2191]: Started tracker-extract-3.service - Tracker metadata extractor.
May 03 04:41:59 joelsdesktop systemd[2191]: Starting tracker-extract-3.service - Tracker metadata extractor...

...

May 03 05:16:25 joelsdesktop systemd[2191]: Started tracker-extract-3.service - Tracker metadata extractor.
May 03 05:16:35 joelsdesktop systemd[2191]: Starting tracker-extract-3.service - Tracker metadata extractor...
May 03 05:16:36 joelsdesktop systemd[2191]: Started tracker-extract-3.service - Tracker metadata extractor.
May 03 05:16:46 joelsdesktop systemd[1]: Starting dnf-makecache.service - dnf makecache...
May 03 05:16:47 joelsdesktop dnf[93493]: Metadata cache refreshed recently.
May 03 05:16:47 joelsdesktop systemd[1]: dnf-makecache.service: Deactivated successfully.
May 03 05:16:47 joelsdesktop systemd[1]: Finished dnf-makecache.service - dnf makecache.
May 03 05:16:47 joelsdesktop audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dnf-makecache comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 03 05:16:47 joelsdesktop audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dnf-makecache comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 03 05:16:49 joelsdesktop systemd[2191]: Starting tracker-extract-3.service - Tracker metadata extractor...
May 03 05:16:49 joelsdesktop systemd[2191]: Started tracker-extract-3.service - Tracker metadata extractor.
May 03 05:17:01 joelsdesktop systemd[2191]: Starting tracker-extract-3.service - Tracker metadata extractor...
May 03 05:17:01 joelsdesktop systemd[2191]: Started tracker-extract-3.service - Tracker metadata extractor.
May 03 05:17:13 joelsdesktop systemd[2191]: Starting tracker-extract-3.service - Tracker metadata extractor...

...

May 03 05:39:59 joelsdesktop systemd[2191]: Starting tracker-extract-3.service - Tracker metadata extractor...
May 03 05:39:59 joelsdesktop systemd[2191]: Started tracker-extract-3.service - Tracker metadata extractor.
May 03 05:40:02 joelsdesktop cupsd[1591]: REQUEST localhost - - "POST / HTTP/1.1" 200 183 Renew-Subscription successful-ok
May 03 05:40:11 joelsdesktop systemd[2191]: Starting tracker-extract-3.service - Tracker metadata extractor...
May 03 05:40:11 joelsdesktop systemd[2191]: Started tracker-extract-3.service - Tracker metadata extractor.
May 03 05:40:27 joelsdesktop systemd[2191]: Starting tracker-extract-3.service - Tracker metadata extractor...

...

May 03 06:03:54 joelsdesktop systemd[2191]: Starting tracker-extract-3.service - Tracker metadata extractor...
May 03 06:03:54 joelsdesktop systemd[2191]: Started tracker-extract-3.service - Tracker metadata extractor.
May 03 06:04:07 joelsdesktop systemd[2191]: Starting tracker-extract-3.service - Tracker metadata extractor...
May 03 06:04:07 joelsdesktop systemd[2191]: Started tracker-extract-3.service - Tracker metadata extractor.

where I have omitted a lot of the redundant “starting tracker” lines to keep within the character limit. The final line shown is the last entry logged before the restart.
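When most of the previous boot's journal is repetitive service chatter like this, it can help to filter it before reading. A minimal sketch, assuming the noise is limited to the tracker-extract-3.service lines seen above; the function name filter_noise is made up for this example:

```shell
# filter_noise: hypothetical helper that reads journal text on stdin and
# drops the repetitive tracker-extract-3.service lines.
# "|| true" keeps the exit status at 0 even when every line is filtered out.
filter_noise() {
    grep -v 'tracker-extract-3\.service' || true
}

# Typical use (needs the previous boot's journal to be available):
#   sudo journalctl --boot=-1 --no-pager | filter_noise | tail -n 50
```

journalctl can also filter by severity on its own: `sudo journalctl --boot=-1 --priority=warning` shows only entries of priority warning or more severe.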

Here is hardware info:

System:
  Kernel: 6.2.8-200.fc37.x86_64 arch: x86_64 bits: 64 compiler: gcc
    v: 2.38-25.fc37 Desktop: GNOME v: 43.4 Distro: Fedora release 37 (Thirty
    Seven)
Machine:
  Type: Desktop System: Gigabyte product: X670E AORUS MASTER v: -CF
    serial: <superuser required>
  Mobo: Gigabyte model: X670E AORUS MASTER v: x.x
    serial: <superuser required> UEFI: American Megatrends LLC. v: F6
    date: 08/24/2022
Battery:
  Device-1: hidpp_battery_0 model: Logitech Wireless Mouse MX Master 3
    charge: 100% (should be ignored) status: discharging
CPU:
  Info: 16-core model: AMD Ryzen 9 7950X bits: 64 type: MT MCP arch: Zen 4
    rev: 2 cache: L1: 1024 KiB L2: 16 MiB L3: 64 MiB
  Speed (MHz): avg: 2989 high: 3000 min/max: 3000/5880 boost: enabled cores:
    1: 3000 2: 3000 3: 3000 4: 3000 5: 3000 6: 3000 7: 3000 8: 3000 9: 3000
    10: 3000 11: 3000 12: 3000 13: 3000 14: 3000 15: 3000 16: 3000 17: 3000
    18: 3000 19: 2651 20: 3000 21: 3000 22: 3000 23: 3000 24: 3000 25: 3000
    26: 3000 27: 3000 28: 3000 29: 3000 30: 3000 31: 3000 32: 3000
    bogomips: 287998
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
Graphics:
  Device-1: NVIDIA GA104 [GeForce RTX 3060 Ti] driver: nvidia v: 530.41.03
    arch: Ampere bus-ID: 01:00.0
  Device-2: AMD Raphael vendor: Gigabyte driver: amdgpu v: kernel
    arch: RDNA-2 bus-ID: 37:00.0 temp: 32.0 C
  Display: wayland server: X.Org v: 22.1.9 with: Xwayland v: 22.1.9
    compositor: gnome-shell driver: X: loaded: modesetting,nouveau,nvidia
    unloaded: fbdev,vesa gpu: nvidia,nvidia-nvswitch
    resolution: 3840x2160~60Hz
  API: OpenGL v: 4.6.0 NVIDIA 530.41.03 renderer: NVIDIA GeForce RTX 3060
    Ti/PCIe/SSE2 direct-render: Yes
Audio:
  Device-1: NVIDIA GA104 High Definition Audio driver: snd_hda_intel v: kernel
    bus-ID: 01:00.1
  Device-2: AMD Rembrandt Radeon High Definition Audio driver: snd_hda_intel
    v: kernel bus-ID: 37:00.1
  Device-3: AMD Family 17h/19h HD Audio vendor: Gigabyte
    driver: snd_hda_intel v: kernel bus-ID: 37:00.6
  Sound API: ALSA v: k6.2.8-200.fc37.x86_64 running: yes
  Sound Server-1: PulseAudio v: 16.1 running: no
  Sound Server-2: PipeWire v: 0.3.67 running: yes
Network:
  Device-1: Intel Ethernet I225-V vendor: Gigabyte driver: igc v: kernel
    port: N/A bus-ID: 0c:00.0
  IF: enp12s0 state: down mac: <filter>
  Device-2: Intel Wi-Fi 6 AX210/AX211/AX411 160MHz driver: iwlwifi v: kernel
    bus-ID: 0d:00.0
  IF: wlp13s0 state: up mac: <filter>
Bluetooth:
  Device-1: Intel AX210 Bluetooth type: USB driver: btusb v: 0.8 bus-ID: 1-9:2
  Report: rfkill ID: hci0 rfk-id: 0 state: up address: see --recommends
Drives:
  Local Storage: total: 20.01 TiB used: 863.73 GiB (4.2%)
  ID-1: /dev/nvme0n1 vendor: Samsung model: SSD 980 PRO 2TB size: 1.82 TiB
    temp: 36.9 C
  ID-2: /dev/nvme1n1 vendor: Samsung model: SSD 980 PRO 2TB size: 1.82 TiB
    temp: 37.9 C
  ID-3: /dev/sda vendor: Seagate model: ST8000DM004-2U9188 size: 7.28 TiB
  ID-4: /dev/sdb vendor: SanDisk model: ST8000DM004-2CX188 size: 7.28 TiB
  ID-5: /dev/sdc type: USB vendor: Samsung model: Portable SSD T5
    size: 1.82 TiB
Partition:
  ID-1: / size: 1.82 TiB used: 300.55 GiB (16.1%) fs: btrfs
    dev: /dev/nvme1n1p3
  ID-2: /boot size: 973.4 MiB used: 366.8 MiB (37.7%) fs: ext4
    dev: /dev/nvme1n1p2
  ID-3: /boot/efi size: 598.8 MiB used: 17.4 MiB (2.9%) fs: vfat
    dev: /dev/nvme1n1p1
  ID-4: /home size: 1.82 TiB used: 300.55 GiB (16.1%) fs: btrfs
    dev: /dev/nvme1n1p3
Swap:
  ID-1: swap-1 type: zram size: 8 GiB used: 0 KiB (0.0%) dev: /dev/zram0
Sensors:
  System Temperatures: cpu: 42.0 C mobo: N/A gpu: amdgpu temp: 32.0 C
  Fan Speeds (RPM): N/A
Info:
  Processes: 545 Uptime: 4h 30m Memory: 124.94 GiB used: 8.02 GiB (6.4%)
  Init: systemd target: graphical (5) Compilers: gcc: 12.2.1 Packages: 25
  note: see --rpm Shell: Bash v: 5.2.15 inxi: 3.3.25

Would a memory error have triggered a restart or would it have just shut down? Would that output anything notable in the logs? I monitor temperatures all the time and they are always fine. It’s a desktop so it would not have been a battery issue.

A sudden restart with no logged messages whatsoever is most likely a hardware issue.
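One rough way to confirm the "no logged messages" case is to check whether the end of the previous boot's journal shows any trace of an orderly shutdown. This is a heuristic sketch only: it assumes a clean shutdown leaves lines mentioning systemd-shutdown or "Journal stopped" near the end of the log, which may differ between systemd versions, and the function name is invented here:

```shell
# ended_cleanly: hypothetical helper. Reads journal text on stdin and succeeds
# (exit status 0) if the last 200 lines show signs of an orderly shutdown.
# ASSUMPTION: a clean shutdown logs "systemd-shutdown" or "Journal stopped".
ended_cleanly() {
    tail -n 200 | grep -q -E 'systemd-shutdown|Journal stopped'
}

# Typical use:
#   if sudo journalctl --boot=-1 --no-pager | ended_cleanly; then
#       echo "previous boot ended with an orderly shutdown"
#   else
#       echo "no shutdown messages: abrupt reset or power loss is plausible"
#   fi
```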

How would I even begin to diagnose which component could be at fault? When I first set up the system last October, I ran memtest86 for several days and zero errors popped up, so I think the memory is fine. (Memory always seems to be the most finicky component.) The only difference is that last night I left an external SSD plugged in, which I don’t usually do.

memtest86 and prime95 (which has a Linux version) are popular testing tools. I don’t think I have ever used anything else, so I can’t really recommend anything further.

Thanks. This is the first time I have experienced this, so I guess in the worst case, if it is memory-related, one memory error in 7-8 months is not terrible, just a minor annoyance. My guess is that memory errors this rare would not show up in a test such as memtest86 running for a couple of days at most. If this becomes a frequent thing then that is a bit worrisome. I have downclocked the memory (strictly speaking, the memory is only supported up to 3600 MHz, but I had it slightly overclocked at 4000 MHz) just to reduce the chances of this happening.

It may have been a power fluctuation, enough to cause the restart if the BIOS is configured to restart after a power loss. There would probably be no related log entries.

That could be, although I have a second machine plugged into the same outlet and nothing happened with it, but it could still be power-supply related.

Did you see anything related to systemd-oomd?

Although it is difficult to imagine that killing processes would lead to the machine restarting.
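To check for this, the previous boot's log can be searched for out-of-memory activity. A small sketch; the pattern list is an assumption about how such events are typically worded (systemd-oomd entries, or kernel "Out of memory" and "oom-kill" lines), and the function name is made up:

```shell
# find_oom: hypothetical helper that reads journal text on stdin and prints
# lines that look like out-of-memory kills. Prints nothing if none are found.
find_oom() {
    grep -i -E 'systemd-oomd|out of memory|oom-kill' || true
}

# Typical use:
#   sudo journalctl --boot=-1 --no-pager | find_oom
```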

No, the only really noteworthy thing that I can see is that a couple of hours before the restart, some packages were automatically updated.

The Zen4 platform with four dual-rank DIMMs installed can be tricky, and generally memory errors can surface weeks or months later on seemingly stable configurations when hit with a unique load scenario.

Updating the EFI to the latest version (really important) and slightly increasing the SoC voltage (within safe margins, <= 1.30V) could help alleviate instabilities.

I recommend testing with y-cruncher, not memtest86. memtest86 is fine when testing for defective RAM modules, but it tends to not reliably uncover system instabilities caused by overclocked RAM in a timely manner. Where memtest86 can run without errors for days, y-cruncher usually uncovers them within a few hours.

For the 7950X and 128 GiB RAM, this configuration should work fine:

Save as stresstest.cfg:

{
    Action : "StressTest"
    StressTest : {
        AllocateLocally : "true"
        LogicalCores : [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31]
        TotalMemory : 115964116992
        SecondsPerTest : 200
        SecondsTotal : 0
        StopOnError : "true"
        Tests : [
            "N32"
            "N64"
            "HNT"
            "VST"
            "C17"
        ]
    }
}

Run with:

./y-cruncher pause:1 config stresstest.cfg
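For reference, the TotalMemory value in the configuration above is a byte count: 115964116992 bytes is exactly 108 GiB, which leaves roughly 17 GiB of the 124.94 GiB reported by inxi for the rest of the system. A small sketch for computing other values; the helper name is made up:

```shell
# gib_to_bytes: hypothetical helper converting whole GiB to bytes.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 108   # the value used in stresstest.cfg above
gib_to_bytes 96    # a smaller allocation that leaves more RAM free
```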

Thanks for the suggestion. I am currently running the stress test. I would certainly hope it would not have trouble with this, seeing as currently it is not overclocked. (speed is currently at 3600 MHz and timings are set to default.)

So far it has passed 35 iterations of all the tests (10.15 hours). Will it just keep going forever until I stop it? How long would you recommend running it for? I notice there are other tests not being run. Is there a reason for not running these or should I run those as well?

After about 13 hours I had to stop the test and restart the machine. The CPU load was so intense that all I/O became unusable. (Extreme lag for ~10 hrs, then froze up completely after this.) I am hoping this is sufficient.

This seems to indicate that the stress test found a weakness. Whether due to temperature or something else you do not say, but I hope you were monitoring temps while the test was in progress.

So far it has passed 35 iterations of all the tests (10.15 hours). Will it just keep going forever until I stop it? How long would you recommend running it for?

I personally would call 35 iterations sufficient. If you do not stop the test, it keeps going until something goes wrong: either it detects a memory error and exits with a warning, or the machine crashes.

I notice there are other tests not being run. Is there a reason for not running these or should I run those as well?

The other tests are not as demanding for testing RAM and would waste more time and energy, but you can of course run them as well.

After about 13 hours I had to stop the test and restart the machine. The CPU load was so intense that all I/O became unusable. (Extreme lag for ~10 hrs, then froze up completely after this.) I am hoping this is sufficient.

A slow machine is to be expected when running the test since it loads all CPU cores and most of your RAM, but a freeze should not have happened. If there was a thermal problem, it should have occurred much earlier since thermally saturating the hardware should not take 13 hours.

On Zen4, an unstable memory configuration usually makes the machine reset rather than freeze. Alternatively, without ECC memory, some bits get flipped and the errors go undetected unless you are specifically looking for wrong data, as y-cruncher does.

You could do another test with some cores excluded and less RAM utilized to increase system responsiveness during testing. If you want to go that route, just modify the LogicalCores array in stresstest.cfg to remove some cores (maybe up to 8), and reduce TotalMemory so the system has some more free RAM to work with. Monitoring system temperatures might also be advisable.
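The trimmed LogicalCores list does not have to be typed by hand. A sketch that prints the first N logical core numbers in the space-separated form used in stresstest.cfg; the helper name is made up, and simply taking the first N numbers is an assumption, since which logical cores share a physical core depends on how the kernel enumerates them:

```shell
# core_list: hypothetical helper printing "0 1 ... N-1" on one line.
core_list() {
    i=0
    list=""
    while [ "$i" -lt "$1" ]; do
        # ${list:+ } adds a separating space only when list is non-empty
        list="$list${list:+ }$i"
        i=$(( i + 1 ))
    done
    echo "$list"
}

# 24 of the 32 logical cores, i.e. 8 removed:
echo "LogicalCores : [$(core_list 24)]"
```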

What I was able to observe in htop was that the CPU cores were not just loaded, they were extremely overloaded. It looked like it was constantly trying to run 60-70 processes simultaneously on 32 logical cores. My guess is that a sustained load like this will eventually cause extreme scheduling issues such that after a while basic functionality like bluetooth and wifi starts getting pushed out and stops working. For example, the wifi would continually disconnect and reconnect.

I can also say that the thermals are as expected for this chip. Unfortunately AMD designed them to run a bit hot, but even under full load the highest I saw was about 88 C (typically 70s).
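For unattended runs, temperatures can be logged to a file rather than watched. A sketch assuming the kernel's hwmon sysfs interface, where temp*_input files hold millidegrees Celsius; which hwmon device belongs to the CPU varies by system, so the path in the usage comment is a placeholder and the helper name is made up:

```shell
# millideg_to_c: hypothetical helper converting a hwmon reading
# (millidegrees Celsius) to whole degrees Celsius, rounding down.
millideg_to_c() {
    echo $(( $1 / 1000 ))
}

# Typical use: append a timestamped reading every 10 seconds.
# The hwmon path is a placeholder; check /sys/class/hwmon/*/name first.
#   while sleep 10; do
#       echo "$(date +%T) $(millideg_to_c "$(cat /sys/class/hwmon/hwmon0/temp1_input)") C"
#   done >> temps.log
```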

I should clarify that what I mean by freeze is that the machine goes to the Lock Screen and Bluetooth and USB devices are unresponsive. The test could very well be continuing to run behind the Lock Screen, but I can’t say. I’m locked out of the machine since Bluetooth stops working. The time on the Lock Screen was also stuck at 6:13pm for a couple of hours, so clearly the clock was not working properly either. Here is the system log output for the duration of the test (newest entries first):

 6:07:30 PM kernel: System encountered a non-fatal error in __audit_sockaddr()
 6:07:30 PM kernel: System encountered a non-fatal error in __audit_sockaddr()
 6:07:27 PM kernel: System encountered a non-fatal error in btrfs_alloc_delayed_item()
 6:07:23 PM kernel: System encountered a non-fatal error in __audit_sockaddr()
 6:07:06 PM kernel: logitech-hidpp-device 0005:046D:B023.0021: Device not connected
 5:38:48 PM systemd: Failed to start dnf-makecache.service - dnf makecache.
 5:31:36 PM kernel: logitech-hidpp-device 0005:046D:B023.0020: Device not connected
 5:18:49 PM kernel: iwlwifi 0000:0d:00.0: Not associated and the session protection is over already...
 5:16:18 PM kernel: logitech-hidpp-device 0005:046D:B023.001E: Device not connected
 4:47:35 PM gdm-session-wor: gkr-pam: the password for the login keyring was invalid.
 4:45:50 PM kernel: logitech-hidpp-device 0005:046D:B023.001C: Device not connected
 4:03:13 PM kernel: iwlwifi 0000:0d:00.0: Not associated and the session protection is over already...
 4:01:30 PM kernel: logitech-hidpp-device 0005:046D:B023.001A: Device not connected
 3:35:45 PM gdm-session-wor: gkr-pam: the password for the login keyring was invalid.
 3:35:00 PM kernel: logitech-hidpp-device 0005:046D:B023.0018: Device not connected
 3:28:41 PM kernel: iwlwifi 0000:0d:00.0: Not associated and the session protection is over already...
 3:18:00 PM kernel: logitech-hidpp-device 0005:046D:B023.0016: Device not connected
 2:47:52 PM kernel: Bluetooth: hci0: ACL packet for unknown connection handle 3586
 2:07:59 PM kernel: logitech-hidpp-device 0005:046D:B023.0012: Device not connected
 1:36:32 PM kernel: iwlwifi 0000:0d:00.0: Not associated and the session protection is over already...
 1:35:47 PM systemd: Failed to start dnf-makecache.service - dnf makecache.
 1:35:42 PM kernel: iwlwifi 0000:0d:00.0: Not associated and the session protection is over already...
12:37:12 PM kernel: Bluetooth: hci0: ACL packet for unknown connection handle 3585
12:27:59 PM kernel: iwlwifi 0000:0d:00.0: Not associated and the session protection is over already...
12:27:16 PM kernel: logitech-hidpp-device 0005:046D:B023.000D: Device not connected
12:27:03 PM kernel: iwlwifi 0000:0d:00.0: Not associated and the session protection is over already...
12:23:20 PM kernel: logitech-hidpp-device 0005:046D:B023.000C: Device not connected
11:20:56 AM kernel: iwlwifi 0000:0d:00.0: Not associated and the session protection is over already...
 9:38:17 AM systemd: Failed to start dnf-makecache.service - dnf makecache.
 9:12:09 AM kernel: iwlwifi 0000:0d:00.0: Not associated and the session protection is over already...
 8:23:59 AM bluetoothd: profiles/audio/avctp.c:avctp_connect_cb() HUP or ERR on socket: Connection timed out (110)
 8:12:51 AM kernel: iwlwifi 0000:0d:00.0: Not associated and the session protection is over already...
 8:10:51 AM gdm-session-wor: gkr-pam: the password for the login keyring was invalid.
 7:07:15 AM kernel: iwlwifi 0000:0d:00.0: Not associated and the session protection is over already...

Is there a way to set a maximum number of iterations? Or alternatively I suppose I could have the results output to a log file. But if it can’t even maintain bluetooth or wifi, I’m not sure how it would manage writing to an SSD.

It looked like it was constantly trying to run 60-70 processes simultaneously on 32 logical cores.

It does the same on my system, though I have never experienced the kind of scheduling issues you are describing. But I am not using Bluetooth peripherals, and I am running with CPUSchedulingPolicy=idle.

Something like this:

systemd-run --user --pty --nice=19 --property=CPUSchedulingPolicy=idle ./y-cruncher [...] 

[…] the highest I saw was about 88 C (typically 70s)

That seems fine for that CPU, yes.

Is there a way to set a maximum number of iterations?

Not that I know of. Just set the maximum runtime by setting SecondsTotal to a sane value and maybe reduce the LogicalCores to 16 values.
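SecondsTotal can approximate an iteration limit. With the five tests at 200 seconds each from the configuration above, one pass takes about 1000 seconds of test time, so 35 passes correspond to roughly 35000 seconds (about 9.7 hours). A sketch with a made-up helper name; it ignores any time spent between tests, which is presumably why the 35 iterations reported earlier took 10.15 hours:

```shell
# seconds_for_iterations: hypothetical helper.
# Arguments: iterations, number of tests, SecondsPerTest.
seconds_for_iterations() {
    echo $(( $1 * $2 * $3 ))
}

seconds_for_iterations 35 5 200   # candidate value for SecondsTotal
```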

If you keep having trouble with y-cruncher after that, maybe switch tools or call it stable enough after 35 successful iterations if you don’t want to bump up the RAM speeds again.

Yes, I think I am going to call it stable for now. I’m not particularly interested in overclocking the RAM. I’d much rather have stability. If I experience any more crashes, I’ll do more testing. Thanks for your help!

This actually may not even be the CPU or RAM.

I did an upgrade on my F37 system (workstation) 4 days ago during which kernel 6.2.14 was installed, and just after the reboot I began getting major crashes (some causing a reboot) and constant kernel oopses. I wound up with ~14000 oops files in /var/spool/abrt in about 10 hours.
I don’t feel it was kernel related since the issue remained even when I rebooted with both the 6.2.13 and 6.2.12 kernels.

I never did find out what the cause was, but I did a clean install of F38 and the problems stopped. Something that was upgraded in that transaction seemed to be the cause, but over 40 packages were updated at that time, so I did not try to identify it any further…

I am now in the process of reinstalling all the software I was using, and hopefully that will not bring the problems back.

I did try running y-cruncher and was never able to complete even one test, since it gave repeated errors on one or two different CPUs, and different ones each time…

Yeah, my computer just randomly rebooted as well. This happened an hour after finishing a 24 hr error-free y-cruncher test, so I don’t think it’s CPU- or RAM-related either. The thing is, this is kind of difficult to reproduce because it often doesn’t happen for days at a time.