I just upgraded my main PC, and nearly everything is crashing. Mainly Steam, Steam games, Firefox, Discord (Vesktop client), and Xwayland are not happy. Since all of these were working before the upgrade, I figure it's something to do with the hardware. Does anyone know what the heck I should be doing?
There doesn’t seem to be any rhyme or reason to what causes a crash, and nothing is showing up in dmesg that would give me pause.
Specs:
CPU - Ryzen 7 9700x
GPU - Intel Arc A770 16GB
Mobo - Asrock B650 Pro RS
RAM - G.Skill Flare X5 64GB
Boot drive - WD Blue SATA M.2 1TB
Hardware upgrades carry two risks: a) defective hardware, and b) incompatible BIOS and Linux configuration settings. Some vendors provide standalone bootable hardware test systems.
The Fedora Live USB installer provides a memory test.
Have you tried booting a recent USB installer? A newer image has a better chance of supporting new hardware.
You can try using the grub2 editor to add <space>3 to the end of the kernel command line to boot to a text console. Then use journalctl --no-hostname -b -1 -p <N> to look for error messages from the previous boot. I usually start with N=3. If I don’t find the error I increase N to include lower “priority” messages. There are many other journalctl options to filter out the key records from the mass of data it collects – see man journalctl. You can also run inxi -Fzxx and post the output as pre-formatted text in case someone recognizes a hardware-specific issue.
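A rough sketch of that priority walk as a shell function, in case it saves some typing. The function name scan_previous_boot and the DRY_RUN switch are made up for illustration; the journalctl options themselves (--no-hostname, --no-pager, -b -1, -p) are real. It assumes the journal still holds the previous boot.

```shell
#!/bin/sh
# Walk the previous boot's journal from priority 3 (err) towards less
# severe levels. "-p N" shows messages of priority N and anything more severe.
# scan_previous_boot and DRY_RUN are illustrative names, not part of systemd.
scan_previous_boot() {
    max="${1:-5}"   # last priority to show (5 = notice)
    n=3
    while [ "$n" -le "$max" ]; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            # Only print the command that would be run.
            echo "journalctl --no-hostname --no-pager -b -1 -p $n"
        else
            echo "== priority <= $n =="
            journalctl --no-hostname --no-pager -b -1 -p "$n"
        fi
        n=$((n + 1))
    done
}
```

Running scan_previous_boot 4 would print the priority 3 view followed by the priority 4 view; setting DRY_RUN=1 first only lists the commands.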
I ended up just reinstalling with a fresh F41 install. Same problems. The RAM test is a great idea, so I’ll do that.
I also had no idea that journalctl could filter error messages like that, neat!
I’ll try this when I get a chance.
edit:
There is some good known working RAM in my mother’s PC, so I will try it in mine when I get the chance. It will be faster than doing a memory scan, but I’ll probably still do one just to be thorough.
I’d recommend mem-testing from Windows (more tools available); HCI’s memtest showed errors overnight for me when memtest86 didn’t, and I had Vulkan crashes with AMD GPUs whereas DX11 stuff was fine with the bad memory config.
In my case I had to lower the memory speed (my 2700X didn’t like 4 sticks at their rated 3666, but was fine at 3200 with the voltage upped to 1.5V).
At the very least, I installed F41 Workstation last night, and fully-updated I haven’t seen anything crash yet with Intel UHD 630.
Not saying it is your problem, but I would use the autoconfigured RAM speeds as stated in How to Configure and Overclock your RAM in the BIO... - AMD Community .
As I understand it there is a relation between RAM speed and CPU cycles.
That guide also recommends putting the RAM in the two furthest slots.
That looks like a hardware problem; there are too many unrelated programs crashing for it to be likely a software issue.
You might want to hold back the core dumps a bit, since a lot of those core files are huge. I make a “drop-in” file at /etc/systemd/coredump.conf.d/override.conf
and put this in that override.conf file:
[Coredump]
#Storage=external
#Compress=yes
# On 32-bit, the default is 1G instead of 32G.
#ProcessSizeMax=32G
#ExternalSizeMax=32G
#JournalSizeMax=767M
MaxUse=100M
#KeepFree=
#EnterNamespace=no
That MaxUse=100M keeps the total disk usage of core files to 100M or less. See “man coredump.conf” and “man coredump.conf.d” for more information, such as what MaxUse and KeepFree do. Pick whatever “MaxUse” size you want; old core files are removed once “MaxUse” is exceeded.
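If it helps, here is a minimal sketch of creating that drop-in from a shell. The function name write_coredump_override is hypothetical, and the directory argument is only there so it can be tried on a scratch directory before touching /etc. It writes just the [Coredump] header and the MaxUse line, leaving out the commented defaults.

```shell
#!/bin/sh
# Write a minimal coredump drop-in that only sets MaxUse.
# write_coredump_override is an illustrative name, not a systemd tool.
write_coredump_override() {
    dir="${1:-/etc/systemd/coredump.conf.d}"   # drop-in directory
    max_use="${2:-100M}"                        # cap on total core file usage
    mkdir -p "$dir" || return 1
    printf '[Coredump]\nMaxUse=%s\n' "$max_use" > "$dir/override.conf"
}
```

Called with no arguments from a root shell, it would write the path mentioned above with MaxUse=100M. coredumpctl list shows which dumps are currently stored.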
Thanks, this helps a lot. I’m going to run a SMART self test on my boot SSD, since I have a sneaking suspicion that that might be part of it.
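For anyone following along, a SMART self test with smartctl (from the smartmontools package) might look roughly like the sketch below. The device path /dev/sda is an assumption (check lsblk for the real one), and the run helper, the function names, and DRY_RUN are made up for illustration. The smartctl options used (-t short, -H, -l selftest) are real, and the commands need root.

```shell
#!/bin/sh
# run: print the command when DRY_RUN=1, otherwise execute it.
# (Illustrative helper, not part of smartmontools.)
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Start a short self test on the drive (default /dev/sda is an assumption).
smart_start_test() {
    run smartctl -t short "${1:-/dev/sda}"
}

# After the test has finished: overall health verdict, then the self-test log.
smart_show_results() {
    dev="${1:-/dev/sda}"
    run smartctl -H "$dev"
    run smartctl -l selftest "$dev"
}
```

smartctl prints an estimated completion time when the test starts, so smart_show_results is best run after that time has passed.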
I appreciate all the help so far
edit: CPU appears to be fine, and not causing stability problems. Ran a prime number benchmark on all cores and it didn’t bat an eye. I’m still thinking storage, since I also ran a RAM test with the memtester package and it tested good.
I got burned so many times by Btrfs in the past that I decided to revisit it in the next 10 years or so, once I can trust my data to it.
Since moving on from Universal Blue to NixOS a year ago I opted to use the trusty ext4 filesystem and I don’t regret my decision a bit.
There’s a nasty bug with kernel 6.13 and the Intel drivers (a regression fixed in 6.14) that completely locked my laptop a couple of times and I had to force hard shutdowns. Upon booting back, ext4 was able to self-heal the two times, using the journaling data.
I don’t doubt Btrfs is an advanced filesystem with tons of fancy features, but in my personal experience it always has been unreliable. I believe it makes sense to use it when implementing a snapshots/rollback system like openSUSE has, but otherwise I learned to stay as far as possible from it.
DYOR, of course, but I’d recommend you consider using anything other than Btrfs next time you install your system if you will not use its advanced features!