Game installs are corrupting BTRFS on Fedora 40

Hello there,

I originally opened an issue about this on GitHub here and have found a relevant post on this forum here (apologies for the necro).

I am getting disk read errors and corrupt update files on Steam when trying to download games. I watched dmesg and, sure enough, encountered plenty of BTRFS errors, seen here:

[ 2479.397510] BTRFS warning (device dm-0): csum failed root 256 ino 188739 off 147456 csum 0x355bd35c expected csum 0x36d69a3a mirror 1
[ 2479.397537] BTRFS error (device dm-0): bdev /dev/mapper/luks-ffffffffffffff errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
[ 2479.438946] BTRFS warning (device dm-0): csum failed root 256 ino 188739 off 147456 csum 0xb9577995 expected csum 0x36d69a3a mirror 1
[ 2479.438968] BTRFS error (device dm-0): bdev /dev/mapper/luks-ffffffffffffff errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
[ 2479.453308] BTRFS warning (device dm-0): csum failed root 256 ino 188739 off 147456 csum 0xb9577995 expected csum 0x36d69a3a mirror 1
[ 2479.453333] BTRFS error (device dm-0): bdev /dev/mapper/luks-ffffffffffffff errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
[ 2488.515036] BTRFS warning (device dm-0): csum failed root 256 ino 188739 off 147456 csum 0xb9577995 expected csum 0x36d69a3a mirror 1
[ 2488.515052] BTRFS error (device dm-0): bdev /dev/mapper/luks-ffffffffffffff errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
[ 2488.515308] BTRFS warning (device dm-0): csum failed root 256 ino 188739 off 147456 csum 0xb9577995 expected csum 0x36d69a3a mirror 1
[ 2488.515323] BTRFS error (device dm-0): bdev /dev/mapper/luks-ffffffffffffff errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
[ 2488.515436] BTRFS warning (device dm-0): csum failed root 256 ino 188739 off 147456 csum 0xb9577995 expected csum 0x36d69a3a mirror 1
[ 2488.515446] BTRFS error (device dm-0): bdev /dev/mapper/luks-ffffffffffffff errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
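
For anyone triaging similar output, here is a minimal Python sketch that groups the `csum failed` warnings by root, inode, and offset. The function name `summarize_csum_failures` and the regular expression are my own illustration, written against the line format shown above, and the sample reuses two of those warnings verbatim.

```python
import re
from collections import defaultdict

# Pattern for the "csum failed" warnings in the dmesg output above.
CSUM_RE = re.compile(
    r"csum failed root (?P<root>\d+) ino (?P<ino>\d+) off (?P<off>\d+) "
    r"csum (?P<got>0x[0-9a-f]+) expected csum (?P<want>0x[0-9a-f]+) "
    r"mirror (?P<mirror>\d+)"
)


def summarize_csum_failures(lines):
    """Group csum failures by (root, inode, offset). Hypothetical helper."""
    summary = defaultdict(lambda: {"count": 0, "got": set(), "want": set()})
    for line in lines:
        m = CSUM_RE.search(line)
        if m is None:
            continue  # not a csum warning, e.g. the "bdev ... errs:" lines
        key = (int(m["root"]), int(m["ino"]), int(m["off"]))
        entry = summary[key]
        entry["count"] += 1
        entry["got"].add(m["got"])    # checksum computed from the data read
        entry["want"].add(m["want"])  # checksum stored by the filesystem
    return dict(summary)


# Two of the warnings from the dmesg output above.
sample = [
    "[ 2479.397510] BTRFS warning (device dm-0): csum failed root 256 ino 188739 off 147456 csum 0x355bd35c expected csum 0x36d69a3a mirror 1",
    "[ 2479.438946] BTRFS warning (device dm-0): csum failed root 256 ino 188739 off 147456 csum 0xb9577995 expected csum 0x36d69a3a mirror 1",
]

for (root, ino, off), info in summarize_csum_failures(sample).items():
    print(f"root={root} ino={ino} off={off} failures={info['count']} "
          f"computed={sorted(info['got'])} expected={sorted(info['want'])}")
```

Every warning in the log points at the same block of the same file. The inode can usually be mapped back to a path with `btrfs inspect-internal inode-resolve 188739 <mountpoint>`, where `<mountpoint>` is wherever that subvolume is mounted. Seeing more than one computed checksum for the same block may suggest the reads themselves are not returning stable data, though I would treat that as a hint rather than a diagnosis.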

At first I thought this could be an issue with the drive. All SMART data came back fine, but I replaced it anyway. I did a fresh install of Fedora 40 on the new drive and, lo and behold, the issue persisted. I searched around the internet and this seems to be a fairly rare issue, but there are other complaints out there without a solution.

What I’m trying to accomplish here is proper triage, so that I can pass along a wealth of information and be available to test wherever this gets fixed. I am not sure if I should be looking into opening a Bugzilla ticket (for which component exactly?), or if this is isolated to Steam and something in that process in particular (I have already opened a GitHub issue). Any help would be appreciated, thank you.

You have replaced the disk and still see the issue, so I would still suspect a hardware problem.

Have you tested your RAM?

Do you get similar errors if you reinstall Fedora on the new disk with a different filesystem, such as ext4?

Hello @jorp ,
You could try taking your original disk and creating a new Fedora install without LUKS encryption, to see whether something in Steam’s packaging affects BTRFS even without it. Possibly this is more a problem with how Steam handles its own packaging than with how Fedora uses BTRFS.

I am running btrfs on LUKS and use Steam without errors, and have been for a few years at this point.

Re ext4: since it does not checksum file data, the data could be corrupt without ext4 ever spotting the issue.

Hi there, thanks for this suggestion. It does appear to be the RAM…

I ran memtest86 and received over 600 errors about 30 minutes into testing (I stopped when the number kept climbing).

I was able to grab some RAM from a donor PC and now seem to be able to download/retry those games without issue.

There are quite a few issues open related to this problem on GitHub… so I found it hard to believe that we all had RAM issues… but, who knows?

Sorry for the side comment :thinking:

This further reinforces my assumptions on BTRFS. It just doesn’t cut it for me. ext4 or XFS don’t have this many issues. . .

XFS to the rescue.

I used to work with SGI gear that had ECC memory. Solar flares like this week’s event would cause an increase in the number of ECC error corrected reports, but the RAM was not damaged. You could check dates of the BTRFS errors using journalctl -g 'BTRFS.*error'.
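
If the journal has a lot of matches, a small script can bucket them by day. This is only a sketch: the helper names `errors_per_day` and `print_report` are made up for illustration, and it assumes the lines come from `journalctl -g 'BTRFS.*error' -o short-iso`, so that each entry starts with an ISO date.

```python
import re
from collections import Counter

# Matches the date at the start of a `journalctl -o short-iso` line.
DATE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})T")


def errors_per_day(lines):
    """Count journal lines per calendar day. Hypothetical helper."""
    counts = Counter()
    for line in lines:
        m = DATE_RE.match(line)
        if m:  # skips "-- Boot ... --" separators and other non-entry lines
            counts[m.group(1)] += 1
    return counts


def print_report(lines):
    """Print one 'YYYY-MM-DD  count' row per day, oldest first."""
    for day, n in sorted(errors_per_day(lines).items()):
        print(f"{day}  {n}")
```

To use it, pipe the journalctl output into a script that calls `print_report(sys.stdin)`. A burst on a single day would look quite different from errors spread evenly over weeks.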

How recently was this happening?

The reason I ask is that, back in the dim and distant past, chips were in ceramic packages that had the unfortunate feature that a cosmic ray hitting the packaging material would cause a scintillation effect, showering the chip with high-intensity particles and changing memory values.

That is why chips now use plastic packaging that does not scintillate.

Why is it hard to believe? There are thousands of people using Steam and btrfs.
Are there only a handful of reports? If so, it’s quite possible that hardware errors are responsible.

Especially as the point of the checksums in btrfs, xfs, etc. is to detect problems caused by hardware, stop, and report to the user that things are bad.

20+ years. I would not be surprised if ceramic packages were used.

Some gamers overclock their CPU and then start running into memory issues.

Since data is enormously large compared to file system metadata, it’s a far bigger target for corruption. Btrfs will catch this. Other file systems will not.
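
To make that concrete, here is a toy Python sketch of checksum-on-read. It is not how Btrfs is implemented: `zlib.crc32` stands in for the crc32c that Btrfs uses by default, and the 4096-byte block size and the `checksum`/`flip_bit` helpers are assumptions for illustration only.

```python
import zlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def checksum(block: bytes) -> int:
    # zlib.crc32 is a stand-in here; Btrfs defaults to crc32c.
    return zlib.crc32(block)


def flip_bit(block: bytes, bit_index: int) -> bytes:
    """Return a copy of block with one bit inverted (simulated bad RAM)."""
    data = bytearray(block)
    data[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(data)


# "Write": store the block together with its checksum.
original = bytes(range(256)) * (BLOCK_SIZE // 256)
stored_csum = checksum(original)

# "Read" after a single bit was flipped somewhere along the way.
corrupted = flip_bit(original, 12345)

print("clean block verifies:    ", checksum(original) == stored_csum)   # True
print("corrupted block verifies:", checksum(corrupted) == stored_csum)  # False
```

A filesystem that keeps no checksum for file data has nothing to compare against, so it would hand back the corrupted block as if nothing had happened.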

Touché, thanks for sharing your insight.

So I guess I should be thankful in this situation? BTRFS inadvertently gave me an early warning about RAM issues that could’ve reared its ugly head through other vectors at some later point in time.

Anyway, as an update to anyone else reading this thread now, or in the future, new (to me) donor RAM fixed my issue (I ran memtest86 for a control on it beforehand and got 0 errors).

Thank you to everyone for your help.

I’ve run memtest86 and the open-source one for multiple passes overnight on a Ryzen X470 board and got no errors. I was having crashes in Vulkan and DX12 stuff with an RX 6600 XT and knew from before that this kind of thing comes from bad RAM, so I ran HCI’s memtest, and it did find errors.

I’m not too sure on differences between different mem test programs, but I suggest getting more results from different programs.
