Upgrade from F41 to F42 resulted in broken dnf

I upgraded to F42 last night on a ThinkPad. Everything seemed to go fine, but since the system booted into F42 I’m unable to use dnf; it fails with the following error:

SQL statement execution failed: "PRAGMA locking_mode = NORMAL; PRAGMA journal_mode = WAL; PRAGMA foreign_keys = ON;": (10) - disk I/O error

This appears to be caused by:

[ 1434.163623] BTRFS warning (device dm-0): csum failed root 258 ino 9939202 off 82141184 csum 0x84334bfd expected csum 0x6e4e098a mirror 1
[ 1434.163636] BTRFS error (device dm-0): bdev /dev/mapper/luks-2e54214c-e365-40ec-9ab4-846f1ade02a8 errs: wr 0, rd 0, flush 0, corrupt 37, gen 0
[ 1434.163903] BTRFS warning (device dm-0): csum failed root 258 ino 9939202 off 82141184 csum 0x84334bfd expected csum 0x6e4e098a mirror 1
[ 1434.163914] BTRFS error (device dm-0): bdev /dev/mapper/luks-2e54214c-e365-40ec-9ab4-846f1ade02a8 errs: wr 0, rd 0, flush 0, corrupt 38, gen 0
[ 1434.164132] BTRFS warning (device dm-0): csum failed root 258 ino 9939202 off 82141184 csum 0x84334bfd expected csum 0x6e4e098a mirror 1
[ 1434.164142] BTRFS error (device dm-0): bdev /dev/mapper/luks-2e54214c-e365-40ec-9ab4-846f1ade02a8 errs: wr 0, rd 0, flush 0, corrupt 39, gen 0
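
The `errs:` summary at the end of each BTRFS error line is a running per-device tally. In the lines above only `corrupt` increases (37, 38, 39) while `wr`, `rd`, and `flush` stay at 0, which fits a checksum mismatch rather than a failed read or write. A minimal Python sketch for pulling those counters out of a saved dmesg line; the function name `parse_error_counters` is made up for illustration:

```python
import re

# Matches the per-device counter summary printed at the end of a BTRFS error line.
ERRS_RE = re.compile(
    r"errs: wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_error_counters(line):
    """Return the counters as a dict of ints, or None if the line has no summary."""
    match = ERRS_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


# Sample line copied from the dmesg output above.
sample = (
    "[ 1434.163636] BTRFS error (device dm-0): bdev "
    "/dev/mapper/luks-2e54214c-e365-40ec-9ab4-846f1ade02a8 "
    "errs: wr 0, rd 0, flush 0, corrupt 37, gen 0"
)
counters = parse_error_counters(sample)
print(counters)
```

The same counters should also be readable on a live system with `sudo btrfs device stats /`, assuming the affected filesystem is mounted at `/`.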

The file in question (inode 9939202) appears to be:

/usr/lib/sysimage/libdnf5/transaction_history.sqlite-wal
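
For anyone retracing this step: the inode number in a `csum failed` line can be mapped back to a path with `btrfs inspect-internal inode-resolve`. The sketch below only extracts the inode from a log line and builds that command; it does not run it. The helper name `inode_resolve_command` and the mount point `/` are assumptions (the path has to be inside the subvolume named by `root`, here 258).

```python
import re
import shlex

# Matches the subvolume (root), inode, and file offset in a "csum failed" warning.
CSUM_RE = re.compile(
    r"csum failed root (?P<root>\d+) ino (?P<ino>\d+) off (?P<off>\d+)"
)


def inode_resolve_command(line, mount_point="/"):
    """Build the inode-resolve command for a csum warning, or None if no match.

    mount_point is an assumption: it must be where the affected subvolume is mounted.
    """
    match = CSUM_RE.search(line)
    if match is None:
        return None
    return ["btrfs", "inspect-internal", "inode-resolve", match["ino"], mount_point]


# Sample line copied from the dmesg output above.
sample = (
    "[ 1434.163623] BTRFS warning (device dm-0): csum failed root 258 "
    "ino 9939202 off 82141184 csum 0x84334bfd expected csum 0x6e4e098a mirror 1"
)
cmd = inode_resolve_command(sample)
print(shlex.join(cmd))  # prints: btrfs inspect-internal inode-resolve 9939202 /
```

Running the printed command needs root, so prefix it with `sudo`.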

rpm appears to be fine; I’ve been able to remove packages etc. It’s just dnf. I checked the error logs and SMART logs for the NVMe storage device and they are empty. This appears to be solely a problem with btrfs reporting a checksum error for this file. I’m at a loss as to why this would have occurred. Is it a hardware issue (bad memory etc.) that resulted in a corrupt file, or a bug in btrfs? I don’t know.

I did fully back up my home directory before the upgrade, so I could burn the whole thing down and reinstall, but maybe there is a way to fix this? I’m also considering booting to a USB live image and running a memory checker etc.

Any suggestions?

Thanks!

Note: I’ll be slow to respond to any follow-up questions.

Update (1): I ran the open-source memtest booted from USB; no memory errors found, for whatever that’s worth.

Update (2): The workaround for me was to manually remove and reinstall dnf5 using rpm. This removed and replaced the corrupted file.

Your filesystem is damaged and has corrupted files.
Maybe the disk is failing?
Have a look in dmesg to see if there are messages about disk errors.
Also look at the SMART info with sudo smartctl -e /dev/YOUR-DISK for counters indicating pending failure. You can post here for us to review.

Hmm… smartctl does not have the -e option

alevykh@prsnl:~$ smartctl --version
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.14.2-300.fc42.x86_64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

smartctl comes with ABSOLUTELY NO WARRANTY. This is free
software, and you are welcome to redistribute it under
the terms of the GNU General Public License; either
version 2, or (at your option) any later version.
See https://www.gnu.org for further details.

smartmontools release 7.4 dated 2023-08-01 at 10:59:45 UTC
smartmontools SVN rev 5530 dated 2023-08-01 at 11:00:21
smartmontools build host: x86_64-redhat-linux-gnu
smartmontools build with: C++17, GCC 15.0.1 20250114 (Red Hat 15.0.1-0)
smartmontools configure arguments: [hidden in reproducible builds]
reproducible build SOURCE_DATE_EPOCH: 1737244800 (2025-01-19 07:00:00)
alevykh@prsnl:~$ sudo smartctl -h | grep -- '-e'
alevykh@prsnl:~$ 

As I mentioned in my initial post, I checked the NVMe device for errors using the nvme cli. No device errors or smart log messages there.

I just managed to work around this by using rpm to remove dnf5 and then manually installing it again with rpm. This was a bit painful, but removing the packages also removed the corrupted file, and the reinstall replaced it with a good one.

I’ve already run rpm -Va, and things look OK. I’ll continue to spelunk around and see if I can locate any other damage.

My mistake, it’s smartctl -x. Sorry for the confusion.

I would have suggested using dnf4 to reinstall dnf5.x86_64 to fix it. However, rpm worked too :wink:

I booted to a live USB image and ran a btrfs check, which detected no errors. A btrfs scrub is showing two checksum errors. However, when I ran dnf update followed by a scrub, I got these messages in addition to the previous errors:

[  585.375754] BTRFS error (device dm-0): unable to fixup (regular) error at logical 58368720896 on dev /dev/mapper/luks-2e54214c-e365-40ec-9ab4-846f1ade02a8 physical 56296734720
[  585.375842] BTRFS warning (device dm-0): checksum error at logical 58368720896 on dev /dev/mapper/luks-2e54214c-e365-40ec-9ab4-846f1ade02a8, physical 56296734720, root 258, inode 15126207, offset 958464, length 4096, links 1 (path: usr/lib/sysimage/libdnf5/transaction_history.sqlite-wal)
[  585.375850] BTRFS error (device dm-0): unable to fixup (regular) error at logical 58368720896 on dev /dev/mapper/luks-2e54214c-e365-40ec-9ab4-846f1ade02a8 physical 56296734720
[  585.375886] BTRFS warning (device dm-0): checksum error at logical 58368720896 on dev /dev/mapper/luks-2e54214c-e365-40ec-9ab4-846f1ade02a8, physical 56296734720, root 258, inode 15126207, offset 958464, length 4096, links 1 (path: usr/lib/sysimage/libdnf5/transaction_history.sqlite-wal)

I’m not sure why I’m getting these warnings pertaining to:
/usr/lib/sysimage/libdnf5/transaction_history.sqlite-wal

This is the same file path and the same kind of warning that was preventing dnf from running before, but dnf now appears to be running transactions OK.
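
One detail worth checking in the scrub output: it reports inode 15126207, while the original `csum failed` lines reported inode 9939202 for the same path. A minimal Python sketch that parses a scrub warning and compares the two; `parse_scrub_warning` and `EARLIER_INODE` are illustrative names, not part of any tool:

```python
import re

# Matches the fields of a scrub "checksum error" warning as printed in dmesg.
SCRUB_RE = re.compile(
    r"checksum error at logical (?P<logical>\d+) on dev (?P<dev>\S+), "
    r"physical (?P<physical>\d+), root (?P<root>\d+), inode (?P<inode>\d+), "
    r"offset (?P<offset>\d+), length (?P<length>\d+), links (?P<links>\d+) "
    r"\(path: (?P<path>.*)\)"
)


def parse_scrub_warning(line):
    """Return the warning's fields (numbers as ints), or None if no match."""
    match = SCRUB_RE.search(line)
    if match is None:
        return None
    return {
        name: value if name in ("dev", "path") else int(value)
        for name, value in match.groupdict().items()
    }


# Inode from the original "csum failed" lines, before dnf5 was reinstalled.
EARLIER_INODE = 9939202

# Sample line copied from the scrub output above.
sample = (
    "[  585.375842] BTRFS warning (device dm-0): checksum error at logical "
    "58368720896 on dev /dev/mapper/luks-2e54214c-e365-40ec-9ab4-846f1ade02a8, "
    "physical 56296734720, root 258, inode 15126207, offset 958464, length 4096, "
    "links 1 (path: usr/lib/sysimage/libdnf5/transaction_history.sqlite-wal)"
)
info = parse_scrub_warning(sample)
print(info["path"], info["inode"], "same inode as before:", info["inode"] == EARLIER_INODE)
```

Since the inode numbers differ, the file scrub is flagging is not the same on-disk file as the one that first broke dnf. That would fit a file created after the reinstall rather than a leftover of the old one, though the logs alone don't show what caused the mismatch.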

Note: The system logs show no btrfs errors or warnings before the upgrade to F42.

Do you have a snapshot that might still be holding a reference to the old, corrupt version of the file?

No snapshots in use