smartctl shows the NVMe as PASSED, but I’m not sure; thanks for any help.
Boot another copy of Fedora (the choice of media is not important) and run the following command against the unmounted partition:
fsck -f /dev/nvme0n1pX where X is the faulty partition.
You may be prompted to allow some corrections. Allow them. Then rerun the command WITHOUT the -f parameter. If you get no errors, all is well.
If you do end up buying a new SSD avoid the brand that just failed for you.
The SSD hardware should have been designed to not corrupt itself on a power failure.
Datacenter-grade NVMe SSDs usually have end-to-end power-fail protection, but SSDs in common use do not. The risk associated with a power failure is potential data loss. This is much more likely to occur with a DRAM-less model.
NVMe SSDs with 2 GB of onboard DRAM will usually write fast enough to ensure that the partition table is properly updated, but may truncate a large file if a power failure occurs.
What I picked up on is that the drive is reporting that the data being read is corrupted, which I assume means that a write was torn.
Do you expect that on a consumer SSD?
How do I find the faulty partition? I only see this log in the kernel. It’s a default Fedora install with Btrfs.
lsblk
nvme0n1 259:0 0 238,5G 0 disk
├─nvme0n1p1 259:1 0 600M 0 part /boot/efi
├─nvme0n1p2 259:2 0 1G 0 part /boot
└─nvme0n1p3 259:3 0 236,9G 0 part /home
sudo fsck -f /dev/nvme0n1p1
fsck from util-linux 2.40.4
fsck.fat 4.2 (2021-01-31)
/dev/nvme0n1p1: 24 files, 4935/153296 clusters
sudo fsck -f /dev/nvme0n1p2
fsck from util-linux 2.40.4
e2fsck 1.47.2 (1-Jan-2025)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/nvme0n1p2: 59/65536 files (20.3% non-contiguous), 144883/262144 blocks
The last one said to run the btrfs check command.
sudo btrfs check /dev/nvme0n1p3
Opening filesystem to check…
Checking filesystem on /dev/nvme0n1p3
UUID: 0e1bf84e-4c4a-4472-874f-577aedaa9a2d
[1/8] checking log skipped (none written)
[2/8] checking root items
[3/8] checking extents
[4/8] checking free space tree
[5/8] checking fs roots
[6/8] checking only csums items (without verifying data)
[7/8] checking root refs
[8/8] checking quota groups skipped (not enabled on this FS)
found 120660201472 bytes used, no error found
total csum bytes: 114702888
total tree bytes: 1874968576
total fs tree bytes: 1617559552
total extent tree bytes: 116064256
btree space waste bytes: 345200943
file data blocks allocated: 393149526016
referenced 172258525184
Apparently everything is fine?
Isn’t your original error message telling you where on the disk the bad blocks were found? That is the LBA (logical block address): the number of logical sectors from the beginning of the drive.
If the above is correct, you can do sudo gdisk -l /dev/nvme0n1
to see the beginning and end logical sectors for each partition.
The message also seems to be saying you have a problem with 8 logical blocks. If your logical sector size is 512 bytes, the error spans 4 KiB.
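To see which partition a reported LBA lands in, you can compare it against the start/end sectors that gdisk prints. Here is a minimal sketch of that arithmetic; every number below is a made-up placeholder, so substitute the LBA from your own error message and the sector ranges from your own gdisk -l output:

```shell
# All numbers here are hypothetical examples, not values from this thread.
err_lba=123456789      # LBA reported in the kernel error message
p3_start=3330048       # start sector of nvme0n1p3 from `gdisk -l /dev/nvme0n1`
p3_end=500118158       # end sector of nvme0n1p3 from the same listing
blocks=8               # number of bad logical blocks reported
sector_size=512        # logical sector size in bytes

# The partition containing the bad area is the one whose range brackets the LBA.
if [ "$err_lba" -ge "$p3_start" ] && [ "$err_lba" -le "$p3_end" ]; then
    echo "LBA $err_lba is inside nvme0n1p3, $((err_lba - p3_start)) sectors in"
fi
echo "error spans $((blocks * sector_size)) bytes"
```

Repeat the same comparison for each partition range gdisk lists until one matches.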
One way to check this section might be to insert a spare USB drive large enough to hold the offending partition stored as an img file, boot a live ISO that you have installed on a different USB drive, then use ddrescue
to copy/rescue that partition from your now-quiescent drive to an img file on the first (sufficiently large) USB file system. ddrescue
should find any bad blocks on the SSD and salvage as much as it can into the img file.
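A sketch of what that ddrescue run could look like. The device name and mount paths are assumptions for illustration, and the script only prints the command so you can review it before running it for real against the unmounted partition:

```shell
src=/dev/nvme0n1p3           # suspect partition (assumption: p3 is the affected one)
dst=/mnt/bigusb/p3.img       # image file on the large spare USB drive (hypothetical mount)
map=/mnt/bigusb/p3.map       # ddrescue map file: records bad areas and allows resuming
# -d reads the source with direct disc access, -r3 retries bad sectors 3 times.
# Echoed here as a dry run; remove the echo/quotes to actually execute it.
echo "sudo ddrescue -d -r3 $src $dst $map"
```

Keeping the map file means an interrupted rescue can be resumed later with the same command.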
Again, I’d check with those with more experience than I have that any of that makes sense first.
Good luck!