No initramfs file after update > restart > loss of power while restarting

Hi, I’m actually having the exact same issue in this discussion:

Except that both of the kernel entries the system is giving me produce an error:

initramfs-6.15.9-101.fc41.x86_64.img not found

and

initramfs-6.15.9-100.fc41.x86_64.img not found

respectively.

I tried booting a Fedora Live USB, but it seems as if my Btrfs filesystem is corrupted. I didn’t try to repair it or anything, in case doing so was destructive.

Is there any way to repair this, or is my data lost forever?

It is highly unlikely that your personal files are lost. However, there is no “one button” you can click to recover them. Reinstalling your OS is probably the easiest option. But if your files are important or if you just want to learn more about how Linux works, it is possible (with effort) to recover from that sort of error.


I tried running a Live USB and got this error while checking the drive (p3 is the partition where all the data is; GRUB is fine):

Disclaimer: I am not a Btrfs guru. If you want someone who really knows what they are doing with regard to Btrfs, tag your question with the btrfs tag and at-mention Chris Murphy.

That said, I think you might be able to access your files with mount -o ro,rescue=all /dev/nvme1n1p3 /mnt as was reportedly done here. If that mount command succeeds, you should find your files under the /mnt directory. You can then copy the ones you want/need and then reinstall your OS. Actually repairing the existing filesystem would be more involved and you should ask Chris for instructions on how to do that.
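For example, from the Live USB it could look something like this (a rough sketch; the destination path is just a placeholder for wherever your second drive mounts):

sudo mkdir -p /mnt
sudo mount -o ro,rescue=all /dev/nvme1n1p3 /mnt
# copy what you need onto another drive, e.g. an external USB disk
sudo cp -a /mnt/home /run/media/liveuser/backup-disk/
sudo umount /mnt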


It actually mounted, but the weird thing is that it shows what’s in root and nothing that was in home (the actually important stuff). It was all in one unified partition.

I’ll check it further and do as you recommended.

I think your home files should be under /mnt/home (but I’m not sure).


My initial thought was that it should be, but it’s empty. The fstab file shows it should be there:

You might try unmounting it (umount /mnt) and then remounting it with mount -o ro,rescue=all,subvol=home,compress=zstd:1 /dev/nvme1n1p3 /mnt. Again, I know next to nothing about Btrfs, so I don’t know whether that should be necessary or whether something more significant is wrong.
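If that fails too, one thing worth trying (a sketch; I’m assuming the default Fedora layout here) is to mount the top-level subvolume (subvolid=5) and ask Btrfs which subvolumes it still knows about, to confirm that home even exists:

sudo umount /mnt
sudo mount -o ro,rescue=all,subvolid=5 /dev/nvme1n1p3 /mnt
sudo btrfs subvolume list /mnt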


Ah, there’s the superblock error again:

There’s new information in dmesg:

Well, you can enter dmesg and scroll back to see if you can find an error that might explain why it failed. I would think if it could read the superblock to access the root subvolume, it should be able to read the superblock to access the home subvolume, but I don’t know. If the files are important to you, try to get hold of Chris. I think he is more active on Matrix (https://chat.fedoraproject.org/#/room/#btrfs:fedoraproject.org) than he is around here.
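For example, something like this should surface the relevant lines (a minimal sketch):

sudo dmesg | grep -i btrfs | tail -n 20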


It is very strange that your Btrfs filesystem is corrupt and your boot partition is missing at least two initramfs files. Those are two separate filesystems that are stored on different areas of the disk. It seems unlikely that a single power failure would corrupt both of those filesystems at the same time. Is it possible that you overwrote your system disk somehow? Or maybe your disk is failing? You might run smartctl -x /dev/nvme1n1 and see if it reports any problems with your disk.

That’s the weird part: I don’t see any errors in the smartctl output:

smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.14.0-63.fc42.x86_64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Dogfish SSD 1TB
Serial Number:                      202210200000000010
Firmware Version:                   V0506B0
PCI Vendor/Subsystem ID:            0x126f
IEEE OUI Identifier:                0x000000
Total NVM Capacity:                 1,024,209,543,168 [1.02 TB]
Unallocated NVM Capacity:           0
Controller ID:                      1
NVMe Version:                       1.4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,024,209,543,168 [1.02 TB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Mon Aug 11 20:13:03 2025 UTC
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x00df):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp Verify
Log Page Attributes (0x1e):         Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg
Maximum Data Transfer Size:         64 Pages
Warning  Comp. Temp. Threshold:     86 Celsius
Critical Comp. Temp. Threshold:     93 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +   4.5000W       -        -    0  0  0  0        0       0
 1 +   2.4000W       -        -    1  1  1  1        0       0
 2 +   0.6000W       -        -    2  2  2  2        0       0
 3 -   0.0250W       -        -    3  3  3  3     5000    5000
 4 -   0.0040W       -        -    4  4  4  4     5000   25000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        52 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    4%
Data Units Read:                    11,507,762 [5.89 TB]
Data Units Written:                 35,676,309 [18.2 TB]
Host Read Commands:                 142,412,647
Host Write Commands:                857,508,670
Controller Busy Time:               17,050
Power Cycles:                       134
Power On Hours:                     18,218
Unsafe Shutdowns:                   51
Media and Data Integrity Errors:    0
Error Information Log Entries:      30
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               52 Celsius
Temperature Sensor 2:               71 Celsius

Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged

Read Self-test Log failed: Invalid Field in Command (0x002)

From what I’ve read in similar contexts, this looks like a hardware issue (RAM, bit flip, etc.).
If you mounted the device with this command:
mount -o ro,rescue=all /dev/nvme1n1p3 /mnt
you should see the root and home folders, which are the two subvolumes created by Fedora.
If the home folder is missing, then there might be a problem with the home subvolume.

While waiting for Chris’s input, there are some tools you can try to recover the files, for example this tool (you’ll need a second device to copy the data to).

Hope this helps.


The “Invalid Field” does not inspire confidence, but I see that with other NVMe drives too. Try sudo nvme self-test-log /dev/nvme1n1.

If you have important data and no backup, you can make an image of the drive on external media (e.g., with GNOME Disks in the Live USB) before attempting repairs. After you have a copy, run the long self-test. If the drive is bad, you may still be able to recover data from the saved image.
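On the command line that could look something like this (a sketch; ddrescue has to be installed in the live session, and the destination path is a placeholder):

# image the whole drive onto external media first
sudo ddrescue -d /dev/nvme1n1 /run/media/liveuser/external/nvme1n1.img nvme1n1.map
# then start the extended self-test and read the log when it finishes
sudo nvme device-self-test /dev/nvme1n1 --self-test-code=2
sudo nvme self-test-log /dev/nvme1n1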


The future does look grim. This is the output of the test:

Device Self Test Log for NVME device:nvme1n1
Current operation  : 0
Current Completion : 0%
Self Test Result[0]:
  Operation Result             : 0
  Self Test Code               : 0
  Valid Diagnostic Information : 0
  Power on hours (POH)         : 0
  Vendor Specific              : 0 0
[Self Test Result[1] through Self Test Result[19] are identical to Self Test Result[0], all zeros]

Thanks, I will save this. I have a new HDD arriving tomorrow.

I’ve been reading about this, and it perhaps has more to do with the brand and model of the drive than with its health, but I don’t know…

I’m confused, because that discussion is solved, and its problem was a missing initramfs due to DKMS confusion. I don’t see how that can relate to Btrfs, because the kernel and initramfs are co-located, and even if your setup is non-default and uses Btrfs for /boot, if the Btrfs can’t be read then neither the kernel nor the initramfs can be read, in which case it’s definitely not the exact same issue. Hence my confusion.

From the first screenshot of btrfs check, it thinks /dev/nvme1n1 is not a Btrfs file system. That means the superblock is not merely damaged, it’s missing. The magic (signature) is missing.

What’s the result from blkid? I want to know what libblkid thinks is on nvme1n1p3. Is this really the correct device, or is it LUKS that needs to be opened first?
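Something like this (a sketch, using the device name from the earlier posts; the mapper name fedora is arbitrary):

sudo blkid /dev/nvme1n1p3
# TYPE="btrfs"       -> right device, and the signature is readable
# TYPE="crypto_LUKS" -> open it first: sudo cryptsetup open /dev/nvme1n1p3 fedora
# no output at all   -> libblkid doesn't recognize any signature there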

It’s a very, very, very good idea not to make changes to the file system until we understand the problem. Repairs are (unfortunately) irreversible, and can therefore make things worse.

A few screenshots later, there’s a failed mount command. When mount fails and says more information is in dmesg, check it and post all the messages that appear after the command was issued. I’m guessing the kernel really doesn’t see a Btrfs here at all.
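For example, right after the failed mount attempt (a sketch; the timestamps help show which messages belong to it):

sudo dmesg --ctime | tail -n 40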

The typical failure mode of flash storage is to return transient zeros or garbage, followed by either persistently returning zeros or garbage, or even sometimes going permanently read-only (the device itself, not merely the file system) while showing some earlier state of the data on the device. The latter state is obviously pretty ideal, even if the behavior is unexpected and not immediately obvious (the first time I experienced it, the device was read-only for some very long period before I was able to infer that this is what had happened).

Are any other partitions on this drive working? Or is this the only partition that is not working?

What do you get for:

btrfs rescue super-recover -v /dev/nvme1n1p3
btrfs inspect dump-s -fa /dev/nvme1n1p3

If the second command doesn’t work, then adding -F might give a clue, but note that forcing the display of what’s at the superblock locations could reveal private/secret information. Like, we don’t know what happened before the problem, so it’s all speculative at the moment. But if there aren’t superblocks where they should be, then those locations have been overwritten somehow - but with what? That can give a clue, but it’s in the vicinity of 99.9999% user error, because the kernel just isn’t going to overwrite file system superblocks; there are safeguards against that.

OK, I see the first screenshot might have contained a typo in the btrfs check command, but I don’t see a reattempted btrfs check on /dev/nvme1n1p3, so can that be provided?

I see in screenshot 4 that the Btrfs file system is seen by the kernel, but it has lots of problems. The corruption counter is at 147. This could be 147 different problems, or it could be one problem encountered 147 times. It’s just a simple counter.

If mount -o ro,rescue=all /dev/nvme1n1p3 /mnt doesn’t succeed, then hopefully there’s a recent backup. In that case it’s probably easier to just reformat, reinstall, and restore from backup.

If there isn’t a recent backup, then scraping the data out with btrfs restore is the next option, but this is an offline tool that’s quite ugly for end users and takes quite a bit of patience to learn. It’s just not obvious how to use, but it is very capable.
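A minimal sketch of how it’s typically invoked (the destination path is a placeholder for the new drive, and the regex restricts recovery to the home subvolume):

# dry run first, to see what it thinks it can pull out
sudo btrfs restore -v --dry-run /dev/nvme1n1p3 /tmp
# then scrape the home subvolume onto the second drive
sudo mkdir -p /run/media/liveuser/newdrive/recovered
sudo btrfs restore -v --path-regex '^/(|home(|/.*))$' /dev/nvme1n1p3 /run/media/liveuser/newdrive/recovered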

Best to first find out with btrfs check whether it’s worth the effort, though.