/home seems to be wiped out after a reboot

So it all started when I woke up my PC (with Fedora 40, the KDE Plasma spin) from sleep, and it was frozen on the login screen. It does that often, or sometimes the screen would be black after waking it up. Either way, I force-rebooted the PC as I always do in this case… and then I found myself in emergency mode. It took me a while to figure it out, but… it seems like the /home directory has been wiped. It’s empty. How the hell could that happen? And is there a way to restore the user, or did I just lose everything?

Emergency mode told me to do journalctl -xb, so the results of that can be found here. Thanks for any help you can give me.

The /home directory (in the default disk setup — let us know if you did anything differently!) is a separate btrfs subvolume. Or, in older setups, it might be a separate ext4 partition.

Either way, from your logs, that volume is corrupt in some way and the system is declining to mount it for safety.

I’m looking at:

Jul 19 00:34:00 fedora systemd-fsck[864]: /dev/mapper/luks-bc0bd211-8abc-4976-8a00-b9b8ae1d2ca1 contains a file system with errors, check forced.
Jul 19 00:34:00 fedora systemd-fsck[864]: /dev/mapper/luks-bc0bd211-8abc-4976-8a00-b9b8ae1d2ca1: Inode 19056415 extent tree (at level 1) could be narrower.  IGNORED.
Jul 19 00:34:00 fedora systemd-fsck[864]: /dev/mapper/luks-bc0bd211-8abc-4976-8a00-b9b8ae1d2ca1: Inode 19138157 extent tree (at level 1) could be narrower.  IGNORED.
Jul 19 00:34:03 fedora systemd-fsck[864]: /dev/mapper/luks-bc0bd211-8abc-4976-8a00-b9b8ae1d2ca1: Inode 24414951 extent tree (at level 1) could be narrower.  IGNORED.
Jul 19 00:34:03 fedora systemd-fsck[864]: /dev/mapper/luks-bc0bd211-8abc-4976-8a00-b9b8ae1d2ca1: Inode 24532755 extent tree (at level 1) could be narrower.  IGNORED.
Jul 19 00:34:03 fedora systemd-fsck[864]: /dev/mapper/luks-bc0bd211-8abc-4976-8a00-b9b8ae1d2ca1: Inode 24676348 extent tree (at level 1) could be narrower.  IGNORED.
Jul 19 00:34:03 fedora systemd-fsck[864]: /dev/mapper/luks-bc0bd211-8abc-4976-8a00-b9b8ae1d2ca1: Inode 24676349 extent tree (at level 2) could be narrower.  IGNORED.
Jul 19 00:34:03 fedora systemd-fsck[864]: /dev/mapper/luks-bc0bd211-8abc-4976-8a00-b9b8ae1d2ca1: Inode 24931582 extent tree (at level 1) could be narrower.  IGNORED.
Jul 19 00:34:07 fedora systemd-fsck[864]: /dev/mapper/luks-bc0bd211-8abc-4976-8a00-b9b8ae1d2ca1: Directory inode 20987772, block #194, offset 0: directory has no checksum.
Jul 19 00:34:07 fedora systemd-fsck[864]: FIXED.
Jul 19 00:34:07 fedora systemd-fsck[864]: /dev/mapper/luks-bc0bd211-8abc-4976-8a00-b9b8ae1d2ca1: Directory inode 20987772, block #194, offset 4084: directory corrupted
Jul 19 00:34:07 fedora systemd-fsck[864]: /dev/mapper/luks-bc0bd211-8abc-4976-8a00-b9b8ae1d2ca1: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
Jul 19 00:34:07 fedora systemd-fsck[864]:         (i.e., without -a or -p options)
Jul 19 00:34:07 fedora systemd-fsck[861]: fsck failed with exit status 4.
Jul 19 00:34:07 fedora systemd[1]: systemd-fsck@dev-disk-by\x2duuid-75ef6eed\x2d5fc6\x2d4307\x2db765\x2d5f43aa508691.service: Main process exited, code=exited, status=1/FAILURE

Can you do cat /etc/fstab and paste that here, just to check some things…?
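
For reference, the "exit status 4" in that log is a bitmask described in the fsck(8) man page. Here is a minimal sketch that decodes it, followed by the manual check the log asks for. The check is left commented out because it assumes /home is unmounted and that the mapper name copied from the journal is still the right device. The function name explain_fsck_status is just an illustration.

```shell
#!/bin/sh
# Decode an fsck exit status (a bitmask; meanings taken from fsck(8)).
explain_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    out=""
    for pair in "1:errors corrected" "2:reboot needed" \
                "4:errors left uncorrected" "8:operational error" \
                "16:usage error" "32:cancelled by user" \
                "128:shared library error"; do
        bit=${pair%%:*}    # number before the colon
        text=${pair#*:}    # description after the colon
        if [ $(( status & bit )) -ne 0 ]; then
            out="${out:+$out, }$text"
        fi
    done
    echo "$out"
}

# Manual check, from emergency mode or a live USB, with /home unmounted.
# Device name copied from the journal above; adjust it if yours differs.
# sudo e2fsck -f /dev/mapper/luks-bc0bd211-8abc-4976-8a00-b9b8ae1d2ca1
# explain_fsck_status $?
```

Without -a or -p, e2fsck asks before each repair, which is what the "RUN fsck MANUALLY" message is requesting.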

Well, originally I used the default setup, but one day I woke up and the /home directory turned read-only by itself! After much struggle, I reinstalled the whole OS, but changed the partitions to ext4.

Here’s fstab:


#
# /etc/fstab
# Created by anaconda on Wed Dec 20 21:02:04 2023
#
# Accessible filesystems, by reference, are maintained under '/dev/disk/'.
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info.
#
# After editing this file, run 'systemctl daemon-reload' to update systemd
# units generated from this file.
#
UUID=66178548-51cf-451c-915c-12b112193eb4 /                       ext4    defaults        1 1
UUID=162f9bbe-7b83-49f7-b225-35d3f5c0830e /boot                   ext4    defaults        1 2
UUID=03F5-876B          /boot/efi               vfat    umask=0077,shortname=winnt 0 2
UUID=75ef6eed-5fc6-4307-b765-5f43aa508691 /home                   ext4    defaults,x-systemd.device-timeout=0 1 2
/swapfile_extend_32GB       none       swap    sw        0       0
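
A side note on reading that file: the sixth field (fs_passno) decides whether a filesystem is checked at boot, and any nonzero value means it is. The /home line has 2 there, so the fsck runs before the mount, and a failed check keeps /home from being mounted. A small sketch for pulling those fields out of an fstab-style file follows (fstab_entry is a made-up helper name, not a standard tool):

```shell
#!/bin/sh
# Print "<device> <fstype> <passno>" for one mount point in an fstab-style file.
# Comment lines and short lines are skipped; a missing sixth field counts as 0.
fstab_entry() {
    awk -v mp="$1" '
        /^[[:space:]]*#/ || NF < 4 { next }
        $2 == mp { print $1, $3, (NF >= 6 ? $6 : 0) }
    ' "$2"
}

# Usage on the real file:
# fstab_entry /home /etc/fstab
```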

Btrfs was catching an error on your file system or hardware. Ext4 checksums only its metadata, not file contents, so it cannot catch that kind of error as reliably.

When BTRFS mounts your /home read only, that means it has found an error on the subvolume it is attempting to mount. This read only state is intended to allow you to recover your data prior to doing diagnostics and repairs to the filesystem.

I would run some of the SMART tests on the disk to verify it is good. Plus I would go back to the defaults. BTRFS is much superior to EXT4.
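
In case it helps, here is a rough sketch of what running the SMART tests could look like with smartctl from smartmontools. /dev/sdX is a placeholder for the actual disk, and smart_health is only an illustrative helper that picks the overall verdict out of the report:

```shell
#!/bin/sh
# Read smartctl output on stdin and print the overall health verdict.
smart_health() {
    awk -F': *' '/overall-health self-assessment test result/ { print $2 }'
}

# Typical sequence (/dev/sdX is a placeholder; find the disk with lsblk):
# sudo smartctl -t long /dev/sdX       # start an extended self-test
# sudo smartctl -l selftest /dev/sdX   # read the self-test log afterwards
# sudo smartctl -H /dev/sdX | smart_health
```

A passing result only says the drive reports no problems; it does not rule out filesystem damage from forced reboots.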

Assuming this is what you mean by “SMART tests”, here are the results of sudo smartctl -a /dev/sdd (run from a live USB):

smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.8.5-301.fc40.x86_64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Silicon Motion based SSDs
Device Model:     TS1TSSD230S
Serial Number:    I031640378
LU WWN Device Id: 5 7c3548 21a53e53a
Firmware Version: 22Z3V4EI
User Capacity:    1,024,209,543,168 bytes [1.02 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic, zeroed
Device is:        In smartctl database 7.3/5528
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Jul 19 06:00:04 2024 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0002) Does not save SMART data before
                                        entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  30) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0000   100   100   000    Old_age   Offline      -       0
  5 Reallocated_Sector_Ct   0x0000   100   100   000    Old_age   Offline      -       0
  9 Power_On_Hours          0x0000   100   100   000    Old_age   Offline      -       1487
 12 Power_Cycle_Count       0x0000   100   100   000    Old_age   Offline      -       1005
148 Total_SLC_Erase_Ct      0x0000   100   100   000    Old_age   Offline      -       3572
149 Max_SLC_Erase_Ct        0x0000   100   100   000    Old_age   Offline      -       102
150 Min_SLC_Erase_Ct        0x0000   100   100   000    Old_age   Offline      -       0
151 Average_SLC_Erase_Ct    0x0000   100   100   000    Old_age   Offline      -       43
159 DRAM_1_Bit_Error_Count  0x0000   100   100   000    Old_age   Offline      -       0
160 Uncorrectable_Error_Cnt 0x0000   100   100   000    Old_age   Offline      -       0
161 Valid_Spare_Block_Cnt   0x0000   100   100   000    Old_age   Offline      -       105
163 Initial_Bad_Block_Count 0x0000   100   100   000    Old_age   Offline      -       14
164 Total_Erase_Count       0x0000   100   100   000    Old_age   Offline      -       25928
165 Max_Erase_Count         0x0000   100   100   000    Old_age   Offline      -       49
166 Min_Erase_Count         0x0000   100   100   000    Old_age   Offline      -       5
167 Average_Erase_Count     0x0000   100   100   000    Old_age   Offline      -       16
168 Max_Erase_Count_of_Spec 0x0000   100   100   000    Old_age   Offline      -       3000
169 Remaining_Lifetime_Perc 0x0000   100   100   000    Old_age   Offline      -       100
177 Wear_Leveling_Count     0x0000   100   100   050    Old_age   Offline      -       0
181 Program_Fail_Cnt_Total  0x0000   100   100   000    Old_age   Offline      -       0
182 Erase_Fail_Count_Total  0x0000   100   100   000    Old_age   Offline      -       0
192 Power-Off_Retract_Count 0x0000   100   100   000    Old_age   Offline      -       38
194 Temperature_Celsius     0x0000   100   100   000    Old_age   Offline      -       33
195 Hardware_ECC_Recovered  0x0000   100   100   000    Old_age   Offline      -       0
196 Reallocated_Event_Count 0x0000   100   100   016    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0000   100   100   050    Old_age   Offline      -       0
232 Available_Reservd_Space 0x0000   100   100   000    Old_age   Offline      -       100
241 Host_Writes_32MiB       0x0000   100   100   000    Old_age   Offline      -       169790
242 Host_Reads_32MiB        0x0000   100   100   000    Old_age   Offline      -       653730
245 TLC_Writes_32MiB        0x0000   100   100   000    Old_age   Offline      -       544488

SMART Error Log Version: 1
Warning: ATA error count 0 inconsistent with error log pointer 1

ATA Error Count: 0
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 0 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  00 ec 00 00 00 00 00

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ec 00 00 00 00 00 00 00      00:00:00.000  IDENTIFY DEVICE
  ec 00 00 00 00 00 00 00      00:00:00.000  IDENTIFY DEVICE
  ec 00 00 00 00 00 00 00      00:00:00.000  IDENTIFY DEVICE
  ec 00 00 00 00 00 00 00      00:00:00.000  IDENTIFY DEVICE
  c8 00 00 00 00 00 00 00      00:00:00.000  READ DMA

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%       403         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The above only provides legacy SMART information - try 'smartctl -x' for more

So should I do what you wrote there? You deleted it, so I’m unsure.

Did it anyway:

$ systemctl status home.mount
○ home.mount - /home
     Loaded: loaded (/etc/fstab; generated)
     Active: inactive (dead)
      Where: /home
       What: /dev/disk/by-uuid/75ef6eed-5fc6-4307-b765-5f43aa508691
       Docs: man:fstab(5)
             man:systemd-fstab-generator(8)

Jul 19 13:09:29 fedora systemd[1]: Dependency failed for home.mount - /home.
Jul 19 13:09:29 fedora systemd[1]: home.mount: Job home.mount/start failed with result 'dependency'.

$ sudo mount -a
$ grep -e /home /etc/mtab
/dev/mapper/luks-bc0bd211-8abc-4976-8a00-b9b8ae1d2ca1 /home ext4 rw,seclabel,relatime 0 0

It appears that you had at least one error for the device. Is the /home partition ext4? If you didn’t change your fstab to reflect the filesystem change from Btrfs to ext4, /home likely wouldn’t mount correctly, since the system would look for a Btrfs subvolume rather than an ext4 partition.

Indeed, my /home partition is ext4. It’s weird, because this partition worked great for more than half a year.

Well then, your fstab is likely OK. Can you try mounting the partition from the live USB at a temporary mount point and then browse it?

I haven’t used this command in a while, so there’s a good chance I did it wrong:

$ sudo mount /dev/sdd6
mount: /dev/sdd6: can't find in /etc/fstab.

I should probably note that my /home partition is encrypted. I remember the passphrase well though, so there’s no issue with that.

Edit: I can’t mount it through the command line, but I can mount it through Dolphin or KDE Partition Manager. I can browse the whole partition fine, except /home is empty, as said earlier.

That command should be sudo mount <device> <mount point>, potentially with -t (filesystem type) or -o (options) added for some configurations.

Use man mount to see how that command is used.

I did sudo mount /dev/sdd6 /testuser and it worked. I now see the entire filesystem! /testuser/home is still empty though.

Yes, you should absolutely mention that :slight_smile: this is extremely relevant.

How do you encrypt your home drive? This is not the default on Fedora, and deviating from the defaults always increases the need to troubleshoot yourself.

I assumed you use LUKS.

Yes, KDE Dolphin and GNOME Nautilus (and likely many others) use udisks to mount drives.

I highly recommend udisks over mount, if you want a temporary solution, with write access etc.

udisksctl --help

It is way easier and safer to use, has sane defaults, and also lets you unlock and mount encrypted volumes.
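
To give an idea, here is a minimal sketch of the udisksctl steps, wrapped so they can be previewed with DRY_RUN=1 before anything is touched. The helper names (run, unlock, mount_cleartext, detach) and the device paths in the usage comments are placeholders, not something taken from this system.

```shell
#!/bin/sh
# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: unlock the LUKS partition. udisksctl asks for the passphrase and
# reports the cleartext device it created.
unlock() { run udisksctl unlock -b "$1"; }

# Step 2: mount that cleartext device. udisks chooses the mount point itself
# and prints where it ended up.
mount_cleartext() { run udisksctl mount -b "$1"; }

# Undo in reverse order: unmount the cleartext device, then lock the partition.
detach() {
    run udisksctl unmount -b "$1"
    run udisksctl lock -b "$2"
}

# Usage (placeholder paths; check yours with lsblk -f):
# unlock /dev/sdX7
# mount_cleartext /dev/dm-2
# detach /dev/dm-2 /dev/sdX7
```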

I will make a howto on udisksctl and LUKS in no time.

Done

I encrypted the drive with LUKS2.

So, to mount the encrypted drive (if the encrypted partition is indeed /dev/sdd6):

sudo cryptsetup open --type luks /dev/sdd6 tmp_home
# <enter_password>
sudo mount /dev/mapper/tmp_home /mnt

Now you should be able to browse your home in /mnt.
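
Since the filesystem has reported errors, it may be worth copying the data off before attempting any repair. What follows is only a sketch, assuming the volume is mounted at /mnt as above and that there is space on some other disk. backup_home is a hypothetical helper and /path/to/backup is a placeholder.

```shell
#!/bin/sh
# Copy a directory tree, keeping ownership, permissions and timestamps.
# Refuses to run if the source is missing or empty, so an unmounted /mnt
# does not produce an empty "backup".
backup_home() {
    src=$1
    dest=$2
    if [ -z "$(ls -A "$src" 2>/dev/null)" ]; then
        echo "source $src is empty or missing" >&2
        return 1
    fi
    mkdir -p "$dest" && cp -a "$src/." "$dest/"
}

# Usage, run as root so every file is readable (placeholder destination):
# backup_home /mnt /path/to/backup
#
# When finished, detach in reverse order:
# umount /mnt
# cryptsetup close tmp_home
```

Mounting with -o ro in the mount step is an option if you only want to read from the volume.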

Hey, now I see my old home directory! Thanks for that. So how can I restore it to its rightful place?

By “old” do you mean the contents aren’t up to date with recent changes?

No, it’s just as it was yesterday. Sorry for being unclear. I should also note that I made a mistake, the LUKS partition is /dev/sdd7.

But it still doesn’t mean that the issue is solved. I need to find a way to make it mount the partition in /home automatically on boot (after I decrypt it ofc), as it did before.