Storage management preparation before disaster

I lost some storage yesterday. It looked like this:

SSD[ LUKS{ LVG( LV<ext4> ) } ]

i.e. an entire SSD, LUKS-encrypted, holding a logical volume group that contained a single logical volume, which itself contained a single ext4 filesystem. (There’s a PV in there, too, but I’m out of matching brackets. 🙂) The LUKS bit worked, but nothing I tried could make sense of the LVG. I ended up rebuilding it as

SSD[ LUKS{ ext4 } ]

b/c it’ll never be expanded and the extra layers just caused confusion. Fortunately/ridiculously, the whole thing is a backup volume for the local machine, so I just ran my backup script after wiping, reconfiguring, and mounting it at the same mount point. There’s no indication of hardware failure; for this discussion, let’s assume the hardware is okay.

My question is, could I have, or should I have, done something to simplify recovering from this sort of failure? Possibly w/o losing the month’s worth of daily backups? I’m an LVG/LV novice (as if that isn’t obvious). There seemed to be some options in the pv*, vg*, and lv* commands to restore or repair damaged headers etc., but they apparently expect to be given previously gathered data. The tutorials I initially followed to set this up didn’t go into gathering such info prophylactically or how to use it to recover from damage.
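By default LVM already writes a text copy of each VG's metadata to /etc/lvm/backup and keeps earlier versions in /etc/lvm/archive, but those copies live on the root filesystem rather than alongside the volume they describe. One way to gather the data ahead of time is a small script that runs after any layout change and saves its output somewhere other than the protected disk. The Python below is a minimal sketch, not a complete tool: the device path, VG name, and output directory are hypothetical, it assumes root privileges, and it only wraps existing commands (`cryptsetup luksHeaderBackup`, `vgcfgbackup -f`, `pvs`, `lvs`). After a failure, the saved files are what `cryptsetup luksHeaderRestore`, `vgcfgrestore -f <file> <vg>` and, if the PV label itself is gone, `pvcreate --uuid <uuid> --restorefile <file> <device>` expect to be given.

```python
import subprocess
from datetime import datetime
from pathlib import Path
from typing import Optional


def plan(luks_device: str, vg_name: str, out_dir: Path) -> list[tuple[list[str], Optional[Path]]]:
    """Return (command, stdout_file) pairs.

    stdout_file is None when the command writes its own output file;
    otherwise the command's stdout is saved to that path.
    """
    return [
        # Binary copy of the LUKS header and key slots (restore with luksHeaderRestore).
        (["cryptsetup", "luksHeaderBackup", luks_device,
          "--header-backup-file", str(out_dir / "luks-header.img")], None),
        # Text VG metadata: PV UUIDs, extent maps, LV layout (restore with vgcfgrestore).
        (["vgcfgbackup", "-f", str(out_dir / f"{vg_name}.vg"), vg_name], None),
        # Human-readable listings of PV UUIDs and LV-to-device mapping, for reference.
        (["pvs", "-o", "+pv_uuid"], out_dir / "pvs.txt"),
        (["lvs", "-a", "-o", "+devices"], out_dir / "lvs.txt"),
    ]


def run(luks_device: str, vg_name: str, dest_root: Path) -> Path:
    """Execute the plan into a fresh timestamped directory and return its path."""
    out_dir = dest_root / datetime.now().strftime("%Y%m%d-%H%M%S")
    # A new directory per run: cryptsetup will not overwrite an existing backup file.
    out_dir.mkdir(parents=True, exist_ok=False)
    for cmd, stdout_file in plan(luks_device, vg_name, out_dir):
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
        if stdout_file is not None:
            stdout_file.write_text(result.stdout)
    return out_dir
```

Usage, as root and with hypothetical values: `run("/dev/sdb", "backupvg", Path("/mnt/usb/storage-metadata"))`. Anyone holding the LUKS header backup plus a passphrase that was valid when it was taken can unlock the volume, so that file deserves the same care as the key itself.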

LVM is most useful on servers, where you need flexible space management: dynamically creating, removing, and resizing volumes, making efficient use of non-contiguous space while fully abstracting away the hardware, using RAID for redundancy, and relying on high availability for live OS migration.

A simple backup service might not need all of the above features, but data integrity, compression, and deduplication do matter, and Btrfs can provide them, although ext4 is also a good choice if you value simplicity and stability above all else.

Unfortunately, recovering lost data is hard to discuss unless you have a full disk dump and can gather the necessary diagnostics from it.
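The usual practice is to image the whole device first and experiment only on the copy. A minimal Python sketch of such an image copy follows, assuming a healthy device that is no longer mounted or opened; the function name and paths are hypothetical. It reads in fixed-size chunks and records a SHA-256 digest so the image can be verified later. For disks with read errors, GNU ddrescue is better suited, since this sketch simply stops with an `OSError` at the first unreadable sector.

```python
import hashlib

CHUNK = 4 * 1024 * 1024  # 4 MiB per read; an arbitrary but reasonable size


def image_device(source: str, image: str, chunk_size: int = CHUNK) -> tuple[int, str]:
    """Copy `source` (e.g. a block device) byte-for-byte into a new file `image`.

    Returns the number of bytes copied and the SHA-256 hex digest of the data.
    `source` is opened read-only and is never written to.
    """
    digest = hashlib.sha256()
    copied = 0
    # "xb" refuses to overwrite an existing image file.
    with open(source, "rb") as src, open(image, "xb") as dst:
        while True:
            block = src.read(chunk_size)
            if not block:  # end of device/file
                break
            dst.write(block)
            digest.update(block)
            copied += len(block)
    return copied, digest.hexdigest()
```

Usage, with hypothetical paths: `image_device("/dev/sdb", "/mnt/bigdisk/ssd.img")`. The image must go to a different disk with enough free space, and recovery attempts can then target the image (for example through a loop device) instead of the original.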