I have (had) a logical volume consisting of 3 disks. One of the disks died completely, with no recovery of it possible. Is there any way to recover files on the other two disks? Yes, I know that, obviously, there will be some files that I can’t recover, but there must be some files intact on those disks.
I had done this once before (I think) by forcing the activation of the logical volume with the partial flag, then running e2fsck and letting it fix all errors. This time, though, e2fsck generates many “error while reading inode #xxx” messages (always the same inode), then it eventually runs out of memory (OOM).
It is highly unlikely that you will be able to recover any content from those drives without sending them to a recovery service. Assuming the drives were the same size, 33% of the filesystem has disappeared.
Spanning filesystems across non-redundant drives is extremely risky.
I always recommend using at least raid 1 and preferably raid 5 or 6 if multiple drives are used. That at least provides redundancy and the ability to have a single drive fail (2 with raid 6) without data loss.
Any method that uses multiple drives without providing redundancy simply multiplies the failure risk.
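To put a rough number on "multiplies the failure risk", here is a minimal sketch. It assumes drives fail independently and uses a made-up 3% per-drive failure probability purely for illustration; neither assumption comes from real data.

```python
def any_drive_fails(p_single, n_drives):
    """Probability that at least one of n independent drives fails."""
    return 1.0 - (1.0 - p_single) ** n_drives


# Hypothetical per-drive failure probability, chosen only for illustration.
P_SINGLE = 0.03

for n in (1, 2, 3):
    # Without redundancy, losing any one drive damages the whole volume.
    print(n, "drive(s):", round(any_drive_fails(P_SINGLE, n), 4))
```

With small probabilities the result is close to n times the single-drive figure, which is where the "multiplies" comes from.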
Yes, but there still is 66% of the filesystem left, which I would like to recover. And I was hoping to find some utility to recover that (perhaps the same ones that the recovery service uses).
As I mentioned above, I had done the same thing, but on a much smaller filesystem. e2fsck was able to find the still available files and their directories, and put them in “lost+found”. This time, though, it keeps running out of memory and crashes. If you have any ideas about that problem, they would definitely be useful.
PS: Please note that I do understand the problems/dangers of having a filesystem span multiple disks. With that said, I just now need to figure out how to recover whatever is left.
While it’s true that 66% of the filesystem’s storage is still present, you can’t know how the filesystem’s metadata was laid out or even how the contents were laid out. For example, if LVM balanced all of the data across the three drives, then all of the inodes, block bitmaps, and other structures lost 33% of their content too.
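As a rough illustration of how much metadata goes missing, here is a sketch for the non-striped case. It assumes 4 KiB ext4 blocks (so one bitmap block covers 32768 blocks, i.e. 128 MiB per block group), three hypothetical 1 TiB drives concatenated in order, and the middle one lost. It ignores flex_bg, which can place the metadata of several groups together, so treat it as an estimate only.

```python
BLOCK_SIZE = 4096                             # assumed ext4 block size
BLOCKS_PER_GROUP = 8 * BLOCK_SIZE             # one bitmap block tracks 32768 blocks
GROUP_BYTES = BLOCK_SIZE * BLOCKS_PER_GROUP   # 128 MiB per block group
TIB = 1024 ** 4


def groups_touching(start_byte, end_byte):
    """Block-group numbers overlapping the byte range [start_byte, end_byte)."""
    first = start_byte // GROUP_BYTES
    last = (end_byte - 1) // GROUP_BYTES
    return range(first, last + 1)


# Hypothetical layout: the second of three 1 TiB drives is gone, so the
# bytes [1 TiB, 2 TiB) of the logical volume are unreadable.
lost = groups_touching(1 * TIB, 2 * TIB)
print("block groups lost:", lost[0], "to", lost[-1], "count:", len(lost))
```

Every inode table and bitmap stored in those groups is gone with them, which would fit e2fsck tripping over inodes it cannot read.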
The tools that recovery services use are likely not open source, and may not even be available for purchase (it is their business to do this, after all).
At this point I’d suggest asking about this in a place where ext3/ext4 experts hang out, as they’d be better able to guide you in how to get e2fsck (or any other tool) to handle that broken filesystem. I’m not quite sure where that would be though since the ext3-users mailing list appears to be defunct and the linux-ext4 developers list isn’t really a suitable place for this type of discussion.
Think of a filesystem that is striped across 3 drives with raid 0 as similar to this.
It spreads the data for a single file across all the drives.
What do you envision as recoverable if #3 fails and the only remaining portion of that file is this?
The part that was on #3 is totally gone since the drive failed. You may be able to read the portion that was on #1 & #2 but since it is only a portion of the file (and you do not know what portion it is nor what may be missing) how can you anticipate recovery?
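A minimal sketch of the striped case, to make the gaps concrete. The 64 KiB stripe size, the 400 KiB file, and the assumption that the file starts exactly on a stripe boundary on the first drive are all made up for illustration.

```python
def striped_drive(offset, stripe_size, n_drives):
    """0-based index of the drive holding the byte at `offset` under striping."""
    return (offset // stripe_size) % n_drives


def surviving_chunks(file_size, stripe_size, n_drives, failed):
    """(start, end) byte ranges of the file still readable after drive `failed` dies."""
    ranges = []
    for start in range(0, file_size, stripe_size):
        if striped_drive(start, stripe_size, n_drives) != failed:
            ranges.append((start, min(start + stripe_size, file_size)))
    return ranges


KIB = 1024
# Hypothetical 400 KiB file, 64 KiB stripes, 3 drives, the third drive (index 2) failed.
for start, end in surviving_chunks(400 * KIB, 64 * KIB, 3, failed=2):
    print(f"readable: {start // KIB:4d} KiB to {end // KIB:4d} KiB")
```

Every third chunk is missing, so anything bigger than one stripe comes back with holes in it.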
This all assumes that the filesystem was striped, and I believe the “default” LVM logical volume isn’t. Note that I could be wrong about this. My experience with LVM is that you have to explicitly ask for striping or a RAID level; otherwise it defaults to a linear layout, i.e. JBOD.
My setup is just JBOD, with an ext4 system. The order of the disks is written to the first sector of each disk (in ASCII if you want to check). That tells lvm the order of the disks; each disk, or physical volume as lvm calls it, has a UUID, so lvm can find the other disks easily. Then an ext4 filesystem is formatted to the JBOD, with no striping. So no file, or directory entry, should span disks. Once again, this is what I believe is happening with lvm.
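Here is a small sketch of that linear (JBOD) mapping as I understand it. It assumes each physical volume contributes one contiguous run of extents, in order, and the extent counts are invented. It also pretends a file occupies one contiguous run of extents, which ext4 does not guarantee, and it ignores that a file's inode and directory entry may sit on a different disk from its data.

```python
from bisect import bisect_right
from itertools import accumulate


def linear_pv(extent, pv_extents):
    """0-based index of the PV holding logical extent `extent` in a linear LV."""
    ends = list(accumulate(pv_extents))  # exclusive end extent of each PV
    if not 0 <= extent < ends[-1]:
        raise ValueError("extent outside the logical volume")
    return bisect_right(ends, extent)


def file_intact(first_extent, last_extent, pv_extents, failed_pv):
    """True if no extent in [first_extent, last_extent] lies on the failed PV."""
    return all(
        linear_pv(e, pv_extents) != failed_pv
        for e in range(first_extent, last_extent + 1)
    )


# Hypothetical volume: three PVs of 100 extents each, the middle one (index 1) dead.
PVS = [100, 100, 100]
print(file_intact(0, 50, PVS, failed_pv=1))    # entirely on the first disk
print(file_intact(90, 110, PVS, failed_pv=1))  # crosses onto the dead disk
```

So under this model, data that lives wholly on the surviving disks is still there; the open question is whether enough metadata survives to find it.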
As I said, I might be wrong, as I am no expert of lvm or ext4. This is just from my (humble) experience.
I have never had to try recovery of an LVM system with failure of one device in the VG. It is possible to shrink a VG (only with it inactive, or rather active but unused). Possibly if you can activate the VG while booted from live media you might be able to remove that PV from the VG and resize the VG to fit the remaining 2 devices. That does, of course, assume that all the LVs that remain would be entirely on the remaining devices.
Your description of what fsck told you suggests that the failing inode was in the part of the LV that sat on the missing device.
If you can get to the point where you can run fsck on the LV then it seems likely that you may be able to mount it (read only) and back up the remaining data before you do anything else.
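A rough sketch of that sequence, written as a dry run that only prints the commands and runs nothing. The names vg0, data and /mnt/rescue are placeholders, not taken from this thread. The flags are ones I believe exist in current lvm2, e2fsprogs and mount (partial activation, a check that answers "no" to everything, and a read-only mount that skips the journal), but check the man pages for your versions first. This does nothing about the OOM itself.

```python
import shlex


def recovery_commands(vg, lv, mountpoint):
    """Build (but do not run) a cautious, read-only inspection sequence."""
    dev = f"/dev/{vg}/{lv}"
    return [
        # Activate the VG even though one PV is missing.
        ["vgchange", "-ay", "--activationmode", "partial", vg],
        # Check without writing: -n opens read-only and answers "no" to all prompts.
        ["e2fsck", "-n", dev],
        # Mount read-only without replaying the journal.
        ["mount", "-o", "ro,noload", dev, mountpoint],
    ]


# Placeholder names for illustration only.
for argv in recovery_commands("vg0", "data", "/mnt/rescue"):
    print(shlex.join(argv))
```

Working on copies of the two surviving disks rather than the originals would be safer, if you have the space.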