2 disks (of 5) in LVM group apparently went bad. Unable to boot

johniliffe · August 2, 2023, 6:24pm

When I shut down last night, the usual messages ended with a group of error messages. This morning I can’t boot. Apparently 2 volumes in the LVM have become unusable (/dev/sdb and /dev/sde). I tried to remove them from the VG but got an error from vgreduce. fdisk does give me the info for /dev/sdb, so the disk must be running and readable. smartctl has been showing read errors on /dev/sdb for about 2 weeks, but other problems delayed me from replacing the disk. What is the best way to approach this?

johniliffe · August 2, 2023, 7:57pm

After several reboots pvscan reports that /dev/sdb has come back online.

Can I safely do a vgreduce --removemissing --mirrorsonly?

I am assuming that since we are looking at /dev/sde here it is probably (?) a mirror.

computersavvy · August 2, 2023, 11:42pm

One can do an initial bit of troubleshooting on RAID devices by looking at the output of cat /proc/mdstat and sudo fdisk -l. If those were posted, we may have a starting point for more suggestions.

Note that /proc/mdstat only reports on software RAID arrays. I ask for it because you mentioned mirroring, which suggests a RAID setup may be involved.

The output of sudo vgscan, sudo pvscan, and sudo lvscan would also help, since you explicitly stated you are using LVM.

Note also that since you are using LVM with 5 disks in the VG, there is a risk that a single drive failure may cause total loss of the data in the entire VG. Mirroring within the VG does not remove this risk; mirroring to another VG may reduce it.

I recommend an immediate backup of the data in that VG, and suggest that in the future, when SMART reports a problem with a device, you not delay in obtaining a replacement.
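As a rough illustration of acting on SMART data early, the sketch below scans the attribute table printed by `smartctl -A` for a few counters that commonly indicate a failing disk (reallocated, pending and offline-uncorrectable sectors). It assumes the classic ATA table layout with RAW_VALUE in the tenth column; NVMe drives or other smartctl versions may format the output differently, and the helper name `warning_attributes` is hypothetical.

```python
# Hypothetical helper: pull the raw values of a few SMART attributes that
# usually signal a failing disk out of `smartctl -A /dev/sdX` text output.
# Assumes the classic ATA attribute table (10 columns, RAW_VALUE last).

WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def warning_attributes(smartctl_output: str) -> dict[str, int]:
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # this skips the header row and any other text.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        if int(fields[0]) not in WATCHED:
            continue
        raw = fields[9]
        if raw.isdigit() and int(raw) > 0:
            found[fields[1]] = int(raw)
    return found
```

One might pass it the text of `sudo smartctl -A /dev/sdb`; any non-empty result is a reason to plan a replacement rather than wait.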

johniliffe · August 3, 2023, 12:28am

Thanks Jeff:

This is a pure vanilla install of Fedora, and it surprised me that the default is LVM, about which I know nothing. There are 5 disks because this machine used to be a server where they were part of a RAID cluster; unfortunately that is not the case here on my workstation.

vgscan:
WARNING couldn’t find device with uuid 5L2Wa1-Aflwr…
WARNING VG fedora is missing PV 5L2Wa1- (same as above and is actually /dev/sdb)

pvscan:
WARNING couldn’t find device with uuid 5L2Wa1 (same as above)
PV /dev/sda2 VG fedora lvm2 [<464.76 GiB / 0 free]
PV [unknown] VG fedora lvm2 [931.51 GiB / 0 free]
PV /dev/sdb1 VG fedora lvm2 [931.51 GiB / 0 free]
PV /dev/sdc1 VG fedora lvm2 [931.51 GiB / 0 free]
PV /dev/sdd1 VG fedora lvm2 [931.51 GiB / 0 free]

lvscan:
WARNING couldn’t find device with uuid 5L2Wa1… (same as above)
VG fedora is missing PV 5L2Wa1- …
ACTIVE /dev/fedora/swap [7.66 GiB] inherit
inactive /dev/fedora/home [<4.04 TiB] inherit
ACTIVE /dev/fedora/root [50.00 GiB] inherit
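To read pvscan output like this programmatically, here is a minimal sketch, assuming the plain-text line format shown above, where a PV whose device cannot be found is listed as `[unknown]`. The function `pv_summary` is hypothetical and not part of LVM.

```python
import re

# Matches pvscan lines such as:
#   PV /dev/sdb1   VG fedora   lvm2 [931.51 GiB / 0    free]
PV_LINE = re.compile(r"^\s*PV\s+(\S+)\s+VG\s+(\S+)")


def pv_summary(pvscan_output: str) -> tuple[list[str], int]:
    """Return (present PV device names, number of missing PVs) from pvscan text."""
    present, missing = [], 0
    for line in pvscan_output.splitlines():
        match = PV_LINE.match(line)
        if not match:
            continue  # skip WARNING lines and the "Total:" summary line
        if match.group(1) == "[unknown]":
            missing += 1  # LVM knows the PV's UUID but cannot find its device
        else:
            present.append(match.group(1))
    return present, missing
```

Applied to the pvscan lines above, it would report four present PVs and one missing, consistent with one of the five drives not being detected at all.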

Sorry for hand-typing all that; the network connection isn’t active, and neither are the USB ports as far as I can see. As for smartctl, I ran it when I got back from vacation on 28 June and searched for how to replace the disk when pvmove can’t work because all disks are full, but everything I found said that the disk would probably keep working for many years with only one bad sector. On top of that, the workstation was out of service for nearly two weeks after a bad Fedora update left it with no usable video until last Monday night, and then I had to catch up. I certainly had no plans to ignore it.

computersavvy · August 3, 2023, 2:18pm

If one installs the Workstation edition on a clean drive, the default is btrfs. If one installs onto drives that are already configured into an LVM VG, that VG and its drives are used as-is, meaning the VG and LVM config remain. The config is read directly from the drives, and since they are part of an intact LVM VG, they will remain LVM unless the user changes that.

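Note that sda, sdb, sdc & sdd are active and what appears to be sde is missing. You stated that the failing drive appears to be sdb; that is not a contradiction, because it is normal for the system to reassign the device names when it must skip a drive during boot, so the drive now called sdb may not be the same physical drive that was sdb before.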

One would need to run dmesg immediately after boot to see which SATA port that device is connected to before disconnecting it…
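Besides dmesg, the udev symlinks under /dev/disk/by-id can help tie a kernel name such as sdb to a physical drive, since their names usually include the model and serial number printed on the drive's label. A minimal sketch, assuming a standard udev setup; `disks_by_id` is a hypothetical helper:

```python
import os


def disks_by_id(by_id_dir: str = "/dev/disk/by-id") -> dict[str, list[str]]:
    """Map kernel names (e.g. 'sdb') to the persistent IDs that point at them.

    udev creates symlinks in /dev/disk/by-id whose names usually contain the
    drive model and serial number, so they stay stable across reboots even
    when the sdX names shift.
    """
    mapping: dict[str, list[str]] = {}
    for name in sorted(os.listdir(by_id_dir)):
        if "-part" in name:
            continue  # partition links; keep only whole-disk links
        target = os.path.realpath(os.path.join(by_id_dir, name))
        kernel_name = os.path.basename(target)
        mapping.setdefault(kernel_name, []).append(name)
    return mapping
```

Calling `disks_by_id()` after each boot and comparing the serial numbers shows whether the drive called sdb is the same physical disk every time.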

Since your data in /home appears to be partly missing or corrupted because of a drive failure, that is a problem if recovering it is critical.
OTOH, it seems that the root LV may be entirely on /dev/sda, which would be a good thing and would not necessarily require a reinstall.

However, as I noted above, when a VG is spread across multiple drives, as yours appears to be, there are multiple points of failure possible and a single failure can take out all the data in the VG (or at least all the LVs that have data on that device).

I use LVM, but I first created a RAID5 array and then created the VG on that array device. Thus a single drive failure still lets me keep the data until I can get the drive replaced.
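To put the multiple-points-of-failure argument into numbers, here is a simplified model, assuming every drive independently fails with the same hypothetical probability `p` over some period and ignoring the rebuild window. A VG spanning n drives with no redundancy is hit if any drive fails, while RAID5 loses data only when two or more drives fail.

```python
from math import comb


def p_loss_linear(n: int, p: float) -> float:
    """Probability that at least one of n drives fails (VG spanning drives, no redundancy)."""
    return 1 - (1 - p) ** n


def p_loss_raid5(n: int, p: float) -> float:
    """Probability that two or more of n drives fail (RAID5 survives exactly one).

    Simplification: ignores the extra exposure while the array is rebuilding.
    """
    p_none = (1 - p) ** n
    p_exactly_one = comb(n, 1) * p * (1 - p) ** (n - 1)
    return 1 - p_none - p_exactly_one
```

With n = 5 and a purely illustrative p = 0.03, the spanned VG comes out at about 14% and the RAID5 array at under 1%.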

To remove the LVM config, one would need to (for each drive) use something like gdisk to write a new, empty partition table to the drive. A new install would then use Fedora's default config on the drives.

On the bright side, it may be that only the LVM headers on that drive have gotten corrupted and the drive itself may be recoverable (though likely not the data it contains).

johniliffe · August 3, 2023, 5:21pm

Thank you for the very detailed explanation Jeff.

Before the install, the disks were a RAID cluster, but LVM seems to have detected them as LVM. I should have paid more attention at the time, but as usual I was in a big rush! The data that isn’t backed up is the e-mail contacts list and my Firefox links, and it would be nice to try to get them back. Also this week’s work in its entirety.

I’m not sure why anyone would use LVM, since I don't see that it brings anything positive to the table, but I guess there must be some rationale somewhere! The installation programme should warn, when you select it, that all data on all disks can be lost on a single disk failure.

Thanks again.

John

computersavvy · August 3, 2023, 5:37pm

What about the info requested from sudo fdisk -l?
It should provide a lot of info about the failing device that may be helpful.

johniliffe · August 3, 2023, 8:26pm

fdisk -l (info for /dev/sdb only)
Disk /dev/sdb 931.51 GiB, 1000204886016 bytes 1953525168 sectors
Disk model ST1000DM003-9YN1
Units: sectors of 1 * 512 = 512 bytes
Sector size: logical/physical 512 bytes/4096 bytes
I/O size: (minimum/optimal) 4096 bytes/ 4096 bytes
Disklabel type: dos
Disk identifier: 0x2012a465

Device Boot Start End Sectors Size Id Type
/dev/sdb1 2048 1953523711 931.56 0e Linux LVM

Note the discrepancy in size between the first and last lines. The same is true for all disks listed.

All of the working disks are similar except for the IDs, and sda is 465.76 GiB with two partitions, sda1 and sda2. sda2 is the LVM partition; sda1 is partition type 03.
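`Disklabel type: dos` means an MBR partition table, and the Id column fdisk prints is a single type byte stored in each partition entry. As a sketch (assuming a classic MBR with the standard 0x55AA signature), the first sector of a disk can be decoded directly, which gives the exact type byte and disk identifier rather than a hand-copied value. `parse_mbr` is a hypothetical helper.

```python
import struct

SECTOR = 512


def parse_mbr(sector0: bytes) -> tuple[int, list[tuple[int, int, int]]]:
    """Return (disk identifier, [(type_id, start_lba, num_sectors), ...]).

    MBR layout: 4-byte disk signature at offset 0x1B8, four 16-byte
    partition entries starting at 0x1BE, and bytes 0x55 0xAA at offset 510.
    """
    if len(sector0) < SECTOR or sector0[510:512] != b"\x55\xaa":
        raise ValueError("not an MBR (missing 0x55AA signature)")
    (disk_id,) = struct.unpack_from("<I", sector0, 0x1B8)
    partitions = []
    for i in range(4):
        entry = 0x1BE + 16 * i
        type_id = sector0[entry + 4]  # the "Id" column in fdisk -l
        start_lba, num_sectors = struct.unpack_from("<II", sector0, entry + 8)
        if type_id != 0:  # type 0 marks an unused slot
            partitions.append((type_id, start_lba, num_sectors))
    return disk_id, partitions
```

Reading the raw device needs root, e.g. `with open("/dev/sdb", "rb") as f: disk_id, parts = parse_mbr(f.read(512))`; opening it read-only does not modify the disk.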

computersavvy · August 3, 2023, 10:53pm

First line:
Disk /dev/sdb 931.51 GiB, 1000204886016 bytes 1953525168 sectors
Last line:
/dev/sdb1 2048 1953523711 931.56 0e Linux LVM

Note that both show about 931.5, which is the size in GiB.

Gigabytes (10^9 bytes) and gibibytes (2^30 bytes) are distinctly different units, as line 1 shows: 931.51 GiB is about 1000 GB.
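The difference can be checked from the sector count fdisk reports. A small worked example, assuming 512-byte logical sectors as listed in the Units line:

```python
SECTOR_BYTES = 512
sectors = 1_953_525_168  # from the "Disk /dev/sdb" line of fdisk -l

size_bytes = sectors * SECTOR_BYTES
size_gb = size_bytes / 1000**3   # decimal gigabytes (as on the drive label)
size_gib = size_bytes / 1024**3  # binary gibibytes (as fdisk prints)

print(size_bytes)              # 1000204886016
print(f"{size_gb:.2f} GB")     # 1000.20 GB
print(f"{size_gib:.2f} GiB")   # 931.51 GiB
```

The ratio 2**30 / 10**9 (about 1.074) is why a 1 TB drive shows up as roughly 931.5 GiB.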

The last line shows the starting and ending sectors of the partition, from which the displayed size follows. The 0e is the partition type.
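Applying the same arithmetic to the partition line, and treating fdisk's End column as inclusive, gives the partition's sector count and size. A short sketch using the start and end sectors quoted above:

```python
SECTOR_BYTES = 512
start, end = 2048, 1_953_523_711  # from the /dev/sdb1 line of fdisk -l

sectors = end - start + 1  # fdisk's End sector is inclusive
size_gib = sectors * SECTOR_BYTES / 1024**3

print(sectors)                # 1953521664
print(f"{size_gib:.2f} GiB")  # 931.51 GiB
```

The result is slightly below the whole-disk figure because the partition starts at sector 2048 (1 MiB in, for alignment) and ends a little before the last sector of the disk.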

johniliffe · August 4, 2023, 4:49pm

Yes, I knew about the difference between 1000 and 2**10 = 1024. The partition type 03 is Xenix User according to my table; Linux RAID would be type fd, so I'm not quite sure how the installer decided that it should be LVM.

Since there seems to be no way to recover the data here, I think I’ll close this issue and reinstall without LVM. I still have no idea why anyone would use it.

Thanks for your comments Jeff, I’m sure you have other work to do!

computersavvy · August 4, 2023, 5:52pm

Just a note:
According to Wikipedia, partition type 0e specifies a FAT16 (LBA) partition.
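For reference, a small lookup of the MBR type IDs discussed in this thread, using the names util-linux fdisk prints for them, plus 0x83 (Linux) and 0x8E (Linux LVM) for comparison. It is only a partial table, written as a sketch; `type_name` is a hypothetical helper.

```python
# Partial table of MBR partition type IDs (names as shown by util-linux fdisk).
MBR_TYPES = {
    0x03: "XENIX usr",
    0x0E: "W95 FAT16 (LBA)",
    0x83: "Linux",
    0x8E: "Linux LVM",
    0xFD: "Linux raid autodetect",
}


def type_name(type_id: int) -> str:
    """Return the fdisk-style name for an MBR type byte, if it is in the table."""
    return MBR_TYPES.get(type_id, f"unknown (0x{type_id:02x})")
```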

johniliffe · August 29, 2023, 2:05am

Jeff V - I installed two brand new disks, and when I installed a new copy of Fedora it gave me great grief trying to format them as separate partitions; it kept changing back to LVM even when I entered the partition information. Finally, after many tries, I was able to get a regular partition, and the computer is working now. Just thought I’d let you know that, at least with my Fedora 27 install DVD, btrfs is not the default. The trick is to rescan the disks before you do anything, even if you have already clicked on “partitions” in the custom screen.

Anyhow, OK now, this is just a heads up on the original problem.

computersavvy · August 29, 2023, 2:32am

Right. It was around F33 that Fedora switched to btrfs as the default.

Good luck going forward.
