I’ve got a little home server with the following RAID set up. It’s fallen over, so any help would be amazing.
A. 500GB HDD - /boot and /
B. 500GB SSD - /
C. 3TB HDD - /family
D. 3TB HDD - /family
E. 3TB HDD - /family
The RAID is software, with mdadm; it’s been running a good while now, and the LVM stuff wasn’t around back then.
A few of these have died over the years and I’ve replaced them. B was originally an HDD, but I went for an SSD last time.
A has just died, so I have two questions:
When I buy a new drive and set it up for RAID, how do I go about sorting out a new /boot partition and GRUB? I’ve never had to do this before.
500GB HDDs seem to be hard to come by. Is there anything to be said for sticking with an HDD for reliability on /, buying a 1TB drive but only using 500GB of it? I went for the other SSD for speed, and thought pairing it with an HDD would help reliability. There was some mdadm parameter I set to allow for the different write speeds.
I found my notes from when I set this up, and it was the --write-mostly flag I saw here that suggested to me it was okay to do this: Hybrid HDD + SSD RAID1 on Linux
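For reference, the sort of hybrid RAID1 that page describes is built with mdadm’s --write-mostly flag on the HDD member, so normal reads are served from the SSD. A minimal sketch, where /dev/sdb1 (SSD) and /dev/sdc1 (HDD) are placeholders rather than my actual layout:

```shell
# Create a RAID1 where the SSD serves reads and the HDD is write-mostly.
# The internal bitmap plus --write-behind lets writes to the slow HDD lag.
# /dev/sdb1 = SSD partition, /dev/sdc1 = HDD partition (placeholders).
sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 \
    --bitmap=internal --write-behind=4096 \
    /dev/sdb1 --write-mostly /dev/sdc1

# Verify: the HDD member should be listed with a (W) flag.
cat /proc/mdstat
```

Note that --write-mostly applies to the devices listed after it, which is why it sits immediately before the HDD partition.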
If that page is wrong, then maybe I’m better off getting a second SSD.
In which case, it’s just a matter of working out how to sort out /boot and GRUB.
I like the info on that page, but I’ve never tried the hybrid RAID myself.
YMMV, but it may be worth a try.
Smaller HDDs are difficult to find any more, and you must be careful about how the drive records data.
HDDs that use SMR (shingled magnetic recording) are worthless in most cases: once the drive has a bit of data written, it begins “shingling” the data (overlapping tracks), which really slows down writes and can introduce read errors. Drives with CMR (conventional magnetic recording) are the way to go for most uses, and the only type I will accept.
I have a couple of USB HDDs that are that old and still working; 15 years ago may have been “peak quality” for HDDs. A few years later at work we bought a case of 2TB drives with a 5-year warranty. One failed in under a year and was replaced under warranty; the others started failing just after 5 years – a testament to how well manufacturers have dialled in their processes.
The failure modes are very different. SSDs and spinning drives share (small) risks of some electrical component failing, but SSD wear is based on write cycles, so SSDs used mostly read-only (e.g., web servers) should last longer than drives in a data pipeline, where new data are written, processed into new products, then moved to archival storage (a tape robot). With SSDs you can watch spare cells being allocated and plan for replacement.
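As an aside, that spare-cell and wear information can usually be read with smartctl from the smartmontools package. Which attributes a drive reports varies by vendor, so treat the attribute names below as examples rather than a guarantee:

```shell
# Overall health verdict plus the full SMART attribute table:
sudo smartctl -H -A /dev/sda

# On many SATA SSDs the interesting attributes are along the lines of:
#   Wear_Leveling_Count / Media_Wearout_Indicator  - remaining endurance
#   Available_Reservd_Space / Reallocated_Sector_Ct - spare cells consumed
# NVMe drives report "Percentage Used" and "Available Spare" instead:
sudo smartctl -A /dev/nvme0
```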
So far so good: I failed the old drive with mdadm, removed it, added the new SSD, and the RAID array rebuilt fine.
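For anyone finding this later, the sequence I used was roughly the standard one; md0 and the sdX names below are placeholders for your own array and member devices:

```shell
# Mark the dying member as failed, then pull it from the array:
sudo mdadm /dev/md0 --fail /dev/sda2
sudo mdadm /dev/md0 --remove /dev/sda2

# After partitioning the new drive to match, add it back in;
# the array then resyncs automatically:
sudo mdadm /dev/md0 --add /dev/sda2

# Watch the rebuild progress:
cat /proc/mdstat
```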
I am now onto trying to reinstall GRUB and the EFI partition from a chroot.
Are these instructions anything like?
I managed to chroot and reinstall GRUB, but it didn’t give me a bootable system (“no bootable disk”, it said), so I must be missing something. I suspect it’s to do with the EFI partition.
As I understand it, the EFI partition may cause problems if it is in RAID; the same goes for /boot.
It appears those two partitions should be on a single drive outside the RAID array.
When using software RAID, the array does not become active until after the kernel is loaded.
This means the firmware must first read the EFI partition to launch GRUB, and then GRUB must reach the grub.cfg in /boot in order to load the kernel – all of this before the RAID is activated.
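A sketch of what that implies for the new drive on Fedora, with /dev/sda1 standing in for the new EFI System Partition (adjust device names to your layout; this assumes /boot, /boot/efi, /dev, /proc and /sys are already bind-mounted into the chroot):

```shell
# Format a fresh EFI System Partition on the new drive:
sudo mkfs.fat -F32 /dev/sda1

# Inside the chroot, mount it and reinstall the UEFI boot packages.
# (On UEFI Fedora you do not run grub2-install; reinstalling the
# packages lays down shim and GRUB on the ESP.)
mount /dev/sda1 /boot/efi
dnf reinstall grub2-efi-x64 shim-x64

# Register a firmware boot entry pointing at shim on the new disk:
efibootmgr --create --disk /dev/sda --part 1 \
    --loader '\EFI\fedora\shimx64.efi' --label Fedora
```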
That command will permanently break the ability to update kernels unless the user makes repairs or manually repeats the command above with every kernel update.
The file you overwrote is created by GRUB as a pointer to the real grub.cfg at /boot/grub2/grub.cfg. The location of the grub.cfg file was changed around the release of f32 (5 years ago), and most users seem aware of the change.
Once changed as you did with that command, the system will no longer automatically update it, since the file under /boot/efi is static and only a pointer.
The only way I am aware of to repair it at present is to follow these steps. (I was told recently that a new command is being provided with the release of f43 to do the repairs differently, but for now these steps are required.)
Remove both grub.cfg files:
sudo rm /boot/efi/EFI/fedora/grub.cfg /boot/grub2/grub.cfg
Rebuild both files properly:
sudo dnf reinstall grub2-common
Once this is done then the system should perform updates properly.
The grub2-mkconfig command should only be used with the output file set to one of /etc/grub2.cfg, /etc/grub2-efi.cfg (both symlinks) or /boot/grub2/grub.cfg (the real file).
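In other words, the safe invocation on current Fedora looks like this (the symlink forms end up writing the same real file):

```shell
# Correct: regenerate the real config file.
sudo grub2-mkconfig -o /boot/grub2/grub.cfg

# Equivalent, via the symlink:
sudo grub2-mkconfig -o /etc/grub2-efi.cfg

# Wrong: do NOT write to the pointer copy on the ESP:
#   grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg
```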
I suspect that if a new kernel was installed while the update was performed, it may not be bootable.
Did this actually do anything? On recent Fedoras it should refuse to write to /boot/efi and give you an error message explaining the correct place to write the config to (as @computersavvy said).
I don’t remember which, but in either f41 or f42 an error message was added to indicate the command should not be run with that output file. The user I responded to seemed to indicate that the command completed successfully, which is why I gave the proper fix for when that happens.