I’ve got a little home server with the following RAID set up. Any help would be amazing, as it’s fallen over.
A. 500GB HDD - /boot and /
B. 500GB SSD - /
C. 3TB HDD - /family
D. 3TB HDD - /family
E. 3TB HDD - /family
The RAID is software, with mdadm; it’s been running a good while now, and the LVM stuff wasn’t around back then.
A few of these have died, and I’ve replaced them. B was an HDD, but I went for an SSD last time.
A has just died, so I have two questions …
When I buy a new drive and set it up for RAID, how do I go about sorting out a new /boot partition and GRUB? I’ve never had to do this before.
500GB HDDs seem to be hard to come by; is there anything to be said for sticking with an HDD for reliability on /, buying a 1TB drive but only using 500GB of it? I went for the other SSD for speed, and thought I’d pair it with an HDD for reliability. There was some mdadm parameter I set to allow for the different write speeds.
I had read about that. I found my notes from when I set this up, and it was the “writemostly” flag I saw here that suggested to me it was okay to do this: Hybrid HDD + SSD RAID1 on Linux
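From those notes, the create command was something along these lines (device names here are placeholders rather than the exact ones I used):

    # RAID1 with the HDD flagged write-mostly, so reads favour the SSD
    # /dev/sdb2 = SSD partition, /dev/sda2 = HDD partition (placeholders)
    mdadm --create /dev/md0 --level=1 --raid-devices=2 \
        /dev/sdb2 --write-mostly /dev/sda2

    # write-mostly members show up with a (W) next to them in /proc/mdstat
    cat /proc/mdstat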
If that page is wrong, then maybe I’m better off getting a second SSD.
In which case, it’s just a matter of working out how to sort out /boot and GRUB.
I like the info on that page, but I have never tried hybrid RAID.
YMMV but it may be worth a try.
Smaller HDDs are difficult to find any more, and you must be careful about how the drive writes data.
HDDs that use SMR (shingled magnetic recording) are worthless in most cases: once the drive has had a little data written to it, it begins ‘shingling’ the data (overlapping tracks), which really slows down writes and can introduce read errors. Drives with CMR tech are the way to go for most people, and the only type I will accept.
I have a couple of USB HDDs that are that old and still working. Fifteen years ago may have been “peak quality” for HDDs. A few years later, at work, we bought a case of 2TB drives with a 5-year warranty. One failed in under a year and was replaced under warranty; the others started failing just after 5 years – a testament to how well manufacturers have dialled in their processes.
The failure modes are very different. SSDs and spinning drives share (small) risks of some electrical component failing, but SSD wear is based on write cycles, so SSDs used mostly read-only (e.g., web servers) should last longer than drives used in a data pipeline where new data are written, processed to write more new products, then moved to archival storage (tape robot). With SSDs you can see when spare cells are allocated and plan for replacement.
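For example, smartmontools will report the relevant counters, though the attribute names vary by vendor, so treat this as a rough pointer rather than a recipe:

    # overall health plus the full attribute table
    smartctl -a /dev/sda

    # on a SATA SSD look for vendor-specific attributes such as
    # Wear_Leveling_Count or Media_Wearout_Indicator; on NVMe, look at
    # "Percentage Used" and "Available Spare" in the health log
    smartctl -a /dev/nvme0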
So far so good: I failed the old drive with mdadm, removed it, added the new SSD, and the RAID array rebuilt fine.
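Roughly what I ran, for anyone following along (md device and partition names are from memory, so treat them as illustrative):

    mdadm /dev/md0 --fail /dev/sda2      # mark the dying member as failed
    mdadm /dev/md0 --remove /dev/sda2    # pull it out of the array
    # partition the new SSD to match, then add it back in
    mdadm /dev/md0 --add /dev/sdb2
    cat /proc/mdstat                     # watch the rebuild progress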
I am now on to reinstalling GRUB and the EFI partition from a chroot.
Are these instructions anything like what I need?
I managed to chroot and reinstall GRUB, but it didn’t give me a bootable system (it said “no bootable disk”), so I must be missing something. I suspect it’s to do with the EFI partition.
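In case it helps, the rough sequence I followed from the live USB was along these lines (mount points and device names are approximate, not a verbatim transcript):

    # booted the live environment in UEFI mode, then:
    mount /dev/md0 /mnt                  # the root array
    mount /dev/sda1 /mnt/boot/efi        # the new EFI system partition (FAT32)
    for d in /dev /dev/pts /proc /sys; do mount --bind "$d" "/mnt$d"; done
    chroot /mnt

    # inside the chroot
    grub-install --target=x86_64-efi --efi-directory=/boot/efi --recheck
    update-grub          # or: grub-mkconfig -o /boot/grub/grub.cfg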
As I understand it, the EFI partition may cause problems if it is inside the RAID. The same goes for /boot.
It appears those two partitions should be on a single drive, outside of the RAID array.
When using software RAID, the array does not become active until after the kernel is loaded.
This means that first the firmware must read the EFI partition to launch GRUB; then GRUB must reach the grub.cfg in /boot in order to load the kernel. All of this happens before the RAID is activated.
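So for the replacement drive, something like this would keep both of them outside the array (device name and sizes are just placeholders):

    # GPT with a plain ESP, a plain /boot, and the rest as a RAID member
    sgdisk -n1:0:+512M -t1:ef00 -c1:"EFI system partition" /dev/sdX
    sgdisk -n2:0:+1G   -t2:8300 -c2:"boot"                 /dev/sdX
    sgdisk -n3:0:0     -t3:fd00 -c3:"raid member"          /dev/sdX

    mkfs.vfat -F32 /dev/sdX1     # the ESP must be FAT
    mkfs.ext4 /dev/sdX2          # plain /boot
    # /dev/sdX3 then gets added to the md array as usual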