Clone a complete LUKS drive in a fail-resistant way?

I am very sure my SSD has a hardware error somewhere (it always crashes when opening too much stuff, other SSDs have 0 problems, no logs, …), but I can still barely use it, boot from it and copy my stuff.

When using rsync, I can copy my files to a disk, and if it fails I can just repeat the process 3 times or so and it's done.

But I would like a complete disk image, so that in the best case I could just carry on with another SSD.

Special things:

  • from NVMe to SATA SSD, both “1TB”
  • the action sometimes fails, Clonezilla for example just hung and didn't do anything. I need a way to repeat the process, compare the drives and finish the rest, like rsync does. dd seems to blindly copy everything every time.
  • both are LUKS encrypted and I need to keep them like this

I think ddrescue has that sort of functionality. It copies the blocks it can in a first pass and keeps a log (the mapfile) of which sectors failed to copy. You can re-run it to retry the sectors that failed on the first pass (or not, in which case those sectors are simply never written to the copy, so they stay zero in a fresh image file).
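As a rough sketch of that two-pass approach, here is a small Python helper that only builds the two ddrescue command lines. The device names and the mapfile name are hypothetical placeholders (check yours with lsblk), and the retry count of 3 is an arbitrary choice. The flags used are -f (needed to overwrite a device), -n (skip the slow scraping phase), -d (direct disc access) and -r (retry passes).

```python
import shlex


def ddrescue_passes(source, target, mapfile, retries=3):
    """Return two ddrescue command lines as argv lists.

    Both passes share the same mapfile, which is what lets ddrescue
    resume instead of starting over after a crash or hang.
    """
    # Pass 1: grab everything that reads cleanly, skip the scraping phase.
    first = ["ddrescue", "-f", "-n", source, target, mapfile]
    # Pass 2: return to the areas that failed and retry them a few times.
    second = ["ddrescue", "-f", "-d", f"-r{retries}", source, target, mapfile]
    return [first, second]


if __name__ == "__main__":
    # Hypothetical names: NVMe source, SATA target, mapfile on a third disk.
    for argv in ddrescue_passes("/dev/nvme0n1", "/dev/sda", "rescue.map"):
        print(shlex.join(argv))
```

Keep the mapfile on a third file system, not on either of the two drives. Since this copies the whole device, the LUKS header and the ciphertext come along unchanged, so the target stays encrypted. The target has to be at least as large as the source, which is worth checking when both are only nominally “1TB”.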

Edit: If you are redoing your system drive, you might want to consider using two SSDs and mirroring them to avoid a potential repeat of this situation. Modern file systems (Btrfs and ZFS) can “self heal” that sort of error when there are two (or more) physical drives in a mirrored configuration. If a bad sector is encountered, the file system will just use the copy of that sector from the other drive when reading and it will also write a new copy to the original drive to restore the redundancy of that sector. With mirroring, you can also get better read throughput, since reads can be spread across the two drives.


Thanks! It seems to have worked and I was able to use the system, but then after a reboot I got a kernel panic…

I might try to clone again. Do you know if Clonezilla has ddrescue integrated?

Btw I think the issue was that I bent the damn NVMe drive using a silicone cooling pad. Ironic.


I don’t know about clonezilla. IIRC (which is a big if), it uses partclone underneath. The partclone command tries to work at the file system level, not at the block level, but it will fall back to using something akin to dd if it cannot decode the file system (I don’t know if it is closer in implementation to the plain dd or to ddrescue).

For data recovery, I prefer ddrescue. There’s a manual that helps you understand how to use it effectively. It is best for copying an entire disk or partition, i.e. the bottommost layer of the storage stack. So if it’s an encrypted partition, ddrescue is lifting ciphertext, not plaintext.
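To see how much is still missing between runs, you can read the mapfile yourself. This is a minimal sketch that assumes the layout described in the ddrescue manual: comment lines starting with #, one status line, then one "pos size status" line per block, with sizes written in hex as ddrescue does. The function name and the sample text are made up for illustration.

```python
from collections import Counter

# Status characters used in ddrescue mapfile block lines.
STATUS_NAMES = {
    "?": "non-tried",
    "*": "non-trimmed",
    "/": "non-scraped",
    "-": "bad-sector",
    "+": "finished",
}


def summarize_mapfile(text):
    """Return total bytes per block status found in a mapfile's text."""
    totals = Counter()
    seen_status_line = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not seen_status_line:
            # The first data line is the current position/status, not a block.
            seen_status_line = True
            continue
        _pos, size, status = line.split()[:3]
        totals[STATUS_NAMES[status]] += int(size, 16)
    return dict(totals)


if __name__ == "__main__":
    sample = """# Mapfile (hypothetical sample)
0x00000400  ?  1
0x00000000  0x00000200  +
0x00000200  0x00000200  -
0x00000400  0x00000400  ?
"""
    for name, size in summarize_mapfile(sample).items():
        print(f"{name}: {size} bytes")
```

Anything other than "finished" means another ddrescue run with the same mapfile still has work to do.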

To clone a filesystem, clonezilla should work OK.

In any case, btrfs scrub the file system after you’ve cloned it. Btrfs checksums metadata and data, so it can unambiguously report whether there’s corruption in the duplicate.
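A sketch of that check on the cloned drive, again only building the commands. It assumes the clone holds Btrfs directly inside a LUKS container (no LVM in between); the partition, mapper name and mount point are hypothetical. The -B flag keeps scrub in the foreground so it prints its summary when it finishes.

```python
import shlex


def verify_clone_commands(luks_device, mapper_name, mountpoint):
    """Return the commands to unlock, mount and scrub a cloned Btrfs volume."""
    return [
        # Unlock the LUKS container on the clone (asks for the passphrase).
        ["cryptsetup", "open", luks_device, mapper_name],
        # Mount the decrypted file system.
        ["mount", f"/dev/mapper/{mapper_name}", mountpoint],
        # Verify all data and metadata checksums, in the foreground.
        ["btrfs", "scrub", "start", "-B", mountpoint],
    ]


if __name__ == "__main__":
    # Hypothetical names; adjust to the partition layout of your clone.
    for argv in verify_clone_commands("/dev/sda2", "clone", "/mnt/clone"):
        print(shlex.join(argv))
```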

(Extraneous btrfs specific feature info)

My preference for cloning a file system when using Btrfs is to use the Btrfs seed+sprout feature. It clones at the file system extent level using the balance code path, so it should be faster than either block or file level copies, while preserving all file system metadata.

Possibly the non-obvious trick here is the removal of the seed. When the seed is removed it tells Btrfs to replicate extents to the sprout, creating a clone. This can take a while, so the command won’t return to a prompt right away. The one difference following completion is the sprout will have a different file system UUID from the seed.
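For reference, the seed+sprout sequence looks roughly like this. It is a sketch that only builds the commands; the device paths and mount point are hypothetical, and with LUKS they would be the unlocked /dev/mapper devices rather than the raw partitions. The seed file system must be unmounted when the seed flag is set.

```python
import shlex


def seed_sprout_commands(seed_dev, sprout_dev, mountpoint):
    """Return the command sequence for cloning Btrfs via seed+sprout."""
    return [
        # Mark the (unmounted) original file system as a read-only seed.
        ["btrfstune", "-S", "1", seed_dev],
        # A seed device mounts read-only.
        ["mount", seed_dev, mountpoint],
        # Add the new drive as the writable sprout.
        ["btrfs", "device", "add", sprout_dev, mountpoint],
        # Make the combined file system writable.
        ["mount", "-o", "remount,rw", mountpoint],
        # Removing the seed replicates its extents to the sprout.
        ["btrfs", "device", "remove", seed_dev, mountpoint],
    ]


if __name__ == "__main__":
    # Hypothetical mapper names for the unlocked old and new drives.
    steps = seed_sprout_commands("/dev/mapper/old", "/dev/mapper/new", "/mnt")
    for argv in steps:
        print(shlex.join(argv))
```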

A non-obvious advantage of this method of cloning is that you can optionally prune the file system before replication. An example use case: I want an identical copy of the original except for several junk snapshots. After the rw,remount step, I delete the junk snapshots, which results in metadata being COWed only to the sprout device. The snapshots still exist on the ro seed; they just appear to be deleted because seed+sprout are currently mounted as one file system, like an overlay file system. Upon removing the seed, there’s no need to replicate the deleted junk snapshots to the sprout, so it saves some time and space. An aggressive example would be to delete everything but my home subvolume and its snapshots. Why replicate the operating system and applications when I can just reinstall them if needed?