This question might be based on some misunderstanding of btrfs, so please correct me and/or link to any clear documentation if I’m misunderstanding things.
I want to do a large recursive copy (often a whole subvolume) from one partition to another with behavior equivalent to “in-band” deduplication. I don’t care that the copy operation might take several times as long as a normal copy. I do care that it never, even temporarily, takes as much space as an ordinary copy followed by “out-of-band” deduplication would require.
As an example, consider subvolume X in one btrfs filesystem and subvolume Y in another, where X and Y have mostly the same contents (even if file creation and modification dates differ for otherwise matching files). The goal is to create subvolume Z in the same btrfs filesystem as X, with contents (including directory info) matching Y. Because Z’s file contents will mostly match X’s file contents, the incremental space used by Z should be small.
That example might have a good “backwards” solution and/or some special method I haven’t thought of. I’m interested both in that example and in some less simple cases, so if that example has a better answer than the general answer, I’d like to know both answers. By backwards, I mean snapshot X into Z and then apply all the differences from Y into Z (I don’t know a good tool for applying all the differences in that situation).
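To make the “backwards” idea concrete, here is a minimal sketch, assuming rsync is an acceptable tool for applying the differences. The paths and the helper names (`backwards_copy_commands`, `run`) are hypothetical. The assumption is that `--inplace` together with `--no-whole-file` makes rsync rewrite only the blocks that differ, so unchanged extents in Z stay shared with X; I have not confirmed how well that holds in practice.

```python
import subprocess


def backwards_copy_commands(x, y, z):
    """Build the commands for: snapshot X into Z, then sync Y's differences into Z.

    All three paths are placeholders supplied by the caller.
    """
    src = y.rstrip("/") + "/"  # trailing slash: copy Y's contents, not Y itself
    dst = z.rstrip("/") + "/"
    return [
        # Writable snapshot of X; initially shares all extents with X.
        ["btrfs", "subvolume", "snapshot", x, z],
        # -a preserves metadata, --delete removes files absent from Y,
        # --inplace updates files in place instead of writing a temp copy,
        # --no-whole-file keeps the delta algorithm enabled for local copies.
        ["rsync", "-a", "--delete", "--inplace", "--no-whole-file", src, dst],
    ]


def run(commands, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical mount points, shown only as a dry run.
    run(backwards_copy_commands("/mnt/a/X", "/mnt/b/Y", "/mnt/a/Z"))
```

The dry-run default is deliberate: the commands can be inspected before anything touches the filesystem.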
I’ve installed and tried duperemove and read its documentation, but I don’t understand enough about what it is really doing to know whether it would be practical for an incremental kludge: copy a small fraction at a time, then deduplicate that fraction against a pre-existing database of what was there before. Hopefully, combined with temporarily tweaking the delayed-write features of the destination fs, one could avoid ever actually writing most of the contents that are immediately removed as duplicates.
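The kludge I have in mind could be sketched roughly as follows. This is only an illustration under assumptions: the helper names and the batch limit are hypothetical, and it assumes that `duperemove -d --hashfile=FILE` given only the new files will deduplicate them against entries already recorded in the hashfile, which is exactly the part I am unsure about. It bounds the temporary space to about one batch; it does not cover the delayed-write tweak.

```python
import os
import shutil
import subprocess


def batches_by_size(paths_with_sizes, limit):
    """Group (path, size) pairs into batches whose total size stays within limit.

    A single file larger than limit still gets a batch of its own.
    """
    batch, total = [], 0
    for path, size in paths_with_sizes:
        if batch and total + size > limit:
            yield batch
            batch, total = [], 0
        batch.append(path)
        total += size
    if batch:
        yield batch


def copy_and_dedupe(src_root, dst_root, hashfile, limit, dry_run=True):
    """Copy regular files batch by batch, running duperemove after each batch."""
    files = []
    for dirpath, _dirs, names in os.walk(src_root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                files.append((path, os.path.getsize(path)))
    for batch in batches_by_size(files, limit):
        copied = []
        for src in batch:
            dst = os.path.join(dst_root, os.path.relpath(src, src_root))
            if not dry_run:
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)  # copies data plus timestamps
            copied.append(dst)
        # Assumption: the hashfile already holds hashes of the existing data.
        cmd = ["duperemove", "-d", "--hashfile=" + hashfile] + copied
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

With a hypothetical call such as `copy_and_dedupe("/mnt/b/Y", "/mnt/a/Z", "/var/tmp/z.hash", 1 << 30)`, the dry run prints one duperemove command per roughly 1 GiB batch.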
I understand “in-band” deduplication exists. I think I understand that I would need to rebuild the kernel myself to get that feature. I think I understand that it would be a bad feature to have turned on all the time, and maybe even a bad feature to have turned on for other uses of the fs that occur at the same time as the copy operation that wants it. But maybe turning it on during the copy operation is my best answer.