Btrfs send/receive is extremely slow and incremental sends are as slow as initial send

Although I’ve never tried to back up a live filesystem with dd or partclone, it’s not recommended; another approach would be preferable, or unmount the partitions before cloning with partclone.btrfs. partclone.btrfs should take less than 2 days.

About the slowness you reported, I asked an LLM (Grok):

Your experience with btrfs send/receive taking significantly longer than expected, even with the -p (parent snapshot) option, and the comparison to dd being much faster, points to several factors related to how btrfs send/receive works, the nature of your data, and the specifics of your setup. Below, I’ll analyze why btrfs send/receive is slow in your case, why the -p option didn’t save time, and why dd was faster, while also considering the rsnapshot context. I’ll conclude with recommendations to optimize your backup process.

Why btrfs send/receive Is Slow

Nature of BTRFS send/receive:

Metadata-Intensive Operation: Unlike dd, which performs a raw block-level copy of the entire disk (used and unused blocks), btrfs send/receive operates at the filesystem level, serializing and transferring the BTRFS metadata and data for a subvolume or snapshot. This involves processing complex BTRFS structures like extents, subvolumes, and CoW (copy-on-write) references, which is computationally intensive, especially for large datasets.
Single-Threaded Operation: btrfs send is single-threaded, meaning it doesn’t fully utilize multi-core CPUs or parallelize I/O operations. This can bottleneck performance on large filesystems.
Sequential Processing: btrfs send processes the filesystem’s metadata and data sequentially, walking through the B-tree to generate a stream of operations (e.g., create files, set permissions, write data). For 6.4 TB of data, this process is inherently slower than a block-level copy like dd.
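A quick way to see how much of the time goes into just generating the send stream (as opposed to writing on the target) is to pipe the stream into pv and discard it. This is only a rough probe; the snapshot path below is a placeholder for one of your read-only snapshots, and pv must be installed:

# Generate the send stream without receiving it; pv reports throughput on stderr
btrfs send /mnt/pool/.snapshots/snap | pv -bart > /dev/null

If this alone crawls at a few MB/s, the bottleneck is metadata walking on the source, not the target disk.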


Impact of rsnapshot Usage:

Hard Links and Directory Structure: rsnapshot uses hard links to create incremental backups, which results in many files sharing the same data blocks but appearing as separate entries in the filesystem. On a BTRFS filesystem, every hard link is a separate directory entry pointing at the same inode, and btrfs send must emit a link operation for each of them, which significantly increases the amount of metadata it needs to serialize and transfer.
Large Number of Files: If your 6.4 TB backup contains millions of files (common with rsnapshot due to its incremental snapshot approach), the metadata overhead becomes a major bottleneck. For example, each file’s inode, permissions, timestamps, and extended attributes must be processed, which can take substantial time, even if the actual data changes are small.
Fragmentation: Backups created by rsnapshot may lead to fragmented data on the BTRFS filesystem, especially if files are frequently modified or rotated. Fragmented extents increase the complexity of the send operation, as BTRFS must handle scattered data blocks.
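To get a feel for how much metadata btrfs send has to walk, you could compare the number of directory entries with the number of distinct inodes in the rsnapshot tree (the path below is a placeholder):

# Every hard link counts as a separate entry here
find /mnt/backup/rsnapshot -type f -printf x | wc -c
# Distinct inodes, i.e. the actual files behind all those hard links
find /mnt/backup/rsnapshot -type f -printf '%i\n' | sort -u | wc -l

A large gap between the two numbers means most of the send time goes into replaying link metadata rather than copying data.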


Disk and System Performance:

I/O Bottlenecks: External disks (especially USB drives) often have slower read/write speeds compared to internal drives. If your 8 TB external disk is connected via USB 3.0 (max ~500 MB/s theoretical, often lower in practice), sequential read/write operations may be limited by the disk’s performance or USB interface.
CPU Overhead: btrfs send requires significant CPU resources to process metadata and generate the send stream. If your system has a low-performance CPU or is under load, this can slow down the process.
Target Disk Performance: The receiving disk (also BTRFS) may introduce additional overhead due to BTRFS’s CoW nature, which requires allocating new extents and updating metadata for every write. If the target disk is fragmented or nearly full, this can further slow down btrfs receive.
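To rule the USB disk in or out as the bottleneck, a raw sequential read test on the device (replace /dev/sdX with your external disk) gives a baseline to compare against the observed send/receive throughput:

# Rough sequential read benchmark of the raw device
hdparm -t /dev/sdX
# Alternative without hdparm: read ~8 GB and report the rate
dd if=/dev/sdX of=/dev/null bs=4M count=2000 status=progress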


Initial Send Taking Two Weeks:

For 6.4 TB of used space, the initial btrfs send/receive operation must transfer all data and metadata for the subvolume or snapshot. If your backup contains many small files or hard links (as is typical with rsnapshot), the metadata processing dominates the time taken, not the actual data transfer.
Example: If your filesystem has millions of files, btrfs send might spend most of its time serializing metadata rather than transferring data. A rough estimate for 6.4 TB with many small files could mean processing tens or hundreds of millions of metadata entries, which can take days or weeks on consumer hardware.
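For perspective, two weeks for 6.4 TB corresponds to an average throughput of only about
$$\frac{6,400,000 \, \text{MB}}{14 \times 86,400 \, \text{s}} \approx 5.3 \, \text{MB/s},$$
which is far below what even a USB 2.0 disk can sustain, so the limit is almost certainly metadata processing rather than raw I/O.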




Why the -p Option Didn’t Save Time
The -p option in btrfs send is designed to perform an incremental send, transferring only the differences between the parent snapshot (specified with -p) and the current snapshot. However, in your case, the incremental send took as long as the initial send (two weeks). Here’s why this might have happened:

High Metadata Overhead from rsnapshot:

Even if the data changes are minimal, rsnapshot creates new hard-linked directory structures for each backup rotation. These new directories and hard links generate new metadata entries in the BTRFS filesystem, even if the underlying file data hasn’t changed. btrfs send -p must process all these metadata changes, which can be nearly as time-consuming as the initial send if the number of files or hard links has increased significantly.
For example, if rsnapshot created a new backup snapshot with thousands of new directories or hard links, btrfs send -p must serialize all the metadata for these new structures, negating much of the expected time savings.


Large Snapshot Differences:

The effectiveness of -p depends on the similarity between the parent snapshot and the current snapshot. If significant changes occurred (e.g., new files, deleted files, or rotated backups in rsnapshot), the incremental send may still need to process a large portion of the filesystem’s metadata or data.
If rsnapshot rotated older backups or created new ones, it could have modified or added many files, increasing the diff size. For 6.4 TB of data, even a small percentage of changes (e.g., 5%) could mean hundreds of gigabytes of data or millions of metadata entries to process.
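One rough way to gauge the size of the delta (the paths and the generation number are placeholders) is to ask BTRFS which file data changed since the parent snapshot’s generation. Note that find-new only reports new or rewritten file extents, not metadata-only changes such as the fresh hard links rsnapshot creates, so a short list here combined with a still-slow send would itself point at metadata as the cost:

# Read the parent snapshot's generation number
btrfs subvolume show /source/parent_snapshot | grep -i generation
# List files whose data changed after that generation in the current snapshot
btrfs subvolume find-new /source/current_snapshot <generation>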


Parent Snapshot Not Optimal:

The -p option requires the parent snapshot to exist on both the source and target filesystems. If the parent snapshot wasn’t properly set up on the target (e.g., if it was modified or not correctly referenced), btrfs send might fall back to processing more data than necessary, behaving closer to a full send.
Additionally, if the parent snapshot itself is fragmented or contains a large number of metadata entries, the diff calculation can be slow.


Inefficient Incremental Processing:

btrfs send -p still needs to walk the BTRFS B-tree to compute differences between snapshots, which is computationally expensive for large filesystems with many files or complex subvolume structures. This process can be nearly as slow as a full send if the metadata changes are extensive, as is common with rsnapshot backups.




Why dd Was Much Faster

Block-Level Copy: dd copies the entire disk at the block level, including used and unused blocks, without interpreting the filesystem’s metadata. For an 8 TB disk, dd transfers data sequentially at the disk’s maximum I/O speed (e.g., ~100–200 MB/s for an external USB 3.0 drive), which explains why it took ~2 days.

Calculation: 8 TB = 8,000 GB. At 150 MB/s (a realistic speed for USB 3.0), copying 8 TB takes approximately:
$$\frac{8,000,000 \, \text{MB}}{150 \, \text{MB/s}} = 53,333 \, \text{seconds} \approx 14.8 \, \text{hours}$$
Accounting for overhead (e.g., disk latency, USB bottlenecks), 2 days is reasonable.


No Metadata Processing: Unlike btrfs send/receive, dd doesn’t process BTRFS metadata, subvolumes, or snapshots, so it avoids the computational overhead of serializing filesystem structures.
No CoW Overhead: dd writes data directly to the target disk without BTRFS’s CoW mechanism, which reduces write amplification on the target.

However, dd has significant downsides for BTRFS:

Copies Unused Space: dd copies the entire 8 TB disk, including unused space (1.6 TB in your case), whereas btrfs send/receive only transfers used data (6.4 TB).
No Incremental Backups: dd cannot perform incremental backups, so every backup requires a full copy, making it less efficient for regular updates.
UUID Conflicts: The cloned disk will have the same BTRFS UUID, requiring btrfstune -u to change it before mounting both disks simultaneously.
Potential for Corruption: If the source disk is active during dd, the copy may be inconsistent, especially for BTRFS’s complex metadata.


Recommendations to Optimize btrfs send/receive
To address the slow performance of btrfs send/receive and make incremental backups faster, try the following:

Optimize rsnapshot Configuration:

Reduce Hard Links: If possible, adjust rsnapshot to reduce the number of hard-linked files or directories per backup rotation. For example, limit the number of retained snapshots or use rsnapshot’s rsync --link-dest option more efficiently to minimize metadata changes.
Use BTRFS Snapshots Instead of rsnapshot: Since you’re using BTRFS, consider replacing rsnapshot with native BTRFS snapshots for backups. BTRFS snapshots are more efficient (they share data via CoW) and integrate better with btrfs send/receive. Tools like btrbk or snapper can automate snapshot creation and sending:
btrbk run
btrbk is optimized for incremental btrfs send/receive and can significantly reduce metadata overhead compared to rsnapshot.
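As a rough illustration (not a drop-in configuration), a minimal btrbk.conf might look like the sketch below; the pool and backup mount points are placeholders, and the snapshot directory has to be created beforehand:

# /etc/btrbk/btrbk.conf -- minimal sketch, adjust paths and retention to taste
snapshot_preserve_min   2d
snapshot_preserve       14d
target_preserve_min     no
target_preserve         20d 10w

volume /mnt/btr_pool
  snapshot_dir btrbk_snapshots
  subvolume data
    target send-receive /mnt/backup/btrbk

btrbk -n run performs a dry run, which is a safe way to check the configuration before the first real transfer.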


Defragment the Source Filesystem:

Fragmentation can slow down btrfs send/receive. Before running send, defragment the source subvolume or snapshot:
btrfs filesystem defrag -r /source/subvolume
Note: Defragmenting can break CoW sharing for snapshots, so run it only on a snapshot you plan to send, and avoid defragmenting the live filesystem if you rely on snapshots.
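If you want to check whether fragmentation is actually a problem before spending time on defragmentation, filefrag (from e2fsprogs, works on BTRFS) reports a file’s extent count; the path is a placeholder for one of your large, frequently rewritten files:

# A high extent count on a large file suggests heavy fragmentation
filefrag /mnt/backup/rsnapshot/some_large_file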


Ensure Proper Parent Snapshot Setup:

Verify that the parent snapshot used with -p exists on both the source and target filesystems and is identical (not modified). For example:
btrfs send -p /source/parent_snapshot /source/current_snapshot | btrfs receive /target/backup

If the parent snapshot is missing or inconsistent, btrfs send may process more data than necessary. Use btrfs subvolume show to check snapshot details.
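To verify the relationship, you could compare the UUIDs on both sides; the source snapshot’s UUID should appear as the Received UUID of its copy on the target (paths are placeholders):

# On the source: note the parent snapshot's UUID
btrfs subvolume show /source/parent_snapshot | grep -i uuid
# On the target: the received copy should show that value as "Received UUID"
btrfs subvolume show /target/backup/parent_snapshot | grep -i uuid

The copy on the target must also still be read-only; if it was ever made writable, incremental receives against it will no longer work.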


Use Compression:

Enable compression for btrfs send/receive to reduce the amount of data transferred, especially if your data is compressible (e.g., text-based files). Pipe the send stream through a compression tool like zstd:
btrfs send /source/snapshot | zstd | ssh user@remote 'zstd -d | btrfs receive /target/backup'
This can reduce transfer time if the bottleneck is I/O rather than CPU.
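Since in your case the target is a locally attached disk rather than a remote host, compression mostly adds CPU work without saving any I/O; a local pipeline with pv for progress visibility (paths are placeholders) may be the more useful variant:

btrfs send -p /source/parent_snapshot /source/current_snapshot | pv | btrfs receive /target/backup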


Parallelize Transfers:

If your backup consists of multiple subvolumes, run btrfs send/receive for each subvolume in parallel to utilize multiple CPU cores or disk I/O bandwidth. For example:
btrfs send /source/subvol1 | btrfs receive /target/backup1 &
btrfs send /source/subvol2 | btrfs receive /target/backup2 &
wait
Be cautious with parallelization, as it may overload your disk or USB interface.


Use Faster Hardware:

Faster CPU: If your CPU is a bottleneck, upgrading to a faster processor or offloading the task to a more powerful system can help.
Faster Connection: If the external disk is connected via USB 3.0, ensure the cable and port support full USB 3.0 speeds (up to 5 Gbps). Switching to a USB 3.1/3.2 or Thunderbolt connection (if available) can further improve performance.
Internal Disk: If possible, use an internal disk as the target for btrfs receive to avoid USB bottlenecks, then copy the data to the external disk.
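It is also worth confirming that the enclosure actually negotiated a USB 3.x link rather than falling back to USB 2.0 (480 Mbps); lsusb shows the negotiated speed per device:

lsusb -t

A device listed at 480M instead of 5000M (or higher) caps throughput at roughly 40 MB/s in practice, regardless of the disk itself.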


Use btrbk or snap-sync for Automation:

Tools like btrbk or snap-sync are designed to optimize btrfs send/receive for backups. They handle snapshot management, incremental sends, and cleanup automatically, reducing metadata overhead and ensuring consistency. btrbk takes its sources and targets from its configuration file (see the minimal btrbk.conf sketch earlier) rather than from command-line options, so a run is simply:
btrbk run
These tools are particularly effective for rsnapshot-style backups on BTRFS.


Consider Alternative Tools for Large Initial Transfers:

Since the initial btrfs send/receive is slow, you could use dd for the initial full backup to create a BTRFS filesystem on the target disk, then use btrfs send/receive with -p for subsequent incremental updates. Steps:

Copy the entire disk with dd:
dd if=/dev/source_disk of=/dev/target_disk bs=4M status=progress

Change the UUID of the target disk:
btrfstune -u /dev/target_disk

Mount the target filesystem and use btrfs send -p for incremental updates.
This hybrid approach leverages dd’s speed for the initial copy and btrfs send/receive’s efficiency for incremental backups.






Why dd Isn’t a Long-Term Solution
While dd was faster for your initial copy, it’s not ideal for ongoing backups:

No Incremental Support: Each dd operation copies the entire 8 TB, even if only a small amount of data changed, making it impractical for frequent backups.
Wastes Space: dd copies unused space (1.6 TB in your case), whereas btrfs send/receive only transfers used data.
Risk of Inconsistency: If the source disk is active during dd, the copy may be inconsistent, especially for BTRFS metadata.
No Subvolume/Snapshot Awareness: dd operates below the filesystem, so it cannot copy or refresh individual subvolumes or snapshots; every backup is a clone of the whole device, which doesn’t fit an rsnapshot-style incremental workflow.


Conclusion
The slow performance of btrfs send/receive (both initial and incremental) is primarily due to:

Metadata Overhead: rsnapshot’s use of hard links creates a large number of metadata entries, which btrfs send must process individually.
Single-Threaded Processing: btrfs send doesn’t fully utilize modern hardware, making it slower than dd’s raw block copy.
Incremental Send Issues: The -p option didn’t save time because rsnapshot likely introduced significant metadata changes (e.g., new directories or hard links), causing btrfs send to process nearly as much metadata as a full send.

To improve performance:

Switch from rsnapshot to native BTRFS snapshots with tools like btrbk or snap-sync to reduce metadata overhead.
Defragment the source filesystem before sending.
Use compression or parallelization to optimize data transfer.
Consider a hybrid approach: use dd for the initial copy and btrfs send/receive for incremental updates.

If you want to stick with rsnapshot and btrfs send/receive, try reducing the number of hard-linked snapshots in rsnapshot and verify that the parent snapshot is correctly set up for incremental sends. If you share more details about your setup (e.g., number of files, snapshot frequency, hardware specs, or whether the disks are USB), I can provide more tailored optimizations!