My most recent DNF run actually reported significant savings due to the use of DeltaRPMs, thanks in large part to many texlive packages that compress well and include a lot of files that don’t change from build to build. But that experience is far from the norm, and it got me thinking about the utility of DeltaRPMs in general.
First I whipped up some stats, surveying all of the /var/log/dnf.log files I happen to have lying around on my machines. (In both cases, that comprised the last 2-4 months of logs, plus some older rotations from 2019 that never got reaped by logrotate for reasons.)
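For anyone who wants to run the same survey, here is a minimal Python sketch of how such a table could be pulled out of the logs. It is not the script I used. It assumes each relevant log line starts with an ISO-style date and contains a message of the form "Delta RPMs reduced X MB of updates to Y MB" (or "Failed Delta RPMs increased ..." when the rebuild cost more than it saved); check that wording against your own dnf.log before trusting it. The names PATTERN and parse_delta_lines are made up for this example, and the percentage is recomputed from the two sizes rather than read from the log.

```python
import re

# Assumed line shape: "<YYYY-MM-DD>T<time> <LEVEL> [Failed ]Delta RPMs
# reduced|increased <before> MB of updates to <after> MB (...)".
# This is an assumption about the log format, not a documented contract.
PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\S*\s.*?"
    r"Delta RPMs (?:reduced|increased) "
    r"(?P<before>\d+(?:\.\d+)?) MB of updates to "
    r"(?P<after>\d+(?:\.\d+)?) MB"
)


def parse_delta_lines(lines):
    """Return (date, before_mb, after_mb, saved_pct) for each matching line."""
    rows = []
    for line in lines:
        m = PATTERN.match(line)
        if m is None:
            continue  # not a DeltaRPM summary line
        before = float(m["before"])
        after = float(m["after"])
        # Negative when the deltas ended up costing more than a full download.
        saved_pct = 100.0 * (before - after) / before if before else 0.0
        rows.append((m["date"], before, after, saved_pct))
    return rows
```

Feeding it the lines of a log file, for example `parse_delta_lines(open("/var/log/dnf.log"))`, yields one tuple per upgrade that reported DeltaRPM statistics.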
On my Fedora 32 desktop:
Date        Before (MB)  After (MB)    Saved
--------------------------------------------
2019-07-01      1047.10     1002.70     4.1%
2019-07-05        30.40       29.60     2.1%
2019-07-05       106.20      106.10     0.1%
2019-07-06       132.60       86.30    34.1%
2019-07-09        93.60       92.70     0.1%
2019-07-09        46.80       39.80    14.1%
2019-07-10         1.30        0.40    68.1%
2019-07-13       125.10      121.40     2.1%
2019-07-14        32.60        5.40    83.1%
2019-07-15       101.80       68.80    32.1%
2019-07-26      1080.70     1064.00     1.1%
2020-10-09       238.80      221.80     7.1%
2020-10-11        20.00       10.60    47.1%
2020-10-21      1497.60     1496.60     0.1%
2020-10-27       446.90      434.80     2.1%
2020-11-02       246.80      203.40    17.1%
2020-11-05       522.00      517.40     0.1%
2020-11-08       302.00      300.20     0.1%
2020-11-19       203.50      197.50     2.1%
2020-11-21       570.80      570.30     0.1%
2020-11-23       119.50      104.60    12.1%
2020-11-24         2.20        1.90    13.1%
2020-11-26       112.30      117.00    -4.1%
2020-11-29        93.70       86.20     8.1%
2020-12-02       397.70      396.80     0.1%
2020-12-04       428.20      212.60    50.1%
And on my Fedora 33 fileserver:
Date        Before (MB)  After (MB)    Saved
--------------------------------------------
2019-07-09       203.30      203.00     0.1%
2019-07-13       253.00      252.90     0.1%
2019-07-14        12.30        4.40    63.1%
2019-07-15        73.10       59.90    17.1%
2019-07-26       914.90      914.30     0.1%
2020-06-08       124.00      120.80     2.1%
2020-06-15       336.70      331.80     1.1%
2020-06-16        96.40       82.50    14.1%
2020-06-22       172.80      166.40     3.1%
2020-06-22        66.50       65.50     1.1%
2020-06-24       129.00      122.60     4.1%
2020-06-27         6.40        4.90    23.1%
2020-07-02       157.20      149.40     4.1%
2020-07-02       157.20      157.80     0.1%
2020-07-12       526.80      509.80     3.1%
2020-07-17       566.50      529.60     6.1%
2020-07-18        25.70       13.70    46.1%
2020-08-06       634.10      616.40     2.1%
2020-08-09       156.30      122.50    21.1%
2020-08-10       139.10      138.90     0.1%
2020-08-12       346.70      346.70     0.1%
2020-08-16       127.70      126.00     1.1%
2020-08-20       302.40      264.90    12.1%
2020-08-26       529.90      505.10     4.1%
2020-08-28        36.00       29.60    17.1%
2020-08-29         7.60        0.40    94.1%
2020-09-02       173.30      142.10    17.1%
2020-09-04       115.90      115.00     0.1%
2020-09-11       704.50      695.30     1.1%
2020-09-22       547.80      508.40     7.1%
2020-09-24       124.00      107.20    13.1%
2020-10-08       695.20      693.00     0.1%
2020-10-08        17.00        9.40    44.1%
2020-10-11       340.00      332.10     2.1%
2020-10-12         4.70        1.00    78.1%
2020-10-21       856.90      813.00     5.1%
2020-10-26       263.60      263.30     0.1%
2020-11-05       614.10      610.30     0.1%
2020-11-08        84.40       82.50     2.1%
2020-11-10       100.30       90.30     9.1%
2020-11-21        65.80       65.60     0.1%
2020-11-23       180.10      178.60     0.1%
2020-11-24         6.10        3.90    35.1%
2020-11-26        74.30       74.30     0.1%
2020-11-29       469.70      452.30     3.1%
The stats paint a bad enough picture all by themselves: most of the time, the savings from using deltarpms are negligible, and occasionally even negative due to rebuild failures.
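To put a rough number on "negligible", the per-run figures can be rolled up into an overall saving weighted by download size, a median per-run saving, and a count of runs that fell under some cutoff. The sketch below is only an illustration: the summarize helper and the 5% cutoff are my own choices for this example, and the sample data is simply the last six desktop runs from the table above.

```python
from statistics import median


def summarize(runs, threshold_pct=5.0):
    """Summarize (before_mb, after_mb) pairs, one pair per upgrade run."""
    runs = list(runs)
    total_before = sum(before for before, _ in runs)
    total_after = sum(after for _, after in runs)
    # Per-run savings in percent; negative means the deltas cost extra.
    per_run = [100.0 * (b - a) / b for b, a in runs if b > 0]
    return {
        "runs": len(runs),
        "overall_pct": 100.0 * (total_before - total_after) / total_before,
        "median_pct": median(per_run),
        "runs_below_threshold": sum(1 for p in per_run if p < threshold_pct),
    }


# Last six desktop runs from the table above: (before MB, after MB).
recent_desktop = [
    (119.50, 104.60),
    (2.20, 1.90),
    (112.30, 117.00),
    (93.70, 86.20),
    (397.70, 396.80),
    (428.20, 212.60),
]

if __name__ == "__main__":
    print(summarize(recent_desktop))
```

The overall figure weights each run by its size, so one big lucky run (like the texlive one) can dominate it; the median and the below-threshold count say more about what a typical upgrade looks like.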
But what the stats don’t show is the amount of time and CPU cycles consumed by the use of deltas, which is significant.
For those who don’t know, here’s a quick sketch of the way deltarpms work:
1. After a new RPM is generated, it gets compared with one or several previous RPMs for that same package.
2. For each previous RPM, any files that were unchanged from that previous package get filtered out of the new package. The remaining (changed) files are packaged into a .drpm, which is specific to both the new package version and the previous version it was diffed with.
3. When you run dnf upgrade, DNF looks at all of the current versions of your packages that are being upgraded from, and if there’s a deltaRPM between that version and the version being installed, that gets downloaded instead of the full RPM.
4. (Here’s the really damning part) When DNF receives a deltaRPM, before it can install the package it has to recreate the full RPM to be installed. It does this by grabbing the installed files on your system (from the previous RPM) and combining them with the files in the downloaded .drpm. That gets it the .rpm for the updated version, which can then be installed.
5. If any of the local files don’t match the contents of the previous RPM (say, they included a file that you modified locally), then the reconstruction of the full RPM fails and DNF is forced to abandon the DeltaRPM and download the full version.
Steps 4-5 are where things go really bad. Step 4 takes a LONG time, comparatively, even when there are no problems. (If you go into step 5, then you’re hosed and all chance of saving any bandwidth is pretty much out the window.) But even when all of the reassemblies succeed, reassembly is a highly CPU- and disk-intensive process that drags DNF’s performance down considerably.
DeltaRPMs are basically a tradeoff between network bandwidth and local time/cycles: In order to download less data from the network, your system has to reconstruct the RPMs to be installed, which requires significantly more work (that takes significantly more time).
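That tradeoff can be written down as a simple break-even check: deltas only pay off when the download time they save exceeds the time spent rebuilding. The sketch below is back-of-the-envelope arithmetic, not a measurement. The link speed and rebuild time are hypothetical inputs you would have to supply yourself, and it treats 1 MB as 8 megabits and ignores protocol overhead.

```python
def download_seconds(megabytes: float, link_mbit_per_s: float) -> float:
    """Time to transfer the given number of megabytes over the given link."""
    return megabytes * 8 / link_mbit_per_s


def delta_pays_off(before_mb: float, after_mb: float,
                   link_mbit_per_s: float, rebuild_seconds: float) -> bool:
    """True if the download time saved exceeds the time spent rebuilding."""
    saved_seconds = download_seconds(before_mb - after_mb, link_mbit_per_s)
    return saved_seconds > rebuild_seconds


if __name__ == "__main__":
    # Best desktop run from the table above: 428.20 MB reduced to 212.60 MB.
    # The 100 Mbit/s link and 60 s rebuild time are made-up example values.
    print(download_seconds(428.20 - 212.60, 100))
    print(delta_pays_off(428.20, 212.60, 100, 60))
```

On a slow or metered link the answer can flip, which is the one situation where deltas still make sense.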
And the more I think about it, the more backwards that tradeoff seems. Network bandwidth is cheap: over my broadband service, downloading a few extra megabytes (or even a few hundred) takes at most a few seconds and costs nothing. So what exactly is the advantage of piling all that extra complexity and delay into the dnf upgrade process, just for the possibility of slightly smaller downloads? That’s kind of a terrible deal any way you slice it.