Hi everyone,
I’m trying to better understand how DNF5 handles parallel downloads internally.
When max_parallel_downloads is set to a higher value (for example, 10), and the transaction includes a mix of very small and very large packages, such as:
10 KB
30 KB
50 KB
50 MB
100 MB
500 MB
how does DNF decide which packages to download first?
From a scheduling perspective, I was wondering whether DNF/librepo applies any size-aware or progress-aware heuristics, such as:
Completing very small packages first
Preferring packages with the smallest remaining size
Allowing near-complete large downloads to finish before starting new ones
Or does it simply queue downloads in dependency/order resolution sequence and let parallelism handle the rest?
I understand that total download time is fundamentally limited by available bandwidth and TCP congestion control, so this question is not about changing bandwidth allocation at the kernel level. I’m more curious about the application-level scheduling strategy inside DNF/librepo.
I’m asking from a perceived-responsiveness standpoint: finishing smaller packages early might free slots sooner and potentially reduce the average completion time.
Has this been benchmarked before? And is there documentation explaining how DNF5 currently schedules parallel downloads?
Thanks in advance — I’m mainly trying to understand the internal behavior better.
I think it simply picks the downloads from a list and fills up its download slots.
The order does not matter, as DNF will not do anything with the RPMs until all of them are downloaded.
To find out what DNF does in detail you would need to read the DNF5 sources.
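If it helps to picture that slot-filling behaviour, here is a rough sketch in Python (my own illustration, not the actual DNF5/librepo code, which is C++ driving libcurl): a fixed pool of workers takes packages in list order, so "scheduling" is just queue order plus slot availability.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL_DOWNLOADS = 10  # analogous to the max_parallel_downloads option

def download(pkg):
    # placeholder for the real HTTP transfer
    return f"downloaded {pkg}"

def download_all(packages):
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL_DOWNLOADS) as pool:
        # packages are submitted in list order; whichever slot frees up
        # takes the next one -- no size-aware reordering anywhere
        return list(pool.map(download, packages))

print(download_all(["pkg-a", "pkg-b", "pkg-c"]))
```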
As I said in the other thread, I do not think you can improve the download performance using these ideas.
If you modify DNF to use your algorithm, you can then benchmark it against the original and see whether your changes have any impact.
Downloading the smallest packages first will tend to increase the total completion time.
The worst case is when the last file left to download is the largest one.
Working from largest to smallest may give you an advantage.
You would need to model that against files in a random order to see if it's worth the complexity in the code.
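That modelling can be done with a very small simulation (my own simplified assumption: a fixed number of equal-speed slots, and a free slot greedily takes the next file in the given order):

```python
import heapq

def makespan(sizes_mb, slots=2, rate_mb_s=10):
    """Total time until the last download finishes under greedy list scheduling."""
    free = [0.0] * slots                    # time at which each slot becomes free
    heapq.heapify(free)
    for size in sizes_mb:
        start = heapq.heappop(free)         # earliest available slot
        heapq.heappush(free, start + size / rate_mb_s)
    return max(free)

sizes = [100, 100, 100, 500]                # MB; hypothetical mix

print(makespan(sorted(sizes, reverse=True)))  # largest first -> 50.0 s
print(makespan(sorted(sizes)))                # largest last  -> 60.0 s
```

In this model the largest-last ordering wastes a slot at the end, exactly the worst case described above, while largest-first overlaps the big file with everything else. Whether the difference is significant for realistic package mixes is what you would need to measure.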
Your best improvement to the DNF experience will come from using the highest-bandwidth mirror for your location, matched to your own download speed.