Hello.
I wanted to let you know that we’re actively working on building LLVM (clang, llvm, etc.) with PGO for Fedora. I can totally relate to problems with long build times or simply long feedback cycles due to many recompilations. But in the end I think it will be worth it.
We’re seeing a 22% performance improvement in compile time on rawhide on x86_64.
I learned a lot while doing my PGO experiment with the still unmerged LLVM packages last year. Some of those things turned out to be worthless, but it’s good to see that the performance improvement went from 9.7% to ~22%. Also, everything is much more streamlined nowadays: we can do the full PGO pipeline in one build instead of requiring multiple Copr projects with different sets of build conditions and complicated profile merging in a background process.
Workload
Currently we use the llvm-test-suite as workload. You can look around in my branch for places in which the pgo build condition is used. I must emphasize that we’re not yet done and we still need some thorough testing in order to land this.
Build times
Here’s a screenshot of build times on fedora-rawhide-x86_64. Line (a) shows the old (unmerged) llvm packages until around March 2024. Then a new line (b) shows up with the build times of the merged llvm packages. The PGO build times (c) are rendered on the rightmost side. The bottom picture shows these PGO build times in more detail. Don’t be fooled by the drop in build time around November 2023: that is when we started using high-performance x86_64 builders on Copr.
Performance comparison
During each build we run a performance comparison of the system clang against the just-built PGOed clang. That is where we see the 22% increase in compile performance. Each of our build logs (e.g. this failing one) now contains a section like the following.
Result of Performance comparison between system and PGOed clang
+ echo 'Result of Performance comparison between system and PGOed clang'
+ cat /builddir/build/BUILD/llvm-20.0.0_pre20241029.g757d0e4764fffc-build/performance-of-pgoed-clang/results-system-vs-pgo.txt
Tests: 7
Metric: compile_time
Program                            compile_time
                                   19.1.0  pgo-20.0.0~pre20241029.g757d0e4764fffc  diff
tramp3d-v4/tramp3d-v4              18.81   16.41                                   -12.8%
mafft/pairlocalalign               6.89    5.54                                    -19.5%
sqlite3/sqlite3                    8.19    6.56                                    -19.9%
consumer-typeset/consumer-typeset  6.56    5.21                                    -20.6%
SPASS/SPASS                        10.02   7.88                                    -21.4%
Bullet/bullet                      19.58   14.95                                   -23.7%
kimwitu++/kc                       9.83    7.36                                    -25.2%
Geomean difference                                                                 -20.5%
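As a sanity check on the summary line, here is a minimal Python sketch that recomputes the differences from the numbers in the log above. It assumes that "Geomean difference" is the geometric mean of the per-program PGO/system ratios, minus one. The names RESULTS, percent_diff and geomean_diff are made up for this illustration and are not part of the comparison tooling.

```python
import math

# (program, compile_time with system clang 19.1.0, compile_time with PGOed clang),
# copied from the build log section above.
RESULTS = [
    ("tramp3d-v4/tramp3d-v4", 18.81, 16.41),
    ("mafft/pairlocalalign", 6.89, 5.54),
    ("sqlite3/sqlite3", 8.19, 6.56),
    ("consumer-typeset/consumer-typeset", 6.56, 5.21),
    ("SPASS/SPASS", 10.02, 7.88),
    ("Bullet/bullet", 19.58, 14.95),
    ("kimwitu++/kc", 9.83, 7.36),
]


def percent_diff(old, new):
    """Relative change from old to new, in percent (negative means faster)."""
    return (new / old - 1.0) * 100.0


def geomean_diff(rows):
    """Geometric mean of the new/old ratios, expressed as a percent change.

    Assumption: this is how the summary line in the log is derived.
    """
    log_sum = sum(math.log(new / old) for _, old, new in rows)
    return (math.exp(log_sum / len(rows)) - 1.0) * 100.0


if __name__ == "__main__":
    for name, old, new in RESULTS:
        print(f"{name:35s} {percent_diff(old, new):6.1f}%")
    print(f"{'Geomean difference':35s} {geomean_diff(RESULTS):6.1f}%")
```

With the rounded values from the log, the geomean comes out at about -20.5%, which matches the summary line. Individual rows can differ from the log by 0.1 percentage points, presumably because the log computes its diffs from unrounded timings.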
When you install a PGOed clang we will also install these two files:
/usr/share/llvm-pgo.profdata
/usr/share/results-system-vs-pgo.txt
Both files are purely informative. The first one is the merged PGO profile (~15 megabytes). Right now we’re regenerating this data on every build. The second one contains the output of the performance comparison above.
The build log file also contains this section which lists the Top 10 functions with the largest internal block counts:
+ /builddir/build/BUILD/llvm-20.0.0_pre20241029.g757d0e4764fffc-build/bootstrapped-llvm/bin/llvm-profdata show --topn=10 /builddir/build/BUILD/llvm-20.0.0_pre20241029.g757d0e4764fffc-build/pgo-O3-profiles/O3.cmake.profdata
+ /builddir/build/BUILD/llvm-20.0.0_pre20241029.g757d0e4764fffc-build/bootstrapped-llvm/bin/llvm-cxxfilt
Instrumentation level: IR entry_first = 0
Total functions: 39991
Maximum function count: 6259866296
Maximum internal block count: 6255990084
Top 10 functions with the largest internal block counts:
llvm::SUnit::addPred(llvm::SDep const&, bool), max count = 6259866296
llvm::hash_code llvm::hash_combine<unsigned int, unsigned long>(unsigned int const&, unsigned long const&), max count = 3488439024
llvm::SmallVectorTemplateBase<unsigned int, true>::push_back(unsigned int), max count = 2960419755
llvm::hash_code llvm::hash_combine<llvm::MachineOperand::MachineOperandType, unsigned int, unsigned int, bool>(llvm::MachineOperand::MachineOperandType const&, unsigned int const&, unsigned int const&, bool const&), max count = 2625039396
llvm::SmallPtrSetImplBase::insert_imp(void const*), max count = 1747693109
llvm::APInt::APInt(unsigned int, unsigned long, bool, bool), max count = 1497545994
llvm::SmallPtrSetImplBase::find_imp(void const*) const, max count = 1226146744
llvm::APInt::operator=(llvm::APInt&&), max count = 799644853
llvm::hash_code llvm::hash_combine<llvm::MachineOperand::MachineOperandType, unsigned int, long>(llvm::MachineOperand::MachineOperandType const&, unsigned int const&, long const&), max count = 717543351
llvm::hash_code llvm::hash_combine<unsigned int, llvm::Type*, llvm::hash_code>(unsigned int const&, llvm::Type* const&, llvm::hash_code const&), max count = 635070555
Once you’ve installed a PGOed clang, you can do this inspection yourself by running:
$ llvm-profdata show --topn=10 /usr/share/llvm-pgo.profdata | llvm-cxxfilt
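If you want to post-process that output, here is a small Python sketch that reads it from stdin and prints each function's max count relative to the hottest one. It only relies on the ", max count = N" line format visible in the log above. The script name topn.py and the function names are hypothetical.

```python
import re
import sys

# Matches lines such as:
#   llvm::SUnit::addPred(llvm::SDep const&, bool), max count = 6259866296
# The greedy name group keeps commas inside C++ signatures intact.
LINE_RE = re.compile(r"^\s*(?P<name>.+), max count = (?P<count>\d+)\s*$")


def parse_topn(text):
    """Return (function name, max count) pairs found in the given output."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            rows.append((match.group("name"), int(match.group("count"))))
    return rows


def relative_to_hottest(rows):
    """Add each count's fraction of the largest count in the list."""
    if not rows:
        return []
    hottest = max(count for _, count in rows)
    return [(name, count, count / hottest) for name, count in rows]


if __name__ == "__main__":
    for name, count, share in relative_to_hottest(parse_topn(sys.stdin.read())):
        print(f"{share:7.1%}  {count:>12}  {name}")
```

Assuming the sketch is saved as topn.py, it could be used like this:

$ llvm-profdata show --topn=10 /usr/share/llvm-pgo.profdata | llvm-cxxfilt | python3 topn.py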
Outlook
Let’s hope we can finish this quickly so everybody using clang in Fedora (>=f41) can benefit from this performance improvement. Of course, there’s other work-related stuff that requires my attention.
Cheers