This is a proposed Change for Fedora Linux.
This document represents a proposed Change. As part of the Changes process, proposals are publicly announced in order to receive community feedback. This proposal will only be implemented if approved by the Fedora Engineering Steering Committee.
Additional paths will be inserted into the search path used for executables on systems which have a compatible CPU. Those additional paths will mirror the AMD64 “microarchitecture levels” supported by the glibc-hwcaps mechanism:
x86_64-v4. Systemd will be modified to insert the additional directories into the
$PATH environment variable (affecting all programs on the system) and the equivalent internal mechanism in
systemd (affecting what executables are used by services). Individual packages can provide optimized libraries via the glibc-hwcaps mechanism and optimized executables via the extended search path. This optimized code will be used if the CPU supports it. Individual package maintainers will decide, based on benchmark results, which packages provide optimized code and at which level.
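As a rough illustration of the search-path extension described above, the following Python sketch prepends one directory per supported microarchitecture level, most specific first, so that an optimized executable shadows the generic one. The function names, the `base` parameter, and the placeholder base directory are assumptions made for this example; the actual directory layout and the systemd implementation may differ.

```python
import os


def hwcaps_dirs(level: int, base: str) -> list[str]:
    """Return per-level directories, most specific first (e.g. v3, then v2).

    Level 1 is the baseline and gets no extra directory.
    """
    return [f"{base}/x86-64-v{n}" for n in range(min(level, 4), 1, -1)]


def extend_search_path(path: str, level: int, base: str) -> str:
    """Prepend the per-level directories to a colon-separated search path."""
    extra = hwcaps_dirs(level, base)
    existing = [p for p in path.split(":") if p]
    # Avoid listing a directory twice if it is already present.
    return ":".join(extra + [p for p in existing if p not in extra])


if __name__ == "__main__":
    # Hypothetical placeholder, NOT a path taken from the proposal.
    HYPOTHETICAL_BASE = "/usr/example-hwcaps"
    print(extend_search_path(os.environ.get("PATH", ""), 3, HYPOTHETICAL_BASE))
```

Putting the highest level first mirrors the priority order the dynamic linker uses for glibc-hwcaps subdirectories; on a CPU that only supports the baseline, the path is left unchanged.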
NOTE: I’m writing and filing this proposal on the last day allowed for system-wide proposals. It is too large for one person. If you are interested, please let me know or even add yourself to the list of Owners. I would love to have more people working on this.
Fedora binaries for the AMD64 architecture are compiled with code-generation flags that support almost all CPU variants. But newer generations of processors gained additional instructions that may be used to generate faster code. A vendor-independent x86-64 psABI supplement defines four “microarchitecture levels”:
x86-64-v1 (the baseline, our code targets this),
SSE3, CentOS targets this),
AVX512). When code is compiled for a higher microarchitecture level, it will crash (with
SIGILL, “illegal instruction”) on CPUs which do not support it. Benchmark results show small differences in performance: usually in the range from -5% to 10%, with no discernible difference for most code, but some applications benefit, with gains of 120% in some benchmarks [e.g. 2, 4].
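To make the level definitions concrete, here is a minimal Python sketch that estimates the highest supported level from the `flags` line of `/proc/cpuinfo`. The mapping from psABI feature names to Linux flag names (for example `pni` for SSE3, `abm` for LZCNT, and `xsave` standing in for OSXSAVE) is an assumption of this sketch, as is treating the baseline as always present. The dynamic linker performs its own CPUID-based checks, so this is an approximation and not the authoritative test.

```python
# Flag names as they appear in the "flags" line of /proc/cpuinfo on Linux.
# Levels are cumulative: level N requires the flags of every level below it.
LEVEL_FLAGS = {
    2: {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"},
    3: {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"},
    4: {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"},
}


def level_from_flags(flags: set[str]) -> int:
    """Return the highest level (1 to 4) whose requirements are all met."""
    level = 1  # assumption: any x86-64 CPU meets the baseline
    for n in (2, 3, 4):
        if not LEVEL_FLAGS[n] <= flags:
            break
        level = n
    return level


def read_cpu_flags(cpuinfo: str = "/proc/cpuinfo") -> set[str]:
    """Read the CPU flags of the first processor listed in cpuinfo."""
    with open(cpuinfo) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()
```

On a Linux machine, `level_from_flags(read_cpu_flags())` gives the estimated level for the current CPU.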
Over the years, various people have expressed interest in raising the required microarchitecture levels. But we have been very conservative in making changes, because support is missing in many older CPUs that are still in use, and in fact, even in some CPUs produced and sold today. By raising the required level we would make Fedora completely unusable on many machines. It also seems that recompiling all packages with the changed options would largely be a waste of resources, because for most code it makes no difference. But for some of the numerical or cryptographic code there are noticeable gains and it seems to be worth the effort to provide optimized code. This also makes Fedora more attractive to people interested in optimization.
The dynamic linker already has the
glibc-hwcaps mechanism to load optimized implementations of shared objects. This means that packages can provide optimized libraries and the linker will automatically load them from separate directories if appropriate. (For AMD64, this is
To extend the glibc-hwcaps mechanism to executables,
systemd will be modified to extend the search path with appropriate directories. When started, it will check the CPU capabilities and modify the executable search path it has internally and which is also used to set
$PATH for services. (For AMD64,
Note: the ELF format provides the IFUNC mechanism to dynamically select a variant of a function (symbol) when an executable is loaded. This is used in particular to load code using specific CPU instructions when those are supported. This mechanism is more general (because it allows arbitrary selection criteria), more fine-grained (because there can be other variants than just a few fixed microarchitecture levels), and more efficient (because only the parts of the code that benefit need to be provided in multiple variants). In particular, glibc already makes extensive use of this to provide optimized code, which is then widely used by other libraries and programs. This means that even though we compile code for the lowest baseline, modern CPU instructions are already widely used. This is one of the reasons why compiling for a higher baseline often doesn’t make any difference in benchmarks. IFUNC or an equivalent mechanism should generally be preferred. However, it needs to be implemented in the program or library itself, which is not trivial. The two mechanisms in this Proposal are intended for packages which do not support IFUNCs or some other equivalent mechanism.
[1] SUSE Hack Week: Support glibc-hwcaps and micro-architecture package generation
[2] rfcs/0002-march.rst · master · Arch Linux / RFCs · GitLab
[3] The GNU C Library version 2.33 is now available
[4] CentOS ISA SIG Performance Investigation – Blog.CentOS.org
[5] GNU Indirect Function and x86 ELF ABIs | jasoncc.github.io
Glibc-hwcaps together with the new feature in systemd provides a generic mechanism. It will be up to individual packages to actually provide code which makes use of it. Individual package maintainers are encouraged to benchmark their packages after recompilation, and provide the optimized variants if useful (i.e. the code in question is measurably faster and the program is run often enough for this to make a difference).
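A maintainer comparing a baseline build against an optimized one could start from something like the sketch below. It times a command several times, takes the median, and expresses the difference as a percentage gain. The helper names and the choice of median wall-clock time are assumptions of this example; a real evaluation would likely use a dedicated benchmarking tool and a representative workload.

```python
import statistics
import subprocess
import time


def time_command(argv: list[str], runs: int = 5) -> float:
    """Return the median wall-clock time in seconds over several runs."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def percent_gain(baseline_s: float, optimized_s: float) -> float:
    """Throughput gain in percent; positive means the optimized build is faster."""
    return (baseline_s / optimized_s - 1.0) * 100.0
```

With this definition, a run that takes half the time counts as a 100% gain, and a run that takes twice as long counts as -50%.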
The Change Owners will implement the packaging changes for a few packages while developing the general mechanism and will submit those as pull requests. Other maintainers are asked to do the same for their packages.
Optimized variants of programs and libraries MAY be packaged in a separate subpackage. The general packaging rules should be applied, i.e. a separate package or packages SHOULD be created if the files are large enough.
Available benchmark results [2,4] are narrow and not very convincing. We should plan an evaluation of results after one release. If it turns out that the real gains are too small, we can scrap the effort. On the other hand, we should also consider other architectures. For example, microarchitecture levels
ppc64le. Other architectures are not included in this Change Proposal to reduce its scope.
The developers who are interested in this kind of optimization work can perform it within Fedora, without having to build separate repositories. The users who have the appropriate hardware will gain performance benefits. Faster code is also more energy-efficient. The change will be automatic and transparent to users.
Note that other distributions use higher microarchitecture levels. For example, RHEL 9 uses x86-64-v2 as the baseline, RHEL 10 will use x86-64-v3, and other distros provide optimized variants (openSUSE, Arch Linux). We implement the same change in Fedora in a way that is scoped more narrowly, but should provide the same performance and energy benefits.
- Extend systemd to set the executable search path using the same criteria as the dynamic linker.
- Implement packaging changes for at least one package with a library and at least one package with executables and submit these as pull requests.
- Provide a pull request for the Packaging Guidelines to describe the changes listed in Description above.
- Do benchmarking and implement packaging changes for other packages if beneficial.
Release engineering: #11864
Policies and guidelines: TBD.
Trademark approval: N/A (not needed for this Change)
Alignment with Community Initiatives:
- Run /usr/bin/ld.so --help to check which hwcaps are supported by the system.
- Install one or more packages which provide optimized code.
- Restart the system or re-login to reinitialize
- Check that appropriate directories are present in
- Run some benchmarks and check that the optimized code is indeed faster.
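The first step in the list above can be scripted. The sketch below runs `ld.so --help` and extracts the glibc-hwcaps subdirectories that are marked as supported. It assumes the help text contains a section starting with "Subdirectories of glibc-hwcaps directories" followed by indented entries such as `x86-64-v3 (supported, searched)`; that layout is an assumption based on recent glibc versions and may change, so treat the parser as illustrative.

```python
import re
import subprocess

# Assumed section header in the output of `ld.so --help`.
HEADER = "Subdirectories of glibc-hwcaps directories"


def parse_hwcaps(help_text: str) -> dict[str, bool]:
    """Map each listed hwcaps subdirectory to whether it is marked 'supported'."""
    result = {}
    in_section = False
    for line in help_text.splitlines():
        if line.startswith(HEADER):
            in_section = True
            continue
        if in_section:
            m = re.match(r"\s+(\S+)(?:\s+\((.*)\))?\s*$", line)
            if not m:
                break  # a blank or unindented line ends the section
            notes = [s.strip() for s in (m.group(2) or "").split(",")]
            result[m.group(1)] = "supported" in notes
    return result


def supported_hwcaps(ld_so: str = "/usr/bin/ld.so") -> list[str]:
    """Run the dynamic linker's help and list supported hwcaps, in priority order."""
    out = subprocess.run(
        [ld_so, "--help"], capture_output=True, text=True, check=True
    ).stdout
    return [name for name, ok in parse_hwcaps(out).items() if ok]
```

Entries come back in the order the dynamic linker prints them, which is its search priority.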
There should be no impact for users. If optimized code is available and installed for their hardware, various tasks may finish faster and use less energy.
- Contingency mechanism: Undo the changes in packages which introduced them and recompile.
- Contingency deadline: Any time.
- Blocks release? No.
Packages which benefit from being compiled for higher AMD64 microarchitecture levels (
x86_64-v4) are now provided with optimized variants which will be used automatically on appropriate CPUs. This includes: TBD1, TBD2, TBD3.