The FatELF project is dead and the proposed kernel patches were dismissed with prejudice.
This change has been accepted by FESCo for Fedora Linux 42. A full list of approved changes to date can be found on the Change Set Page.
To find out more about how our changes policy works, please visit our docs site.
Hi! I took an interest in this just today. Is there any Fedora package that already provides an optimized library using the described glibc-hwcaps mechanism? I have not found anything with dnf repoquery --whatprovides '*x86-64-v*'. But the change mentions that a solution already exists for libraries and that only binaries have no working solution. Yet on my Rawhide instance I failed to find any package testing this or shipping such a library. Have I used the wrong names?
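For anyone else searching: assuming any such libraries would be installed into the standard /usr/lib64/glibc-hwcaps/ subdirectories (my assumption, the change page does not spell the paths out), queries along these lines should surface them:
dnf repoquery --whatprovides '/usr/lib64/glibc-hwcaps/x86-64-v3/*'   # any package shipping v3 library variants
dnf repoquery -l glibc | grep glibc-hwcaps                           # does glibc itself install into those directories?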
I somehow miss an explanation of how the programs should be built. I found an article about the openSUSE approach, "Microarchitecture rpm macros" by Antonio Larrosa. Does that mean every package should try measuring the difference, and if it indeed brings a significant improvement, the packager is then on their own to ship multiple builds somehow? Not even a single example is linked from the wiki page; is that intentional? Will this feature land in F42? I haven't found any attempt myself.
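As far as I can tell from that article, the openSUSE mechanism boils down to building the same source more than once with different -march baselines and installing the extra copies next to the default one. A rough sketch (not the actual macros, and the install path is only my assumption):
gcc -O2 -march=x86-64    -o foo            foo.c   # baseline build, runs on any x86_64 CPU
gcc -O2 -march=x86-64-v3 -o foo.x86-64-v3  foo.c   # optional AVX2-era build
# the spec would then install the extra copy into whatever per-level path
# the proxy loader searches -- exactly the part that seems undocumented so far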
We provide a POWER10 build of glibc: glibc-2.41-3.fc42.ppc64le | RPM Info | koji
Interesting. glibc was one of the packages I tried checking for such support, but I found nothing for the x86_64 platform. I have not tried other platforms, especially since the change itself covers only AMD64. Is there a specific reason why this was only attempted for ppc64le in glibc, but not (yet) on the Intel platform? Were there any measured improvements with the POWER10-optimized code?
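A side note that may help others checking their own machines: since glibc 2.33 the dynamic loader itself reports which glibc-hwcaps subdirectories it would search for the current CPU (the exact output wording varies between glibc versions):
/lib64/ld-linux-x86-64.so.2 --help | grep -A4 glibc-hwcaps
# supported directories are marked "(supported, searched)"; on x86_64 they are
# named x86-64-v2/v3/v4, on ppc64le the equivalent entries are power9/power10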
What is the story for these packages? I'm asking because on my fedora-iot install,
# find /usr/bin/ -type l | wc -l
195
There are currently 195 such symlinked binaries, including gtar, xzcat, etc.
The symlink to the proxy binary is managed by package maintainers when building the package, I assume? So those packages cannot participate in this unless they change their build to create multiple binaries instead of relying on symlinks?
I'm not familiar with Fedora's packaging guidelines, but are there already CI tests for the case where maintainers opt in but accidentally break binaries that are currently shipped as symlinks? It would be nice to ensure this can never break for users.
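I am not aware of an existing gating test for this either, but the basic check would be cheap; something like the following (a sketch, not an existing Fedora CI job) would already catch an update that leaves a dangling symlink behind:
find /usr/bin -xtype l    # lists only symlinks whose target no longer resolves; empty output means all links are intact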
In general, I think this is a great proposal, but I'm not a fan of this check happening every time an executable is run. That seems fundamentally wasteful to me, given it will happen on every binary invocation even though in my case the CPU hardware only changes when I power off the system.
IMHO there should ideally be two paths:
- Systems that might require "online" CPU changes: I would argue that maintainers of such systems should either opt out of this optimization entirely (as it is probably very hard to guarantee it works) or opt in to dynamic checking via the proxy binary.
- Systems that do not require "online" CPU changes (which, I assume, is most systems out there) should only perform this optimization when something actually changes; changes to CPU hardware are only supported with the system powered off.
For a "static" system, is it possible to consider something like the following?
Packages that want to support optimizations would look like this:
/usr/bin/foo (standard installation, maybe packages can already symlink this to the v0 version by default)
$optimized_binaries_path/bin/foo/v0/foo (required, for downgrades)
$optimized_binaries_path/bin/foo/v1/foo (optional)
$optimized_binaries_path/bin/foo/v2/foo (optional)
$optimized_binaries_path/bin/foo/v3/foo (optional)
$optimized_binaries_path/bin/foo/v4/foo (optional)
Fedora ships the following:
- A systemd target "optimize-binaries.target" is provided. Administrators can opt out of optimizations happening automatically by disabling it, or ensure optimization cannot be done at all by masking the target. This indicates that the sysadmin anticipates problems, like "online changes" to this installation. By default it could maybe have Requires=usr.mount and After=usr.mount to run as early as possible, and maybe even be WantedBy=local-fs.target.
- A systemd service determine-binary-optimization-level.service that writes "v1", "v2", etc. somewhere in the filesystem, for example /etc/binary-optimization-level. This unit can be configured by drop-in files (/etc/determine-binary-optimization-level.d/*.conf) to set the maximum level the admin decides on, so it can also be used to lock in v2 even though the service would discover v4. That would be interesting for the use case mentioned here where, for some Intel CPUs, not all higher levels contain all lower levels. Fedora could also later ship drop-in files that limit the ceiling for systems with certain CPUs, or something like that.
- A systemd service optimize-binary@.service is provided, which can be activated like optimize-binary@usr-bin-foo: it symlinks the file to the highest supported version (the one in /etc/binary-optimization-level). It can also be pointed at a directory, in which case it recursively calls itself for all files in that directory. When done, maybe it can drop a file like /var/lock/optimize-binary/%f.lock that is used as a condition to not run the service twice. This would create /var/lock/optimize-binary/usr.lock after first boot and, for example, /var/lock/optimize-binary/usr-newfilefrominstallation.lock for optimizations of binaries installed at a later point.
- A systemd path unit optimize-binaries@.path (with Requires=optimize-binaries.target and After=optimize-binaries.target, so it won't activate if the target is masked and optimization is disabled) that watches a directory (DirectoryNotEmpty=%f) and activates the optimization (Unit=optimize-binary@%i.service). Fedora could by default enable optimize-binaries@usr-bin.service to automatically optimize binaries installed at a later point in time or upgraded (i.e. whose symlinks are overwritten).
- A systemd service "refresh-optimization" that deletes the lock files (and maybe also directly invokes optimize-binary@usr-bin.service or something like that), so there is a command to regenerate. Maybe this could also be called automatically if determine-binary-optimization-level.service detects changes, to automatically up- or downgrade accordingly on boots that detect CPUs with lower or higher capabilities.
(Disclaimer: I'm not super familiar with building such setups in systemd, but I think it should be possible.)
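To make the optimize-binary@.service idea a bit more concrete, here is a minimal sketch of the relinking step it would perform. Everything in it (the level file, the directory layout, the units) is hypothetical and taken from the proposal above; nothing like it exists in Fedora today:
#!/bin/sh
# sketch: point /usr/bin/$name at the best available variant, honoring the ceiling
# written by the (hypothetical) determine-binary-optimization-level.service
name=foo
base=/usr/lib/optimized-binaries/bin                  # stand-in for $optimized_binaries_path/bin
level=$(cat /etc/binary-optimization-level 2>/dev/null || echo v0)

case $level in                                        # candidate levels from the ceiling down to v0
    v4) candidates="v4 v3 v2 v1 v0" ;;
    v3) candidates="v3 v2 v1 v0" ;;
    v2) candidates="v2 v1 v0" ;;
    v1) candidates="v1 v0" ;;
    *)  candidates="v0" ;;
esac

for v in $candidates; do
    if [ -x "$base/$name/$v/$name" ]; then
        ln -sf "$base/$name/$v/$name" "/usr/bin/$name"   # first existing variant wins
        break
    fi
done
The lock-file handling and the recursive directory mode would wrap around this, but the core really is just that loop.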
My thoughts about this approach:
- Potential for timing issues, i.e. installation and invocation happening faster than the optimization service can relink to a better version. IMHO this is acceptable; a restart of the service would pick the optimized version up.
- optimize-binaries.target could be configured to block some other target (multi-user.target? basic.target? boot-complete.target?) so that most optimization happens before daemons or users invoke /usr binaries. This would only affect first boots.
- The behavior of the path monitoring would need to be tested, especially in the context of package installs/uninstalls/upgrades (i.e. when a symlink is overwritten to point at the v0 binary again).
- The activation of optimize-binary@.service should check for and wait on dnf's lock files. I'm unsure how this setup would work on atomic distros, tbh.
- I haven't caught up on all the discussion of the previous proposal, so I'm sorry if this was already deemed unfeasible for one reason or another (for example, I don't do fancy VM stuff where VMs are unfrozen on different hosts or whatever; again, IMHO admins who require that should consider opting out of running optimized binaries completely).
Why do that at runtime on every run? Usually the CPU is not replaced, at least not from generation X to generation Y; usually people just buy a new computer and install the system from scratch. Maybe a better solution is to make symlinks or hard links at package installation/update time according to the CPU, with an optional ability to tune it manually later? Also think about small utility programs like awk, grep, sed, etc. that are used in shell scripts, which may run them many times during a single script run. Checking and rechecking CPU capabilities on every run of those utilities could add delays or even a general performance degradation of such scripts.
I would expect to be able to query packages depending on a hwcaps-loader package or an alternative, but I have not found even a review bug for such a package (yet?). Is there any packaged loader already? From the proposal I expect that CPU-optimized binaries should be the exception, reserved for the few specific binaries where it has been validated that they actually help. If the loader is as lightweight and trivial as assumed, the minimal overhead should be fine: executing a new binary is expensive anyway, and a quickly re-executed optimized command would likely not be the performance bottleneck in well-designed software. I admit maintaining multiple variants might well be worth a couple of CPU ticks per new process.
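For reference, once such a loader package exists its consumers should be discoverable with queries like the ones below (the hwcaps-loader name is taken from the proposal, so it is an assumption on my part):
dnf repoquery --whatprovides hwcaps-loader    # the loader package itself, if it has been built
dnf repoquery --whatrequires hwcaps-loader    # packages that opted in and pull the proxy loader in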
Anyway, it has not been updated anywhere on the wiki page, but the tracking bug mentions it should be moved to F43. Clearly there was not enough traction for F42, and it is not even in current Rawhide yet.
Is there any experimental copr repo trying this at least?
There are too many cases where the CPU may change from boot to boot.
This is possible with cloud hosting, where the same image is used on a range of CPUs.
And in my case I install onto a removable SSD that is booted on a range of hardware.
I'm not too into a feature implemented on a desktop OS that primarily benefits the cloud.
Yeah really not into that.
I install kde plasma on a removable drive as my rescue disk.
Thatās a desktop use case.
Since at least 2016, I've only taken a single HDD with Linux installed and moved it to a different computer with a different CPU, and really only did that just to see if it'd work before wiping it.
That doesn't seem like a typical thing a user does, or at least not to a degree worth imposing an OS-wide, slower, unnecessary check on everyone else. Users doing CPU swaps in unusual setups should install an extra checker, IMO.
Even if we presume (and I am not convinced that we should) that only the desktop use case is important, I do not agree that upgrading or even downgrading the CPU in a desktop or moving a hard drive from an old system to a new one after an upgrade or a hardware failure is an exotic use case that can be ignored.
Furthermore, consider that this facility would require nontrivial extra work by the maintainers of each package that wanted to use it: extra work that would generally be justified by benchmarks and an understanding of how the program is used. This is not some kind of system-wide performance tax that's going to be paid by every binary no matter how trivial.
Finally, I think it is strange that there is so much focus in this discussion on proposing generally much more complicated and brittle install-time or boot-time approaches to avoid a single level of indirection, not yet benchmarked but likely in the microseconds, for packages that have explicitly opted in. While I always want to avoid wasting CPU time and energy, and while I don't want to fall victim to the fallacy of "we must do something, and this is something, so we must do this," I also don't want to see the opportunity to potentially speed up some kinds of useful workloads by minutes derailed by endless redesigning and bikeshedding.
The slowdown is claimed to be on the order of microseconds at program launch.
In other words, the expectation is that you will never notice it at a human scale.
This starts to feel a bit like the 64 kilobytes of RAM argument.
Why even have all these new CPU generations if we're never going to use their optimizations? It's good to be forward-looking and forward-moving, even if the benefits seem small initially.
The analysis that people from multiple distributions have done shows that only a limited number of programs will benefit from using the new instructions. Those programs can be delivered using the mechanism Fedora has added.