"Unify /usr/bin and /usr/sbin" breaks Pacemaker cluster resource manager

Background: Fedora Rawhide has unified /usr/bin and /usr/sbin. An upgrade from F41 keeps the directories separate, while I presume that a fresh installation would have a single directory.

I’m a maintainer of the Pacemaker cluster resource manager.

If you insist on going through with this merge of directories, then Pacemaker must override it within our spec file, and continue using /usr/sbin as our default _sbindir.

How would you recommend we do this override, without introducing corner cases when building Pacemaker on other distros and older Fedora releases? I see that macros.filesystem uses %if "%{_sbindir}" == "%{_bindir}" as the check. I can envision a corner case in which a user has explicitly overridden both of them to be equal, or where this happens on some obscure other distro that I’m not aware of. I guess we could check for equality and then also compare against the string /usr/bin or %{_exec_prefix}/bin, and then set _sbindir to /usr/sbin… but again, if a user has explicitly specified _sbindir = /usr/bin, I would want to respect that. Likewise, I’m not sure what I’d want to do if _exec_prefix has been user-specified.
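
For concreteness, the kind of guarded override I have in mind would look something like this in our spec file. This is purely a sketch; note that it still cannot distinguish a distro-wide merge from a user who deliberately set both directories equal, which is exactly the corner case I’m worried about.

    # Sketch only: if sbindir has been merged into the standard bindir,
    # keep installing our tools under the traditional sbin location.
    %if "%{_sbindir}" == "%{_bindir}" && "%{_bindir}" == "%{_exec_prefix}/bin"
    %global _sbindir %{_exec_prefix}/sbin
    %endif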

We’re aware of two ways this breaks Pacemaker so far.

  1. It causes resources to restart unnecessarily. I can provide more details about the mechanism upon request, but it’s complicated. Suffice it to say that part of the cluster resource configuration is generated on the fly; one resource parameter value is based on the sbindir configure constant; and any change to a resource parameter causes a checksum change, which forces the resource to restart. This is deeply embedded in Pacemaker’s design, and there are reasons for it. It’s not feasible to change the design, especially because of compatibility issues.

Now, this is a problem because the purpose of the software is to maintain high availability. In particular:
a. It would cause an additional resource restart during rolling upgrades from pre-Fedora 42 Pacemaker builds.
b. It breaks Pacemaker builds on Fedora Rawhide (because it breaks several regression tests, which check whether restarts would occur under certain circumstances).

  2. It breaks Pacemaker guest nodes by default. Pacemaker allows the configuration of docker/podman containers as “guest nodes” within “bundle resources.” We support basically arbitrary OS distros, OS versions, and Pacemaker versions within the container. By default, when Pacemaker starts a container, it runs /usr/sbin/pacemaker-remoted (based on the sbindir configure variable) on the container. But now, if the host has been upgraded to Fedora Rawhide, it tries to run /usr/bin/pacemaker-remoted on the container. Nothing exists at that path on the container, so the guest node fails to start. We can’t just toss a fix into the pacemaker-remote installation, because we support running older versions (almost arbitrarily old) on the containers.
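
To make that concrete, here is roughly what ends up happening on a merged-/usr host with an older guest image (the image name and extra options are placeholders; only the final argument matters):

    # Host-side default command, derived from the host's build-time sbindir:
    podman run <other_opts> example.com/old-guest:latest /usr/bin/pacemaker-remoted
    # The older guest image only ships /usr/sbin/pacemaker-remoted, so the
    # command is not found and the guest node fails to start.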

We may find still more things that break; this is what we’ve seen so far.

I appreciate any guidance here, thank you.

I would have replied to F40 Change Proposal: Unify /usr/bin and /usr/sbin (System-Wide) - #30, where my colleague Michal reported breakage of pcs (a management tool for Pacemaker). However, that topic has been closed and I cannot reply.

Please remove any irrelevant tags and add any further ones needed.

@zbyszek ping

There will be a symlink so that /usr/sbin/xxx will still work.
With that in place do you still have a problem?

There is a lot to unpack here :slight_smile:

As you observe, the “host” and “guest” might have different configuration, so one cannot assume that if the host was configured with some value of %_sbindir, all of the guests were configured with the same value (or even use rpm). This was always the case, independently of the sbin-merge, so code that assumed that the path is identical everywhere was always potentially broken. The usual solution for such cross-system cases is to define a fixed path (or a set of possible paths and use the first found).
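
For example, something along these lines on whichever system actually runs the binary (just a generic sketch of the pattern, not a Pacemaker-specific fix):

    # Probe the well-known locations and use the first one that exists.
    remoted=
    for candidate in /usr/sbin/pacemaker-remoted /usr/bin/pacemaker-remoted; do
        if [ -x "$candidate" ]; then
            remoted="$candidate"
            break
        fi
    done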

I’m confused by that initial set of questions, because on the one hand you expect everybody to use a fixed path, but at the same time you expect to cover cases where the user or some foreign distro changes the paths arbitrarily.

The sbin-merge is done in a way where the old paths remain valid during and after the merge. So /usr/sbin/pacemaker-remoted, /sbin/pacemaker-remoted, /bin/pacemaker-remoted, and /usr/bin/pacemaker-remoted are all valid. To retain compatibility with merged and split-sbin systems, /usr/sbin/pacemaker-remoted can be used, independently of how %_sbindir is defined in the build.

This would create a problem, because we need all packages to move their binaries from /usr/sbin to /usr/bin so that we can get rid of the former.

Yeah, so I think that it isn’t useful to look at the configuration value on the “host”. Just use a fixed path to call the binary on “foreign” systems.

I have a similar issue with zerotier-one; it’s all hardcoded paths.
I am going to retire/orphan the package as I have no intention to fix the hardcoded paths.

[leigh@mpd-pc zerotier-one master]$ grep /usr/sbin -r
ZeroTierOne-1.14.2/pkg/qnap/zerotier/package_routines:	rm -rf /usr/sbin/zerotier-cli
ZeroTierOne-1.14.2/pkg/qnap/zerotier/shared/zerotier.sh:    ln -s $QPKG_ROOT/zerotier-one /usr/sbin/zerotier-cli
ZeroTierOne-1.14.2/pkg/synology/dsm7-docker/Dockerfile:COPY --from=builder /src/zerotier-one /usr/sbin/
ZeroTierOne-1.14.2/pkg/synology/dsm7-docker/Dockerfile:  && ln -s /usr/sbin/zerotier-one /usr/sbin/zerotier-idtool \
ZeroTierOne-1.14.2/pkg/synology/dsm7-docker/Dockerfile:  && ln -s /usr/sbin/zerotier-one /usr/sbin/zerotier-cli
ZeroTierOne-1.14.2/pkg/wd/zerotier/clean.sh:rm -f /usr/sbin/zerotier-one  2> /dev/null
ZeroTierOne-1.14.2/pkg/wd/zerotier/clean.sh:rm -f /usr/sbin/zerotier-cli  2> /dev/null
ZeroTierOne-1.14.2/pkg/wd/zerotier/remove.sh:rm -f /usr/sbin/zerotier-one  2> /dev/null
ZeroTierOne-1.14.2/pkg/wd/zerotier/remove.sh:rm -f /usr/sbin/zerotier-cli  2> /dev/null
ZeroTierOne-1.14.2/pkg/wd/zerotier/init.sh:ln -s $install_path/bin/zerotier-one /usr/sbin/zerotier-one
ZeroTierOne-1.14.2/pkg/wd/zerotier/init.sh:ln -s $install_path/bin/zerotier-one /usr/sbin/zerotier-cli
ZeroTierOne-1.14.2/zerotier-one.spec:Requires(pre): /usr/sbin/useradd, /usr/bin/getent
ZeroTierOne-1.14.2/zerotier-one.spec:Requires(pre): /usr/sbin/useradd, /usr/bin/getent
ZeroTierOne-1.14.2/zerotier-one.spec:Requires(pre): /usr/sbin/useradd, /usr/bin/getent
ZeroTierOne-1.14.2/zerotier-one.spec:Requires(pre): /usr/sbin/useradd, /usr/bin/getent
ZeroTierOne-1.14.2/zerotier-one.spec:Requires(pre): /usr/sbin/useradd, /usr/bin/getent
ZeroTierOne-1.14.2/zerotier-one.spec:Requires(pre): /usr/sbin/useradd, /usr/bin/getent
ZeroTierOne-1.14.2/zerotier-one.spec:Requires(pre): /usr/sbin/useradd, /usr/bin/getent
ZeroTierOne-1.14.2/zerotier-one.spec:Requires(pre): /usr/sbin/useradd, /usr/bin/getent
ZeroTierOne-1.14.2/zerotier-one.spec:Requires(pre): /usr/sbin/useradd, /usr/bin/getent
ZeroTierOne-1.14.2/zerotier-one.spec:Requires(pre): /usr/sbin/useradd, /usr/bin/getent
ZeroTierOne-1.14.2/zerotier-one.spec:Requires(pre): /usr/sbin/useradd, /usr/bin/getent
ZeroTierOne-1.14.2/zerotier-one.spec:Requires(pre): /usr/sbin/useradd, /usr/bin/getent
ZeroTierOne-1.14.2/zerotier-one.spec:/usr/bin/getent passwd zerotier-one || /usr/sbin/useradd -r -d /var/lib/zerotier-one -s /sbin/nologin zerotier-one
ZeroTierOne-1.14.2/Dockerfile.ci:RUN cp zerotier-one /usr/sbin
ZeroTierOne-1.14.2/Dockerfile.ci:COPY --from=stage /zerotier-one /usr/sbin
ZeroTierOne-1.14.2/Dockerfile.ci:RUN ln -sf /usr/sbin/zerotier-one /usr/sbin/zerotier-idtool
ZeroTierOne-1.14.2/Dockerfile.ci:RUN ln -sf /usr/sbin/zerotier-one /usr/sbin/zerotier-cli
ZeroTierOne-1.14.2/ext/libpqxx-7.7.3/configure:# SysV /etc/install, /usr/sbin/install
ZeroTierOne-1.14.2/ext/libpqxx-7.7.3/configure:  /etc/* | /usr/sbin/* | /usr/etc/* | /sbin/* | /usr/afsws/bin/* | \
ZeroTierOne-1.14.2/ext/libpqxx-7.7.3/configure:    elif test -x /usr/sbin/sysctl; then
ZeroTierOne-1.14.2/ext/libpqxx-7.7.3/configure:      lt_cv_sys_max_cmd_len=`/usr/sbin/sysctl -n kern.argmax`
ZeroTierOne-1.14.2/ext/libpqxx-7.7.3/config/m4/libtool.m4:    elif test -x /usr/sbin/sysctl; then
ZeroTierOne-1.14.2/ext/libpqxx-7.7.3/config/m4/libtool.m4:      lt_cv_sys_max_cmd_len=`/usr/sbin/sysctl -n kern.argmax`
ZeroTierOne-1.14.2/ext/libpqxx-7.7.3/config/config.guess:	    /usr/sbin/$sysctl 2>/dev/null || \
ZeroTierOne-1.14.2/ext/libpqxx-7.7.3/config/config.guess:		UNAME_RELEASE=`/usr/sbin/sizer -v | awk '{print $3}'`
ZeroTierOne-1.14.2/ext/libpqxx-7.7.3/config/config.guess:		UNAME_RELEASE=`/usr/sbin/sizer -v | awk '{print $4}'`
ZeroTierOne-1.14.2/ext/libpqxx-7.7.3/config/config.guess:	# According to Compaq, /usr/sbin/psrinfo has been available on
ZeroTierOne-1.14.2/ext/libpqxx-7.7.3/config/config.guess:	ALPHA_CPU_TYPE=`/usr/sbin/psrinfo -v | sed -n -e 's/^  The alpha \(.*\) processor.*$/\1/p' | head -n 1`
ZeroTierOne-1.14.2/ext/libpqxx-7.7.3/config/config.guess:	IBM_CPU_ID=`/usr/sbin/lsdev -C -c processor -S available | sed 1q | awk '{ print $1 }'`
ZeroTierOne-1.14.2/ext/libpqxx-7.7.3/config/config.guess:	if /usr/sbin/lsattr -El ${IBM_CPU_ID} | grep ' POWER' >/dev/null 2>&1; then
ZeroTierOne-1.14.2/ext/libpqxx-7.7.3/config/config.guess:	if [ -x /usr/sbin/sysversion ] ; then
ZeroTierOne-1.14.2/ext/installfiles/mac/get-proxy-settings.sh:export PATH=/bin:/usr/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/local/sbin
ZeroTierOne-1.14.2/ext/installfiles/mac/launch.sh:export PATH="/Library/Application Support/ZeroTier/One:/bin:/usr/bin:/sbin:/usr/sbin"
ZeroTierOne-1.14.2/ext/installfiles/mac/uninstall.sh:export PATH=/bin:/usr/bin:/sbin:/usr/sbin
ZeroTierOne-1.14.2/ext/installfiles/mac/postinst.sh:export PATH=/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin
ZeroTierOne-1.14.2/ext/installfiles/mac/preinst.sh:export PATH=/bin:/usr/bin:/sbin:/usr/sbin
ZeroTierOne-1.14.2/ext/installfiles/linux/zerotier-containerized/Dockerfile:COPY --from=builder /usr/sbin/zerotier-cli /usr/sbin/zerotier-cli
ZeroTierOne-1.14.2/ext/installfiles/linux/zerotier-containerized/Dockerfile:COPY --from=builder /usr/sbin/zerotier-idtool /usr/sbin/zerotier-idtool
ZeroTierOne-1.14.2/ext/installfiles/linux/zerotier-containerized/Dockerfile:COPY --from=builder /usr/sbin/zerotier-one /usr/sbin/zerotier-one
ZeroTierOne-1.14.2/ext/installfiles/linux/zerotier-containerized/main.sh:export PATH=/bin:/usr/bin:/usr/local/bin:/sbin:/usr/sbin
ZeroTierOne-1.14.2/ext/installfiles/linux/zerotier-one.init.rhel6:ZT="/usr/sbin/zerotier-one"
ZeroTierOne-1.14.2/ext/installfiles/mac-update/updater.tmpl.sh:export PATH=/bin:/usr/bin:/sbin:/usr/sbin
ZeroTierOne-1.14.2/debian/postinst:        useradd --system --user-group --home-dir /var/lib/zerotier-one --shell /usr/sbin/nologin --no-create-home zerotier-one
ZeroTierOne-1.14.2/debian/zerotier-one.init:PATH=/bin:/usr/bin:/sbin:/usr/sbin
ZeroTierOne-1.14.2/debian/zerotier-one.init:DAEMON=/usr/sbin/zerotier-one
ZeroTierOne-1.14.2/debian/zerotier-one.upstart:exec /usr/sbin/zerotier-one
ZeroTierOne-1.14.2/debian/zerotier-one.service:ExecStart=/usr/sbin/zerotier-one
ZeroTierOne-1.14.2/RELEASE-NOTES.md: * Fix Debian install scripts to set /usr/sbin/nologin as shell on service user.
ZeroTierOne-1.14.2/make-linux.mk:	mkdir -p $(DESTDIR)/usr/sbin
ZeroTierOne-1.14.2/make-linux.mk:	rm -f $(DESTDIR)/usr/sbin/zerotier-one
ZeroTierOne-1.14.2/make-linux.mk:	cp -f zerotier-one $(DESTDIR)/usr/sbin/zerotier-one
ZeroTierOne-1.14.2/make-linux.mk:	rm -f $(DESTDIR)/usr/sbin/zerotier-cli
ZeroTierOne-1.14.2/make-linux.mk:	rm -f $(DESTDIR)/usr/sbin/zerotier-idtool
ZeroTierOne-1.14.2/make-linux.mk:	ln -s zerotier-one $(DESTDIR)/usr/sbin/zerotier-cli
ZeroTierOne-1.14.2/make-linux.mk:	ln -s zerotier-one $(DESTDIR)/usr/sbin/zerotier-idtool
ZeroTierOne-1.14.2/make-linux.mk:	ln -s ../../../usr/sbin/zerotier-one $(DESTDIR)/var/lib/zerotier-one/zerotier-one
ZeroTierOne-1.14.2/make-linux.mk:	ln -s ../../../usr/sbin/zerotier-one $(DESTDIR)/var/lib/zerotier-one/zerotier-cli
ZeroTierOne-1.14.2/make-linux.mk:	ln -s ../../../usr/sbin/zerotier-one $(DESTDIR)/var/lib/zerotier-one/zerotier-idtool
ZeroTierOne-1.14.2/make-linux.mk:	rm -f $(DESTDIR)/usr/sbin/zerotier-cli
ZeroTierOne-1.14.2/make-linux.mk:	rm -f $(DESTDIR)/usr/sbin/zerotier-idtool
ZeroTierOne-1.14.2/make-linux.mk:	rm -f $(DESTDIR)/usr/sbin/zerotier-one
ZeroTierOne-1.14.2/doc/build.sh:export PATH=/bin:/usr/bin:/usr/local/bin:/sbin:/usr/sbin:/usr/local/sbin
ZeroTierOne-1.14.2/entrypoint.sh.release:nohup /usr/sbin/zerotier-one &

@barryascott Yes. The issue is not that accessing the file via /usr/sbin fails, but rather that the build now refers to the file via /usr/bin, and that breaks Pacemaker in at least two ways.

@zbyszek

…one cannot assume that if the host was configured with some value of %_sbindir, all of the guests were configured with the same value (or even use rpm)… code that assumed that the path is identical everywhere was always potentially broken. The usual solution for such cross-system cases is to define a fixed path (or a set of possible paths and use the first found).

I’m confused by that initial set of questions, because on the one hand you expect everybody to use a fixed path, but at the same time you expect to cover cases where the user or some foreign distro changes the paths arbitrarily.

These are all good points. Unfortunately, we’re stuck with quite a lot of baggage for backward compatibility, and it often causes us headaches :wink: We need to try to find a solution such that anything that previously worked continues to work. I’m trying to do that without introducing new corner cases and without missing some obscure case.

To give some more detail: Pacemaker is configured via XML. Users can configure a “bundle” resource. This consists of a container, various supporting configuration (networking, storage, etc.), and an optional cluster-managed resource (“primitive”) to run within the container. At runtime, Pacemaker generates an in-memory configuration element for a docker or podman cluster resource to start/stop/monitor the container. Now, the explicit (e.g., on-disk) configuration has an optional run-command attribute. If run-command is unset AND a primitive is configured, then Pacemaker sets the generated element’s run_cmd attribute to SBIN_DIR "/pacemaker-remoted". (SBIN_DIR is set to sbindir by configure.)
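
For reference, an explicitly configured run-command in the XML configuration looks roughly like this (the IDs and image are made up; only the run-command attribute is relevant here). When run-command is set explicitly, Pacemaker uses it as-is; the SBIN_DIR default above only applies when it is unset.

    <bundle id="guest-bundle">
      <podman image="example.com/guest:latest" replicas="1"
              run-command="/usr/sbin/pacemaker-remoted"/>
      <network control-port="3121"/>
      <primitive id="guest-rsc" class="ocf" provider="pacemaker" type="Dummy"/>
    </bundle>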

/usr/sbin and then SBIN_DIR have been used here since 2017: 8e322e4, 423d3f4.

Pacemaker then passes run_cmd to an external (i.e., out of our control) shell script, which basically executes podman run <other_opts> <run_cmd>. So that script expects run_cmd to be present on the container. Since we don’t own the script, we can’t have it do any remapping or retrying with different paths. And we only get one chance to run the script and “guess” the correct path.

I agree with you that the use of SBIN_DIR is already misleading and potentially broken, since we require the container to use the same SBIN_DIR as the host… in fact, I’ve wondered if simply hard-coding /usr/sbin (as was done originally) might be a viable solution. A “fixed path,” as you said. The problem there is that it would break non-RPM clusters that were built with an sbindir other than /usr/sbin and that use a default run-command. RPM installations would likely be unaffected, if RPM has always used /usr/sbin.

Also: when any cluster resource runs, we compute an md5 sum (“digest”) of its parameters and store it in the “resource history” section. If its digest ever changes, we restart the resource. So we have to be careful to ensure that the parameters of the generated resource don’t change, unless we’re actually trying to change the configuration. Changing the parameter from /usr/sbin/pacemaker-remoted to /usr/bin/pacemaker-remoted changes the digest, even if the path is valid. This causes simulations to fail on Rawhide during regression tests (because the simulations predict resource restarts that we don’t expect), and this would cause unnecessary resource restarts during a rolling upgrade.
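
Just to illustrate the principle (this is not Pacemaker’s exact digest input, only the general idea):

    $ echo 'run_cmd=/usr/sbin/pacemaker-remoted' | md5sum
    $ echo 'run_cmd=/usr/bin/pacemaker-remoted' | md5sum
    # The two digests differ, so the resource looks "changed" and is restarted,
    # even though both paths resolve to the same binary on a merged-/usr host.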

Again… this gives me a headache too. Many decisions from years ago continue to have repercussions.

The sbin-merge is done in a way where the old paths remain valid during and after the merge. So /usr/sbin/pacemaker-remoted … and /usr/bin/pacemaker-remoted are all valid. … /usr/sbin/pacemaker-remoted can be used, independently of how %_sbindir is defined in the build.

Understood. The problem is in the opposite direction. We can’t use /usr/bin/pacemaker-remoted, because it might not exist on the container and because it changes resource digests in simulations and rolling upgrades.

But wouldn’t those be broken already? The guest has /usr/sbin/pacemaker-remoted; any other configuration in the cluster will not work…

That’s what I wanted to suggest too :slight_smile:

Right. But it sounds like, if the default is changed back to the hard-coded /usr/sbin path, this particular issue would go away.

The idea is to use /usr/sbin/pacemaker-remoted. The old path works everywhere.