But when installing a new kmod version and a new kernel version in a single dnf upgrade, the default DKMS post transaction dracut --regenerate-all --force command fails due to missing folders:
Executing post-transaction command
# command: dracut --regenerate-all --force
dracut[F]: Can't write to /boot/efi/1eeff051ec8246a89c845ef144d9c1f9/6.12.0-124.38.1.el10_1.x86_64: Directory /boot/efi/1eeff051ec8246a89c845ef144d9c1f9/6.12.0-124.38.1.el10_1.x86_64 does not exist or is not accessible.
# exit code: 1
# elapsed time: 00:01:14
----------------------------------------------------------------
After the DKMS scriptlet, everything else dnf did ran successfully.
For context, here’s the rpm spec:
%global debug_package %{nil}
%global dkms_name led-class-multicolor
%undefine dkms_depends
%global dkms_location /kernel/drivers/leds
Name: kmod-%{dkms_name}-dkms
Version: 6.12.0
Release: 124.38.1%{?dist}
%global dkms_version %{VERSION}-%{RELEASE}
Summary: %{dkms_name} for EL 10
License: GPL-2.0
URL: https://kernel.org
BuildArch: noarch
Requires: dkms
%if 0%{?dkms_depends:1}
Requires: kmod-%{dkms_depends}-dkms >= %{VERSION}-%{RELEASE}
%endif
Source0: %{dkms_name}.tar.xz
Source1: dkms.conf.in
%description
This package provides the %{dkms_name} kernel module,
which is in-tree but not shipped by Red Hat.
%prep
%autosetup -c
cp %{SOURCE1} dkms.conf
sed -i -e 's/@NAME@/%{dkms_name}/g' dkms.conf
sed -i -e 's/@VERSION@/%{dkms_version}/g' dkms.conf
sed -i -e 's/@MODULE_NAME@/%{dkms_name}/g' dkms.conf
%if 0%{?dkms_depends:1}
sed -i -e 's/@MODULE_DEPENDS@/%{dkms_depends}/g' dkms.conf
sed -i -e 's|@SYMBOLS@|KBUILD_EXTRA_SYMBOLS=${dkms_tree}/%{dkms_depends}/%{VERSION}-%{RELEASE}/${kernelver}/${arch}/module/Module.symvers|g' dkms.conf
%else
sed -i -e '/@MODULE_DEPENDS@/d' dkms.conf
sed -i -e '/@SYMBOLS@/d' dkms.conf
%endif
sed -i -e 's|@MODULE_LOCATION@|%{dkms_location}|g' dkms.conf
%build
%install
mkdir -p %{buildroot}%{_usrsrc}/%{dkms_name}-%{dkms_version}/
cp -r * %{buildroot}%{_usrsrc}/%{dkms_name}-%{dkms_version}/
%post
dkms add -m %{dkms_name} -v %{dkms_version} --rpm_safe_upgrade
dkms install -m %{dkms_name} -v %{dkms_version} --force
%preun
dkms remove -m %{dkms_name} -v %{dkms_version} --all --rpm_safe_upgrade
%files
%license LICENSE
%{_usrsrc}/%{dkms_name}-%{dkms_version}
%changelog
That dracut --regenerate-all --force command should not be run. It was added to work around some problem between NVIDIA drivers and the Linux kernel.[1]
Try creating a /etc/dkms/framework.conf.d/override.conf file containing the following line and see if that fixes the problem.
post_transaction=""
With the above modification, DKMS should apply updates only to the new kernel being installed. It should not try to modify all the installed kernels (or rather, their initramfs images and their kernel modules). Updating only the newest kernel was the old, working behavior before that NVIDIA workaround was applied.
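If it helps, the drop-in described above could be created with something like the following. This is only a sketch: `write_dkms_override` is a made-up helper name, and its optional directory argument exists purely so it can be tried against a scratch directory before touching /etc. The file path and the `post_transaction=""` line are the ones from this post.

```shell
# Hypothetical helper (not part of DKMS): write a drop-in that clears the
# post-transaction command. The first argument is the target directory and
# defaults to the path suggested above.
write_dkms_override() {
    conf_dir="${1:-/etc/dkms/framework.conf.d}"
    mkdir -p "$conf_dir" || return 1
    # Single quotes keep the inner double quotes literal in the file.
    printf 'post_transaction=""\n' > "$conf_dir/override.conf"
}
```

Run as root, `write_dkms_override` with no argument writes to the real location. Passing a temporary directory first lets you inspect the result before applying it.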
I suspect the error you are seeing is DKMS trying to modify a kernel that has already been removed from your system.
The opposite actually, it’s trying to modify a kernel that hasn’t been fully installed yet. The kernel scriptlets executed afterwards, and DKMS autoinstall then tried to install the modules again and succeeded, but leaving an error dangling like that isn’t good practice.
Hmm, that’s right, DKMS should only be building the kernel objects that are (ultimately) stored under /lib/modules. It should not be messing with things under /boot/efi. That is Dracut’s domain and Dracut runs at a later stage.
$ ls -1 /usr/lib/kernel/install.d
40-dkms.install
50-depmod.install
50-dracut.install
90-loaderentry.install
90-uki-copy.install
92-tuned.install
I’d say that something else is fundamentally wrong with the DKMS module you are running. It is attempting to modify files that it should not be modifying.
EDIT: Never mind, I see. It is that post_transaction=... line. In your case, it is trying to call Dracut too early (I think).
Well, I don’t know anything about what it is you are working on, but if you are trying to do things with the initramfs, that should be done in a Dracut module. DKMS should “stay in its lane” and just build the modules.
Is your DKMS module exiting with a return code of 0? DKMS was changed quite some time ago such that any non-zero exit code (other than 77) will halt the whole build process.[1] Prior to that change, DKMS had a “continue on error” behavior where a DKMS module that returned a non-zero status would not affect anything else. The initramfs would just get built without the module that failed to build.
The command run by DKMS failed, likely because it ran before the scriptlets of the new kernel, since dracut reported that the directory does not exist. Everything else dnf does works as expected. There is no issue if I run the command manually afterwards.
…maybe it’s a dependency issue? Without an explicit dependency, did dnf execute the scriptlets in arbitrary order?
Moving build commands to %posttrans appears to solve this issue.
Note that this breaks rpm DKMS autoinstall and DKMS itself doesn’t support install for all kernels. A more complex script that goes through all available kernels will be required when choosing this route. Error suppression remains the tried and true route used by popular packages.
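For anyone going the %posttrans route, the "script that goes through all available kernels" could look roughly like this. It is a sketch under assumptions, not what any existing package ships: `dkms_install_all` is a hypothetical helper name, and treating a kernel as buildable only when `/lib/modules/<kver>/build` exists is my own guess at a reasonable filter. The `dkms install -m/-v/-k --force` invocation is the same one used in the spec's %post, with the kernel version added.

```shell
# Hypothetical helper for a %posttrans scriptlet: install the DKMS module for
# every kernel under the modules root that has a build tree available.
# Arguments: module name, module version, optional modules root.
dkms_install_all() {
    name="$1"
    version="$2"
    modules_root="${3:-/lib/modules}"
    for dir in "$modules_root"/*; do
        # Assumption: no build tree means no headers, so skip that kernel.
        [ -d "$dir/build" ] || continue
        kver=$(basename "$dir")
        # Error suppression: report the failure but keep going, so one
        # broken kernel does not fail the whole transaction.
        dkms install -m "$name" -v "$version" -k "$kver" --force \
            || echo "dkms install failed for $kver" >&2
    done
    return 0
}
```

In the spec this would be called as `dkms_install_all %{dkms_name} %{dkms_version}` after the `dkms add` step. Always returning 0 is the error-suppression trade-off mentioned above: the transaction stays clean, but failures only show up on stderr.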
Just for the record, it is not a good practice to blindly apply updates to the older kernel installations.
Will the user also boot back into all those previous kernel builds to make sure the rebuilt driver works? They won’t. And if it doesn’t, they will find out at the worst possible time: when they actually need to use the older kernel because something on the latest build has failed. You are not supposed to continue updating the older kernels once you have moved on to a newer kernel. The older kernels are only meant to be used as known good fallbacks. If you find that a newer kernel will not work with your hardware and you remove it, you can resume updating the older kernel because it is now your “working” kernel, but you should never corrupt your known good fallback kernels.
Blindly no, but this is mandatory for the nvidia kernel module, otherwise you will
have no graphics if you boot the old kernel, because the version of the Xorg nvidia
module must match the version of the kernel module (I don’t know about Wayland).
If nvidia doesn’t work with the last known good kernel due to changes in userspace, the correct thing to do would be to revert the userspace as well in that case. That can be done pretty easily with copy-on-write filesystems like Btrfs or ZFS, as long as you keep snapshots that correlate with each kernel install. You also have to keep your OS and user data on separate subvolumes so that when you roll back your OS install, you do not lose any user data. It’s complicated, but that would be the correct way to handle that sort of problem.
The reason the new kernel fails doesn’t change the method of resolving the problem that nvidia userspace code does not work with older versions of its own driver. You still have to revert your userspace code to work around that problem. Applying updates to the fallback kernel installations is not the right way to allow falling back to previous states because in so doing, you effectively destroy the “known good” previous state.
But you said the graphics would not work because nvidia userspace does not work with its previous driver build, so there is no “immediate” workaround. You would have to do something like dropping to the dracut rescue shell and reverting your OS to get that to work. It is doable, but that is what you would have to do if you are using nvidia and it does not work with its previous driver. This problem is unique to nvidia as far as I know.
I used to configure (hundreds of) Fedora systems with an nvidia card to build and install
the akmods kernel modules for all kernels at least at shutdown [1]. That may fail of
course, but it worked most of the time: there are many more kernel updates than nvidia
driver ones.
Those machines were thus always ready to boot an older kernel with nvidia graphics.
[1] Using the old, almost deprecated, akmods-shutdown.service
Right. Just understand that what I’m saying is that “worked most of the time” isn’t as good as “works 100% of the time”. To get the latter, you must not tamper with the last known good builds. You can get away with your method “most” of the time, but it is not the right and ideal way to deal with the problem.