Fedora Silverblue 31: Installing nvidia drivers fails

Hi there,
I’m trying to install the nvidia drivers via the following command:
rpm-ostree install akmod-nvidia xorg-x11-drv-nvidia

However, it fails with this error message:

error: Running %post for akmod-nvidia: Executing bwrap(/bin/sh): Child process killed by signal 1; run journalctl -t 'rpm-ostree(akmod-nvidia.post)' for more information

The logs are the following:

Jan 02 16:35:30 localhost.localdomain rpm-ostree(akmod-nvidia.post)[2942]: Building /usr/src/akmods/nvidia-kmod-440.44-1.fc31.src.rpm for kernel 5.3.16-300.fc31.x86_64
Jan 02 16:35:38 localhost.localdomain rpm-ostree(akmod-nvidia.post)[2942]: CONFTEST: of_find_matching_node
Jan 02 16:35:38 localhost.localdomain rpm-ostree(akmod-nvidia.post)[2942]: CONFTEST: timer_setup
Jan 02 16:35:38 localhost.localdomain rpm-ostree(akmod-nvidia.post)[2942]: CONFTEST: tegra_get_platform
Jan 02 16:35:38 localhost.localdomain rpm-ostree(akmod-nvidia.post)[2942]: CONFTEST: dma_direct_map_resource
Jan 02 16:35:38 localhost.localdomain rpm-ostree(akmod-nvidia.post)[2942]: CONFTEST: dev_is_pci
Jan 02 16:35:38 localhost.localdomain rpm-ostree(akmod-nvidia.post)[2942]: CONFTEST: kbasename
Jan 02 16:35:38 localhost.localdomain rpm-ostree(akmod-nvidia.post)[2942]: CONFTEST: address_space_init_once
Jan 02 16:35:38 localhost.localdomain rpm-ostree(akmod-nvidia.post)[2942]: CONFTEST: flush_cache_all
Jan 02 16:35:38 localhost.localdomain rpm-ostree(akmod-nvidia.post)[2942]: CONFTEST: vmf_insert_pfn
Jan 02 16:35:38 localhost.localdomain rpm-ostree(akmod-nvidia.post)[2942]: make[2]: execvp: /bin/sh: Too many open files
Jan 02 16:35:38 localhost.localdomain rpm-ostree(akmod-nvidia.post)[2942]: make[2]: *** [/tmp/akmodsbuild.5wmDBBsu/BUILD/nvidia-kmod-440.44/_kmod_build_5.3.16-300.fc31.x86_64/Kbuild:129: /tmp/akmodsbuild.5wmDBBsu/BUILD/nvidia-kmod-440.44/_kmod_build_5.3.16-300.fc31.x86_64/conftest/compile-tests/address_space_init_once.h] Error 127
Jan 02 16:35:38 localhost.localdomain rpm-ostree(akmod-nvidia.post)[2942]: make[2]: *** Waiting for unfinished jobs…
Jan 02 16:35:38 localhost.localdomain rpm-ostree(akmod-nvidia.post)[2942]: make[2]: *** [/tmp/akmodsbuild.5wmDBBsu/BUILD/nvidia-kmod-440.44/_kmod_build_5.3.16-300.fc31.x86_64/Kbuild:129: /tmp/akmodsbuild.5wmDBBsu/BUILD/nvidia-kmod-440.44/_kmod_build_5.3.16-300.fc31.x86_64/conftest/compile-tests/kbasename.h] Error 127
Jan 02 16:35:38 localhost.localdomain rpm-ostree(akmod-nvidia.post)[2942]: /bin/sh: error while loading shared libraries: libc.so.6: cannot open shared object file: Error 24
Jan 02 16:35:38 localhost.localdomain rpm-ostree(akmod-nvidia.post)[2942]: make[2]: *** [/tmp/akmodsbuild.5wmDBBsu/BUILD/nvidia-kmod-440.44/_kmod_build_5.3.16-300.fc31.x86_64/Kbuild:129: /tmp/akmodsbuild.5wmDBBsu/BUILD/nvidia-kmod-440.44/_kmod_build_5.3.16-300.fc31.x86_64/conftest/compile-tests/backlight_device_register.h] Error 127
Jan 02 16:35:38 localhost.localdomain rpm-ostree(akmod-nvidia.post)[2942]: make[2]: *** Deleting file ‘/tmp/akmodsbuild.5wmDBBsu/BUILD/nvidia-kmod-440.44/_kmod_build_5.3.16-300.fc31.x86_64/conftest/compile-tests/backlight_device_register.h’
Jan 02 16:35:38 localhost.localdomain rpm-ostree(akmod-nvidia.post)[2942]: make[2]: *** [/tmp/akmodsbuild.5wmDBBsu/BUILD/nvidia-kmod-440.44/_kmod_build_5.3.16-300.fc31.x86_64/Kbuild:129: /tmp/akmodsbuild.5wmDBBsu/BUILD/nvidia-kmod-440.44/_kmod_build_5.3.16-300.fc31.x86_64/conftest/compile-tests/register_cpu_notifier.h] Error 127
Jan 02 16:35:38 localhost.localdomain rpm-ostree(akmod-nvidia.post)[2942]: make[2]: *** Deleting file ‘/tmp/akmodsbuild.5wmDBBsu/BUILD/nvidia-kmod-440.44/_kmod_build_5.3.16-300.fc31.x86_64/conftest/compile-tests/register_cpu_notifier.h’
Jan 02 16:35:38 localhost.localdomain rpm-ostree(akmod-nvidia.post)[2942]: rm: error while loading shared libraries: libc.so.6: cannot open shared object file: Error 24
Jan 02 16:35:38 localhost.localdomain rpm-ostree(akmod-nvidia.post)[2942]: make[1]: *** [Makefile:1630: module/tmp/akmodsbuild.5wmDBBsu/BUILD/nvidia-kmod-440.44/_kmod_build_5.3.16-300.fc31.x86_64] Error 2
Jan 02 16:35:38 localhost.localdomain rpm-ostree(akmod-nvidia.post)[2942]: make[1]: Leaving directory ‘/usr/src/kernels/5.3.16-300.fc31.x86_64’
Jan 02 16:35:38 localhost.localdomain rpm-ostree(akmod-nvidia.post)[2942]: make: *** [Makefile:81: modules] Error 2
Jan 02 16:35:38 localhost.localdomain rpm-ostree(akmod-nvidia.post)[2942]: error: Bad exit status from /var/tmp/rpm-tmp.wvuTdE (%build)
Jan 02 16:35:38 localhost.localdomain rpm-ostree(akmod-nvidia.post)[2942]: RPM build errors:
Jan 02 16:35:38 localhost.localdomain rpm-ostree(akmod-nvidia.post)[2942]: cannot open Packages database in /var/lib/rpm
Jan 02 16:35:38 localhost.localdomain rpm-ostree(akmod-nvidia.post)[2942]: user mockbuild does not exist - using root
Jan 02 16:35:38 localhost.localdomain rpm-ostree(akmod-nvidia.post)[2942]: group mock does not exist - using root
Jan 02 16:35:38 localhost.localdomain rpm-ostree(akmod-nvidia.post)[2942]: user mockbuild does not exist - using root
Jan 02 16:35:38 localhost.localdomain rpm-ostree(akmod-nvidia.post)[2942]: group mock does not exist - using root
Jan 02 16:35:38 localhost.localdomain rpm-ostree(akmod-nvidia.post)[2942]: cannot open Packages database in /var/lib/rpm
Jan 02 16:35:38 localhost.localdomain rpm-ostree(akmod-nvidia.post)[2942]: Bad exit status from /var/tmp/rpm-tmp.wvuTdE (%build)

I look forward to your help :slight_smile:

It seems that there’s no specific error…
I executed rpm-ostree install akmod-nvidia multiple times, always cleaning up the cache via rpm-ostree cleanup -b beforehand, and the errors differ every time…
The only thing they have in common is:

/bin/sh: Too many open files

But I’m not sure if that’s really the cause of the error.
However, I tried to increase my user’s open-file limit via ulimit -n XXX, but I was not able to set it.
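For reference, roughly what I tried looks like the sketch below (the 4096/8192 values are arbitrary, and whether pam_limits settings even reach the bwrap sandbox that rpm-ostree uses for %post scripts is just my assumption):

# show the current soft and hard open-file limits for this shell
ulimit -Sn
ulimit -Hn

# try to raise the soft limit for the current shell only
ulimit -n 4096

# try to raise the limits persistently for my user (needs a re-login)
echo "$USER soft nofile 4096" | sudo tee -a /etc/security/limits.conf
echo "$USER hard nofile 8192" | sudo tee -a /etc/security/limits.conf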

I finally managed to resolve the issue.
I assume my system simply has too many CPU threads for the akmod-nvidia %post script: the build runs one job per thread, each spawning its own /bin/sh, which apparently exceeds the open-file limit. By disabling all my CPUs but one, I was able to run the installation.

#!/bin/bash
# Enable (true) or disable (false) CPUs cpu_from..cpu_to via sysfs.
# Note: the C-style for loop requires bash, not plain sh.

do_enable="$1"
cpu_from="$2"
cpu_to="$3"

flag=0
if [ "$do_enable" = true ]; then
    flag=1
elif [ "$do_enable" != false ]; then
    echo "do_enable must be a bool. Its value is $do_enable."
    exit 1
fi

for ((i=cpu_from; i<=cpu_to; i++)); do
    echo "$flag" > /sys/devices/system/cpu/cpu"$i"/online
done
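For reference, this is roughly how I invoked it (run as root; the script name toggle-cpus.sh is just a placeholder, and the range 1–47 matches my 48-thread CPU, leaving cpu0 online since it usually cannot be taken offline):

# take CPUs 1-47 offline, leaving only cpu0
sudo ./toggle-cpus.sh false 1 47

# run the installation with a single CPU online
rpm-ostree install akmod-nvidia xorg-x11-drv-nvidia

# bring the CPUs back online afterwards (or just reboot)
sudo ./toggle-cpus.sh true 1 47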

Now I wonder whether this is a bug or not.
Maybe it’s simply possible to increase my system’s open-file limit via ulimit -Sn; however, I was not able to effectively modify this value on Silverblue.
It would be great if you could tell me whether I should file a bug.

My CPU:
Threadripper 3960X (24 physical cores, 48 virtual CPUs)


Same error on an AMD Ryzen 9 3950X (16 cores, 32 threads), but no logs… I used your script like this: ./script false 8 31, which deactivates threads 9 to 32, and it worked!
Afterwards I replaced false with true and rebooted!

I’m happy that it helps.
You don’t need to re-enable the CPUs if you reboot anyway.

By the way, I think my script only fights the symptoms. I’m still looking for a way to increase the host system’s open-file limit on Silverblue. I guess that could solve the whole issue :wink:
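One thing I might try (untested; the 65536 value is arbitrary, and I’m only assuming the akmod build inside bwrap actually inherits the systemd default limit) is raising the default NOFILE limit for services via a systemd drop-in and restarting the rpm-ostree daemon:

# raise the default open-file limit for services started by systemd
sudo mkdir -p /etc/systemd/system.conf.d
printf '[Manager]\nDefaultLimitNOFILE=65536\n' | sudo tee /etc/systemd/system.conf.d/nofile.conf

# re-execute systemd and restart rpm-ostreed so the new limit is picked up
sudo systemctl daemon-reexec
sudo systemctl restart rpm-ostreed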

This worked for me; I hit the same bug on Fedora Silverblue 33 33.20210114.0. Thanks!