How to install Cuda with NVIDIA and the RPM Fusion packages

I’ve installed the RPM Fusion NVIDIA drivers, following the advice here https://rpmfusion.org/Howto/NVIDIA. It went really smoothly and worked first time (unlike certain advice elsewhere on the Web, which left me without a working X server). Thanks to the Fedora volunteers for packaging it up!

However, even though I installed the xorg-x11-drv-nvidia-cuda package, Folding At Home can’t detect my Cuda installation. Instructions like these https://stackoverflow.com/questions/9727688/how-to-get-the-cuda-version for verifying that Cuda is installed talk about running nvcc (which doesn’t exist anywhere on my system, as proved by find) and checking /usr/local/cuda/version.txt (which I can’t do, because I have no /usr/local/cuda directory).
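As far as I can tell, the RPM Fusion package and NVIDIA's CUDA Toolkit are different things: xorg-x11-drv-nvidia-cuda appears to ship the driver-side CUDA libraries (libcuda and friends), while nvcc and /usr/local/cuda come only from NVIDIA's separate toolkit installer. Here is roughly how I checked what the package actually provides (the grep pattern is just a guess at relevant filenames):

```shell
# List the CUDA-related files shipped by the RPM Fusion package
# (only meaningful on an RPM-based system such as Fedora).
if command -v rpm >/dev/null; then
    rpm -ql xorg-x11-drv-nvidia-cuda | grep -i cuda
fi
# nvcc is nowhere to be found, since it belongs to the separate toolkit:
command -v nvcc || echo "nvcc: not found"
```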

Despite this, nvidia-bug-report.log reports that I have Cuda 11.2. It’s too large for fpaste, but I see this:

[msl@localhost tmp]$ zcat nvidia-bug-report.log.gz | grep -i cuda
Feb 18 11:17:13 localhost.localdomain sudo[7326]:      msl : TTY=pts/1 ; PWD=/home/msl ; USER=root ; COMMAND=/usr/bin/dnf install xorg-x11-drv-nvidia-cuda
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,objc,obj-c++,ada,go,d,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl --enable-offload-targets=nvptx-none --without-cuda-driver --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,objc,obj-c++,ada,go,d,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl --enable-offload-targets=nvptx-none --without-cuda-driver --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
  Attribute 'CUDACores' (localhost.localdomain:1.0): 384.
    'CUDACores' is an integer attribute.
    'CUDACores' is a read-only attribute.
    'CUDACores' can use the following target types: X Screen, GPU.
  Attribute 'CUDACores' (localhost.localdomain:1[gpu:0]): 384.
    'CUDACores' is an integer attribute.
    'CUDACores' is a read-only attribute.
    'CUDACores' can use the following target types: X Screen, GPU.
CUDA Version                              : 11.2
CUDA Version                              : 11.2
[...]
[msl@localhost tmp]$

Is it possible, while staying with the RPM Fusion packages, to build up my Cuda installation to the point where Folding At Home can find it?

Many thanks!


I have not been running F@H, but in the past, when running seti@home, I used CUDA with just the package from RPM Fusion.

I just installed the F@H packages on Fedora and, like you, I found that CUDA does not run. I really suspect this is a result of the way F@H is written, since it uses python2 and has not yet been ported to python3.

In the logs I found it was also unable to find libOpenCL.so, but I fixed that by going into /usr/lib64 and creating a soft link from libOpenCL.so to libOpenCL.so.1.0.0, and those complaints stopped.
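In case it helps, here is roughly what that looks like, rehearsed in a scratch directory first; on the real system the same ln -s is run as root in /usr/lib64:

```shell
# Demonstrate the soft link in a scratch directory; on the real system
# the directory is /usr/lib64 and ln is run via sudo.
mkdir -p /tmp/opencl-demo
cd /tmp/opencl-demo
touch libOpenCL.so.1.0.0                  # stand-in for the real library
ln -sf libOpenCL.so.1.0.0 libOpenCL.so    # the link the loader looks for
readlink libOpenCL.so                     # prints libOpenCL.so.1.0.0
```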

I will do more testing, but I am not really holding out much hope, because it comes from a third party and uses python2.

Thanks for replying, JV. It seems I already have that symlink:

[msl@localhost lib64]$ pwd
/usr/lib64
[msl@localhost lib64]$ ls -lh libOpenCL*
lrwxrwxrwx. 1 root root   18 Oct  6 04:26 libOpenCL.so -> libOpenCL.so.1.0.0
lrwxrwxrwx. 1 root root   18 Oct  6 04:26 libOpenCL.so.1 -> libOpenCL.so.1.0.0
-rwxr-xr-x. 1 root root 135K Oct  6 04:26 libOpenCL.so.1.0.0
[msl@localhost lib64]$

But, if I add a GPU slot in F@H, I still see complaints about OpenCL being unavailable:

10:25:37:WARNING:FS03:No CUDA or OpenCL 1.2+ support detected for GPU slot 03: gpu:-1:-1.  Disabling.

clinfo says that both Cuda and OpenCL are available on my system (https://paste.centos.org/view/c9ac7976), but its output also starts with a bunch of errors, making me wonder whether something is askew:

X server found. dri2 connection failed! 
DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument
Assuming 131072kB available aperture size.
May lead to reduced performance or incorrect rendering.
get chip id failed: -1 [22]
param: 4, val: 0
[the six lines above repeated four more times]
cl_get_gt_device(): error, unknown device: 0
cl_get_gt_device(): error, unknown device: 0

Thanks again for looking into this.

Your paste shows three GPU drivers: one Intel, which produced the errors above (the i915 driver), one Clover (Mesa's OpenCL driver), and one NVIDIA. The RPM Fusion CUDA packages only support the NVIDIA card.

I only have one nvidia card.

I have been able to get F@H working (CPU only) on my system.
I found a couple of things that were required to get it all functional. I also found that it is not written or packaged to run under systemd, but uses the older-style init script under /etc/init.d.

The FAHControl portion would not start, either from the command line or from the installed icon. I found four files related to FAH in /usr/bin, and one (FAHControl) was a python script. Its shebang line read “#!/usr/bin/python”. That is a problem, because /usr/bin/python is a link to /usr/bin/python3, so the script was not using the proper path to load the needed modules. Since F@H uses python2, I changed the shebang line in that file to read “#!/usr/bin/python2”, and now the control works.
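For anyone else hitting this, the edit can be done with a one-line sed. The snippet below rehearses it on a throwaway copy; on the real system the target is /usr/bin/FAHControl and sed needs sudo:

```shell
# Rehearse the shebang fix on a throwaway copy of the script.
printf '#!/usr/bin/python\nprint "hello from python2"\n' > /tmp/FAHControl-demo
# Rewrite the first line only if it is exactly #!/usr/bin/python.
sed -i '1s|^#!/usr/bin/python$|#!/usr/bin/python2|' /tmp/FAHControl-demo
head -1 /tmp/FAHControl-demo    # prints #!/usr/bin/python2
```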

I still have not gotten the GPU to function (control says it is “paused: waiting”) but so far everything else seems good. Still working on that.

I do not think the GPU issue is directly CUDA-related; rather, I believe it is related to the differences between python2 and python3. Since the RPM Fusion CUDA support targets python3, it may never work until F@H is rewritten for python3.

I may dedicate some time to rewriting the F@H modules (those files located under /usr/lib/python2.7/site-packages) for python3 and see if I can get it working that way.

Thanks, JV.

I don’t know why clinfo thinks I have an Intel graphics card – I haven’t got one:

[msl@localhost ~]$ sudo lspci | grep VGA
41:00.0 VGA compatible controller: NVIDIA Corporation GP108 [GeForce GT 1030] (rev a1)
[msl@localhost ~]$

I’ve had F@H running in CPU-only mode for donkey’s years. My (trivial) change to FAHControl is the same as yours, modulo a couple of whitespace diffs that’ve crept in over the years.

If the RPM Fusion packages provide Cuda libraries for Python3 but not for Python2, that’ll explain the problem perfectly. It sounds as if it’s time for me to wander over to the Folding Forums and ask whether an upgrade to Python3 is coming soon, or whether there’s some way to bypass the problematic check.

Thanks again for your help.

Going to the other forum will not be necessary.
I am not certain exactly what fixed it but I now have F@H running on the GPU (Geforce GTX 1050).

I have installed only the items we discussed, and made the change in /usr/bin/FAHControl noted above so it uses the python2 libraries.

Today I attached my work to my team at F@H; to do that I used FAHControl and paused the client while making the configuration change. When I returned the client to folding, I noticed that the GPU also picked up a WU and began folding along with the CPU.

Maybe yours is simply waiting on a new config and restart like mine seemed to do. I think what activated it was, in sequence, pause, writing the config file, then fold, all from within FAHControl, but cannot be 100% certain.

This is the content of /etc/fahclient/config.xml now that the GPU is configured and functioning:

<config>
  <!-- Network -->
  <proxy v=':8080'/>

  <!-- User Information -->
  <team v='#####'/>
  <user v='XXXXXXXXXXX'/>

  <!-- Folding Slots -->
  <slot id='0' type='CPU'/>
  <slot id='1' type='GPU'>
    <pci-bus v='8'/>
    <pci-slot v='0'/>
  </slot>
</config>

Well, pausing and then restarting didn’t help, but adding a GPU slot and then rebooting seems to have got me one step further. GPU core 22 now crashes as soon as it starts running, but that’s well outside the remit of the Fedora Project. I’ll take it to the Folding Forums and see what they can find out.

Thanks again for helping me get this far.

Did you use the pause, configure, restart sequence?

I had entered a slot for the GPU by hand, and it did not function. When I did the config from the control panel, it changed the syntax and added a couple of lines to that config, after which it worked.

Notably the extra lines were in the slot config for the GPU and contained the PCI device IDs as seen in the config above.

You might want to examine the output of “lspci -v | grep -A3 VGA” and make certain the entries in /etc/fahclient/config.xml for that GPU match the lspci output, like this:

 lspci -v | grep -A3 VGA
08:00.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: Gigabyte Technology Co., Ltd Device 3765
	Flags: bus master, fast devsel, latency 0, IRQ 59

  <slot id='0' type='CPU'/>
  <slot id='1' type='GPU'>
    <pci-bus v='8'/>
    <pci-slot v='0'/>
  </slot>

Note that pci-bus and pci-slot match the first two fields of the lspci address (08:00.0).
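One caveat I would add (this is my assumption, worth double-checking against what FAHControl writes): lspci prints the bus and slot in hexadecimal, while the config values above look decimal. For 08:00.0 the two happen to coincide, but for a bus like 41 (as with the GT 1030 earlier in this thread) they would not:

```shell
# Convert lspci's hex bus/slot numbers to decimal.
printf '%d\n' 0x08    # bus 08  -> 8
printf '%d\n' 0x00    # slot 00 -> 0
printf '%d\n' 0x41    # bus 41  -> 65
```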

If that is all matching then I don’t know.

Notice also that there are two slot stanzas there. The first, slot 0, is a single line for the CPU, but slot 1, for the GPU, spans four lines.