I want to transcribe handwritten pages using OCR. I installed ollama to see if I could get that to accomplish the task, as well as Newelle. In testing, I am able to run a small model (llama3.2) using ollama, but a larger model (llama3.3) is extremely slow.

I discovered that for all the years I’ve had this laptop I was not using the Nvidia GPU (there is also an integrated GPU, which I assume was working), and went through the process of installing drivers. The Nvidia GPU is working now, but ollama does not seem to be using it. I installed Resources to check, and it showed no activity for the Nvidia GPU when I ran any model with ollama or when I used whisper through Speech Note, until I launched Speech Note with a right click and selected “Launch Using Discrete Graphics Card”, which showed activity for both GPUs.
I tried using `switcherooctl launch ollama serve`, but still didn’t see any activity on the GPUs in Resources when I interacted with a model.
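One thing worth checking (a hedged sketch, assuming ollama was installed with its systemd service): `ollama run` only talks to whatever ollama server already holds port 11434, so a second server launched via switcherooctl may never receive your requests. Something like this can rule that out:

```shell
# If an ollama systemd service is already running, `switcherooctl launch
# ollama serve` starts a *second* server that the client may never talk to
# (the original service keeps port 11434).
systemctl status ollama --no-pager

# Stop the service before experimenting with a manually launched server.
sudo systemctl stop ollama

# Launch the server on the discrete GPU, then interact from another terminal.
switcherooctl launch ollama serve

# While a model is answering, these show whether the Nvidia GPU is in use:
nvidia-smi    # look for an ollama process and VRAM usage
ollama ps     # the PROCESSOR column shows e.g. "100% GPU" or "100% CPU"
```

`ollama ps` is the quickest check: if it reports `100% CPU`, ollama never found a usable CUDA runtime regardless of which GPU the process was launched on.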
If I can be advised on how to get ollama to use all the acceleration available to it, I would appreciate it very much. I’m happy to be referred to documentation, forum posts, or other resources. I’ve been able to solve my problems (or label them as too complicated) using docs and internet searches for all the years I’ve been using Linux, but unfortunately search seems to be broken now that slop and hyper-categorization have taken hold over the last few years.
I don’t know what to expect a laptop to be able to do, and I don’t want to suggest that everything I throw at it should be fast, but I do expect load on components to increase when I perform taxing tasks, and I’m not seeing the GPU do anything when ollama is working hard.
If my transcription task is better suited to a different solution I am open to being pointed to that as well.
Install nvtop, and run your model with nvtop running in a terminal (though dedicated OCR software would be far better, as @theprogram notes). You’ll see for certain whether the Nvidia GPU is in use.
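For reference, nvtop is in the standard Fedora repos; a minimal session might look like this (the model name is just an example):

```shell
sudo dnf install nvtop   # available from the standard Fedora repos
nvtop                    # leave this running; shows per-GPU load and VRAM

# In a second terminal, generate some load and watch the graphs:
ollama run llama3.2 "Summarize this sentence in five words."
```

nvtop shows both the integrated and discrete GPUs side by side, so it makes it obvious which one (if either) ollama is actually driving.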
This may be the culprit. I was originally drafting a question to ask about seeing any activity at all from the GPU. I had come across the CUDA info and found that it is not available for F43 yet, so I proceeded with just xorg-x11-drv-nvidia-cuda from the repo and forgot about it. While writing the question and searching the forums for similar questions, I found out about switcherooctl and running with discrete graphics, and was finally able to see some GPU activity, just not from ollama. The detour away from focusing on just ollama distracted me from the issue with the CUDA install step.
Since the cuda-toolkit is not supported on F43 yet, is it advisable to install it anyway? I guess I will try, and report back.
The CUDA toolkit is not from rpmfusion but from an nvidia site. Having items from different locations makes it very easy to end up with version mismatches and to cause more issues than solutions.
If you feel it is important you may try the toolkit installation, but unless absolutely necessary I would caution against it.
As I understand it the toolkit is more for developers than for end users.
That was why I didn’t attempt it when I first came across it. Perhaps I will wait for rpmfusion to update after Nvidia eventually releases the CUDA toolkit for F43.
I did find a forum post suggesting someone has it working, but I’m not sure it’s worth messing with myself. Or perhaps I’m misunderstanding, and the cuda-toolkit is not necessary for running ollama with CUDA, with only the xorg-x11-drv-nvidia-cuda driver being needed. Seeing as it doesn’t work, I thought the skipped step of installing the toolkit might be the key. Timeline for CUDA Support on Fedora 43 (Wayland-only GNOME, Kernel Updates, and Driver Compatibility) - #8 by nvidia3906 - Linux - NVIDIA Developer Forums seems to indicate the toolkit may be necessary.
That seems to indicate the toolkit may be required when the nvidia driver is installed using NVIDIA’s .run file. I don’t use ollama so I cannot say first-hand, but it is well known that the nvidia driver installed from rpmfusion has already been tweaked to be stable on Fedora, and most users I have seen have had no problems with CUDA as installed via the xorg-x11-drv-nvidia-cuda package.
My comment above about mixing sources of software stands. Many have reported issues when that has been done. Packages may have incompatible dependencies as well as differences in the actual files installed, and packages from one source may overwrite files installed by packages from another source.
Many have also had problems when installing nvidia from any source other than rpmfusion.
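For reference, the usual RPM Fusion route looks roughly like this (assuming the RPM Fusion free and nonfree repos are already enabled):

```shell
# Kernel module (built via akmods) plus the CUDA userspace pieces that
# applications like ollama need, all from RPM Fusion:
sudo dnf install akmod-nvidia xorg-x11-drv-nvidia-cuda

# Give akmods a few minutes to build the module, then reboot and verify:
nvidia-smi
```

Keeping both packages from RPM Fusion is exactly what avoids the source-mixing problem described above: the driver and the CUDA userspace are built and updated together.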
Thank you @computersavvy. I will wait until there is an updated Wayland-supporting driver from Nvidia and rpmfusion, see whether it works without the toolkit, and test the toolkit only if it doesn’t. But I agree: I don’t want to keep it installed as updates progress, since problems may develop, as many users reported after the F43 upgrade. Fingers crossed that the updated rpmfusion driver will be enough, whenever it arrives.
Since you’re on NVIDIA and Fedora 43, one structural option (independent of RPM Fusion packaging delays) is to move the CUDA userspace into a container instead of relying on host-side CUDA packages.
I’m personally running Ollama containerized (Podman) on top of an official CUDA runtime image with GPU passthrough. In my case it’s ROCm, but the architectural idea is the same for CUDA:
- Only the kernel driver lives on the host
- The compute runtime (CUDA or ROCm userspace) comes from the container
- The GPU device nodes are passed through
This avoids mixing Fedora packages with NVIDIA’s upstream toolkit and reduces breakage during kernel updates.
NVIDIA maintains official CUDA runtime container images, which can be used as a base.
For Nvidia you probably also need nvidia-container-toolkit on the host side. (Edit: in my case, with an AMD GPU, there is no need for an equivalent of this.)
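A hedged sketch of that setup on the Nvidia side (package availability and image tags may differ on F43; this assumes nvidia-container-toolkit on the host and the official `docker.io/ollama/ollama` image):

```shell
# Host side: generate a CDI spec so Podman can pass the Nvidia GPU through.
# Requires nvidia-container-toolkit to be installed on the host.
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

# Run ollama from the official image with the GPU passed through.
# --security-opt label=disable avoids SELinux denials on device access.
podman run -d --name ollama \
  --device nvidia.com/gpu=all \
  --security-opt label=disable \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  docker.io/ollama/ollama

# Verify the container sees the GPU:
podman exec ollama nvidia-smi
```

With this layout the CUDA userspace lives entirely inside the image, so only the kernel driver has to stay in step with Fedora kernel updates.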
I also do (non-Ollama) CUDA stuff in containers, and I installed the toolkit from the Fedora AI/ML SIG’s COPR repo.