LLMs in Fedora

Anyone doing anything with LLMs in Fedora?

I am running stable-diffusion-webui and llm (a Python LLM manager) to do some playing around and testing.


If your GPU is supported, there is no need to do this; ollama will detect your GPU without issue.

You can find the supported cards here:

In my case, I have an AMD Radeon RX 6600, which isn't supported by ROCm:

$ rocm-smi --showproductname


============================ ROCm System Management Interface ============================
====================================== Product Info ======================================
GPU[0]		: Card series: 		Navi 23 [Radeon RX 6600/6600 XT/6600M]
GPU[0]		: Card model: 		0xe451
GPU[0]		: Card vendor: 		Advanced Micro Devices, Inc. [AMD/ATI]
GPU[0]		: Card SKU: 		2451LMH
==========================================================================================
================================== End of ROCm SMI Log ===================================

To have ollama use my GPU, I had to modify /etc/systemd/system/ollama.service and add an environment variable that provides the closest supported GFX version for my card:

[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=/home/renich/.local/share/gem/ruby/bin:/usr/lib64/ccache:/home/renich/.local/bin:/home/renich/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin"
Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0" # <-- added this

[Install]
WantedBy=default.target
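
After editing the unit file, systemd has to reload its configuration before the new environment variable takes effect, so reload and restart the service:

# systemctl daemon-reload
# systemctl restart ollama.service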

The GFX version corresponds to the closest supported LLVM target to my card's.

Find your gfx version with rocminfo, or rocminfo | grep -Eo 'gfx[0-9]{3,4}', and use the digits after 'gfx' to build the version. With 3 digits, gfxXYZ becomes X.Y.Z; with 4 digits, gfxWXYZ becomes WX.Y.Z.
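
Here is a small bash sketch of that mapping (a hypothetical helper; it assumes rocminfo is installed and just grabs the first gfx target it reports):

# derive HSA_OVERRIDE_GFX_VERSION from the rocminfo gfx target
gfx="$(rocminfo | grep -Eo 'gfx[0-9]{3,4}' | head -n1 | sed 's/^gfx//')"
if [ "${#gfx}" -eq 3 ]; then
    echo "HSA_OVERRIDE_GFX_VERSION=${gfx:0:1}.${gfx:1:1}.${gfx:2:1}"
else
    echo "HSA_OVERRIDE_GFX_VERSION=${gfx:0:2}.${gfx:2:1}.${gfx:3:1}"
fi

In my case, rocminfo reports gfx1032, which maps to 10.3.2, but I set 10.3.0 because that is the closest target ROCm actually supports.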

Now, when I start the ollama service, I see this in the logs:

# systemctl start ollama.service && journalctl -fu ollama
...
Jul 10 19:05:06 desktop.casa.g02.org systemd[1]: Started ollama.service - Ollama Service.
Jul 10 19:05:06 desktop.casa.g02.org ollama[554903]: 2024/07/10 19:05:06 routes.go:1033: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION:10.3.0 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
Jul 10 19:05:06 desktop.casa.g02.org ollama[554903]: time=2024-07-10T19:05:06.813-06:00 level=INFO source=images.go:751 msg="total blobs: 10"
Jul 10 19:05:06 desktop.casa.g02.org ollama[554903]: time=2024-07-10T19:05:06.814-06:00 level=INFO source=images.go:758 msg="total unused blobs removed: 0"
Jul 10 19:05:06 desktop.casa.g02.org ollama[554903]: time=2024-07-10T19:05:06.814-06:00 level=INFO source=routes.go:1080 msg="Listening on 127.0.0.1:11434 (version 0.2.1)"
Jul 10 19:05:06 desktop.casa.g02.org ollama[554903]: time=2024-07-10T19:05:06.814-06:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama1778105021/runners
Jul 10 19:05:08 desktop.casa.g02.org ollama[554903]: time=2024-07-10T19:05:08.862-06:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 rocm_v60101]"
Jul 10 19:05:08 desktop.casa.g02.org ollama[554903]: time=2024-07-10T19:05:08.862-06:00 level=INFO source=gpu.go:205 msg="looking for compatible GPUs"
Jul 10 19:05:08 desktop.casa.g02.org ollama[554903]: time=2024-07-10T19:05:08.867-06:00 level=WARN source=amd_linux.go:58 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
Jul 10 19:05:08 desktop.casa.g02.org ollama[554903]: time=2024-07-10T19:05:08.871-06:00 level=INFO source=amd_linux.go:333 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=10.3.0
Jul 10 19:05:08 desktop.casa.g02.org ollama[554903]: time=2024-07-10T19:05:08.871-06:00 level=INFO source=types.go:103 msg="inference compute" id=0 library=rocm compute=gfx1032 driver=0.0 name=1002:73ff total="8.0 GiB" available="7.2 GiB"
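
If you want to double-check that inference actually lands on the GPU, one way (assuming you have already pulled a model; llama3 here) is to run a prompt and then look at what ollama loaded and what rocm-smi reports:

$ ollama run llama3 "Say hi in one word."
$ ollama ps                   # the PROCESSOR column should report GPU, not CPU
$ rocm-smi --showuse          # GPU use % climbs while a prompt is being generated

The model stays loaded for a few minutes after the run (OLLAMA_KEEP_ALIVE is 5m by default, as the log above shows), so ollama ps right afterwards still lists it.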

Anyway, good luck with this if you try it. Obviously, you need to install ollama first, but that is straightforward. Check out their website for instructions: Download Ollama on macOS
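
At the time of writing, the Linux install is a one-line script from their site; verify the URL there before piping anything into a shell:

$ curl -fsSL https://ollama.com/install.sh | sh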

Good luck.


I currently use stable-diffusion-webui installed via pip, venv and all.
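
For context, the upstream route looks roughly like this (assuming the AUTOMATIC1111 repository; webui.sh creates and manages the venv itself):

$ git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
$ cd stable-diffusion-webui
$ ./webui.sh    # creates a venv, pip-installs torch and the other deps, then starts the web UI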

This isn’t ideal, IMHO.

I am looking for a way to install it using Fedora's packages and Fedora's curated ROCm.

I have not had much success with Fedora's built-in ROCm. Maybe it's because I'm on Silverblue, I don't know.

What I usually do is just download the ollama tgz and the matching ollama-rocm tgz, then run it with something like this:

LD_LIBRARY_PATH=./lib/ollama:$LD_LIBRARY_PATH ./bin/ollama serve
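
The download and unpack part of that might look like the following (the asset names ollama-linux-amd64.tgz and ollama-linux-amd64-rocm.tgz are what the GitHub releases page used when I checked, so verify them there):

$ mkdir ollama && cd ollama
$ curl -LO https://github.com/ollama/ollama/releases/latest/download/ollama-linux-amd64.tgz
$ curl -LO https://github.com/ollama/ollama/releases/latest/download/ollama-linux-amd64-rocm.tgz
$ tar xzf ollama-linux-amd64.tgz && tar xzf ollama-linux-amd64-rocm.tgz    # unpacks ./bin and ./lib/ollama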

I also find just running the Docker container to be really easy.
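
For the container route with an AMD card, the ROCm-tagged image needs /dev/kfd and /dev/dri passed through (ollama/ollama:rocm is the image the project publishes; adjust the tag if that has changed):

$ docker run -d --device /dev/kfd --device /dev/dri \
      -v ollama:/root/.ollama -p 11434:11434 \
      --name ollama ollama/ollama:rocm
$ docker exec -it ollama ollama run llama3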
