Anyone doing anything with LLMs in Fedora?
I am running stable-diffusion-webui and llm (a Python LLM manager) to do some playing around and testing.
If your GPU is supported, there is no need to do this; ollama will detect your GPU without issue.
You can find the supported cards here:
In my case, I have an AMD Radeon RX 6600, which isn’t supported by rocm:
$ rocm-smi --showproductname
============================ ROCm System Management Interface ============================
====================================== Product Info ======================================
GPU[0] : Card series: Navi 23 [Radeon RX 6600/6600 XT/6600M]
GPU[0] : Card model: 0xe451
GPU[0] : Card vendor: Advanced Micro Devices, Inc. [AMD/ATI]
GPU[0] : Card SKU: 2451LMH
==========================================================================================
================================== End of ROCm SMI Log ===================================
To have ollama use my GPU, I had to modify my /etc/systemd/system/ollama.service and add an environment variable that provides the closest supported GFX version to my card:
[Unit]
Description=Ollama Service
After=network-online.target
[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=/home/renich/.local/share/gem/ruby/bin:/usr/lib64/ccache:/home/renich/.local/bin:/home/renich/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin"
Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0" # <-- added this
[Install]
WantedBy=default.target
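After editing the unit, reload systemd and restart the service so the variable takes effect:
# systemctl daemon-reload
# systemctl restart ollama.service
(Putting the Environment= line in a drop-in via systemctl edit ollama.service works too and survives reinstalls.)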
The GFX version corresponds to the closest LLVM target to my card's.
Find your gfx version with rocminfo, or rocminfo | grep -Eo 'gfx[0-9]{3,4}', and use the digits after 'gfx' to build the version: with 3 digits, x.y.z is your version; with 4 digits, it is wx.y.z.
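For example, my card reports gfx1032, which maps to 10.3.2; I went with 10.3.0, the closest supported LLVM target. A quick way to do the digit split (the sed pattern is just my own helper, adjust it if your output looks different):
$ rocminfo | grep -m1 -Eo 'gfx[0-9]{3,4}'
gfx1032
$ rocminfo | grep -m1 -Eo 'gfx[0-9]{3,4}' | sed -E 's/gfx([0-9]{1,2})([0-9])([0-9])/\1.\2.\3/'
10.3.2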
Now, when I start the ollama service, I see this in the logs:
# systemctl start ollama.service; journalctl -fu ollama
...
Jul 10 19:05:06 desktop.casa.g02.org systemd[1]: Started ollama.service - Ollama Service.
Jul 10 19:05:06 desktop.casa.g02.org ollama[554903]: 2024/07/10 19:05:06 routes.go:1033: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION:10.3.0 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
Jul 10 19:05:06 desktop.casa.g02.org ollama[554903]: time=2024-07-10T19:05:06.813-06:00 level=INFO source=images.go:751 msg="total blobs: 10"
Jul 10 19:05:06 desktop.casa.g02.org ollama[554903]: time=2024-07-10T19:05:06.814-06:00 level=INFO source=images.go:758 msg="total unused blobs removed: 0"
Jul 10 19:05:06 desktop.casa.g02.org ollama[554903]: time=2024-07-10T19:05:06.814-06:00 level=INFO source=routes.go:1080 msg="Listening on 127.0.0.1:11434 (version 0.2.1)"
Jul 10 19:05:06 desktop.casa.g02.org ollama[554903]: time=2024-07-10T19:05:06.814-06:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama1778105021/runners
Jul 10 19:05:08 desktop.casa.g02.org ollama[554903]: time=2024-07-10T19:05:08.862-06:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 rocm_v60101]"
Jul 10 19:05:08 desktop.casa.g02.org ollama[554903]: time=2024-07-10T19:05:08.862-06:00 level=INFO source=gpu.go:205 msg="looking for compatible GPUs"
Jul 10 19:05:08 desktop.casa.g02.org ollama[554903]: time=2024-07-10T19:05:08.867-06:00 level=WARN source=amd_linux.go:58 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
Jul 10 19:05:08 desktop.casa.g02.org ollama[554903]: time=2024-07-10T19:05:08.871-06:00 level=INFO source=amd_linux.go:333 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=10.3.0
Jul 10 19:05:08 desktop.casa.g02.org ollama[554903]: time=2024-07-10T19:05:08.871-06:00 level=INFO source=types.go:103 msg="inference compute" id=0 library=rocm compute=gfx1032 driver=0.0 name=1002:73ff total="8.0 GiB" available="7.2 GiB"
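To double-check that it is actually using the GPU, run a model and then look at ollama ps:
$ ollama run llama3 'hello'
$ ollama ps
The PROCESSOR column should read 100% GPU when the ROCm runner is in use.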
Anyway, good luck with this if you try it. Obviously, you need to install ollama first, but that is straightforward. Check out their website for the how-to: Download Ollama on macOS
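On Linux, the install is basically a one-liner from their download page (it is a curl | sh, so review the script first if you prefer):
$ curl -fsSL https://ollama.com/install.sh | sh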
Good luck.
I currently use stable-diffusion-webui via the pip installation method; venv and all.
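Roughly, that method (for the AUTOMATIC1111 repo, at least) is just the upstream script creating its own venv and pip-installing everything into it:
$ git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
$ cd stable-diffusion-webui
$ ./webui.sh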
This isn’t ideal, IMHO.
I am looking for a way to install it using Fedora's packages and Fedora's curated ROCm.
I have not had much success with Fedora's built-in ROCm. Maybe it's because I'm on Silverblue, I don't know.
What I usually do is just download the ollama tgz and the matching ollama-rocm tgz, then run it with something like this:
LD_LIBRARY_PATH=./lib/ollama:$LD_LIBRARY_PATH ./bin/ollama serve
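The fetch/unpack part looks roughly like this (asset names from memory; check the releases page for the exact filenames):
$ curl -LO https://github.com/ollama/ollama/releases/latest/download/ollama-linux-amd64.tgz
$ curl -LO https://github.com/ollama/ollama/releases/latest/download/ollama-linux-amd64-rocm.tgz
$ mkdir ollama
$ tar -C ollama -xzf ollama-linux-amd64.tgz
$ tar -C ollama -xzf ollama-linux-amd64-rocm.tgz
Then run the serve command above from inside that directory.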
I also find just running the Docker container to be really easy.
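For an AMD card, the ROCm image needs the KFD and DRI devices passed through (plus the same GFX override as above if your card is unsupported); roughly:
$ docker run -d --device /dev/kfd --device /dev/dri \
    -e HSA_OVERRIDE_GFX_VERSION=10.3.0 \
    -v ollama:/root/.ollama -p 11434:11434 \
    --name ollama ollama/ollama:rocm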