Following a discussion we had today, I wrote a short blog post about RamaLama.
What is RamaLama?
RamaLama is a command-line tool that runs AI models locally by treating them like containers.
It uses Podman or Docker to run those containers.
It checks whether the machine has a supported GPU and uses it for acceleration, falling back to the CPU when there is none.
Running AI models using RamaLama.
RamaLama can pull models from several sources, which it calls transports: Ollama, Hugging Face, ModelScope, and OCI registries.
Ollama is the easiest to use. Hugging Face and ModelScope are not much more complicated, but OCI registries usually require authentication. For example, ghcr.io, one of the OCI registries, requires an authentication token from GitHub.
By Njeri Kimaru
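To make the transports a bit more concrete, here is a minimal Python sketch that builds transport-qualified model references and the matching `ramalama pull` command lines. It only prints the commands, it does not run them. The helper names (`model_reference`, `pull_command`) are made up for this post, and the model names are placeholders rather than recommendations; the OCI path in particular is hypothetical.

```python
import shlex

# URL-style prefixes RamaLama uses to select a transport.
TRANSPORT_PREFIXES = {
    "ollama": "ollama://",
    "huggingface": "huggingface://",
    "modelscope": "modelscope://",
    "oci": "oci://",
}


def model_reference(transport, name):
    """Return a transport-qualified reference such as ollama://tinyllama."""
    if transport not in TRANSPORT_PREFIXES:
        raise ValueError(f"unknown transport: {transport}")
    return TRANSPORT_PREFIXES[transport] + name


def pull_command(transport, name):
    """Build the argv list for pulling a model (hypothetical helper)."""
    return ["ramalama", "pull", model_reference(transport, name)]


# Placeholder model names, used only for illustration.
examples = [
    ("ollama", "tinyllama"),
    ("oci", "ghcr.io/example-org/example-model:latest"),
]

for transport, name in examples:
    # shlex.join quotes the arguments so the line can be pasted into a shell.
    print(shlex.join(pull_command(transport, name)))
```

For a private OCI registry you would still need to authenticate first; the sketch does not cover that step.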
Model hallucinations and temperature control.
Models differ in size and capability, and the smaller models seem to hallucinate more than the larger ones. Some of the smaller models return no useful information at all.
Here’s a lightweight model that I asked about the four foundations of Fedora:
One way to reduce hallucinations is to pass the --temp 0 flag. Temperature is usually set between 0 and 1 (some runtimes accept higher values), and lower values make the model more deterministic. At --temp 0 there is no randomness in how the next token is picked, so the same input gives the same output. That’s what makes it deterministic.
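To show why a lower temperature means less randomness, here is a small, generic sketch of temperature-scaled softmax in plain Python. It is an illustration of the general sampling idea, not RamaLama's or any runtime's actual code; the function name and the example scores are assumptions made for this post. Treating a temperature of 0 as "always pick the top-scoring token" is the usual convention, since dividing by zero is not possible.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw token scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        # Assumed convention: temperature 0 means greedy decoding,
        # so all probability goes to the highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [score / temperature for score in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate tokens.
logits = [2.0, 1.0, 0.5]

for temperature in (1.0, 0.5, 0.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

As the temperature drops, more of the probability piles onto the top token, and at 0 nothing is left to chance.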
For example, the Merlinite model (4.07 GB) looped and hallucinated when asked about the role of RPM packaging in Fedora, but it gave detailed answers with reference links once temperature control was used.
By Utkarsh_Mishra
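If you want to repeat this kind of comparison yourself, the sketch below builds two `ramalama run` command lines for the same prompt, one with the default temperature and one with --temp 0. It assumes the prompt can be passed as an argument after the model name, and it only prints the commands. The `run_command` helper is hypothetical, and the model reference is a placeholder, not the Merlinite model from the screenshot.

```python
import shlex


def run_command(model, prompt, temp=None):
    """Build the argv list for a one-shot prompt (hypothetical helper)."""
    cmd = ["ramalama", "run"]
    if temp is not None:
        # --temp is the temperature flag discussed above.
        cmd += ["--temp", str(temp)]
    cmd += [model, prompt]
    return cmd


model = "ollama://tinyllama"  # placeholder model reference
prompt = "What are the four foundations of Fedora?"

# Same model and prompt, with and without temperature control.
for temp in (None, 0):
    print(shlex.join(run_command(model, prompt, temp=temp)))
```

Running both and comparing the answers side by side is a quick way to see how much the temperature setting matters for a given model.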
Conclusion
As Red Hat puts it, while Ollama is praised for ease of use, RamaLama was built as an alternative that lets developers run and serve AI models while making it easy to put those models in containers and enable local, collaborative, and production benefits.
Do you think RamaLama makes AI boring?
