
Building llama.cpp

llama.cpp is a lightweight, high-performance inference framework designed to run large language models locally on consumer hardware.

Recipe

Create Directories

mkdir -p $HOME/codebase
cd $HOME/codebase

Download and Build

git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp

Configure with CMake. The first invocation produces a portable binary (CUDA enabled, libcurl model downloads, no native CPU tuning); the second tunes for the local machine, adds RPC support, and pins a single CUDA architecture. Pick one:

cmake -B build -DGGML_CUDA=ON -DLLAMA_CURL=ON -DGGML_NATIVE=OFF

cmake -B build -DGGML_NATIVE=ON -DGGML_CUDA=ON -DLLAMA_CURL=ON -DGGML_RPC=ON -DCMAKE_CUDA_ARCHITECTURES=121a-real
cmake --build build --config Release -j 20
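The `-j 20` above hardcodes the parallel job count. A minimal sketch of a more portable choice, assuming `nproc` (GNU coreutils) is available:

```shell
# Derive build parallelism from the machine instead of hardcoding 20.
# nproc prints the number of available processing units; fall back to 1
# if it is missing on this system.
JOBS=$(nproc 2>/dev/null || echo 1)
echo "build with: cmake --build build --config Release -j ${JOBS}"
```

Substitute `${JOBS}` for `20` in the build command if you prefer this over a fixed count.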

Validate

ls -l $HOME/codebase/llama.cpp/build/bin/llama-server

Stage

cp $HOME/codebase/llama.cpp/build/bin/llama-server \
  /usr/local/bin/llama-server

chmod 755 /usr/local/bin/llama-server
chown sysadmin:sysadmin /usr/local/bin/llama-server
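Once staged, the binary can be run under systemd. A minimal unit-file sketch, assuming the sysadmin user from the chown step above; the model path, bind address, and port are placeholders to adjust for your deployment:

```ini
[Unit]
Description=llama.cpp server
After=network.target

[Service]
User=sysadmin
# Hypothetical model path and listen address; replace with your own.
ExecStart=/usr/local/bin/llama-server -m /opt/models/model.gguf --host 127.0.0.1 --port 8080
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Save as /etc/systemd/system/llama-server.service, then enable it with systemctl daemon-reload and systemctl enable --now llama-server.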