Install llama.cpp
Build llama.cpp from source with Metal or CUDA acceleration, then run a GGUF model with llama-cli. It is the closest thing to bare-metal local inference.
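As a rough sketch of that flow, the script below picks a CMake backend flag from the kernel name and prints the clone, build, and run commands without executing them. It assumes the upstream repository at github.com/ggml-org/llama.cpp and a CMake-based build; `backend_flag` is a hypothetical helper, the `MODEL` path is a placeholder for a GGUF file you already have, and treating every non-macOS machine as a CUDA machine is a simplification.

```shell
#!/bin/sh
# Sketch only: prints the commands instead of running them.
# MODEL is a placeholder path; point it at a GGUF file you have downloaded.
MODEL="${MODEL:-./models/model.gguf}"

# Hypothetical helper: map a kernel name (from `uname -s`) to a CMake flag.
backend_flag() {
    case "$1" in
        Darwin) echo "-DGGML_METAL=ON" ;;  # Metal is already the default on macOS; explicit here
        *)      echo "-DGGML_CUDA=ON" ;;   # assumes an NVIDIA GPU and an installed CUDA toolkit
    esac
}

FLAG="$(backend_flag "$(uname -s)")"

# Print the clone, configure, build, and run steps for review.
cat <<EOF
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build $FLAG
cmake --build build --config Release
./build/bin/llama-cli -m $MODEL -p "Hello"
EOF
```

Review the printed commands before running them by hand. On a machine with no supported GPU, dropping the backend flag gives a CPU-only build.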