The runtimes: llama.cpp, Ollama, LM Studio
llama.cpp is the engine; Ollama and LM Studio wrap it. What each does, when to pick which, and why the OpenAI-compatible APIs are mostly but not entirely interchangeable.
Blog
Posts on AI engineering, LLM systems, and software development.
llama.cpp is the engine; Ollama and LM Studio wrap it. What each does, when to pick which, and why the OpenAI-compatible APIs are mostly but not entirely interchangeable.
Install LM Studio on macOS, Linux, and Windows. The fastest GUI for running local LLMs — no terminal needed. Includes the local server for OpenAI-compatible API access.