Your first local LLM, end to end
Install Ollama, pull Llama 3.2 3B, chat, hit the OpenAI-compatible API, and troubleshoot the five things that go wrong on first install. By the end of this post you will have a working local LLM.
Blog
Posts on AI engineering, LLM systems, and software development.
Why every engineer should run a local LLM in 2026: privacy, zero marginal cost, lower latency, no rate limits, and offline use. Even a 16GB MacBook Air runs Llama 3.2 3B at 30 tok/s.
llama.cpp is the engine; Ollama and LM Studio wrap it. What each does, when to pick which, and why their OpenAI-compatible APIs are mostly, but not entirely, interchangeable.
Install Ollama on macOS, Linux, and Windows. Pull your first model, run it locally, and verify with ollama list. The fastest path to a local LLM.