Blog

#local-models

Posts on AI engineering, LLM systems, and software development.


Local LLMs · April 3, 2026 · #1

The pitch for local LLMs in 2026

The case for running an LLM on the machine you already own: privacy, no per-call cost, a faster first token, no rate limits, and it works on a flight.

Tags: AI, LLM, Local LLMs, Local Models, Ollama


AI Running · March 31, 2026 · #7

What leaves your machine when you use AI

What providers actually see, log, and keep when you call an LLM API in 2026. What "we don't train on your data" really means, how free and paid tiers differ, and when local is the only safe choice.

Tags: AI, AI Running, API, LLM, Local Models


AI Running · March 23, 2026 · #4

Why Apple Silicon punches above its weight on local LLMs

Unified memory lets the GPU see all of RAM. Here's why that beats a discrete-GPU PC past 32B parameters, what fits in 16/32/64/128/192GB, and where Apple Silicon still loses.

Tags: AI, AI Running, Apple Silicon, Hardware, LLM


Setup Toolbox · February 22, 2026 · #8

Install LM Studio

Install LM Studio on macOS, Linux, and Windows, then flip on the local OpenAI-compatible server so any client library can talk to a model on your own machine.

Tags: AI, LLM, LM Studio, Local Models, Setup


Setup Toolbox · February 21, 2026 · #7

Install llama.cpp

Build llama.cpp from source with Metal or CUDA, then run a GGUF model with llama-cli. The closest thing to bare-metal local inference.

Tags: AI, llama.cpp, LLM, Local Models, Setup


Setup Toolbox · February 20, 2026 · #6

Install Ollama

Get Ollama running on macOS, Linux, or Windows, pull your first model, and confirm it works with ollama list. The shortest path to a local LLM.

Tags: AI, LLM, Local Models, Ollama, Setup
