Blog

AI Running

Posts on AI engineering, LLM systems, and software development.

AI Running · March 18, 2026 · #2

The major LLMs in 2026

A field guide to the closed frontier models and the open weights you can actually run. What the "B" numbers mean, and which size fits your machine.

AI · AI Running · Benchmarks · LLM · Open Weights

AI Running · March 21, 2026 · #3

What it takes to run a model on your own machine

Why VRAM is the one number that decides whether a local LLM runs, what quantization really does to a model file, and the hardware ladder from an 8GB laptop to a 192GB workstation.

AI · AI Running · Hardware · LLM · Quantization

AI Running · March 23, 2026 · #4

Why Apple Silicon punches above its weight on local LLMs

Unified memory lets the GPU use most of system RAM. Here's why that beats a discrete-GPU PC past 32B parameters, what fits in 16/32/64/128/192GB, and where Apple Silicon still loses.

AI · AI Running · Apple Silicon · Hardware · LLM

AI Running · March 26, 2026 · #5

The runtimes: llama.cpp, Ollama, LM Studio

llama.cpp is the engine. Ollama and LM Studio wrap it. What each one does, when to reach for which, and why their OpenAI-compatible APIs are mostly, but not entirely, interchangeable.

AI · AI Running · llama.cpp · LLM · LM Studio

AI Running · March 28, 2026 · #6

LLM API bills, and why a token costs what it costs

How input and output tokens get priced, why output tokens cost 5-6x more, and how prompt caching can cut the price of cached input tokens by up to 10x. Plus the hidden costs that ambush people.

AI · AI Running · API · LLM · Pricing

AI Running · March 31, 2026 · #7

What leaves your machine when you use AI

What providers actually see, log, and keep when you call an LLM API in 2026. What "we don't train on your data" really means, how free and paid tiers differ, and when local is the only safe choice.

AI · AI Running · API · LLM · Local Models