Blog

Engineering

Posts on AI engineering, LLM systems, and software development.

Local LLMs · April 14, 2026 · #4

Streaming, throughput, and the KV cache

Why time to first token (TTFT) and tokens per second (tok/s) are different numbers, why streaming feels faster than it is, and how the KV cache makes the 1000th token cost about the same as the first.

AI Inference KV Cache LLM Local LLMs

Read →

AI Running · March 16, 2026 · #1

Where AI actually runs: cloud, local, edge

When you use AI, a model file is sitting on a real machine. There are only three places it can be, and which one it is decides almost everything else.

AI Edge Hardware Inference LLM

Read →

AI Foundations · March 3, 2026 · #4

How a model learns: training and inference

Training is the expensive one-time event where a model's numbers get tuned. Inference is the cheap repeated use afterwards. The gap in cost is enormous, and it shapes the whole industry.

AI Fundamentals Inference ML Training

Read →