Every machine can run a local LLM (here's what fits)
Per-tier guide: 8GB integrated graphics, 16GB MacBook Air, 8/12/16/24/32GB VRAM PCs, 24/32/64/128/192GB Macs. Specific models, specific tok/s, specific configs. Every tier runs something useful.
Blog
Posts on AI engineering, LLM systems, and software development.
Per-tier guide: 8GB integrated graphics, 16GB MacBook Air, 8/12/16/24/32GB VRAM PCs, 24/32/64/128/192GB Macs. Specific models, specific tok/s, specific configs. Every tier runs something useful.
Unified memory means the GPU sees all of RAM. Why that beats discrete-GPU PCs above 32B parameters, what fits in 16/32/64/128/192GB, and where Apple Silicon still loses.
Why VRAM is the hard ceiling on local LLMs, what quantization actually does to a model file, and the practical hardware ladder from 8GB laptops to 192GB workstations.
Where the model file actually sits when you use AI: a datacenter GPU (cloud), your own machine (local), or the device's silicon (edge). The trade-offs and how to pick.