Blog

#gguf

Posts on AI engineering, LLM systems, and software development.

Local LLMs · April 10, 2026 · #3

Quantization, distillation, pruning: how a 140GB model fits on your laptop

Three ways to shrink an LLM, and why one of them does almost all the work. What Q4_K_M actually means and what each shortcut costs you.

Tags: AI, Distillation, GGUF, LLM, Local LLMs

Local LLMs · April 7, 2026 · #2

The local-LLM vocabulary

Parameters, the "B" suffix, dense vs MoE, base vs instruct, tokens, context windows, chat templates, GGUF, and quant suffixes. Read it once and any Hugging Face model card stops being scary.

Tags: AI, GGUF, LLM, Local LLMs, Vocabulary
