Fine-Tuning

Definition

Continuing to train an existing model on new data, so the new patterns get baked into the weights. Distinct from RAG (which adds retrieved text to the prompt) and prompting (which only rewords the request): both of those leave the weights untouched.
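
A toy sketch of that distinction, assuming a "model" that is just two numbers (a slope and an intercept) rather than billions of weights. The names `predict` and `fine_tune`, the starting weights, and the data are all hypothetical, made up for illustration; real fine-tuning goes through a training framework, but the shape of the idea is the same: start from existing weights, keep running gradient descent on new examples, and end up with different weights.

```python
def predict(weights, x):
    """Inference: use the weights, never modify them."""
    w, b = weights
    return w * x + b


def fine_tune(weights, data, lr=0.1, epochs=500):
    """Continue gradient descent from existing weights on new (x, y) pairs.

    Minimizes mean squared error; returns a new weights tuple.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


# Hypothetical "pretrained" model: it learned y = 2x.
pretrained = (2.0, 0.0)

# New data following a different pattern: y = 3x + 1.
new_data = [(0.0, 1.0), (1.0, 4.0), (2.0, 7.0)]

# Prompting or RAG changes only the input; the weights stay as they were.
before = predict(pretrained, 5.0)

# Fine-tuning produces new weights that have absorbed the new pattern.
tuned = fine_tune(pretrained, new_data)
after = predict(tuned, 5.0)

print("pretrained weights:", pretrained, "prediction:", before)
print("fine-tuned weights:", tuned, "prediction:", after)
```

Changing the input to `predict` never touches `pretrained`. Only `fine_tune` returns different weights, which is the whole difference between the three approaches.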

Posts that use this term

Troubleshooting local LLMs (and how to keep up after this series)
The full catalog of local-LLM failures: OOM, slow tok/s, garbage output, instruction drift, bad RAG hits, tool-call hallucination. Plus where to follow the field once you're on your own.
Fine-tuning a model locally
When fine-tuning is actually the right call (it usually isn't) and how to pull off a LoRA run on a 16GB Mac, with a worked Llama 3.2 3B example.
Local agents and tool use
Function calling on open models in 2026. Which ones actually work, why local agents break when they break, and the scaffolding that keeps them upright.
Every machine can run a local LLM (here's what fits)
A per-tier guide to running local LLMs in 2026, from 8GB integrated graphics to a 192GB Mac Studio. Specific models, specific speeds, specific configs.
Picking a local model by task
The 2026 open leaders, sorted by what you actually want to do: coding, chat, the small-model crowd, structured output, vision, embeddings, and audio.
The local-LLM vocabulary
Parameters, B, dense vs MoE, base vs instruct, tokens, context windows, chat templates, GGUF, and quant suffixes. Read it once and any HuggingFace model card stops being scary.
The pitch for local LLMs in 2026
The case for running an LLM on the machine you already own. Privacy, no per-call cost, faster first token, no rate limits, and it works on a flight.
What leaves your machine when you use AI
What providers actually see, log, and keep when you call an LLM API in 2026. What "we don't train on your data" really means, how free and paid tiers differ, and when local is the only safe choice.
The major LLMs in 2026
A field guide to the closed frontier models and the open weights you can actually run. What the "B" numbers mean, and which size fits your machine.
Prompt, RAG, fine-tune: three ways to shape a model
Three levers for shaping what an LLM does: prompting (ask better), RAG (give it the right context), fine-tuning (change the weights). What each costs, what each fixes, and how to pick.
RAG: giving a model memory it doesn't have
RAG is the pattern of fetching relevant text from a search system and putting it in the LLM's context window before asking your question. Not magic, not fine-tuning, just better prompts.
From models to LLMs
An LLM is one kind of ML model: trained on text, it predicts the next token. That single trick at scale gets you ChatGPT, and also explains where it breaks.
How a model learns: training and inference
Training is the expensive one-time event where a model's numbers get tuned. Inference is the cheap repeated use afterwards. The gap in cost is enormous, and it shapes the whole industry.