Parameters

Definition

The individual learned numbers (weights) inside a model. "7B parameters" means the model has 7 billion of them. More parameters generally means more capacity, but also more memory needed to hold the model and slower inference.
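The link between parameter count and memory is simple arithmetic: each parameter is stored as a number, so the size of the weights is roughly the parameter count times the bytes used per parameter. Here is a minimal sketch of that estimate. The function name `weight_memory_gb` and the precision table are illustrative assumptions, not taken from any library, and the result covers the weights only (no KV cache, activations, or runtime overhead).

```python
# Rough weight-memory estimate: parameter count times bytes per parameter.
# Assumption: weights only; KV cache, activations, and overhead are ignored.

# Idealized storage cost per parameter for a few common precisions.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(num_params: float, precision: str = "fp16") -> float:
    """Return the approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for name, params in [("7B", 7e9), ("70B", 70e9)]:
        for precision in BYTES_PER_PARAM:
            size = weight_memory_gb(params, precision)
            print(f"{name} @ {precision}: ~{size:.1f} GB")
```

By this estimate a 7B model needs about 14 GB at 16-bit precision and about 3.5 GB at an idealized 4 bits per parameter. Real quantized files such as Q4_K_M typically come out somewhat larger than the 4-bit figure, because they store a little more than 4 bits per weight on average.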

Posts that use this term

Troubleshooting local LLMs (and how to keep up after this series)
The full catalog of local-LLM failures: OOM, slow tok/s, garbage output, instruction drift, bad RAG hits, tool-call hallucination. Plus where to follow the field once you're on your own.
Fine-tuning a model locally
When fine-tuning is actually the right call (it usually isn't) and how to pull off a LoRA run on a 16GB Mac, with a worked Llama 3.2 3B example.
Local RAG and embeddings
Build a working local RAG pipeline in about 30 lines using nomic-embed-text, Chroma, and Llama 3.2. And why running it on your own machine beats the cloud for personal notes.
Picking a local model by task
The 2026 open leaders, sorted by what you actually want to do: coding, chat, the small-model crowd, structured output, vision, embeddings, and audio.
Quantization, distillation, pruning: how a 140GB model fits on your laptop
Three ways to shrink an LLM, and why one of them does almost all the work. What Q4_K_M actually means and what each shortcut costs you.
The local-LLM vocabulary
Parameters, B, dense vs MoE, base vs instruct, tokens, context windows, chat templates, GGUF, and quant suffixes. Read it once and any HuggingFace model card stops being scary.
The pitch for local LLMs in 2026
The case for running an LLM on the machine you already own. Privacy, no per-call cost, faster first token, no rate limits, and it works on a flight.
Why Apple Silicon punches above its weight on local LLMs
Unified memory lets the GPU see all of RAM. Here's why that beats a discrete-GPU PC past 32B parameters, what fits in 16/32/64/128/192GB, and where Apple Silicon still loses.
What it takes to run a model on your own machine
Why VRAM is the one number that decides whether a local LLM runs, what quantization really does to a model file, and the hardware ladder from an 8GB laptop to a 192GB workstation.
The major LLMs in 2026
A field guide to the closed frontier models and the open weights you can actually run. What the "B" numbers mean, and which size fits your machine.
Where AI actually runs: cloud, local, edge
When you use AI, a model file is sitting on a real machine. There are only three places it can be, and which one it is decides almost everything else.
RAG: giving a model memory it doesn't have
RAG is the pattern of fetching relevant text from a search system and putting it in the LLM's context window before asking your question. Not magic, not fine-tuning, just better prompts.
From models to LLMs
An LLM is one kind of ML model: it is trained on text and predicts the next token. That single trick at scale gets you ChatGPT, and also explains where it breaks.
How a model learns: training and inference
Training is the expensive one-time event where a model's numbers get tuned. Inference is the cheap repeated use afterwards. The gap in cost is enormous, and it shapes the whole industry.
What makes a model: data and algorithm
A model is a file of learned numbers, produced by running an algorithm over data. Both ingredients matter, but bad data will defeat a good algorithm every time.
Inside AI: machine learning and deep learning
Open the AI umbrella. Machine learning is the part that learns from data. Deep learning is ML done with neural networks, and that's where today's models live.
AI, in plain words
What "AI" actually means, where the term came from, and why every product calls itself AI now. Sets up where machine learning and deep learning fit underneath.