Transformer
DL/dictionary/transformer
Definition
The neural-network architecture every modern LLM is built on. Introduced by Google in the 2017 paper "Attention Is All You Need". GPT, Claude, Gemini, Llama, Mistral — all transformers.
Posts that use this term
- Troubleshooting local LLMs and keeping up
The catalog of common local-LLM failures: OOM, slow tok/s, garbage output, instruction drift, RAG miss, tool-call hallucination. Plus where to follow the field as it moves.
- Fine-tuning a model locally
When fine-tuning is the right answer (rarely) and how to do it on consumer hardware: LoRA, QLoRA, MLX-LM, Unsloth. A worked example fine-tuning Llama 3.2 3B on a 16GB Mac.
- Streaming, throughput, and the KV cache
TTFT vs tok/s, why streaming feels faster, and the KV cache that makes the 1000th token cost the same as the first. KV cache quantization (Q8/Q4 KV) and why it should be your default.
- The context window, and why models hallucinate
An LLM only sees a fixed-size slice of text at a time. When it doesn't know something, it predicts anyway — that's a hallucination, not a bug.
- From models to LLMs
An LLM is one kind of ML model — trained on text, predicts the next token. That single trick at scale gets you ChatGPT, and also explains where it breaks.
- What makes a model: data and algorithm
A model is a file of learned numbers, produced by running an algorithm over data. Both ingredients matter, but bad data beats a good algorithm every time.
- Inside AI: machine learning and deep learning
Open the AI umbrella. Machine learning is the part that learns from data. Deep learning is ML done with neural networks — and that's where today's models live.