Transformer

Definition

The neural-network architecture that nearly every modern LLM is built on. Introduced by researchers at Google in the 2017 paper "Attention Is All You Need". GPT, Claude, Gemini, Llama, Mistral: all transformers.

Posts that use this term

Troubleshooting local LLMs (and how to keep up after this series)
The full catalog of local-LLM failures: OOM, slow tok/s, garbage output, instruction drift, bad RAG hits, tool-call hallucination. Plus where to follow the field once you're on your own.
Fine-tuning a model locally
When fine-tuning is actually the right call (it usually isn't) and how to pull off a LoRA run on a 16GB Mac, with a worked Llama 3.2 3B example.
Streaming, throughput, and the KV cache
Why TTFT and tok/s are different numbers, why streaming feels faster than it is, and the KV cache that makes the 1000th token cost about the same as the first.
The context window, and why models hallucinate
An LLM only sees a fixed-size slice of text at a time. When it doesn't know something, it predicts anyway. That's a hallucination, not a bug.
From models to LLMs
An LLM is one kind of ML model: trained on text, it predicts the next token. That single trick at scale gets you ChatGPT, and also explains where it breaks.
What makes a model: data and algorithm
A model is a file of learned numbers, produced by running an algorithm over data. Both ingredients matter, but bad data will sink a good algorithm every time.
Inside AI: machine learning and deep learning
Open the AI umbrella. Machine learning is the part that learns from data. Deep learning is ML done with neural networks, and that's where today's models live.