March 1, 2026 · 4 min read

What makes a model: data and algorithm

A model is a file of learned numbers, produced by running an algorithm over data. Both ingredients matter, but bad data usually defeats a good algorithm.

Post 2 ended on a promise. Every ML and DL system, whatever its flavour, spits out the same artifact: a model. People say the word like we've all agreed what it means. We mostly haven't.

This is post 3 of 8 in the Foundations series.

A model is just a file

Strip the romance off it and a model is a file. Sometimes a few hundred kilobytes. Sometimes 400 gigabytes. Inside are numbers, millions or billions of them, arranged in a structure the program knows how to load.

Those numbers have a name: parameters, also called weights. Nobody typed them. They came out of running an algorithm over data until they settled into values that make the model's predictions match what it was shown.

That's the whole thing. A model is learned numbers in a file, plus an architecture that tells the computer how to use them. Loading a model is reading those numbers off disk into memory. Running it is doing arithmetic with them and whatever input you hand over.
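To make "numbers in a file plus an architecture" concrete, here is a minimal sketch. Everything in it is hypothetical: the two-number model, the values of `w` and `b`, the file name, and the use of JSON. Real models store their weights in binary formats and have far more of them, but the three steps (save, load, run) have the same shape.

```python
import json
import os
import tempfile

# Hypothetical tiny "model": two learned numbers for the architecture y = w * x + b.
# The values are made up for illustration; a real model stores millions or billions.
params = {"w": 2.0, "b": 1.0}

def save_model(path, params):
    """Write the learned numbers to disk."""
    with open(path, "w") as f:
        json.dump(params, f)

def load_model(path):
    """Loading a model: read the numbers off disk into memory."""
    with open(path) as f:
        return json.load(f)

def run_model(params, x):
    """Running a model: arithmetic with the numbers and the input.
    This function plays the role of the architecture."""
    return params["w"] * x + params["b"]

path = os.path.join(tempfile.gettempdir(), "tiny_model.json")
save_model(path, params)
loaded = load_model(path)
prediction = run_model(loaded, 3.0)
print(prediction)  # 7.0
```

The file on its own is inert. It only becomes a model once some code (here `run_model`) knows what to do with the numbers inside it.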

To put real figures on it: a Llama 3 8B model is about 16 GB on disk at 16-bit precision, which is 8 billion parameters each stored as a 2-byte number. GPT-4 is rumoured to run into the trillions of parameters. When ChatGPT answers you, it's doing math with one of those files. Nothing more mystical than that.
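The 16 GB figure is just parameter count times bytes per parameter. A small sketch of that arithmetic follows. The helper name `model_size_gb` is made up here, 1 GB is taken as 10^9 bytes, and the 32-bit and 4-bit rows are added only for comparison. Real files also carry a little metadata, so treat the results as approximations.

```python
def model_size_gb(num_params, bits_per_param):
    """Approximate size of the raw weights in gigabytes (1 GB = 10**9 bytes)."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1e9

print(model_size_gb(8e9, 16))  # 16.0  (16-bit, the figure quoted above)
print(model_size_gb(8e9, 32))  # 32.0  (32-bit, for comparison)
print(model_size_gb(8e9, 4))   # 4.0   (4-bit quantised, for comparison)
```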

Two ingredients, every time

[Figure: data and algorithm flowing into a model]

Every model is the product of exactly two things.

Data is what it learned from. A spam filter learns from emails. A coding model learns from GitHub. An image model learns from pictures and captions. A model can only ever know what sat in its data, and it inherits every bias, gap, and mislabel that came along with it.

Algorithm is how it learned. That's the architecture (the shape of the network) plus the training procedure (how the numbers get nudged as the model sees more data). Swap the algorithm and you get a different kind of model: a decision tree, a gradient-boosted tree, a neural network, a transformer.

The kitchen version: data is the ingredients, the algorithm is the recipe, the model is the dish. You can't cook something great out of rotten ingredients no matter how good the recipe. Same with models and bad data.

Why bad data defeats a good algorithm

This is one of the least intuitive truths in ML, and one of the most reliable. A team running a worse algorithm on a pile of clean, well-labelled data will usually beat a team running a better algorithm on less data, or noisier data.

I've lived this one. Spend a week tuning the model and you claw back maybe 1%. Spend that same week fixing the labels and you can find 5%. Google's search team has preached it for years, and Andrew Ng built his whole "data-centric AI" argument on it. Ask anyone who has actually shipped models and you'll hear the same story.
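A toy experiment shows the shape of the effect. It is a sketch under loud assumptions, not evidence: the data is synthetic, the 30% mislabel rate is invented, a one-cut-off rule stands in for the "worse" algorithm, and a nearest-neighbour learner stands in for the more flexible one. All function names are hypothetical.

```python
import random

random.seed(0)

def make_data(n, noise_rate):
    """Points in [0, 1); the true label is 1 when x > 0.5.
    noise_rate is the fraction of labels flipped at random (mislabels)."""
    data = []
    for _ in range(n):
        x = random.random()
        label = 1 if x > 0.5 else 0
        if random.random() < noise_rate:
            label = 1 - label
        data.append((x, label))
    return data

def accuracy(predict, data):
    """Fraction of examples where the prediction matches the label."""
    return sum(predict(x) == y for x, y in data) / len(data)

def fit_threshold(train):
    """Simple learner: try each training x as a cut-off, keep the best one."""
    best = max((x for x, _ in train),
               key=lambda t: accuracy(lambda x: 1 if x > t else 0, train))
    return lambda x: 1 if x > best else 0

def fit_nearest_neighbour(train):
    """More flexible learner: copy the label of the closest training point."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]

test_set = make_data(500, noise_rate=0.0)      # clean labels for scoring
clean_train = make_data(200, noise_rate=0.0)
noisy_train = make_data(200, noise_rate=0.3)   # assumed 30% mislabels

simple_on_clean = accuracy(fit_threshold(clean_train), test_set)
flexible_on_clean = accuracy(fit_nearest_neighbour(clean_train), test_set)
flexible_on_noisy = accuracy(fit_nearest_neighbour(noisy_train), test_set)

print("simple learner, clean labels:  ", simple_on_clean)
print("flexible learner, clean labels:", flexible_on_clean)
print("flexible learner, noisy labels:", flexible_on_noisy)
```

The flexible learner faithfully copies every mislabel it was given, so it loses accuracy on noisy labels, while the simple learner on clean labels does not. The gap between the last two lines is what a week of label fixing buys you in this toy setup.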

LLMs don't escape it either. What made GPT-3 land in 2020 wasn't a fresh architecture. The transformer was already three years old by then. What changed was scale, plus carefully curated training data. Garbage in, garbage out, just at 175 billion parameters.

What "learning" actually means here

When we say a model "learned," here's the literal event. The numbers in the file started out random. The algorithm adjusted them, a little at a time, over many passes through the data, until they produced predictions close to the truth.
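Here is that event in miniature. The sketch uses gradient descent on squared error, which is one common way to do the adjusting; the post itself hasn't named a method yet, so treat that choice as an assumption. The target rule `y = 2x + 1`, the five data points, the learning rate, and the number of passes are all made up for illustration.

```python
import random

random.seed(0)

# Training data for the hypothetical rule y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

# The parameters start out random.
w = random.uniform(-1.0, 1.0)
b = random.uniform(-1.0, 1.0)

learning_rate = 0.05  # size of each nudge (an assumed value)

for epoch in range(2000):  # many passes through the data
    # Average gradient of the squared error over the dataset.
    grad_w = 0.0
    grad_b = 0.0
    for x, y in zip(xs, ys):
        error = (w * x + b) - y
        grad_w += 2 * error * x / len(xs)
        grad_b += 2 * error / len(xs)
    # Adjust the numbers a little, in the direction that reduces the error.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 3), round(b, 3))  # 2.0 1.0
```

Nobody typed 2 or 1 into the program. The loop found them by shrinking the gap between its predictions and the data.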

The model didn't understand a thing. It found a configuration of numbers that compresses the patterns in the data well enough to handle inputs it has never seen. That last part is the whole game. A model that only memorises its training data is useless. A model that captures the underlying pattern can generalise. The distance between memorising and generalising is most of why this work is hard.
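The difference between memorising and generalising shows up the moment you hand over an input that wasn't in the training data. A deliberately tiny sketch, with made-up data and two hypothetical "models": a lookup table, and a line fitted through the first and last training points.

```python
train = [(1, 3), (2, 5), (3, 7)]  # examples of the rule y = 2x + 1

# Memoriser: a lookup table of exactly what it was shown.
memorised = dict(train)

def memoriser(x):
    return memorised.get(x)  # None for anything it has not seen

# Generaliser: a line fitted through the first and last training points.
(x0, y0), (x1, y1) = train[0], train[-1]
slope = (y1 - y0) / (x1 - x0)
intercept = y0 - slope * x0

def generaliser(x):
    return slope * x + intercept

print(memoriser(2), generaliser(2))    # 5 5.0
print(memoriser(10), generaliser(10))  # None 21.0
```

Both are perfect on the training data. Only one has anything to say about `x = 10`.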

So that's the recipe: data plus algorithm, learned into a file. But how does the adjusting actually happen, step by step? And why is running a model afterwards so much cheaper than training it in the first place? That's post 4.


From the dictionary

Terms used in this post

Quick reference for the 14 terms you met above. Each one comes from the AI dictionary.

Artificial Intelligence (AI): Umbrella term for software that performs tasks usually associated with human reasoning, such as language, perception, and decision-making. The term was coined for the 1956 Dartmouth Summer Research Project. In everyday 2026 use, "AI" almost always means a large language model like ChatGPT, Claude, or Gemini, even though the textbook definition is much broader. Example: when a product page says "AI-powered", it could mean a 70-billion-parameter LLM or a hand-written if-statement. The label moves with the times.
Algorithm (ML): In ML, the recipe used to turn data into a model: the architecture plus the training procedure. Different algorithms (decision trees, gradient-boosted trees, neural networks, transformers) produce different model types.
ChatGPT (AI): OpenAI's consumer chat product, launched November 30, 2022. The first LLM product to reach mass adoption, with a reported 100 million users in two months. The product most people mean when they say AI today.
Dataset (Data): The collection of examples a model learns from during training. The shape, size, quality, and bias of the dataset determine almost everything about the resulting model.
Deep Learning (DL): A subset of machine learning that uses neural networks with many layers ("deep" stacks). Powers image recognition, speech, and the LLMs behind ChatGPT, Claude, and Gemini. Needs much more data and compute than classical ML, but scales further. Example: every modern LLM is a deep-learning model, a transformer with billions of parameters trained on internet-scale text.
GPT (AI): OpenAI's family of large language models. The name stands for Generative Pre-trained Transformer. GPT-4 (2023) and its successors are among the most widely used closed-source LLMs in production.
Large Language Model (AI): A deep-learning model trained on huge volumes of text to predict the next token given the previous ones. Scaling next-token prediction to billions of parameters yields the chat-like behaviour of ChatGPT, Claude, and Gemini. Capabilities are bounded by training data and the context window. Example: Claude is an LLM. It reads your message as tokens and generates a response one token at a time.
Machine Learning (ML): A subset of AI where the system learns patterns from data instead of following hand-written rules. The output is a model, a set of learned numbers that maps inputs to outputs. Spam filters, recommendation systems, and credit-risk scorers are classical ML. Example: Gmail's spam filter learns which emails you mark as junk and updates its model. That's machine learning, not a rule someone wrote.
Model (ML): In ML, a model is a file of learned numbers (parameters or weights) plus an architecture that tells the program how to use them. Loading a model means reading those numbers; running it means doing arithmetic with them.
Neural Network (DL): A model architecture loosely inspired by neurons in the brain. In practice, it is a stack of matrix multiplications with non-linear functions between them. Deep learning is what you get when you stack many layers of these and train them on a lot of data.
Parameters (ML): The individual learned numbers inside a model. "7B parameters" means 7 billion of them. More parameters generally mean more capacity, more memory needed, and slower inference.
Training (ML): The expensive one-time process of running a learning algorithm over data until the model's parameters settle into useful values. Frontier-model training can cost $100M+ and use tens of thousands of GPUs.
Transformer (DL): The neural-network architecture every modern LLM is built on. Introduced by Google in the 2017 paper "Attention Is All You Need". GPT, Claude, Gemini, Llama, and Mistral are all transformers.
Weights (ML): The numbers inside a trained model. They start out random and get adjusted during training until they encode the patterns in the data. "Open weights" means the trained numbers are downloadable; it does not mean the training data or code is open.
