What makes a model: data and algorithm

A model is a file of learned numbers, produced by running an algorithm over data. Both ingredients matter, but good data with a plain algorithm usually beats bad data with a clever one.

Post 2 ended with: every ML and DL system produces the same artifact — a model. The word gets thrown around like everyone agrees on what it means. Most people don't.

This is post 3 of 8 in the Foundations series.

What a model actually is

A model is a file. Sometimes a few hundred kilobytes. Sometimes 400 gigabytes. Inside the file are numbers — millions or billions of them — arranged in a structure the program knows how to load.

Those numbers are called parameters, or weights. They were not typed by anyone. They were produced by running an algorithm over data until the numbers settled into values that make the model's predictions match the data it was shown.

That's it. A model is learned numbers in a file, plus an architecture that tells the computer how to use them. Loading a model means reading the numbers off disk into memory. Running it means doing arithmetic with those numbers and the input you give it.
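To make "numbers in a file, plus an architecture" concrete, here is a minimal sketch. Everything in it is made up for illustration: the two weights, the `tiny_model.json` file name, and the one-line "architecture" (a straight line). Real models store billions of numbers in binary formats, but the load-then-do-arithmetic shape is the same idea.

```python
import json
import os
import tempfile


def save_model(path, weights):
    # "Saving a model" here just means writing the numbers to disk.
    with open(path, "w") as f:
        json.dump(weights, f)


def load_model(path):
    # "Loading a model" means reading those numbers back into memory.
    with open(path) as f:
        return json.load(f)


def run_model(weights, x):
    # The "architecture": how the program uses the numbers. Here, y = w*x + b.
    return weights["w"] * x + weights["b"]


# Hypothetical weights. In a real model nobody types these; training produces them.
weights = {"w": 2.0, "b": 1.0}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "tiny_model.json")
    save_model(path, weights)
    file_size_bytes = os.path.getsize(path)
    loaded = load_model(path)

prediction = run_model(loaded, 3.0)
print(prediction)  # 7.0
```

Swap the two numbers for billions and the straight line for a transformer, and you have the rough shape of what a large model file is.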

A Llama 3 8B model file is about 16 GB on disk at 16-bit precision: 8 billion parameters, each stored as a 16-bit (2-byte) number. GPT-4 is rumoured to be in the trillions of parameters. ChatGPT, when it answers you, is doing math with a file of this kind.
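The 16 GB figure is just multiplication, and a quick sketch shows it. The function name `model_size_gb` is made up for this example, and the 32-bit and 4-bit rows are hypothetical what-ifs rather than claims about any particular release. The estimate ignores file headers and metadata, which are small by comparison.

```python
def model_size_gb(num_parameters, bits_per_parameter):
    # Each parameter takes bits/8 bytes; 1 GB is taken as 10**9 bytes here.
    total_bytes = num_parameters * bits_per_parameter / 8
    return total_bytes / 1e9


size_16_bit = model_size_gb(8e9, 16)  # 8 billion parameters at 2 bytes each
size_32_bit = model_size_gb(8e9, 32)  # same parameters stored at 4 bytes each
size_4_bit = model_size_gb(8e9, 4)    # same parameters squeezed to half a byte each

print(size_16_bit, size_32_bit, size_4_bit)  # 16.0 32.0 4.0
```

Same parameter count, very different file sizes. The number of bits per parameter matters as much as the number of parameters.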

The two ingredients

[Figure: data and algorithm flowing into a model]

Every model is the output of two things:

Data. What it learned from. A spam filter learns from emails. A coding model learns from GitHub. An image model learns from images and captions. The model can only know what was in its data — and it'll inherit every bias, gap, and mislabel that was in there.

Algorithm. How it learned. The architecture (the shape of the network) plus the training procedure (how the numbers get adjusted as the model sees data). Different algorithms produce different model types: decision trees, gradient-boosted trees, neural networks, transformers.

Think of it like cooking. Data is the ingredients. The algorithm is the recipe. The model is the dish that comes out. You can't make great food from rotten ingredients, no matter how good the recipe. And you can't make a great model from bad data, no matter how good the algorithm.

Why good data beats a good algorithm

This is one of the most counter-intuitive things in ML, and one of the most consistent. A team using a worse algorithm with more, cleaner labelled data will usually beat a team using a better algorithm with less or noisier data.
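A toy sketch of the effect, not a benchmark. The "spam filter" below is invented for illustration: one made-up feature (say, exclamation marks per email), a deliberately simple algorithm (put the threshold halfway between the average ham and the average spam), and two hand-written training sets that differ only in two flipped labels.

```python
def train_threshold(examples):
    # examples: list of (feature, label) pairs, label 0 = ham, 1 = spam.
    # The "algorithm": threshold halfway between the two class averages.
    ham = [x for x, label in examples if label == 0]
    spam = [x for x, label in examples if label == 1]
    return (sum(ham) / len(ham) + sum(spam) / len(spam)) / 2


def accuracy(threshold, examples):
    # Predict spam when the feature is above the threshold.
    correct = sum(1 for x, label in examples if (1 if x > threshold else 0) == label)
    return correct / len(examples)


# Same emails, same algorithm. Only the labels differ.
clean = [(0, 0), (1, 0), (2, 0), (3, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
noisy = [(0, 0), (1, 0), (2, 0), (3, 0), (6, 1), (7, 1), (8, 0), (9, 0)]  # two spam mislabelled as ham

held_out = [(1, 0), (4, 0), (5, 1), (10, 1)]  # emails neither model saw in training

clean_threshold = train_threshold(clean)
noisy_threshold = train_threshold(noisy)
clean_accuracy = accuracy(clean_threshold, held_out)
noisy_accuracy = accuracy(noisy_threshold, held_out)

print(clean_threshold, clean_accuracy)  # 4.5 1.0
print(noisy_accuracy)                   # 0.75
```

Two bad labels out of eight were enough to drag the threshold upward and turn a perfect score on the held-out emails into three out of four. No amount of algorithm tuning was involved; only the data changed.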

Google researchers made this case back in 2009 in "The Unreasonable Effectiveness of Data". Andrew Ng's whole "data-centric AI" pitch is built on it. ML practitioners who've shipped real models tend to tell the same story: spend a week tuning the model, get something like a 1% improvement. Spend a week cleaning the labels, get something like 5%.

LLMs are no exception. The thing that made GPT-3 work in 2020 wasn't a new architecture — the transformer was already 3 years old by then. It was scale plus carefully curated training data. Garbage in, garbage out, but at 175 billion parameters.

What "learning" means in this context

When we say a model learned, what literally happened: the numbers in the file started out random, and the algorithm adjusted them, slowly, over many passes through the data, until they produced predictions close to the ground truth.
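Here is a minimal sketch of that loop on a problem small enough to read in one go. The setup is assumed for illustration: five data points that follow y = 2x + 1, a two-number model (w and b), and plain full-batch gradient descent as the adjustment rule. Real training uses far more data, far more numbers, and fancier update rules, but the rhythm is the same: start random, nudge, repeat.

```python
import random

random.seed(0)  # fixed seed so the "random" starting point is repeatable

# Toy data that follows y = 2x + 1. The model is never told this rule.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

# The numbers start out random.
w = random.uniform(-1.0, 1.0)
b = random.uniform(-1.0, 1.0)


def mean_squared_error(w, b):
    # How far the predictions are from the ground truth, on average.
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


initial_loss = mean_squared_error(w, b)
learning_rate = 0.05

# Many passes through the data, adjusting the numbers a little each time.
for epoch in range(2000):
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = mean_squared_error(w, b)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

Nobody typed 2 and 1 into the model. The loop found them by repeatedly shrinking the gap between prediction and ground truth.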

The model didn't understand anything. It found a configuration of numbers that compresses the patterns in the data well enough to generalise to new inputs. "Generalise" is the key word — a model that just memorises its training data is useless. A model that captures the underlying patterns can handle inputs it's never seen.

The gap between memorising and generalising is most of why ML is hard.
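A small sketch of the difference, using invented toy data. The "memoriser" is a lookup table of the training examples. The "generaliser" fits a straight line with the standard least-squares formulas. Both are perfect on the data they were shown; only one has anything to say about a new input.

```python
# Three training examples that happen to follow y = 2x + 1.
training_data = {1.0: 3.0, 2.0: 5.0, 3.0: 7.0}


def memoriser(x):
    # Pure memorisation: look the answer up, give up on anything unseen.
    return training_data.get(x)


def fit_line(data):
    # Ordinary least squares for a single feature: slope = covariance / variance.
    xs = list(data.keys())
    ys = list(data.values())
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(training_data)


def generaliser(x):
    # Uses the pattern captured from the data, not the data itself.
    return slope * x + intercept


seen_input, unseen_input = 2.0, 10.0
print(memoriser(seen_input), generaliser(seen_input))      # 5.0 5.0
print(memoriser(unseen_input), generaliser(unseen_input))  # None 21.0
```

Real data is noisy and real patterns are not straight lines, which is where the difficulty comes from. The toy only shows what the two words mean.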

What to take away

  • A model is a file of numbers (parameters / weights) plus an architecture. Nothing magical, just learned arithmetic.
  • Two ingredients: data (what it learned from) and algorithm (how it learned). Both matter; data usually matters more.
  • Good data beats a good algorithm. Most real-world ML wins come from cleaning data, not tweaking models.
  • "Learning" means adjusting numbers until predictions match data. The goal is generalisation, not memorisation.

That's the recipe. But how does the adjusting actually happen — what is training, mechanically? And why is running the model afterwards so much cheaper than training it? Post 4.

From the dictionary

Terms used in this post

Quick reference for the 14 terms you met above. Each one comes from the AI dictionary.

Artificial Intelligence [AI]
Umbrella term for software that performs tasks usually associated with human reasoning — language, perception, decision-making. Coined at the 1956 Dartmouth Summer Research Project. In everyday 2026 use, "AI" almost always means a large language model like ChatGPT, Claude, or Gemini, even though the textbook definition is much broader.
e.g. When a product page says "AI-powered", it could mean a 70-billion-parameter LLM or a hand-written if-statement. The label moves with the times.
Algorithm [ML]
In ML, the recipe used to turn data into a model — the architecture plus the training procedure. Different algorithms (decision trees, gradient-boosted trees, neural networks, transformers) produce different model types.
ChatGPT [AI]
OpenAI's consumer chat product, launched November 30, 2022. The first LLM to reach mass adoption: 100 million users in two months. The product most people mean when they say AI today.
Dataset [Data]
The collection of examples a model learns from during training. The shape, size, quality, and bias of the dataset determine almost everything about the resulting model.
Deep Learning [DL]
A subset of machine learning that uses neural networks with many layers ("deep" stacks). Powers image recognition, speech, and the LLMs behind ChatGPT/Claude/Gemini. Needs much more data and compute than classical ML, but scales further.
e.g. Every modern LLM is a deep-learning model — a transformer with billions of parameters trained on internet-scale text.
GPT [AI]
OpenAI's family of large language models. The name stands for Generative Pre-trained Transformer. GPT-4 (2023) and successors are the most widely used closed-source LLMs in production.
Large Language Model [AI]
A deep-learning model trained on huge volumes of text to predict the next token given the previous ones. Scaling next-token prediction to billions of parameters yields the chat-like behaviour of ChatGPT, Claude, and Gemini. Capabilities are bounded by training data and the context window.
e.g. Claude is an LLM — it reads your message as tokens and generates a response one token at a time.
Machine Learning [ML]
A subset of AI where the system learns patterns from data instead of following hand-written rules. The output is a model — a set of learned numbers that maps inputs to outputs. Spam filters, recommendation systems, and credit-risk scorers are classical ML.
e.g. Gmail's spam filter learns which emails you mark as junk and updates its model — that's machine learning, not a rule someone wrote.
Model [ML]
In ML, a model is a file of learned numbers (parameters or weights) plus an architecture that tells the program how to use them. Loading a model means reading those numbers; running it means doing arithmetic with them.
Neural Network [DL]
A model architecture loosely inspired by neurons in the brain — in practice, a stack of matrix multiplications with non-linear functions between them. Deep learning is what you get when you stack many layers of these and train them on a lot of data.
Parameters [ML]
The individual learned numbers inside a model. "7B parameters" means 7 billion of them. More parameters generally means more capacity, more memory needed, and slower inference.
Training [ML]
The expensive one-time process of running a learning algorithm over data until the model's parameters settle into useful values. Frontier-model training can cost $100M+ and use tens of thousands of GPUs.
Transformer [DL]
The neural-network architecture every modern LLM is built on. Introduced by Google in the 2017 paper "Attention Is All You Need". GPT, Claude, Gemini, Llama, Mistral — all transformers.
Weights [ML]
The numbers inside a trained model. They start out random and get adjusted during training until they encode the patterns in the data. "Open weights" means the trained numbers are downloadable; it does not mean the training data or code is open.
