February 26, 2026 · 4 min read

Inside AI: machine learning and deep learning

Open the AI umbrella. Machine learning is the part that learns from data. Deep learning is ML done with neural networks, and that's where today's models live.

Post 1 ended on a picture. AI is the umbrella, machine learning sits under it, and deep learning sits inside ML. Marketing copy throws all three around as if they were the same word. They're not, and the difference is worth ten minutes of your time, because it tells you what a product can actually do.

This is post 2 of 8 in the Foundations series.

Machine learning: software that writes its own rules

Normal software is written by hand. A developer thinks up the rules, types them out, and the program runs them. "If the email says 'lottery' three times, mark it spam." Clear, rigid, and entirely the programmer's doing.
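
As a minimal sketch, that hand-written rule might look like this in Python. The function name is hypothetical, and reading "three times" as "three or more" is an assumption made for illustration.

```python
def is_spam_by_rule(email_text: str) -> bool:
    """Hand-written rule: spam if 'lottery' appears three or more times."""
    # The keyword and the threshold were picked by the programmer, not learned.
    return email_text.lower().count("lottery") >= 3


print(is_spam_by_rule("Lottery! You won the lottery. Claim your lottery prize."))  # True
print(is_spam_by_rule("Lunch at noon?"))  # False
```

Every decision in that function came from a person. If spammers switch to a different word, someone has to edit the code.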

Machine learning flips that around. You don't write the rules. You show the program a few hundred thousand emails, each labelled spam or not-spam, and it works out the rules itself by nudging numbers up and down until its guesses line up with the labels. What you get at the end is a model: a frozen set of numbers that takes a new email and hands back a probability.
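
The "nudging numbers up and down" step can be sketched in a few lines. This is a toy, not how a production spam filter is built: the six emails are made up, the technique chosen here is logistic regression trained one example at a time, and the names `predict` and `train` are hypothetical.

```python
import math

# Toy labelled data, invented for illustration: 1 = spam, 0 = not spam.
EMAILS = [
    ("win the lottery now", 1),
    ("lottery prize claim now", 1),
    ("free prize win", 1),
    ("meeting notes attached", 0),
    ("lunch meeting tomorrow", 0),
    ("notes from tomorrow lunch", 0),
]


def predict(weights, bias, text):
    """Return the model's spam probability for one email."""
    score = bias + sum(weights.get(word, 0.0) for word in text.split())
    return 1.0 / (1.0 + math.exp(-score))


def train(examples, epochs=200, learning_rate=0.5):
    """Learn one number per word by nudging it toward the labels."""
    weights = {}  # word -> learned number
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            error = predict(weights, bias, text) - label  # > 0: guess was too spammy
            for word in text.split():
                weights[word] = weights.get(word, 0.0) - learning_rate * error
            bias -= learning_rate * error
    return weights, bias


weights, bias = train(EMAILS)
print(predict(weights, bias, "claim your lottery prize") > 0.5)  # True
print(predict(weights, bias, "lunch meeting notes") > 0.5)       # False
```

Nobody typed a rule about "lottery". The `weights` dictionary is the model: a set of numbers that ended up where the labels pushed them.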

That's the whole shift. The rules come from the data, not from the person at the keyboard. Everything else in ML is a variation on that one idea, and there are three you'll run into most.

Supervised learning is the common one. You hand it labelled examples (email, then spam-or-not) and it learns the mapping.
Unsupervised learning gets no labels. The model finds structure on its own: customer segments, odd transactions, clusters you didn't know were there.
Reinforcement learning learns by doing. The model takes actions, gets rewards, and adjusts. It's a large part of how DeepMind's AlphaGo learned to beat a top human Go player in 2016, and it's a big part of how today's chatbots get tuned.
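
Of the three, unsupervised learning is the easiest to show without labels getting in the way. The sketch below is a stripped-down one-dimensional k-means with two clusters. The spending figures are invented, the function name `two_means` is hypothetical, and the code assumes the input contains at least two distinct values.

```python
def two_means(values, steps=10):
    """Split numbers into two groups with a minimal 1-D k-means (k = 2)."""
    low, high = min(values), max(values)  # start the two centres at the extremes
    for _ in range(steps):
        # Assign every value to whichever centre is closer.
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        # Move each centre to the average of the values assigned to it.
        low = sum(near_low) / len(near_low)
        high = sum(near_high) / len(near_high)
    return low, high


# Made-up monthly spend per customer; nobody has labelled anyone "big spender".
monthly_spend = [12, 15, 14, 13, 210, 190, 205]
small, large = two_means(monthly_spend)
print(small)         # 13.5
print(round(large))  # 202
```

The two customer segments come out of the data alone. No one told the code where to draw the line.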

Here's the part that surprises people. Many of the "AI features" you've touched in the last decade (Gmail's spam filter, Netflix recommendations, the fraud check on your card) are supervised ML at their core. Many of these systems got by for years on classical methods, with no deep neural network required. Just statistics, a lot of data, and a learned function.

Deep learning: one technique inside ML

Deep learning is a single technique that lives inside ML. It uses neural networks, which are math objects loosely named after brain neurons, though the resemblance pretty much ends at the name. Strip away the mystique and a neural network is a stack of matrices with simple bending functions between them. "Deep" just means the stack is tall.
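
To make "a stack of matrices with simple bending functions between them" concrete, here is a minimal forward pass in plain Python. It is a sketch under stated assumptions: the weights are hand-picked rather than learned, bias terms are left out, ReLU stands in as the bending function, and all the names are hypothetical.

```python
def relu(vector):
    """The bending function: negative values become zero."""
    return [max(0.0, x) for x in vector]


def matvec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def forward(layers, inputs):
    """Run inputs through a stack of matrices with ReLU between them."""
    activations = inputs
    for index, matrix in enumerate(layers):
        activations = matvec(matrix, activations)
        if index < len(layers) - 1:  # no bend after the final layer
            activations = relu(activations)
    return activations


# Hand-picked weights for illustration; a real network learns these in training.
LAYERS = [
    [[1.0, -1.0], [0.5, 0.5]],  # layer 1: 2 inputs -> 2 hidden values
    [[1.0, 2.0]],               # layer 2: 2 hidden values -> 1 output
]
print(forward(LAYERS, [3.0, 1.0]))  # [6.0]
print(forward(LAYERS, [1.0, 3.0]))  # [4.0]
```

Going deeper means appending more matrices to `LAYERS`. The loop itself doesn't change.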

Neural networks aren't new. They go back to the 1950s. They mostly sat on the shelf until the 2010s, when two very practical things changed.

We finally had enough data. ImageNet, the dataset that lit the fuse, holds 14 million labelled images.
We finally had GPUs that could train these things in days instead of years.

The tipping point even had a name: AlexNet. In 2012 it won the ImageNet image-classification challenge, beating every classical ML entry, and it wasn't close. After that the whole field leaned in. Vision, speech, translation: one hard problem after another started falling to deeper networks fed more data and more compute.

LLMs are where that trajectory stands today. A model like GPT-4 or Claude is a very deep neural network, reportedly with hundreds of billions of parameters or more (the vendors don't publish exact counts), trained on a huge slice of the public internet. Fancy as that sounds, the basic recipe hasn't changed since 2012: more layers, more data, more GPUs.
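
To give those parameter counts a physical size, here is a back-of-the-envelope calculation. It assumes each parameter is stored as a 16-bit number (2 bytes) and counts only the weights themselves. The function name is hypothetical, and the 7-billion and 70-billion figures are example sizes, not the size of any particular product.

```python
def weight_memory_gb(num_parameters: int, bytes_per_parameter: int = 2) -> float:
    """Approximate storage for a model's weights, in gigabytes (10^9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9


print(weight_memory_gb(7_000_000_000))   # 14.0
print(weight_memory_gb(70_000_000_000))  # 140.0
```

At that size the weights alone won't fit on a single consumer graphics card, which is one reason "more GPUs" is part of the recipe.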

Why people smush the three words together

Partly it's marketing. "AI" sells, so a company will say "AI" when it means "a logistic regression we trained last Tuesday," and nobody checks.

Partly it's that the line between ML and DL is real but doesn't matter to most people using it. If you're calling an API, you don't care whether the model behind it is a gradient-boosted tree or a 70-billion-parameter transformer. You care about accuracy, speed, and cost.

And partly it's the nesting. Since 2022 the most visible AI has been LLMs. An LLM is deep learning, deep learning is ML, and ML is AI. So when someone says "AI" in 2026 they're often pointing at all four levels at once: the umbrella, the layer, the technique, and the one specific model, collapsed into two letters.

If you keep one map in your head, make it this one.

AI is the field. Any system that does something we'd call smart.
ML is the systems that learn from data. The dominant approach inside AI today.
DL is ML done with deep neural networks. The dominant approach inside ML for the hard problems.

Both ML and DL hand you the same thing at the end of the day: a model. So what is a model, exactly? A file? A function? Just a pile of learned numbers? That's post 3.

Tags: AI, Deep Learning, From Scratch, Fundamentals, ML

From the dictionary

Terms used in this post

Quick reference for the 17 terms you met above. Each one comes from the AI dictionary.

Artificial Intelligence [AI]: Umbrella term for software that performs tasks usually associated with human reasoning: language, perception, decision-making. The term was coined for the 1956 Dartmouth Summer Research Project. In everyday 2026 use, "AI" almost always means a large language model like ChatGPT, Claude, or Gemini, even though the textbook definition is much broader. Example: when a product page says "AI-powered", it could mean a 70-billion-parameter LLM or a hand-written if-statement. The label moves with the times.
Claude [AI]: Anthropic's family of LLMs (Opus, Sonnet, Haiku) and consumer chat product at claude.ai. Used in this blog's tooling for drafting and dictionary work; also powers Claude Code, the CLI agent. Example: this blog's create-post skill drafts inline using Claude.
Dataset [Data]: The collection of examples a model learns from during training. The shape, size, quality, and bias of the dataset determine almost everything about the resulting model.
Deep Learning [DL]: A subset of machine learning that uses neural networks with many layers ("deep" stacks). Powers image recognition, speech, and the LLMs behind ChatGPT/Claude/Gemini. Needs much more data and compute than classical ML, but scales further. Example: every modern LLM is a deep-learning model, a transformer with billions of parameters trained on internet-scale text.
GPT [AI]: OpenAI's family of large language models. The name stands for Generative Pre-trained Transformer. GPT-4 (2023) and its successors are among the most widely used closed-source LLMs in production.
GPU [General]: A chip built for massive parallel arithmetic, and a major reason deep learning took off in the 2010s: GPUs make matrix multiplication fast enough to train deep networks in days instead of years. Nvidia dominates the market.
Large Language Model [AI]: A deep-learning model trained on huge volumes of text to predict the next token given the previous ones. Scaling next-token prediction to billions of parameters yields the chat-like behaviour of ChatGPT, Claude, and Gemini. Capabilities are bounded by training data and the context window. Example: Claude is an LLM. It reads your message as tokens and generates a response one token at a time.
Machine Learning [ML]: A subset of AI where the system learns patterns from data instead of following hand-written rules. The output is a model: a set of learned numbers that maps inputs to outputs. Spam filters, recommendation systems, and credit-risk scorers are classical ML. Example: Gmail's spam filter learns which emails you mark as junk and updates its model. That's machine learning, not a rule someone wrote.
Model [ML]: In ML, a model is a file of learned numbers (parameters or weights) plus an architecture that tells the program how to use them. Loading a model means reading those numbers; running it means doing arithmetic with them.
Neural Network [DL]: A model architecture loosely inspired by neurons in the brain. In practice, it is a stack of matrix multiplications with non-linear functions between them. Deep learning is what you get when you stack many layers of these and train them on a lot of data.
Parameters [ML]: The individual learned numbers inside a model. "7B parameters" means 7 billion of them. More parameters generally mean more capacity, more memory needed, and slower inference.
Reinforcement Learning [ML]: ML where an agent takes actions, gets rewards, and learns a policy that maximises long-run reward. Behind AlphaGo and a key part of how modern LLMs are tuned to be helpful (RLHF).
Supervised Learning [ML]: The most common flavour of ML: you give the model labelled examples (input → correct output) and it learns the mapping. Spam classification, fraud detection, and image recognition are all supervised.
Training [ML]: The expensive one-time process of running a learning algorithm over data until the model's parameters settle into useful values. Frontier-model training runs can cost $100M+ and use tens of thousands of GPUs.
Transformer [DL]: The neural-network architecture that nearly every modern LLM is built on. Introduced by Google researchers in the 2017 paper "Attention Is All You Need". GPT, Claude, Gemini, Llama, and Mistral are all transformers.
Unsupervised Learning [ML]: ML without labels. The model finds structure in the data on its own: clusters, anomalies, or representations. Used for customer segmentation, anomaly detection, and the pre-training step of large models.
API [General]: Application Programming Interface. In LLM context: the HTTP endpoint a hosted model exposes (api.openai.com, api.anthropic.com). You send JSON, you get tokens back. The cloud-inference contract.
