Inside AI: machine learning and deep learning

Open the AI umbrella. Machine learning is the part that learns from data. Deep learning is ML done with neural networks — and that's where today's models live.

Post 1 ended on a picture: AI is the umbrella, ML sits underneath, and DL sits inside ML. This post opens that umbrella. The three terms get used as if they were synonyms in marketing copy. They aren't.

This is post 2 of 8 in the Foundations series.

What machine learning actually is

Classical software is written by hand. A developer thinks up the rules, types them out, and the program runs them: "If the email contains the word 'lottery' three times, mark it spam."

Machine learning flips that. You don't write the rules. You show the program a few hundred thousand emails labelled spam or not-spam, and it works out the rules itself by adjusting numbers until its guesses match the labels as closely as it can. The output is a model: a frozen set of numbers that, given a new email, produces the probability that it is spam.

The key shift: rules come from data, not from the programmer. Everything else in ML is variation on that theme.
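To make the contrast concrete, here is a minimal Python sketch, not how any production filter works. It assumes a tiny hand-picked vocabulary (`VOCAB`) and six made-up emails; every name in it (`rule_based_is_spam`, `features`, `train`) is hypothetical. The first function is the hand-written rule from above. The rest is plain logistic regression fitted by gradient descent: the program nudges its numbers until its spam probabilities match the labels.

```python
import math

# The classical approach: a programmer writes the rule by hand.
def rule_based_is_spam(email: str) -> bool:
    return email.lower().split().count("lottery") >= 3

# The ML approach. VOCAB is a hypothetical, hand-picked feature list kept
# tiny for illustration; real filters use far richer features and data.
VOCAB = ["lottery", "free", "winner", "meeting", "invoice"]

def features(email: str) -> list[float]:
    """Turn an email into word counts over VOCAB."""
    words = email.lower().split()
    return [float(words.count(w)) for w in VOCAB]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights: list[float], bias: float, x: list[float]) -> float:
    """Probability that the email with feature vector x is spam."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def train(emails: list[str], labels: list[int], lr: float = 0.1, epochs: int = 500):
    """Logistic regression fitted by stochastic gradient descent on log-loss."""
    weights = [0.0] * len(VOCAB)
    bias = 0.0
    data = [(features(e), y) for e, y in zip(emails, labels)]
    for _ in range(epochs):
        for x, y in data:
            error = predict_proba(weights, bias, x) - y  # gradient of log-loss w.r.t. z
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias  # the "model": nothing but learned numbers

emails = [
    "you are a lottery winner claim your free prize",
    "free lottery tickets for every winner",
    "winner winner free cash",
    "agenda for the meeting tomorrow",
    "please find the invoice attached",
    "meeting moved to friday see invoice",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = spam, 0 = not spam

weights, bias = train(emails, labels)
print(rule_based_is_spam("free lottery winner"))  # False: nobody wrote a rule for this
print(predict_proba(weights, bias, features("free lottery winner")) > 0.5)  # True
```

The hand-written rule misses an obviously spammy email because it only checks what the programmer thought of; the learned model catches it because the training data pushed the weights for "lottery", "free" and "winner" up. What `train` returns is exactly the frozen set of numbers described above.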

Three common flavours:

  • Supervised learning: the most common. You give it labelled examples (email → spam/not-spam) and it learns the mapping.
  • Unsupervised learning: no labels. The model finds structure in the data on its own, for example by grouping similar items into clusters. Used for customer segmentation and anomaly detection.
  • Reinforcement learning: the model takes actions and gets rewards. It was a key ingredient in DeepMind's AlphaGo, which beat Go champion Lee Sedol in 2016, and it's a big part of how modern chatbots get tuned today.
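The two less familiar flavours can each be sketched in a few lines. This is a toy illustration on made-up data, with hypothetical function names: `kmeans_1d` groups customers by annual spend without ever seeing a label (k-means, one common clustering method), and `run_bandit` is an epsilon-greedy multi-armed bandit, about the simplest reinforcement-learning setup there is, where the agent only ever sees rewards for the actions it tries.

```python
import random

# --- Unsupervised: k-means clustering (no labels anywhere) ---
def kmeans_1d(points: list[float], k: int, iters: int = 20) -> list[float]:
    """Group 1-D points into k clusters; returns the cluster centres."""
    centroids = list(points[:k])  # naive initialisation: the first k points
    for _ in range(iters):
        clusters: list[list[float]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centre to the mean of its cluster (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids

# --- Reinforcement: epsilon-greedy multi-armed bandit ---
def run_bandit(payout_probs: list[float], steps: int = 5000,
               epsilon: float = 0.1, seed: int = 0):
    """Learn which action pays best purely from the rewards it receives."""
    rng = random.Random(seed)
    n = len(payout_probs)
    counts = [0] * n
    values = [0.0] * n  # running average reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n)  # explore: try an action at random
        else:
            action = max(range(n), key=lambda a: values[a])  # exploit the best guess
        reward = 1.0 if rng.random() < payout_probs[action] else 0.0
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
    return counts, values

annual_spend = [5.0, 7.0, 6.0, 95.0, 100.0, 102.0]  # made-up customer data
print(sorted(kmeans_1d(annual_spend, k=2)))  # [6.0, 99.0]: a low-spend and a high-spend group

counts, values = run_bandit([0.3, 0.7])  # hidden payout rates the agent never sees
print(counts, values)
```

Note what's missing in each: k-means never gets a "right answer", and the bandit is never told which action is better; it finds out by acting. Real reinforcement learning (AlphaGo, chatbot tuning) adds states and long sequences of actions, which this sketch leaves out.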

Many "AI features" you've used in the last decade (Gmail's spam filter, Netflix recommendations, fraud detection on your card) started out as classical ML, mostly supervised. Often there was no neural network in sight: just statistics, lots of data, and a learned function. (Some of these systems have since added deep learning on top.)

Where deep learning fits

Deep learning is one technique inside ML. It uses neural networks, math objects loosely inspired by neurons in the brain, though the resemblance is mostly a naming choice. A neural network is a pile of matrices with simple non-linear functions between them. "Deep" just means you stack a lot of them.
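Here's what "a pile of matrices with simple non-linear functions between them" looks like as code: a minimal, untrained forward pass in plain Python. The layer sizes, the random weights, and the helper names are arbitrary choices for illustration; real frameworks do the same arithmetic with optimised matrix libraries on GPUs, and training (not shown) is what turns random weights into useful ones.

```python
import random

def matvec(W: list[list[float]], x: list[float]) -> list[float]:
    """Matrix-vector product: one row of W per output unit."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def relu(v: list[float]) -> list[float]:
    """The simple non-linearity between layers: negative values become 0."""
    return [max(0.0, vi) for vi in v]

def random_layer(n_in: int, n_out: int, rng: random.Random):
    """An untrained layer: small random weights W and zero biases b."""
    W = [[rng.gauss(0.0, n_in ** -0.5) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return W, b

def forward(layers, x: list[float]) -> list[float]:
    """Run x through each layer: multiply by W, add b, apply ReLU (except at the end)."""
    for i, (W, b) in enumerate(layers):
        x = [z + bi for z, bi in zip(matvec(W, x), b)]
        if i < len(layers) - 1:
            x = relu(x)
    return x

def count_params(layers) -> int:
    """Every weight and every bias is one learned number, i.e. one parameter."""
    return sum(len(W) * len(W[0]) + len(b) for W, b in layers)

rng = random.Random(42)
sizes = [4, 8, 8, 8, 2]  # input, three hidden layers, output
layers = [random_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
print(forward(layers, [1.0, 0.5, -0.2, 0.0]))  # two output numbers
print(count_params(layers))  # 202
```

Adding another entry to `sizes` makes the network one layer deeper; structurally, that is all "deep" means. Without `relu`, the stacked matrices would collapse into a single affine map, which is why the non-linear step between layers matters.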

Neural networks have been around since the 1950s. They didn't dominate until the 2010s, for two main reasons:

  1. We finally had enough data. ImageNet, the dataset that kicked off the deep-learning era, has over 14 million labelled images (the competition subset used to benchmark models has about 1.2 million).
  2. We finally had GPUs that could train them in days instead of years.

The 2012 AlexNet result on ImageNet was the moment the field tilted: a deep network cut the top-5 error rate in the ImageNet image-classification challenge to 15.3%, against 26.2% for the runner-up, a classical computer-vision pipeline. After that, one hard ML problem after another (vision, speech, translation) started yielding to deeper networks with more data and more compute.

LLMs are the current peak of that trajectory. A model like GPT-4 or Claude is a very deep neural network (a transformer) with, by most outside estimates, hundreds of billions of parameters (the vendors don't publish exact counts), trained on a large fraction of the public internet. The architecture has changed since 2012, but the core recipe hasn't: more layers, more data, more GPUs.
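Parameter counts translate directly into hardware. A rough rule of thumb, assuming every weight is stored at the same numeric precision: memory for the weights alone equals parameters × bytes per parameter. The sketch below applies it to a hypothetical 70B-parameter model; it ignores activations, caches, and optimiser state, so real requirements, especially for training, are higher.

```python
# Rough rule of thumb (an assumption, not a vendor figure):
# memory for the weights = number of parameters x bytes per parameter.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}

def weights_size_gb(n_params: float, precision: str) -> float:
    """Gigabytes (10^9 bytes) needed just to hold the weights."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9

for precision in BYTES_PER_PARAM:
    print(f"70B parameters at {precision}: {weights_size_gb(70e9, precision):,.0f} GB")
# 280, 140, 70 and 35 GB respectively
```

Halving the precision halves the memory, which is one reason lower-precision (quantised) versions of large models are popular for running on smaller hardware.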

Why the three terms get conflated

Three reasons.

One, marketing. "AI" sells. Companies say "AI" when they mean "a logistic regression we trained last Tuesday". Nobody fact-checks them.

Two, the boundary between ML and DL is real but not load-bearing for most users. If you're calling an API, you don't care whether the model is a gradient-boosted tree or a 70B-parameter transformer. You care about accuracy, latency, and cost.
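From the caller's side, that indifference looks like code written against an interface. In this sketch, which uses Python's `typing.Protocol`, both "models" are hypothetical stand-ins with invented numbers (a one-threshold rule in place of a tree, a single weighted sum in place of a network); the point is that `evaluate` only ever sees `predict`, accuracy, and latency.

```python
import math
import time
from typing import Protocol

class SpamModel(Protocol):
    """All the calling code needs: something with a predict method."""
    def predict(self, email: str) -> bool: ...

class StumpModel:
    """Hypothetical stand-in for a tree-based model: one learned threshold."""
    def __init__(self, threshold: int = 2) -> None:
        self.threshold = threshold

    def predict(self, email: str) -> bool:
        return email.lower().split().count("free") >= self.threshold

class TinyNetModel:
    """Hypothetical stand-in for a neural model: one weighted sum plus a sigmoid."""
    WEIGHTS = {"free": 1.5, "winner": 2.0, "meeting": -2.0}  # invented values
    BIAS = -1.0

    def predict(self, email: str) -> bool:
        z = self.BIAS + sum(self.WEIGHTS.get(w, 0.0) for w in email.lower().split())
        return 1.0 / (1.0 + math.exp(-z)) > 0.5

def evaluate(model: SpamModel, emails: list[str], labels: list[bool]) -> tuple[float, float]:
    """Return (accuracy, latency in ms). The caller never looks inside the model."""
    start = time.perf_counter()
    preds = [model.predict(e) for e in emails]
    latency_ms = (time.perf_counter() - start) * 1000
    accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    return accuracy, latency_ms

emails = ["free free prize", "winner free", "team meeting at noon", "lunch?"]
labels = [True, True, False, False]
for model in (StumpModel(), TinyNetModel()):
    acc, ms = evaluate(model, emails, labels)
    print(f"{type(model).__name__}: accuracy={acc:.2f}, latency={ms:.3f} ms")
```

Swapping one model for the other changes the numbers `evaluate` reports, not the code that calls it.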

Three, since ChatGPT's launch in late 2022, the most visible AI has been LLMs, and LLMs are deep learning, and deep learning is ML, and ML is AI. So when someone says "AI" in 2026, they're often pointing at all four nested levels at once: the umbrella, the layer, the technique, and the specific model, collapsed into one word.

The distinction worth keeping in your head:

  • AI: the field. Any system that does something "smart".
  • ML: systems that learn from data. The dominant approach inside AI today.
  • DL: ML done with deep neural networks. The dominant approach inside ML for hard problems.

What to take away

  • ML is software that writes its own rules from data, instead of being hand-coded.
  • Three flavours: supervised (labelled data), unsupervised (find structure), reinforcement (actions and rewards). Most production AI is supervised.
  • Deep learning is ML done with deep neural networks. It took over in the 2010s once we had GPUs and large datasets.
  • LLMs are deep learning. So when someone says "AI" today, they usually mean a deep-learning model, which is a kind of ML model.

That's the layered picture. Both ML and DL produce the same kind of artifact at the end: a model. But what is a model, exactly? A file? A function? Some learned numbers? That's post 3.

From the dictionary

Terms used in this post

Quick reference for the 16 terms you met above. Each one comes from the AI dictionary.

Artificial Intelligence (AI)
Umbrella term for software that performs tasks usually associated with human reasoning: language, perception, decision-making. Coined for the 1956 Dartmouth Summer Research Project. In everyday 2026 use, "AI" usually means a large language model like ChatGPT, Claude, or Gemini, even though the textbook definition is much broader.
e.g. When a product page says "AI-powered", it could mean a 70-billion-parameter LLM or a hand-written if-statement. The label moves with the times.
Claude (AI)
Anthropic's family of LLMs (Opus, Sonnet, Haiku) and its consumer chat product at claude.ai. Used in this blog's tooling for drafting and dictionary work; also powers Claude Code, Anthropic's command-line coding agent.
e.g. This blog's create-post skill drafts inline using Claude.
Dataset (Data)
The collection of examples a model learns from during training. The shape, size, quality, and bias of the dataset determine almost everything about the resulting model.
Deep Learning (DL)
A subset of machine learning that uses neural networks with many layers ("deep" stacks). Powers image recognition, speech, and the LLMs behind ChatGPT, Claude, and Gemini. Needs much more data and compute than classical ML, but keeps improving as data and compute grow.
e.g. Every modern LLM is a deep-learning model: a transformer with billions of parameters trained on internet-scale text.
GPT (AI)
OpenAI's family of large language models; the name stands for Generative Pre-trained Transformer. GPT-4 (2023) and its successors are among the most widely used closed-source LLMs in production.
GPU (General)
A chip built for massively parallel arithmetic. It's the reason deep learning took off in the 2010s: GPUs make matrix multiplication fast enough to train deep networks in days instead of years. Nvidia dominates the market.
Large Language Model (AI)
A deep-learning model trained on huge volumes of text to predict the next token given the previous ones. Scaling next-token prediction to billions of parameters, then fine-tuning the model for dialogue, yields the chat-like behaviour of ChatGPT, Claude, and Gemini. Capabilities are bounded by training data and the context window.
e.g. Claude is an LLM: it reads your message as tokens and generates a response one token at a time.
Machine Learning (ML)
A subset of AI where the system learns patterns from data instead of following hand-written rules. The output is a model: a set of learned numbers that maps inputs to outputs. Spam filters, recommendation systems, and credit-risk scorers are classic ML applications.
e.g. Gmail's spam filter learns from the emails people mark as junk and updates its model. That's machine learning, not a rule someone wrote.
Model (ML)
In ML, a model is a file of learned numbers (parameters or weights) plus an architecture that tells the program how to use them. Loading a model means reading those numbers; running it means doing arithmetic with them.
Neural Network (DL)
A model architecture loosely inspired by neurons in the brain; in practice, a stack of matrix multiplications with non-linear functions between them. Deep learning is what you get when you stack many layers of these and train them on a lot of data.
Parameters (ML)
The individual learned numbers inside a model. "7B parameters" means 7 billion of them. More parameters generally means more capacity, more memory needed, and slower inference.
Reinforcement Learning (ML)
ML where an agent takes actions, gets rewards, and learns a policy that maximises long-run reward. A core ingredient of AlphaGo and a big part of how modern LLMs are tuned to be helpful (RLHF, reinforcement learning from human feedback).
Supervised Learning (ML)
The most common flavour of ML: you give the model labelled examples (input → correct output) and it learns the mapping. Spam classification, fraud detection, image recognition: all supervised.
Training (ML)
The expensive, one-time process of running a learning algorithm over data until the model's parameters settle into useful values. Frontier-model training runs reportedly cost $100M+ and use tens of thousands of GPUs.
Transformer (DL)
The neural-network architecture nearly every modern LLM is built on. Introduced by Google researchers in the 2017 paper "Attention Is All You Need". GPT, Claude, Gemini, Llama, Mistral: all transformers.
Unsupervised Learning (ML)
ML without labels. The model finds structure in the data on its own: clusters, anomalies, or representations. Used for customer segmentation, anomaly detection, and (in its self-supervised form) the pre-training step of large models.
