System requirements by OS for local LLMs
What macOS, Linux, and Windows each need to run a local LLM in 2026. Native Windows now works smoothly; WSL2 for Linux power users; Mac is the smoothest path; Linux gives you the most knobs.

A local LLM runs on every modern operating system, but each one wants different things from your hardware and your install commands. This post is the per-OS prerequisites: what you need on Mac, Linux, and Windows before you pull your first model.
This is post 6 of 13 in the Local LLMs series. After this you'll know exactly which boxes your specific machine ticks (or doesn't) and what to install before continuing.
The universal minimums

Regardless of OS, you need:
- 8 GB RAM, ideally 16 GB. Below 8, you're limited to the smallest models (Phi-3 mini, Llama 3.2 1B). With 16, the practical universe opens up.
- 20 GB free disk. Models are big files. A starter set of three models is around 10 GB; you'll add to it.
- 64-bit OS, modern processor. Nothing made in the last 8 years is too old. M1-or-newer Macs, laptops with AMD Ryzen 3000+ or Intel Core 8th gen+, and modern AMD/Intel desktops are all fine.
If your laptop runs a current browser fluidly, it can run a local LLM. The minimums are not the constraint people imagine.
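The list above can be turned into a quick self-check. This is a minimal sketch, not an official tool: the function names are made up for illustration, and because the Python standard library has no portable call for installed RAM, you pass that number in yourself.

```python
import shutil
import sys

MIN_RAM_GB = 8     # floor from the list above
MIN_DISK_GB = 20   # free disk for a starter set of models

def check_minimums(ram_gb: float, disk_gb: float, is_64bit: bool) -> list[str]:
    """Return a list of problems; an empty list means every minimum is met."""
    problems = []
    if not is_64bit:
        problems.append("64-bit OS required")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"{ram_gb:g} GB RAM: smallest models only")
    if disk_gb < MIN_DISK_GB:
        problems.append(f"{disk_gb:.0f} GB free disk, want {MIN_DISK_GB} GB")
    return problems

def measure_free_disk_gb(path: str = ".") -> float:
    """Free space on the volume holding `path`, in decimal gigabytes."""
    return shutil.disk_usage(path).free / 1e9

def interpreter_is_64bit() -> bool:
    """True for a 64-bit Python build (a proxy for a 64-bit OS, not a proof)."""
    return sys.maxsize > 2**32
```

For example, `check_minimums(16, measure_free_disk_gb(), interpreter_is_64bit())` returns an empty list on a 16 GB machine with enough free disk.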
macOS
The path of least resistance for local LLMs in 2026. Apple Silicon's unified memory means the GPU draws on the same RAM pool as the CPU instead of a separate, smaller block of VRAM (the Running series covered this in detail). For a Mac:
- Architecture: M1 or newer. The Intel-era Macs work but are 5–10x slower; not worth pursuing for local LLMs.
- macOS 13 (Ventura) or newer. Most runtimes target 14+ now.
- Memory. 8GB minimum (1B–3B models only); 16GB is the comfortable floor; 32GB+ is where things get interesting.
- Backend. Metal, Apple's GPU programming API. Every major local-LLM runtime supports it natively.
What you install:
# install homebrew if you don't have it (one-liner from brew.sh)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# install ollama via homebrew
brew install ollama
# start the ollama server in the background (and at login)
brew services start ollama
That's it. No driver, no kernel module, no CUDA installer. You're done.
For LM Studio, download the .dmg from lmstudio.ai and drag to Applications.
A note on the M-series: M3 and M4 are faster than M2, which is faster than M1, but the unified-memory advantage is the bigger deal. A 16GB M1 will outrun a 16GB Intel Mac with discrete graphics by a wide margin. If you have an M-series Mac at all, you're in great shape.
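To see which of these buckets a given Mac falls into, a small sketch like the following may help. The function names and tier labels are illustrative; the thresholds are the ones listed above. Note that a Python running under Rosetta reports `x86_64` even on Apple Silicon, so run it with a native interpreter.

```python
import platform

def mac_tier(system: str, machine: str, ram_gb: float) -> str:
    """Map (OS name, CPU architecture, RAM in GB) to the tiers described above."""
    if system != "Darwin":
        return "not a Mac"
    if machine != "arm64":
        # Intel-era Mac: works, but far slower than Apple Silicon.
        return "Intel Mac: works, but 5-10x slower"
    if ram_gb < 8:
        return "Apple Silicon: below the 8GB minimum"
    if ram_gb < 16:
        return "Apple Silicon: 1B-3B models only"
    if ram_gb < 32:
        return "Apple Silicon: comfortable"
    return "Apple Silicon: where things get interesting"

def this_machine(ram_gb: float) -> str:
    """Classify the machine this script runs on; RAM is supplied by the caller."""
    return mac_tier(platform.system(), platform.machine(), ram_gb)
```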
Linux
The most flexible OS for local LLMs and the one most production deployments use. Three GPU paths and one fallback:
- NVIDIA GPU + CUDA. The mainstream path. RTX 30, 40, 50 series consumer cards; Tesla / A / H series datacenter cards. CUDA 12.1+ recommended.
- AMD GPU + ROCm. AMD's CUDA equivalent. Improved a lot in 2025–2026 but still rougher than CUDA. RX 7900 XT/XTX, MI300 datacenter cards. Set HSA_OVERRIDE_GFX_VERSION for older cards.
- Vulkan backend. Cross-vendor GPU compute that works on AMD, Intel Arc, and even integrated Intel/AMD graphics. Slower than CUDA/ROCm but it works on hardware nothing else supports.
- CPU only. Always available. 2–8 tok/s on a 7B model. Fine for testing, painful for real use.
NVIDIA + Ubuntu is the smoothest. Pick that combination if you're choosing fresh.
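The preference order above (CUDA, then ROCm, then Vulkan, then CPU) can be sketched as a simple probe. This is a rough heuristic under a stated assumption: it only checks whether each vendor's diagnostic tool (`nvidia-smi`, `rocminfo`, `vulkaninfo`) is on your PATH, which suggests the stack is installed but does not prove the driver is loaded. The function names are hypothetical.

```python
import shutil

# (backend, diagnostic tool) pairs in the order of preference described above.
PROBES = [("cuda", "nvidia-smi"), ("rocm", "rocminfo"), ("vulkan", "vulkaninfo")]

def pick_backend(on_path) -> str:
    """Return the first backend whose tool is present; fall back to CPU."""
    for backend, tool in PROBES:
        if on_path(tool):
            return backend
    return "cpu"

def detect_backend() -> str:
    """Probe the real PATH of the current machine."""
    return pick_backend(lambda tool: shutil.which(tool) is not None)
```

Keeping the lookup as a parameter makes the ordering easy to check without any GPU present; `detect_backend()` is the version you would actually run.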
What you install (Ubuntu 22.04+ on NVIDIA):
# install nvidia driver if not already
sudo apt install nvidia-driver-550
# install ollama
curl -fsSL https://ollama.com/install.sh | sh
Ollama's installer detects CUDA, downloads the right binary, and registers a systemd service. After install, ollama list works and you can pull models.
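For ROCm, install the stack with AMD's amdgpu-install script, then build llama.cpp with its HIP backend enabled (GGML_HIP=ON in current CMake builds; older Makefile builds used LLAMA_HIPBLAS=1). Or use LM Studio's Linux build, which handles ROCm internally.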
Windows
The OS most engineers actually use, and the one most local-LLM tutorials skip. In 2026, two paths work, and the right one depends on how comfortable you are with command lines.
Path 1: native Windows (recommended for most)
The fastest, simplest setup. Ollama and LM Studio both ship native Windows binaries.
What you install:
- NVIDIA driver (any recent one). NVIDIA RTX 30/40/50 cards work out of the box.
- Ollama for Windows. Download from ollama.com, run the installer. It runs in the background and makes the ollama command available in your terminal.
- LM Studio for Windows. Download the .exe from lmstudio.ai.
Both detect your GPU automatically. No WSL, no Linux subsystem, no compilation. From a fresh Windows 11 install, you can have a model running in 10 minutes.
# verify the install (PowerShell)
ollama run llama3.2:3b
Single best surprise of 2025 for Windows users: native local LLMs now work as smoothly as macOS.
Path 2: WSL2 + Ubuntu (for Linux power users on Windows)
If you live in Linux tools and just happen to be on a Windows machine, WSL2 is the choice. As of 2024, the Windows NVIDIA driver passes GPU access through to the WSL2 kernel automatically, so you do not install a separate Linux GPU driver inside WSL.
Install:
# enable wsl2 and install ubuntu
wsl --install
# inside the wsl2 ubuntu, install ollama
curl -fsSL https://ollama.com/install.sh | sh
Then it's identical to native Linux. Performance is within 5% of bare-metal Linux on the same hardware.
What about AMD on Windows?
ROCm on Windows is improving but still rough as of mid-2026. If you have an AMD GPU on Windows, the practical path is LM Studio's Vulkan backend, which works without any custom drivers. Slower than CUDA but functional.
Cross-OS notes
Things that are the same everywhere:
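- GGUF files are portable. A model downloaded on a Mac also runs on Linux and on Windows.
- Ollama's daemon protocol is the same. A client running on macOS can talk to an Ollama server running on Linux on your home network. The server listens only on localhost by default, so set OLLAMA_HOST on the server to expose it to other machines.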
- LM Studio's UI is identical. Same buttons, same model browser.
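As an illustration of the shared protocol, here is a minimal sketch of a client that talks to an Ollama server on another machine, using only the Python standard library. The host address in the usage note is a placeholder, the helper names are made up, and it assumes the server has been started with OLLAMA_HOST set so that it accepts connections from the network.

```python
import json
import urllib.request

OLLAMA_PORT = 11434  # Ollama's default HTTP port

def build_generate_request(host: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST to Ollama's /api/generate endpoint."""
    url = f"http://{host}:{OLLAMA_PORT}/api/generate"
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate(host: str, model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the generated text."""
    request = build_generate_request(host, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

Calling `generate("192.168.1.50", "llama3.2:1b", "say hi")` from a Mac would reach a Linux box at that (hypothetical) address; the same code works unchanged in the other direction.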
Verifying your setup before continuing
Before pulling models, a quick sanity check:
# check ollama is installed
ollama --version
# pull and run the smallest useful model
ollama run llama3.2:1b "say hi"
If you get a greeting back (or some friendly equivalent), your install is correct. If the model takes more than 30 seconds to download or to produce its first token, your network or your hardware is the bottleneck, not the install.
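If you would rather script this check, the sketch below wraps the same command and applies the 30-second rule of thumb. The helper names and messages are illustrative. It times the whole run, so a first run that includes the model download will count against the threshold.

```python
import shutil
import subprocess
import time

SLOW_AFTER_S = 30.0  # rule-of-thumb threshold from the text above

def run_timed(cmd: list[str], timeout: float = 600.0) -> tuple[int, float, str]:
    """Run a command; return (exit code, elapsed seconds, captured stdout)."""
    start = time.monotonic()
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return proc.returncode, time.monotonic() - start, proc.stdout.strip()

def verdict(returncode: int, seconds: float) -> str:
    """Translate an exit code and a duration into a diagnosis."""
    if returncode != 0:
        return "install problem: command failed"
    if seconds > SLOW_AFTER_S:
        return "install OK, but network or hardware is slow"
    return "install OK"

def sanity_check(model: str = "llama3.2:1b") -> str:
    """Run the same check as the commands above and summarise the result."""
    if shutil.which("ollama") is None:
        return "install problem: ollama not on PATH"
    code, seconds, _ = run_timed(["ollama", "run", model, "say hi"])
    return verdict(code, seconds)
```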
Common stumbles per OS
- Mac, "model is slow." Open Activity Monitor → Window → GPU History. If the GPU sits at 0% while a model is generating, your runtime is on CPU. Check the Ollama / LM Studio settings.
- Linux, "no GPU detected." nvidia-smi should show your card. If it doesn't, the driver isn't loaded; check with lsmod | grep nvidia. Reboot after driver install.
- Windows native, "Ollama can't find GPU." Update NVIDIA driver. Reboot. Check Task Manager → Performance → GPU is showing CUDA usage when a model runs.
- WSL2, "very slow." Make sure you're on WSL2 not WSL1: wsl --list --verbose. WSL1 has no GPU access.
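The WSL check can also be scripted. The sketch below parses the table that `wsl --list --verbose` prints and flags any distro still on version 1. The sample text is illustrative rather than captured from a real machine, and the NUL-stripping is a crude workaround for the UTF-16 output wsl.exe often produces when its output is captured.

```python
# Illustrative output of `wsl --list --verbose`; real column widths vary.
SAMPLE = (
    "  NAME      STATE      VERSION\n"
    "* Ubuntu    Running    2\n"
    "  Legacy    Stopped    1\n"
)

def parse_wsl_list(text: str) -> dict[str, int]:
    """Map each distro name to its WSL version (1 or 2)."""
    versions = {}
    for line in text.replace("\x00", "").splitlines():
        parts = line.replace("*", " ").split()  # drop the default-distro marker
        # Data rows end in a number; the header row ends in "VERSION".
        if len(parts) >= 3 and parts[-1].isdigit():
            versions[parts[0]] = int(parts[-1])
    return versions

def wsl1_distros(text: str) -> list[str]:
    """Names of distros that are on WSL1 and so have no GPU access."""
    return [name for name, version in parse_wsl_list(text).items() if version == 1]
```

On the sample, `wsl1_distros(SAMPLE)` picks out the one distro that would need converting before it can see the GPU.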
What's next
OS sorted, drivers verified. The next post is the per-tier hardware tour: exactly what runs comfortably on a 16GB MacBook Air, a 24GB GPU, an old laptop with integrated graphics. Concrete model recommendations for every common machine.
From the dictionary
Terms used in this post
Quick reference for the 8 terms you met above. Each one comes from the AI dictionary.
- GGUF (ML). GPT-Generated Unified Format. A single-file binary format for storing quantized model weights, tokenizer, and metadata. Used by llama.cpp, Ollama, and LM Studio. A 7B model in Q4 quantization is roughly 4GB; the same model in Q8 is roughly 8GB.
- GPU (General). A chip built for massive parallel arithmetic. The reason deep learning took off in the 2010s: GPUs make matrix multiplication fast enough to train deep networks in days instead of years. Nvidia dominates the market.
- llama.cpp (AI). A C++ implementation of LLM inference designed to run quantized models on consumer hardware (CPU, CUDA, Metal, Vulkan). The de-facto local inference engine that Ollama and LM Studio both wrap. Supports GGUF format, has a built-in HTTP server, and is the reference for what local LLMs can actually do.
- Large Language Model (AI). A deep-learning model trained on huge volumes of text to predict the next token given the previous ones. Scaling next-token prediction to billions of parameters yields the chat-like behaviour of ChatGPT, Claude, and Gemini. Capabilities are bounded by training data and the context window. E.g. Claude is an LLM: it reads your message as tokens and generates a response one token at a time.
- LM Studio (AI). A GUI app for running local LLMs, wrapping llama.cpp with a chat interface and a model browser. Easier than Ollama for non-CLI users; same underlying engine. Useful for quick model evaluation; less useful for scripting or production-style workflows.
- Model (ML). In ML, a model is a file of learned numbers (parameters or weights) plus an architecture that tells the program how to use them. Loading a model means reading those numbers; running it means doing arithmetic with them.
- Ollama (AI). A wrapper around llama.cpp that makes running local LLMs a one-command operation. Pulls quantized GGUF models from a registry, exposes an HTTP API on localhost:11434, and handles model loading/unloading. The most common on-ramp to local inference in 2026.
- Token (NLP). The unit an LLM operates on, roughly a word or piece of one. English averages around 4 characters per token. Tokens are the unit of computation, the unit of API billing, and the unit the context window is measured in.