← Blog
NPU
General/dictionary/npu
Definition
Neural Processing Unit. A dedicated chip optimized for tensor operations at low power, used for on-device AI. Examples: Apple Neural Engine (iPhone, Mac), Qualcomm Hexagon (Android), Google Tensor. Optimized for tokens-per-watt rather than peak throughput.
Related terms
Posts that use this term
- What it takes to run a model on your machine
Why VRAM is the hard ceiling on local LLMs, what quantization actually does to a model file, and the practical hardware ladder from 8GB laptops to 192GB workstations.
- Where AI actually runs: cloud, local, edge
Where the model file actually sits when you use AI: a datacenter GPU (cloud), your own machine (local), or the device's silicon (edge). The trade-offs and how to pick.