Streaming, throughput, and the KV cache
TTFT vs tok/s, why streaming feels faster, and the KV cache that makes the 1000th token cost the same as the first. KV cache quantization (Q8/Q4 KV) and why it should be your default.
Blog
Posts on AI engineering, LLM systems, and software development.
TTFT vs tok/s, why streaming feels faster, and the KV cache that makes the 1000th token cost the same as the first. KV cache quantization (Q8/Q4 KV) and why it should be your default.