Google TurboQuant: 6x KV-Cache Compression with Zero Accuracy Loss

March 30, 2026 · Posted by Federico Ulfo


Google has released TurboQuant, a compression algorithm that reduces the memory used by an LLM's key-value (KV) cache by at least 6x and delivers up to 8x speedup with zero accuracy loss. The technique combines online vector quantization ideas from PolarQuant and earlier work. Community members have already implemented it for vLLM, fitting more than 4M KV-cache tokens on small devices, and some are calling it the biggest open inference breakthrough of 2026.
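To get a feel for what a 6x reduction means in practice, here is a rough back-of-the-envelope sketch of KV-cache memory. It is not TurboQuant itself: the model shape (32 layers, 8 KV heads, head dimension 128), the 128K-token context, and the helper name `kv_cache_bytes` are all hypothetical, and the 6x factor is simply the headline figure applied to a 16-bit baseline.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bits_per_value):
    """Bytes needed to cache keys and values for `tokens` tokens."""
    # Each token stores one key vector and one value vector
    # per KV head in every layer, hence the factor of 2.
    values_per_token = 2 * layers * kv_heads * head_dim
    return tokens * values_per_token * bits_per_value / 8


# Hypothetical model shape and context length (assumptions, not from the source).
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128
TOKENS = 131_072  # 128K-token context

GIB = 1024 ** 3

# Baseline: 16-bit (fp16/bf16) keys and values.
baseline = kv_cache_bytes(TOKENS, LAYERS, KV_HEADS, HEAD_DIM, bits_per_value=16)

# Apply the headline 6x compression factor to the baseline.
compressed = baseline / 6

print(f"baseline:   {baseline / GIB:.2f} GiB")    # baseline:   16.00 GiB
print(f"compressed: {compressed / GIB:.2f} GiB")  # compressed: 2.67 GiB
```

Under these assumptions the cache shrinks from 16 GiB to under 3 GiB, which illustrates why a reduction of this size lets much longer contexts fit on memory-constrained hardware.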

Sources: Google blog, tweet, Simple Explainer