1 result for tag "turboquant-pytorch"
From-scratch PyTorch implementation of Google TurboQuant (ICLR 2026) for LLM KV-cache compression. Stage 1 applies a random orthogonal rotation followed by Lloyd-Max scalar quantization; Stage 2 adds a QJL 1-bit sign residual correction for unbiased inner products. Achieves 5x compression at 3 bits (58 MB vs. 289 MB FP16) with 99.5% attention fidelity on Qwen2.5-3B.
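The two-stage idea can be sketched in a few lines. This is a minimal NumPy illustration, not the repo's code: a uniform scalar quantizer stands in for Lloyd-Max, and the residual scale `alpha` is a simple stand-in for a calibrated correction factor; all names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_rotation(d):
    # Random orthogonal matrix via QR of a Gaussian matrix;
    # sign-fixing the R diagonal gives a Haar-uniform rotation.
    q, r = np.linalg.qr(rng.normal(size=(d, d)))
    return q * np.sign(np.diag(r))

def quantize(x, bits=3):
    # Uniform scalar quantizer (simplified stand-in for Lloyd-Max),
    # with a single per-vector scale.
    levels = 2 ** bits
    scale = np.abs(x).max() / (levels / 2)
    codes = np.clip(np.round(x / scale), -levels // 2, levels // 2 - 1)
    return codes, scale

d = 64
k = rng.normal(size=d)            # a "key" vector to compress
R = random_rotation(d)
x = R @ k                         # Stage 1: rotate to spread energy across dims

codes, scale = quantize(x)        # 3-bit codes
xq = codes * scale                # dequantized approximation
residual_sign = np.sign(x - xq)   # Stage 2: keep only the residual's sign (1 bit)

# Reconstruct with the sign correction; alpha is the mean residual
# magnitude (a stand-in for a properly calibrated scale).
alpha = np.abs(x - xq).mean()
x_hat = xq + alpha * residual_sign

err_stage1 = np.linalg.norm(x - xq)   # quantization-only error
err_stage2 = np.linalg.norm(x - x_hat)  # error after sign correction
```

The sign correction strictly shrinks the residual norm whenever the residual is nonzero, since subtracting `alpha * sign(r)` with `alpha = mean(|r|)` minimizes the corrected error over all scalar choices of `alpha`.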