AUTHOREA
TurboQuant-Pro: Neural-Adaptive Online Vector Quantization for Large Language Models
Eric Liam

April 10, 2026
Large Language Models (LLMs) exhibit heavy-tailed, highly non-Gaussian weight distributions, posing a major challenge for online vector quantization. While TurboQuant provides an efficient baseline via random orthogonal rotation followed by Lloyd-Max quantization, it degrades in the presence of structural outliers and distributional shift. In this paper, we present TurboQuant-Pro, a systematic evolution from heuristic-driven quantization toward a fully neural-adaptive framework, and show that near-lossless compression is achieved only when the quantization process is fully parameterized. Our final model, TurboQuant-ProV3, reaches a cosine similarity of 0.9957, while the multi-codebook ProV4 further approaches the theoretical limit.
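The abstract's baseline pipeline, random orthogonal rotation followed by Lloyd-Max quantization, can be sketched as follows. This is a minimal illustration, not the paper's implementation: all function names, the codebook size, and the synthetic heavy-tailed weights are assumptions for demonstration.

```python
import numpy as np

def random_orthogonal(d, seed=0):
    # Random orthogonal rotation via QR decomposition of a Gaussian matrix.
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    # Fix column signs so the rotation is uniformly distributed (Haar).
    return q * np.sign(np.diag(r))

def lloyd_max(x, levels=16, iters=50):
    # Lloyd-Max scalar quantizer: alternate nearest-centroid assignment
    # and centroid update (1-D k-means) until the codebook stabilizes.
    c = np.quantile(x, np.linspace(0.0, 1.0, levels))  # quantile init
    for _ in range(iters):
        idx = np.abs(x[:, None] - c[None, :]).argmin(axis=1)
        for k in range(levels):
            if np.any(idx == k):
                c[k] = x[idx == k].mean()
    return c, idx

# Rotate a heavy-tailed weight vector, quantize each coordinate,
# then dequantize and rotate back.
d = 256
rng = np.random.default_rng(1)
w = rng.standard_t(df=3, size=d)    # synthetic heavy-tailed weights
Q = random_orthogonal(d)
z = Q @ w                           # rotation spreads out the outliers
c, idx = lloyd_max(z, levels=16)
w_hat = Q.T @ c[idx]                # reconstruct from codebook indices
cos = w @ w_hat / (np.linalg.norm(w) * np.linalg.norm(w_hat))
```

The rotation is the key step: multiplying by a random orthogonal matrix mixes every coordinate, so the rotated entries look approximately Gaussian even when the raw weights are heavy-tailed, which is the regime where a fixed Lloyd-Max codebook works well.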
