AUTHOREA
TurboQuant-Pro: Neural-Adaptive Online Vector Quantization for Large Language Models
Eric Liam

April 10, 2026
Large Language Models (LLMs) exhibit heavy-tailed, highly non-Gaussian weight distributions, posing a major challenge for online vector quantization. While TurboQuant provides an efficient baseline via random orthogonal rotation followed by Lloyd-Max quantization, it degrades in the presence of structural outliers and distributional shift. In this paper, we present TurboQuant-Pro, a systematic evolution from heuristic-driven quantization toward a fully neural-adaptive framework, and show that near-lossless compression is achieved only when the quantization process is fully parameterized. Our final model, TurboQuant-ProV3, reaches a cosine similarity of 0.9957, while the multi-codebook ProV4 further approaches the theoretical limit.
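The abstract's baseline pipeline, random orthogonal rotation followed by Lloyd-Max quantization, can be sketched as follows. This is a minimal illustration, not the paper's implementation: all function names, the codebook size, and the synthetic heavy-tailed weights are assumptions for demonstration.

```python
import numpy as np

def random_orthogonal(d, seed=0):
    # Random orthogonal rotation via QR decomposition of a Gaussian matrix.
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    # Fix column signs so the rotation is uniformly distributed (Haar).
    return q * np.sign(np.diag(r))

def lloyd_max(x, levels=16, iters=50):
    # Lloyd-Max scalar quantizer: alternate nearest-centroid assignment
    # and centroid update (1-D k-means) until the codebook stabilizes.
    c = np.quantile(x, np.linspace(0.0, 1.0, levels))  # quantile init
    for _ in range(iters):
        idx = np.abs(x[:, None] - c[None, :]).argmin(axis=1)
        for k in range(levels):
            if np.any(idx == k):
                c[k] = x[idx == k].mean()
    return c, idx

# Rotate a heavy-tailed weight vector, quantize each coordinate,
# then dequantize and rotate back.
d = 256
rng = np.random.default_rng(1)
w = rng.standard_t(df=3, size=d)    # synthetic heavy-tailed weights
Q = random_orthogonal(d)
z = Q @ w                           # rotation spreads out the outliers
c, idx = lloyd_max(z, levels=16)
w_hat = Q.T @ c[idx]                # reconstruct from codebook indices
cos = w @ w_hat / (np.linalg.norm(w) * np.linalg.norm(w_hat))
```

The rotation is the key step: multiplying by a random orthogonal matrix mixes every coordinate, so the rotated entries look approximately Gaussian even when the raw weights are heavy-tailed, which is the regime where a fixed Lloyd-Max codebook works well.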
