
CommunityNews
Pushing the Limits of Large Language Model Quantization via the Linearity Theorem
Quantizing large language models has become a standard way to reduce their memory and computational costs. Typically, existing methods focus on breaking down the problem into individual layer-wise sub-problems, and minimizing per-layer error, measured via various metrics. Yet, this approach currently lacks theoretical justification and the metrics employed may be sub-optimal. In this paper, we present a “linearity theorem” establishing a direct relationship between the layer-wise $\ell_2$ reconstruction error and the model perplexity increase due to quantization. This insight enables two novel applications: (1) a simple data-free LLM quantization method using Hadamard rotations and MSE-optimal grids, dubbed HIGGS, which outperforms all prior data-free approaches such as the extremely popular NF4 quantized format, and (2) an optimal solution to the problem of finding non-uniform per-layer quantization levels which match a given compression constraint in the medium-bitwidth regime, obtained by reduction to dynamic programming. On the practical side, we demonstrate improved accuracy-compression trade-offs on Llama-3.1 and 3.2-family models, as well as on Qwen-family models. Further, we show that our method can be efficiently supported in terms of GPU kernels at various batch sizes, advancing both data-free and non-uniform quantization for LLMs.
Read in full here:
Popular General Dev topics










Other popular topics









Latest in In The News
Latest (all)
Categories:
Sub Categories:
- All
- In The News
- Dev Chat (200)
- Questions (29)
- Resources (118)
- Blogs/Talks (26)
- Jobs (3)
- Events (14)
- Code Editors (58)
- Hardware (57)
- Reviews (3)
- Sales (15)
- Design & UX (4)
- Marketing & SEO (1)
- Industry & Culture (15)
- Ethics & Privacy (19)
- Business (4)
- Learning Methods (4)
- Content Creators (7)
- DevOps & Hosting (9)
Popular Portals
- /elixir
- /rust
- /wasm
- /ruby
- /erlang
- /phoenix
- /keyboards
- /rails
- /js
- /python
- /security
- /go
- /swift
- /vim
- /clojure
- /java
- /haskell
- /emacs
- /svelte
- /onivim
- /typescript
- /crystal
- /c-plus-plus
- /tailwind
- /kotlin
- /gleam
- /react
- /flutter
- /elm
- /ocaml
- /vscode
- /opensuse
- /ash
- /centos
- /php
- /deepseek
- /html
- /zig
- /scala
- /debian
- /nixos
- /lisp
- /agda
- /sublime-text
- /textmate
- /react-native
- /kubuntu
- /arch-linux
- /revery
- /ubuntu
- /manjaro
- /django
- /spring
- /diversity
- /nodejs
- /lua
- /julia
- /c
- /slackware
- /neovim