arxiv_cs_lg February 10, 2026

Why Multi-Scale Calibration Matters for Quantization

On the Importance of a Multi-Scale Calibration for Quantization

Translated: 2026/3/15 14:07:17

quantization, llm, ptq, calibration, hessian

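Translation

arXiv:2602.07465v1 Announce Type: new Abstract: Post-training quantization (PTQ) plays a key role in deploying large language models (LLMs) efficiently, and a small calibration set has a decisive effect on quantization performance. Conventional methods rely on random sequences of fixed length and overlook the variable-length nature of LLM inputs. Input length directly affects the activation distribution and, as a result, changes the weight importance captured by the Hessian. Hessian estimates obtained from fixed-length calibration may therefore fail to represent the true importance of weights across diverse input scenarios. We propose MaCa (Matryoshka Calibration), a simple yet effective method for length-aware Hessian construction. MaCa (i) integrates multi-scale sequence-length information into Hessian estimation and (ii) regularizes each sequence as an independent sample, producing a more stable and useful Hessian and enabling accurate quantization. Experiments on state-of-the-art LLMs such as Qwen3, Gemma3, and LLaMA3 show that MaCa consistently improves accuracy under low-bit quantization and that it is a lightweight add-on compatible with existing PTQ frameworks. To the best of our knowledge, this is the first study to systematically clarify the role of multi-scale calibration in LLM quantization.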

Original Content

arXiv:2602.07465v1 Announce Type: new Abstract: Post-training quantization (PTQ) is a cornerstone for efficiently deploying large language models (LLMs), where a small calibration set critically affects quantization performance. However, conventional practices rely on random sequences of fixed length, overlooking the variable-length nature of LLM inputs. Input length directly influences the activation distribution and, consequently, the weight importance captured by the Hessian, which in turn affects quantization outcomes. As a result, Hessian estimates derived from fixed-length calibration may fail to represent the true importance of weights across diverse input scenarios. We propose MaCa (Matryoshka Calibration), a simple yet effective method for length-aware Hessian construction. MaCa (i) incorporates multi-scale sequence length information into Hessian estimation and (ii) regularizes each sequence as an independent sample, yielding a more stable and fruitful Hessian for accurate quantization. Experiments on state-of-the-art LLMs (e.g., Qwen3, Gemma3, LLaMA3) demonstrate that MaCa consistently improves accuracy under low bit quantization, offering a lightweight enhancement compatible with existing PTQ frameworks. To the best of our knowledge, this is the first work to systematically highlight the role of multi-scale calibration in LLM quantization.
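
To make the idea of a length-aware Hessian more concrete, the following is a rough sketch, not the authors' implementation. It assumes the layer-wise Hessian proxy common in Hessian-based PTQ (a sum of outer products of layer-input activations), treats each "scale" as a nested prefix length of a calibration sequence, and interprets treating each sequence as an independent sample as dividing each prefix's contribution by its token count. The function names `gram` and `multiscale_hessian`, the prefix scheme, and the per-length normalization are all hypothetical choices made for illustration; the abstract does not specify these details.

```python
import random


def gram(tokens):
    """Return the d x d sum of outer products x x^T over token activations."""
    d = len(tokens[0])
    h = [[0.0] * d for _ in range(d)]
    for x in tokens:
        for i in range(d):
            xi = x[i]
            for j in range(d):
                h[i][j] += xi * x[j]
    return h


def multiscale_hessian(sequences, scales):
    """Average length-normalized Gram matrices over nested prefixes.

    Assumption (not taken from the paper): every scale is a prefix length,
    and every (sequence, prefix) pair counts as one sample whose Gram matrix
    is divided by its token count, so long inputs do not dominate short ones.
    """
    d = len(sequences[0][0])
    h = [[0.0] * d for _ in range(d)]
    samples = 0
    for seq in sequences:
        for length in scales:
            if length > len(seq):
                continue  # skip scales longer than this sequence
            g = gram(seq[:length])
            for i in range(d):
                for j in range(d):
                    h[i][j] += g[i][j] / length
            samples += 1
    if samples == 0:
        raise ValueError("no scale fits any calibration sequence")
    return [[v / samples for v in row] for row in h]


# Toy usage: 3 random sequences of 16 tokens, each token a 4-dim activation.
rng = random.Random(0)
seqs = [[[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(16)]
        for _ in range(3)]
H = multiscale_hessian(seqs, scales=[4, 8, 16])
```

In a fixed-length setup, the same proxy would be accumulated from full-length sequences only; the sketch differs only in adding the shorter prefixes and weighting every prefix equally. How such a matrix would then be consumed by a specific PTQ framework is outside what the abstract describes.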