arxiv_cs_lg February 10, 2026

Astro: Activation-guided Structured Regularization for Outlier-Robust LLM Post-Training Quantization

quantization, llm, ptq, regularization, outlier-robust

Original Content

arXiv:2602.07596v1 Announce Type: new

Abstract: Weight-only post-training quantization (PTQ) is crucial for efficient Large Language Model (LLM) deployment but suffers from accuracy degradation caused by weight and activation outliers. Existing mitigation strategies often face critical limitations: they either suppress outliers insufficiently or incur significant deployment inefficiencies, such as added inference latency, heavy preprocessing, or reliance on complex operator fusion. To resolve these limitations, we leverage a key insight: over-parameterized LLMs often converge to flat minima, implying a vast equivalent solution space in which weights can be adjusted without compromising accuracy. Building on this, we propose Astro, an Activation-guided Structured Regularization framework designed to suppress the negative effects of outliers in a hardware-friendly and efficient manner. Using an activation-guided regularization objective, Astro actively reconstructs intrinsically robust weights, aggressively suppressing the weight outliers that correspond to high-magnitude activations without sacrificing model accuracy. Crucially, Astro adds no inference latency and is orthogonal to mainstream quantization methods such as GPTQ. Extensive experiments show that Astro achieves highly competitive performance; notably, on LLaMA-2-7B, it outperforms complex learning-based rotation methods while using almost one third of the quantization time.
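To make the outlier problem concrete, the following is a minimal sketch of why a single large weight hurts weight-only quantization. It is not Astro's method. It assumes plain symmetric round-to-nearest quantization at 4 bits with one shared scale per group of 128 weights. The helper names (`quantize_rtn`, `mean_sq_error`) and the synthetic weight values are hypothetical and chosen only for illustration.

```python
import random


def quantize_rtn(weights, bits=4):
    """Symmetric round-to-nearest quantization with one shared scale per group.

    Illustrative assumption only: this is a generic baseline, not the
    quantizer used in the paper.
    """
    qmax = 2 ** (bits - 1) - 1  # largest integer level, e.g. 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    dequantized = []
    for w in weights:
        q = round(w / scale)            # nearest integer level
        q = max(-qmax, min(qmax, q))    # clamp to the representable range
        dequantized.append(q * scale)   # map back to floating point
    return dequantized


def mean_sq_error(original, approx):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(original, approx)) / len(original)


rng = random.Random(0)

# Synthetic group of small, well-behaved weights (hypothetical values).
group = [rng.gauss(0.0, 0.02) for _ in range(128)]

# Same group, but with one large-magnitude outlier in position 0.
with_outlier = list(group)
with_outlier[0] = 1.0

# Error on the clean group.
err_clean = mean_sq_error(group, quantize_rtn(group))

# Error on the *non-outlier* weights once the outlier sets the shared scale.
err_outlier_rest = mean_sq_error(with_outlier[1:], quantize_rtn(with_outlier)[1:])

print("MSE without outlier:", err_clean)
print("MSE of remaining weights with outlier:", err_outlier_rest)
```

Because the shared scale is set by the largest magnitude in the group, the outlier stretches the quantization grid and the remaining small weights lose most of their resolution. This is the kind of degradation that outlier-suppression approaches, including the regularization described in the abstract, aim to reduce.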