arxiv_cs_lg April 24, 2026

On the Quantization Robustness of Diffusion Language Models in Coding Benchmarks

diffusion-models quantization llm coding-benchmarks ptq

Original Content

arXiv:2604.20079v1 Announce Type: new Abstract: Auto-regressive Large Language Models (LLMs) achieve strong performance on coding tasks but incur high memory and inference costs. Diffusion-based language models (d-LLMs) offer bounded inference cost via iterative denoising, but their behavior under post-training quantization (PTQ) has been sparsely explored. We apply PTQ techniques, specifically GPTQ and a modified Hessian-Aware Quantization (HAWQ) algorithm, to a diffusion-based coding LLM (CoDA) and compare it with its auto-regressive counterpart, Qwen3-1.7B, under a standardized evaluation pipeline. In our setup, CoDA exhibits greater robustness at low bitwidths (2-4 bits), with smaller accuracy degradation across the HumanEval and MBPP benchmarks. Additionally, mixed-precision configurations derived from HAWQ provide smooth trade-offs across accuracy, latency, and memory. The results suggest that diffusion LLMs may offer advantages for efficient deployment because they are more resilient to quantization.
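
To make the 2-4 bit regime concrete, the sketch below applies plain symmetric round-to-nearest quantization to a synthetic weight vector and reports the reconstruction error at several bitwidths. This is an illustration only: it is neither GPTQ nor the modified HAWQ algorithm from the abstract, and the function names (`quantize_rtn`, `mse`) and the Gaussian weights are assumptions made for the example, not anything taken from the paper.

```python
import random


def quantize_rtn(weights, bits):
    """Symmetric per-tensor round-to-nearest quantization.

    Illustrative baseline only (not GPTQ or HAWQ). Requires bits >= 2.
    """
    qmax = 2 ** (bits - 1) - 1           # e.g. 7 for 4-bit, 1 for 2-bit
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / qmax               # float value of one integer step
    out = []
    for w in weights:
        q = round(w / scale)             # nearest integer level
        q = max(-qmax, min(qmax, q))     # clamp to the symmetric grid
        out.append(q * scale)            # dequantize back to float
    return out


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical weight tensor: 4096 Gaussian values, fixed seed for repeatability.
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(4096)]

errors = {bits: mse(weights, quantize_rtn(weights, bits)) for bits in (8, 4, 3, 2)}
for bits, err in errors.items():
    print(f"{bits}-bit  MSE = {err:.3e}")
```

The error grows as the bitwidth shrinks, with the largest jumps in the 2-4 bit range, which is where the abstract reports the largest robustness gap between CoDA and Qwen3-1.7B. Methods such as GPTQ reduce this error by using calibration data rather than rounding each weight independently.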