arxiv_cs_lg April 20, 2026

Improving the Throughput of Diffusion-based Large Language Models via a Training-Free Confidence-Aware Calibration

Translated: 2026/4/20 11:04:55

cadllm, diffusion, large-language-model, throughput, training-free

Original Content

arXiv:2512.07173v4 Announce Type: replace Abstract: We present CadLLM, a training-free method to accelerate the inference throughput of diffusion-based LLMs (dLLMs). We first investigate how token unmasking confidence changes dynamically across blocks and steps. Based on this observation, we present a lightweight adaptive approach that controls the generation block size, step size, and threshold based on the average confidence of unmasked tokens. We further reduce softmax overhead by dynamically leveraging a subset of the vocabulary to regulate sampling breadth. CadLLM is a plug-and-play, model-agnostic method compatible with KV-cache-based dLLMs. Extensive experiments on four popular tasks demonstrate that CadLLM yields a 1.1x to 2.28x throughput improvement over the state-of-the-art baseline with competitive accuracy.
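The two mechanisms named in the abstract, confidence-driven control of block size, step size, and threshold, and a softmax restricted to a vocabulary subset, can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the names `DecodeConfig`, `adapt_config`, and `subset_softmax`, the `low`/`high` bounds, the scaling factors, and even the direction of each adjustment are illustrative assumptions, since the abstract does not state the actual update rules. It should not be read as the CadLLM implementation.

```python
from __future__ import annotations

import heapq
import math
from dataclasses import dataclass


@dataclass
class DecodeConfig:
    block_size: int   # tokens generated per block
    steps: int        # denoising steps spent on a block
    threshold: float  # confidence required to unmask a token
    top_k: int        # size of the vocabulary subset fed to softmax


def adapt_config(avg_conf: float, base: DecodeConfig,
                 low: float = 0.5, high: float = 0.9) -> DecodeConfig:
    """Assumed rule: scale the settings with the average unmasking confidence."""
    # Normalize confidence to t in [0, 1]; 0 = unsure, 1 = very confident.
    t = min(1.0, max(0.0, (avg_conf - low) / (high - low)))
    return DecodeConfig(
        block_size=max(1, round(base.block_size * (0.5 + t))),  # assumed: confident -> larger blocks
        steps=max(1, round(base.steps * (1.5 - t))),            # assumed: confident -> fewer steps
        threshold=base.threshold - 0.1 * t,                     # assumed: confident -> lower bar
        top_k=max(1, round(base.top_k * (1.5 - t))),            # assumed: confident -> narrower subset
    )


def subset_softmax(logits: list[float], k: int) -> dict[int, float]:
    """Softmax over only the k largest logits; returns {token_id: probability}."""
    top = heapq.nlargest(k, range(len(logits)), key=logits.__getitem__)
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


if __name__ == "__main__":
    base = DecodeConfig(block_size=32, steps=8, threshold=0.9, top_k=100)
    print(adapt_config(0.95, base))  # settings chosen when confidence is high
    print(adapt_config(0.30, base))  # settings chosen when confidence is low
    print(subset_softmax([2.0, 1.0, 0.5, -1.0], k=2))
```

In this sketch the exponentials are computed for only `k` entries rather than the full vocabulary, which is the kind of saving the abstract attributes to using a vocabulary subset; how CadLLM actually selects that subset is not specified there.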