arxiv_cs_lg February 10, 2026

Remasking Discrete Diffusion Models with Inference-Time Scaling

diffusion-models, machine-learning, natural-language-processing, inference-scaling, neural-stuff

Original Content

arXiv:2503.00307v4 Announce Type: replace Abstract: Part of the success of diffusion models stems from their ability to perform iterative refinement, i.e., repeatedly correcting outputs during generation. However, modern masked discrete diffusion lacks this capability: when a token is generated, it cannot be updated again, even when it introduces an error. Here, we address this limitation by introducing the remasking diffusion model (ReMDM) sampler, a method that can be applied to pretrained masked diffusion models in a principled way and that is derived from a discrete diffusion model with a custom remasking backward process. Most interestingly, ReMDM endows discrete diffusion with a form of inference-time compute scaling. By increasing the number of sampling steps, ReMDM generates natural language outputs that approach the quality of autoregressive models, whereas when the computation budget is limited, ReMDM better maintains quality. ReMDM also improves sample quality of masked diffusion models for discretized images, and in scientific domains such as molecule design, ReMDM facilitates diffusion guidance and pushes the Pareto frontier of controllability relative to classical masking and uniform noise diffusion. We provide the code along with a blog post on the project page: https://guanghanwang.com/remdm
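
To make the remasking idea concrete, here is a minimal toy sketch in Python. It is not the ReMDM sampler from the paper: the uniform-random `toy_denoiser`, the fixed `remask_prob`, the `MASK` sentinel, and the 1/t unmasking schedule are all assumptions chosen for illustration. It only shows the structural difference the abstract describes: with `remask_prob=0` a decoded token is never revisited, while a positive value lets decoded tokens return to the masked state so that additional sampling steps can revise them.

```python
import random

MASK = -1  # hypothetical sentinel for a masked position


def toy_denoiser(tokens, vocab_size, rng):
    """Stand-in for a pretrained masked diffusion model (assumption).

    Returns one proposed token per position. A real model would condition
    on the unmasked context instead of sampling uniformly at random.
    """
    return [rng.randrange(vocab_size) for _ in tokens]


def sample(length, steps, remask_prob, vocab_size=10, seed=0):
    """Toy sampler; remask_prob=0 gives plain masked-diffusion decoding."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    n_remasked = 0
    for t in range(steps, 0, -1):
        # Remasking step: previously decoded tokens may return to MASK,
        # which gives the model another chance to revise them.
        for i, tok in enumerate(tokens):
            if tok != MASK and rng.random() < remask_prob:
                tokens[i] = MASK
                n_remasked += 1
        # Unmasking step: each masked position is decoded with
        # probability 1/t, so every position is decoded once t reaches 1.
        proposals = toy_denoiser(tokens, vocab_size, rng)
        for i, tok in enumerate(tokens):
            if tok == MASK and rng.random() < 1.0 / t:
                tokens[i] = proposals[i]
    return tokens, n_remasked
```

Calling `sample(16, 8, 0.0)` decodes each position exactly once, whereas `sample(16, 64, 0.2)` spends the extra steps on re-decoding remasked positions. In this toy setting the extra steps cannot improve quality because the denoiser is random; the sketch only illustrates where the additional inference-time compute goes.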