arxiv_cs_cv February 10, 2026

Optimizing Few-Step Generation with Adaptive Matching Distillation

Translated: 2026/3/15 18:02:49

few-step-generation, distillation, reinforcement-learning, optimization, generative-models

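Translation (from Japanese)

arXiv:2602.07345v1 Announce Type: new Abstract: Distribution Matching Distillation (DMD) is a powerful acceleration paradigm, but its stability is often undermined in regions called "Forbidden Zones": regions where the real teacher gives unreliable guidance and the fake teacher does not exert enough repulsive force. This paper proposes a unified optimization framework that reinterprets prior work as implicit strategies for avoiding these corrupted regions. Building on this insight, it introduces Adaptive Matching Distillation (AMD), a self-correcting mechanism that uses reward proxies to explicitly detect and escape Forbidden Zones. AMD dynamically prioritizes corrective gradients through structural signal decomposition, and introduces Repulsive Landscape Sharpening, which enforces steep energy barriers against failure mode collapse. Extensive experiments on image and video generation tasks (e.g., SDXL, Wan2.1), evaluated on rigorous benchmarks such as VBench and GenEval, show that AMD markedly improves sample fidelity and training robustness. For example, AMD raises the HPSv2 score on SDXL from 30.64 to 31.25, surpassing the best existing baselines. These results confirm that explicitly correcting optimization trajectories inside Forbidden Zones is essential for raising the performance ceiling of few-step generative models.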

Original Content

arXiv:2602.07345v1 Announce Type: new Abstract: Distribution Matching Distillation (DMD) is a powerful acceleration paradigm, yet its stability is often compromised in Forbidden Zones, regions where the real teacher provides unreliable guidance while the fake teacher exerts insufficient repulsive force. In this work, we propose a unified optimization framework that reinterprets prior art as implicit strategies to avoid these corrupted regions. Based on this insight, we introduce Adaptive Matching Distillation (AMD), a self-correcting mechanism that utilizes reward proxies to explicitly detect and escape Forbidden Zones. AMD dynamically prioritizes corrective gradients via structural signal decomposition and introduces Repulsive Landscape Sharpening to enforce steep energy barriers against failure mode collapse. Extensive experiments across image and video generation tasks (e.g., SDXL, Wan2.1) and rigorous benchmarks (e.g., VBench, GenEval) demonstrate that AMD significantly enhances sample fidelity and training robustness. For instance, AMD improves the HPSv2 score on SDXL from 30.64 to 31.25, outperforming state-of-the-art baselines. These findings validate that explicitly rectifying optimization trajectories within Forbidden Zones is essential for pushing the performance ceiling of few-step generative models.
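
Illustrative sketch (not from the paper): the abstract does not give AMD's equations, so the toy below only illustrates the general idea it describes. It uses the usual DMD-style update direction for a generated sample, the real teacher's score minus the fake teacher's score, on 1-D Gaussians. It then adds a hypothetical reward-gated boost to the repulsive term as a stand-in for "detecting and escaping" a Forbidden Zone. The names `toy_reward` and `adaptive_direction` and the parameters `threshold` and `boost` are assumptions made for illustration. The structural signal decomposition and Repulsive Landscape Sharpening are not reproduced here.

```python
import math

def gaussian_score(x, mean, std):
    """Score (gradient of the log-density) of a 1-D Gaussian N(mean, std^2) at x."""
    return -(x - mean) / (std ** 2)

def toy_reward(x):
    """Hypothetical reward proxy (assumption): close to 1 near x = 0, decaying away from it."""
    return math.exp(-0.5 * x * x)

def adaptive_direction(x, real, fake, reward_fn, threshold=0.5, boost=2.0):
    """Return (update_direction, in_zone) for a generated sample x.

    real, fake: (mean, std) of the toy real-teacher and fake-teacher Gaussians.
    The baseline direction is s_real(x) - s_fake(x), as in DMD-style training.
    The reward gate and the boost factor are illustrative assumptions only.
    """
    attract = gaussian_score(x, *real)   # pull toward the real distribution
    repel = -gaussian_score(x, *fake)    # push away from the fake distribution's mode
    in_zone = reward_fn(x) < threshold   # low reward flags a suspected Forbidden Zone
    weight = boost if in_zone else 1.0   # strengthen repulsion only inside the zone
    return attract + weight * repel, in_zone

REAL = (0.0, 1.0)  # toy real-teacher distribution N(0, 1)
FAKE = (3.0, 1.0)  # toy fake-teacher distribution N(3, 1)

if __name__ == "__main__":
    for x in (2.0, 0.5):
        direction, in_zone = adaptive_direction(x, REAL, FAKE, toy_reward)
        print(f"x={x}: direction={direction}, in_zone={in_zone}")
```

In this toy setup, a sample at x = 2.0 has low reward, so its repulsive term is doubled and it is pushed harder toward the real mode. A sample at x = 0.5 keeps the unweighted DMD-style direction.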