arxiv_cs_lg February 10, 2026

CADO: From Imitation to Cost Minimization for Heatmap-based Solvers in Combinatorial Optimization

Translated: 2026/3/15 15:03:07

combinatorial-optimization, heatmap-solver, reinforcement-learning, diffusion-models, cost-minimization

Original Content

arXiv:2602.08210v1 Announce Type: new

Abstract: Heatmap-based solvers have emerged as a promising paradigm for Combinatorial Optimization (CO). However, we argue that the dominant Supervised Learning (SL) training paradigm suffers from a fundamental objective mismatch: minimizing imitation loss (e.g., cross-entropy) does not guarantee solution cost minimization. We dissect this mismatch into two deficiencies: Decoder-Blindness (being oblivious to the non-differentiable decoding process) and Cost-Blindness (prioritizing structural imitation over solution quality). We empirically demonstrate that these intrinsic flaws impose a hard performance ceiling. To overcome this limitation, we propose CADO (Cost-Aware Diffusion models for Optimization), a streamlined Reinforcement Learning fine-tuning framework that formulates the diffusion denoising process as a Markov Decision Process (MDP) to directly optimize the post-decoded solution cost. We introduce Label-Centered Reward, which repurposes ground-truth labels as unbiased baselines rather than imitation targets, and Hybrid Fine-Tuning for parameter-efficient adaptation. CADO achieves state-of-the-art performance across diverse benchmarks, validating that objective alignment is essential for unlocking the full potential of heatmap-based solvers.
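To make the idea of a reward on the post-decoded solution cost concrete, here is a minimal Python sketch on a toy 4-city TSP. It is an illustration under stated assumptions, not the paper's implementation. The abstract gives no formula for Label-Centered Reward, so the form used here (cost of the ground-truth tour minus cost of the decoded tour) is assumed. The greedy decoder, the TSP setting, and the names `tour_cost`, `greedy_decode`, and `label_centered_reward` are all hypothetical.

```python
import numpy as np

def tour_cost(coords, tour):
    """Total Euclidean length of the closed tour visiting cities in the given order."""
    ordered = coords[np.asarray(tour)]
    # Distance from each city to its successor, wrapping back to the start.
    return float(np.linalg.norm(ordered - np.roll(ordered, -1, axis=0), axis=1).sum())

def greedy_decode(heatmap, start=0):
    """Hypothetical non-differentiable decoder: always move to the
    unvisited city with the highest heatmap score from the current city."""
    n = heatmap.shape[0]
    tour, visited = [start], {start}
    while len(tour) < n:
        scores = heatmap[tour[-1]].astype(float)  # astype returns a copy
        scores[list(visited)] = -np.inf           # mask cities already in the tour
        nxt = int(np.argmax(scores))
        tour.append(nxt)
        visited.add(nxt)
    return tour

def label_centered_reward(coords, heatmap, label_tour):
    """Assumed reward form: the label's cost serves as a baseline, so the
    reward is positive only when the decoded tour is cheaper than the label."""
    decoded = greedy_decode(heatmap)
    return tour_cost(coords, label_tour) - tour_cost(coords, decoded)

# Toy instance: four cities on the corners of a unit square.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
label_tour = [0, 1, 2, 3]  # the perimeter tour, length 4

# Heatmap that scores the perimeter edges highly.
good = np.zeros((4, 4))
for i in range(4):
    good[i, (i + 1) % 4] = good[(i + 1) % 4, i] = 1.0

# Heatmap that scores the two diagonals highly.
bad = np.zeros((4, 4))
bad[0, 2] = bad[2, 0] = 1.0
bad[1, 3] = bad[3, 1] = 1.0

reward_good = label_centered_reward(coords, good, label_tour)  # 0: decoded tour matches the label
reward_bad = label_centered_reward(coords, bad, label_tour)    # about -0.83: decoded tour uses both diagonals
```

In an RL fine-tuning loop, a scalar like this would weight the log-probability of the sampled denoising trajectory. Because it is computed after decoding, no gradient needs to pass through the decoder itself.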