arxiv_cs_lg February 10, 2026

Horizon Imagination: Efficient On-Policy Training in Diffusion World Models

Translated: 2026/3/15 14:50:30

diffusion-world-models, reinforcement-learning, on-policy-training, generative-ai, atari-100k

Original Content

arXiv:2602.08032v1 Announce Type: new

Abstract: We study diffusion-based world models for reinforcement learning, which offer high generative fidelity but face critical efficiency challenges in control. Current methods either require heavyweight models at inference or rely on highly sequential imagination, both of which impose prohibitive computational costs. We propose Horizon Imagination (HI), an on-policy imagination process for discrete stochastic policies that denoises multiple future observations in parallel. HI incorporates a stabilization mechanism and a novel sampling schedule that decouples the denoising budget from the effective horizon over which denoising is applied while also supporting sub-frame budgets. Experiments on Atari 100K and Craftium show that our approach maintains control performance with a sub-frame budget of half the denoising steps and achieves superior generation quality under varied schedules. Code is available at https://github.com/leor-c/horizon-imagination.
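
The abstract does not spell out the sampling schedule, so the following is only a toy illustration of what decoupling a denoising budget from a horizon could look like. It is not the authors' schedule and is not taken from the linked repository. The function name `wavefront_schedule`, its parameters, and the linear "wavefront" rule are all hypothetical. The sketch also assumes that a "sub-frame budget" can be read as fewer parallel denoising passes than frames.

```python
def wavefront_schedule(num_frames, budget, horizon):
    """Toy noise schedule (hypothetical, not the paper's method).

    Returns levels[k][t]: the noise level in [0, 1] of frame t after k
    parallel denoising passes (1.0 = pure noise, 0.0 = fully denoised).

    budget  -- total number of parallel passes over the whole rollout
    horizon -- width, in frames, of the band being denoised at once
    """
    if num_frames < 1 or budget < 1 or horizon <= 0:
        raise ValueError("num_frames and budget must be >= 1, horizon > 0")

    levels = []
    for k in range(budget + 1):
        # Position of the denoising front; it sweeps from 0 to
        # num_frames + horizon as k goes from 0 to budget.
        front = (k / budget) * (num_frames + horizon)
        row = []
        for t in range(num_frames):
            # Linear ramp of width `horizon` behind the front, clipped to [0, 1].
            raw = (t + horizon - front) / horizon
            row.append(min(1.0, max(0.0, raw)))
        levels.append(row)
    return levels


if __name__ == "__main__":
    # 8 frames, 4 passes, band of 4 frames: 0.5 passes per frame.
    for k, row in enumerate(wavefront_schedule(num_frames=8, budget=4, horizon=4)):
        print(k, [round(v, 2) for v in row])
```

In this toy rule, `budget` sets how many passes the front takes to sweep the rollout, while `horizon` sets how many frames are partially denoised at once, so the two can be varied independently. A small `horizon` approaches frame-by-frame sequential generation. A large one denoises many frames together.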