arxiv_cs_lg February 10, 2026

Dreaming in Code: A Method for Curriculum Learning in Open-Ended Worlds

Dreaming in Code for Curriculum Learning in Open-Ended Worlds

Translated: 2026/3/15 15:02:57

open-ended-learning, curriculum-learning, foundation-models, environment-synthesis, reinforcement-learning

Japanese Translation

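arXiv:2602.08194v1 Announce Type: new Abstract: Open-ended learning frames intelligence as emerging from continual interaction with environments. Recent work has used foundation models to programmatically generate diverse environments, but these approaches tend to focus on discovering isolated behaviors and fall short on orchestrating sustained progression. In complex open-ended worlds, the vast combinatorial space of possible challenges makes it difficult for agents to discover sequences of experiences that remain consistently learnable. To address this, we propose Dreaming in Code (DiCode), a framework in which foundation models synthesize executable environment code and structure learning toward increasing competence. In DiCode, "dreaming" takes the form of materializing code-level variations of the world. DiCode is instantiated in Craftax, a challenging open-ended benchmark characterized by rich mechanics and long-horizon progression. Empirically, DiCode enables agents to acquire long-horizon skills, improving mean return by 16% over the strongest baseline and achieving non-zero success rates on late-game combat tasks where prior methods fail. Our results suggest that code-level environment design offers a practical way to build intermediate environments that bridge competence gaps in open-ended worlds. The project page and source code are available at https://konstantinosmitsides.github.io/dreaming-in-code and https://github.com/konstantinosmitsides/dreaming-in-code.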

Original Content

arXiv:2602.08194v1 Announce Type: new Abstract: Open-ended learning frames intelligence as emerging from continual interaction with an ever-expanding space of environments. While recent advances have utilized foundation models to programmatically generate diverse environments, these approaches often focus on discovering isolated behaviors rather than orchestrating sustained progression. In complex open-ended worlds, the large combinatorial space of possible challenges makes it difficult for agents to discover sequences of experiences that remain consistently learnable. To address this, we propose Dreaming in Code (DiCode), a framework in which foundation models synthesize executable environment code to scaffold learning toward increasing competence. In DiCode, "dreaming" takes the form of materializing code-level variations of the world. We instantiate DiCode in Craftax, a challenging open-ended benchmark characterized by rich mechanics and long-horizon progression. Empirically, DiCode enables agents to acquire long-horizon skills, achieving a 16% improvement in mean return over the strongest baseline and non-zero success on late-game combat tasks where prior methods fail. Our results suggest that code-level environment design provides a practical mechanism for curriculum control, enabling the construction of intermediate environments that bridge competence gaps in open-ended worlds. Project page and source code are available at https://konstantinosmitsides.github.io/dreaming-in-code and https://github.com/konstantinosmitsides/dreaming-in-code.
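
The abstract does not spell out how DiCode selects the environments it generates, so the following is only a toy sketch of the general idea of keeping an agent on "consistently learnable" experiences. It is not the authors' implementation and does not use Craftax or a foundation model. Every name in it (`EnvVariant`, `propose_variants`, `success_rate`, `select_learnable`, `run_curriculum`) is hypothetical, the environment generator is replaced by a random perturbation of a single difficulty number, and the agent is reduced to a scalar skill value.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvVariant:
    """Hypothetical stand-in for a generated environment program."""
    name: str
    difficulty: float  # assumed scalar in [0, 1]; real environments are code


def propose_variants(base, rng, step, n=4):
    """Assumption: a random perturbation replaces the foundation model
    that would write new environment code from the current one."""
    variants = []
    for i in range(n):
        d = base.difficulty + rng.uniform(-0.1, 0.3)
        d = max(0.0, min(1.0, d))  # keep difficulty inside [0, 1]
        variants.append(EnvVariant(f"variant_{step}_{i}", d))
    return variants


def success_rate(skill, variant):
    """Toy agent model: success grows as skill exceeds difficulty."""
    return max(0.0, min(1.0, 0.5 + 2.0 * (skill - variant.difficulty)))


def select_learnable(variants, skill, low=0.2, high=0.8):
    """Keep variants that are neither trivial nor out of reach."""
    return [v for v in variants if low <= success_rate(skill, v) <= high]


def run_curriculum(steps=50, seed=0):
    """Alternate between proposing variants and training on a learnable one."""
    rng = random.Random(seed)
    skill = 0.0
    current = EnvVariant("base", 0.1)
    history = []
    for step in range(steps):
        candidates = propose_variants(current, rng, step)
        learnable = select_learnable(candidates, skill)
        if learnable:
            # Train on the hardest variant that is still learnable.
            current = max(learnable, key=lambda v: v.difficulty)
            # Toy update: skill moves halfway toward a harder variant
            # and never decreases.
            skill += 0.5 * max(0.0, current.difficulty - skill)
        history.append((step, current.name, skill))
    return skill, history


if __name__ == "__main__":
    final_skill, log = run_curriculum()
    print(f"final skill: {final_skill:.3f} after {len(log)} steps")
```

The success-rate band in `select_learnable` is one simple way to express "learnable": variants the agent always solves or never solves carry little training signal, so only the middle band is kept. The thresholds 0.2 and 0.8 are arbitrary choices for this sketch.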