arxiv_cs_lg 2026/2/10

TodoEvolve: Learning to Design and Build Autonomous Agent Planning Systems

TodoEvolve: Learning to Architect Agent Planning Systems

Translated: 2026/3/15 7:04:18

agentic-ai, reinforcement-learning, meta-planning, task-planning, artificial-intelligence

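Japanese Translation (rendered in English)

arXiv:2602.07839v1 Announce Type: cross Abstract: Planning has grown into a central capability for modern agent systems navigating complex, long-horizon tasks. However, existing approaches rely mainly on fixed, hand-crafted planning structures that lack the flexibility to adapt to the structural diversity of open-ended problems. To address this limitation, we introduce TodoEvolve, a meta-planning paradigm that autonomously synthesizes and dynamically revises task-specific planning architectures. Specifically, we first construct PlanFactory, a modular design space that standardizes diverse planning paradigms within a unified codebase encompassing topology, initialization, adaptation, and navigation. Leveraging PlanFactory, we train Todo-14B with Impedance-Guided Preference Optimization (IGPO), a multi-objective reinforcement learning objective that encourages the generation of planning systems that are performant, stable, and token-efficient across arbitrary tasks and agent backbones. Empirical evaluations on five agentic benchmarks show that TodoEvolve consistently surpasses carefully engineered planning modules while maintaining economical API costs and runtime overhead.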

Original Content

arXiv:2602.07839v1 Announce Type: cross Abstract: Planning has become a central capability for contemporary agent systems in navigating complex, long-horizon tasks, yet existing approaches predominantly rely on fixed, hand-crafted planning structures that lack the flexibility to adapt to the structural diversity of open-ended problems. To address this limitation, we introduce TodoEvolve, a meta-planning paradigm that autonomously synthesizes and dynamically revises task-specific planning architectures. Specifically, we first construct PlanFactory, a modular design space that standardizes diverse planning paradigms within a unified codebase encompassing topology, initialization, adaptation, and navigation, thereby providing a common interface for heterogeneous planning patterns. Leveraging PlanFactory, we collect high-quality planning trajectories and train Todo-14B via Impedance-Guided Preference Optimization (IGPO), a multi-objective reinforcement learning objective that encourages the generation of planning systems that are performant, stable, and token-efficient across arbitrary tasks and agent backbones. Empirical evaluations on five agentic benchmarks demonstrate that TodoEvolve consistently surpasses carefully engineered planning modules while maintaining economical API costs and runtime overhead.
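
The abstract names four design dimensions (topology, initialization, adaptation, navigation) and a common interface over them, but gives no API. The Python sketch below is only an illustration of what such an interface could look like. Every name in it (Plan, PlanningSystem, init_linear, adapt_append, next_ready, linear_todo) is hypothetical and does not come from PlanFactory or the TodoEvolve codebase, and the string-splitting initializer merely stands in for an LLM-generated task decomposition.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set

# All names below are hypothetical; they are NOT taken from PlanFactory.


@dataclass
class Plan:
    """A plan as a list of steps plus prerequisite edges between them."""
    steps: List[str] = field(default_factory=list)
    edges: Dict[int, List[int]] = field(default_factory=dict)  # step -> prerequisites
    done: Set[int] = field(default_factory=set)


@dataclass
class PlanningSystem:
    """Bundles the four design dimensions named in the abstract."""
    topology: str                               # e.g. "linear" or "dag"
    initialize: Callable[[str], Plan]           # task text -> initial plan
    adapt: Callable[[Plan, str], Plan]          # plan + feedback -> revised plan
    navigate: Callable[[Plan], Optional[int]]   # plan -> next step index, or None


def init_linear(task: str) -> Plan:
    # Assumption: splitting on ';' stands in for a model-generated decomposition.
    steps = [s.strip() for s in task.split(";") if s.strip()]
    edges = {i: ([i - 1] if i > 0 else []) for i in range(len(steps))}
    return Plan(steps=steps, edges=edges)


def adapt_append(plan: Plan, feedback: str) -> Plan:
    # Toy adaptation: on a reported failure, append a follow-up step at the end.
    if feedback.startswith("fail:"):
        new_index = len(plan.steps)
        plan.steps.append(feedback[len("fail:"):].strip())
        plan.edges[new_index] = [new_index - 1] if new_index > 0 else []
    return plan


def next_ready(plan: Plan) -> Optional[int]:
    # Return the first unfinished step whose prerequisites are all finished.
    for i in range(len(plan.steps)):
        prerequisites = plan.edges.get(i, [])
        if i not in plan.done and all(p in plan.done for p in prerequisites):
            return i
    return None


# One concrete point in the design space: a linear to-do list.
linear_todo = PlanningSystem("linear", init_linear, adapt_append, next_ready)
```

Under this sketch, a different planning pattern (for example a DAG with parallel branches) would be another PlanningSystem instance with its own initialize, adapt, and navigate functions, so an agent loop written against the four fields would not need to change.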