arxiv_cs_ai February 10, 2026

ToolSelf: Unifying Task Execution and Self-Reconfiguration via Tool-Driven Intrinsic Adaptation

Translated: 2026/3/7 9:07:05

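Translation (from Japanese)

Agentic systems built on large language models (LLMs) have shown remarkable ability on complex, long-horizon tasks. Their usefulness, however, is limited by the static configurations that govern agent behavior: these are fixed before execution and cannot follow changes as the task progresses, and existing workarounds generalize poorly and optimize in a fragmented way. To overcome these limits, we propose ToolSelf, a new paradigm for tool-driven runtime self-reconfiguration. Configuration updates are abstracted as a callable tool, so task execution and self-adjustment are unified in a single action space, marking a shift from external rules to intrinsic parameters. Agents can update their own sub-goals and context as the task progresses and adapt their strategy and toolbox accordingly, evolving from simple executors into dual managers of both the task and themselves. We also devise Configuration-Aware Two-stage Training (CAT), which combines rejection sampling fine-tuning with trajectory-level reinforcement learning to internalize this meta-capability. Extensive experiments on diverse benchmarks show that ToolSelf rivals specialized workflows while generalizing to new tasks, with a 24.1% average performance gain, and point the way toward self-adaptive agents.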

Original Content

arXiv:2602.07883v1 (Announce Type: new)

Abstract: Agentic systems powered by Large Language Models (LLMs) have demonstrated remarkable potential in tackling complex, long-horizon tasks. However, their efficacy is fundamentally constrained by static configurations governing agent behaviors, which are fixed prior to execution and fail to adapt to evolving task dynamics. Existing approaches, relying on manual orchestration or heuristic-based patches, often struggle with poor generalization and fragmented optimization. To transcend these limitations, we propose ToolSelf, a novel paradigm enabling tool-driven runtime self-reconfiguration. By abstracting configuration updates as a callable tool, ToolSelf unifies task execution and self-adjustment into a single action space, achieving a phase transition from external rules to intrinsic parameters. Agents can thereby autonomously update their sub-goals and context based on task progression, and correspondingly adapt their strategy and toolbox, transforming from passive executors into dual managers of both task and self. We further devise Configuration-Aware Two-stage Training (CAT), combining rejection sampling fine-tuning with trajectory-level reinforcement learning to internalize this meta-capability. Extensive experiments across diverse benchmarks demonstrate that ToolSelf rivals specialized workflows while generalizing to novel tasks, achieving a 24.1% average performance gain and illuminating a path toward truly self-adaptive agents.
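
The idea of treating a configuration update as one more callable tool can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no code or interface, so the class names (AgentConfig, Agent), the tool name "reconfigure", and the toy task tools are all hypothetical. The configuration fields follow the four items the abstract names (sub-goals, context, strategy, toolbox). The point shown is only that task actions and self-reconfiguration pass through the same dispatch call, i.e. a single action space.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AgentConfig:
    """Hypothetical runtime configuration; fields mirror the abstract's terms."""
    sub_goal: str
    context: List[str] = field(default_factory=list)
    strategy: str = "explore"
    toolbox: List[str] = field(default_factory=list)


class Agent:
    """Toy agent whose action space holds task tools plus one self-update tool."""

    def __init__(self, config: AgentConfig, task_tools: Dict[str, Callable[..., str]]):
        self.config = config
        self._task_tools = task_tools  # every tool the agent could ever enable

    def reconfigure(self, **updates) -> str:
        """Assumed self-reconfiguration tool: overwrite named config fields."""
        for key in updates:
            if not hasattr(self.config, key):
                raise KeyError(f"unknown configuration field: {key}")
        for key, value in updates.items():
            setattr(self.config, key, value)
        return "config updated: " + ", ".join(sorted(updates))

    def act(self, tool_name: str, **kwargs) -> str:
        """Single entry point: task tools and 'reconfigure' are dispatched alike."""
        if tool_name == "reconfigure":
            return self.reconfigure(**kwargs)
        if tool_name not in self.config.toolbox:
            raise PermissionError(f"tool not in current toolbox: {tool_name}")
        return self._task_tools[tool_name](**kwargs)


if __name__ == "__main__":
    agent = Agent(
        AgentConfig(sub_goal="find sources", toolbox=["search"]),
        {
            "search": lambda query: f"results for {query}",
            "summarize": lambda text: text.upper(),
        },
    )
    print(agent.act("search", query="tool use"))
    # The agent changes its own sub-goal and toolbox through an ordinary action.
    print(agent.act("reconfigure", sub_goal="write summary", toolbox=["summarize"]))
    print(agent.act("summarize", text="draft"))
```

In this sketch a policy that emits (tool_name, arguments) pairs needs no separate controller to change its own setup, which is one plausible reading of "unifies task execution and self-adjustment into a single action space". How the real system represents configurations, and how CAT trains the policy to call such a tool, is not specified in the abstract.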