arxiv_cs_lg, February 10, 2026


SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning

Translated: 2026/3/15 15:03:29

skillrl, reinforcement-learning, large-language-model, reinforcement-learning-agent, artificial-intelligence


Original Content

arXiv:2602.08234v1 Abstract: Large Language Model (LLM) agents have shown stunning results in complex tasks, yet they often operate in isolation, failing to learn from past experiences. Existing memory-based methods primarily store raw trajectories, which are often redundant and noise-heavy. This prevents agents from extracting high-level, reusable behavioral patterns that are essential for generalization. In this paper, we propose SkillRL, a framework that bridges the gap between raw experience and policy improvement through automatic skill discovery and recursive evolution. Our approach introduces an experience-based distillation mechanism to build a hierarchical skill library, SkillBank; an adaptive retrieval strategy for general and task-specific heuristics; and a recursive evolution mechanism that allows the skill library to co-evolve with the agent's policy during reinforcement learning. These innovations significantly reduce the token footprint while enhancing reasoning utility. Experimental results on ALFWorld, WebShop, and seven search-augmented tasks demonstrate that SkillRL achieves state-of-the-art performance, outperforming strong baselines by over 15.3% and maintaining robustness as task complexity increases. Code is available at https://github.com/aiming-lab/SkillRL.
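To make the contrast between storing raw trajectories and distilling reusable skills concrete, here is a toy Python sketch of a two-tier skill library with general and task-specific entries. It is not the SkillRL implementation, and the abstract does not specify the real algorithms (the linked repository is the authoritative source). All names here (`Skill`, `SkillBank`, `distill`, `retrieve`) are hypothetical. "Distillation" is approximated by keeping action pairs that recur across successful trajectories, and "adaptive retrieval" is approximated by always returning general skills plus the task-specific skills whose keywords overlap the query.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Skill:
    """Hypothetical distilled skill: a short reusable heuristic, not a raw trajectory."""
    name: str
    description: str
    keywords: frozenset


@dataclass
class SkillBank:
    """Hypothetical two-tier library: general skills plus per-task-type skills."""
    general: list = field(default_factory=list)
    task_specific: dict = field(default_factory=dict)

    def add(self, skill: Skill, task_type: str | None = None) -> None:
        # Skills without a task type go to the general tier.
        if task_type is None:
            self.general.append(skill)
        else:
            self.task_specific.setdefault(task_type, []).append(skill)

    def retrieve(self, task_type: str, query: str, k: int = 2) -> list:
        # Toy retrieval (an assumption): always include general skills, then
        # rank task-specific skills by keyword overlap and keep up to k matches.
        words = set(query.lower().split())
        candidates = self.task_specific.get(task_type, [])
        ranked = sorted(candidates, key=lambda s: len(s.keywords & words), reverse=True)
        return self.general + [s for s in ranked[:k] if s.keywords & words]


def distill(trajectories: list, task_type: str, min_count: int = 2) -> list:
    # Toy distillation (an assumption): count, per successful trajectory, each
    # consecutive action pair once; pairs seen in at least min_count successes
    # become compact skills, while failures and one-off steps are dropped as noise.
    pairs = Counter()
    for actions, success in trajectories:
        if success:
            pairs.update(set(zip(actions, actions[1:])))
    skills = []
    for (a, b), count in pairs.items():
        if count >= min_count:
            skills.append(Skill(
                name=f"{task_type}:{a}->{b}",
                description=f"After '{a}', do '{b}'",
                keywords=frozenset(f"{a} {b}".lower().split()),
            ))
    return skills


# Usage: seed one general skill, then grow the bank from a batch of rollouts.
bank = SkillBank()
bank.add(Skill("verify", "Check the goal before acting", frozenset({"goal"})))
rollouts = [
    (["go to fridge", "open fridge", "take apple"], True),
    (["go to fridge", "open fridge", "take egg"], True),
    (["go to sink", "open fridge"], False),
]
new_skills = distill(rollouts, "pick")
for s in new_skills:
    bank.add(s, task_type="pick")
```

In this toy setup, only the pair ("go to fridge", "open fridge") recurs across both successful rollouts, so it is the single skill distilled. Calling `distill` on each new batch of rollouts and adding the results is one simple way to let a library grow alongside training. How SkillRL actually couples library updates to the RL policy is not described in the abstract.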