arxiv_cs_lg 2026/2/10

When Do Multi-Agent Systems Outperform? Analysing the Learning Efficiency of Agentic Systems

Translated: 2026/3/15 15:03:55

reinforcement-learning, multi-agent-systems, large-language-models, sample-complexity, task-decomposition

Original Content

arXiv:2602.08272v1 Announce Type: new

Abstract: Reinforcement Learning (RL) has emerged as a crucial method for training or fine-tuning large language models (LLMs), enabling adaptive, task-specific optimizations through interactive feedback. Multi-Agent Reinforcement Learning (MARL), in particular, offers a promising avenue by decomposing complex tasks into specialized subtasks learned by distinct interacting agents, potentially enhancing the ability and efficiency of LLM systems. However, theoretical insights regarding when and why MARL outperforms Single-Agent RL (SARL) remain limited, creating uncertainty in selecting the appropriate RL framework. In this paper, we address this critical gap by rigorously analyzing the comparative sample efficiency of MARL and SARL in the context of LLMs. Leveraging the Probably Approximately Correct (PAC) framework, we formally define SARL and MARL setups for LLMs, derive explicit sample complexity bounds, and systematically characterize how task decomposition and alignment influence learning efficiency. Our results demonstrate that MARL improves sample complexity when tasks naturally decompose into independent subtasks, whereas dependent subtasks diminish MARL's comparative advantage. Additionally, we introduce and analyze the concept of task alignment, quantifying the trade-offs when enforcing independent task decomposition despite potential misalignments. These theoretical insights clarify empirical inconsistencies and provide practical criteria for deploying MARL strategies effectively in complex LLM scenarios.
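
For readers unfamiliar with PAC-style sample complexity, the sketch below computes the standard textbook bound for a finite hypothesis class in the realizable setting, m >= (ln|H| + ln(1/delta)) / epsilon. It is only meant to show what such a bound looks like. It is not one of the bounds derived in the paper, which the abstract does not state, and the function name `pac_sample_bound` and the example numbers are assumptions made for illustration.

```python
import math


def pac_sample_bound(hypothesis_count: int, epsilon: float, delta: float) -> int:
    """Textbook PAC bound for a finite hypothesis class, realizable case.

    Returns the smallest integer m with m >= (ln|H| + ln(1/delta)) / epsilon.
    This is the standard result, NOT the bound derived in the paper.
    """
    if hypothesis_count < 1:
        raise ValueError("hypothesis_count must be at least 1")
    if not 0.0 < epsilon < 1.0 or not 0.0 < delta < 1.0:
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil((math.log(hypothesis_count) + math.log(1.0 / delta)) / epsilon)


if __name__ == "__main__":
    # Hypothetical numbers chosen only for illustration.
    print(pac_sample_bound(hypothesis_count=1000, epsilon=0.1, delta=0.05))
```

In this textbook bound the hypothesis class enters only through ln|H|, so a class built as a product of k components of size n contributes k * ln(n). How such terms compare between the SARL and MARL setups is determined by the paper's own analysis, not by this sketch.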