arxiv_cs_lg February 10, 2026

MARTI-MARS$^2$: Scaling Multi-Agent Self-Search via Reinforcement Learning for Code Generation

reinforcement-learning, multi-agent, code-generation, large-language-models, scaling-law

Original Content

arXiv:2602.07848v1 Announce Type: new

Abstract: While the complex reasoning capability of Large Language Models (LLMs) has attracted significant attention, single-agent systems often encounter inherent performance ceilings in complex tasks such as code generation. Multi-agent collaboration offers a promising avenue to transcend these boundaries. However, existing frameworks typically rely on prompt-based test-time interactions or multi-role configurations trained with homogeneous parameters, limiting error correction capabilities and strategic diversity. In this paper, we propose a Multi-Agent Reinforced Training and Inference Framework with Self-Search Scaling (MARTI-MARS$^2$), which integrates policy learning with multi-agent tree search by formulating the multi-agent collaborative exploration process as a dynamic and learnable environment. By allowing agents to iteratively explore and refine within the environment, the framework facilitates evolution from parameter-sharing homogeneous multi-role training to heterogeneous multi-agent training, breaking through single-agent capability limits. We also introduce an efficient inference strategy, MARTI-MARS$^2$-T+, to fully exploit the scaling potential of multi-agent collaboration at test time. We conduct extensive experiments across varied model scales (8B, 14B, and 32B) on challenging code generation benchmarks. Using two collaborating 32B models, MARTI-MARS$^2$ achieves 77.7%, outperforming strong baselines like GPT-5.1. Furthermore, MARTI-MARS$^2$ reveals a novel scaling law: shifting from single-agent to homogeneous multi-role and ultimately to heterogeneous multi-agent paradigms progressively yields higher RL performance ceilings, robust test-time scaling (TTS) capabilities, and greater policy diversity, suggesting that policy diversity is critical for scaling intelligence via multi-agent reinforcement learning.
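The abstract does not give algorithmic details, so the following is only a toy sketch of the general idea of multi-agent tree search with heterogeneous policies, not the MARTI-MARS$^2$ method. Everything in it is an illustrative assumption: a "program" is reduced to an integer, the reward is the fraction of hypothetical unit tests passed, the two "agents" are fixed hand-written functions rather than trained LLM policies, and the node-selection rule (highest reward, then fewest children) is a simple stand-in. The policy-learning half of the framework is omitted entirely.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Assumption: a candidate "program" is just an int; a real system would hold code.
Policy = Callable[[int], int]
Test = Callable[[int], bool]


@dataclass
class Node:
    candidate: int
    reward: float
    agent: str                          # name of the agent that proposed this node
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def pass_rate(candidate: int, tests: List[Test]) -> float:
    """Toy reward: fraction of unit tests the candidate passes."""
    return sum(t(candidate) for t in tests) / len(tests)


def tree_search(agents: List[Tuple[str, Policy]], tests: List[Test],
                root_candidate: int, budget: int) -> Tuple[Node, List[Node]]:
    """Grow a shared search tree; agents take turns refining the chosen node."""
    root = Node(root_candidate, pass_rate(root_candidate, tests), "root")
    nodes = [root]
    for step in range(budget):
        # Hypothetical selection rule: best reward first, then least-expanded.
        parent = max(nodes, key=lambda n: (n.reward, -len(n.children)))
        name, policy = agents[step % len(agents)]   # round-robin over agents
        cand = policy(parent.candidate)
        child = Node(cand, pass_rate(cand, tests), name, parent)
        parent.children.append(child)
        nodes.append(child)
        if child.reward == 1.0:                     # all tests pass: stop early
            break
    return max(nodes, key=lambda n: n.reward), nodes


def path(node: Node) -> List[int]:
    """Candidates from the root down to `node`."""
    out = []
    while node is not None:
        out.append(node.candidate)
        node = node.parent
    return out[::-1]


# Two deliberately different (heterogeneous) toy policies.
def incrementer(x: int) -> int:
    return x + 1


def doubler(x: int) -> int:
    return x * 2


# Hypothetical "unit tests": satisfied together only by 12 and 18.
TESTS: List[Test] = [
    lambda x: x % 2 == 0,
    lambda x: x % 3 == 0,
    lambda x: x > 10,
    lambda x: x < 20,
]

if __name__ == "__main__":
    both = [("incrementer", incrementer), ("doubler", doubler)]
    best, _ = tree_search(both, TESTS, root_candidate=1, budget=20)
    print("pair:", best.reward, path(best))
    for solo in both:
        best, _ = tree_search([solo], TESTS, root_candidate=1, budget=20)
        print(solo[0], "alone:", best.reward, path(best))
```

Under these toy rules the two agents together reach a candidate that passes every test (18, via 1, 2, 4, 8, 9, 18), while each agent searching alone plateaus at a pass rate of 0.5 or 0.75. That outcome is a property of this hand-built example and should not be read as evidence for the paper's results; it only illustrates why differing policies can explore parts of a shared tree that a single policy cannot reach.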