arxiv_cs_ai 2026/2/10

Multi-Agent Teams Hold Experts Back

Translated: 2026/2/14 8:15:03

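Translation

arXiv:2602.01011v3 Announce Type: replace-cross Abstract: Multi-agent LLM systems are increasingly deployed as autonomous collaborators, in which agents interact freely rather than follow fixed, pre-specified workflows. In such settings, effective coordination cannot be fully designed in advance and must instead emerge from individual interactions. Most prior work, however, imposes coordination through fixed roles, standard workflows, or aggregation rules, which leaves open how well self-organizing teams perform when coordination is unconstrained. Drawing on organizational psychology, we ask whether self-organizing LLM teams achieve strong synergy, meaning team performance that matches or exceeds the best individual member. We evaluate this on human-inspired and frontier ML benchmarks. Unlike human teams, multi-agent LLM teams consistently fail to match the performance of their expert agent, even when told explicitly who the expert is, with performance losses of up to 37.6%. Decomposing the failure shows that the main bottleneck is leveraging the expert, not identifying the expert. The teams tend toward integrative compromise, averaging expert and non-expert views instead of weighting expertise appropriately; this tendency grows with team size and correlates negatively with performance. The same consensus-seeking behavior improves robustness to adversarial agents, which suggests a trade-off between alignment and effective use of expertise. These findings reveal a significant gap in the ability of self-organizing multi-agent teams to harness their members' collective expertise.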

Original Content

arXiv:2602.01011v3 Announce Type: replace-cross Abstract: Multi-agent LLM systems are increasingly deployed as autonomous collaborators, where agents interact freely rather than execute fixed, pre-specified workflows. In such settings, effective coordination cannot be fully designed in advance and must instead emerge through interaction. However, most prior work enforces coordination through fixed roles, workflows, or aggregation rules, leaving open the question of how well self-organizing teams perform when coordination is unconstrained. Drawing on organizational psychology, we study whether self-organizing LLM teams achieve strong synergy, where team performance matches or exceeds the best individual member. Across human-inspired and frontier ML benchmarks, we find that -- unlike human teams -- LLM teams consistently fail to match their expert agent's performance, even when explicitly told who the expert is, incurring performance losses of up to 37.6%. Decomposing this failure, we show that expert leveraging, rather than identification, is the primary bottleneck. Conversational analysis reveals a tendency toward integrative compromise -- averaging expert and non-expert views rather than appropriately weighting expertise -- which increases with team size and correlates negatively with performance. Interestingly, this consensus-seeking behavior improves robustness to adversarial agents, suggesting a trade-off between alignment and effective expertise utilization. Our findings reveal a significant gap in the ability of self-organizing multi-agent teams to harness the collective expertise of their members.
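
The strong-synergy criterion and the averaging behavior described in the abstract can be illustrated with a toy numeric-estimation task. This is a minimal sketch, not the paper's evaluation code: the scoring function `accuracy`, the loss definition in `relative_loss`, the agent names, the estimates, and the weights are all hypothetical values chosen for illustration, and the abstract does not state how its 37.6% figure is computed.

```python
from statistics import fmean


def accuracy(estimate: float, truth: float) -> float:
    """Toy score in [0, 1]: one minus the relative error, floored at 0 (assumed metric)."""
    return max(0.0, 1.0 - abs(estimate - truth) / abs(truth))


def relative_loss(team_score: float, member_scores: list[float]) -> float:
    """Shortfall of the team relative to its best member (assumed definition).

    A value <= 0 corresponds to strong synergy: the team matches or beats
    its best individual member.
    """
    best = max(member_scores)
    if best == 0:
        raise ValueError("best member score is zero; relative loss is undefined")
    return (best - team_score) / best


def weighted_answer(estimates: dict[str, float], weights: dict[str, float]) -> float:
    """Combine individual estimates using per-agent weights (normalized here)."""
    total = sum(weights[name] for name in estimates)
    return sum(estimates[name] * weights[name] for name in estimates) / total


# Hypothetical task: the true value is 100 and one agent is far more accurate.
TRUTH = 100.0
ESTIMATES = {"expert": 96.0, "agent_a": 60.0, "agent_b": 70.0, "agent_c": 74.0}
MEMBER_SCORES = [accuracy(e, TRUTH) for e in ESTIMATES.values()]

# Integrative compromise: every view counts equally.
compromise = fmean(list(ESTIMATES.values()))
compromise_loss = relative_loss(accuracy(compromise, TRUTH), MEMBER_SCORES)

# Expertise weighting: most of the weight goes to the known expert.
WEIGHTS = {"expert": 0.7, "agent_a": 0.1, "agent_b": 0.1, "agent_c": 0.1}
weighted = weighted_answer(ESTIMATES, WEIGHTS)
weighted_loss = relative_loss(accuracy(weighted, TRUTH), MEMBER_SCORES)

# Full deference to the expert meets the strong-synergy threshold exactly.
deference_loss = relative_loss(accuracy(ESTIMATES["expert"], TRUTH), MEMBER_SCORES)

if __name__ == "__main__":
    print(f"equal averaging loss:  {compromise_loss:.1%}")
    print(f"expert-weighted loss:  {weighted_loss:.1%}")
    print(f"full deference loss:   {deference_loss:.1%}")
```

In this toy setup, equal averaging pulls the team answer toward the less accurate members, so the team scores below its expert even though the expert is known. Shifting weight toward the expert narrows the gap, and only full deference closes it.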