arxiv_cs_lg, February 10, 2026

Deep Meta Coordination Graphs for Multi-agent Reinforcement Learning

Translated: 2026/3/15 9:03:48

multi-agent-reinforcement-learning, meta-learning, graph-neural-networks, cooperative-policy, collaborative-ai

Original Content

arXiv:2502.04028v2 Announce Type: replace

Abstract: This paper presents deep meta coordination graphs (DMCG) for learning cooperative policies in multi-agent reinforcement learning (MARL). Coordination graph formulations encode local interactions and accordingly factorize the joint value function of all agents to improve efficiency in MARL. Through DMCG, we dynamically compose what we refer to as "meta coordination graphs" to learn a more expressive representation of agent interactions, and use them to integrate agent information through graph convolutional networks. The goal is to enable an evolving coordination graph to guide effective coordination in cooperative MARL tasks. The graphs are jointly optimized with agents' value functions to learn to implicitly reason about joint actions, facilitating the end-to-end learning of interaction representations and coordinated policies. We demonstrate that DMCG consistently achieves state-of-the-art coordination performance and sample efficiency on challenging cooperative tasks, outperforming several prior graph-based and non-graph-based MARL baselines. Through several ablations, we also isolate the impact of individual components in DMCG, showing that the observed improvements are due to the meaningful design choices in this approach. We also include an analysis of its computational complexity to discuss its practicality in real-world applications. All code can be found here: https://github.com/Nikunj-Gupta/dmcg-marl
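The abstract relies on two building blocks without spelling them out: a coordination graph that factorizes the joint value into per-agent utilities f_i(a_i) plus pairwise payoffs f_ij(a_i, a_j) over graph edges, and a graph convolution that mixes information between neighbouring agents. The sketch below shows both in their generic textbook form, assuming a fixed graph, small tabular utilities and payoffs, and the common symmetric-normalized GCN rule ReLU(D^-1/2 (A + I) D^-1/2 H W). It is not DMCG itself: the dynamic composition of meta coordination graphs and the learned networks are not reproduced, and every name here (joint_value, greedy_joint_action, gcn_layer, the example tables) is hypothetical.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int]
Table = List[List[float]]


def joint_value(actions: Sequence[int], utilities: Table,
                payoffs: Dict[Edge, Table]) -> float:
    """Generic coordination-graph factorization (illustrative, not DMCG):
    Q(a) = sum_i f_i(a_i) + sum_{(i,j) in E} f_ij(a_i, a_j)."""
    q = sum(utilities[i][a] for i, a in enumerate(actions))
    q += sum(table[actions[i]][actions[j]] for (i, j), table in payoffs.items())
    return q


def greedy_joint_action(utilities: Table,
                        payoffs: Dict[Edge, Table]) -> Tuple[Tuple[int, ...], float]:
    """Explicit argmax over every joint action (cost grows exponentially with agents)."""
    joint_actions = itertools.product(*(range(len(u)) for u in utilities))
    best = max(joint_actions, key=lambda a: joint_value(a, utilities, payoffs))
    return best, joint_value(best, utilities, payoffs)


def gcn_layer(adj: Table, features: Table, weight: Table) -> Table:
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so each agent keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization (degrees are >= 1 thanks to self-loops).
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Aggregate neighbour features: M = norm @ H.
    agg = [[sum(norm[i][k] * features[k][f] for k in range(n))
            for f in range(len(features[0]))] for i in range(n)]
    # Linear projection followed by ReLU: ReLU(M @ W).
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(len(weight))))
             for o in range(len(weight[0]))] for i in range(n)]


if __name__ == "__main__":
    # Hypothetical 3-agent chain 0 - 1 - 2 with two actions per agent.
    utilities = [[0.0, 1.0], [0.5, 0.0], [0.0, 0.2]]
    payoffs = {(0, 1): [[2.0, 0.0], [0.0, 3.0]],
               (1, 2): [[1.0, 0.0], [0.0, 1.0]]}
    best, value = greedy_joint_action(utilities, payoffs)
    print(best, round(value, 3))  # prints: (1, 1, 1) 5.2

    adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    hidden = gcn_layer(adj, features=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                       weight=[[1.0], [0.5]])
    print(hidden)  # one scalar embedding per agent
```

The exhaustive maximization in greedy_joint_action scales as |A|^n, which is why coordination-graph methods usually fall back on approximate schemes such as max-plus message passing on larger graphs; according to the abstract, DMCG instead learns to reason about joint actions implicitly by optimizing the graphs jointly with the agents' value functions.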