arxiv_cs_lg February 10, 2026

Learning to Coordinate via Quantum Entanglement in Multi-Agent Reinforcement Learning

Translated: 2026/3/15 8:10:15

quantum-mechanics, multi-agent-reinforcement-learning, machine-learning, decision-processes, information-theory

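Translation from Japanese

arXiv:2602.08965v1 Announce Type: cross Abstract: In multi-agent reinforcement learning (MARL), the absence of communication is a major obstacle to coordination. Earlier work explored correlating local policies through shared randomness, sometimes by means of a correlation device, as a mechanism to support decentralized decision-making. This paper instead presents the first framework that makes it possible to use shared quantum entanglement as a coordination resource. The approach builds on well-known results in quantum physics indicating that, in certain single-round cooperative games without communication, strategies that use shared quantum entanglement can outperform strategies that use only shared randomness. In such cases, we say that there is "quantum advantage". Our framework rests on a new differentiable policy parameterization that allows optimization over quantum measurements, and on a new policy architecture in which the joint policy is decomposed into a quantum coordinator and decentralized local actors. To show the effectiveness of the proposed method, we first show that strategies attaining quantum advantage can be learned from experience alone in single-round games that are treated as black-box oracles. We then show that our machinery can learn policies with quantum advantage in a multi-agent sequential decision-making problem formalized as a decentralized partially observable Markov decision process (Dec-POMDP).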

Original Content

arXiv:2602.08965v1 Announce Type: cross Abstract: The inability to communicate poses a major challenge to coordination in multi-agent reinforcement learning (MARL). Prior work has explored correlating local policies via shared randomness, sometimes in the form of a correlation device, as a mechanism to assist in decentralized decision-making. In contrast, this work introduces the first framework for training MARL agents to exploit shared quantum entanglement as a coordination resource, which permits a larger class of communication-free correlated policies than shared randomness alone. This is motivated by well-known results in quantum physics which posit that, for certain single-round cooperative games with no communication, shared quantum entanglement enables strategies that outperform those that only use shared randomness. In such cases, we say that there is quantum advantage. Our framework is based on a novel differentiable policy parameterization that enables optimization over quantum measurements, together with a novel policy architecture that decomposes joint policies into a quantum coordinator and decentralized local actors. To illustrate the effectiveness of our proposed method, we first show that we can learn, purely from experience, strategies that attain quantum advantage in single-round games that are treated as black box oracles. We then demonstrate how our machinery can learn policies with quantum advantage in an illustrative multi-agent sequential decision-making problem formulated as a decentralized partially observable Markov decision process (Dec-POMDP).
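To make "quantum advantage" in a single-round, communication-free game concrete, here is a minimal sketch using the CHSH game, a standard textbook example of such a game. The abstract does not say which games the authors use, so the choice of CHSH is an assumption, as are all names below (`basis`, `quantum_win_prob`, `classical_best`, `learn_angles`). The finite-difference gradient ascent is only a stand-in for optimizing over measurements; it is not the paper's differentiable policy parameterization, which the abstract does not describe.

```python
import itertools
import numpy as np

# CHSH game (assumed example): a referee sends bits x, y to two players who
# cannot communicate; they answer bits a, b and win when a XOR b == x AND y.

# Shared Bell state (|00> + |11>) / sqrt(2) as a length-4 amplitude vector.
BELL = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2.0)


def basis(theta):
    """Real orthonormal measurement basis rotated by theta; row a is outcome a."""
    return np.array([[np.cos(theta), np.sin(theta)],
                     [-np.sin(theta), np.cos(theta)]])


def quantum_win_prob(angles):
    """Win probability for angles = [alice_x0, alice_x1, bob_y0, bob_y1]."""
    angles = np.asarray(angles, dtype=float)
    total = 0.0
    for x, y in itertools.product((0, 1), repeat=2):
        A, B = basis(angles[x]), basis(angles[2 + y])
        for a, b in itertools.product((0, 1), repeat=2):
            if (a ^ b) == (x & y):
                # Born rule: squared amplitude of the joint outcome (a, b).
                amplitude = np.kron(A[a], B[b]) @ BELL
                total += 0.25 * amplitude ** 2
    return total


def classical_best():
    """Best deterministic strategy; shared randomness only mixes these."""
    best = 0.0
    for a0, a1, b0, b1 in itertools.product((0, 1), repeat=4):
        alice, bob = (a0, a1), (b0, b1)
        wins = sum((alice[x] ^ bob[y]) == (x & y)
                   for x, y in itertools.product((0, 1), repeat=2))
        best = max(best, wins / 4.0)
    return best


def learn_angles(steps=300, lr=0.5, eps=1e-5, seed=0):
    """Gradient ascent on the win probability using central differences."""
    rng = np.random.default_rng(seed)
    angles = rng.uniform(-np.pi, np.pi, size=4)
    for _ in range(steps):
        grad = np.zeros(4)
        for i in range(4):
            shift = np.zeros(4)
            shift[i] = eps
            grad[i] = (quantum_win_prob(angles + shift)
                       - quantum_win_prob(angles - shift)) / (2.0 * eps)
        angles += lr * grad
    return angles
```

Under these assumptions, `classical_best()` returns 0.75, while the known optimal angles `[0, np.pi/4, np.pi/8, -np.pi/8]` give `quantum_win_prob` equal to cos²(π/8) ≈ 0.854. Calling `quantum_win_prob(learn_angles())` should land above the classical bound, which is the gap the abstract refers to as quantum advantage.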