arxiv_cs_lg February 10, 2026

Spatiotemporal Attention-Augmented Inverse Reinforcement Learning for Multi-Agent Task Allocation

inverse-reinforcement-learning, multi-agent-task-allocation, attention-mechanisms, spatiotemporal-representation, marl

Original Content

arXiv:2504.05045v4 Announce Type: replace

Abstract: Adversarial inverse reinforcement learning (IRL) for multi-agent task allocation (MATA) is challenged by non-stationary interactions and high-dimensional coordination. Unconstrained reward inference in these settings often leads to high variance and poor generalization. We propose an attention-structured adversarial IRL framework that constrains reward inference via spatiotemporal representation learning. Our method employs multi-head self-attention (MHSA) for long-range temporal dependencies and graph attention networks (GAT) for agent-task relational structures. We formulate reward inference as a low-capacity, adaptive linear transformation of the environment reward, ensuring stable and interpretable guidance. This framework decouples reward inference from policy learning and optimizes the reward model adversarially. Experiments on benchmark MATA scenarios show that our approach outperforms representative MARL baselines in convergence speed, cumulative rewards, and spatial efficiency. Results demonstrate that attention-guided, capacity-constrained reward inference is a scalable and effective mechanism for stabilizing adversarial IRL in complex multi-agent systems.
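
The abstract does not give the exact form of the reward transformation, so the following is only a minimal sketch of what a "low-capacity, adaptive linear transformation of the environment reward" could look like. It assumes a bounded affine map r_hat = a * r_env + b whose scale a and shift b are computed from an attention-pooled context vector. Every name here (`self_attention`, `shaped_reward`, `w_scale`, `w_shift`, `max_dev`) is hypothetical and does not come from the paper. For brevity the sketch uses a single attention head with identity projections instead of full MHSA, and it omits the GAT encoder and the adversarial training loop.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Single-head scaled dot-product self-attention over T time steps.

    Assumption: queries, keys, and values are the raw features
    (identity projections); a real MHSA layer learns these projections.
    """
    d = len(seq[0])
    out = []
    for q in seq:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in seq])
        out.append([sum(w * v[j] for w, v in zip(weights, seq)) for j in range(d)])
    return out


def shaped_reward(env_reward, context, w_scale, w_shift, max_dev=0.5):
    """Bounded affine transform of the environment reward (hypothetical form).

    The scale stays in [1 - max_dev, 1 + max_dev] and the shift in
    [-max_dev, max_dev], so the inferred reward cannot drift far
    from the environment reward.
    """
    a = 1.0 + max_dev * math.tanh(dot(w_scale, context))
    b = max_dev * math.tanh(dot(w_shift, context))
    return a * env_reward + b


if __name__ == "__main__":
    # Toy trajectory of three 2-dimensional observations (made-up values).
    trajectory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    context = self_attention(trajectory)[-1]  # context for the last step
    print(shaped_reward(2.0, context, [0.3, -0.1], [0.0, 0.2]))
```

With all-zero weights the transform reduces to the identity, so the learned reward starts at the environment reward and can only be rescaled or shifted within the `max_dev` bound. That is one plausible way to read "capacity-constrained" here, not a confirmed detail of the authors' method.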