arxiv_cs_lg 2026/2/10

CoMI-IRL: Contrastive Multi-Intention Inverse Reinforcement Learning

Translated: 2026/3/15 14:08:00

reinforcement-learning, inverse-reinforcement-learning, multi-intention, transformer, unsupervised-learning

Original Content

arXiv:2602.07496v1 Announce Type: new

Abstract: Inverse Reinforcement Learning (IRL) seeks to infer reward functions from expert demonstrations. When demonstrations originate from multiple experts with different intentions, the problem is known as Multi-Intention IRL (MI-IRL). Recent deep generative MI-IRL approaches couple behavior clustering and reward learning, but typically require prior knowledge of the number of true behavioral modes $K^*$. This reliance on expert knowledge limits their adaptability to new behaviors, and only enables analysis related to the learned rewards, and not across the behavior modes used to train them. We propose Contrastive Multi-Intention IRL (CoMI-IRL), a transformer-based unsupervised framework that decouples behavior representation and clustering from downstream reward learning. Our experiments show that CoMI-IRL outperforms existing approaches without a priori knowledge of $K^*$ or labels, while allowing for visual interpretation of behavior relationships and adaptation to unseen behavior without full retraining.
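The abstract does not state which contrastive objective CoMI-IRL uses. As a rough illustration of what a contrastive loss over behavior embeddings can look like, the sketch below implements the widely used InfoNCE loss in NumPy. Everything here is an assumption made for illustration: the function name `info_nce_loss`, the `temperature` parameter, and the use of random arrays in place of trajectory embeddings (which, per the abstract, would come from a transformer encoder). It should not be read as the paper's method.

```python
import numpy as np


def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE loss for a batch of (anchor, positive) embedding pairs.

    Hypothetical helper, not taken from the paper. Row i of `positives` is
    the positive for row i of `anchors`; all other rows act as negatives.
    """
    # L2-normalise so that dot products are cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)

    logits = a @ p.T / temperature                        # shape (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability

    # Row-wise log-softmax; the diagonal holds the matching pairs.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for embeddings of two views of the same 8 trajectories.
    z = rng.normal(size=(8, 16))
    z_view = z + 0.01 * rng.normal(size=z.shape)
    print("matched pairs:  ", info_nce_loss(z, z_view))
    print("unrelated pairs:", info_nce_loss(z, rng.normal(size=z.shape)))
```

A loss of this kind pulls embeddings of related behavior together and pushes unrelated behavior apart, which is one way a representation could be learned before any clustering or reward learning takes place, in line with the decoupling the abstract describes.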