arxiv_cs_lg February 10, 2026

Scalable Dexterous Robot Learning with AR-based Remote Human-Robot Interactions

Translated: 2026/3/15 14:05:08

robot-learning, augmented-reality, reinforcement-learning, behavior-cloning, manipulation

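Translation (rendered from Japanese)

arXiv:2602.07341v1 Announce Type: new This paper addresses scalable robot learning for manipulation in dexterous robot arm-hand systems, using remote human-robot interaction via augmented reality (AR) to collect expert demonstration data more efficiently. For such a system, it proposes a unified framework for general manipulation tasks. The method has two phases: i) in the first phase, pretraining, a policy is created by behavior cloning (BC) from data gathered through the AR-based remote human-robot interaction system; ii) in the second phase, a contrastive-learning-empowered reinforcement learning (RL) method is developed to obtain a more efficient and robust policy than BC, with a projection head designed to accelerate learning. An event-driven augmented reward is adopted to improve safety. The method is validated in physics simulations using PyBullet and in real-world experiments. Compared with classic proximal policy optimization (PPO) and soft actor-critic (SAC) policies, the method significantly speeds up inference and achieves a markedly higher success rate on the manipulation tasks. An ablation study confirms that the proposed RL with contrastive learning overcomes policy collapse. Supplementary demonstrations are available at https://cyberyyc.github.io/.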

Original Content

arXiv:2602.07341v1 Announce Type: new Abstract: This paper focuses on the scalable robot learning for manipulation in the dexterous robot arm-hand systems, where the remote human-robot interactions via augmented reality (AR) are established to collect the expert demonstration data for improving efficiency. In such a system, we present a unified framework to address the general manipulation task problem. Specifically, the proposed method consists of two phases: i) In the first phase for pretraining, the policy is created in a behavior cloning (BC) manner, through leveraging the learning data from our AR-based remote human-robot interaction system; ii) In the second phase, a contrastive learning empowered reinforcement learning (RL) method is developed to obtain more efficient and robust policy than the BC, and thus a projection head is designed to accelerate the learning progress. An event-driven augmented reward is adopted for enhancing the safety. To validate the proposed method, both the physics simulations via PyBullet and real-world experiments are carried out. The results demonstrate that compared to the classic proximal policy optimization and soft actor-critic policies, our method not only significantly speeds up the inference, but also achieves much better performance in terms of the success rate for fulfilling the manipulation tasks. By conducting the ablation study, it is confirmed that the proposed RL with contrastive learning overcomes policy collapse. Supplementary demonstrations are available at https://cyberyyc.github.io/.
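
The abstract names a projection head, a contrastive objective, and an event-driven augmented reward, but gives no formulas for them. As a rough illustration only, the sketch below assumes an InfoNCE-style contrastive loss over L2-normalised projections and a reward that adds a penalty only when a safety event fires. The function names, the linear projection, the temperature, and the event names and penalty values are all hypothetical and are not taken from the paper.

```python
import math


def project(z, weights):
    """Hypothetical projection head: one linear layer, then L2 normalisation."""
    out = [sum(w * x for w, x in zip(row, z)) for row in weights]
    norm = math.sqrt(sum(v * v for v in out)) or 1.0
    return [v / norm for v in out]


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style loss (an assumed objective, not the paper's stated one).

    The i-th anchor should match the i-th positive; the remaining
    positives in the batch serve as negatives.
    """
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [sum(x * y for x, y in zip(a, p)) / temperature
                  for p in positives]
        m = max(logits)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum - logits[i]  # negative log-softmax of the true pair
    return total / len(anchors)


def event_driven_reward(task_reward, events, penalties):
    """Add a (negative) penalty only for safety events that fired this step."""
    return task_reward + sum(penalties[e] for e in events if e in penalties)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    batch = [project(z, identity) for z in ([3.0, 4.0], [4.0, -3.0])]
    print("contrastive loss:", info_nce(batch, batch))

    # Hypothetical event names and penalty values.
    penalties = {"collision": -1.0, "joint_limit": -0.5}
    print("reward:", event_driven_reward(1.0, ["collision"], penalties))
```

In a two-phase setup like the one described, a loss of this kind would typically be added to the RL objective as an auxiliary term on the encoder output, while the penalty term leaves the task reward unchanged on steps where no event occurs.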