arxiv_cs_cv April 24, 2026

Demystifying Action Space Design for Robotic Manipulation Policies

robotic-manipulation, reinforcement-learning, imitation-learning, policy-design, action-space

Original Content

arXiv:2602.23408v2 (announce type: replace-cross)

Abstract: The specification of the action space plays a pivotal role in imitation-based robotic manipulation policy learning, fundamentally shaping the optimization landscape of policy learning. While recent advances have focused heavily on scaling training data and model capacity, the choice of action space remains guided by ad-hoc heuristics or legacy designs, leading to an ambiguous understanding of robotic policy design philosophies. To address this ambiguity, we conducted a large-scale and systematic empirical study, confirming that the action space does have significant and complex impacts on robotic policy learning. We dissect the action design space along temporal and spatial axes, facilitating a structured analysis of how these choices govern both policy learnability and control stability. Based on 13,000+ real-world rollouts on a bimanual robot and evaluation on 500+ trained models over four scenarios, we examine the trade-offs between absolute vs. delta representations, and joint-space vs. task-space parameterizations. Our large-scale results suggest that properly designing the policy to predict delta actions consistently improves performance, while joint-space and task-space representations offer complementary strengths, favoring control stability and generalization, respectively.
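
The abstract contrasts absolute with delta actions and joint-space with task-space parameterizations, but does not spell out the definitions. The sketch below is only an illustration of the two distinctions, not the authors' implementation. It assumes that a delta action is the difference between consecutive targets (other conventions exist, such as deltas relative to the state at the start of an action chunk), and it uses a hypothetical planar two-link arm with unit link lengths to map joint angles to a task-space position. All function names are made up for this example.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def to_delta(absolute: Sequence[Sequence[float]], start: Sequence[float]) -> List[Vector]:
    """Convert absolute targets into step-to-step deltas, beginning from `start`.

    Assumption: each delta is measured against the previous target.
    """
    deltas: List[Vector] = []
    prev = list(start)
    for target in absolute:
        deltas.append([t - p for t, p in zip(target, prev)])
        prev = list(target)
    return deltas


def to_absolute(deltas: Sequence[Sequence[float]], start: Sequence[float]) -> List[Vector]:
    """Integrate deltas from `start` to recover the absolute targets."""
    targets: List[Vector] = []
    current = list(start)
    for delta in deltas:
        current = [c + d for c, d in zip(current, delta)]
        targets.append(current)
    return targets


def planar_fk(joints: Sequence[float],
              link_lengths: Tuple[float, float] = (1.0, 1.0)) -> Tuple[float, float]:
    """Forward kinematics of a hypothetical planar 2-link arm.

    Maps joint angles (radians) to the end-effector position (x, y),
    i.e. a joint-space action to its task-space counterpart.
    """
    q1, q2 = joints
    l1, l2 = link_lengths
    x = l1 * math.cos(q1) + l2 * math.cos(q1 + q2)
    y = l1 * math.sin(q1) + l2 * math.sin(q1 + q2)
    return (x, y)


# Example: the same short trajectory in four representations.
start_joints = [0.0, 0.0]
joint_abs = [[0.1, 0.2], [0.2, 0.4], [0.3, 0.6]]          # absolute, joint-space
joint_delta = to_delta(joint_abs, start_joints)           # delta, joint-space
task_abs = [list(planar_fk(q)) for q in joint_abs]        # absolute, task-space
task_delta = to_delta(task_abs, planar_fk(start_joints))  # delta, task-space
```

Integrating `joint_delta` with `to_absolute` recovers `joint_abs` up to floating-point rounding, so in this sketch the two encodings carry the same information. What differs is the target the policy regresses onto, which is the design choice the study evaluates.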