arxiv_cs_cv April 20, 2026

Scalable Unseen Objects 6-DoF Absolute Pose Estimation with Robotic Integration

Translated: 2026/4/20 10:49:06

robotics, pose-estimation, 6dof, state-space-models, rgb-d

Original Content

arXiv:2503.05578v4 Announce Type: replace Abstract: Pose-estimation-guided 6-DoF robotic manipulation of unseen objects is a key task in robotics. However, the scalability of current pose estimation methods to unseen objects remains a fundamental challenge: they generally rely on CAD models or dense reference views of the unseen objects, which are difficult to acquire. In this paper, we introduce a novel task setup, referred to as SinRef-6D, which addresses 6-DoF absolute pose estimation for unseen objects using only a single pose-labeled reference RGB-D image captured during robotic manipulation. This setup is more scalable, yet technically nontrivial because of large pose discrepancies and the limited geometric and spatial information contained in a single view. To address these issues, our key idea is to iteratively establish point-wise alignment in a common coordinate system, with state space models (SSMs) as backbones. Specifically, to handle large pose discrepancies, we introduce an iterative object-space point-wise alignment strategy. We then propose Point and RGB SSMs to capture long-range spatial dependencies from a single view, offering superior spatial modeling capability with linear complexity. Once pre-trained on synthetic data, SinRef-6D can estimate the 6-DoF absolute pose of an unseen object using only a single reference view. Using the estimated pose, we further develop a hardware-software robotic system and integrate SinRef-6D into it in real-world settings. Extensive experiments on six benchmarks and in diverse real-world scenarios demonstrate that SinRef-6D offers superior scalability. Additional robotic grasping experiments further validate the effectiveness of the developed robotic system. The code and robotic demos are available at https://paperreview99.github.io/SinRef-6DoF-Robotic.
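
As a rough illustration of why a single pose-labeled reference view can anchor an absolute pose, the sketch below lifts reference-view points into the object coordinate system and chains a reference-to-query transform with the reference pose. It is not the SinRef-6D implementation. All function names are hypothetical, the pose convention p_cam = R * p_obj + t is an assumption, and the relative transform is taken as given here rather than estimated by the iterative alignment the abstract describes.

```python
import math


def mat_vec(R, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(R[i][k] * v[k] for k in range(3)) for i in range(3)]


def transpose(R):
    """Transpose a 3x3 matrix; for a rotation this is also its inverse."""
    return [[R[j][i] for j in range(3)] for i in range(3)]


def mat_mul(A, B):
    """Multiply two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rot_z(deg):
    """Rotation about the z-axis, used only to build example poses."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def object_to_camera(points_obj, R, t):
    """Apply the assumed pose convention p_cam = R * p_obj + t."""
    return [[x + ti for x, ti in zip(mat_vec(R, p), t)] for p in points_obj]


def camera_to_object(points_cam, R, t):
    """Invert the pose: p_obj = R^T * (p_cam - t).

    With a pose-labeled reference view, this places the observed
    reference points in the object coordinate system.
    """
    Rt = transpose(R)
    return [mat_vec(Rt, [p[i] - t[i] for i in range(3)]) for p in points_cam]


def compose_absolute_pose(R_rel, t_rel, R_ref, t_ref):
    """Chain a reference-camera-to-query-camera transform with the
    reference pose to obtain the object-to-query-camera (absolute) pose."""
    R_q = mat_mul(R_rel, R_ref)
    t_q = [a + b for a, b in zip(mat_vec(R_rel, t_ref), t_rel)]
    return R_q, t_q
```

Because the reference points end up in object coordinates, any rigid transform that aligns them with the query-view points is directly the absolute pose, so no CAD model is needed to define the object frame.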