arxiv_cs_cv April 20, 2026

Continual Hand-Eye Calibration for Open-world Robotic Manipulation

Translated: 2026/4/20 10:42:52

robotic-manipulation, hand-eye-calibration, continual-learning, visual-localization, deep-learning

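Japanese Translation (rendered in English)

arXiv:2604.15814v1 Announce Type: new Abstract: Hand-eye calibration based on visual localization is an essential capability for robotic manipulation in open-world environments. However, many deep learning-based calibration models suffer from catastrophic forgetting when they adapt to unseen data as scenes change in the open world, and continual learning strategies based on simple rehearsal cannot sufficiently mitigate this problem. To overcome this challenge, we propose a continual hand-eye calibration framework that adapts to sequentially encountered open-world manipulation scenes through a spatial replay strategy and structure-preserving distillation. Specifically, the Spatial-Aware Replay Strategy (SARS) constructs a geometrically uniform replay buffer that comprehensively covers the pose space of each scene, replacing redundant adjacent frames with maximally informative viewpoints. In parallel, we propose Structure-Preserving Dual Distillation (SPDD), which decomposes localization knowledge into coarse scene layout and fine pose precision and distills each separately, reducing both kinds of forgetting. When a new manipulation scene arrives, SARS supplies geometrically representative replay samples from all past scenes, and SPDD applies structured distillation to these samples to retain previously learned knowledge. After training on the new scene, SARS adds selected samples from the new scene to the replay buffer, preparing the model for future rehearsal and letting it continuously accumulate multi-scene calibration capability. Experiments on multiple public datasets show marked resistance to scene forgetting: the framework maintains accuracy on past scenes while retaining the ability to adapt to new scenes, which supports its effectiveness.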

Original Content

arXiv:2604.15814v1 Announce Type: new Abstract: Hand-eye calibration through visual localization is a critical capability for robotic manipulation in open-world environments. However, most deep learning-based calibration models suffer from catastrophic forgetting when adapting to unseen data amid open-world scene changes, and a simple rehearsal-based continual learning strategy cannot adequately mitigate this issue. To overcome this challenge, we propose a continual hand-eye calibration framework that enables robots to adapt to sequentially encountered open-world manipulation scenes through a spatial-aware replay strategy and structure-preserving distillation. Specifically, a Spatial-Aware Replay Strategy (SARS) constructs a geometrically uniform replay buffer that ensures comprehensive coverage of each scene's pose space, replacing redundant adjacent frames with maximally informative viewpoints. Meanwhile, a Structure-Preserving Dual Distillation (SPDD) decomposes localization knowledge into coarse scene layout and fine pose precision, and distills them separately to alleviate both types of forgetting during continual adaptation. When a new manipulation scene arrives, SARS provides geometrically representative replay samples from all prior scenes, and SPDD applies structured distillation to these samples to retain previously learned knowledge. After training on the new scene, SARS incorporates selected samples from the new scene into the replay buffer for future rehearsal, allowing the model to continuously accumulate multi-scene calibration capability. Experiments on multiple public datasets show strong resistance to scene forgetting: the framework maintains accuracy on past scenes while preserving adaptation to new scenes, confirming its effectiveness.
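
The abstract does not say how SARS selects its "geometrically uniform" samples. As a rough illustration of the general idea only, the sketch below uses greedy farthest-point sampling over camera positions, a standard technique that is swapped in here as an assumption and is not the authors' method. The function name `farthest_point_sample`, the use of 3D positions alone (ignoring orientation), and the toy trajectory are all hypothetical.

```python
import math
from typing import List, Sequence


def farthest_point_sample(
    positions: Sequence[Sequence[float]], k: int, start: int = 0
) -> List[int]:
    """Pick up to k frame indices whose camera positions are spread out.

    Assumption: each frame is summarized by its camera position only;
    a real pose space would also include orientation.
    """
    n = len(positions)
    if n == 0 or k <= 0:
        return []
    k = min(k, n)
    selected = [start]
    # min_dist[i] is the distance from frame i to its nearest selected frame.
    min_dist = [math.dist(p, positions[start]) for p in positions]
    while len(selected) < k:
        # Take the frame that is farthest from everything selected so far.
        nxt = max(range(n), key=lambda i: min_dist[i])
        if min_dist[nxt] == 0.0:
            break  # every remaining frame coincides with a selected one
        selected.append(nxt)
        for i, p in enumerate(positions):
            d = math.dist(p, positions[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Toy trajectory (hypothetical): two clusters of near-duplicate adjacent
# frames plus one isolated viewpoint.
positions = [
    (0.0, 0.0, 0.0), (0.01, 0.0, 0.0), (0.02, 0.0, 0.0),
    (1.0, 0.0, 0.0), (1.01, 0.0, 0.0),
    (0.0, 1.0, 0.0),
]
buffer_indices = farthest_point_sample(positions, k=3)
print(buffer_indices)
```

With a budget of three frames, the selection keeps one frame from each cluster and the isolated viewpoint instead of three neighbouring frames from the start of the trajectory. This is the kind of redundancy reduction the abstract attributes to SARS, although the actual selection criterion may differ.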