arxiv_cs_lg April 24, 2026

VTouch++: A Multimodal Dataset with Vision-Based Tactile Enhancement for Bimanual Manipulation

multimodal-dataset, bimanual-manipulation, embodied-ai, robotics, tactile-sensing

Original Content

arXiv:2604.20444v1 Announce Type: cross Abstract: Embodied intelligence has advanced rapidly in recent years; however, bimanual manipulation, especially in contact-rich tasks, remains challenging. This is largely due to the lack of datasets with rich physical interaction signals, systematic task organization, and sufficient scale. To address these limitations, we introduce the VTOUCH dataset. It leverages vision-based tactile sensing to provide high-fidelity physical interaction signals, adopts a matrix-style task design to enable systematic learning, and employs automated data collection pipelines covering real-world, demand-driven scenarios to ensure scalability. To validate the effectiveness of the dataset, we conduct extensive quantitative experiments on cross-modal retrieval as well as real-robot evaluation. Finally, we demonstrate real-world performance through generalizable inference across multiple robots, policies, and tasks.