arxiv_cs_lg April 24, 2026

LEXIS: Estimating LatEnt ProXimal Interaction Signatures for 3D Human-Object Interaction from an Image

LEXIS: LatEnt ProXimal Interaction Signatures for 3D HOI from an Image

Translated: 2026/4/24 20:06:38

lexis, 3d-hoi, human-object-interaction, diffusion-models, image-reconstruction

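Japanese Translation (rendered in English)

arXiv:2604.20800v1 Announce Type: cross Abstract: Reconstructing 3D human-object interaction from an RGB image is essential for perceptive systems. However, it remains challenging because it requires capturing the subtle physical coupling between the human body and objects. Current methods rely on sparse, binary contact cues, which fail to model the continuous proximity and dense spatial relationships characteristic of natural interactions. We address this limitation with InterFields, a representation that encodes dense, continuous proximity over the entire body and object surfaces. Inferring these fields from a single image, however, is inherently ill-posed. To resolve this, we build on the intuition that interaction patterns are characteristically structured by the action and the object geometry. To capture this structure, we construct LEXIS, a novel discrete manifold of interaction signatures learned with a VQ-VAE. We then develop LEXIS-Flow, a diffusion framework that uses LEXIS signatures to estimate human and object meshes together with their InterFields. Notably, these InterFields support a guided refinement that ensures physically plausible, proximity-aware reconstructions without post-hoc optimization. In evaluations on Open3DHOI and BEHAVE, LEXIS-Flow significantly outperforms existing state-of-the-art baselines in reconstruction, contact, and proximity quality. Our approach not only improves generalization but also yields reconstructions perceived as more realistic, bringing us a step closer to holistic 3D scene understanding. Code and models will be released at https://anticdimi.github.io/lexis.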

Original Content

arXiv:2604.20800v1 Announce Type: cross

Abstract: Reconstructing 3D Human-Object Interaction from an RGB image is essential for perceptive systems. Yet, this remains challenging as it requires capturing the subtle physical coupling between the body and objects. While current methods rely on sparse, binary contact cues, these fail to model the continuous proximity and dense spatial relationships that characterize natural interactions. We address this limitation via InterFields, a representation that encodes dense, continuous proximity across the entire body and object surfaces. However, inferring these fields from single images is inherently ill-posed. To tackle this, our intuition is that interaction patterns are characteristically structured by the action and object geometry. We capture this structure in LEXIS, a novel discrete manifold of interaction signatures learned via a VQ-VAE. We then develop LEXIS-Flow, a diffusion framework that leverages LEXIS signatures to estimate human and object meshes alongside their InterFields. Notably, these InterFields help in a guided refinement that ensures physically-plausible, proximity-aware reconstructions without requiring post-hoc optimization. Evaluation on Open3DHOI and BEHAVE shows that LEXIS-Flow significantly outperforms existing SotA baselines in reconstruction, contact, and proximity quality. Our approach not only improves generalization but also yields reconstructions perceived as more realistic, moving us closer to holistic 3D scene understanding. Code & models will be public at https://anticdimi.github.io/lexis.
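
The abstract contrasts sparse, binary contact cues with dense, continuous proximity, and describes interaction signatures as entries of a discrete codebook learned via a VQ-VAE. The toy sketch below illustrates only those two ideas and is not the authors' implementation: the abstract does not define InterFields precisely, so nearest-point distance is assumed here as a stand-in for "proximity", and the function names (`proximity_field`, `binary_contact`, `quantize`), the point sets, the threshold, and the two-entry codebook are all hypothetical. The `quantize` step shows just the nearest-codebook lookup that VQ-VAEs use, with no learned encoder or decoder.

```python
import math


def proximity_field(src_pts, dst_pts):
    """Dense, continuous cue: for every point in src_pts, the distance
    to its nearest neighbour in dst_pts (assumed proxy for proximity)."""
    return [min(math.dist(p, q) for q in dst_pts) for p in src_pts]


def binary_contact(field, threshold):
    """Sparse, binary cue: 1 where the distance is below threshold, else 0."""
    return [1 if d < threshold else 0 for d in field]


def quantize(vec, codebook):
    """VQ-style lookup: index of the codebook entry closest to vec."""
    return min(range(len(codebook)), key=lambda i: math.dist(vec, codebook[i]))


# Hypothetical toy data: three "body" points at increasing distance
# from a single "object" point.
body = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.5), (0.0, 0.0, 2.0)]
obj = [(0.0, 0.0, 0.1)]

field = proximity_field(body, obj)    # one distance per body point
contact = binary_contact(field, 0.2)  # hypothetical contact threshold

# Hypothetical two-entry codebook over whole fields; a real VQ-VAE would
# learn these entries and quantize encoder outputs, not raw distances.
codebook = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
code = quantize(field, codebook)

print(field, contact, code)
```

In this toy case the binary cue marks the second and third body points identically as "no contact", while the continuous field still distinguishes a point that is close to the object from one that is far away. That lost information is what the abstract argues binary contact cues fail to model.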