arxiv_cs_cv April 24, 2026

Neuro-Symbolic Manipulation Understanding with Enriched Semantic Event Chains

Translated: 2026/4/24 19:48:06

neuro-symbolic, robotic-manipulation, semantic-event-chains, action-recognition, interpretability


Original Content

arXiv:2604.21053v1 Announce Type: cross Abstract: Robotic systems operating in human environments must reason about how object interactions evolve over time, which actions are currently being performed, and what manipulation step is likely to follow. Classical enriched Semantic Event Chains (eSECs) provide an interpretable relational description of manipulation, but remain primarily descriptive and do not directly support uncertainty-aware decision making. In this paper, we propose eSEC-LAM, a neuro-symbolic framework that transforms eSECs into an explicit event-level symbolic state for manipulation understanding. The proposed formulation augments classical eSECs with confidence-aware predicates, functional object roles, affordance priors, primitive-level abstraction, and saliency-guided explanation cues. These enriched symbolic states are derived from a foundation-model-based perception front-end through deterministic predicate extraction, while current-action inference and next-primitive prediction are performed using lightweight symbolic reasoning over primitive pre- and post-conditions. We evaluate the proposed framework on EPIC-KITCHENS-100, EPIC-KITCHENS VISOR, and Assembly101 across action recognition, next-primitive prediction, robustness to perception noise, and explanation consistency. Experimental results show that eSEC-LAM achieves competitive action recognition, substantially improves next-primitive prediction, remains more robust under degraded perceptual conditions than both classical symbolic and end-to-end video baselines, and provides temporally consistent explanation traces grounded in explicit relational evidence. These findings demonstrate that enriched Semantic Event Chains can serve not only as interpretable descriptors of manipulation, but also as effective internal states for neuro-symbolic action reasoning.
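
The abstract describes next-primitive prediction as lightweight symbolic reasoning over primitive pre- and post-conditions, applied to confidence-aware predicates. The Python sketch below illustrates that general idea only; it is not the authors' implementation. The predicate names, the `Primitive` class, and the product-of-confidences scoring rule are all assumptions introduced here for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical representation: a relational predicate is (relation, subject, object),
# and the symbolic state maps each predicate to a confidence in [0, 1].
Predicate = Tuple[str, str, str]
State = Dict[Predicate, float]


@dataclass(frozen=True)
class Primitive:
    """A manipulation primitive with symbolic pre- and post-conditions (illustrative)."""
    name: str
    preconditions: FrozenSet[Predicate]
    postconditions: FrozenSet[Predicate]


def precondition_score(state: State, primitive: Primitive) -> float:
    """Multiply the confidences of all preconditions; a missing predicate counts as 0.0.

    The product rule is an assumption of this sketch, not a rule taken from the paper.
    """
    score = 1.0
    for predicate in primitive.preconditions:
        score *= state.get(predicate, 0.0)
    return score


def rank_next_primitives(state: State, primitives: List[Primitive]) -> List[Tuple[str, float]]:
    """Return (name, score) pairs sorted from most to least plausible next primitive."""
    scored = [(p.name, precondition_score(state, p)) for p in primitives]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def apply_postconditions(state: State, primitive: Primitive, confidence: float) -> State:
    """Return a new state in which the primitive's postconditions hold with `confidence`."""
    updated = dict(state)
    for predicate in primitive.postconditions:
        updated[predicate] = confidence
    return updated


# Toy example with made-up predicates and confidences.
state: State = {
    ("holding", "hand", "knife"): 0.8,
    ("touching", "knife", "cucumber"): 0.3,
}
primitives = [
    Primitive(
        name="cut",
        preconditions=frozenset({("holding", "hand", "knife"), ("touching", "knife", "cucumber")}),
        postconditions=frozenset({("divided", "cucumber", "cucumber")}),
    ),
    Primitive(
        name="lift",
        preconditions=frozenset({("holding", "hand", "knife")}),
        postconditions=frozenset({("above", "knife", "table")}),
    ),
]
ranking = rank_next_primitives(state, primitives)
print(ranking)
```

In this toy state, "lift" ranks above "cut" because the knife-cucumber contact is only weakly supported. Keeping confidences on the predicates, rather than thresholding them to true or false, is one simple way a noisy perception front-end can lower the score of a candidate primitive without ruling it out entirely.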