arxiv_cs_ai April 20, 2026

VeriGraph: Scene Graphs for Execution Verifiable Robot Planning

robot-planning, scene-graph, vision-language-models, llm, robotic-agent

Original Content

arXiv:2411.10446v3 (Announce Type: replace-cross)

Abstract: Recent progress in vision-language models (VLMs) has opened new possibilities for robot task planning, but these models often produce incorrect action sequences. To address these limitations, we propose VeriGraph, a novel framework that integrates VLMs for robotic planning while verifying action feasibility. VeriGraph uses scene graphs as an intermediate representation to capture key objects and spatial relationships, enabling more reliable plan verification and refinement. The system generates a scene graph from input images and uses it to iteratively check and correct action sequences generated by an LLM-based task planner, ensuring constraints are respected and actions are executable. Our approach significantly enhances task completion rates across diverse manipulation scenarios, outperforming baseline methods by 58% on language-based tasks, 56% on tangram puzzle tasks, and 30% on image-based tasks. Qualitative results and code can be found at https://verigraph-agent.github.io.
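The check-and-correct loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes a toy scene graph that tracks only "on" relations between blocks, a single move action, and a hypothetical stand-in planner (`toy_planner`) in place of an LLM. Every name below is an illustrative assumption.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    # Assumed toy representation: on[x] = y means x rests directly on y.
    # "table" is the base surface and can hold any number of objects.
    on: dict[str, str] = field(default_factory=dict)

    def is_clear(self, obj: str) -> bool:
        """True when no other object rests on obj."""
        return obj not in self.on.values()

    def copy(self) -> "SceneGraph":
        return SceneGraph(dict(self.on))


def verify(graph, plan):
    """Simulate the plan on a copy of the graph; return (ok, feedback)."""
    g = graph.copy()
    for step, (obj, target) in enumerate(plan):
        if obj not in g.on:
            return False, f"step {step}: unknown object '{obj}'"
        if not g.is_clear(obj):
            return False, f"step {step}: '{obj}' has something on top"
        if target == obj:
            return False, f"step {step}: cannot place '{obj}' on itself"
        if target != "table" and (target not in g.on or not g.is_clear(target)):
            return False, f"step {step}: '{target}' is not a free support"
        g.on[obj] = target  # apply the move to the simulated graph
    return True, "ok"


def plan_with_verification(graph, planner, max_rounds=3):
    """Ask the planner for a plan, feeding back verifier errors until it passes."""
    feedback = None
    for _ in range(max_rounds):
        plan = planner(feedback)
        ok, feedback = verify(graph, plan)
        if ok:
            return plan
    return None  # no executable plan found within the round budget


def toy_planner(feedback):
    """Hypothetical stand-in for an LLM planner: naive first, corrected after feedback."""
    if feedback is None:
        return [("red", "green")]  # ignores the block stacked on red
    return [("blue", "table"), ("red", "green")]


# Scene: blue sits on red; red and green sit on the table.
scene = SceneGraph({"red": "table", "blue": "red", "green": "table"})
final_plan = plan_with_verification(scene, toy_planner)
```

Here the first proposed plan is rejected because `blue` rests on `red`, and the verifier's message is passed back to the planner, which then unstacks `blue` before moving `red`. Verification runs on a copy, so the original `scene` is left unchanged.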