arxiv_cs_gr February 11, 2026

SceneSmith: Agentic Generation of Simulation-Ready Indoor Scenes

Translated: 2026/2/11 9:20:22

Original Content

arXiv:2602.09153v1 Announce Type: cross Abstract: Simulation has become a key tool for training and evaluating home robots at scale, yet existing environments fail to capture the diversity and physical complexity of real indoor spaces. Current scene synthesis methods produce sparsely furnished rooms that lack the dense clutter, articulated furniture, and physical properties essential for robotic manipulation. We introduce SceneSmith, a hierarchical agentic framework that generates simulation-ready indoor environments from natural language prompts. SceneSmith constructs scenes through successive stages, from architectural layout to furniture placement to small object population, each implemented as an interaction among VLM agents: designer, critic, and orchestrator. The framework tightly integrates asset generation through text-to-3D synthesis for static objects, dataset retrieval for articulated objects, and physical property estimation. SceneSmith generates 3-6x more objects than prior methods, with <2% inter-object collisions and 96% of objects remaining stable under physics simulation. In a user study with 205 participants, it achieves 92% average realism and 91% average prompt faithfulness win rates against baselines. We further demonstrate that these environments can be used in an end-to-end pipeline for automatic robot policy evaluation.
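The abstract names the stage order (layout, furniture, small objects) and the three agent roles, but not how they interact. The following is a toy sketch of one plausible staged designer/critic/orchestrator loop, not SceneSmith's implementation. Everything in it is an assumption: the VLM agents are replaced by hypothetical deterministic stand-ins (`designer`, `critic`, `orchestrator`), objects are axis-aligned boxes, the critic checks only room containment and pairwise overlap, and `collision_rate` (fraction of objects overlapping another) is an assumed reading of "inter-object collisions".

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Box:
    """Axis-aligned 3D box: min corner `lo` and extent `size`, in meters."""
    name: str
    lo: tuple
    size: tuple

    def overlaps(self, other: "Box") -> bool:
        # Strict test on every axis, so boxes that only touch (a mug resting
        # on a tabletop) do not count as colliding.
        return all(a < b + sb and b < a + sa
                   for a, sa, b, sb in zip(self.lo, self.size, other.lo, other.size))

    def inside(self, room: "Box") -> bool:
        return all(r <= a and a + sa <= r + sr
                   for a, sa, r, sr in zip(self.lo, self.size, room.lo, room.size))


STAGES = ["layout", "furniture", "small_objects"]

# Hypothetical first drafts standing in for what a VLM designer might propose.
INITIAL_PROPOSALS = {
    "layout": [Box("room", (0, 0, 0), (4, 3, 2.5))],
    "furniture": [Box("table", (1, 1, 0), (1.2, 0.8, 0.75)),
                  Box("chair", (1.5, 1.5, 0), (0.5, 0.5, 0.9))],
    "small_objects": [Box("mug", (1.2, 1.2, 0.75), (0.1, 0.1, 0.12)),
                      Box("book", (1.25, 1.25, 0.75), (0.2, 0.15, 0.03))],
}


def designer(proposal, flagged, step=0.5):
    """Toy designer: nudges every flagged object `step` meters along +x."""
    return [replace(b, lo=(b.lo[0] + step, *b.lo[1:])) if b.name in flagged else b
            for b in proposal]


def critic(proposal, accepted, room):
    """Toy critic: flags objects that leave the room or hit an earlier object."""
    flagged, placed = set(), list(accepted)
    for box in proposal:
        outside = room is not None and not box.inside(room)
        if outside or any(box.overlaps(p) for p in placed):
            flagged.add(box.name)
        placed.append(box)
    return flagged


def orchestrator(max_rounds=5):
    """Runs the stages in order; within a stage, loops critic -> designer
    until the critic has no complaints or the round budget runs out."""
    room, accepted, log = None, [], []
    for stage in STAGES:
        proposal = INITIAL_PROPOSALS[stage]
        for round_idx in range(1, max_rounds + 1):
            flagged = critic(proposal, accepted, room)
            log.append((stage, round_idx, sorted(flagged)))
            if not flagged:
                break
            proposal = designer(proposal, flagged)
        else:
            raise RuntimeError(f"stage {stage!r} did not converge")
        if stage == "layout":
            room = proposal[0]  # later stages are checked against this room
        else:
            accepted.extend(proposal)
    return room, accepted, log


def collision_rate(objects):
    """Assumed metric: fraction of objects overlapping at least one other."""
    hits = {a.name for a in objects for b in objects if a is not b and a.overlaps(b)}
    return len(hits) / len(objects) if objects else 0.0


if __name__ == "__main__":
    room, objects, log = orchestrator()
    for entry in log:
        print(entry)
    print([b.name for b in objects], collision_rate(objects))
```

In this toy run the critic flags the chair in two rounds before it clears the table, flags the book once because it overlaps the mug, and the final scene has a collision rate of 0.0; the mug resting on the tabletop is not counted because the overlap test is strict. A real system would replace the nudge rule with VLM feedback and add checks the abstract reports only as outcomes, such as stability under physics simulation.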