arxiv_cs_cv February 10, 2026

Integrating Specialized and Generic Agent Motion Prediction with Dynamic Occupancy Grid Maps

Translated: 2026/3/15 19:02:56

motion-prediction, occupancy-grid-maps, self-driving, deep-learning, perception

Japanese Translation

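arXiv:2602.07938v1 Announce Type: new Abstract: Accurately predicting driving scenes is a challenging task because of uncertainty in sensor data, the complex behavior of agents, and the existence of multiple feasible futures. Current prediction methods based on Occupancy Grid Maps focus mainly on agent-agnostic scene prediction, whereas agent-specific prediction draws on semantic information to provide specialized behavioral insight. Each of the two paradigms, however, faces its own limitations: agent-agnostic models have difficulty capturing the behavioral complexity of dynamic actors, and agent-specific approaches fail to generalize to agents that are unrecognized or hard to perceive. Combining the two enables robust and safe motion prediction. To this end, we propose a unified framework that uses Dynamic Occupancy Grid Maps and a streamlined temporal decoding pipeline to simultaneously predict future occupancy state grids, vehicle grids, and scene flow grids. Built on a lightweight spatiotemporal backbone, our approach centers on a tailored interdependent loss function that captures dependencies between grids and enables diverse future predictions. By using occupancy state information to enforce flow-guided transitions, the loss function acts as a regularizer that guides the evolution of occupancy while accounting for obstacles and occlusions. As a result, the model not only predicts the specific behavior of vehicle agents but also identifies other dynamic entities and anticipates how they evolve within complex scenes. Real-world evaluations on the nuScenes and Woven Planet datasets show superior prediction performance for dynamic vehicles and generic dynamic scene elements compared with baseline methods.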

Original Content

arXiv:2602.07938v1 Announce Type: new Abstract: Accurate prediction of driving scenes is a challenging task due to uncertainty in sensor data, the complex behaviors of agents, and the possibility of multiple feasible futures. Existing prediction methods using occupancy grid maps primarily focus on agent-agnostic scene predictions, while agent-specific predictions provide specialized behavior insights with the help of semantic information. However, both paradigms face distinct limitations: agent-agnostic models struggle to capture the behavioral complexities of dynamic actors, whereas agent-specific approaches fail to generalize to poorly perceived or unrecognized agents. Combining both enables robust and safer motion forecasting. To address this, we propose a unified framework that leverages Dynamic Occupancy Grid Maps within a streamlined temporal decoding pipeline to simultaneously predict future occupancy state grids, vehicle grids, and scene flow grids. Relying on a lightweight spatiotemporal backbone, our approach is centered on a tailored, interdependent loss function that captures inter-grid dependencies and enables diverse future predictions. By using occupancy state information to enforce flow-guided transitions, the loss function acts as a regularizer that directs occupancy evolution while accounting for obstacles and occlusions. Consequently, the model not only predicts the specific behaviors of vehicle agents, but also identifies other dynamic entities and anticipates their evolution within the complex scene. Evaluations on the real-world nuScenes and Woven Planet datasets demonstrate superior prediction performance for dynamic vehicles and generic dynamic scene elements compared to baseline methods.
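
The abstract does not give the form of the interdependent loss, so the following is only an illustrative sketch of what a "flow-guided transition" penalty between an occupancy grid and a scene flow grid could look like. Everything here is an assumption made for illustration, not the authors' formulation: the function names `warp_occupancy` and `flow_consistency_loss`, the boolean `visible` mask standing in for occlusion handling, the rounding of flow to whole cells, and the squared-error penalty.

```python
import numpy as np


def warp_occupancy(occ, flow):
    """Move each cell's occupancy along its flow vector (forward splat).

    occ:  (H, W) occupancy probabilities in [0, 1]
    flow: (H, W, 2) per-cell displacement in cells, ordered (d_row, d_col)
    Hypothetical helper: flow is rounded to whole cells for simplicity.
    """
    h, w = occ.shape
    rows, cols = np.indices((h, w))
    tgt_r = rows + np.rint(flow[..., 0]).astype(int)
    tgt_c = cols + np.rint(flow[..., 1]).astype(int)
    # Occupancy that would leave the grid is dropped.
    inside = (tgt_r >= 0) & (tgt_r < h) & (tgt_c >= 0) & (tgt_c < w)
    warped = np.zeros((h, w), dtype=float)
    # When several source cells land on one target, keep the highest value.
    np.maximum.at(warped, (tgt_r[inside], tgt_c[inside]), occ[inside])
    return warped


def flow_consistency_loss(pred_occ_next, occ_now, flow, visible):
    """Mean squared gap between the predicted next occupancy and the
    flow-warped current occupancy, averaged over visible cells only.

    visible: (H, W) boolean mask; False marks cells assumed to be occluded.
    """
    warped = warp_occupancy(occ_now, flow)
    sq_err = (pred_occ_next - warped) ** 2
    n_visible = int(visible.sum())
    return float((sq_err * visible).sum() / max(n_visible, 1))


if __name__ == "__main__":
    occ_now = np.zeros((4, 4))
    occ_now[1, 1] = 1.0              # one occupied cell
    flow = np.zeros((4, 4, 2))
    flow[1, 1] = (0, 2)              # it moves two cells to the right
    visible = np.ones((4, 4), dtype=bool)

    consistent = warp_occupancy(occ_now, flow)
    print(flow_consistency_loss(consistent, occ_now, flow, visible))  # agrees with flow
    print(flow_consistency_loss(occ_now, occ_now, flow, visible))     # ignores flow
```

In this toy setup, a predicted grid that follows the flow incurs no penalty, while a prediction that leaves the occupied cell in place is penalized at both the vacated and the newly occupied cell. A term of this kind would be added to the per-grid prediction losses so that the occupancy and flow outputs constrain each other.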