arxiv_cs_cv April 24, 2026

Frozen Large Language Models as Map-Aware Spatio-Temporal Reasoners: A Study on Vehicle Trajectory Prediction

Frozen LLMs as Map-Aware Spatio-Temporal Reasoners for Vehicle Trajectory Prediction

Translated: 2026/4/24 19:44:31

llm, autonomous-driving, trajectory-prediction, map-aware-reasoning, spatio-temporal-data

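Translation (from Japanese)

Paper ID: arXiv:2604.21479v1. Announce type: new. Abstract: Recently, large language models (LLMs) have shown strong reasoning capabilities and are drawing growing research interest in autonomous driving (AD). However, applying LLMs safely to AD perception and prediction requires a deep understanding of both dynamic traffic agents and static road infrastructure. This study introduces a framework for evaluating how well LLMs understand the behavior of dynamic traffic agents and the topology of road networks. The framework uses a frozen LLM as the reasoning engine and a traffic encoder to extract spatial-level scene features from the observed trajectories of agents, while a lightweight convolutional neural network (CNN) encodes the local high-definition (HD) map. To assess the LLM's intrinsic reasoning ability, the extracted scene features are converted into LLM-compatible tokens by a reprogramming adapter. Because the prediction burden is placed on the LLM, a simple linear decoder suffices to output future trajectories. The framework enables quantitative analysis of the influence of multi-modal information, in particular the effect of map semantics on trajectory prediction accuracy. It also integrates frozen LLMs with minimal adaptation, shows strong generalizability across diverse LLM architectures, and provides a unified platform for model evaluation.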
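The data flow described in the abstract (traffic encoder and map CNN, then reprogramming adapter, then frozen LLM, then linear decoder) can be illustrated with a toy sketch. This is not the authors' implementation: every name, dimension, and weight below is a hypothetical stand-in, the "frozen LLM" is a single fixed random layer instead of a real language model, and no training is performed. It only shows how the pieces might connect and how the map input can be switched off for an ablation.

```python
import copy
import math
import random

random.seed(0)  # reproducible toy weights

D_FEAT, D_LLM = 8, 16   # hypothetical feature / LLM embedding widths
T_OBS, T_FUT = 4, 3     # observed and predicted time steps (assumed values)


def rand_matrix(n_in, n_out):
    """Random (n_in x n_out) weight matrix as nested lists."""
    return [[random.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_in)]


def matvec(x, w):
    """Row vector x (length n_in) times matrix w (n_in x n_out)."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]


# Modules that would be trained (toy stand-ins for the abstract's components).
W_TRAFFIC = rand_matrix(T_OBS * 2, D_FEAT)                 # traffic encoder
MAP_FILTERS = [rand_matrix(3, 3) for _ in range(D_FEAT)]   # lightweight CNN
W_ADAPTER = rand_matrix(D_FEAT, D_LLM)                     # reprogramming adapter
W_DECODER = rand_matrix(D_LLM, T_FUT * 2)                  # linear decoder

# Frozen module: created once and never modified afterwards.
W_LLM = rand_matrix(D_LLM, D_LLM)


def traffic_encoder(trajectory):
    """Encode one agent's observed (x, y) track into a feature vector."""
    flat = [coord for point in trajectory for coord in point]
    return matvec(flat, W_TRAFFIC)


def map_encoder(grid):
    """3x3 convolution, ReLU, then global average pooling over a raster patch."""
    rows, cols = len(grid) - 2, len(grid[0]) - 2
    features = []
    for filt in MAP_FILTERS:
        total = 0.0
        for r in range(rows):
            for c in range(cols):
                s = sum(filt[i][j] * grid[r + i][c + j]
                        for i in range(3) for j in range(3))
                total += max(0.0, s)
        features.append(total / (rows * cols))
    return features


def frozen_llm(tokens):
    """Stand-in for a frozen LLM: mean-pool the tokens, apply one fixed tanh layer."""
    pooled = [sum(col) / len(tokens) for col in zip(*tokens)]
    return [math.tanh(v) for v in matvec(pooled, W_LLM)]


def predict(trajectories, map_patch=None):
    """Return T_FUT future (x, y) points; omit map_patch for a map-free ablation."""
    features = [traffic_encoder(t) for t in trajectories]
    if map_patch is not None:
        features.append(map_encoder(map_patch))
    tokens = [matvec(f, W_ADAPTER) for f in features]   # LLM-compatible tokens
    hidden = frozen_llm(tokens)
    out = matvec(hidden, W_DECODER)
    return [(out[2 * t], out[2 * t + 1]) for t in range(T_FUT)]


# Two made-up agents and a 5x5 raster with one horizontal "lane" row.
agents = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.1), (3.0, 0.2)],
    [(5.0, 2.0), (4.5, 2.0), (4.0, 2.0), (3.5, 2.0)],
]
lane_raster = [[1.0 if r == 2 else 0.0 for _ in range(5)] for r in range(5)]

llm_before = copy.deepcopy(W_LLM)
with_map = predict(agents, lane_raster)
without_map = predict(agents)
print(with_map)
print(without_map)
```

In a real setup, only the encoders, the adapter, and the decoder would receive gradient updates, while the LLM weights stay fixed. The optional `map_patch` argument mirrors the kind of with-map versus without-map comparison the abstract alludes to.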

Original Content

arXiv:2604.21479v1 Announce Type: new Abstract: Large language models (LLMs) have recently demonstrated strong reasoning capabilities and attracted increasing research attention in the field of autonomous driving (AD). However, safe application of LLMs to AD perception and prediction still requires a thorough understanding of both the dynamic traffic agents and the static road infrastructure. To this end, this study introduces a framework to evaluate the capability of LLMs in understanding the behaviors of dynamic traffic agents and the topology of road networks. The framework leverages frozen LLMs as the reasoning engine, employing a traffic encoder to extract spatial-level scene features from observed trajectories of agents, while a lightweight Convolutional Neural Network (CNN) encodes the local high-definition (HD) maps. To assess the intrinsic reasoning ability of LLMs, the extracted scene features are then transformed into LLM-compatible tokens via a reprogramming adapter. By placing the prediction burden on the LLMs, a simpler linear decoder can be applied to output future trajectories. The framework enables a quantitative analysis of the influence of multi-modal information, especially the impact of map semantics on trajectory prediction accuracy. It also allows seamless integration of frozen LLMs with minimal adaptation, thereby demonstrating strong generalizability across diverse LLM architectures and providing a unified platform for model evaluation.
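The abstract refers to trajectory prediction accuracy without naming a metric. Two measures commonly used in trajectory prediction are the average displacement error (ADE, the mean Euclidean distance over all predicted steps) and the final displacement error (FDE, the distance at the last step). Whether the paper reports these is an assumption; the sketch below only shows how they could be computed, and the trajectories are made-up values, not results.

```python
import math


def displacement_errors(predicted, ground_truth):
    """Return (ADE, FDE) for two equal-length lists of (x, y) points."""
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    dists = [math.hypot(p[0] - g[0], p[1] - g[1])
             for p, g in zip(predicted, ground_truth)]
    ade = sum(dists) / len(dists)   # mean error over all future steps
    fde = dists[-1]                 # error at the final step only
    return ade, fde


# Hypothetical example: the prediction drifts 0, 1, and 2 units off the true path.
ground_truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
prediction = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]

ade, fde = displacement_errors(prediction, ground_truth)
print(ade, fde)  # 1.0 2.0
```

Running the same function on predictions made with and without the map input would give one simple way to quantify the contribution of map semantics.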