arxiv_cs_cv February 10, 2026

ALIGN: Advanced Query Initialization with LiDAR-Image Guidance for Occlusion-Robust 3D Object Detection

Translated: 2026/3/15 16:06:34

3d-object-detection, lidar, machine-learning, occlusion-robust, query-initialization

Abstract

arXiv:2512.18187v2 (announce type: replace)

Recent query-based 3D object detection methods using camera and LiDAR inputs have shown strong performance, but existing query initialization strategies, such as random sampling or BEV heatmap-based sampling, often result in inefficient query usage and reduced accuracy, particularly for occluded or crowded objects. To address this limitation, we propose ALIGN (Advanced query initialization with LiDAR and Image GuidaNce), a novel approach for occlusion-robust, object-aware query initialization. Our model consists of three key components: (i) Occlusion-aware Center Estimation (OCE), which integrates LiDAR geometry and image semantics to estimate object centers accurately; (ii) Adaptive Neighbor Sampling (ANS), which generates object candidates from LiDAR clustering and supplements each object by sampling spatially and semantically aligned points around it; and (iii) Dynamic Query Balancing (DQB), which adaptively balances queries between foreground and background regions. Our extensive experiments on the nuScenes benchmark demonstrate that ALIGN consistently improves performance across multiple state-of-the-art detectors, achieving gains of up to +0.9 mAP and +1.2 NDS, particularly in challenging scenes with occlusions or dense crowds. Our code will be publicly available upon publication.
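The abstract gives no implementation details, so the following is only a minimal sketch of what object-aware query initialization could look like in BEV coordinates. It is not the authors' method. The function `init_queries`, all of its parameters (`num_queries`, `k_neighbors`, `radius`, `bev_range`), and the rule of giving background whatever budget foreground leaves unused are assumptions made for illustration. Object centers are taken as given (a stand-in for the OCE output), neighbor sampling uses distance only (no image semantics), and background queries are drawn uniformly at random.

```python
import numpy as np

def init_queries(centers, points, num_queries=200, k_neighbors=4,
                 radius=2.0, bev_range=50.0, seed=0):
    """Hypothetical object-aware query initialization in BEV (x, y).

    centers: (M, 2) estimated object centers (stand-in for OCE output).
    points:  (N, 2) LiDAR points projected to BEV.
    Returns (num_queries, 2) query positions and a boolean foreground mask.
    """
    rng = np.random.default_rng(seed)
    fg = []
    for c in centers:
        fg.append(c)  # one query at the object center
        # Neighbor sampling (assumed form): up to k points within `radius`.
        dist = np.linalg.norm(points - c, axis=1)
        near = points[dist <= radius]
        if len(near) > 0:
            take = min(k_neighbors, len(near))
            idx = rng.choice(len(near), size=take, replace=False)
            fg.extend(near[idx])
    # Truncate if the foreground alone would exceed the query budget.
    fg = np.asarray(fg, dtype=float).reshape(-1, 2)[:num_queries]
    # Query balancing (assumed form): the unused budget goes to background.
    num_bg = num_queries - len(fg)
    bg = rng.uniform(-bev_range, bev_range, size=(num_bg, 2))
    queries = np.concatenate([fg, bg], axis=0)
    is_fg = np.arange(num_queries) < len(fg)
    return queries, is_fg

# Toy scene: two objects, four LiDAR points, a budget of ten queries.
centers = np.array([[0.0, 0.0], [10.0, 5.0]])
points = np.array([[0.5, 0.2], [0.1, -0.3], [10.2, 5.1], [30.0, 30.0]])
queries, is_fg = init_queries(centers, points, num_queries=10)
print(queries.shape, int(is_fg.sum()))  # (10, 2) 5
```

In this toy scene the first center picks up two nearby points and the second picks up one, so five of the ten queries are foreground and the remaining five are spread over the background. A crowded scene would shift more of the same budget toward objects, which is the kind of adaptive split the abstract attributes to DQB.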