arxiv_cs_cv February 10, 2026

Understanding and Optimizing Attention-Based Sparse Matching for Diverse Local Features

attention-based-spars-matching, lightglue, zero-shot, image-matching, transformer-based

Original Content

arXiv:2602.08430v1 Announce Type: new

Abstract: We revisit the problem of training attention-based sparse image matching models for various local features. We first identify one critical, previously overlooked design choice that significantly impacts the performance of the LightGlue model. We then investigate the roles of detectors and descriptors within the transformer-based matching framework and find that detectors, rather than descriptors, are often the primary cause of performance differences. Finally, we propose a novel approach to fine-tune existing image matching models using keypoints from a diverse set of detectors, resulting in a universal, detector-agnostic model. When deployed as a zero-shot matcher for novel detectors, the resulting model matches or exceeds the accuracy of models specifically trained for those features. Our findings offer valuable insights for the deployment of transformer-based matching models and the future design of local features.