arxiv_cs_cv April 20, 2026

From Limited Labels to Open Domains: An Efficient Learning Method for Drone-view Geo-Localization

Translated: 2026/4/20 10:49:12

drone-view, geo-localization, cross-domain, supervised-learning, unpaired-data

Original Content

arXiv:2503.07520v5 Announce Type: replace

Abstract: Traditional supervised drone-view geo-localization (DVGL) methods depend heavily on paired training data and have difficulty learning cross-view correlations from unpaired data. Moreover, when deployed in a new domain, these methods require collecting new paired data and retraining the model to adapt it, which significantly increases computational overhead. Existing unsupervised methods generate pseudo-labels based on cross-view similarity to infer the pairing relationships. However, geographical similarity and spatial continuity often produce visually analogous features at different geographical locations. This feature confusion compromises the reliability of pseudo-label generation, and incorrect pseudo-labels drive negative optimization. Given these challenges inherent in both supervised and unsupervised DVGL methods, we propose a novel cross-domain invariant knowledge transfer network (CDIKTNet) with limited supervision, whose architecture consists of a cross-domain invariance sub-network (CDIS) and a cross-domain transfer sub-network (CDTS). This architecture facilitates a closed-loop framework for invariant feature learning and knowledge transfer. The CDIS is designed to learn cross-view structural and spatial invariance from a small amount of paired data that serves as prior knowledge. It endows the shared feature space of unpaired data with similar implicit cross-view correlations at initialization, which alleviates feature confusion. Building on this, the CDTS employs dual-path contrastive learning to further optimize each subspace while preserving consistency in the shared feature space. Extensive experiments demonstrate that CDIKTNet achieves state-of-the-art performance under full supervision compared with existing supervised methods, and further surpasses existing unsupervised methods in both few-shot and cross-domain initialization.
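The feature-confusion problem described in the abstract can be illustrated with a minimal sketch. This is not the CDIKTNet implementation: it assumes pseudo-labels are assigned by a plain cosine-similarity argmax, and the function names (`cosine`, `pseudo_labels`) and the three-dimensional feature vectors are hypothetical toy values chosen by hand, not data from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pseudo_labels(drone_feats, sat_feats):
    """Assign each drone feature to its most similar satellite feature.

    Returns a list of (best_index, margin) pairs, where margin is the
    top-1 similarity minus the top-2 similarity. A small margin means
    two candidate locations are nearly indistinguishable.
    """
    results = []
    for d in drone_feats:
        sims = [cosine(d, s) for s in sat_feats]
        ranked = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)
        best, runner_up = ranked[0], ranked[1]
        results.append((best, sims[best] - sims[runner_up]))
    return results


# Hypothetical toy features: satellite tiles 0 and 1 stand for neighbouring,
# look-alike locations; tile 2 is visually distinct.
sat_feats = [[1.0, 0.0, 0.1], [0.9, 0.1, 0.1], [0.0, 1.0, 0.0]]
# Drone views whose true locations are 0, 1 and 2, in that order.
drone_feats = [[0.92, 0.08, 0.1], [0.9, 0.12, 0.1], [0.1, 0.95, 0.0]]
true_locations = [0, 1, 2]

labels = pseudo_labels(drone_feats, sat_feats)
for (pred, margin), truth in zip(labels, true_locations):
    status = "ok" if pred == truth else "WRONG"
    print(f"true={truth} pseudo={pred} margin={margin:.4f} {status}")
```

In this toy setup the first drone view is assigned to tile 1 instead of tile 0, with a very small margin, because the two neighbouring tiles have almost identical features. Training on such a label would pull the drone feature toward the wrong location, which is the negative optimization the abstract refers to. Seeding the feature space with a few known pairs, as the abstract describes for the CDIS, is aimed at reducing this kind of ambiguity before pseudo-labels are generated.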