arxiv_cs_cv February 10, 2026

Tighnari v2: Mitigating Label Noise and Distribution Shift in Multimodal Plant Distribution Prediction via Mixture of Experts and Weakly Supervised Learning

Translated: 2026/3/15 19:05:51

type-2602-08282 multimodal-fusion machine-learning biodiversity-conservation label-noise

Original Content

arXiv:2602.08282v1 Announce Type: new

Abstract: Large-scale, cross-species plant distribution prediction plays a crucial role in biodiversity conservation, yet modeling efforts in this area still face significant challenges due to the sparsity and bias of observational data. Presence-Absence (PA) data provide accurate and noise-free labels, but are costly to obtain and limited in quantity; Presence-Only (PO) data, by contrast, offer broad spatial coverage and rich spatiotemporal distribution, but suffer from severe label noise in negative samples. To address these real-world constraints, this paper proposes a multimodal fusion framework that fully leverages the strengths of both PA and PO data. We introduce an innovative pseudo-label aggregation strategy for PO data based on the geographic coverage of satellite imagery, enabling geographic alignment between the label space and remote sensing feature space. In terms of model architecture, we adopt Swin Transformer Base as the backbone for satellite imagery, utilize the TabM network for tabular feature extraction, retain the Temporal Swin Transformer for time-series modeling, and employ a stackable serial tri-modal cross-attention mechanism to optimize the fusion of heterogeneous modalities. Furthermore, empirical analysis reveals significant geographic distribution shifts between PA training and test samples, and models trained by directly mixing PO and PA data tend to experience performance degradation due to label noise in PO data. To address this, we draw on the mixture-of-experts paradigm: test samples are partitioned according to their spatial proximity to PA samples, and different models trained on distinct datasets are used for inference and post-processing within each partition. Experiments on the GeoLifeCLEF 2025 dataset demonstrate that our approach achieves superior predictive performance in scenarios with limited PA coverage and pronounced distribution shifts.
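
The partitioning step described in the abstract (routing each test sample to a model according to its spatial proximity to PA samples) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the distance measure, the number of partitions, or any threshold, so the haversine distance, the two-way split, the 10 km default, and the names `haversine_km`, `route_test_samples`, and `predict_with_experts` are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used by the haversine formula


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def route_test_samples(test_coords, pa_train_coords, threshold_km=10.0):
    """Label each test point 'near_pa' or 'far_from_pa'.

    The two-way split and threshold_km are assumptions; the abstract only says
    that test samples are partitioned by spatial proximity to PA samples.
    pa_train_coords must be non-empty.
    """
    routes = []
    for lat, lon in test_coords:
        # Distance to the closest PA training location (brute force, O(n * m)).
        nearest = min(haversine_km(lat, lon, p_lat, p_lon) for p_lat, p_lon in pa_train_coords)
        routes.append("near_pa" if nearest <= threshold_km else "far_from_pa")
    return routes


def predict_with_experts(test_coords, pa_train_coords, experts, threshold_km=10.0):
    """Run the expert chosen for each test point's partition.

    `experts` is a hypothetical dict mapping a partition name to a callable
    that takes a (lat, lon) pair and returns a prediction.
    """
    routes = route_test_samples(test_coords, pa_train_coords, threshold_km)
    return [experts[route](coord) for coord, route in zip(test_coords, routes)]
```

In this sketch each partition name would map to a model trained on a different dataset, for example one trained on PA data for test points close to PA sites and another that also draws on PO data for points far from any PA site. Which training set serves which partition is likewise an assumption here, and for large datasets a spatial index would replace the brute-force nearest-neighbour search.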