arxiv_cs_cv April 20, 2026

Towards Realistic Open-Vocabulary Remote Sensing Segmentation: Benchmark and Baseline

Translated: 2026/4/20 10:41:08

open-vocabulary, remote-sensing, segmentation, benchmarks, image-parsing

Original Content

arXiv:2604.15652v1 Announce Type: new

Abstract: Open-vocabulary remote sensing image segmentation (OVRSIS) remains underexplored due to fragmented datasets, limited training diversity, and the lack of evaluation benchmarks that reflect realistic geospatial application demands. Our previous OVRSISBenchV1 established an initial cross-dataset evaluation protocol, but its limited scope is insufficient for assessing realistic open-world generalization. To address this issue, we propose OVRSISBenchV2, a large-scale and application-oriented benchmark for OVRSIS. We first construct OVRSIS95K, a balanced dataset of about 95K image-mask pairs covering 35 common semantic categories across diverse remote sensing scenes. Built upon OVRSIS95K and 10 downstream datasets, OVRSISBenchV2 contains 170K images and 128 categories, substantially expanding scene diversity, semantic coverage, and evaluation difficulty. Beyond standard open-vocabulary segmentation, it further includes downstream protocols for building extraction, road extraction, and flood detection, thereby better reflecting realistic geospatial application demands and complex deployment scenarios. We also propose Pi-Seg, a baseline for OVRSIS. Pi-Seg improves transferability through a positive-incentive noise mechanism, where learnable and semantically guided perturbations broaden the visual-text feature space during training. Extensive experiments on OVRSISBenchV1, OVRSISBenchV2, and downstream tasks show that Pi-Seg delivers strong and consistent results, particularly on the more challenging OVRSISBenchV2 benchmark. Our results highlight both the importance of realistic benchmark design and the effectiveness of perturbation-based transfer for OVRSIS. The code and datasets are available at https://github.com/LiBingyu01/RSKT-Seg/tree/Pi-Seg.
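The abstract describes the positive-incentive noise mechanism only at a high level, so the following is a rough illustration of the general idea of a semantically guided perturbation, not the Pi-Seg implementation. It assumes, purely for illustration, that the noise added to a visual feature blends the direction of a class text embedding with Gaussian noise. The names `perturb_feature`, `scale`, and `mix` are hypothetical; in a trained model the last two would be learnable parameters rather than fixed numbers.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit L2 norm (an all-zero vector is returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def perturb_feature(visual, text, scale, mix, rng):
    """Return a perturbed copy of one visual feature vector.

    ASSUMPTION (not from the paper): the noise is a blend of the text
    embedding direction (the semantic guidance) and isotropic Gaussian
    noise. `scale` and `mix` stand in for learnable parameters.
    """
    direction = normalize(text)
    gaussian = [rng.gauss(0.0, 1.0) for _ in visual]
    noise = [mix * d + (1.0 - mix) * g for d, g in zip(direction, gaussian)]
    return [v + scale * n for v, n in zip(visual, noise)]


rng = random.Random(0)
visual = [1.0, 0.0, 0.0]   # toy visual feature
text = [0.0, 2.0, 0.0]     # toy text embedding for one category

# Fully semantic noise (mix=1.0) pushes the feature toward the text direction.
guided = perturb_feature(visual, text, scale=0.5, mix=1.0, rng=rng)

# With scale=0.0 the feature is left untouched, as one would want at inference.
unchanged = perturb_feature(visual, text, scale=0.0, mix=0.3, rng=rng)
```

In this toy setup, `guided` moves closer to the text embedding in cosine similarity, which is one simple way perturbations can widen the region of feature space seen during training. Since the abstract places the perturbation in the training phase, setting `scale` to zero models the unperturbed path used at evaluation time.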