arxiv_cs_cv February 10, 2026

Building Damage Detection using Satellite Images and Patch-Based Transformer Methods

Translated: 2026/3/15 19:04:40

vision-transformer, satellite-image-analysis, patch-based-methods, disaster-response, dino2

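Translated Abstract

arXiv:2602.08117v1 Announce Type: new Rapid building damage assessment is critical for post-disaster response, and damage classification models built on satellite imagery offer a scalable way to obtain situational awareness. However, label noise and severe class imbalance in satellite data remain major challenges. The xBD dataset provides a standardized benchmark for building-level damage across diverse geographic regions. This study evaluates Vision Transformer (ViT) models on the xBD dataset, investigating how well they distinguish between types of structural damage when trained on noisy, imbalanced data. Specifically, DINOv2-small and DeiT are evaluated for multi-class damage classification. The authors propose a targeted patch-based pre-processing pipeline that isolates structural features and minimizes background noise during training, and adopt a frozen-head fine-tuning strategy to keep computational requirements manageable. Model performance is evaluated using accuracy, precision, recall, and macro-averaged F1 score. The results show that small ViT architectures trained with the proposed method achieve macro-averaged F1 scores competitive with prior CNN baselines for disaster classification.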

Original Content

arXiv:2602.08117v1 Announce Type: new Abstract: Rapid building damage assessment is critical for post-disaster response. Damage classification models built on satellite imagery provide a scalable means of obtaining situational awareness. However, label noise and severe class imbalance in satellite data create major challenges. The xBD dataset offers a standardized benchmark for building-level damage across diverse geographic regions. In this study, we evaluate Vision Transformer (ViT) model performance on the xBD dataset, investigating how these models distinguish between types of structural damage when trained on noisy, imbalanced data. Specifically, we evaluate DINOv2-small and DeiT for multi-class damage classification. We propose a targeted patch-based pre-processing pipeline to isolate structural features and minimize background noise in training. We adopt a frozen-head fine-tuning strategy to keep computational requirements manageable. Model performance is evaluated through accuracy, precision, recall, and macro-averaged F1 scores. We show that small ViT architectures with our novel training method achieve competitive macro-averaged F1 relative to prior CNN baselines for disaster classification.
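
The abstract reports macro-averaged F1 alongside accuracy, which matters under severe class imbalance. The sketch below is not the authors' evaluation code. It is a minimal, standard-library illustration of how macro-averaged F1 is computed and why it can diverge sharply from accuracy. The four label names and the toy predictions are assumptions chosen for illustration, and the helper name macro_f1 is hypothetical.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels):
    """Return (macro-averaged F1, per-class F1 dict) over the given labels.

    A class with no true positives gets an F1 of 0.0, which is one common
    convention for undefined precision or recall.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    per_class = {}
    for c in labels:
        precision = tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if (tp[c] + fn[c]) else 0.0
        denom = precision + recall
        per_class[c] = 2 * precision * recall / denom if denom else 0.0

    # Macro averaging: every class counts equally, regardless of frequency.
    return sum(per_class.values()) / len(labels), per_class


# Assumed label set and toy data (illustrative only, not results from the paper).
LABELS = ["no-damage", "minor-damage", "major-damage", "destroyed"]
y_true = ["no-damage"] * 7 + ["minor-damage", "major-damage", "destroyed"]
y_pred = ["no-damage"] * 10  # a degenerate model that always predicts the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score, per_class = macro_f1(y_true, y_pred, LABELS)

print(f"accuracy = {accuracy:.3f}")
print(f"macro F1 = {score:.3f}")
```

In this toy case the majority-class predictor reaches 70% accuracy, yet its macro-averaged F1 is far lower because three of the four classes score zero. This is the reason a macro-averaged metric is a more informative headline number on imbalanced damage labels.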