arxiv_cs_lg April 24, 2026

WildFireVQA: A Large-Scale Radiometric Thermal VQA Benchmark for Aerial Wildfire Monitoring

wildfirevqa, thermal-imaging, remote-sensing, large-scale-benchmark

arXiv:2604.20190v1
Announce Type: cross

Abstract: Wildfire monitoring requires timely, actionable situational awareness from airborne platforms, yet existing aerial visual question answering (VQA) benchmarks do not evaluate wildfire-specific multimodal reasoning grounded in thermal measurements. We introduce WildFireVQA, a large-scale VQA benchmark for aerial wildfire monitoring that integrates RGB imagery with radiometric thermal data. WildFireVQA contains 6,097 RGB-thermal samples, where each sample includes an RGB image, a color-mapped thermal visualization, and a radiometric thermal TIFF, and is paired with 34 questions, yielding a total of 207,298 multiple-choice questions spanning presence and detection, classification, distribution and segmentation, localization and direction, cross-modal reasoning, and flight planning for operational wildfire intelligence. To improve annotation reliability, we combine multimodal large language model (MLLM)-based answer generation with sensor-driven deterministic labeling, manual verification, and intra-frame and inter-frame consistency checks. We further establish a comprehensive evaluation protocol for representative MLLMs under RGB, Thermal, and retrieval-augmented settings using radiometric thermal statistics. Experiments show that across task categories, RGB remains the strongest modality for current models, while retrieved thermal context yields gains for stronger MLLMs, highlighting both the value of temperature-grounded reasoning and the limitations of existing MLLMs in safety-critical wildfire scenarios. The dataset and benchmark code are open-source at https://github.com/mobiiin/WildFire_VQA.
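
The abstract mentions sensor-driven deterministic labeling and retrieval-augmented evaluation using radiometric thermal statistics, but it does not spell out either procedure. The following is a minimal Python sketch of what such steps could look like, not the authors' implementation. It assumes the radiometric TIFF has already been decoded into a 2D grid of per-pixel temperatures in degrees Celsius. The function names (`thermal_stats`, `fire_present`, `thermal_context`), the 150 C threshold, and the wording of the context string are all hypothetical choices made for illustration.

```python
from statistics import mean

# Hypothetical threshold in degrees Celsius; the abstract does not state one.
FIRE_THRESHOLD_C = 150.0


def thermal_stats(temps_c, threshold_c=FIRE_THRESHOLD_C):
    """Summarize a 2D grid of per-pixel temperatures (degrees Celsius)."""
    flat = [t for row in temps_c for t in row]
    hot = sum(1 for t in flat if t >= threshold_c)
    return {
        "min_c": min(flat),
        "max_c": max(flat),
        "mean_c": mean(flat),
        "hot_fraction": hot / len(flat),
    }


def fire_present(stats):
    """Deterministic label: fire is present if any pixel reaches the threshold."""
    return stats["hot_fraction"] > 0.0


def thermal_context(stats):
    """Render the statistics as a short text snippet that could be added to a prompt."""
    return (
        f"Thermal statistics: min {stats['min_c']:.1f} C, "
        f"max {stats['max_c']:.1f} C, mean {stats['mean_c']:.1f} C, "
        f"{stats['hot_fraction']:.0%} of pixels at or above threshold."
    )


# Toy 2x3 frame standing in for a decoded radiometric TIFF.
frame = [[20.0, 25.0, 30.0], [400.0, 35.0, 30.0]]
stats = thermal_stats(frame)
print(fire_present(stats))
print(thermal_context(stats))

# Sanity check on the benchmark size quoted in the abstract:
# 6,097 samples times 34 questions per sample.
print(6097 * 34)  # 207298
```

Because the label depends only on measured temperatures and a fixed threshold, the same frame always yields the same answer, which is the property that makes this kind of labeling deterministic. The text snippet illustrates one way temperature statistics might be supplied to a model alongside the image in a retrieval-augmented setting.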