arxiv_cs_lg April 24, 2026

Assessing the Robustness of Climate Foundation Models under No-Analog Distribution Shifts

climate-modeling, machine-learning, distribution-shift, climate-change, earth-system-models

Original Content

arXiv:2603.23043v2 Announce Type: replace

Abstract: The accelerating pace of climate change introduces profound non-stationarities that challenge the ability of machine-learning-based climate emulators to generalize beyond their training distributions. While these emulators offer computationally efficient alternatives to traditional Earth System Models, their reliability remains a potential bottleneck under "no-analog" future climate states, which we define here as regimes where external forcing drives the system into conditions outside the empirical range of the historical training data. A fundamental challenge in evaluating this reliability is data contamination: because many models are trained on simulations that already encompass future scenarios, true out-of-distribution (OOD) performance is often masked. To address this, we benchmark the OOD robustness of three state-of-the-art architectures: U-Net, ConvLSTM, and the ClimaX foundation model specifically restricted to a historical-only training regime (1850-2014). We evaluate these models using two complementary strategies: (i) temporal extrapolation to the recent climate (2015-2023) and (ii) cross-scenario forcing shifts across divergent emission pathways. Our analysis within this experimental setup reveals an accuracy vs. stability trade-off: while the ClimaX foundation model achieves the lowest absolute error, it exhibits higher relative performance changes under distribution shifts, with precipitation errors increasing by up to 8.44% under extreme forcing scenarios. These findings suggest that, when restricted to historical training dynamics, even high-capacity foundation models are sensitive to external forcing trajectories. Our results underscore the necessity of scenario-aware training and rigorous OOD evaluation protocols to ensure the robustness of climate emulators under a changing climate.
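The abstract separates absolute error from relative performance change under shift, and a small sketch may make that distinction concrete. The Python below is illustrative only and is not the authors' evaluation code. The abstract does not name the error metric or give the formula for relative change, so RMSE and a simple percentage change against the in-distribution error are assumptions. The helper names (`rmse`, `relative_change_pct`, `split_by_year`) and the record layout are hypothetical, and the numbers in the demo are toy values, not results from the paper.

```python
import math


def rmse(pred, target):
    """Root-mean-square error between two equal-length sequences (assumed metric)."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and of equal length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


def relative_change_pct(in_dist_error, ood_error):
    """Percentage change of the OOD error relative to the in-distribution error.

    This formula is an assumption; the abstract only reports a percentage.
    """
    if in_dist_error <= 0:
        raise ValueError("in-distribution error must be positive")
    return 100.0 * (ood_error - in_dist_error) / in_dist_error


def split_by_year(records, train_end=2014, test_start=2015, test_end=2023):
    """Historical-only split: train on years <= train_end, test on the later window.

    `records` is a hypothetical list of dicts with a "year" key. The default
    years follow the periods quoted in the abstract.
    """
    train = [r for r in records if r["year"] <= train_end]
    test = [r for r in records if test_start <= r["year"] <= test_end]
    return train, test


if __name__ == "__main__":
    # Toy (in-distribution error, OOD error) pairs, invented for illustration.
    toy_errors = {"model_a": (2.0, 2.2), "model_b": (3.0, 3.06)}
    for name, (err_in, err_ood) in toy_errors.items():
        change = relative_change_pct(err_in, err_ood)
        print(f"{name}: OOD error={err_ood:.2f}, relative change={change:.2f}%")
```

In the toy numbers, `model_a` has the lower absolute error in both settings but degrades more in relative terms. That is the shape of the accuracy vs. stability trade-off the abstract describes, and it is why both quantities are worth reporting side by side.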