arxiv_cs_lg, February 10, 2026

CausalCompass: Evaluating the Robustness of Time-Series Causal Discovery in Misspecified Scenarios

Translated: 2026/3/15 14:49:21

causal-discovery, time-series-analysis, machine-learning, benchmarking, robustness-evaluation

Original Content

arXiv:2602.07915v1 Announce Type: new

Abstract: Causal discovery from time series is a fundamental task in machine learning. However, its widespread adoption is hindered by a reliance on untestable causal assumptions and by the lack of robustness-oriented evaluation in existing benchmarks. To address these challenges, we propose CausalCompass, a flexible and extensible benchmark suite designed to assess the robustness of time-series causal discovery (TSCD) methods under violations of modeling assumptions. To demonstrate the practical utility of CausalCompass, we conduct extensive benchmarking of representative TSCD algorithms across eight assumption-violation scenarios. Our experimental results indicate that no single method consistently attains optimal performance across all settings. Nevertheless, the methods exhibiting superior overall performance across diverse scenarios are almost invariably deep learning-based approaches. We further provide hyperparameter sensitivity analyses to deepen the understanding of these findings. We also find, somewhat surprisingly, that NTS-NOTEARS relies heavily on standardized preprocessing in practice, performing poorly in the vanilla setting but exhibiting strong performance after standardization. Finally, our work aims to provide a comprehensive and systematic evaluation of TSCD methods under assumption violations, thereby facilitating their broader adoption in real-world applications. The code and datasets are available at https://github.com/huiyang-yi/CausalCompass.
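The abstract does not describe how a robustness benchmark of this kind is wired together, so the following is only a minimal, hypothetical sketch of the general idea, not CausalCompass code or its API. It assumes a linear lag-1 VAR with a known three-variable graph, applies one illustrative assumption violation (additive measurement error, chosen here as an example; the abstract does not list its eight scenarios), also applies per-variable z-score standardization, runs a simple Granger-style lagged-OLS baseline with a hypothetical threshold `tau`, and scores the recovered graph by F1. Every name (`simulate_var1`, `add_measurement_error`, `standardize`, `lagged_ols_graph`, `f1`, `TRUE_W`) is an illustrative assumption, and the baseline is not NTS-NOTEARS or any method from the paper.

```python
import random
import statistics

# Hypothetical ground truth: lag-1 edges i -> j with weight TRUE_W[i][j] (0 -> 1 -> 2).
TRUE_W = [
    [0.0, 0.8, 0.0],
    [0.0, 0.0, 0.7],
    [0.0, 0.0, 0.0],
]

def simulate_var1(W, T, seed=0):
    """Simulate x_t[j] = sum_i W[i][j] * x_{t-1}[i] + N(0, 1) noise."""
    rng = random.Random(seed)
    d = len(W)
    x = [[rng.gauss(0, 1) for _ in range(d)]]
    for _ in range(T - 1):
        prev = x[-1]
        x.append([sum(W[i][j] * prev[i] for i in range(d)) + rng.gauss(0, 1)
                  for j in range(d)])
    return x

def add_measurement_error(x, scale, seed=1):
    """Example violation (assumed): observe each value plus independent Gaussian noise."""
    rng = random.Random(seed)
    return [[v + rng.gauss(0, scale) for v in row] for row in x]

def standardize(x):
    """Per-variable z-scoring: subtract the mean, divide by the population std."""
    cols = list(zip(*x))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in x]

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def lagged_ols_graph(x, tau=0.3):
    """Granger-style baseline: regress x_t[j] on all of x_{t-1} (no intercept,
    data assumed roughly zero-mean) and keep edge i -> j if |coef| > tau."""
    d = len(x[0])
    past, present = x[:-1], x[1:]
    XtX = [[sum(p[a] * p[b] for p in past) for b in range(d)] for a in range(d)]
    est = [[0] * d for _ in range(d)]
    for j in range(d):
        Xty = [sum(p[a] * q[j] for p, q in zip(past, present)) for a in range(d)]
        coef = solve(XtX, Xty)
        for i in range(d):
            if i != j and abs(coef[i]) > tau:  # self-lags are not scored
                est[i][j] = 1
    return est

def f1(true_W, est):
    """F1 score over directed off-diagonal edges."""
    d = len(true_W)
    pairs = [(i, j) for i in range(d) for j in range(d) if i != j]
    tp = sum(1 for i, j in pairs if true_W[i][j] != 0 and est[i][j])
    fp = sum(1 for i, j in pairs if true_W[i][j] == 0 and est[i][j])
    fn = sum(1 for i, j in pairs if true_W[i][j] != 0 and not est[i][j])
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

if __name__ == "__main__":
    x = simulate_var1(TRUE_W, T=2000)
    scenarios = {
        "vanilla": x,
        "measurement_error": add_measurement_error(x, scale=2.0),
        "standardized": standardize(x),
    }
    for name, data in scenarios.items():
        print(name, round(f1(TRUE_W, lagged_ols_graph(data)), 3))
```

In this toy setup, measurement error attenuates the lagged coefficients toward zero, so with enough noise the true edges can fall below `tau`. Because the threshold is applied to raw coefficients, rescaling the variables also changes which edges pass it; this is one simple way a thresholded method can be sensitive to preprocessing, although the abstract does not state the mechanism behind the NTS-NOTEARS result. A real benchmark would sweep many methods, scenarios, and hyperparameters rather than one fixed threshold.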