arxiv_cs_ai February 10, 2026

Effect-Level Validation for Causal Discovery

Translated: 2026/3/7 10:05:10

generalization, causal inference, telemetry-driven system

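Translated Summary

Causal discovery is increasingly used on large-scale telemetry data to estimate the effects of user-facing interventions, but it is unclear how reliable it is for decision-making in feedback-driven systems with strong self-selection. The paper proposes an effect-centric, admissibility-first framework that treats discovered graphs as structural hypotheses and evaluates them on identifiability, stability, and falsification, not on graph recovery accuracy alone. As an empirical study, the authors examine how early exposure to competitive gameplay affects short-term retention, using real-world game telemetry. They find that many statistically plausible discovery outputs admit no point-identified causal query once minimal temporal and semantic constraints are enforced, which makes identifiability a critical bottleneck for decision support. Where identification is possible, several algorithm families converge on similar, decision-consistent effect estimates even though their graph structures differ substantially, including cases where the direct treatment-outcome edge is absent and the effect is carried by indirect causal pathways. These converging estimates survive placebo, subsampling, and sensitivity refutation. Other methods, in contrast, are only sporadically admissible and show threshold-sensitive or attenuated effects because of endpoint ambiguity. The results suggest that graph-level metrics alone are inadequate proxies for causal reliability on a given target query. Trustworthy causal conclusions in telemetry-driven systems therefore require prioritizing admissibility and effect-level validation over causal structure recovery alone.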

Original Content

arXiv:2602.08340v1 Announce Type: new

Abstract: Causal discovery is increasingly applied to large-scale telemetry data to estimate the effects of user-facing interventions, yet its reliability for decision-making in feedback-driven systems with strong self-selection remains unclear. In this paper, we propose an effect-centric, admissibility-first framework that treats discovered graphs as structural hypotheses and evaluates them by identifiability, stability, and falsification rather than by graph recovery accuracy alone. Empirically, we study the effect of early exposure to competitive gameplay on short-term retention using real-world game telemetry. We find that many statistically plausible discovery outputs do not admit point-identified causal queries once minimal temporal and semantic constraints are enforced, highlighting identifiability as a critical bottleneck for decision support. When identification is possible, several algorithm families converge to similar, decision-consistent effect estimates despite producing substantially different graph structures, including cases where the direct treatment-outcome edge is absent and the effect is preserved through indirect causal pathways. These converging estimates survive placebo, subsampling, and sensitivity refutation. In contrast, other methods exhibit sporadic admissibility and threshold-sensitive or attenuated effects due to endpoint ambiguity. These results suggest that graph-level metrics alone are inadequate proxies for causal reliability for a given target query. Therefore, trustworthy causal conclusions in telemetry-driven systems require prioritizing admissibility and effect-level validation over causal structural recovery alone.
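The abstract does not give an implementation, so the following is only a minimal sketch of what an admissibility-first, effect-level check might look like. Everything in it is an assumption made for illustration: the variable names, the `TIERS` ordering, the synthetic data (true effect fixed at 0.2), and the reading of "admissible" as "every edge runs forward in time and the treatment has a directed path to the outcome". The effect estimator is a plain stratified backdoor adjustment on one binary confounder, not the authors' method. Placebo and subsampling refutation are shown; sensitivity analysis is left out.

```python
import random

# Hypothetical measurement order: a lower tier is observed earlier.
TIERS = {"skill": 0, "early_competitive": 1, "sessions": 2, "retained": 3}


def is_admissible(edges, tiers, treatment, outcome):
    """Keep a graph only if every edge runs strictly forward in time and
    the treatment has a directed path to the outcome."""
    if any(tiers[u] >= tiers[v] for u, v in edges):
        return False  # some edge violates the temporal order
    children = {}
    for u, v in edges:
        children.setdefault(u, []).append(v)
    stack, seen = [treatment], set()
    while stack:  # depth-first search for a treatment -> outcome path
        node = stack.pop()
        if node == outcome:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(children.get(node, []))
    return False


def simulate(n, rng):
    """Synthetic rows (skill, treated, retained); the true effect is 0.2."""
    rows = []
    for _ in range(n):
        skill = int(rng.random() < 0.5)
        treated = int(rng.random() < (0.7 if skill else 0.3))  # self-selection
        retained = int(rng.random() < 0.2 + 0.3 * skill + 0.2 * treated)
        rows.append((skill, treated, retained))
    return rows


def adjusted_effect(rows):
    """Backdoor adjustment on the binary confounder by stratification."""
    total = 0.0
    for s in (0, 1):
        stratum = [r for r in rows if r[0] == s]
        y1 = [r[2] for r in stratum if r[1] == 1]
        y0 = [r[2] for r in stratum if r[1] == 0]
        diff = sum(y1) / len(y1) - sum(y0) / len(y0)
        total += diff * len(stratum) / len(rows)
    return total


def placebo_effect(rows, rng):
    """Shuffle the treatment column; the estimate should then be near zero."""
    shuffled = [r[1] for r in rows]
    rng.shuffle(shuffled)
    return adjusted_effect([(r[0], t, r[2]) for r, t in zip(rows, shuffled)])


def subsample_effects(rows, rng, repeats=5, fraction=0.5):
    """Re-estimate on random subsamples to check stability."""
    k = int(len(rows) * fraction)
    return [adjusted_effect(rng.sample(rows, k)) for _ in range(repeats)]


if __name__ == "__main__":
    rng = random.Random(0)
    direct = [("skill", "early_competitive"), ("skill", "retained"),
              ("early_competitive", "retained")]
    indirect = [("skill", "early_competitive"), ("skill", "retained"),
                ("early_competitive", "sessions"), ("sessions", "retained")]
    for name, graph in (("direct", direct), ("indirect", indirect)):
        print(name, is_admissible(graph, TIERS, "early_competitive", "retained"))
    rows = simulate(20000, rng)
    print("adjusted:", adjusted_effect(rows))
    print("placebo:", placebo_effect(rows, rng))
    print("subsamples:", subsample_effects(rows, rng))
```

In this toy setup the `direct` and `indirect` graphs differ in structure but both pass the check and both imply the same adjustment set (`skill`), so they yield the same estimate. This loosely mirrors the abstract's point that effect estimates can agree even when the direct treatment-outcome edge is absent.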