arxiv_cs_ai April 24, 2026

Ideological Bias in LLMs' Economic Causal Reasoning

large-language-models, economic-causal-reasoning, ideological-bias, econbench, machine-learning-evaluation

Original Content

arXiv:2604.21334v1
Announce Type: new

Abstract: Do large language models (LLMs) exhibit systematic ideological bias when reasoning about economic causal effects? As LLMs are increasingly used in policy analysis and economic reporting, where directionally correct causal judgments are essential, this question has direct practical stakes. We present a systematic evaluation by extending the EconCausal benchmark with ideology-contested cases: instances where intervention-oriented (pro-government) and market-oriented (pro-market) perspectives predict divergent causal signs. From 10,490 causal triplets (treatment-outcome pairs with empirically verified effect directions) derived from top-tier economics and finance journals, we identify 1,056 ideology-contested instances and evaluate 20 state-of-the-art LLMs on their ability to predict empirically supported causal directions. We find that ideology-contested items are consistently harder than non-contested ones, and that across 18 of 20 models, accuracy is systematically higher when the empirically verified causal sign aligns with intervention-oriented expectations than with market-oriented ones. Moreover, when models err, their incorrect predictions disproportionately lean intervention-oriented, and this directional skew is not eliminated by one-shot in-context prompting. These results highlight that LLMs are not only less accurate on ideologically contested economic questions, but systematically less reliable in one ideological direction than the other, underscoring the need for direction-aware evaluation in high-stakes economic and policy settings.
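
The direction-aware evaluation the abstract calls for can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Item` record, its field names, the `direction_aware_report` function, the rule that an item counts as contested when the two views expect different signs, and the toy records. None of it is taken from the paper or from the EconCausal benchmark, and the numbers it produces say nothing about any real model.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Item:
    """One causal triplet plus a model prediction (hypothetical field names)."""
    true_sign: str          # empirically verified effect direction, e.g. "+" or "-"
    intervention_sign: str  # sign expected under the intervention-oriented view
    market_sign: str        # sign expected under the market-oriented view
    predicted_sign: str     # sign predicted by the model under evaluation


def _accuracy(subset: List[Item]) -> float:
    """Share of items whose predicted sign matches the verified sign (NaN if empty)."""
    if not subset:
        return float("nan")
    return sum(it.predicted_sign == it.true_sign for it in subset) / len(subset)


def direction_aware_report(items: Iterable[Item]) -> Dict[str, float]:
    """Split accuracy by contestedness and by which view the verified sign supports."""
    items = list(items)
    # Assumption: an item is contested when the two views expect different signs.
    contested = [it for it in items if it.intervention_sign != it.market_sign]
    non_contested = [it for it in items if it.intervention_sign == it.market_sign]
    # Contested items grouped by the view that the verified sign agrees with.
    truth_intervention = [it for it in contested if it.true_sign == it.intervention_sign]
    truth_market = [it for it in contested if it.true_sign == it.market_sign]
    # Among wrong answers on contested items, count those matching the intervention view.
    errors = [it for it in contested if it.predicted_sign != it.true_sign]
    lean_intervention = [it for it in errors if it.predicted_sign == it.intervention_sign]
    return {
        "acc_contested": _accuracy(contested),
        "acc_non_contested": _accuracy(non_contested),
        "acc_truth_intervention": _accuracy(truth_intervention),
        "acc_truth_market": _accuracy(truth_market),
        "error_share_intervention": (
            len(lean_intervention) / len(errors) if errors else float("nan")
        ),
    }


# Toy records, invented purely to exercise the function.
toy_items = [
    Item("+", "+", "-", "+"),  # contested, truth matches intervention view, correct
    Item("+", "+", "-", "+"),  # contested, truth matches intervention view, correct
    Item("-", "+", "-", "+"),  # contested, truth matches market view, wrong
    Item("-", "+", "-", "-"),  # contested, truth matches market view, correct
    Item("+", "-", "+", "-"),  # contested, truth matches market view, wrong
    Item("-", "-", "+", "+"),  # contested, truth matches intervention view, wrong
    Item("+", "+", "+", "+"),  # non-contested, correct
    Item("-", "-", "-", "-"),  # non-contested, correct
]

report = direction_aware_report(toy_items)
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```

Reporting `acc_truth_intervention` and `acc_truth_market` separately, rather than one pooled accuracy, is what makes a directional gap visible. A pooled score on contested items would hide whether errors cluster on one side.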