arxiv_cs_ai April 24, 2026

Wiring the 'Why': A Unified Taxonomy and Survey of Abductive Reasoning in LLMs

Translated: 2026/4/24 20:30:50

abductive-reasoning large-language-models ai-ai-research reasoning-capabilities machine-learning

Japanese Translation

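arXiv:2604.08016v2 Announce Type: replace Abstract: Although it plays a foundational role in human discovery and sense-making, abductive reasoning (inferring the most plausible explanation for an observed fact) has been relatively under-studied in large language models (LLMs). Despite the rapid progress of LLMs, research on abductive reasoning and its many facets has remained fragmented rather than unified. This paper presents the first survey of abductive reasoning in LLMs, tracing its trajectory from philosophical foundations to modern AI implementations. To address the conceptual confusion and fragmented task definitions that are widespread in the field, we establish a unified two-stage definition that formally categorizes prior work. This definition separates abduction into Hypothesis Generation (the stage in which a model bridges epistemic gaps to produce candidate explanations) and Hypothesis Selection (the stage in which the generated candidates are evaluated and the most plausible explanation is chosen). On this foundation, we present a comprehensive taxonomy of the literature, classifying prior work by abductive task, dataset, underlying methodology, and evaluation strategy. To ground our framework empirically, we conduct a compact benchmark study of current LLMs on abductive tasks, with targeted comparative analyses across model sizes, model families, evaluation styles, and the generation-versus-selection task types. Furthermore, by synthesizing recent empirical results, we examine how LLM performance on abductive reasoning relates to deductive and inductive tasks, offering insight into broader reasoning capabilities. Our analysis reveals critical gaps in current approaches, ranging from static benchmark design and narrow domain coverage to narrow training frameworks and a limited mechanistic understanding of abductive processes...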

Original Content

arXiv:2604.08016v2 Announce Type: replace Abstract: Regardless of its foundational role in human discovery and sense-making, abductive reasoning--the inference of the most plausible explanation for an observation--has been relatively underexplored in Large Language Models (LLMs). Despite the rapid advancement of LLMs, the exploration of abductive reasoning and its diverse facets has thus far been disjointed rather than cohesive. This paper presents the first survey of abductive reasoning in LLMs, tracing its trajectory from philosophical foundations to contemporary AI implementations. To address the widespread conceptual confusion and disjointed task definitions prevalent in the field, we establish a unified two-stage definition that formally categorizes prior work. This definition disentangles abduction into Hypothesis Generation, where models bridge epistemic gaps to produce candidate explanations, and Hypothesis Selection, where the generated candidates are evaluated and the most plausible explanation is chosen. Building upon this foundation, we present a comprehensive taxonomy of the literature, categorizing prior work based on their abductive tasks, datasets, underlying methodologies, and evaluation strategies. In order to ground our framework empirically, we conduct a compact benchmark study of current LLMs on abductive tasks, together with targeted comparative analyses across model sizes, model families, evaluation styles, and the distinct generation-versus-selection task typologies. Moreover, by synthesizing recent empirical results, we examine how LLM performance on abductive reasoning relates to deductive and inductive tasks, providing insights into their broader reasoning capabilities. Our analysis reveals critical gaps in current approaches--from static benchmark design and narrow domain coverage to narrow training frameworks and limited mechanistic understanding of abductive processes...
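
The two-stage definition in the abstract (Hypothesis Generation followed by Hypothesis Selection) can be illustrated with a small toy sketch. This is not the survey's method or any benchmark from it: the `Hypothesis` class, the hand-written knowledge base, the `prior` values, and the coverage-times-prior score are all assumptions made up for illustration. In an LLM setting, generation would typically be a model call and selection a learned or prompted scorer.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Hypothesis:
    """A candidate explanation (hypothetical structure, for illustration only)."""
    text: str
    explains: frozenset  # observations this hypothesis would account for
    prior: float         # assumed plausibility prior in [0, 1]


def generate_hypotheses(observations: List[str],
                        knowledge_base: List[Hypothesis]) -> List[Hypothesis]:
    """Stage 1 (generation): keep candidates that explain at least one observation."""
    observed = set(observations)
    return [h for h in knowledge_base if h.explains & observed]


def select_hypothesis(observations: List[str],
                      candidates: List[Hypothesis]) -> Optional[Hypothesis]:
    """Stage 2 (selection): pick the candidate with the highest assumed score."""
    if not candidates:
        return None
    observed = set(observations)

    def score(h: Hypothesis) -> float:
        # Fraction of observations explained, weighted by the assumed prior.
        coverage = len(h.explains & observed) / len(observed)
        return coverage * h.prior

    return max(candidates, key=score)


# Toy knowledge base; every entry and number here is invented.
KB = [
    Hypothesis("it rained overnight", frozenset({"wet grass", "wet street"}), 0.6),
    Hypothesis("the sprinkler ran", frozenset({"wet grass"}), 0.7),
    Hypothesis("a parade passed by", frozenset({"confetti"}), 0.1),
]

observations = ["wet grass", "wet street"]
candidates = generate_hypotheses(observations, KB)
best = select_hypothesis(observations, candidates)
print([h.text for h in candidates])
print(best.text if best else None)
```

Keeping the two stages as separate functions mirrors the generation-versus-selection split the abstract uses to categorize tasks: a selection-only task would supply `candidates` directly, while a generation task would be judged on the output of the first stage.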