arxiv_cs_ai February 10, 2026

The Refutability Gap: Challenges in Validating Reasoning by Large Language Models

Translated: 2026/2/14 8:11:32

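Japanese Translation (rendered in English)

The "refutability gap" calls into question recent reports that large language models (LLMs) can derive new science and possess human-level general intelligence. Such claims do not satisfy Popper's refutability principle (often termed falsifiability), which requires that a scientific statement be capable of being disproven. The authors raise several methodological concerns about current AI research on reasoning: the novelty of a finding cannot be verified because training data is opaque and not searchable; results cannot be reproduced because models are continuously updated; and transcripts of human interaction are omitted, which obscures the true source of a scientific discovery. In addition, the absence of counterfactuals and of data on failed attempts creates a selection bias that may exaggerate LLM capabilities. To address these challenges, the authors propose guidelines for scientific transparency and reproducibility in research on reasoning by LLMs. Establishing such guidelines matters both for scientific integrity and for the ongoing societal debate over fair data usage.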

Original Content

arXiv:2601.02380v2
Announce Type: replace-cross

Abstract: Recent reports claim that Large Language Models (LLMs) have achieved the ability to derive new science and exhibit human-level general intelligence. We argue that such claims are not rigorous scientific claims, as they do not satisfy Popper's refutability principle (often termed falsifiability), which requires that scientific statements be capable of being disproven. We identify several methodological pitfalls in current AI research on reasoning, including the inability to verify the novelty of findings due to opaque and non-searchable training data, the lack of reproducibility caused by continuous model updates, and the omission of human-interaction transcripts, which obscures the true source of scientific discovery. Additionally, the absence of counterfactuals and data on failed attempts creates a selection bias that may exaggerate LLM capabilities. To address these challenges, we propose guidelines for scientific transparency and reproducibility for research on reasoning by LLMs. Establishing such guidelines is crucial for both scientific integrity and the ongoing societal debates regarding fair data usage.