arxiv_cs_lg February 10, 2026

Adaptive Retrieval helps Reasoning in LLMs -- but mostly if it's not used

Translated: 2026/3/15 13:04:54

llm, reasoning, retrieval-augmented-generation, chain-of-thought, meta-cognition

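Translation

arXiv:2602.07213v1 Announce Type: new

This paper explores a fundamental principle for improving the performance of generative models: treating retrieval as a form of dynamic in-context learning. We experimentally test an adaptive retrieval-augmented architecture in which an LLM agent actively decides when to query an external knowledge base during its reasoning process. We compare it against a standard Chain-of-Thought (CoT) baseline and a static retrieval approach on the GSM8K and MATH-500 benchmarks. The experiments show that static retrieval is inferior to CoT, whereas adaptive retrieval behaves in an interesting way: traces that include retrieved results perform slightly worse than CoT, while traces that do not include retrieval perform better than CoT. This suggests that (a) retrieval only rarely helps reasoning (a few counterexamples are shown, such as cases that use a helpful theorem), and (b) actively not using retrieval is indicative of good model performance. Furthermore, the model scales its retrieval frequency with the difficulty of the problem, reinforcing that the decision to retrieve is an important metacognitive signal. The agent's ability to self-assess its own knowledge and selectively draw on external information is a key principle for building more robust and reliable generative models.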

Original Content

arXiv:2602.07213v1 Announce Type: new

Abstract: Large Language Models (LLMs) often falter in complex reasoning tasks due to their static, parametric knowledge, leading to hallucinations and poor performance in specialized domains like mathematics. This work explores a fundamental principle for enhancing generative models: treating retrieval as a form of dynamic in-context learning. We test an adaptive retrieval-augmented architecture where an LLM agent actively decides when to query an external knowledge base during its reasoning process. We compare this adaptive strategy against a standard Chain-of-Thought (CoT) baseline and a static retrieval approach on the GSM8K and MATH-500 benchmarks. Although our experiments show that static retrieval is inferior to CoT, the adaptive retrieval shows interesting behavior: While traces including retrieved results show slightly worse performance compared to CoT, traces that do not include retrieval actually perform better compared to CoT. This suggests that: (a) retrieval only rarely helps reasoning (we show a few counterexamples, e.g. using useful theorems) and (b) actively not using retrieval is indicative of good model performance. Furthermore, we find that the model scales its retrieval frequency with the difficulty of the problem, reinforcing that the decision to retrieve is a crucial metacognitive signal. The agent's ability to self-assess its knowledge and selectively engage with external information represents a key principle for building more robust and reliable generative models.
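
The abstract does not specify the prompts, retriever, or control format used, so the following is only a minimal sketch of what an adaptive retrieval loop and the with/without-retrieval comparison could look like. Every name here (`SEARCH_PREFIX`, `ANSWER_PREFIX`, `adaptive_solve`, `accuracy_by_retrieval_use`, and the `generate_step` and `retrieve` callables) is a hypothetical placeholder, not the paper's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical control markers; the paper's actual format is not given in the abstract.
SEARCH_PREFIX = "SEARCH:"
ANSWER_PREFIX = "ANSWER:"


@dataclass
class Trace:
    """One reasoning trace: the emitted steps, the final answer, and retrieval count."""
    steps: List[str] = field(default_factory=list)
    answer: str = ""
    num_retrievals: int = 0


def adaptive_solve(
    question: str,
    generate_step: Callable[[str], str],  # assumed: maps the context so far to the next step
    retrieve: Callable[[str], str],       # assumed: maps a query to a passage of text
    max_steps: int = 8,
) -> Trace:
    """Run a reasoning loop in which the model itself decides when to retrieve."""
    trace = Trace()
    context = question
    for _ in range(max_steps):
        step = generate_step(context)
        trace.steps.append(step)
        if step.startswith(SEARCH_PREFIX):
            # The model asked for external knowledge: fetch it and add it to the context.
            query = step[len(SEARCH_PREFIX):].strip()
            passage = retrieve(query)
            trace.num_retrievals += 1
            context += f"\n{step}\nRESULT: {passage}"
        elif step.startswith(ANSWER_PREFIX):
            trace.answer = step[len(ANSWER_PREFIX):].strip()
            break
        else:
            # Ordinary reasoning step: append it and keep going.
            context += f"\n{step}"
    return trace


def accuracy_by_retrieval_use(
    results: List[Tuple[Trace, bool]],
) -> Tuple[Optional[float], Optional[float]]:
    """Return (accuracy of traces that retrieved, accuracy of traces that did not).

    A group with no traces yields None instead of a number.
    """
    def acc(flags: List[bool]) -> Optional[float]:
        return sum(flags) / len(flags) if flags else None

    used = [ok for trace, ok in results if trace.num_retrievals > 0]
    unused = [ok for trace, ok in results if trace.num_retrievals == 0]
    return acc(used), acc(unused)
```

In this sketch, `generate_step` would wrap an LLM call and `retrieve` a knowledge-base lookup. Comparing the two numbers from `accuracy_by_retrieval_use` against a plain CoT accuracy on the same problems mirrors the comparison the abstract reports, and `num_retrievals` per trace is the quantity one would relate to problem difficulty.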