arxiv_cs_ai February 10, 2026

Optimizing Agentic Reasoning with Retrieval via Synthetic Semantic Information Gain Reward

Translated: 2026/2/14 6:30:53

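Japanese Translation (rendered in English)

Agentic reasoning gives large reasoning models (LRMs) the ability to acquire external knowledge dynamically. Optimizing the retrieval process is still difficult, however, because dense, principled reward signals are lacking. This paper introduces InfoReasoner, a unified framework that encourages effective information seeking through a synthetic semantic information gain reward. On the theoretical side, the authors redefine information gain as the reduction of uncertainty over the model's belief states and establish guarantees for it, including non-negativity, telescoping additivity, and channel monotonicity. On the practical side, to allow scalable optimization without manual retrieval annotations, they propose an output-aware intrinsic estimator that computes information gain directly from the model's output distributions, using semantic clustering based on bidirectional textual entailment. This intrinsic reward steers the policy toward maximizing epistemic progress and enables efficient training with Group Relative Policy Optimization (GRPO). In experiments on seven question-answering benchmarks, InfoReasoner consistently outperforms strong retrieval-augmented baselines, with an average accuracy improvement of up to 5.4%. The work offers a theoretically grounded and scalable path toward agentic reasoning with retrieval. The code is available at https://github.com/dl-m9/InfoReasoner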

Original Content

arXiv:2602.00845v2 Announce Type: replace Abstract: Agentic reasoning enables large reasoning models (LRMs) to dynamically acquire external knowledge, yet optimizing the retrieval process remains challenging due to the lack of dense, principled reward signals. In this paper, we introduce InfoReasoner, a unified framework that incentivizes effective information seeking via a synthetic semantic information gain reward. Theoretically, we redefine information gain as uncertainty reduction over the model's belief states, establishing guarantees, including non-negativity, telescoping additivity, and channel monotonicity. Practically, to enable scalable optimization without manual retrieval annotations, we propose an output-aware intrinsic estimator that computes information gain directly from the model's output distributions using semantic clustering via bidirectional textual entailment. This intrinsic reward guides the policy to maximize epistemic progress, enabling efficient training via Group Relative Policy Optimization (GRPO). Experiments across seven question-answering benchmarks demonstrate that InfoReasoner consistently outperforms strong retrieval-augmented baselines, achieving up to 5.4% average accuracy improvement. Our work provides a theoretically grounded and scalable path toward agentic reasoning with retrieval. The code is available at https://github.com/dl-m9/InfoReasoner
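
The abstract describes the reward only at a high level, so the following is a minimal sketch of the general idea rather than the InfoReasoner implementation. It assumes that a belief state can be approximated by a handful of sampled answers, that answers are grouped into meaning clusters when they entail each other in both directions, and that information gain is the drop in entropy over those clusters between two retrieval steps. All function names are hypothetical, and `toy_entails` is a deliberately crude stand-in for a real entailment model. The group-relative normalization at the end follows the commonly described GRPO recipe (reward minus group mean, divided by group standard deviation). A plain entropy difference like this one can be negative on a given sample, so the sketch does not claim the non-negativity guarantee stated in the abstract; it does telescope across steps by construction.

```python
import math
import statistics
from typing import Callable, List, Sequence

Entails = Callable[[str, str], bool]


def toy_entails(premise: str, hypothesis: str) -> bool:
    """Stand-in for an NLI model (assumption): match after trimming and lowercasing."""
    return premise.strip().lower() == hypothesis.strip().lower()


def semantic_clusters(answers: Sequence[str],
                      entails: Entails = toy_entails) -> List[List[str]]:
    """Greedily group answers that entail each other in both directions."""
    clusters: List[List[str]] = []
    for answer in answers:
        for cluster in clusters:
            representative = cluster[0]
            if entails(answer, representative) and entails(representative, answer):
                cluster.append(answer)
                break
        else:
            # No existing cluster shares this meaning, so start a new one.
            clusters.append([answer])
    return clusters


def semantic_entropy(answers: Sequence[str],
                     entails: Entails = toy_entails) -> float:
    """Shannon entropy (in nats) of the empirical distribution over meaning clusters."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    total = len(answers)
    probs = [len(c) / total for c in semantic_clusters(answers, entails)]
    return sum(p * math.log(1.0 / p) for p in probs)


def information_gain(before: Sequence[str], after: Sequence[str],
                     entails: Entails = toy_entails) -> float:
    """Entropy drop between answers sampled before and after a retrieval step."""
    return semantic_entropy(before, entails) - semantic_entropy(after, entails)


def group_relative_advantages(rewards: Sequence[float],
                              eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one sampled group, as in the usual GRPO description."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical sampled answers at three points in a retrieval trajectory.
    b0 = ["Paris", "paris", "Lyon", "Marseille"]
    b1 = ["Paris", "paris", "Paris", "Lyon"]
    b2 = ["Paris", "Paris", "paris", "Paris"]
    step_gains = [information_gain(b0, b1), information_gain(b1, b2)]
    print("per-step gains:", step_gains)
    print("sum of steps:  ", sum(step_gains))
    print("end-to-end:    ", information_gain(b0, b2))
    print("advantages:    ", group_relative_advantages(step_gains))
```

In practice the entailment check would be backed by a natural language inference model and the answers would be sampled from the policy itself; swapping in a different `entails` callable is the only change the sketch needs for that.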