arxiv_cs_ai February 10, 2026

Does Your Reasoning Model Implicitly Know When to Stop Thinking?

Translated: 2026/3/7 10:07:42

ai, reasoning, sampling, efficient-learning

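Summary (from the Japanese translation)

Recent progress in large reasoning models (LRMs) has greatly improved performance on complex reasoning tasks through long chains of thought (CoTs). These long chains often carry substantial redundancy, which hurts computational efficiency and adds latency in real-time applications. Recent studies show that chain length is frequently uncorrelated with correctness and can even reduce accuracy. A deeper analysis finds, surprisingly, that LRMs implicitly know the appropriate time to stop thinking, but current sampling paradigms obscure this capability. The authors therefore introduce SAGE (Self-Aware Guided Efficient Reasoning), a new sampling paradigm that unlocks this potential for efficient reasoning. Integrating SAGE as mixed sampling into group-based reinforcement learning (SAGE-RL) carries the efficient reasoning patterns discovered by SAGE into standard pass@1 inference, markedly improving both the reasoning accuracy and the efficiency of LRMs on multiple challenging mathematical benchmarks.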

Original Content

arXiv:2602.08354v1 Announce Type: new

Abstract: Recent advancements in large reasoning models (LRMs) have greatly improved their capabilities on complex reasoning tasks through Long Chains of Thought (CoTs). However, this approach often results in substantial redundancy, impairing computational efficiency and causing significant delays in real-time applications. Recent studies show that longer reasoning chains are frequently uncorrelated with correctness and can even be detrimental to accuracy. In a further in-depth analysis of this phenomenon, we surprisingly uncover and empirically verify that LRMs implicitly know the appropriate time to stop thinking, while this capability is obscured by current sampling paradigms. Motivated by this, we introduce SAGE (Self-Aware Guided Efficient Reasoning), a novel sampling paradigm that unleashes this efficient reasoning potential. Furthermore, integrating SAGE as mixed sampling into group-based reinforcement learning (SAGE-RL) enables SAGE-RL to effectively incorporate SAGE-discovered efficient reasoning patterns into standard pass@1 inference, markedly enhancing both the reasoning accuracy and efficiency of LRMs across multiple challenging mathematical benchmarks.
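
The abstract does not say how SAGE samples or how SAGE-RL weights its rollouts, so the following is only a generic illustration of what "mixed sampling" inside a group-based RL step could look like. It assumes a GRPO-style group-normalized advantage, which is a swapped-in, well-known technique and not confirmed as the paper's method. The function names, the "standard"/"efficient" labels, and the toy rewards are all hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of rollouts for the same prompt.

    Assumption: GRPO-style normalization (subtract the group mean, divide by
    the group standard deviation). The paper's actual objective may differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def build_mixed_group(standard_rewards, efficient_rewards):
    """Combine rewards from two samplers into one group, tagging the source.

    "standard" stands in for ordinary sampling and "efficient" for a
    SAGE-like sampler; both labels are hypothetical placeholders.
    """
    return ([("standard", r) for r in standard_rewards]
            + [("efficient", r) for r in efficient_rewards])


# Toy rewards (made up): 1.0 = correct final answer, 0.0 = incorrect.
group = build_mixed_group([1.0, 0.0, 0.0], [1.0])
advantages = group_relative_advantages([reward for _, reward in group])

for (source, reward), adv in zip(group, advantages):
    print(f"{source:9s} reward={reward:.1f} advantage={adv:+.3f}")
```

Because the advantages are computed over the whole mixed group, a correct rollout from either sampler is rewarded relative to the same baseline. That shared baseline is one plausible way patterns found by a more efficient sampler could be reinforced in the policy used for ordinary pass@1 inference.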