arxiv_cs_ai February 10, 2026

VERIFY-RL: Verifiable Recursive Decomposition for Reinforcement Learning in Mathematical Reasoning

Original Content

arXiv:2602.07559v1 Announce Type: new Abstract: Training language models to solve complex mathematical problems benefits from curriculum learning, that is, progressively training on simpler subproblems. However, existing decomposition methods are often heuristic, offering no guarantees that subproblems are simpler, that solving them aids the parent task, or that their relationships are mathematically grounded. We observe that symbolic differentiation provides a natural structure for verified decomposition: calculus rules explicitly define how expressions reduce to simpler components with provable properties. We introduce Verify-RL, a framework where every parent-child decomposition satisfies three verifiable conditions: strictly decreasing structural complexity, solution containment, and formal rule derivation. Unlike heuristic methods, where a significant fraction of decompositions are invalid, our properties admit automatic verification through symbolic computation, achieving "verification by construction." Experiments demonstrate that eliminating invalid decompositions yields sizable gains: accuracy on the hardest problems more than doubles, from 32% to 68%, with a 40% relative improvement overall.
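
To make the three conditions concrete, here is a minimal Python sketch of how a parent-child decomposition of a differentiation problem might be checked automatically. It is not the paper's implementation. The tuple-based expression format, the function names (`size`, `diff`, `decompose`, `contains`, `verify`), the use of node count as "structural complexity", and the reading of "solution containment" as "each child's derivative appears as a subtree of the parent's derivative" are all assumptions made for illustration.

```python
# Hypothetical mini expression language (an assumption, not from the paper):
# ("x",), ("const", c), ("add", a, b), ("mul", a, b), ("sin", a), ("cos", a)
X = ("x",)

# Which calculus rule applies to each top-level operator.
RULES = {"add": "sum rule", "mul": "product rule",
         "sin": "chain rule", "cos": "chain rule"}


def size(e):
    """Structural complexity, taken here to be the number of tree nodes."""
    return 1 + sum(size(c) for c in e[1:] if isinstance(c, tuple))


def diff(e):
    """Unsimplified symbolic derivative with respect to x."""
    op = e[0]
    if op == "x":
        return ("const", 1)
    if op == "const":
        return ("const", 0)
    if op == "add":
        return ("add", diff(e[1]), diff(e[2]))
    if op == "mul":  # product rule
        return ("add", ("mul", diff(e[1]), e[2]), ("mul", e[1], diff(e[2])))
    if op == "sin":  # chain rule
        return ("mul", ("cos", e[1]), diff(e[1]))
    if op == "cos":  # chain rule
        return ("mul", ("mul", ("const", -1), ("sin", e[1])), diff(e[1]))
    raise ValueError(f"unknown operator: {op}")


def decompose(e):
    """Return (rule name, subexpressions to differentiate) for the top-level rule."""
    if e[0] in RULES:
        return RULES[e[0]], list(e[1:])
    return None, []  # leaves have no subproblems


def contains(tree, sub):
    """True if `sub` occurs as a subtree of `tree`."""
    return tree == sub or any(
        contains(c, sub) for c in tree[1:] if isinstance(c, tuple))


def verify(parent, children):
    """Check the three conditions for a proposed decomposition."""
    rule, expected = decompose(parent)
    return {
        "decreasing": all(size(c) < size(parent) for c in children),
        "containment": all(contains(diff(parent), diff(c)) for c in children),
        "rule_derived": rule is not None and list(children) == expected,
    }


# Parent problem: differentiate x * sin(x).
parent = ("mul", X, ("sin", X))
rule, good_children = decompose(parent)   # children produced by the product rule
bad_children = [("mul", parent, X)]       # a "heuristic" child that is harder than the parent

good_report = verify(parent, good_children)
bad_report = verify(parent, bad_children)
```

In this sketch the rule-derived children pass all three checks by construction, while the hand-picked "harder" child fails them. A real system would more plausibly rely on a computer algebra system and compare derivatives up to algebraic equivalence rather than exact subtree equality.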