arxiv_cs_ai April 24, 2026

Strategic Scaling of Test-Time Compute: A Bandit Learning Approach

Translated: 2026/4/24 20:29:48

test-time-compute, large-language-models, bandit-learning, mllm, in-context-learning

Original Content

arXiv:2506.12721v2 Announce Type: replace

Abstract: Scaling test-time compute has emerged as an effective strategy for improving the performance of large language models. However, existing methods typically allocate compute uniformly across all queries, overlooking variation in query difficulty. To address this inefficiency, we formulate test-time compute allocation as a novel bandit learning problem and propose adaptive algorithms that estimate query difficulty on the fly and allocate compute accordingly. Compared to uniform allocation, our algorithms allocate more compute to challenging queries while maintaining accuracy on easier ones. Among challenging queries, our algorithms further learn to prioritize solvable instances, effectively reducing excessive compute on unsolvable queries. We theoretically prove that our algorithms achieve better compute efficiency than uniform allocation and empirically validate their effectiveness on math and code benchmarks. Specifically, our algorithms achieve up to an 11.10% absolute performance improvement (15.04% relative) on the MATH-500 dataset, up to 10.82% (14.44% relative) on the AIME25 dataset, and up to 11.23% (15.29% relative) on the LiveCodeBench dataset.
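The core intuition behind adaptive allocation can be illustrated with a toy simulation: spend less compute on queries that are easy or hopeless, and redirect it to hard queries that can still be solved. This is a minimal sketch under stated assumptions, not the authors' bandit algorithm. It assumes a hypothetical oracle `sample(i, attempt)` that reports whether a given attempt on query `i` is correct, standing in for one LLM sample checked by a verifier. It compares a uniform baseline with a simple adaptive rule that stops sampling a query once it is solved and gives up after a hypothetical `max_attempts` failures. The names `make_scripted_sampler`, `uniform_allocation`, and `adaptive_allocation` are illustrative only.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical oracle: sample(query_index, attempt_number) -> True if that attempt is correct.
# In a real system this would mean "draw one LLM response and check it with a verifier".
Sampler = Callable[[int, int], bool]


def make_scripted_sampler(attempts_needed: List[Optional[int]]) -> Sampler:
    """Deterministic toy oracle: query i is first solved on attempt attempts_needed[i].

    None marks a query that is never solved, however much compute it receives.
    """
    def sample(i: int, attempt: int) -> bool:
        need = attempts_needed[i]
        return need is not None and attempt >= need
    return sample


def uniform_allocation(n_queries: int, budget: int, sample: Sampler) -> Tuple[List[int], List[bool]]:
    """Baseline: every query is charged budget // n_queries samples, solved or not."""
    per_query = budget // n_queries
    used = [per_query] * n_queries
    solved = [any(sample(i, a) for a in range(1, per_query + 1)) for i in range(n_queries)]
    return used, solved


def adaptive_allocation(
    n_queries: int, budget: int, sample: Sampler, max_attempts: int
) -> Tuple[List[int], List[bool]]:
    """Adaptive sketch: one sample at a time, only to queries that are still open.

    A query closes once it is solved or after max_attempts failed samples
    (a crude, hypothetical stand-in for detecting likely-unsolvable queries).
    """
    used = [0] * n_queries
    solved = [False] * n_queries
    spent = 0
    while spent < budget:
        open_queries = [i for i in range(n_queries) if not solved[i] and used[i] < max_attempts]
        if not open_queries:
            break  # every query is solved or capped: the leftover budget is saved
        # Least-tried open query first (ties go to the lowest index), i.e. round-robin.
        i = min(open_queries, key=lambda j: used[j])
        used[i] += 1
        spent += 1
        if sample(i, used[i]):
            solved[i] = True
    return used, solved


if __name__ == "__main__":
    # Two easy queries, one hard-but-solvable query, one unsolvable query.
    sampler = make_scripted_sampler([1, 1, 3, None])
    print(uniform_allocation(4, 8, sampler))                   # ([2, 2, 2, 2], [True, True, False, False])
    print(adaptive_allocation(4, 8, sampler, max_attempts=3))  # ([1, 1, 3, 3], [True, True, True, False])
```

In this scripted example, the uniform split of 8 samples solves only the two easy queries. The adaptive rule reuses the budget freed by those queries to solve the hard one. The cap only bounds the compute wasted on the unsolvable query. It does not learn to prioritize solvable queries in the way the abstract describes, which would need a richer difficulty estimate than binary pass/fail feedback.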