arxiv_cs_ai April 24, 2026

Adaptive Test-Time Compute Allocation with Evolving In-Context Demonstrations

Translated: 2026/4/24 20:15:03

adaptive-compute-allocation, test-time-compute, in-context-demonstrations, llm-optimization, scaling-law

Original Content

arXiv:2604.21018v1 Announce Type: new

Abstract: While scaling test-time compute can substantially improve model performance, existing approaches either rely on static compute allocation or sample from fixed generation distributions. In this work, we introduce a test-time compute allocation framework that jointly adapts where computation is spent and how generation is performed. Our method begins with a warm-up phase that identifies easy queries and assembles an initial pool of question-response pairs from the test set itself. An adaptive phase then concentrates further computation on unresolved queries while reshaping their generation distributions through evolving in-context demonstrations -- conditioning each generation on successful responses from semantically related queries rather than resampling from a fixed distribution. Experiments across math, coding, and reasoning benchmarks demonstrate that our approach consistently outperforms existing baselines while consuming substantially less inference-time compute.
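The abstract gives no implementation details, so the following is only a minimal sketch of the two-phase loop it describes, not the authors' method. Every name here (`adaptive_allocation`, `generate`, `verify`, `jaccard`) and every parameter (`warmup_samples`, `adaptive_rounds`, `k`) is hypothetical. Two things are assumed that the abstract does not specify: that some success check is available for a response, and that semantic relatedness can be approximated, here crudely, by token overlap.

```python
from typing import Callable, Dict, List, Tuple

Demo = Tuple[str, str]  # (question, successful response)


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; an assumed stand-in for semantic relatedness."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def adaptive_allocation(
    queries: List[str],
    generate: Callable[[str, List[Demo]], str],  # hypothetical model call
    verify: Callable[[str, str], bool],          # hypothetical success check
    warmup_samples: int = 1,
    adaptive_rounds: int = 3,
    k: int = 2,
) -> Tuple[Dict[str, str], int]:
    """Return (solved query -> response, number of generate calls)."""
    solved: Dict[str, str] = {}
    calls = 0

    # Warm-up phase: small uniform budget, no demonstrations.
    # Queries solved here are the "easy" ones and seed the pool.
    for q in queries:
        for _ in range(warmup_samples):
            calls += 1
            r = generate(q, [])
            if verify(q, r):
                solved[q] = r
                break

    # Adaptive phase: spend further calls only on unresolved queries.
    for _ in range(adaptive_rounds):
        unresolved = [q for q in queries if q not in solved]
        if not unresolved:
            break
        for q in unresolved:
            # Demonstrations: the k solved queries most similar to q.
            ranked = sorted(solved, key=lambda s: jaccard(q, s), reverse=True)
            demos = [(s, solved[s]) for s in ranked[:k]]
            calls += 1
            r = generate(q, demos)
            if verify(q, r):
                solved[q] = r  # the pool grows, so later demonstrations evolve
    return solved, calls


# Toy stand-ins: "hard" queries succeed only when a demonstration is supplied.
def toy_generate(q: str, demos: List[Demo]) -> str:
    return "ok" if q.startswith("easy") or demos else "fail"


def toy_verify(q: str, r: str) -> bool:
    return r == "ok"


queries = ["easy add 1 2", "hard add 3 4", "hard mul 5 6"]
solved, calls = adaptive_allocation(queries, toy_generate, toy_verify)
print(len(solved), calls)
```

In this toy run the warm-up phase spends one call per query and solves only the easy one. The adaptive phase then spends one further call on each of the two unresolved queries, each conditioned on the pool built so far. A real system would replace `toy_generate` with an LLM call and `jaccard` with an embedding-based similarity.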