arxiv_cs_lg April 24, 2026

Efficient Test-Time Inference via Deterministic Exploration of Truncated Decoding Trees

Tags: test-time-inference, self-consistency, decoding-strategies, reasoning-traces, arxiv-2604-20500

Original Content

arXiv:2604.20500v1 Announce Type: new

Abstract: Self-consistency boosts inference-time performance by sampling multiple reasoning traces in parallel and voting. However, in constrained domains like math and code, this strategy is compute-inefficient because it samples with replacement, repeatedly revisiting the same high-probability prefixes and producing duplicate completions. We propose Distinct Leaf Enumeration (DLE), a deterministic decoding method that treats truncated sampling as traversal of a pruned decoding tree and systematically enumerates distinct leaves instead of sampling with replacement. This strategy improves inference efficiency in two ways. Algorithmically, it increases coverage of the truncated search space under a fixed budget by exploring previously unvisited high-probability branches. At the systems level, it reuses shared prefixes and reduces redundant token generation. Empirically, DLE explores higher-quality reasoning traces than stochastic self-consistency, yielding better performance on math, coding, and general reasoning tasks.
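
The idea of enumerating distinct leaves of a truncated decoding tree can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the traversal order, so the best-first (highest probability first) order below is an assumption, as are the toy model `toy_next_token_probs`, the top-p truncation rule, and every function name. The systems-side prefix reuse (for example, sharing cached computation across branches) is not modeled here.

```python
import heapq
import math
import random
from typing import Callable, Dict, List, Tuple

EOS = "<eos>"
Prefix = Tuple[str, ...]


def toy_next_token_probs(prefix: Prefix) -> Dict[str, float]:
    """Hypothetical stand-in for a language model (assumption, not from the paper)."""
    if len(prefix) >= 2:
        return {EOS: 1.0}
    return {"a": 0.6, "b": 0.3, "c": 0.1}


def truncate_top_p(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the smallest high-probability token set with mass >= top_p, renormalized."""
    kept: Dict[str, float] = {}
    mass = 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept[tok] = p
        mass += p
        if mass >= top_p:
            break
    return {tok: p / mass for tok, p in kept.items()}


def enumerate_distinct_leaves(
    next_probs: Callable[[Prefix], Dict[str, float]],
    budget: int,
    top_p: float = 0.85,
) -> List[Tuple[Prefix, float]]:
    """Best-first traversal of the pruned tree; each leaf is returned at most once."""
    # Heap entries: (negative log-probability, tie-break counter, prefix).
    heap: List[Tuple[float, int, Prefix]] = [(0.0, 0, ())]
    counter = 1
    leaves: List[Tuple[Prefix, float]] = []
    while heap and len(leaves) < budget:
        neg_logp, _, prefix = heapq.heappop(heap)
        if prefix and prefix[-1] == EOS:
            leaves.append((prefix, math.exp(-neg_logp)))
            continue
        for tok, p in truncate_top_p(next_probs(prefix), top_p).items():
            heapq.heappush(heap, (neg_logp - math.log(p), counter, prefix + (tok,)))
            counter += 1
    return leaves


def sample_leaf(
    next_probs: Callable[[Prefix], Dict[str, float]],
    rng: random.Random,
    top_p: float = 0.85,
) -> Prefix:
    """Baseline: one stochastic trace drawn from the same truncated distribution."""
    prefix: Prefix = ()
    while not prefix or prefix[-1] != EOS:
        probs = truncate_top_p(next_probs(prefix), top_p)
        tokens, weights = zip(*probs.items())
        prefix += (rng.choices(tokens, weights=weights)[0],)
    return prefix


if __name__ == "__main__":
    budget = 3
    dle = enumerate_distinct_leaves(toy_next_token_probs, budget)
    rng = random.Random(0)
    sampled = [sample_leaf(toy_next_token_probs, rng) for _ in range(budget)]
    print("enumerated leaves:", [leaf for leaf, _ in dle])
    print("distinct sampled leaves:", len(set(sampled)), "of", budget)
```

Because a child's probability never exceeds its parent's, popping the heap in this order yields leaves in non-increasing probability, and every path in a tree is unique, so a budget of k always returns k distinct traces as long as the pruned tree has at least k leaves. Sampling with replacement offers no such guarantee and may spend part of the same budget on repeats.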