arxiv_cs_ai February 10, 2026

Game-Theoretic Co-Evolution for LLM-Based Heuristic Discovery

Translated: 2026/2/14 6:30:20

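Translation

Large language models (LLMs) have driven rapid progress in automatic heuristic discovery (AHD), but most existing methods are limited by static evaluation against a fixed instance distribution, which can lead to overfitting and poor generalization under distribution shift. We propose Algorithm Space Response Oracles (ASRO), a game-theoretic framework that offers a new perspective by treating heuristic discovery as program-level co-evolution between a solver and an instance generator. ASRO maintains growing strategy pools on both sides and expands them using LLM-based best-response oracles against mixed opponent strategies. This replaces static evaluation with adaptive evolution driven by a self-generated curriculum. Across several combinatorial optimization domains, ASRO consistently outperforms static-training AHD baselines, with improved generalization and robustness on diverse and out-of-distribution instances.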

Original Content

arXiv:2601.22896v2
Announce Type: replace

Abstract: Large language models (LLMs) have enabled rapid progress in automatic heuristic discovery (AHD), yet most existing methods are predominantly limited by static evaluation against fixed instance distributions, leading to potential overfitting and poor generalization under distributional shifts. We propose Algorithm Space Response Oracles (ASRO), a game-theoretic framework that reframes heuristic discovery as a program-level co-evolution between solver and instance generator. ASRO models their interaction as a two-player zero-sum game, maintains growing strategy pools on both sides, and iteratively expands them via LLM-based best-response oracles against mixed opponent meta-strategies, thereby replacing static evaluation with an adaptive, self-generated curriculum. Across multiple combinatorial optimization domains, ASRO consistently outperforms static-training AHD baselines built on the same program search mechanisms, achieving substantially improved generalization and robustness on diverse and out-of-distribution instances.
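The loop the abstract describes (two growing strategy pools, a zero-sum payoff between them, and best responses against the opponent's mixed meta-strategy) can be illustrated with a small toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: solvers and generators are plain integers instead of programs, the hypothetical `payoff_fn` is a made-up score, the two oracles brute-force a ten-element candidate set where ASRO would query an LLM, and the meta-strategy is approximated with fictitious play, which is only one possible meta-solver.

```python
def fictitious_play(payoff, iters=500):
    """Approximate mixed strategies for a zero-sum matrix game.

    payoff[i][j] is the row player's payoff; the column player receives
    its negative. Each side repeatedly best-responds to the opponent's
    empirical play, and the play frequencies are returned as mixtures.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [1] + [0] * (n_rows - 1)
    col_counts = [1] + [0] * (n_cols - 1)
    for _ in range(iters):
        row_values = [sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
                      for i in range(n_rows)]
        col_values = [sum(payoff[i][j] * row_counts[i] for i in range(n_rows))
                      for j in range(n_cols)]
        row_counts[max(range(n_rows), key=row_values.__getitem__)] += 1
        col_counts[min(range(n_cols), key=col_values.__getitem__)] += 1
    row_total, col_total = sum(row_counts), sum(col_counts)
    return ([c / row_total for c in row_counts],
            [c / col_total for c in col_counts])


# Hypothetical toy domain: a solver has an integer "capacity", a generator
# emits instances of an integer "difficulty", and capacity has a small cost.
CANDIDATES = range(10)


def payoff_fn(solver, generator):
    """Solver's payoff; the generator's payoff is the negative (zero-sum)."""
    return (1.0 if solver >= generator else -1.0) - 0.05 * solver


def solver_oracle(generator_pool, generator_mix):
    """Stand-in for an LLM oracle: best solver against the generator mixture."""
    def expected(s):
        return sum(p * payoff_fn(s, g)
                   for g, p in zip(generator_pool, generator_mix))
    return max(CANDIDATES, key=expected)


def generator_oracle(solver_pool, solver_mix):
    """Stand-in for an LLM oracle: hardest generator for the solver mixture."""
    def expected(g):
        return sum(p * payoff_fn(s, g)
                   for s, p in zip(solver_pool, solver_mix))
    return min(CANDIDATES, key=expected)


def asro_style_loop(solver_pool, generator_pool, rounds=6):
    """Grow both pools by alternating meta-solving and best responses."""
    solver_pool, generator_pool = list(solver_pool), list(generator_pool)
    for _ in range(rounds):
        # Evaluate every solver in the pool against every generator.
        matrix = [[payoff_fn(s, g) for g in generator_pool]
                  for s in solver_pool]
        solver_mix, generator_mix = fictitious_play(matrix)
        new_solver = solver_oracle(generator_pool, generator_mix)
        new_generator = generator_oracle(solver_pool, solver_mix)
        # Only genuinely new strategies enlarge a pool.
        if new_solver not in solver_pool:
            solver_pool.append(new_solver)
        if new_generator not in generator_pool:
            generator_pool.append(new_generator)
    return solver_pool, generator_pool


solvers, generators = asro_style_loop([0], [0])
print("solver pool:", solvers)
print("generator pool:", generators)
```

In this toy, each side's new strategy is chosen against the other side's current mixture rather than against a fixed benchmark, which is the sense in which the generator pool acts as a self-generated curriculum. A real system would replace the integer strategies with programs and the brute-force oracles with LLM-driven program search.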