arxiv_cs_lg 2026年4月24日

テキスト埋め込みを用いたドメイン知識ゼロのアルゴリズム選択

Algorithm Selection with Zero Domain Knowledge via Text Embeddings

Translated: 2026/4/24 20:01:17

algorithm-selectiontext-embeddingszero-shot-learningmachine-learningaslib-benchmark

Japanese Translation

arXiv:2604.19753v1 Announce Type: cross 抽出: 我々は、手動設計されたインスタンス特徴を事前学習されたテキスト埋め込みで置換する、特徴フリーなアルゴリズム選択のアプローチを提案します。我々の手法、ZeroFolio は 3 つのステップで進めます：生のインスタンスファイルを読み取って通常テキストに変換する、事前学習された埋め込みモデルで埋め込む、そして重み付けされた k 近傍法を用いてアルゴリズムを選択する。我々のアプローチの鍵となるのは、事前学習された埋め込みがドメイン知識やタスク特異的なトレーニングを行わずとも問題インスタンスを区別する表現を生み出すという観察です。これにより、私達はテキストベースのインスタンス形式を持つ多様な問題ドメインに対して、同じ 3 ステップのピプロライン（シリアライズ、埋め込み、選択）を適用することができます。我々は SAT、MaxSAT、QBF、ASP、CSP、MIP、およびグラフ問題の 7 つのドメインにまたがる 11 つの ASlib シナリオに基づき、我々のアプローチを評価しました。我々の実験は、このアプローチが 1 つの固定構成で手動設計された特徴に-trainedされたランダムフォレストを上回る場合が 11 つのシナリオの 10 つであり、2 種子投票で 11 つのすべてのシナリオにおいて上回ることを示しています。また、このマージンはしばしば著しいものです。我々のアブレーション研究は、逆距離重み付け、ラインのシャッフル、そしてマンハッタン距離が重要な設計選択であることを示しました。両方の選択者が競争力があるシナリオにおいて、埋め込みと手動設計された特徴をソフト投票で組み合わせることで、さらなる改善が見られます。

Original Content

arXiv:2604.19753v1 Announce Type: cross Abstract: We propose a feature-free approach to algorithm selection that replaces hand-crafted instance features with pretrained text embeddings. Our method, ZeroFolio, proceeds in three steps: it reads the raw instance file as plain text, embeds it with a pretrained embedding model, and selects an algorithm via weighted k-nearest neighbors. The key to our approach is the observation that pretrained embeddings produce representations that distinguish problem instances without any domain knowledge or task-specific training. This allows us to apply the same three-step pipeline (serialize, embed, select) across diverse problem domains with text-based instance formats. We evaluate our approach on 11 ASlib scenarios spanning 7 domains (SAT, MaxSAT, QBF, ASP, CSP, MIP, and graph problems). Our experiments show that this approach outperforms a random forest trained on hand-crafted features in 10 of 11 scenarios with a single fixed configuration, and in all 11 with two-seed voting; the margin is often substantial. Our ablation study shows that inverse-distance weighting, line shuffling, and Manhattan distance are the key design choices. On scenarios where both selectors are competitive, combining embeddings with hand-crafted features via soft voting yields further improvements.