arxiv_cs_lg February 10, 2026

You Don't Always Need to Pick the Highest-Performing Model: An Information-Theoretic Perspective on Large Language Model Ensemble Selection

Don't Always Pick the Highest-Performing Model: An Information Theoretic View of LLM Ensemble Selection

Translated: 2026/3/15 14:50:06

llm, ensemble-selection, mutual-information, information-theory, deep-learning

Japanese Translation

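arXiv:2602.08003v1 Announce Type: new Abstract: Large language models (LLMs) are often ensembled to increase overall reliability and robustness, but in practice the models are highly correlated with one another. This raises a fundamental question: which models should be selected when forming an LLM ensemble? We formulate budget-constrained ensemble selection as the problem of maximizing the mutual information between the true label and the predictions of the selected models. Furthermore, to explain why performance can saturate even with many models, we model the correlated errors of the models with a Gaussian copula and show an information-theoretic error floor on ensemble performance. Building on this, we propose a simple greedy mutual-information selection algorithm that estimates the required information terms directly from data and iteratively builds an ensemble under a query budget. We evaluated this approach on two question-answering datasets (MEDMCQA, MMLU) and one binary sentiment analysis dataset (IMDB movie reviews). Across all datasets, we observed that our method consistently outperformed strong baselines under the same query budget.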

Original Content

arXiv:2602.08003v1 Announce Type: new Abstract: Large language models (LLMs) are often ensembled together to improve overall reliability and robustness, but in practice models are strongly correlated. This raises a fundamental question: which models should be selected when forming an LLM ensemble? We formulate budgeted ensemble selection as maximizing the mutual information between the true label and the predictions of the selected models. Furthermore, to explain why performance can saturate even with many models, we model the correlated errors of the models using a Gaussian copula and show an information-theoretic error floor for the performance of the ensemble. Motivated by these, we propose a simple greedy mutual-information selection algorithm that estimates the required information terms directly from data and iteratively builds an ensemble under a query budget. We test our approach on two question-answering datasets and one binary sentiment classification dataset: MEDMCQA, MMLU, and IMDB movie reviews. Across all datasets, we observe that our method consistently outperforms strong baselines under the same query budget.
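The greedy selection idea and the correlated-error intuition can be illustrated with a minimal Python sketch. This is not the authors' implementation. It makes several assumptions that are not taken from the abstract. Labels are binary. The query budget is simply the number of selected models, so each model costs one unit. I(Y; Ŷ_S) is estimated with a naive plug-in estimator built from joint counts on labeled data. Correlated errors come from one possible Gaussian-copula setup, in which each model has a latent Gaussian and is correct when that latent falls below the quantile matching its accuracy. The function names (`empirical_mi`, `greedy_mi_select`, `simulate_copula_predictions`) and the toy model pool are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def empirical_mi(preds: np.ndarray, y: np.ndarray) -> float:
    """Plug-in estimate, in bits, of I(Y; X) where X is the row-wise tuple of `preds` columns."""
    # Give every distinct prediction tuple and every label value an integer id.
    _, x_ids = np.unique(preds, axis=0, return_inverse=True)
    _, y_ids = np.unique(y, return_inverse=True)
    x_ids, y_ids = x_ids.ravel(), y_ids.ravel()
    # Empirical joint distribution p(x, y) from co-occurrence counts.
    joint = np.zeros((x_ids.max() + 1, y_ids.max() + 1))
    np.add.at(joint, (x_ids, y_ids), 1.0)
    joint /= len(y)
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0  # skip empty cells: 0 * log 0 is taken as 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px * py)[nz])))


def greedy_mi_select(preds: np.ndarray, y: np.ndarray, budget: int) -> list[int]:
    """Greedily add the model whose predictions maximize I(Y; predictions of the selected set)."""
    selected: list[int] = []
    remaining = list(range(preds.shape[1]))
    for _ in range(min(budget, preds.shape[1])):
        scores = [empirical_mi(preds[:, selected + [j]], y) for j in remaining]
        best = remaining[int(np.argmax(scores))]  # ties go to the lowest model index
        selected.append(best)
        remaining.remove(best)
    return selected


def simulate_copula_predictions(y, accuracies, corr, rng):
    """Binary predictions whose error events are coupled through a Gaussian copula (assumed model)."""
    accuracies = np.asarray(accuracies)
    z = rng.multivariate_normal(np.zeros(len(accuracies)), corr, size=len(y))
    # Model j is correct when its latent falls below Phi^{-1}(accuracy_j): the marginal
    # accuracy matches `accuracies`, while `corr` controls how often models fail together.
    thresholds = np.array([NormalDist().inv_cdf(a) for a in accuracies])
    correct = z < thresholds
    return np.where(correct, y[:, None], 1 - y[:, None])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, size=5000)
    # Hypothetical pool: models 0-2 are strong but near-duplicates (latent correlation 0.9),
    # models 3-5 are weaker but fail independently of everything else.
    accuracies = [0.85, 0.85, 0.85, 0.75, 0.75, 0.75]
    corr = np.eye(6)
    corr[:3, :3] = 0.9
    np.fill_diagonal(corr, 1.0)
    preds = simulate_copula_predictions(y, accuracies, corr, rng)

    budget = 3
    acc = (preds == y[:, None]).mean(axis=0)
    top_acc = [int(i) for i in np.argsort(-acc)[:budget]]
    greedy = greedy_mi_select(preds, y, budget)
    print("top-accuracy:", top_acc, f"I = {empirical_mi(preds[:, top_acc], y):.3f} bits")
    print("greedy MI:   ", greedy, f"I = {empirical_mi(preds[:, greedy], y):.3f} bits")
    # Information gathered as more of the correlated models are added.
    for k in range(1, 4):
        print(f"first {k} correlated models: I = {empirical_mi(preds[:, :k], y):.3f} bits")
```

The demo compares picking the most accurate models against the greedy mutual-information pick, and prints how the information grows as near-duplicate models are added. With strongly correlated latents one would expect that growth to flatten, which is the intuition behind an error floor. The plug-in estimator is biased upward as the selected set grows, because the joint table can have up to 2^k prediction tuples. A practical version would therefore need more data or a better estimator for large budgets.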