arxiv_cs_lg April 24, 2026

On Bayesian Softmax-Gated Mixture-of-Experts Models

Translated: 2026/4/24 20:05:38

bayesian, mixture-of-experts, softmax-gating, machine-learning, density-estimation

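Translation (from Japanese)

arXiv:2604.20551v1. Announce type: cross. Abstract: Mixture-of-experts models offer a flexible framework for learning complex probabilistic input-output relationships by combining several expert models through an input-dependent gating mechanism. These models are becoming increasingly important in modern machine learning, but their theoretical properties in the Bayesian framework are largely unexplored. This paper studies Bayesian mixture-of-experts models, focusing on the widely used softmax-based gating mechanism. Specifically, it investigates the asymptotic behavior of the posterior distribution in three fundamental statistical tasks: density estimation, parameter estimation, and model selection. First, it establishes posterior contraction rates for density estimation, both when the number of experts is fixed and known and when it is random and learnable. Next, for parameter estimation, it derives convergence guarantees based on tailored Voronoi-type losses, which account for the complex identifiability structure of mixture-of-experts models. Finally, it proposes and analyzes two complementary strategies for determining the number of experts. Together, these results provide one of the first systematic theoretical analyses of Bayesian mixture-of-experts models with softmax gating and yield several theory-grounded insights for practical model design.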

Original Content

arXiv:2604.20551v1. Announce Type: cross.

Abstract: Mixture-of-experts models provide a flexible framework for learning complex probabilistic input-output relationships by combining multiple expert models through an input-dependent gating mechanism. These models have become increasingly prominent in modern machine learning, yet their theoretical properties in the Bayesian framework remain largely unexplored. In this paper, we study Bayesian mixture-of-experts models, focusing on the ubiquitous softmax-based gating mechanism. Specifically, we investigate the asymptotic behavior of the posterior distribution for three fundamental statistical tasks: density estimation, parameter estimation, and model selection. First, we establish posterior contraction rates for density estimation, both in the regimes with a fixed, known number of experts and with a random learnable number of experts. We then analyze parameter estimation and derive convergence guarantees based on tailored Voronoi-type losses, which account for the complex identifiability structure of mixture-of-experts models. Finally, we propose and analyze two complementary strategies for selecting the number of experts. Taken together, these results provide one of the first systematic theoretical analyses of Bayesian mixture-of-experts models with softmax gating, and yield several theory-grounded insights for practical model design.
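
For readers unfamiliar with the model class, the following is a minimal sketch of the conditional density that a softmax-gated mixture of experts defines. The abstract does not specify the expert family or the parameterization, so everything here is an illustrative assumption: scalar input and output, Gaussian experts with linear means, affine gating scores, and all names (`moe_density`, `gates`, `experts`) and parameter values. Under those assumptions, the density is p(y | x) = sum_k softmax_k(beta1_k x + beta0_k) N(y; a_k x + b_k, sigma_k^2).

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of real scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gaussian_pdf(y, mean, sigma):
    """Density of N(mean, sigma^2) evaluated at y."""
    z = (y - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def moe_density(y, x, gates, experts):
    """Conditional density p(y | x) of a softmax-gated mixture of experts.

    gates:   list of (beta1, beta0); gating score k is beta1 * x + beta0
    experts: list of (a, b, sigma);  expert k is N(a * x + b, sigma^2)
    The parameterization is an assumption made for illustration only.
    """
    weights = softmax([b1 * x + b0 for b1, b0 in gates])
    return sum(w * gaussian_pdf(y, a * x + b, s)
               for w, (a, b, s) in zip(weights, experts))

# Hypothetical two-expert model; the values are arbitrary.
gates = [(2.0, 0.0), (-2.0, 0.0)]
experts = [(1.0, 0.0, 0.5), (-1.0, 1.0, 0.5)]

# Gating weights at x = 1.0 are input-dependent and sum to one.
gate_weights = softmax([b1 * 1.0 + b0 for b1, b0 in gates])
density_at_point = moe_density(0.8, 1.0, gates, experts)
```

Because the gating weights depend on x, the mixture proportions change across the input space. This is what distinguishes the model from an ordinary finite mixture, and it relates to the identifiability structure the abstract mentions.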