arxiv_cs_lg February 10, 2026

Fast Model Selection and Stable Optimization for Softmax-Gated Multinomial-Logistic Mixture of Experts Models

Translated: 2026/3/15 7:04:44

mixture-of-experts, softmax-gating, minorization-maximization, machine-learning, statistical-modeling

Original Content

arXiv:2602.07997v1 Announce Type: cross

Abstract: Mixture-of-Experts (MoE) architectures combine specialized predictors through a learned gate and are effective across regression and classification, but for classification with softmax multinomial-logistic gating, rigorous guarantees for stable maximum-likelihood training and principled model selection remain limited. We address both issues in the full-data (batch) regime. First, we derive a batch minorization-maximization (MM) algorithm for softmax-gated multinomial-logistic MoE using an explicit quadratic minorizer, yielding coordinate-wise closed-form updates that guarantee monotone ascent of the objective and global convergence to a stationary point (in the standard MM sense), avoiding approximate M-steps common in EM-type implementations. Second, we prove finite-sample rates for conditional density estimation and parameter recovery, and we adapt dendrograms of mixing measures to the classification setting to obtain a sweep-free selector of the number of experts that achieves near-parametric optimal rates after merging redundant fitted atoms. Experiments on biological protein-protein interaction prediction validate the full pipeline, delivering improved accuracy and better-calibrated probabilities than strong statistical and machine-learning baselines.
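The abstract does not spell out the quadratic minorizer, so the following is only a minimal sketch of the general idea, shown on a plain softmax (multinomial-logistic) regression rather than on the paper's softmax-gated MoE. It assumes the well-known Böhning-type curvature bound, under which the negative Hessian of the log-likelihood is dominated by a fixed matrix built from `0.5 * X.T @ X`. Maximizing the resulting quadratic lower bound gives a closed-form update, and the log-likelihood cannot decrease from one iteration to the next. All names (`mm_softmax_regression`, `ridge`, `history`) are hypothetical and are not taken from the paper.

```python
import numpy as np


def log_softmax(Z):
    """Row-wise log-softmax, shifted by the row maximum for stability."""
    Z = Z - Z.max(axis=1, keepdims=True)
    return Z - np.log(np.exp(Z).sum(axis=1, keepdims=True))


def log_likelihood(W, X, Y):
    """Multinomial-logistic log-likelihood for one-hot labels Y."""
    return float((Y * log_softmax(X @ W)).sum())


def mm_softmax_regression(X, y, n_classes, n_iter=50, ridge=1e-8):
    """Batch MM for softmax regression with a fixed quadratic minorizer.

    Assumption (not from the paper): the curvature bound
    diag(p) - p p^T <= 0.5 * I, so B = 0.5 * (X^T X + ridge * I)
    dominates the negative Hessian in every class block. The small
    ridge only enlarges B, so the bound stays valid.
    """
    n, d = X.shape
    Y = np.eye(n_classes)[y]          # one-hot labels, shape (n, K)
    W = np.zeros((d, n_classes))      # one weight column per class
    B = 0.5 * (X.T @ X + ridge * np.eye(d))
    history = [log_likelihood(W, X, Y)]
    for _ in range(n_iter):
        P = np.exp(log_softmax(X @ W))    # current class probabilities
        G = X.T @ (Y - P)                 # gradient of the log-likelihood
        W = W + np.linalg.solve(B, G)     # closed-form maximizer of the minorizer
        history.append(log_likelihood(W, X, Y))
    return W, history


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.hstack([np.ones((200, 1)), rng.normal(size=(200, 2))])
    W_true = rng.normal(size=(3, 3))
    # Gumbel-max trick: draws labels from softmax(X @ W_true).
    y = (X @ W_true + rng.gumbel(size=(200, 3))).argmax(axis=1)
    W_hat, history = mm_softmax_regression(X, y, n_classes=3)
    print("log-likelihood, first and last:", history[0], history[-1])
```

Because `B` is fixed, it could be factorized once and reused across iterations. The paper's algorithm additionally handles the gate and expert parameters of the mixture with coordinate-wise updates, which this sketch does not attempt.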