arxiv_cs_lg April 20, 2026

Beyond Augmented-Action Surrogates for Multi-Expert Learning-to-Defer

Translated: 2026/4/20 11:08:27

learning-to-defer, multi-expert, neural-networks, statistical-estimation, deep-learning

Original Content

arXiv:2604.09414v2 Announce Type: replace-cross Abstract: Existing multi-expert learning-to-defer surrogates are statistically consistent, yet they can underfit, suppress useful experts, or degrade as the expert pool grows. We trace these failures to a shared architectural choice: casting classes and experts as actions inside one augmented prediction geometry. Consistency governs the population target; it says nothing about how the surrogate distributes gradient mass during training. We analyze five surrogates along both axes and show that each trades a fix on one for a failure on the other. We then introduce a decoupled surrogate that estimates the class posterior with a softmax and each expert utility with an independent sigmoid. It admits an $\mathcal{H}$-consistency bound whose constant is $J$-independent for fixed per-expert weight $\beta{=}\lambda/J$, and its gradients are free of the amplification, starvation, and coupling pathologies of the augmented family. Experiments on synthetic benchmarks, CIFAR-10, CIFAR-10H, and Covertype confirm that the decoupled surrogate is the only method that avoids amplification under redundancy, preserves rare specialists, and consistently improves over a standalone classifier across all settings.
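
The abstract describes the decoupled surrogate only at a high level: a softmax for the class posterior, an independent sigmoid per expert utility, and a per-expert weight $\beta{=}\lambda/J$. The NumPy sketch below is one plausible reading of that description, not the authors' implementation. The function names, the choice of "expert is correct" as the utility target, and the deferral rule (defer when the best estimated expert utility exceeds the top class probability) are all assumptions made for illustration.

```python
import numpy as np


def log_softmax(z):
    # Numerically stable log-softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def decoupled_loss(class_logits, expert_logits, y, expert_correct, lam=1.0):
    """Hypothetical decoupled surrogate (an assumption, not the paper's code).

    class_logits:   (n, K) scores for the K classes
    expert_logits:  (n, J) one independent score per expert
    y:              (n,)   integer class labels
    expert_correct: (n, J) 1.0 where expert j was correct, else 0.0
    lam:            total expert weight; each expert gets beta = lam / J
    """
    n, J = expert_logits.shape
    beta = lam / J
    # Softmax cross-entropy on the class head only.
    ce = -log_softmax(class_logits)[np.arange(n), y]
    # Sigmoid cross-entropy per expert, written on logits:
    # log(1 + exp(z)) - c * z, with logaddexp for stability.
    bce = np.logaddexp(0.0, expert_logits) - expert_correct * expert_logits
    # Expert terms are summed and scaled by beta, so they never share
    # a normalizer with the class head or with each other.
    return float(np.mean(ce + beta * bce.sum(axis=1)))


def decide(class_logits, expert_logits):
    """Assumed decision rule: return a class index in [0, K), or K + j to
    defer to expert j when its estimated utility beats the top class
    probability. Ties go to the classifier."""
    K = class_logits.shape[1]
    p = np.exp(log_softmax(class_logits))
    u = 1.0 / (1.0 + np.exp(-expert_logits))
    defer = u.max(axis=1) > p.max(axis=1)
    return np.where(defer, K + u.argmax(axis=1), p.argmax(axis=1))
```

Under these assumptions, each expert's gradient depends only on its own logit, which is the sense in which the heads are decoupled. With $\beta{=}\lambda/J$ the summed expert term keeps the same scale as experts are added: with all logits at zero, it equals $\lambda \log 2$ whether $J$ is 2 or 4.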