arxiv_cs_gr April 17, 2026

Calibrated Abstention for Reliable TCR–pMHC Binding Prediction under Epitope Shift

Translated: 2026/4/17 3:27:01

tcr-pmhc, selective-prediction, epitope-shift, calibrated-abstention, conformal-prediction

Original Content

arXiv:2604.13254v1 Announce Type: new

Abstract: Predicting T-cell receptor (TCR)–peptide-MHC (pMHC) binding is central to vaccine design and T-cell therapy, yet deployed models frequently encounter epitopes unseen during training, causing silent overconfidence and unreliable prioritization. We address this by framing TCR–pMHC prediction as a selective prediction problem: a calibrated model should either output a trustworthy confidence score or explicitly abstain. Concretely, we (1) introduce a dual-encoder architecture encoding both CDR3α/CDR3β and peptide sequences via a pre-trained protein language model; (2) apply temperature scaling to correct systematic probability miscalibration; and (3) impose a conformal abstention rule that provides finite-sample coverage guarantees at a user-specified target error rate. Evaluated under three split strategies (random, epitope-held-out, and distance-aware), our method achieves AUROC 0.813 and ECE 0.043 under the challenging epitope-held-out protocol, reducing ECE by 69.7% relative to an uncalibrated baseline. At 80% coverage, the selective model further reduces the error rate from 18.7% to 10.9%, demonstrating that calibrated abstention enables principled coverage-risk trade-offs aligned with practical screening budgets.
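Step (2), temperature scaling, divides every logit by a single scalar T > 0 fitted on held-out data. The sketch below makes several assumptions. It uses binary logits with 0/1 labels. It fits T by minimising the mean negative log-likelihood of sigmoid(z / T), which is the usual temperature-scaling objective; the abstract does not state the paper's objective or optimiser. It uses a derivative-free golden-section search over β = 1/T, which is sufficient because the NLL is convex in β. The helper `synthetic_logits` and all function names are hypothetical illustrations, not the paper's code or data.

```python
import math
from typing import List, Sequence, Tuple


def binary_nll(logits: Sequence[float], labels: Sequence[int], temperature: float) -> float:
    """Mean negative log-likelihood of sigmoid(logit / T) against 0/1 labels."""
    total = 0.0
    for z, y in zip(logits, labels):
        s = z / temperature
        # Stable form of -[y*log(sigmoid(s)) + (1 - y)*log(1 - sigmoid(s))] = softplus(s) - y*s
        total += max(s, 0.0) + math.log1p(math.exp(-abs(s))) - y * s
    return total / len(logits)


def fit_temperature(logits: Sequence[float], labels: Sequence[int],
                    t_min: float = 0.05, t_max: float = 20.0, iters: int = 80) -> float:
    """Golden-section search for the T in [t_min, t_max] that minimises binary_nll.

    The NLL is convex in beta = 1 / T, so the search runs over beta.
    """
    def loss(beta: float) -> float:
        return binary_nll(logits, labels, 1.0 / beta)

    a, b = 1.0 / t_max, 1.0 / t_min
    g = (math.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio, about 0.618
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = loss(c), loss(d)
    for _ in range(iters):
        if fc < fd:  # minimiser lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = loss(c)
        else:        # minimiser lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = loss(d)
    return 1.0 / ((a + b) / 2.0)


def synthetic_logits(scale: float, n_per_bin: int = 200) -> Tuple[List[float], List[int]]:
    """Hypothetical toy data: true log-odds on a grid, model logits = scale * true log-odds."""
    logits: List[float] = []
    labels: List[int] = []
    for k in range(-6, 7):
        z = 0.5 * k  # true log-odds in [-3, 3]
        n_pos = round(n_per_bin / (1.0 + math.exp(-z)))  # positives matching sigmoid(z)
        logits += [scale * z] * n_per_bin
        labels += [1] * n_pos + [0] * (n_per_bin - n_pos)
    return logits, labels


if __name__ == "__main__":
    logits, labels = synthetic_logits(scale=3.0)  # logits about 3x too confident
    t = fit_temperature(logits, labels)
    print(f"fitted T = {t:.2f}")  # expected to land close to 3.0
    print(f"NLL at T=1: {binary_nll(logits, labels, 1.0):.4f}, at fitted T: {binary_nll(logits, labels, t):.4f}")
```

On the toy data the logits are deliberately inflated threefold, so the fitted T should recover a value near 3. A well-calibrated model would instead yield T near 1.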
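Step (3) can be realised in several ways. The sketch assumes one standard construction, split conformal prediction, rather than taking the rule from the paper:

- The nonconformity score is s(x, y) = 1 − p̂(y | x).
- The threshold q̂ is the ⌈(n + 1)(1 − α)⌉-th smallest score on n calibration pairs.
- The model commits only when the set {y : s(x, y) ≤ q̂} contains exactly one label, and abstains otherwise.

If calibration and test pairs are exchangeable, P(Y ∈ C(X)) ≥ 1 − α. A committed prediction can only be wrong when the true label falls outside the set, so the probability of committing to a wrong label is at most α. Function names are hypothetical.

```python
import math
from typing import Optional, Sequence


def nonconformity(p_pos: float, label: int) -> float:
    """Score s = 1 - (model probability of the given label); larger means less conforming."""
    p_label = p_pos if label == 1 else 1.0 - p_pos
    return 1.0 - p_label


def conformal_threshold(cal_probs: Sequence[float], cal_labels: Sequence[int], alpha: float) -> float:
    """Split-conformal quantile q_hat at target miscoverage alpha."""
    scores = sorted(nonconformity(p, y) for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - alpha))  # finite-sample corrected rank
    if k > n:
        return math.inf  # too few calibration points for this alpha: keep every label
    return scores[k - 1]


def predict_or_abstain(p_pos: float, q_hat: float) -> Optional[int]:
    """Commit to a label only when the conformal set {y : s(x, y) <= q_hat} is a singleton."""
    label_set = [y for y in (1, 0) if nonconformity(p_pos, y) <= q_hat]
    return label_set[0] if len(label_set) == 1 else None  # None means abstain


if __name__ == "__main__":
    cal_p = [0.9, 0.8, 0.7, 0.95, 0.6, 0.2, 0.1, 0.3, 0.05, 0.4]  # toy calibration probabilities
    cal_y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    q_hat = conformal_threshold(cal_p, cal_y, alpha=0.2)
    for p in (0.95, 0.55, 0.1):
        print(p, predict_or_abstain(p, q_hat))  # prints 1, None (abstain), 0 for the three inputs
```

The guarantee is marginal and rests on exchangeability. A test set whose epitopes are absent from the calibration data need not satisfy that assumption, so the choice of calibration split matters under epitope shift.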
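ECE, the calibration metric quoted in the abstract, compares the mean predicted probability with the observed positive rate inside probability bins. The sketch assumes 10 equal-width bins over the positive-class probability. The abstract does not say which binning or ECE variant (positive-class or top-label) the paper uses, and reported values depend on that choice. The function name is hypothetical.

```python
from typing import Sequence


def expected_calibration_error(probs: Sequence[float], labels: Sequence[int], n_bins: int = 10) -> float:
    """Equal-width-binned ECE on the positive-class probability.

    ECE = sum_b (n_b / N) * |mean label in bin b - mean predicted probability in bin b|.
    """
    counts = [0] * n_bins
    sum_p = [0.0] * n_bins
    sum_y = [0.0] * n_bins
    for p, y in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        counts[b] += 1
        sum_p[b] += p
        sum_y[b] += y
    n = len(probs)
    # (n_b / N) * |sum_y / n_b - sum_p / n_b| simplifies to |sum_y - sum_p| / N
    return sum(abs(sum_y[b] - sum_p[b]) for b in range(n_bins) if counts[b]) / n


if __name__ == "__main__":
    print(expected_calibration_error([0.95] * 10, [1] * 5 + [0] * 5))  # overconfident: about 0.45
```

As a consistency check on the reported numbers, assume the 69.7% reduction is measured on the same epitope-held-out split. The uncalibrated baseline's ECE would then be roughly 0.043 / (1 − 0.697) = 0.043 / 0.303 ≈ 0.142.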
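The coverage-risk result (error falling from 18.7% to 10.9% at 80% coverage) is a selective risk: the error rate among the predictions the model keeps. The sketch below makes three assumptions:

- The model keeps the most confident fraction of inputs.
- Confidence is max(p, 1 − p).
- Predictions use a 0.5 decision threshold.

The paper's selective model may rank or threshold differently, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def selective_risk(confidences: Sequence[float], correct: Sequence[bool], coverage: float) -> float:
    """Error rate among the round(coverage * n) most confident predictions (at least one kept)."""
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must lie in (0, 1]")
    n = len(confidences)
    k = max(1, round(coverage * n))
    ranked = sorted(range(n), key=lambda i: confidences[i], reverse=True)
    kept = ranked[:k]  # abstain on the remaining n - k inputs
    return sum(1 for i in kept if not correct[i]) / k


def binary_confidence_and_correctness(probs: Sequence[float],
                                      labels: Sequence[int]) -> Tuple[List[float], List[bool]]:
    """Confidence = max(p, 1 - p); a prediction is correct when (p >= 0.5) matches the 0/1 label."""
    conf = [max(p, 1.0 - p) for p in probs]
    ok = [(p >= 0.5) == (y == 1) for p, y in zip(probs, labels)]
    return conf, ok


if __name__ == "__main__":
    conf, ok = binary_confidence_and_correctness([0.97, 0.1, 0.8, 0.55, 0.45], [1, 0, 1, 0, 1])
    for cov in (1.0, 0.8, 0.6):
        print(cov, selective_risk(conf, ok, cov))  # risk falls from 0.4 to 0.25 to 0.0
```

For scale, going from 18.7% to 10.9% is an absolute drop of 7.8 percentage points. That is roughly a 42% relative reduction (7.8 / 18.7 ≈ 0.417).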
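AUROC, the ranking metric reported alongside ECE, is the probability that a randomly drawn binder scores above a randomly drawn non-binder, with ties counted as one half. The quadratic pairwise form below is a readable sketch; a rank-sum implementation runs in O(n log n). Names are illustrative.

The demo illustrates a general property. Dividing logits by any T > 0 and applying a sigmoid is strictly monotone, so temperature scaling alone can change ECE but leaves AUROC unchanged.

```python
import math
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """P(score of a random positive > score of a random negative), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    logits = [2.0, -1.0, 0.5, 3.0, -2.0]
    labels = [1, 0, 0, 1, 1]
    raw = auroc(logits, labels)
    scaled = auroc([sigmoid(z / 2.5) for z in logits], labels)  # temperature T = 2.5
    print(raw, scaled, raw == scaled)  # identical: scaling by T > 0 preserves the ranking
```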