arXiv cs.LG, February 10, 2026

TerraBind: Fast and Accurate Binding Affinity Prediction through Coarse Structural Representations

Tags: tunnel-binding, diffusion-models, protein-ligand, binding-affinity, deep-learning


arXiv:2602.07735v1. Abstract: We present TerraBind, a foundation model for protein-ligand structure and binding affinity prediction that achieves 26-fold faster inference than state-of-the-art methods while improving affinity prediction accuracy by $\sim$20\%. Current deep learning approaches to structure-based drug design rely on expensive all-atom diffusion to generate 3D coordinates, creating inference bottlenecks that render large-scale compound screening computationally intractable. We challenge this paradigm with a critical hypothesis: full all-atom resolution is unnecessary for accurate small molecule pose and binding affinity prediction. TerraBind tests this hypothesis through a coarse pocket-level representation (protein C$_\beta$ atoms and ligand heavy atoms only) within a multimodal architecture that combines COATI-3 molecular encodings and ESM-2 protein embeddings. The architecture learns rich structural representations, which feed a diffusion-free optimization module for pose generation and a binding affinity likelihood prediction module. On structure prediction benchmarks (FoldBench, PoseBusters, Runs N' Poses), TerraBind matches diffusion-based baselines in ligand pose accuracy. Crucially, TerraBind outperforms Boltz-2 by $\sim$20\% in Pearson correlation for binding affinity prediction on both a public benchmark (CASP16) and a diverse proprietary dataset (18 biochemical/cell assays). We show that the affinity prediction module also provides well-calibrated affinity uncertainty estimates, addressing a critical gap in reliable compound prioritization for drug discovery. Furthermore, this module enables a continual learning framework and a hedged batch selection strategy that, in simulated drug discovery cycles, achieves 6$\times$ greater affinity improvement of selected molecules than greedy approaches.
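
To make the coarse pocket-level representation concrete, here is a minimal sketch in Python with NumPy, not TerraBind's actual pipeline. It keeps one point per protein residue (the C$_\beta$ atom) plus the ligand heavy atoms, then restricts the protein points to those near the ligand. The abstract does not say how the pocket is defined or how residues without a C$_\beta$ are handled, so the 10 Å cutoff, the fallback to the alpha carbon for glycine, the toy atom records, and all function names are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy records, not a real file format.
# Protein atoms: (atom_name, element, residue_id, xyz in angstroms)
protein_atoms = [
    ("CA", "C", 1, (0.0, 0.0, 0.0)),
    ("CB", "C", 1, (1.5, 0.0, 0.0)),
    ("CA", "C", 2, (3.8, 0.0, 0.0)),    # glycine-like residue: no CB
    ("CA", "C", 3, (30.0, 0.0, 0.0)),
    ("CB", "C", 3, (31.5, 0.0, 0.0)),   # far from the ligand
]
# Ligand atoms: (atom_name, element, xyz in angstroms)
ligand_atoms = [
    ("C1", "C", (2.0, 2.0, 0.0)),
    ("O1", "O", (3.0, 2.5, 0.0)),
    ("H1", "H", (2.0, 3.0, 0.0)),       # hydrogen, dropped below
]


def coarse_protein_points(atoms):
    """Return one point per residue: CB if present, else CA (assumed fallback)."""
    chosen = {}
    for name, _element, res_id, xyz in atoms:
        # CB always wins; CA is only used while no point is stored yet.
        if name == "CB" or (name == "CA" and res_id not in chosen):
            chosen[res_id] = xyz
    res_ids = sorted(chosen)
    return res_ids, np.array([chosen[r] for r in res_ids], dtype=float)


def ligand_heavy_points(atoms):
    """Keep every ligand atom that is not hydrogen."""
    return np.array([xyz for _name, el, xyz in atoms if el != "H"], dtype=float)


def coarse_pocket(protein_atoms, ligand_atoms, cutoff=10.0):
    """Select residues whose coarse point lies within `cutoff` of any ligand heavy atom.

    The cutoff value is an assumption made for this sketch.
    """
    res_ids, prot = coarse_protein_points(protein_atoms)
    lig = ligand_heavy_points(ligand_atoms)
    # Pairwise distances, shape (n_residues, n_ligand_heavy_atoms).
    dists = np.linalg.norm(prot[:, None, :] - lig[None, :, :], axis=-1)
    keep = dists.min(axis=1) <= cutoff
    pocket_ids = [r for r, k in zip(res_ids, keep) if k]
    return pocket_ids, prot[keep], lig, dists[keep]


pocket_ids, pocket_xyz, lig_xyz, pocket_lig_dists = coarse_pocket(
    protein_atoms, ligand_atoms
)
print("pocket residues:", pocket_ids)
print("ligand heavy atoms:", len(lig_xyz))
```

In this toy input the coarse view holds two pocket points and two ligand points, compared with eight atoms in the full records. That reduction in the number of points is the kind of saving the abstract's hypothesis relies on, although the model's real featurization is not described there.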