arxiv_cs_ai, February 10, 2026

scDFM: Distributional Flow Matching Model for Robust Single-Cell Perturbation Prediction

Translated: 2026/3/7 11:39:02

deep-learning, cell-biology, drug-perturbation, distribution-level

Original Content

arXiv:2602.07103v1 | Announce Type: cross

Abstract: A central goal in systems biology and drug discovery is to predict the transcriptional response of cells to perturbations. This task is challenging due to the noisy and sparse nature of single-cell measurements, as well as the fact that perturbations often induce population-level shifts rather than changes in individual cells. Existing deep learning methods typically assume cell-level correspondences, limiting their ability to capture such global effects. We present scDFM, a generative framework based on conditional flow matching that models the full distribution of perturbed cells conditioned on control states. By incorporating a maximum mean discrepancy (MMD) objective, our method aligns perturbed and control populations beyond cell-level correspondences. To further improve robustness to sparsity and noise, we introduce the Perturbation-Aware Differential Transformer (PAD-Transformer), a backbone architecture that leverages gene interaction graphs and differential attention to capture context-specific expression changes. Across multiple genetic and drug perturbation benchmarks, scDFM consistently outperforms prior methods, demonstrating strong generalization in both unseen and combinatorial settings. In the combinatorial setting, it reduces mean squared error by 19.6% relative to the strongest baseline. These results highlight the importance of distribution-level generative modeling for robust in silico perturbation prediction. The code is available at https://github.com/AI4Science-WestlakeU/scDFM
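The abstract combines two ideas: flow matching from unpaired control cells to perturbed cells, and MMD as a population-level comparison. The sketch below illustrates both on synthetic Gaussian data. It is not the scDFM implementation, and it rests on several assumptions:

- a straight-line interpolation path;
- random pairing of control and perturbed cells;
- a single simulated perturbation, so no condition input;
- a toy linear velocity model fit by least squares, in place of the PAD-Transformer;
- a Gaussian RBF kernel with a hand-picked bandwidth.

Here MMD only scores the generated population. scDFM also uses an MMD objective during training, which this closed-form fit does not reproduce. The helper names (`rbf_mmd2`, `features`, `fit_velocity`, `transport`) are hypothetical.

```python
import numpy as np


def rbf_mmd2(x, y, bandwidth=2.0):
    """Biased (V-statistic) estimate of squared MMD between x (n, d) and y (m, d)
    under a Gaussian RBF kernel; non-negative up to floating-point error."""
    def gram(a, b):
        sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
        return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth ** 2))
    return float(gram(x, x).mean() + gram(y, y).mean() - 2.0 * gram(x, y).mean())


def features(x, t):
    """Hypothetical feature map [x, t, 1] for a toy linear velocity model.
    x: (n, d) expression states, t: (n,) times in [0, 1]."""
    return np.hstack([x, t[:, None], np.ones((x.shape[0], 1))])


def fit_velocity(control, perturbed, rng):
    """Flow matching with a linear velocity model, solved in closed form.
    Control and perturbed cells are paired at random, so no cell-level
    correspondence is assumed."""
    x0 = control
    x1 = perturbed[rng.integers(0, len(perturbed), size=len(control))]
    t = rng.uniform(size=len(x0))
    xt = (1.0 - t)[:, None] * x0 + t[:, None] * x1  # point on the straight-line path
    target = x1 - x0                                 # velocity of that path
    # Least-squares minimiser of mean ||v(xt, t) - target||^2 over linear v
    W, *_ = np.linalg.lstsq(features(xt, t), target, rcond=None)
    return W


def transport(x0, W, steps=20):
    """Euler integration of dx/dt = v(x, t) from t = 0 to t = 1."""
    x = x0.copy()
    dt = 1.0 / steps
    for k in range(steps):
        x = x + dt * features(x, np.full(len(x), k * dt)) @ W
    return x


rng = np.random.default_rng(0)
n, d = 500, 5
shift = np.zeros(d)
shift[0] = 2.0                                     # population-level shift in one gene
control = rng.normal(size=(n, d))
perturbed = rng.normal(size=(n, d)) + shift

W = fit_velocity(control, perturbed, rng)
generated = transport(rng.normal(size=(n, d)), W)  # fresh control cells

mmd_before = rbf_mmd2(control, perturbed)
mmd_after = rbf_mmd2(generated, perturbed)
print(f"MMD^2 control vs perturbed:   {mmd_before:.4f}")
print(f"MMD^2 generated vs perturbed: {mmd_after:.4f}")
```

In this toy setup the two populations differ only by a mean shift. The fitted velocity field is therefore close to that constant shift. Integrating fresh control cells through it moves them onto the perturbed population, and the MMD drops accordingly. The exact values depend on the random seed.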
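The PAD-Transformer is described as combining gene interaction graphs with differential attention. Below is a minimal single-head sketch of differential attention in the style of the Differential Transformer. It takes the difference of two softmax attention maps, with the second scaled by λ, over one token per gene.

Using the gene interaction graph as a hard attention mask is an assumption made for illustration. The abstract does not say how scDFM injects the graph. The names `differential_attention` and `graph` are hypothetical. The original Differential Transformer also learns λ and normalizes each head; both steps are omitted here.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # stabilise; masked -inf entries become 0
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def differential_attention(x, Wq1, Wk1, Wq2, Wk2, Wv, lam=0.5, mask=None):
    """Single-head differential attention over gene tokens.

    x: (g, d_model), one embedding per gene. Two softmax attention maps are
    computed and the second, scaled by lam, is subtracted from the first, so
    attention that both maps place on the same genes is partially cancelled.
    mask: optional (g, g) boolean matrix (here a hypothetical gene-interaction
    graph); every row must contain at least one True entry.
    """
    d = Wq1.shape[1]
    s1 = (x @ Wq1) @ (x @ Wk1).T / np.sqrt(d)
    s2 = (x @ Wq2) @ (x @ Wk2).T / np.sqrt(d)
    if mask is not None:
        s1 = np.where(mask, s1, -np.inf)
        s2 = np.where(mask, s2, -np.inf)
    attn = softmax(s1) - lam * softmax(s2)
    return attn @ (x @ Wv), attn


rng = np.random.default_rng(0)
g, d_model, d_head = 6, 8, 4
x = rng.normal(size=(g, d_model))
Wq1, Wk1, Wq2, Wk2 = (rng.normal(size=(d_model, d_head)) for _ in range(4))
Wv = rng.normal(size=(d_model, d_head))
adj = rng.random((g, g)) < 0.4
graph = adj | adj.T | np.eye(g, dtype=bool)  # hypothetical symmetric graph with self-loops

out, attn = differential_attention(x, Wq1, Wk1, Wq2, Wk2, Wv, lam=0.5, mask=graph)
print(out.shape)          # (6, 4)
print(attn.sum(axis=1))   # each row sums to 1 - lam
```

Each softmax row sums to 1, so every row of the differential map sums to 1 - λ. Individual entries can be negative, which lets the layer suppress attention that both maps assign to the same genes.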