arxiv_cs_lg February 10, 2026

Clustering for Large-Scale scRNA-seq Data Based on Bipartite Graph Attention

Bipartite Graph Attention-based Clustering for Large-scale scRNA-seq Data

Translated: 2026/3/15 14:07:29

scRNA-seq, clustering, bipartite-graph, transformer, deep-learning

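Japanese Translation (rendered in English)

arXiv:2602.07475v1 Announce Type: new Abstract: Clustering of single-cell RNA sequencing (scRNA-seq) data, which groups cells with similar gene expression profiles, is a critical task and a key step in scRNA-seq data analysis. Transformers, as powerful foundation models, have also been applied to scRNA-seq clustering; their self-attention mechanism automatically assigns higher attention weights to cells belonging to the same cluster, sharpening the distinction between clusters. Existing scRNA-seq clustering methods, such as graph transformer-based models, treat each cell as a token in a sequence, so their computational and space complexity is $\mathcal{O}(n^2)$ in the number of cells, which limits their application to large-scale scRNA-seq datasets. To address this challenge, we propose a Bipartite Graph Transformer-based clustering model for scRNA-seq data (BGFormer). We introduce a set of learnable anchor tokens that serve as shared reference points representing the entire dataset. A bipartite graph attention mechanism learns the similarity between cells and anchor tokens, bringing cells of the same class close together in the embedding space. BGFormer achieves computational complexity that is linear in the number of cells, allowing it to scale to large datasets. Experimental results on multiple large-scale scRNA-seq datasets verify the effectiveness and scalability of BGFormer.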

Original Content

arXiv:2602.07475v1 Announce Type: new Abstract: scRNA-seq clustering is a critical task for analyzing single-cell RNA sequencing (scRNA-seq) data, as it groups cells with similar gene expression profiles. Transformers, as powerful foundational models, have been applied to scRNA-seq clustering. Their self-attention mechanism automatically assigns higher attention weights to cells within the same cluster, enhancing the distinction between clusters. Existing methods for scRNA-seq clustering, such as graph transformer-based models, treat each cell as a token in a sequence. Their computational and space complexities are $\mathcal{O}(n^2)$ with respect to the number of cells, limiting their applicability to large-scale scRNA-seq datasets. To address this challenge, we propose a Bipartite Graph Transformer-based clustering model (BGFormer) for scRNA-seq data. We introduce a set of learnable anchor tokens as shared reference points to represent the entire dataset. A bipartite graph attention mechanism is introduced to learn the similarity between cells and anchor tokens, bringing cells of the same class closer together in the embedding space. BGFormer achieves linear computational complexity with respect to the number of cells, making it scalable to large datasets. Experimental results on multiple large-scale scRNA-seq datasets demonstrate the effectiveness and scalability of BGFormer.
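
To make the complexity argument concrete, here is a minimal sketch of cell-to-anchor attention in plain Python. It is an illustration under stated assumptions, not BGFormer's actual architecture, which the abstract does not specify. The function names, the toy data, and the choice to fix the anchors at the group centers (rather than learn them) are all hypothetical, and there are no learned query/key/value projections. With n cells, m anchors, and feature dimension d, scoring every cell against every anchor costs O(n * m * d), which is linear in n for a fixed m, whereas cell-to-cell self-attention costs O(n^2 * d).

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cell_to_anchor_attention(cells, anchors):
    """Let every cell attend to the anchors only (cell-anchor bipartite edges).

    cells:   n vectors of length d (e.g. reduced expression profiles)
    anchors: m vectors of length d (learnable in a real model; fixed here)
    Returns (weights, embeddings): n x m attention weights and n x d outputs.
    """
    d = len(anchors[0])
    scale = 1.0 / math.sqrt(d)  # standard scaled dot-product factor
    weights, embeddings = [], []
    for cell in cells:
        # One score per anchor: O(m * d) work per cell, O(n * m * d) overall.
        scores = [scale * sum(c * a for c, a in zip(cell, anchor))
                  for anchor in anchors]
        w = softmax(scores)
        # New embedding = attention-weighted average of the anchor vectors.
        emb = [sum(w[j] * anchors[j][k] for j in range(len(anchors)))
               for k in range(d)]
        weights.append(w)
        embeddings.append(emb)
    return weights, embeddings


def toy_cells(centers, n_per_group, seed=0):
    """Hypothetical toy data: points scattered uniformly around each center."""
    rng = random.Random(seed)
    cells, labels = [], []
    for label, center in enumerate(centers):
        for _ in range(n_per_group):
            cells.append([c + rng.uniform(-0.5, 0.5) for c in center])
            labels.append(label)
    return cells, labels


# Two well-separated groups. The anchors are fixed at the group centers here;
# a real model would learn them during training (assumption for illustration).
centers = [[3.0, 0.0, 0.0, 0.0], [0.0, 3.0, 0.0, 0.0]]
cells, labels = toy_cells(centers, n_per_group=50)
weights, embeddings = cell_to_anchor_attention(cells, anchors=centers)

# Hard assignment: the anchor with the largest attention weight for each cell.
assigned = [max(range(len(w)), key=w.__getitem__) for w in weights]
```

In this toy setting, cells from the same group put most of their attention on the same anchor, so their output embeddings land near that anchor. This mirrors, in a very simplified way, the idea of pulling same-class cells together in the embedding space without ever comparing cells to each other directly.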