arxiv_cs_ai April 20, 2026

Mamba-SSM with LLM Reasoning for Feature Selection: Faithfulness-Aware Biomarker Discovery

Translated: 2026/4/20 11:19:27

mamba-ssm, llm-reasoning, chain-of-thought, biomarker-discovery, gradient-saliency

Original Content

arXiv:2604.14334v2 (announce type: replace-cross)

Abstract: Gradient saliency from deep sequence models surfaces candidate biomarkers efficiently, but the resulting gene lists can be contaminated by tissue-composition confounders that degrade downstream classifiers. We study whether LLM chain-of-thought (CoT) reasoning can filter these confounders, and whether reasoning quality is associated with downstream performance. We train a Mamba SSM on TCGA-BRCA RNA-seq and extract the top-50 genes by gradient saliency; DeepSeek-R1 evaluates every candidate with structured CoT to produce a final 17-gene set. On the held-out test split, the raw 50-gene saliency set (no LLM) performs worse than a 5,000-gene variance baseline (AUC 0.832 vs. 0.903), while the LLM-filtered set surpasses it (AUC 0.927), using 294x fewer features. A faithfulness audit (COSMIC CGC, OncoKB, PAM50) shows that 6 of 17 selected genes (35.3%) are validated BRCA biomarkers, while 10 of 16 known BRCA genes present in the input were missed, including FOXA1. This divergence between downstream performance and reasoning faithfulness suggests selective faithfulness in this setting: targeted confounder removal can improve predictive performance without comprehensive recall.
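The audit figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' code: the function `audit_metrics` and its parameter names are hypothetical. It also assumes that the 6 validated genes in the selected set are the same 6 known BRCA genes that were not missed (16 - 10 = 6). The abstract's counts are consistent with this, but it does not state it outright.

```python
def audit_metrics(n_selected, n_validated_selected, n_known_in_input, n_baseline_features):
    """Summarize a gene-selection audit from raw counts (hypothetical helper).

    n_selected            -- genes kept by the LLM filter
    n_validated_selected  -- kept genes that are validated biomarkers
    n_known_in_input      -- known biomarkers among the saliency candidates
    n_baseline_features   -- feature count of the variance baseline
    """
    # Fraction of selected genes that are validated biomarkers.
    precision = n_validated_selected / n_selected
    # Fraction of known biomarkers in the input that survived filtering
    # (assumes the validated selected genes are exactly the recovered ones).
    recall = n_validated_selected / n_known_in_input
    # Known biomarkers that were filtered out.
    missed = n_known_in_input - n_validated_selected
    # How many times fewer features the filtered set uses than the baseline.
    reduction_factor = n_baseline_features / n_selected
    return {
        "precision": precision,
        "recall": recall,
        "missed": missed,
        "reduction_factor": reduction_factor,
    }


# Counts taken from the abstract: 17 selected genes, 6 validated,
# 16 known BRCA genes in the 50-gene input, 5,000-gene baseline.
metrics = audit_metrics(
    n_selected=17,
    n_validated_selected=6,
    n_known_in_input=16,
    n_baseline_features=5000,
)

print(f"precision: {metrics['precision']:.1%}")
print(f"recall:    {metrics['recall']:.1%}")
print(f"missed:    {metrics['missed']}")
print(f"reduction: {metrics['reduction_factor']:.0f}x")
```

Under these assumptions the numbers match the abstract: about 35.3% of the selected genes are validated, 10 known genes are missed, and the filtered set uses roughly 294 times fewer features than the baseline. The recall of 37.5% is derived here rather than reported in the abstract. It puts a number on the abstract's point that the performance gain comes without comprehensive recall.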