arxiv_cs_ai April 24, 2026

Focus on What Matters: Fisher-Guided Adaptive Multimodal Fusion for Vulnerability Detection

Translated: 2026/4/24 20:33:50

multimodal-fusion, software-security, neural-networks, code-analysis, machine-learning

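Japanese Translation (rendered in English)

arXiv:2601.02438v3 Announce Type: replace-cross Abstract: Software vulnerability detection can be formulated as a binary classification problem: deciding whether a given code snippet contains a security flaw. Existing multimodal methods typically fuse Natural Code Sequence (NCS) representations extracted by a pretrained model with Code Property Graph (CPG) representations extracted by a graph neural network, on the assumption that adding a modality always yields an information gain. Through empirical analysis we show the limits of this assumption. Pretrained models already implicitly encode a large amount of structural information, so the two modalities are strongly correlated, and graph encoders are generally less effective feature extractors than pretrained language models. As a result, naive fusion has difficulty obtaining complementary signals and can dilute useful discriminative cues through noise propagation. To address these problems, we propose a task-conditioned complementary fusion strategy. It uses Fisher information to quantify task relevance and turns cross-modal interaction from full-spectrum matching into selective fusion within a task-sensitive subspace. Our theoretical analysis shows that, under an isotropic perturbation assumption, the strategy significantly tightens the upper bound on the output error. Building on this insight, we design the TaCCS-DFA framework, which combines online low-rank Fisher subspace estimation with an adaptive gating mechanism. Experiments on the BigVul, Devign, and ReVeal benchmarks show a gain of up to 6.3 points in F1 score at the cost of only a 3.4% increase in inference latency, while keeping calibration error low.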

Original Content

arXiv:2601.02438v3 Announce Type: replace-cross Abstract: Software vulnerability detection can be formulated as a binary classification problem that determines whether a given code snippet contains security defects. Existing multimodal methods typically fuse Natural Code Sequence (NCS) representations extracted by pretrained models with Code Property Graph (CPG) representations extracted by graph neural networks, under the implicit assumption that introducing an additional modality necessarily yields information gain. Through empirical analysis, we demonstrate the limitations of this assumption: pretrained models already encode substantial structural information implicitly, leading to strong overlap between the two modalities; moreover, graph encoders are generally less effective than pretrained language models in feature extraction. As a result, naive fusion not only struggles to obtain complementary signals but can also dilute effective discriminative cues due to noise propagation. To address these challenges, we propose a task-conditioned complementary fusion strategy that uses Fisher information to quantify task relevance, transforming cross-modal interaction from full-spectrum matching into selective fusion within a task-sensitive subspace. Our theoretical analysis shows that, under an isotropic perturbation assumption, this strategy significantly tightens the upper bound on the output error. Based on this insight, we design the TaCCS-DFA framework, which combines online low-rank Fisher subspace estimation with an adaptive gating mechanism to enable efficient task-oriented fusion. Experiments on the BigVul, Devign, and ReVeal benchmarks demonstrate that TaCCS-DFA delivers up to a 6.3-point gain in F1 score with only a 3.4% increase in inference latency, while maintaining low calibration error.
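The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of selective fusion within a Fisher subspace, not the TaCCS-DFA method itself. It assumes that per-sample loss gradients with respect to a shared feature space are available, that the sequence and graph features have the same dimension, and that a per-direction logistic gate is a reasonable stand-in for the adaptive gating mechanism. It also uses batch power iteration with deflation rather than the online estimator the abstract mentions. All function names, parameters, and toy values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def fisher_matvec(grads: Sequence[Sequence[float]], v: Sequence[float]) -> Vector:
    """Return F v for the empirical Fisher F = (1/n) * sum_i g_i g_i^T, without forming F."""
    out = [0.0] * len(v)
    for g in grads:
        c = dot(g, v)
        for j, gj in enumerate(g):
            out[j] += c * gj
    return [x / len(grads) for x in out]


def fisher_subspace(grads: Sequence[Sequence[float]], rank: int, iters: int = 100) -> List[Vector]:
    """Approximate the top-`rank` eigenvectors of the empirical Fisher matrix.

    Uses power iteration with Gram-Schmidt deflation (a batch stand-in, by
    assumption, for the online low-rank estimator named in the abstract).
    """
    dim = len(grads[0])
    basis: List[Vector] = []
    for k in range(rank):
        v = [1.0 / (j + k + 1) for j in range(dim)]  # deterministic generic start
        for _ in range(iters):
            v = fisher_matvec(grads, v)
            for b in basis:  # remove directions that were already found
                c = dot(v, b)
                v = [x - c * y for x, y in zip(v, b)]
            norm = math.sqrt(dot(v, v))
            if norm == 0.0:
                raise ValueError("gradients span fewer than `rank` directions")
            v = [x / norm for x in v]
        basis.append(v)
    return basis


def fuse_in_subspace(
    h_seq: Sequence[float],
    h_graph: Sequence[float],
    basis: Sequence[Sequence[float]],
    gate_w: Sequence[Tuple[float, float]],
    gate_b: Sequence[float],
) -> Vector:
    """Add graph information to the sequence feature only along Fisher directions.

    The gate is a hypothetical per-direction logistic unit over the two
    projected coordinates; the paper's actual gate may differ.
    """
    fused = list(h_seq)
    for b, (w_s, w_g), bias in zip(basis, gate_w, gate_b):
        z_seq, z_graph = dot(h_seq, b), dot(h_graph, b)
        gate = 1.0 / (1.0 + math.exp(-(w_s * z_seq + w_g * z_graph + bias)))
        for j, bj in enumerate(b):
            fused[j] += gate * z_graph * bj
    return fused


# Toy gradients that vary mostly along the first axis (made-up numbers).
grads = [[2.0, 0.0, 0.0], [-2.0, 0.1, 0.0], [2.0, 0.0, 0.1], [-2.0, -0.1, 0.0]]
basis = fisher_subspace(grads, rank=2)
fused = fuse_in_subspace(
    [1.0, 2.0, 3.0], [0.5, -1.0, 4.0], basis,
    gate_w=[(0.0, 1.0), (0.0, 1.0)], gate_b=[0.0, 0.0],
)
```

In this sketch the graph feature can change the fused vector only inside the span of `basis`, so components of the sequence feature outside the estimated task-sensitive directions pass through unchanged. That is one simple way to read "selective fusion within a task-sensitive subspace"; the gate parameters would be learned in practice.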