arxiv_cs_lg April 24, 2026


Evaluating Assurance Cases as Text-Attributed Graphs for Structure and Provenance Analysis


assurance-case, provenance-analysis, graph-neural-networks, link-prediction, large-language-models


Original Content

arXiv:2604.20577v1 Announce Type: cross

Abstract: An assurance case is a structured argument document that justifies claims about a system's requirements or properties, with those claims supported by evidence. In regulated domains, assurance cases are crucial for demonstrating that a system meets the compliance and safety requirements of industry standards. We propose a graph diagnostic framework for analysing the structure and provenance of assurance cases. We focus on two main tasks: (1) link prediction, to learn and identify connections between argument elements, and (2) graph classification, to differentiate between assurance cases created by a state-of-the-art large language model (LLM) and those created by humans, with the aim of detecting bias. We compiled a publicly available dataset of assurance cases, represented as graphs with nodes and edges, that supports both link prediction and provenance analysis. Experiments show that graph neural networks (GNNs) achieve strong link prediction performance (ROC-AUC 0.760) on real assurance cases and generalise well across domains and in semi-supervised settings. For provenance detection, GNNs effectively distinguish human-authored from LLM-generated cases (F1 0.94). We observed that LLM-generated assurance cases have different hierarchical linking patterns from human-authored cases. Furthermore, existing GNN explanation methods show only moderate faithfulness, revealing a gap between predicted reasoning and the true argument structure.
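To make the link prediction task and its ROC-AUC metric concrete, here is a minimal sketch in plain Python. It is not the authors' framework and does not use a GNN: the toy assurance case, the `Node` class, the element types ("claim", "strategy", "evidence"), and the token-overlap scorer are all assumptions introduced for illustration. It only shows how an assurance case might be held as a text-attributed graph and how candidate links could be scored and evaluated.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Node:
    node_id: str
    kind: str   # hypothetical element type, e.g. "claim", "strategy", "evidence"
    text: str   # the node's text attribute


# Toy assurance case, invented for illustration only.
NODES = [
    Node("G1", "claim", "the braking system is acceptably safe"),
    Node("S1", "strategy", "argue over each identified braking hazard"),
    Node("G2", "claim", "the brake failure hazard is mitigated"),
    Node("E1", "evidence", "brake failure hazard test report"),
]
# Directed "supported by" edges: parent -> child.
EDGES = {("G1", "S1"), ("S1", "G2"), ("G2", "E1")}


def jaccard(a: str, b: str) -> float:
    """Token-overlap score in [0, 1]; a stand-in for a learned link scorer."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def roc_auc(pos_scores, neg_scores) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as 0.5, which matches the usual rank-based definition.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def evaluate(nodes, edges) -> float:
    """Score every ordered node pair and compare true edges against non-edges."""
    by_id = {n.node_id: n for n in nodes}
    pos, neg = [], []
    for u, v in permutations(by_id, 2):
        score = jaccard(by_id[u].text, by_id[v].text)
        (pos if (u, v) in edges else neg).append(score)
    return roc_auc(pos, neg)


if __name__ == "__main__":
    print(f"ROC-AUC of the token-overlap baseline: {evaluate(NODES, EDGES):.3f}")
```

Because the token-overlap score is symmetric, this baseline cannot tell a parent-to-child link from its reverse. A learned model that also uses the graph structure would be expected to handle direction, which is one reason such a baseline is only a point of comparison.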