arxiv_cs_ai 2026年4月20日

TPA: RAG の嘘発見のための次トークン確率帰属

TPA: Next Token Probability Attribution for Detecting Hallucinations in RAG

Translated: 2026/4/20 11:18:17

raghallucinationllmtoken-probabilityattribution-analysis

Japanese Translation

arXiv:2512.07515v4 Announce Type: replace-cross 要約: リトリバル・アンゲスト・ジェネレーション（RAG）における嘘の検出は依然として課題です。既往の研究は、嘘を内部に保持された知識（FFN）とリトリバウンドコンテキストとの二項的衝突に帰属してきました。しかし、この視点のみでは不十分で、LLM の他の構成要素、例えばユーザークエリ、以前生成されたトークン、セルフトークン、そして最終 LayerNorm 調整の影響を考慮していません。これらの構成要素が嘘の検出に与える影響を包括的に捉えるために、TPA（次トークン確率帰属）を提案します。TPA は、次トークンの確率を 7 つの異なる源に数学的に帰属します：クエリ、RAG コンテキスト、過去のトークン、セルフトークン、FFN、最終 LayerNorm、および初期エンベディング。この帰属は、各源が次トークンの生成にどのように寄与するかを定量化します。具体的には、私達はこれらの帰属スコアを名詞詞（POS）タグで集計し、各モデルコンポーネントがレスポンス内の特定の言語カテゴリの生成にどのように寄与するかを定量化します。これらのパターンを活用して、名詞が LayerNorm に大きく依存しているような例外を検出するなど、TPA は効率的に嘘のレスポンスを識別します。大規模な実験により、TPA が最優秀性能を達成したことが示されました。

Original Content

arXiv:2512.07515v4 Announce Type: replace-cross Abstract: Detecting hallucinations in Retrieval-Augmented Generation remains a challenge. Prior approaches attribute hallucinations to a binary conflict between internal knowledge stored in FFNs and the retrieved context. However, this perspective is incomplete, failing to account for the impact of other components of the LLM, such as the user query, previously generated tokens, the self token, and the final LayerNorm adjustment. To comprehensively capture the impact of these components on hallucination detection, we propose TPA which mathematically attributes each token's probability to seven distinct sources: Query, RAG Context, Past Token, Self Token, FFN, Final LayerNorm, and Initial Embedding. This attribution quantifies how each source contributes to the generation of the next token. Specifically, we aggregate these attribution scores by Part-of-Speech (POS) tags to quantify the contribution of each model component to the generation of specific linguistic categories within a response. By leveraging these patterns, such as detecting anomalies where Nouns rely heavily on LayerNorm, TPA effectively identifies hallucinated responses. Extensive experiments show that TPA achieves state-of-the-art performance.