arxiv_cs_cv April 20, 2026

VIB-Probe: Detecting and Mitigating Hallucinations in Vision-Language Models via Variational Information Bottleneck

Translated: 2026/4/20 10:50:48

vision-language-models, hallucination-detection, variational-information-bottleneck, attention-mechanism, mitigation-strategy

Original Content

arXiv:2601.05547v2 Announce Type: replace Abstract: Vision-Language Models (VLMs) have demonstrated remarkable progress in multimodal tasks, but remain susceptible to hallucinations, where generated text deviates from the underlying visual content. Existing hallucination detection methods primarily rely on output logits or external verification tools, often overlooking the models' internal mechanisms. In this work, we investigate the outputs of internal attention heads, postulating that specific heads carry the primary signals for truthful generation. However, directly probing these high-dimensional states is challenging due to the entanglement of visual-linguistic syntax and noise. To address this, we propose VIB-Probe, a novel hallucination detection and mitigation framework leveraging Variational Information Bottleneck (VIB) theory. Our method extracts discriminative patterns across layers and heads while filtering out semantic nuisances through the information bottleneck principle. Furthermore, by leveraging the gradients of our VIB probe, we identify attention heads with strong causal influence on hallucinations and introduce an inference-time intervention strategy for hallucination mitigation. Extensive experiments across diverse benchmarks demonstrate that VIB-Probe significantly outperforms existing baselines in both the detection and mitigation settings. Our code will be made publicly available.
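The abstract names the Variational Information Bottleneck but gives no architecture or loss. As a point of reference only, here is a minimal sketch of the standard deep VIB objective applied to a probe. A stochastic encoder maps a feature vector x to a Gaussian q(z|x). A logistic classifier predicts a hallucination label from a reparameterized sample z. The loss is binary cross-entropy plus beta times the KL divergence from q(z|x) to a standard normal prior. Several details are illustrative assumptions, not VIB-Probe's actual design: the feature layout (concatenated attention-head outputs), the label convention (1 = hallucinated), the single linear encoder layer, the parameter names, and the value of beta. Only the forward objective is shown; training would minimize it with an autodiff framework.

```python
import math
import random


def linear(W, b, x):
    """Affine map W x + b, with W stored as a list of rows."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dims."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def vib_probe_loss(x, label, params, beta=1e-3, rng=None):
    """One-example VIB loss: BCE(label, p(y|z)) + beta * KL(q(z|x) || N(0, I)).

    x      -- feature vector, e.g. concatenated head outputs (assumed layout)
    label  -- 1 = hallucinated, 0 = faithful (assumed convention)
    params -- hypothetical names: encoder W_mu, b_mu, W_lv, b_lv; classifier v, c
    """
    if rng is None:
        rng = random.Random(0)
    mu = linear(params["W_mu"], params["b_mu"], x)
    logvar = linear(params["W_lv"], params["b_lv"], x)
    # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I)
    z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]
    logit = sum(vi * zi for vi, zi in zip(params["v"], z)) + params["c"]
    # Numerically stable binary cross-entropy computed from the logit
    bce = max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))
    kl = kl_to_standard_normal(mu, logvar)
    return bce + beta * kl, bce, kl


# Usage: an all-zero encoder makes q(z|x) equal to the prior
params = {
    "W_mu": [[0.0] * 4 for _ in range(2)], "b_mu": [0.0, 0.0],
    "W_lv": [[0.0] * 4 for _ in range(2)], "b_lv": [0.0, 0.0],
    "v": [0.0, 0.0], "c": 0.0,
}
total, bce, kl = vib_probe_loss([0.3, -1.2, 0.5, 2.0], 1, params)
# kl is 0.0, and bce and total are both log(2) ≈ 0.6931 because the logit is 0
```

With an all-zero encoder, the posterior equals the prior, so the KL term vanishes and the loss reduces to the cross-entropy of a zero logit. With trained weights, the beta-weighted KL term penalizes information in z. The predictive term pushes back on that penalty, so z keeps mainly the features that help predict the label, which is the information bottleneck trade-off the abstract relies on.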
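The abstract says the probe's gradients are used to find attention heads with strong causal influence on hallucinations and to drive an inference-time intervention, but it specifies neither rule. This sketch makes two plainly labeled substitutions. First, heads are ranked by gradient-times-input attribution of the probe's hallucination logit. Second, the intervention shifts each selected head's output against the normalized logit gradient, scaled by a hypothetical step size alpha. The sketch uses only the deterministic mean path of a linear probe, logit = v . (W_mu x + b_mu) + c, so the gradient is the constant W_mu^T v. A deeper probe would obtain it by backpropagation. All function names, parameter names, and the toy numbers are hypothetical.

```python
import math


def probe_logit_and_grad(x, W_mu, b_mu, v, c):
    """Mean-path probe: logit = v . (W_mu x + b_mu) + c; d logit / d x = W_mu^T v."""
    mu = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W_mu, b_mu)]
    logit = sum(vi * mi for vi, mi in zip(v, mu)) + c
    grad = [sum(v[k] * W_mu[k][j] for k in range(len(v))) for j in range(len(x))]
    return logit, grad


def rank_heads(x, grad, head_dim):
    """Gradient-x-input attribution: |sum_i g_i * x_i| over each head's slice."""
    n_heads = len(x) // head_dim
    scores = []
    for h in range(n_heads):
        s = slice(h * head_dim, (h + 1) * head_dim)
        scores.append(abs(sum(g * xi for g, xi in zip(grad[s], x[s]))))
    order = sorted(range(n_heads), key=lambda h: scores[h], reverse=True)
    return order, scores


def intervene(x, grad, heads, head_dim, alpha):
    """Shift each selected head by -alpha * g_h / ||g_h||.

    For this linear probe, that lowers the logit by exactly alpha * ||g_h|| per head.
    """
    x_new = list(x)
    for h in heads:
        lo, hi = h * head_dim, (h + 1) * head_dim
        norm = math.sqrt(sum(g * g for g in grad[lo:hi]))
        if norm == 0.0:
            continue  # the probe is insensitive to this head; leave it unchanged
        for i in range(lo, hi):
            x_new[i] -= alpha * grad[i] / norm
    return x_new


# Usage: toy probe over 3 heads x 2 dims with a 1-dimensional latent
W_mu = [[0.0, 0.0, 2.0, 1.0, 0.1, 0.0]]
x = [1.0] * 6
logit, grad = probe_logit_and_grad(x, W_mu, [0.0], [1.0], 0.0)
order, scores = rank_heads(x, grad, head_dim=2)  # order == [1, 2, 0]
x_steered = intervene(x, grad, order[:1], head_dim=2, alpha=0.5)
new_logit, _ = probe_logit_and_grad(x_steered, W_mu, [0.0], [1.0], 0.0)
# new_logit equals logit - 0.5 * sqrt(5) up to rounding, since only head 1 moves
```

Because this probe is linear in x, the logit change from the shift is exact, which makes the sketch easy to check. With a nonlinear probe, or when the shifted head's output also feeds later layers of a real VLM, the predicted drop holds only to first order.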