arxiv_cs_cv April 24, 2026

When Prompts Override Vision: Prompt-Induced Hallucinations in LVLMs, Investigated with HalluScope, and a Mitigation

When Prompts Override Vision: Prompt-Induced Hallucinations in LVLMs

Translated: 2026/4/24 19:47:20

lvlm, hallucination, prompt-engineering, vision-language-model, reinforcement-learning

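Japanese Translation (rendered in English)

arXiv:2604.21911v1 Announce Type: new Abstract: Despite remarkable progress in the capabilities of large vision-language models (LVLMs), these systems remain vulnerable to hallucinations, that is, outputs that are not grounded in the visual input. Prior work has attributed hallucinations in LVLMs to various factors, such as limitations of the vision backbone or the dominance of the language component, but the relative importance of these factors remains unclear. To resolve this ambiguity, we propose HalluScope, a benchmark for better understanding the extent to which different factors induce hallucinations. Our analysis indicates that hallucinations arise mainly from excessive reliance on textual priors and background knowledge, in particular on information introduced through textual instructions. To mitigate hallucinations induced by textual instruction priors, we propose HalluVL-DPO, a framework for fine-tuning off-the-shelf LVLMs toward more visually grounded responses. HalluVL-DPO uses preference optimization on a curated training dataset that we construct, guiding the model to prefer grounded responses over hallucinated ones. We show that the optimized model effectively mitigates the targeted hallucination failure mode while preserving or improving performance on other hallucination benchmarks and visual capability evaluations. To support reproducibility and further research, we plan to release the evaluation benchmark, the preference training dataset, and the code at https://pegah-kh.github.io/projects/prompts-override-vision/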

Original Content

arXiv:2604.21911v1 Announce Type: new Abstract: Despite impressive progress in the capabilities of large vision-language models (LVLMs), these systems remain vulnerable to hallucinations, i.e., outputs that are not grounded in the visual input. Prior work has attributed hallucinations in LVLMs to factors such as limitations of the vision backbone or the dominance of the language component, yet the relative importance of these factors remains unclear. To resolve this ambiguity, we propose HalluScope, a benchmark to better understand the extent to which different factors induce hallucinations. Our analysis indicates that hallucinations largely stem from excessive reliance on textual priors and background knowledge, especially information introduced through textual instructions. To mitigate hallucinations induced by textual instruction priors, we propose HalluVL-DPO, a framework for fine-tuning off-the-shelf LVLMs towards more visually grounded responses. HalluVL-DPO leverages preference optimization using a curated training dataset that we construct, guiding the model to prefer grounded responses over hallucinated ones. We demonstrate that our optimized model effectively mitigates the targeted hallucination failure mode, while preserving or improving performance on other hallucination benchmarks and visual capability evaluations. To support reproducibility and further research, we will publicly release our evaluation benchmark, preference training dataset, and code at https://pegah-kh.github.io/projects/prompts-override-vision/
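
Illustrative note: the abstract says only that HalluVL-DPO uses preference optimization to favor grounded responses over hallucinated ones; it does not state the exact objective. As a rough sketch, assuming the widely used Direct Preference Optimization loss is a reasonable stand-in, the per-pair loss could look like the following. The function names, the `beta` value, and the example log-probabilities are hypothetical and are not taken from the paper.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_logp_grounded: float,
    policy_logp_hallucinated: float,
    ref_logp_grounded: float,
    ref_logp_hallucinated: float,
    beta: float = 0.1,  # assumed temperature; not specified in the abstract
) -> float:
    """Standard DPO loss for one (grounded, hallucinated) response pair.

    Each argument is the summed token log-probability of a full response,
    conditioned on the same image and prompt, under either the model being
    tuned (policy) or a frozen copy of the starting model (reference).
    """
    grounded_margin = policy_logp_grounded - ref_logp_grounded
    hallucinated_margin = policy_logp_hallucinated - ref_logp_hallucinated
    logits = beta * (grounded_margin - hallucinated_margin)
    # -log(sigmoid(logits)) equals softplus(-logits)
    return softplus(-logits)


if __name__ == "__main__":
    # Hypothetical log-probabilities for a single preference pair.
    loss = dpo_loss(-4.0, -9.0, -5.0, -5.0)
    print(f"loss = {loss:.4f}")
```

The loss falls below log 2 once the policy raises the grounded response relative to the reference by more than it raises the hallucinated one, which is the behavior the abstract describes as preferring grounded responses.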