arxiv_cs_cv April 20, 2026

Mechanisms of Prompt-Induced Hallucination in Vision-Language Models

Translated: 2026/4/20 10:50:43

vision-language-models, hallucination, attention-mechanisms, prompt-engineering, large-language-models

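Translated Abstract

Abstract: Large vision-language models (VLMs) are highly capable, but they can hallucinate by prioritizing the text prompt over visual evidence. We examine this failure mode in a controlled setting where the prompt overstates the number of objects in the image (for example, asking the model to describe four waterlilies when the image contains only three). When the object count is low, models tend to correct the overestimate, but as the count grows they increasingly abandon correction and conform to the prompt. Through mechanistic analysis of three VLMs, we identify a small set of attention heads whose ablation reduces prompt-induced hallucinations (PIH) by at least 40%, with no additional training. Across models, these PIH heads mediate prompt copying in model-specific ways; we characterize these differences and show that ablating PIH heads increases correction toward visual evidence. Our findings offer insight into the internal mechanisms that drive prompt-induced hallucination and reveal model-specific differences in how these behaviors are implemented.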

Original Content

arXiv:2601.05201v2 Announce Type: replace

Abstract: Large vision-language models (VLMs) are highly capable, yet often hallucinate by favoring textual prompts over visual evidence. We study this failure mode in a controlled object-counting setting, where the prompt overstates the number of objects in the image (e.g., asking a model to describe four waterlilies when only three are present). At low object counts, models often correct the overestimation, but as the number of objects increases, they increasingly conform to the prompt regardless of the discrepancy. Through mechanistic analysis of three VLMs, we identify a small set of attention heads whose ablation substantially reduces prompt-induced hallucinations (PIH) by at least 40% without additional training. Across models, PIH-heads mediate prompt copying in model-specific ways. We characterize these differences and show that PIH ablation increases correction toward visual evidence. Our findings offer insights into the internal mechanisms driving prompt-induced hallucinations, revealing model-specific differences in how these behaviors are implemented.
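The abstract does not spell out how heads are ablated or how the reduction is measured, so the following is only a minimal sketch under stated assumptions. It assumes that ablation means zeroing a head's contribution before the per-head outputs are combined, and that the reported figure is a relative reduction in the PIH rate. The function names, the `(layer, head)` addressing scheme, and all numbers are hypothetical illustrations, not the authors' code or results.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical addressing scheme: (layer index, head index).
Head = Tuple[int, int]


def ablate_heads(
    head_outputs: Sequence[Sequence[float]],
    layer: int,
    heads_to_ablate: Set[Head],
) -> List[List[float]]:
    """Return a copy of one layer's per-head outputs with selected heads zeroed.

    Assumption: zero-ablation. Other schemes (e.g. mean-ablation) exist and
    the abstract does not say which one the paper uses.
    """
    return [
        [0.0] * len(vec) if (layer, h) in heads_to_ablate else list(vec)
        for h, vec in enumerate(head_outputs)
    ]


def combine_heads(head_outputs: Sequence[Sequence[float]]) -> List[float]:
    """Sum per-head contributions, each assumed already projected to model width."""
    width = len(head_outputs[0])
    return [sum(vec[i] for vec in head_outputs) for i in range(width)]


def pih_rate(followed_prompt: Sequence[bool]) -> float:
    """Fraction of trials in which the output conformed to the overstated count."""
    if not followed_prompt:
        raise ValueError("need at least one trial")
    return sum(followed_prompt) / len(followed_prompt)


def relative_reduction(base_rate: float, ablated_rate: float) -> float:
    """Relative drop in PIH rate after ablation, e.g. 0.4 means a 40% reduction."""
    if base_rate <= 0:
        raise ValueError("baseline rate must be positive")
    return (base_rate - ablated_rate) / base_rate


if __name__ == "__main__":
    # Toy per-head outputs for a single layer with three heads (made-up values).
    layer0 = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
    ablated = ablate_heads(layer0, layer=0, heads_to_ablate={(0, 1)})
    print(combine_heads(layer0), combine_heads(ablated))

    # Toy trial outcomes (made-up): True means the model followed the prompt.
    base = pih_rate([True] * 8 + [False] * 2)
    after = pih_rate([True] * 4 + [False] * 6)
    print(relative_reduction(base, after))
```

In a real VLM the same idea would typically be applied with a forward hook or a head mask on the attention module rather than on plain lists; the list-based version here only illustrates the bookkeeping.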