arxiv_cs_ai February 10, 2026

Moral Sycophancy in Vision Language Models

Translated: 2026/3/7 10:03:01

machine learning, deep learning, multi-modal ai systems, sycophancy in vlm, moral correctness in models

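Translation (from Japanese)

Sycophancy in Vision-Language Models (VLMs) is the tendency to side with the user's opinion, often at the cost of moral or factual accuracy. Earlier work has examined sycophancy in general settings, but how it affects visual decisions grounded in moral and ethical standards has not been well understood. We therefore carried out the first systematic study of moral sycophancy in VLMs, evaluating ten widely used models on the Moralise and M^3oralBench datasets. The results show that VLMs often give morally incorrect follow-up answers even when their initial judgment was correct. When exposed to user-induced bias, models shift from a morally right judgment to a morally wrong one more often than the reverse. Follow-up prompts generally lower performance on Moralise, while on M^3oralBench the results are mixed and accuracy sometimes even improves. Initial contexts that take a morally right stance also elicit stronger sycophancy. These findings underline the need for principled strategies that keep VLMs ethically consistent and robust.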

Original Content

arXiv:2602.08311v1
Announce Type: new

Abstract: Sycophancy in Vision-Language Models (VLMs) refers to their tendency to align with user opinions, often at the expense of moral or factual accuracy. While prior studies have explored sycophantic behavior in general contexts, its impact on morally grounded visual decision-making remains insufficiently understood. To address this gap, we present the first systematic study of moral sycophancy in VLMs, analyzing ten widely-used models on the Moralise and M^3oralBench datasets under explicit user disagreement. Our results reveal that VLMs frequently produce morally incorrect follow-up responses even when their initial judgments are correct, and exhibit a consistent asymmetry: models are more likely to shift from morally right to morally wrong judgments than the reverse when exposed to user-induced bias. Follow-up prompts generally degrade performance on Moralise, while yielding mixed or even improved accuracy on M^3oralBench, highlighting dataset-dependent differences in moral robustness. Evaluation using Error Introduction Rate (EIR) and Error Correction Rate (ECR) reveals a clear trade-off: models with stronger error-correction capabilities tend to introduce more reasoning errors, whereas more conservative models minimize errors but exhibit limited ability to self-correct. Finally, initial contexts with a morally right stance elicit stronger sycophantic behavior, emphasizing the vulnerability of VLMs to moral influence and the need for principled strategies to improve ethical consistency and robustness in multimodal AI systems.
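
The abstract names Error Introduction Rate (EIR) and Error Correction Rate (ECR) without defining them. The sketch below shows one plausible reading, stated here as an assumption rather than the authors' definition: EIR is the share of initially correct judgments that turn incorrect after the user's follow-up, and ECR is the share of initially incorrect judgments that turn correct. The names `Trial` and `flip_rates` are hypothetical, and the sample records are invented for illustration; they are not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One image-question pair, judged before and after user disagreement."""
    initial_correct: bool   # was the first moral judgment correct?
    followup_correct: bool  # was the judgment correct after the follow-up?


def flip_rates(trials):
    """Return (EIR, ECR) under the assumed definitions above.

    EIR = (# initially right, then wrong) / (# initially right)
    ECR = (# initially wrong, then right) / (# initially wrong)
    A rate is NaN when its denominator is zero.
    """
    initially_right = [t for t in trials if t.initial_correct]
    initially_wrong = [t for t in trials if not t.initial_correct]

    introduced = sum(1 for t in initially_right if not t.followup_correct)
    corrected = sum(1 for t in initially_wrong if t.followup_correct)

    eir = introduced / len(initially_right) if initially_right else float("nan")
    ecr = corrected / len(initially_wrong) if initially_wrong else float("nan")
    return eir, ecr


# Invented toy data: 4 initially right (2 flip to wrong),
# 4 initially wrong (1 flips to right).
trials = [
    Trial(True, True), Trial(True, True),
    Trial(True, False), Trial(True, False),
    Trial(False, True),
    Trial(False, False), Trial(False, False), Trial(False, False),
]

eir, ecr = flip_rates(trials)
print(f"EIR = {eir:.2f}, ECR = {ecr:.2f}")  # EIR = 0.50, ECR = 0.25
```

In this toy case EIR exceeds ECR, which is the direction of the asymmetry the abstract describes: right-to-wrong shifts outnumber wrong-to-right ones. Conditioning each rate on the initial outcome keeps the two rates comparable even when a model's initial accuracy is far from 50%.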