arxiv_cs_cv February 10, 2026

Which private attributes do VLMs agree on and predict well?

visual-language-models, privacy-attributes, zero-shot-evaluation, inter-annotator-agreement, large-scale-image-datasets

Original Content

arXiv:2602.07931v1 Announce Type: new Abstract: Visual Language Models (VLMs) are often used for zero-shot detection of visual attributes in images. We present a zero-shot evaluation of open-source VLMs for privacy-related attribute recognition. We identify the attributes for which VLMs exhibit strong inter-annotator agreement, and discuss cases where human and VLM annotations disagree. Our results show that, when evaluated against human annotations, VLMs tend to predict the presence of privacy attributes more often than human annotators do. In addition, we find that when inter-annotator agreement between VLMs is high, they can complement human annotation by identifying attributes overlooked by human annotators. This highlights the potential of VLMs to support privacy annotation of large-scale image datasets.
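
The abstract does not say which agreement statistic the authors use, so the following is only a minimal sketch of the kind of analysis it describes. It assumes binary present/absent labels per image for a single attribute and uses Fleiss' kappa as one common choice for agreement among several annotators. The function names (`fleiss_kappa`, `overlooked_candidates`) and the toy labels are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def fleiss_kappa(ratings: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for binary labels.

    ratings[i][j] is the 0/1 label that rater j (here: a VLM) gave to item i.
    Every item must be labelled by the same number of raters (at least 2).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    # Number of "present" votes per item.
    pos_counts = [sum(row) for row in ratings]
    # Overall share of "present" and "absent" votes.
    p_pos = sum(pos_counts) / (n_items * n_raters)
    p_neg = 1.0 - p_pos
    # Agreement expected by chance.
    p_e = p_pos ** 2 + p_neg ** 2
    # Observed agreement: share of agreeing rater pairs, averaged over items.
    p_bar = sum(
        (k * (k - 1) + (n_raters - k) * (n_raters - k - 1))
        / (n_raters * (n_raters - 1))
        for k in pos_counts
    ) / n_items
    if p_e == 1.0:
        # Every vote is identical, so kappa is undefined (0/0).
        # Assumption: report this degenerate case as perfect agreement.
        return 1.0
    return (p_bar - p_e) / (1.0 - p_e)


def overlooked_candidates(
    vlm_labels: Sequence[Sequence[int]], human_labels: Sequence[int]
) -> List[int]:
    """Indices of items that every VLM marks present but the human marks absent."""
    return [
        i
        for i, (row, human) in enumerate(zip(vlm_labels, human_labels))
        if human == 0 and all(v == 1 for v in row)
    ]


# Toy data (made up): 6 images, 3 VLMs, one hypothetical attribute.
vlm_labels = [
    [1, 1, 1],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
    [1, 1, 1],
]
human_labels = [1, 0, 0, 0, 0, 1]

print(round(fleiss_kappa(vlm_labels), 3))                # 0.766
print(overlooked_candidates(vlm_labels, human_labels))   # [1]
```

In this toy example the VLMs agree strongly with one another, and image 1 is flagged as a candidate that the human annotator may have overlooked. Such candidates would still need manual review, since unanimous VLM votes can also be shared false positives.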