arxiv_cs_cv February 10, 2026

Using Narrow Receptive Fields for Face Verification

Restricted Receptive Fields for Face Verification

Translated: 2026/3/15 14:46:56

face-verification, deep-neural-networks, computer-vision, model-explainability, receptive-fields

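Translated Abstract

arXiv:2510.10753v2 Announce Type: replace
Abstract: Understanding the decision-making process of deep neural networks is essential for analyzing their behavior and diagnosing failure cases. In computer vision, a common approach to improving interpretability is to assign importance to individual pixels using post-hoc methods. These methods are widely used to explain black-box models, but because reliable evaluation metrics are lacking, it is uncertain how well they match the model's actual reasoning. This limitation motivates an alternative approach: designing models whose decision process is inherently interpretable. To that end, we propose a face similarity metric that decomposes global similarity into contributions from restricted receptive fields. Our method defines the similarity between two face images as the sum of patch-level similarity scores, yielding a locally additive explanation that does not rely on post-hoc analysis. We show that the proposed method achieves competitive verification performance on 112x112 face images using patches of only 28x28, and surpasses state-of-the-art methods when 56x56 patches are used.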

Original Content

arXiv:2510.10753v2 Announce Type: replace
Abstract: Understanding how deep neural networks make decisions is crucial for analyzing their behavior and diagnosing failure cases. In computer vision, a common approach to improve interpretability is to assign importance to individual pixels using post-hoc methods. Although they are widely used to explain black-box models, their fidelity to the model's actual reasoning is uncertain due to the lack of reliable evaluation metrics. This limitation motivates an alternative approach, which is to design models whose decision processes are inherently interpretable. To this end, we propose a face similarity metric that breaks down global similarity into contributions from restricted receptive fields. Our method defines the similarity between two face images as the sum of patch-level similarity scores, providing a locally additive explanation without relying on post-hoc analysis. We show that the proposed approach achieves competitive verification performance even with patches as small as 28x28 within 112x112 face images, and surpasses state-of-the-art methods when using 56x56 patches.