arxiv_cs_cv April 24, 2026

MaskDiME: Adaptive Masked Diffusion for Precise and Efficient Visual Counterfactual Explanations

Translated: 2026/4/24 19:51:32

Tags: maskdime, diffusion-models, counterfactual-explanations, visual-explainability, neural-networks

Original Content

arXiv:2602.18792v3 Announce Type: replace Abstract: Visual counterfactual explanations aim to reveal the minimal semantic modifications that can alter a model's prediction, providing causal and interpretable insights into deep neural networks. However, existing diffusion-based counterfactual generation methods are often computationally expensive, slow to sample, and imprecise in localizing the modified regions. To address these limitations, we propose MaskDiME, a simple, fast, yet effective diffusion framework that unifies semantic consistency and spatial precision through localized sampling. Our approach adaptively focuses on decision-relevant regions to achieve localized and semantically consistent counterfactual generation while preserving high image fidelity. Our training-free framework, MaskDiME, performs inference over 30x faster than the baseline and achieves comparable or state-of-the-art performance across five benchmark datasets spanning diverse visual domains, establishing a practical and generalizable solution for efficient counterfactual explanation.
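The abstract does not spell out how the localized sampling works, so the following is only a toy sketch of the general idea of restricting an edit to decision-relevant regions, not the authors' algorithm. It assumes a per-pixel saliency map is already available (for example from classifier gradients) and that an edited image has been produced by some generative step. The helper names `topk_mask` and `masked_blend` and the `keep_ratio` parameter are hypothetical and chosen for illustration.

```python
def topk_mask(saliency, keep_ratio):
    """Binary mask over the most salient pixels (hypothetical helper).

    Marks roughly the `keep_ratio` fraction of pixels with the largest
    absolute saliency. Ties at the threshold are all kept, so the mask
    can contain slightly more pixels than requested.
    """
    flat = sorted((abs(v) for row in saliency for v in row), reverse=True)
    k = max(1, int(round(keep_ratio * len(flat))))
    threshold = flat[k - 1]
    return [[1.0 if abs(v) >= threshold else 0.0 for v in row] for row in saliency]


def masked_blend(edited, original, mask):
    """Keep edited pixels inside the mask and original pixels outside it."""
    return [
        [m * e + (1.0 - m) * o for e, o, m in zip(e_row, o_row, m_row)]
        for e_row, o_row, m_row in zip(edited, original, mask)
    ]


# Toy 3x3 grayscale "image"; all values are made up for illustration.
original = [[0.1, 0.1, 0.1],
            [0.1, 0.5, 0.1],
            [0.1, 0.1, 0.1]]
edited = [[0.9, 0.9, 0.9],
          [0.9, 0.9, 0.9],
          [0.9, 0.9, 0.9]]
saliency = [[0.0, 0.1, 0.0],
            [0.2, 0.9, 0.1],
            [0.0, 0.3, 0.0]]

mask = topk_mask(saliency, keep_ratio=0.25)
result = masked_blend(edited, original, mask)
```

In a diffusion setting, a blend of this kind would typically be applied at each denoising step, with the original image noised to the matching level before blending. Whether MaskDiME does exactly this is not stated in the abstract.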