arxiv_cs_cv April 24, 2026


AttDiff-GAN: A Hybrid Diffusion-GAN Framework for Facial Attribute Editing

Translated: 2026/4/24 19:42:14

attdiff-gan, facial-attribute-editing, gan, diffusion-model, face-recognition


Original Content

arXiv:2604.21289v1 Announce Type: new Abstract: Facial attribute editing aims to modify target attributes while preserving attribute-irrelevant content and overall image fidelity. Existing GAN-based methods provide favorable controllability, but often suffer from weak alignment between style codes and attribute semantics. Diffusion-based methods can synthesize highly realistic images; however, their editing precision is limited by the entanglement of semantic directions among different attributes. In this paper, we propose AttDiff-GAN, a hybrid framework that combines GAN-based attribute manipulation with diffusion-based image generation. A key challenge in such integration lies in the inconsistency between one-step adversarial learning and multi-step diffusion denoising, which makes effective optimization difficult. To address this issue, we decouple attribute editing from image synthesis by introducing a feature-level adversarial learning scheme to learn explicit attribute manipulation, and then using the manipulated features to guide the diffusion process for image generation, while also removing the reliance on semantic direction-based editing. Moreover, we enhance style-attribute alignment by introducing PriorMapper, which incorporates facial priors into style generation, and RefineExtractor, which captures global semantic relationships through a Transformer for more precise style extraction. Experimental results on CelebA-HQ show that the proposed method achieves more accurate facial attribute editing and better preservation of non-target attributes than state-of-the-art methods in both qualitative and quantitative evaluations.
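The decoupling described in the abstract can be illustrated with a toy, stdlib-only Python sketch. This is not the authors' implementation. The editor, the linear discriminator, the "oracle" noise predictor, the linear beta schedule and all weights are hypothetical stand-ins, and PriorMapper and RefineExtractor are not modeled. The sketch is meant to show structure only. The adversarial loss is computed in a single step on feature vectors, with no denoising chain involved. Only afterwards do the edited features condition a standard multi-step DDPM-style reverse loop.

```python
import math
import random

# Toy sketch (hypothetical, not the paper's code) of the decoupled design:
# (1) attribute editing happens in feature space and is trained with a
#     one-step, feature-level adversarial objective;
# (2) the edited features then condition a multi-step diffusion sampler.


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def edit_features(feat, target_attrs, attr_basis):
    """Hypothetical editor G(features, target attributes) -> edited features.
    A fixed linear map stands in for the learned editor network."""
    return [f + sum(a * basis[i] for a, basis in zip(target_attrs, attr_basis))
            for i, f in enumerate(feat)]


def feature_adv_losses(d_weights, real_feat, fake_feat):
    """Non-saturating GAN losses on feature vectors. Each evaluation needs one
    discriminator pass, never a denoising chain. d_weights is a hypothetical
    linear discriminator producing a real/fake logit."""
    s_real = dot(d_weights, real_feat)
    s_fake = dot(d_weights, fake_feat)
    # -log sigmoid(s_real) - log(1 - sigmoid(s_fake))
    loss_d = softplus(-s_real) + softplus(s_fake)
    # -log sigmoid(s_fake)
    loss_g = softplus(-s_fake)
    return loss_d, loss_g


def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule (an assumed choice)."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def oracle_eps(x_t, alpha_bar_t, cond):
    """Stand-in for a trained noise predictor eps(x_t, t, cond). It treats the
    conditioning features as the clean target; a real model only approximates this."""
    s = math.sqrt(alpha_bar_t)
    n = math.sqrt(1.0 - alpha_bar_t)
    return [(x - s * c) / n for x, c in zip(x_t, cond)]


def sample(cond, T=50, seed=0):
    """DDPM ancestral sampling guided by the edited features `cond`."""
    rng = random.Random(seed)
    betas = linear_betas(T)
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    x = [rng.gauss(0.0, 1.0) for _ in cond]  # start from pure noise
    for t in reversed(range(T)):
        eps = oracle_eps(x, alpha_bars[t], cond)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])  # fixed-variance choice sigma_t^2 = beta_t
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise on the final step
    return x


if __name__ == "__main__":
    feat = [0.2, -0.1, 0.4]                      # hypothetical source-face features
    attr_basis = [[0.5, 0.0, 0.0], [0.0, 0.3, -0.2]]  # hypothetical editor weights
    target = [1.0, 0.0]                          # switch on attribute 0 only
    edited = edit_features(feat, target, attr_basis)
    print(feature_adv_losses([0.1, 0.2, -0.3], feat, edited))
    print(sample(edited))
```

Because the oracle predictor treats the conditioning features as the clean target, the final reverse step recovers them up to floating-point error. A trained denoiser would only approximate this. The design choice the sketch highlights is that the generator's adversarial gradient never has to pass through the multi-step denoising loop.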