arxiv_cs_cv April 24, 2026

Projected Gradient Unlearning for Text-to-Image Diffusion Models: Defending Against Concept Revival Attacks

Translated: 2026/4/24 19:40:45

diffusion-models, machine-unlearning, concept-revival, gradient-unlearning, deep-learning-security

Original Content

arXiv:2604.21041v1 Announce Type: new

Abstract: Machine unlearning for text-to-image diffusion models aims to selectively remove undesirable concepts from pre-trained models without costly retraining. Current unlearning methods share a common weakness: erased concepts return when the model is fine-tuned on downstream data, even when that data is entirely unrelated. We adapt Projected Gradient Unlearning (PGU) from classification to the diffusion domain as a post-hoc hardening step. By constructing a Core Gradient Space (CGS) from the retain concept activations and projecting gradient updates into its orthogonal complement, PGU ensures that subsequent fine-tuning cannot undo the achieved erasure. Applied on top of existing methods (ESD, UCE, Receler), the approach eliminates revival for style concepts and substantially delays it for object concepts, running in roughly 6 minutes versus the ~2 hours required by Meta-Unlearning. PGU and Meta-Unlearning turn out to be complementary: which performs better depends on how the concept is encoded, and retain concept selection should follow visual feature similarity rather than semantic grouping.
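The core mechanism named in the abstract, building a Core Gradient Space from retain-concept activations and projecting gradient updates onto its orthogonal complement, can be sketched for a single linear layer as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. It uses NumPy and builds the CGS basis with an SVD plus an energy threshold, in the style of Gradient Projection Memory. The function names `build_core_gradient_space` and `project_out_cgs`, the `energy` value, and the synthetic activations are all hypothetical. In a real diffusion model the activations would presumably be the inputs to layers such as cross-attention key/value projections while the model processes retain-concept prompts. The abstract does not say which layers or threshold PGU actually uses.

```python
import numpy as np


def build_core_gradient_space(activations: np.ndarray, energy: float = 0.97) -> np.ndarray:
    """Return an orthonormal basis (in_dim x k) spanning most of the retain activations.

    activations: (n_samples, in_dim) inputs to one linear layer, gathered while
        the model processes retain-concept prompts (hypothetical setup).
    energy: fraction of squared singular-value mass to keep (assumed threshold).
    """
    # Stack activations as columns so the left singular vectors live in input space.
    u, s, _ = np.linalg.svd(activations.T, full_matrices=False)
    sq = s ** 2
    cumulative = np.cumsum(sq) / sq.sum()
    # Smallest k whose leading singular values reach the energy threshold.
    k = int(np.searchsorted(cumulative, energy)) + 1
    return u[:, :k]


def project_out_cgs(grad: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Project a weight gradient of shape (out_dim, in_dim) onto the CGS complement."""
    # Subtract the component of each gradient row that lies in span(basis).
    return grad - (grad @ basis) @ basis.T


# Demo with synthetic data: retain activations confined to a 3-D subspace.
rng = np.random.default_rng(0)
in_dim, out_dim = 16, 8
q, _ = np.linalg.qr(rng.standard_normal((in_dim, 3)))  # orthonormal 16 x 3 basis
coeffs = rng.standard_normal((200, 3)) * np.array([3.0, 2.0, 1.0])
retain_acts = coeffs @ q.T  # shape (200, 16)

basis = build_core_gradient_space(retain_acts, energy=0.999)

weight = rng.standard_normal((out_dim, in_dim))
raw_grad = rng.standard_normal((out_dim, in_dim))  # stand-in for an unlearning gradient
step = project_out_cgs(raw_grad, basis)
new_weight = weight - 0.1 * step

# The layer's outputs on retain activations should be unchanged up to float error.
drift = np.abs(new_weight @ retain_acts.T - weight @ retain_acts.T).max()
print("CGS dimension:", basis.shape[1])
print("max retain-output drift:", drift)
```

The design rests on a simple identity: for a layer y = W x with CGS basis M (orthonormal columns), the projected gradient G - G M M^T satisfies (G - G M M^T) x = 0 for every x in span(M). The update therefore leaves the layer's response to retain activations unchanged, while directions outside the CGS remain free to move.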