arxiv_cs_cv February 10, 2026

DICE: Disentangling Artist Style from Content via Contrastive Subspace Decomposition in Diffusion Models

Translated: 2026/3/15 19:04:17

diffusion-models, style-mimicry, copyright-protection, contrastive-learning, disentanglement

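Japanese Translation (rendered in English)

Paper: arXiv:2602.08059v1. Announce type: new. Abstract: The recent spread of diffusion models has made style mimicry easy, allowing unique artistic styles to be imitated without permission. On deployed platforms this increases copyright and intellectual-property risk and calls for reliable protection. Existing countermeasures, however, either need expensive weight editing whenever a new style appears or depend on an explicitly specified editing style, which limits their practicality for deployment-time safety. To address this, the authors propose DICE (Disentanglement of artist Style from Content via Contrastive Subspace Decomposition), a training-free framework that erases an artist's style on the fly. Unlike style editing, which requires an explicitly specified replacement style, DICE performs style purification: it removes the artist's characteristics while keeping the content the user intended. The core insight is that a model cannot truly understand an artist's style from a single text or image, so the authors abandon the traditional paradigm of identifying style from isolated samples. Instead, they construct contrastive triplets that force the model to distinguish style features from non-style features in the latent space. By formulating this disentanglement as a solvable generalized eigenvalue problem, they identify the style subspace precisely. They also introduce an Adaptive Attention Decoupling Editing strategy that dynamically assesses the style concentration of each token and applies differential suppression and content enhancement to the QKV vectors. Extensive experiments show that DICE strikes a superior balance between the thoroughness of style erasure and the preservation of content integrity. DICE needs only 3 seconds of additional overhead to disentangle style, offering a practical and efficient way to curb style mimicry.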

Original Content

arXiv:2602.08059v1. Announce Type: new.

Abstract: The recent proliferation of diffusion models has made style mimicry effortless, enabling users to imitate unique artistic styles without authorization. In deployed platforms, this raises copyright and intellectual-property risks and calls for reliable protection. However, existing countermeasures either require costly weight editing as new styles emerge or rely on an explicitly specified editing style, limiting their practicality for deployment-side safety. To address this challenge, we propose DICE (Disentanglement of artist Style from Content via Contrastive Subspace Decomposition), a training-free framework for on-the-fly artist style erasure. Unlike style editing, which requires an explicitly specified replacement style, DICE performs style purification, removing the artist's characteristics while preserving the user-intended content. Our core insight is that a model cannot truly comprehend the artist style from a single text or image alone. Consequently, we abandon the traditional paradigm of identifying style from isolated samples. Instead, we construct contrastive triplets to compel the model to distinguish between style and non-style features in the latent space. By formalizing this disentanglement process as a solvable generalized eigenvalue problem, we achieve precise identification of the style subspace. Furthermore, we introduce an Adaptive Attention Decoupling Editing strategy that dynamically assesses the style concentration of each token and performs differential suppression and content enhancement on the QKV vectors. Extensive experiments demonstrate that DICE achieves a superior balance between the thoroughness of style erasure and the preservation of content integrity. DICE introduces an additional overhead of only 3 seconds to disentangle style, providing a practical and efficient technique for curbing style mimicry.
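The abstract names two mechanisms without giving formulas: a style subspace found through a generalized eigenvalue problem, and token-wise suppression driven by a "style concentration" score. The NumPy sketch below illustrates one way such a pipeline could look; it is not the authors' implementation. Everything specific in it is an assumption made for illustration: the triplet layout (`anchor`, `same_style`, `same_content`), the two scatter matrices, the ridge term `eps`, the concentration score, and all function names are hypothetical. It also edits a generic token-feature matrix rather than actual Q, K, V tensors.

```python
import numpy as np


def generalized_eigh(A, B):
    """Solve A v = lam * B v for symmetric A and symmetric positive-definite B.

    Returns eigenvalues in descending order with matching eigenvector columns.
    """
    L = np.linalg.cholesky(B)        # B = L @ L.T
    L_inv = np.linalg.inv(L)
    C = L_inv @ A @ L_inv.T          # whitened, ordinary symmetric problem
    lam, W = np.linalg.eigh(C)       # eigenvalues come back in ascending order
    V = L_inv.T @ W                  # map back: v = L^{-T} w
    return lam[::-1], V[:, ::-1]


def style_subspace(anchor, same_style, same_content, k=1, eps=1e-3):
    """Assumed objective: directions that vary when only style changes (A)
    but stay quiet when only content changes (B)."""
    d_style = anchor - same_content      # content fixed, style differs
    d_content = anchor - same_style      # style fixed, content differs
    n, d = anchor.shape
    A = d_style.T @ d_style / n
    B = d_content.T @ d_content / n + eps * np.eye(d)  # ridge keeps B invertible
    lam, V = generalized_eigh(A, B)
    Q, _ = np.linalg.qr(V[:, :k])        # orthonormal basis of the top-k span
    return lam[:k], Q


def suppress_style(tokens, Q, strength=1.0):
    """Remove each token's style component, scaled by its style concentration."""
    proj = tokens @ Q @ Q.T              # component inside the style subspace
    conc = np.sum(proj**2, axis=1) / (np.sum(tokens**2, axis=1) + 1e-12)
    weight = np.clip(strength * conc, 0.0, 1.0)[:, None]
    return tokens - weight * proj, conc


def make_toy_triplets(n=200, d=8, seed=0):
    """Toy data: style lives on axis 0, content on axes 1 to 3."""
    rng = np.random.default_rng(seed)
    style = np.zeros((n, d))
    style[:, 0] = rng.normal(size=n)
    content_a = np.zeros((n, d))
    content_a[:, 1:4] = rng.normal(size=(n, 3))
    content_b = np.zeros((n, d))
    content_b[:, 1:4] = rng.normal(size=(n, 3))
    anchor = content_a + style + 0.01 * rng.normal(size=(n, d))
    same_style = content_b + style + 0.01 * rng.normal(size=(n, d))
    same_content = content_a + 0.01 * rng.normal(size=(n, d))
    return anchor, same_style, same_content
```

Calling `style_subspace(*make_toy_triplets(), k=1)` on the toy data should recover a basis vector close to axis 0, and `suppress_style` then shrinks tokens that lie mostly along that axis while leaving content-dominated tokens nearly unchanged. The content-enhancement step mentioned in the abstract is not sketched here, because the abstract does not say how it is computed.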