arxiv_cs_lg February 10, 2026

UTOPIA: Unlearnable Tabular Data via Decoupled Shortcut Embedding

Translated: 2026/3/15 14:05:27

utopia, table-data, unlearnability, machine-learning, safety

Original Content

arXiv:2602.07358v1
Announce Type: new

Abstract: Unlearnable examples (UE) have emerged as a practical mechanism to prevent unauthorized model training on private vision data, while extending this protection to tabular data is nontrivial. Tabular data in finance and healthcare is highly sensitive, yet existing UE methods transfer poorly because tabular features mix numerical and categorical constraints and exhibit saliency sparsity, with learning dominated by a few dimensions. Under a Spectral Dominance condition, we show certified unlearnability is feasible when the poison spectrum overwhelms the clean semantic spectrum. Guided by this, we propose Unlearnable Tabular Data via DecOuPled Shortcut EmbeddIng (UTOPIA), which exploits feature redundancy to decouple optimization into two channels: high saliency features for semantic obfuscation and low saliency redundant features for embedding a hyper correlated shortcut, yielding constraint-aware dominant shortcuts while preserving tabular validity. Extensive experiments across tabular datasets and models show UTOPIA drives unauthorized training toward near random performance, outperforming strong UE baselines and transferring well across architectures.
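
The two-channel idea in the abstract can be made concrete with a toy sketch. This is not the UTOPIA algorithm, whose optimization details the abstract does not give. It only illustrates splitting features by saliency and planting a label-correlated shortcut in the low-saliency ones. Everything here is an assumption made for illustration: the function names, the use of absolute Pearson correlation as a saliency proxy, the binary labels, the `strength` parameter, and range clipping as a stand-in for tabular validity. The semantic obfuscation of high-saliency features and the handling of categorical columns are omitted.

```python
import numpy as np


def saliency_scores(X, y):
    """Assumed saliency proxy: |Pearson correlation| of each feature with the label."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    return np.abs(Xc.T @ yc) / denom


def split_channels(scores, k_high):
    """Return (high_idx, low_idx): the k_high most salient features and the rest."""
    order = np.argsort(scores)[::-1]
    return order[:k_high], order[k_high:]


def embed_shortcut(X, y, low_idx, strength=3.0):
    """Shift low-saliency features by a class-dependent offset (binary labels 0/1).

    The shifted values are clipped to each feature's observed range, a crude
    stand-in for the validity constraints real tabular data would impose.
    """
    Xp = X.astype(float).copy()
    sign = np.where(y == 1, 1.0, -1.0)
    scale = X[:, low_idx].std(axis=0) + 1e-12
    shifted = X[:, low_idx] + strength * sign[:, None] * scale[None, :]
    lo = X[:, low_idx].min(axis=0)
    hi = X[:, low_idx].max(axis=0)
    Xp[:, low_idx] = np.clip(shifted, lo, hi)
    return Xp


# Synthetic data: only feature 0 carries the label signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = (X[:, 0] > 0).astype(int)

high_idx, low_idx = split_channels(saliency_scores(X, y), k_high=1)
X_protected = embed_shortcut(X, y, low_idx)
```

In this toy setting the modified low-saliency columns become far more predictive of the label than they were in the clean data, which is the sense in which a shortcut can dominate what a model learns. The high-saliency column is left untouched here, whereas the abstract describes obfuscating it.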