arxiv_cs_cv February 10, 2026

Exploring Adversarial Watermarking in Transformer-Based Models: Transferability and Robustness Against Defense Mechanism for Medical Images

Translated: 2026/3/15 6:02:11

transformer, adversarial-watermarking, medical-imaging, vision-transformers, deep-learning-security

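Translation (from Japanese)

arXiv:2506.06389v3 Announce Type: replace Abstract: Deep learning models have achieved remarkable success in dermatological image analysis and hold promise for automated skin disease diagnosis. Previously, architectures based on convolutional neural networks (CNNs) enjoyed overwhelming popularity and success in computer vision (CV) tasks such as skin image recognition, generation, and video analysis. With the emergence of transformer-based models, however, CV tasks are now carried out using these models. Vision Transformers (ViTs) are one such family of transformer-based models and have shown state-of-the-art performance across various computer vision tasks, which they achieve through self-attention mechanisms. However, their reliance on global attention mechanisms makes them susceptible to adversarial perturbations. This paper aims to investigate the susceptibility of ViTs to adversarial watermarking on medical images. Adversarial watermarking is a technique that adds so-called "imperceptible perturbations" in order to fool a model. We generated adversarial watermarks through Projected Gradient Descent (PGD), examined the transferability of such attacks to CNNs, and analyzed the performance of adversarial training as a defense mechanism. The results indicate that, while performance on clean images is not compromised, ViTs are clearly far more vulnerable to adversarial attacks: accuracy can fall to as low as 27.6%. Adversarial training, however, raises it to 90.0%.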

Original Content

arXiv:2506.06389v3 Announce Type: replace Abstract: Deep learning models have shown remarkable success in dermatological image analysis, offering potential for automated skin disease diagnosis. Previously, convolutional neural network (CNN) based architectures achieved immense popularity and success in computer vision (CV) tasks such as skin image recognition, generation, and video analysis. But with the emergence of transformer-based models, CV tasks are nowadays carried out using these models. Vision Transformers (ViTs) are one such class of transformer-based models that have shown success in computer vision. They use self-attention mechanisms to achieve state-of-the-art performance across various tasks. However, their reliance on global attention mechanisms makes them susceptible to adversarial perturbations. This paper aims to investigate the susceptibility of ViTs for medical images to adversarial watermarking, a method that adds so-called imperceptible perturbations in order to fool models. By generating adversarial watermarks through Projected Gradient Descent (PGD), we examine the transferability of such attacks to CNNs and analyze the performance of a defense mechanism, adversarial training. Results indicate that while performance is not compromised for clean images, ViTs become much more vulnerable to adversarial attacks: accuracy drops to as low as 27.6%. Nevertheless, adversarial training raises it up to 90.0%.
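
The abstract names Projected Gradient Descent (PGD) as the way the adversarial watermarks are generated but gives no details. The following is a minimal sketch of an L-infinity PGD attack, not the paper's implementation. It assumes a toy logistic-regression classifier in place of a ViT so that the input gradient can be written by hand, and every name and value in it (`pgd_attack`, `eps`, `alpha`, `steps`, the weights `w` and `b`, the four-pixel "image" `x`) is hypothetical and chosen only for illustration.

```python
import math


def predict_proba(w, b, x):
    """Probability of class 1 under a toy logistic-regression classifier."""
    logit = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-logit))


def loss_grad_wrt_input(w, b, x, y):
    """Gradient of binary cross-entropy with respect to the input: (p - y) * w."""
    p = predict_proba(w, b, x)
    return [(p - y) * wi for wi in w]


def pgd_attack(w, b, x, y, eps=0.1, alpha=0.02, steps=20):
    """L-infinity PGD: take signed gradient-ascent steps on the loss,
    projecting back into the eps-ball around x after every step."""
    x_adv = list(x)
    for _ in range(steps):
        grad = loss_grad_wrt_input(w, b, x_adv, y)
        for i, g in enumerate(grad):
            sign = (g > 0) - (g < 0)                      # sign of the gradient
            xi = x_adv[i] + alpha * sign                  # ascent step on the loss
            xi = min(max(xi, x[i] - eps), x[i] + eps)     # project onto the eps-ball
            x_adv[i] = min(max(xi, 0.0), 1.0)             # keep a valid pixel range
    return x_adv


# Hypothetical toy data: a four-pixel "image" with true label 1.
w = [2.0, -1.5, 1.0, -0.5]
b = -0.5
x = [0.6, 0.4, 0.5, 0.5]
y = 1

x_adv = pgd_attack(w, b, x, y)
p_clean = predict_proba(w, b, x)
p_adv = predict_proba(w, b, x_adv)
print("clean P(y=1):", round(p_clean, 3))
print("adversarial P(y=1):", round(p_adv, 3))
```

In this toy setting the perturbation never moves a pixel by more than `eps`, yet the predicted class flips. For a real ViT or CNN the input gradient would come from automatic differentiation instead of the closed form used here. Adversarial training, the defense the abstract evaluates, would feed examples like `x_adv` back into training with their correct labels.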