arxiv_cs_lg, February 10, 2026

Can We Infer Confidential Properties of Training Data from LLMs?

Tags: llm, property-inference, data-security, machine-learning, adversarial-attacks

Original Content

arXiv:2506.10364v4 Announce Type: replace

Abstract: Large language models (LLMs) are increasingly fine-tuned on domain-specific datasets to support applications in fields such as healthcare, finance, and law. These fine-tuning datasets often have sensitive and confidential dataset-level properties (such as patient demographics or disease prevalence) that are not intended to be revealed. While prior work has studied property inference attacks on discriminative models (e.g., image classification models) and generative models (e.g., GANs for image data), it remains unclear whether such attacks transfer to LLMs. In this work, we introduce PropInfer, a benchmark task for evaluating property inference in LLMs under two fine-tuning paradigms: question-answering and chat-completion. Built on the ChatDoctor dataset, our benchmark includes a range of property types and task configurations. We further propose two tailored attacks: a prompt-based generation attack and a shadow-model attack leveraging word frequency signals. Empirical evaluations across multiple pretrained LLMs show the success of our attacks, revealing a previously unrecognized vulnerability in LLMs.
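The abstract names a prompt-based generation attack but does not describe its procedure. As a rough illustration only, one plausible reading is that an attacker samples text from the fine-tuned model and estimates a dataset-level property as the fraction of samples that exhibit it. The Python sketch below works under that assumption. The function names, the keyword pattern, and the hard-coded `generated` list (a stand-in for real model outputs) are all hypothetical and are not taken from the paper.

```python
import re
from typing import Callable, Iterable

# Hypothetical keyword labeler; a real attack would likely need a far more
# careful way of deciding whether a sample exhibits the target property.
FEMALE_PATTERN = re.compile(r"\b(she|her|woman|female)\b", re.IGNORECASE)


def mentions_female_patient(text: str) -> bool:
    """Return True if the text contains one of the assumed keywords."""
    return FEMALE_PATTERN.search(text) is not None


def estimate_property_ratio(
    samples: Iterable[str], has_property: Callable[[str], bool]
) -> float:
    """Estimate a dataset-level property as the share of samples showing it."""
    samples = list(samples)
    if not samples:
        raise ValueError("need at least one generated sample")
    hits = sum(1 for s in samples if has_property(s))
    return hits / len(samples)


# Stand-in for text sampled from a fine-tuned model (assumed, not real output).
generated = [
    "I am a 34 year old woman with recurring migraines.",
    "My father has had chest pain since yesterday.",
    "She has been coughing for two weeks and has a fever.",
    "I hurt my knee while running, what should I do?",
]

ratio = estimate_property_ratio(generated, mentions_female_patient)
print(f"estimated ratio: {ratio:.2f}")
```

In this toy setup the estimate is only as good as the labeler and the number of samples. The shadow-model attack mentioned in the abstract is not sketched here, since the abstract gives no detail beyond its use of word frequency signals.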