arxiv_cs_ai April 24, 2026

Transient Turn Injection: Exposing Stateless Multi-Turn Vulnerabilities in Large Language Models

Translated: 2026/4/24 20:29:01

llm, adversarial-ai, prompt-injection, security-vulnerability, multi-turn-chatbot

Original Content

arXiv:2604.21860v1 Announce Type: cross

Abstract: Large language models (LLMs) are increasingly integrated into sensitive workflows, raising the stakes for adversarial robustness and safety. This paper introduces Transient Turn Injection (TTI), a new multi-turn attack technique that systematically exploits stateless moderation by distributing adversarial intent across isolated interactions. TTI leverages automated attacker agents powered by large language models to iteratively test and evade policy enforcement in both commercial and open-source LLMs, marking a departure from conventional jailbreak approaches that typically depend on maintaining persistent conversational context. Our extensive evaluation across state-of-the-art models (including those from OpenAI, Anthropic, Google Gemini, Meta, and prominent open-source alternatives) uncovers significant variations in resilience to TTI attacks, with only select architectures exhibiting substantial inherent robustness. Our automated black-box evaluation framework also uncovers previously unknown model-specific vulnerabilities and attack surface patterns, especially within medical and high-stakes domains. We further compare TTI against established adversarial prompting methods and detail practical mitigation strategies, such as session-level context aggregation and deep alignment approaches. Our study underscores the urgent need for holistic, context-aware defenses and continuous adversarial testing to future-proof LLM deployments against evolving multi-turn threats.
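
The abstract names session-level context aggregation as a mitigation but gives no implementation details. The following is only a minimal defensive sketch of the general idea, not the authors' method. The term weights, the threshold, the `score` function, and the `SessionAggregator` class are all hypothetical, and the sketch assumes the deployment can link related requests through some session or account identifier. It contrasts a moderator that judges each message in isolation with one that accumulates risk across linked turns.

```python
from collections import defaultdict

# Hypothetical integer risk weights; a real system would use a trained classifier.
RISK_TERMS = {"bypass": 4, "dosage": 3, "undetected": 4}
THRESHOLD = 10  # Hypothetical cutoff at which a request is flagged.


def score(message: str) -> int:
    """Toy risk score: sum of weights for risk terms present in the message."""
    words = set(message.lower().split())
    return sum(weight for term, weight in RISK_TERMS.items() if term in words)


def stateless_flags(messages):
    """Judge each message in isolation, as a stateless moderator would."""
    return [score(m) >= THRESHOLD for m in messages]


class SessionAggregator:
    """Accumulate risk per session so that intent spread over turns adds up."""

    def __init__(self, threshold: int = THRESHOLD):
        self.threshold = threshold
        self.totals = defaultdict(int)

    def check(self, session_id: str, message: str) -> bool:
        """Add this message's score to the session total and test the total."""
        self.totals[session_id] += score(message)
        return self.totals[session_id] >= self.threshold


messages = [
    "how to bypass a filter",
    "what dosage is typical",
    "stay undetected please",
]
per_turn = stateless_flags(messages)
aggregator = SessionAggregator()
per_session = [aggregator.check("session-1", m) for m in messages]
```

In this toy run no single message reaches the threshold on its own, so the stateless check flags nothing, while the aggregated total crosses the threshold on the third turn. The sketch only works when turns can be linked to a common identifier, and a cumulative score that never decays would over-flag long benign sessions, so a practical design would likely need windowing or decay.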