arxiv_cs_ai April 20, 2026

When Search Goes Wrong: Red-Teaming Web-Augmented Large Language Models

Translated: 2026/4/20 11:17:57

large-language-models, red-teaming, web-search, safety-alignment, adversarial-attacks

Original Content

arXiv:2510.09689v3 Announce Type: replace-cross Abstract: Large Language Models (LLMs) have been augmented with web search to overcome the limitations of their static knowledge boundary by accessing up-to-date information from the open Internet. While this integration enhances model capability, it also introduces a distinct safety threat surface: the retrieval and citation process can expose users to harmful or low-credibility web content. Existing red-teaming methods are largely designed for standalone LLMs and primarily focus on unsafe generation, ignoring risks that emerge from the complex search workflow. To address this gap, we propose CREST-Search, a pioneering red-teaming framework for LLMs with web search. At the core of CREST-Search are three novel attack strategies that generate seemingly benign search queries that nonetheless induce unsafe citations. It also employs an iterative in-context refinement mechanism to strengthen adversarial effectiveness under black-box constraints. In addition, we construct a search-specific harmful dataset, WebSearch-Harm, which enables fine-tuning a specialized red-teaming model to improve query quality. Our experiments demonstrate that CREST-Search can effectively bypass safety filters and systematically expose vulnerabilities in web search-based LLM systems, underscoring the need to develop robust search models.
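The abstract does not say how the iterative in-context refinement mechanism is implemented. The following is a minimal sketch of a generic black-box refinement loop, offered only to illustrate the idea. It assumes two components:

- an attacker model that proposes a new query conditioned on every previous (query, score) pair;
- a judge that returns only a scalar score for each query.

The names `refine`, `propose`, `score`, and `threshold` are hypothetical and are not CREST-Search's actual interface. The toy stubs make no model or search calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# (query, score) pairs accumulated across iterations; this list is the
# "context" the proposer conditions on. Both callables are hypothetical
# stand-ins: `propose` for an attacker LLM prompted with the history, and
# `score` for a black-box judge that only sees the target system's output.
History = List[Tuple[str, float]]
Proposer = Callable[[History], str]
Scorer = Callable[[str], float]


@dataclass
class RefinementResult:
    best_query: str
    best_score: float
    history: History = field(default_factory=list)


def refine(seed: str, propose: Proposer, score: Scorer,
           max_iters: int = 5, threshold: float = 1.0) -> RefinementResult:
    """Iteratively refine a query using only black-box scores as feedback."""
    history: History = [(seed, score(seed))]
    best_query, best_score = history[0]
    for _ in range(max_iters):
        if best_score >= threshold:  # stop once the judge is satisfied
            break
        candidate = propose(history)  # proposer sees every prior attempt and its score
        s = score(candidate)
        history.append((candidate, s))
        if s > best_score:  # keep the best-scoring query seen so far
            best_query, best_score = candidate, s
    return RefinementResult(best_query, best_score, history)


def toy_score(query: str) -> float:
    # Toy judge: rewards a marker token, capped at 1.0 (no model or search calls).
    return min(1.0, query.count("more") / 3)


def toy_propose(history: History) -> str:
    # Toy proposer: extends the most recent attempt.
    return history[-1][0] + " more"


if __name__ == "__main__":
    result = refine("q", toy_propose, toy_score)
    print(result.best_query, result.best_score, len(result.history))
```

With these stubs the loop reaches the threshold on the third proposal and stops. Passing the full history, rather than only the last attempt, is what makes the refinement "in-context": the proposer can compare earlier attempts against their scores.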