arxiv_cs_ai February 10, 2026

CyberExplorer: Benchmarking LLM Offensive Security Capabilities in a Real-World Attacking Simulation Environment

Translated: 2026/3/7 13:22:04

artificial-intelligence llm security cybersec

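English Summary

Real-world offensive security operations are open-ended: attackers explore unknown attack surfaces, revise hypotheses under uncertainty, and act with no guarantee of success. Existing evaluations of LLM-based offensive agents, by contrast, are run in closed-world settings with predefined goals and binary success criteria. To close this gap, the authors introduce CyberExplorer, an evaluation suite with two core components: (1) an open-environment benchmark built on a virtual machine hosting 40 vulnerable web services derived from real-world CTF challenges, in which agents autonomously perform reconnaissance, target selection, and exploitation without prior knowledge of where the vulnerabilities are; and (2) a reactive multi-agent framework that supports dynamic exploration without predefined plans. CyberExplorer enables fine-grained evaluation that goes beyond flag recovery by capturing interaction dynamics, coordination behavior, failure modes, and vulnerability discovery signals, which narrows the gap between benchmarks and realistic multi-target attack scenarios.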

Original Content

arXiv:2602.08023v1 Announce Type: cross

Abstract: Real-world offensive security operations are inherently open-ended: attackers explore unknown attack surfaces, revise hypotheses under uncertainty, and operate without guaranteed success. Existing LLM-based offensive agent evaluations rely on closed-world settings with predefined goals and binary success criteria. To address this gap, we introduce CyberExplorer, an evaluation suite with two core components: (1) an open-environment benchmark built on a virtual machine hosting 40 vulnerable web services derived from real-world CTF challenges, where agents autonomously perform reconnaissance, target selection, and exploitation without prior knowledge of vulnerability locations; and (2) a reactive multi-agent framework supporting dynamic exploration without predefined plans. CyberExplorer enables fine-grained evaluation beyond flag recovery, capturing interaction dynamics, coordination behavior, failure modes, and vulnerability discovery signals, bridging the gap between benchmarks and realistic multi-target attack scenarios.
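The contrast between a binary success criterion and evaluation "beyond flag recovery" can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `ServiceOutcome` record, its three stage fields, the `summarize` helper, and the sample data are hypothetical and are not taken from the CyberExplorer paper, which does not specify its metrics or schema in the abstract.

```python
from dataclasses import dataclass


@dataclass
class ServiceOutcome:
    """Hypothetical per-service record for one agent run (not the paper's schema)."""
    service: str
    discovered: bool        # agent found the service during reconnaissance
    vuln_identified: bool   # agent pinpointed a plausible vulnerability
    flag_recovered: bool    # agent completed exploitation and got the flag


def summarize(outcomes: list[ServiceOutcome]) -> dict[str, float]:
    """Return the fraction of services that reached each stage."""
    total = len(outcomes)
    if total == 0:
        return {"discovered": 0.0, "vuln_identified": 0.0, "flag_recovered": 0.0}
    return {
        "discovered": sum(o.discovered for o in outcomes) / total,
        "vuln_identified": sum(o.vuln_identified for o in outcomes) / total,
        "flag_recovered": sum(o.flag_recovered for o in outcomes) / total,
    }


# Made-up sample run over four services, purely for illustration.
run = [
    ServiceOutcome("svc-a", True, True, True),
    ServiceOutcome("svc-b", True, True, False),
    ServiceOutcome("svc-c", True, False, False),
    ServiceOutcome("svc-d", False, False, False),
]
report = summarize(run)
print(report)
```

A flag-only metric would report a single number for this run (one service in four), whereas the stage-wise view also shows where the agent stalled: it reached three services and identified a vulnerability in two, but exploited only one.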