arxiv_cs_ai 2026年2月10日

アシスタントから二重に-Agent：形式化とベンチマークの攻撃をOpenClawに強化する

From Assistant to Double Agent: Formalizing and Benchmarking Attacks on OpenClaw for Personalized Local AI Agent

Translated: 2026/3/7 10:10:42

securitypersonalized-agentai-securitybenchmark-analysis

Japanese Translation

広範囲な言語モデル (LLM) を基盤とするAgentは、例として OpenClawになると、複雑な実世界タスクを解決するためにタスク指向システムから personalized AI アシスタントへと変化し始めています。しかし、オープンソースのAIアシスタントへの実世実の展開では、強固なセキュリティリスクも顕在化し始めています。しかしながら、個人的アジェンダに対する個人的理解のセキュリティ研究や検証体制は既に存在するための設定が主に合成的なものやタスク中心的な状況を対象としていて、現実世界での展開が具体的な攻撃面とリスク伝播メカニズムを確実に捉えることができていないことが問題となっています。そこで、我々は個人的アジェンダに対するセキュリティ評価プロトコルとしての Personalized Agent Security Benchmark (PASB) の提案を行い、リアルな現状へ対応している personalized アジェンダへのセキュリティ評価に特別にカスタマイズされています。開発者攻撃モデルに基づいたアジェンダ攻撃プロトコルを組み入れた上で、実際のツールチェーンと長期間の反復的な協力を反映させていますので、現実世界的なシステムに対してブラックボックス的なセキュリティ評価も可能となります。OpenClaw を代表例として開発において我々は各種 personalized 使用状況の安全性能についてシステム的にお客様の理解に対してセキュリティ評価を行いました。これらでの結果は OpenClaw では、使用したタスクのユーザーへの提示、使用するツール、そしてメモリからの取得でそれぞれに危険性が存在しているためであると結論付けられました。PASB プロトコルのオープンソースなコードがあるのは、https://github.com/AstorYH/PASB です。

Original Content

arXiv:2602.08412v1 Announce Type: new Abstract: Although large language model (LLM)-based agents, exemplified by OpenClaw, are increasingly evolving from task-oriented systems into personalized AI assistants for solving complex real-world tasks, their practical deployment also introduces severe security risks. However, existing agent security research and evaluation frameworks primarily focus on synthetic or task-centric settings, and thus fail to accurately capture the attack surface and risk propagation mechanisms of personalized agents in real-world deployments. To address this gap, we propose Personalized Agent Security Bench (PASB), an end-to-end security evaluation framework tailored for real-world personalized agents. Building upon existing agent attack paradigms, PASB incorporates personalized usage scenarios, realistic toolchains, and long-horizon interactions, enabling black-box, end-to-end security evaluation on real systems. Using OpenClaw as a representative case study, we systematically evaluate its security across multiple personalized scenarios, tool capabilities, and attack types. Our results indicate that OpenClaw exhibits critical vulnerabilities at different execution stages, including user prompt processing, tool usage, and memory retrieval, highlighting substantial security risks in personalized agent deployments. The code for the proposed PASB framework is available at https://github.com/AstorYH/PASB.