arxiv_cs_ai 2026/2/10

From Features to Actions: Explainability in Traditional and Agentic AI Systems

Translated: 2026/2/14 7:07:38

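Japanese Translation (rendered in English)

In recent years, explainable AI has focused mainly on interpreting individual model predictions through post-hoc explanation methods. With advances in large language models (LLMs), agentic AI systems whose behaviour emerges over multi-step trajectories are increasingly being deployed. In these settings, success and failure are determined by a sequence of choices rather than by a single decision. Although existing explanation methods are useful, it is unclear how methods designed for fixed, static predictions carry over to such settings. This study bridges the gap between the two settings by comparing attribution-based explanations with trace-based diagnostics. In static settings, attribution methods yield stable feature rankings (Spearman $\rho = 0.86$). Attribution methods cannot, however, reliably diagnose execution-level failures. Trace-based rubric evaluation, by contrast, localizes behaviour breakdowns and shows that state tracking inconsistency is 2.7$\times$ more prevalent in failed runs and reduces success probability by 49\%. These results indicate that trajectory-level explanations are needed to evaluate and diagnose autonomous AI behaviour. Resources: https://github.com/VectorInstitute/unified-xai-evaluation-framework https://vectorinstitute.github.io/unified-xai-evaluation-framework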

Original Content

arXiv:2602.06841v2 Announce Type: replace Abstract: Over the last decade, explainable AI has primarily focused on interpreting individual model predictions, producing post-hoc explanations that relate inputs to outputs under a fixed decision structure. Recent advances in large language models (LLMs) have enabled agentic AI systems whose behaviour unfolds over multi-step trajectories. In these settings, success and failure are determined by sequences of decisions rather than a single output. While useful, it remains unclear how explanation approaches designed for static predictions translate to agentic settings where behaviour emerges over time. In this work, we bridge the gap between static and agentic explainability by comparing attribution-based explanations with trace-based diagnostics across both settings. To make this distinction explicit, we empirically compare attribution-based explanations used in static classification tasks with trace-based diagnostics used in agentic benchmarks (TAU-bench Airline and AssistantBench). Our results show that while attribution methods achieve stable feature rankings in static settings (Spearman $\rho = 0.86$), they cannot be applied reliably to diagnose execution-level failures in agentic trajectories. In contrast, trace-grounded rubric evaluation for agentic settings consistently localizes behaviour breakdowns and reveals that state tracking inconsistency is 2.7$\times$ more prevalent in failed runs and reduces success probability by 49\%. These findings motivate a shift towards trajectory-level explainability for agentic systems when evaluating and diagnosing autonomous AI behaviour. Resources: https://github.com/VectorInstitute/unified-xai-evaluation-framework https://vectorinstitute.github.io/unified-xai-evaluation-framework
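The abstract reports two kinds of statistics: a Spearman rank correlation between feature rankings, and a comparison of how often a rubric flag appears in failed versus successful runs. The following is a minimal sketch of how such quantities could be computed, not the authors' implementation. The function names, the toy attribution scores, and the toy run records are all hypothetical, and the abstract does not say whether the 49\% figure is an absolute or a relative reduction; the sketch computes an absolute difference purely for illustration.

```python
def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over every value tied with the one at `start`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the window
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one ranking is constant")
    return cov / (var_x * var_y) ** 0.5


def flag_statistics(runs):
    """Summarize one rubric flag over runs given as (succeeded, flagged) pairs.

    Assumes every subgroup is non-empty and that the flag appears in at
    least one successful run.
    """
    def rate(bools):
        return sum(bools) / len(bools)

    flag_in_failed = [flagged for succeeded, flagged in runs if not succeeded]
    flag_in_passed = [flagged for succeeded, flagged in runs if succeeded]
    success_with_flag = [succeeded for succeeded, flagged in runs if flagged]
    success_without_flag = [succeeded for succeeded, flagged in runs if not flagged]
    return {
        # How many times more often the flag appears in failed runs.
        "prevalence_ratio": rate(flag_in_failed) / rate(flag_in_passed),
        # Absolute drop in success rate when the flag is present.
        "success_drop": rate(success_without_flag) - rate(success_with_flag),
    }


# Hypothetical attribution scores for five features from two explanation runs.
scores_a = [0.40, 0.25, 0.20, 0.10, 0.05]
scores_b = [0.30, 0.35, 0.15, 0.12, 0.08]
print("Spearman rho:", spearman(scores_a, scores_b))

# Hypothetical agent runs: (succeeded, flagged for state tracking inconsistency).
toy_runs = [(False, True)] * 4 + [(False, False)] + [(True, True)] + [(True, False)] * 4
print(flag_statistics(toy_runs))
```

The first statistic needs only the two attribution vectors. The second needs a per-run record of whether each rubric flag fired, and that record comes from inspecting the whole trajectory rather than a single input-output pair.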