arxiv_cs_ai February 10, 2026

CAM: A Causality-based Analysis Framework for Multi-Agent Code Generation Systems

Translated: 2026/2/14 8:17:00

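English Translation

Multi-agent code generation systems (MACGS) have achieved remarkable results. However, the inherent complexity of multi-agent architectures produces a large volume of intermediate outputs. How much each of these intermediate outputs matters to system correctness remains unclear, which makes targeted optimization of MACGS designs difficult. To address this, we propose CAM, the first causality-based analysis framework for MACGS, which systematically quantifies how much different intermediate features contribute to system correctness. By categorizing the intermediate outputs comprehensively and systematically simulating realistic errors on intermediate features, we identify the features that matter for system correctness and aggregate their importance rankings. We carry out an extensive empirical analysis of these rankings and report two notable findings. First, we uncover context-dependent features, whose importance emerges mainly through interactions with other features; this indicates that quality assurance for MACGS should include cross-feature consistency checks. Second, a hybrid-backend MACGS that assigns different backend LLMs according to their relative strengths improves Pass@1 by up to 7.2%, which points to hybrid architectures as a promising direction for future MACGS design. We also demonstrate CAM's practical utility through two applications: (1) failure repair, which achieves a 73.3% success rate by optimizing the top-3 importance-ranked features, and (2) feature pruning, which reduces intermediate token consumption by up to 66.8% while maintaining generation performance. Our work offers actionable insights for MACGS design and deployment and establishes causality analysis as a powerful approach to understanding and improving MACGS.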

Original Content

arXiv:2602.02138v2 Announce Type: replace-cross

Abstract: Despite the remarkable success that Multi-Agent Code Generation Systems (MACGS) have achieved, the inherent complexity of multi-agent architectures produces substantial volumes of intermediate outputs. To date, the individual importance of these intermediate outputs to the system correctness remains opaque, which impedes targeted optimization of MACGS designs. To address this challenge, we propose CAM, the first Causality-based Analysis framework for MACGS that systematically quantifies the contribution of different intermediate features for system correctness. By comprehensively categorizing intermediate outputs and systematically simulating realistic errors on intermediate features, we identify the important features for system correctness and aggregate their importance rankings. We conduct extensive empirical analysis on the identified importance rankings. Our analysis reveals intriguing findings: first, we uncover context-dependent features: features whose importance emerges mainly through interactions with other features, revealing that quality assurance for MACGS should incorporate cross-feature consistency checks; second, we reveal that hybrid backend MACGS with different backend LLMs assigned according to their relative strength achieves up to 7.2% Pass@1 improvement, underscoring hybrid architectures as a promising direction for future MACGS design. We further demonstrate CAM's practical utility through two applications: (1) failure repair, which achieves a 73.3% success rate by optimizing top-3 importance-ranked features, and (2) feature pruning that reduces up to 66.8% intermediate token consumption while maintaining generation performance. Our work provides actionable insights for MACGS design and deployment, establishing causality analysis as a powerful approach for understanding and improving MACGS.
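The abstract does not spell out CAM's algorithm, so the following is only a minimal sketch of the general idea it describes: inject an error into one intermediate feature at a time, re-run the system, and rank features by how much Pass@1 drops. Everything here is an illustrative assumption rather than the authors' implementation: the names `feature_importance`, `rank_features`, `toy_system`, and `corrupt`, the feature names `plan`, `signature`, and `comments`, and the use of a single-feature intervention as the importance measure.

```python
from typing import Callable, Dict, List

Features = Dict[str, str]
CORRUPTED = "<corrupted>"  # hypothetical stand-in for a simulated realistic error


def pass_at_1(results: List[bool]) -> float:
    """Fraction of tasks whose single generated solution passes its tests."""
    return sum(results) / len(results)


def feature_importance(
    tasks: List[Features],
    feature_names: List[str],
    run_system: Callable[[Features], bool],
    corrupt: Callable[[str], str],
) -> Dict[str, float]:
    """Importance of a feature = Pass@1 drop when only that feature is corrupted."""
    baseline = pass_at_1([run_system(task) for task in tasks])
    importance: Dict[str, float] = {}
    for name in feature_names:
        outcomes = []
        for task in tasks:
            perturbed = dict(task)  # copy so the original task stays intact
            perturbed[name] = corrupt(perturbed[name])
            outcomes.append(run_system(perturbed))
        importance[name] = baseline - pass_at_1(outcomes)
    return importance


def rank_features(importance: Dict[str, float]) -> List[str]:
    """Most important first; ties broken alphabetically for determinism."""
    return sorted(importance, key=lambda name: (-importance[name], name))


def toy_system(features: Features) -> bool:
    """Assumed toy pipeline: every task needs a sound plan; hard tasks
    additionally need a correct signature; comments never matter."""
    if features["plan"] == CORRUPTED:
        return False
    if features["kind"] == "hard" and features["signature"] == CORRUPTED:
        return False
    return True


# Four hypothetical tasks: two easy, two hard.
toy_tasks = [
    {"kind": kind, "plan": "ok", "signature": "ok", "comments": "ok"}
    for kind in ("easy", "easy", "hard", "hard")
]

importance = feature_importance(
    toy_tasks,
    ["plan", "signature", "comments"],
    toy_system,
    corrupt=lambda _value: CORRUPTED,
)
ranking = rank_features(importance)
print(ranking)  # ['plan', 'signature', 'comments']
```

In this toy setting, a feature with zero measured importance (`comments`) would be a natural pruning candidate, and the top-ranked features would be the first places to look when repairing a failure. A real analysis would need realistic error models per feature type and would have to account for interactions between features, which a one-feature-at-a-time intervention like this cannot capture.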