arxiv_cs_ai February 10, 2026

Agyn: A Multi-Agent System for Team-Based Autonomous Software Engineering

Translated: 2026/2/14 6:31:27

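Translation

Large language models have shown strong capabilities on individual software engineering tasks, yet most autonomous systems still treat issue resolution as a single monolithic process or a pipeline. Real-world software development, by contrast, is a collaborative activity carried out by teams that follow shared methodologies, with clear role separation, communication, and review. This work presents a fully automated multi-agent system that explicitly models software engineering as an organizational process and replicates the structure of an engineering team. Built on agyn, an open-source platform for configuring agent teams, the system assigns specialized agents to roles such as coordination, research, implementation, and review, gives each agent an isolated sandbox for experimentation, and enables structured communication between them.

The system follows a defined development methodology when working on an issue, covering analysis, task specification, pull request creation, and iterative review, and it operates without any human intervention.

Importantly, the system was designed for real production use and was not tuned for SWE-bench.

Evaluated post hoc on SWE-bench 500, it resolves 72.2% of tasks, outperforming single-agent baselines that use comparable language models.

These results suggest that replicating team structure, methodology, and communication is a powerful paradigm for autonomous software engineering, and that future progress may depend as much on organizational design and agent infrastructure as on model improvements.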

Original Content

arXiv:2602.01465v2 Announce Type: replace

Abstract: Large language models have demonstrated strong capabilities in individual software engineering tasks, yet most autonomous systems still treat issue resolution as a monolithic or pipeline-based process. In contrast, real-world software development is organized as a collaborative activity carried out by teams following shared methodologies, with clear role separation, communication, and review. In this work, we present a fully automated multi-agent system that explicitly models software engineering as an organizational process, replicating the structure of an engineering team. Built on top of agyn, an open-source platform for configuring agent teams, our system assigns specialized agents to roles such as coordination, research, implementation, and review, provides them with isolated sandboxes for experimentation, and enables structured communication. The system follows a defined development methodology for working on issues, including analysis, task specification, pull request creation, and iterative review, and operates without any human intervention. Importantly, the system was designed for real production use and was not tuned for SWE-bench. When evaluated post hoc on SWE-bench 500, it resolves 72.2% of tasks, outperforming single-agent baselines using comparable language models. Our results suggest that replicating team structure, methodology, and communication is a powerful paradigm for autonomous software engineering, and that future progress may depend as much on organizational design and agent infrastructure as on model improvements.
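
Illustrative sketch

The abstract describes the workflow only at a high level: role-specialized agents, structured communication, and a loop of analysis, task specification, pull request creation, and iterative review. The Python sketch below is one possible way to picture that loop and is not the agyn API. Every name in it (Role, Message, IssueRun, resolve_issue, the stub agents, and the max_rounds limit) is a hypothetical stand-in, the agents are plain functions rather than language models, and sandboxing is not modeled at all.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Tuple


class Role(Enum):
    """The four roles named in the abstract."""
    COORDINATION = "coordination"
    RESEARCH = "research"
    IMPLEMENTATION = "implementation"
    REVIEW = "review"


@dataclass
class Message:
    """A structured message between two roles (hypothetical schema)."""
    sender: Role
    recipient: Role
    kind: str  # "analysis", "task_spec", "pull_request", or "review"
    body: str


@dataclass
class IssueRun:
    """Record of one attempt to resolve an issue."""
    issue: str
    log: List[Message] = field(default_factory=list)
    approved: bool = False
    review_rounds: int = 0


def resolve_issue(
    issue: str,
    implement: Callable[[str, str], str],
    review: Callable[[str], Tuple[bool, str]],
    max_rounds: int = 3,  # assumed limit; the abstract does not state one
) -> IssueRun:
    """Run analysis, task specification, then a bounded PR/review loop."""
    run = IssueRun(issue=issue)
    # Step 1: the research role reports its analysis to the coordinator.
    run.log.append(Message(Role.RESEARCH, Role.COORDINATION,
                           "analysis", f"analysis of: {issue}"))
    # Step 2: the coordinator hands a task specification to implementation.
    run.log.append(Message(Role.COORDINATION, Role.IMPLEMENTATION,
                           "task_spec", f"spec for: {issue}"))
    feedback = ""
    for round_no in range(1, max_rounds + 1):
        # Step 3: implementation opens (or updates) a pull request.
        patch = implement(issue, feedback)
        run.log.append(Message(Role.IMPLEMENTATION, Role.REVIEW,
                               "pull_request", patch))
        # Step 4: review either approves or sends feedback back.
        approved, feedback = review(patch)
        run.log.append(Message(Role.REVIEW, Role.IMPLEMENTATION,
                               "review", feedback))
        run.review_rounds = round_no
        if approved:
            run.approved = True
            break
    return run


def stub_implement(issue: str, feedback: str) -> str:
    """Stand-in implementer: adds tests only after receiving feedback."""
    return f"patch for {issue}" + (" with tests" if feedback else "")


def stub_review(patch: str) -> Tuple[bool, str]:
    """Stand-in reviewer: approves only patches that mention tests."""
    if "with tests" in patch:
        return True, "approved"
    return False, "please add tests"


if __name__ == "__main__":
    result = resolve_issue("example issue", stub_implement, stub_review)
    print(result.approved, result.review_rounds)
    for message in result.log:
        print(message.sender.value, "->", message.recipient.value, message.kind)
```

With the stub agents, the first pull request is rejected for lacking tests and the second is approved, so the run ends after two review rounds. Keeping every exchange as a typed Message in a log is one simple reading of "structured communication"; a real system would more likely route such messages between separately sandboxed agent processes.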