arxiv_cs_lg April 24, 2026

ChipCraftBrain: Validation-First RTL Generation via Multi-Agent Orchestration

Translated: 2026/4/24 20:02:49

chipcraftbrain, rtl-generation, multi-agent-orchestration, large-language-models, formal-verification

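Translation (from Japanese)

arXiv:2604.19856v1 Announce Type: cross Abstract: Large language models (LLMs) show promise for generating register-transfer level (RTL) code from natural-language specifications, but single-shot generation stays at only 60-65% functional correctness on standard benchmarks. Multi-agent approaches such as MAGE reach 95.9% on VerilogEval, yet they remain untested on harder industrial benchmarks such as NVIDIA's CVDP, lack synthesis awareness, and incur high API costs. This paper proposes ChipCraftBrain, an automated RTL generation framework that combines symbolic-neural reasoning with adaptive multi-agent orchestration. The system rests on four innovations: (1) adaptive orchestration over six specialized agents using a PPO policy over a 168-dimensional state vector (a world-model-based MPC planner is also evaluated as an alternative); (2) a hybrid symbolic-neural architecture in which K-map and truth-table problems are solved algorithmically while specialized agents handle waveform timing and general RTL; (3) knowledge-augmented generation drawing on a 321-pattern base and 971 open-source reference implementations, with focus-aware retrieval; (4) hierarchical decomposition of specifications into dependency-ordered sub-modules, with interface synchronization. On VerilogEval-Human, ChipCraftBrain achieves a mean pass@1 of 97.2% (96.15-98.72% across 7 runs, best 154/156), on par with ChipAgents (97.4%, self-reported) and ahead of MAGE (95.9%). On a 302-problem non-agentic subset of NVIDIA's CVDP spanning five categories, it reaches 94.7% (286/302) averaged over 3 runs, a 36-60 percentage-point improvement per category over the published single-shot baseline. It also leads three of the four categories shared with NVIDIA's ACE-RTL, despite using roughly 30x fewer attempts per problem. In a RISC-V SoC case study, hierarchical decomposition generated 8/8 lint-passing modules (689 LOC) that were validated on FPGA, whereas monolithic generation failed entirely.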
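
The abstract does not say how the dependency-ordered decomposition in item (4) is computed. One common way to obtain such an order is a topological sort over the sub-module dependency graph. The sketch below is only an illustration under that assumption, not the paper's method: the module names and the `generation_order` helper are hypothetical, and only Python's standard `graphlib` module is used.

```python
from graphlib import TopologicalSorter

# Hypothetical sub-modules of a small SoC. Each key maps to the set of
# sub-modules it depends on (names are illustrative, not from the paper).
deps = {
    "alu": set(),
    "regfile": set(),
    "decoder": set(),
    "uart": set(),
    "core": {"alu", "regfile", "decoder"},
    "soc_top": {"core", "uart"},
}


def generation_order(dependencies):
    """Return the sub-modules so that each one appears after all of its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = generation_order(deps)
print(order)
```

Generating leaf modules first means that each parent module can be written against interfaces that already exist, which is one plausible reading of "interface synchronization" in the abstract.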

Original Content

arXiv:2604.19856v1 Announce Type: cross Abstract: Large Language Models (LLMs) show promise for generating Register-Transfer Level (RTL) code from natural language specifications, but single-shot generation achieves only 60-65% functional correctness on standard benchmarks. Multi-agent approaches such as MAGE reach 95.9% on VerilogEval yet remain untested on harder industrial benchmarks such as NVIDIA's CVDP, lack synthesis awareness, and incur high API costs. We present ChipCraftBrain, a framework combining symbolic-neural reasoning with adaptive multi-agent orchestration for automated RTL generation. Four innovations drive the system: (1) adaptive orchestration over six specialized agents via a PPO policy over a 168-dim state (an alternative world-model MPC planner is also evaluated); (2) a hybrid symbolic-neural architecture that solves K-map and truth-table problems algorithmically while specialized agents handle waveform timing and general RTL; (3) knowledge-augmented generation from a 321-pattern base plus 971 open-source reference implementations with focus-aware retrieval; and (4) hierarchical specification decomposition into dependency-ordered sub-modules with interface synchronization. On VerilogEval-Human, ChipCraftBrain achieves 97.2% mean pass@1 (range 96.15-98.72% across 7 runs, best 154/156), on par with ChipAgents (97.4%, self-reported) and ahead of MAGE (95.9%). On a 302-problem non-agentic subset of CVDP spanning five task categories, we reach 94.7% mean pass@1 (286/302, averaged over 3 runs), a 36-60 percentage-point lift per category over the published single-shot baseline; we additionally lead three of four categories shared with NVIDIA's ACE-RTL despite using roughly 30x fewer per-problem attempts. A RISC-V SoC case study demonstrates hierarchical decomposition generating 8/8 lint-passing modules (689 LOC) validated on FPGA, where monolithic generation fails entirely.
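
The quoted percentages can be checked against the problem counts with simple arithmetic. The sketch below recomputes them. The counts 154/156 and 286/302 come from the abstract, while 150/156 for the low end of the range is an inference from the reported 96.15%, not a figure stated in the source. The `pass_rate` helper is a hypothetical name.

```python
def pass_rate(passed, total):
    """Return the pass rate as a percentage, rounded to two decimals."""
    return round(100.0 * passed / total, 2)


best_run = pass_rate(154, 156)   # best VerilogEval-Human run, from the abstract
cvdp_mean = pass_rate(286, 302)  # CVDP non-agentic subset, from the abstract
low_run = pass_rate(150, 156)    # assumption: 150/156 inferred from the 96.15% lower bound

print(best_run, cvdp_mean, low_run)
```

The results match the reported 98.72%, 94.7%, and 96.15%, so the stated range on VerilogEval-Human is consistent with between 150 and 154 of 156 problems passing per run.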