arxiv_cs_ai, April 20, 2026

Mind DeepResearch Technical Report

Translated: 2026/4/20 11:16:43

mind-deepresearch, multi-agent-rl, search-optimization, deep-research, agent-training

Original Content

arXiv:2604.14518v2 Announce Type: replace

Abstract: We present Mind DeepResearch (MindDR), an efficient multi-agent deep research framework that achieves leading performance with only ~30B-parameter models through a meticulously designed data synthesis and multi-stage training pipeline. The core innovation of MindDR lies in a collaborative three-agent architecture (Planning Agent, DeepSearch Agent, and Report Agent) and a four-stage agent-specialized training pipeline comprising SFT cold-start, Search-RL, Report-RL and preference alignment. With this regime, MindDR demonstrates competitive performance even with ~30B-scale models. Specifically, MindDR achieves 45.7% on BrowseComp-ZH, 42.8% on BrowseComp, 46.5% on WideSearch, 75.0% on xbench-DS, and 52.5 on DeepResearch Bench, outperforming comparable-scale open-source agent systems and rivaling larger-scale models. MindDR has been deployed as an online product at Li Auto. Furthermore, we introduce MindDR Bench, a curated benchmark of 500 real-world Chinese queries from our internal product user interactions, evaluated through a comprehensive multi-dimensional rubric system rather than relying on a single RACE metric. On MindDR Bench, MindDR achieves a state-of-the-art score of 51.8.
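The abstract names the three collaborating agents but does not describe their interfaces. The sketch below shows one possible way to wire a Planning Agent, DeepSearch Agent, and Report Agent into a single query flow.

Everything in the sketch is a hypothetical stand-in, not MindDR's implementation:

- the function names (`planning_agent`, `deepsearch_agent`, `report_agent`, `run_pipeline`);
- the `Evidence` type;
- the injected `search` callable;
- the naive planner that splits a query on " and ".

In the real system each agent would presumably be an LLM-backed policy.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Evidence:
    """Hypothetical container for findings gathered for one sub-question."""
    sub_question: str
    findings: List[str] = field(default_factory=list)


def planning_agent(query: str) -> List[str]:
    """Hypothetical Planning Agent: decompose a query into sub-questions.
    Assumption: a naive split on ' and ' stands in for an LLM planner."""
    return [part.strip() for part in query.split(" and ") if part.strip()]


def deepsearch_agent(sub_question: str, search: Callable[[str], List[str]]) -> Evidence:
    """Hypothetical DeepSearch Agent: collect findings via an injected search tool."""
    return Evidence(sub_question, list(search(sub_question)))


def report_agent(query: str, evidence: List[Evidence]) -> str:
    """Hypothetical Report Agent: assemble per-sub-question findings into a report."""
    lines = [f"# Report: {query}"]
    for ev in evidence:
        lines.append(f"## {ev.sub_question}")
        lines.extend(f"- {finding}" for finding in ev.findings)
    return "\n".join(lines)


def run_pipeline(query: str, search: Callable[[str], List[str]]) -> str:
    """Plan, then search each sub-question, then write the report."""
    plan = planning_agent(query)
    evidence = [deepsearch_agent(q, search) for q in plan]
    return report_agent(query, evidence)


if __name__ == "__main__":
    # A fake search tool that returns one canned result per sub-question.
    def fake_search(q: str) -> List[str]:
        return [f"result for {q}"]

    print(run_pipeline("battery range and charging speed", fake_search))
```

Passing the search tool in as a callable keeps the agents decoupled from any particular search backend, so a stub can replace it in tests.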