arxiv_cs_lg April 24, 2026

DR-Venus: Toward a Frontier Edge-Scale Deep Research Agent Built with Only 10K Open Data

DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data

Translated: 2026/4/24 19:54:45

dr-venus, edge-ai, small-lm, reinforcement-learning, deep-research

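Translation

arXiv:2604.19859v1 Announce Type: new Abstract: Implementing edge-scale deep research agents with small language models (SLMs) is attractive for real-world deployment because of the advantages in cost, latency, and privacy. This work examines how a strong SLM-based deep research agent can be trained with limited open data. By improving both data quality and data utilization, we present DR-Venus, a frontier 4B-parameter deep research agent built entirely on roughly 10K open data. The training recipe consists of two stages. In the first stage, agentic supervised fine-tuning (SFT) establishes basic agentic capability, with strict data cleaning and resampling of long-horizon trajectories improving data quality and utilization. In the second stage, agentic reinforcement learning (RL) is applied to further improve execution reliability on long-horizon deep research tasks. To make RL work effectively for small agents in this setting, we build on IGPO and design turn-level rewards based on information gain and format-aware regularization, which strengthens supervision density and turn-level credit assignment. Built entirely on roughly 10K open data, DR-Venus-4B significantly outperforms prior agentic models under 9B parameters on multiple deep research benchmarks and also narrows the gap to much larger 30B-class systems. Further analysis shows that 4B agents have surprisingly strong performance potential, highlighting the deployment promise of small models and the value of test-time scaling in this context. We release our models, code, and key recipes to support reproducible research on edge-scale deep research agents.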

Original Content

arXiv:2604.19859v1 Announce Type: new Abstract: Edge-scale deep research agents based on small language models are attractive for real-world deployment due to their advantages in cost, latency, and privacy. In this work, we study how to train a strong small deep research agent with limited open data by improving both data quality and data utilization. We present DR-Venus, a frontier 4B deep research agent for edge-scale deployment, built entirely on open data. Our training recipe consists of two stages. In the first stage, we use agentic supervised fine-tuning (SFT) to establish basic agentic capability, combining strict data cleaning with resampling of long-horizon trajectories to improve data quality and utilization. In the second stage, we apply agentic reinforcement learning (RL) to further improve execution reliability on long-horizon deep research tasks. To make RL effective for small agents in this setting, we build on IGPO and design turn-level rewards based on information gain and format-aware regularization, thereby enhancing supervision density and turn-level credit assignment. Built entirely on roughly 10K open data, DR-Venus-4B significantly outperforms prior agentic models under 9B parameters on multiple deep research benchmarks, while also narrowing the gap to much larger 30B-class systems. Our further analysis shows that 4B agents already possess surprisingly strong performance potential, highlighting both the deployment promise of small models and the value of test-time scaling in this setting. We release our models, code, and key recipes to support reproducible research on edge-scale deep research agents.
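
The abstract names turn-level rewards based on information gain and format-aware regularization but does not give a formula. The sketch below is one possible reading, not the authors' implementation. It assumes that information gain is the change in the probability the model assigns to a reference answer from one turn to the next, and that format awareness is a fixed penalty subtracted from malformed turns. The names `Turn`, `answer_prob`, `well_formed`, `turn_level_rewards`, and `format_penalty`, along with the penalty value, are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    """One agent turn. Both fields are hypothetical names for this sketch."""
    answer_prob: float  # assumed: probability of the reference answer after this turn
    well_formed: bool   # assumed: whether the turn's output parsed as a valid action


def turn_level_rewards(
    turns: List[Turn],
    prior_prob: float,
    format_penalty: float = 0.5,
) -> List[float]:
    """Return one reward per turn: information gain minus a format penalty.

    The gain is the change in answer probability relative to the previous
    turn. The additive penalty and its default value are assumptions.
    """
    rewards = []
    prev = prior_prob
    for turn in turns:
        gain = turn.answer_prob - prev
        penalty = 0.0 if turn.well_formed else format_penalty
        rewards.append(gain - penalty)
        prev = turn.answer_prob
    return rewards


# Toy trajectory: a useful turn, a malformed turn that adds nothing, a useful turn.
trajectory = [
    Turn(answer_prob=0.25, well_formed=True),
    Turn(answer_prob=0.25, well_formed=False),
    Turn(answer_prob=0.75, well_formed=True),
]
rewards = turn_level_rewards(trajectory, prior_prob=0.125)
print(rewards)  # [0.125, -0.5, 0.5]
```

In this toy trajectory every turn receives its own signal, whereas a single outcome reward at the end would give the three turns no separate credit. That is the sense in which a per-turn reward can increase supervision density and sharpen turn-level credit assignment.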