arxiv_cs_lg April 20, 2026

EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis

llm, agent, program-synthesis, reinforcement-learning, environment-simulation

Original Content

arXiv:2601.05808v2
Announce Type: replace-cross

Abstract: Large language models (LLMs) are expected to be trained to act as agents in various real-world environments, but this process relies on rich and varied tool-interaction sandboxes. However, access to real systems is often restricted; LLM-simulated environments are prone to hallucinations and inconsistencies; and manually built sandboxes are hard to scale. In this paper, we propose EnvScaler, an automated framework for scalable tool-interaction environments via programmatic synthesis. EnvScaler comprises two components. First, SkelBuilder constructs diverse environment skeletons through topic mining, logic modeling, and quality evaluation. Then, ScenGenerator generates multiple task scenarios and rule-based trajectory validation functions for each environment. With EnvScaler, we synthesize 191 environments and about 7K scenarios, and apply them to Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) for Qwen3 series models. Results on three benchmarks show that EnvScaler significantly improves LLMs' ability to solve tasks in complex environments involving multi-turn, multi-tool interactions. We release our code and data at https://github.com/RUC-NLPIR/EnvScaler.
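
The abstract does not say what a synthesized environment or a rule-based trajectory validation function looks like in code. The following is a minimal sketch of the general idea only, assuming an environment is a stateful class whose methods act as tools, and a scenario is an initial state plus a list of boolean checks on the final state. Every name here (TodoEnv, make_scenario, run_trajectory, score) is hypothetical and is not taken from the EnvScaler repository.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class TodoEnv:
    """Hypothetical toy environment: state plus tool methods that mutate it."""
    tasks: Dict[int, dict] = field(default_factory=dict)
    next_id: int = 1

    def add_task(self, title: str) -> int:
        """Tool: create a task and return its id."""
        task_id = self.next_id
        self.tasks[task_id] = {"title": title, "done": False}
        self.next_id += 1
        return task_id

    def complete_task(self, task_id: int) -> bool:
        """Tool: mark a task as done; returns False if the id is unknown."""
        if task_id not in self.tasks:
            return False
        self.tasks[task_id]["done"] = True
        return True


# A check is a rule over the final environment state (an assumption of this
# sketch; the paper's validation functions may be defined differently).
Check = Callable[[TodoEnv], bool]


def make_scenario() -> Tuple[TodoEnv, List[Check]]:
    """Build an initial state and the rules a successful trajectory must satisfy."""
    env = TodoEnv()
    env.add_task("write report")  # gets id 1
    env.add_task("send email")    # gets id 2
    checks: List[Check] = [
        lambda e: e.tasks[1]["done"],      # task 1 must be completed
        lambda e: not e.tasks[2]["done"],  # task 2 must be left untouched
        lambda e: any(t["title"] == "book room" for t in e.tasks.values()),
    ]
    return env, checks


def run_trajectory(env: TodoEnv, calls: List[Tuple[str, dict]]) -> TodoEnv:
    """Replay a sequence of (tool_name, kwargs) calls against the environment."""
    for name, kwargs in calls:
        getattr(env, name)(**kwargs)
    return env


def score(env: TodoEnv, checks: List[Check]) -> float:
    """Fraction of rules satisfied by the final state."""
    return sum(1 for check in checks if check(env)) / len(checks)
```

In this sketch, a trajectory that completes task 1 and adds a "book room" task satisfies all three rules. Because the checks are plain code over state rather than LLM judgments, the resulting score is deterministic, which is the property that makes such a signal usable as a filter for SFT data or as a reward for RL.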