arxiv_cs_cv February 10, 2026


ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems

Translated: 2026/3/15 17:01:47

llm-agent, comfyui, benchmark, multi-agent-system, autonomous-design

Original Content

arXiv:2409.01392v3 Announce Type: replace-cross

Abstract: Much prior AI research has focused on developing monolithic models to maximize their intelligence, with the primary goal of enhancing performance on specific tasks. In contrast, this work studies how LLM-based agents can autonomously design collaborative AI systems. To explore this problem, we first introduce ComfyBench to evaluate agents' ability to design collaborative AI systems in ComfyUI. ComfyBench is a comprehensive benchmark comprising 200 diverse tasks covering various instruction-following generation challenges, along with detailed annotations for 3,205 nodes and 20 workflows. Based on ComfyBench, we further develop ComfyAgent, a novel framework that empowers LLM-based agents to autonomously design collaborative AI systems by generating workflows. ComfyAgent is based on two core concepts. First, it represents workflows with code, which can be reversibly converted into workflows and executed as collaborative systems by the interpreter. Second, it constructs a multi-agent system that cooperates to learn from existing workflows and generate new workflows for a given task. While experimental results demonstrate that ComfyAgent achieves a resolve rate comparable to o1-preview and significantly surpasses other agents on ComfyBench, ComfyAgent has resolved only 15% of creative tasks. LLM-based agents still have a long way to go in autonomously designing collaborative AI systems. Progress with ComfyBench is paving the way for more intelligent and autonomous collaborative AI systems.
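The first core idea above, representing a workflow as code that can be converted back into the workflow graph without loss, can be illustrated with a minimal Python sketch. It assumes a workflow shaped like ComfyUI's API-format JSON (each node id maps to a `class_type` and `inputs`, with links written as `[source_id, output_index]`) and a hypothetical straight-line code form `n<id> = ClassType(...)`. The function names `workflow_to_code` and `code_to_workflow` are illustrative only; this is not ComfyAgent's actual code representation or interpreter, and the example node inputs are trimmed, so they do not form a complete ComfyUI prompt.

```python
import ast
from graphlib import TopologicalSorter
from typing import Any

# Hypothetical workflow format, modeled on ComfyUI's API-format JSON:
# node id -> {"class_type": str, "inputs": {name: literal | [src_id, output_index]}}
# Node ids are assumed to be digit strings so that "n<id>" is a valid identifier.
Workflow = dict[str, dict[str, Any]]


def _is_link(value: Any) -> bool:
    """A link is a two-element list [source_node_id, output_index]."""
    return (
        isinstance(value, list)
        and len(value) == 2
        and isinstance(value[0], str)
        and isinstance(value[1], int)
    )


def workflow_to_code(workflow: Workflow) -> str:
    """Emit one assignment per node, in dependency (topological) order."""
    # Map each node to the set of nodes it reads from.
    deps = {
        node_id: {v[0] for v in node["inputs"].values() if _is_link(v)}
        for node_id, node in workflow.items()
    }
    lines = []
    for node_id in TopologicalSorter(deps).static_order():
        node = workflow[node_id]
        args = []
        for name, value in node["inputs"].items():
            if _is_link(value):
                # Link: reference output `value[1]` of the variable for node `value[0]`.
                args.append(f"{name}=n{value[0]}[{value[1]}]")
            else:
                # Literal: repr() output can be read back with ast.literal_eval.
                args.append(f"{name}={value!r}")
        lines.append(f"n{node_id} = {node['class_type']}({', '.join(args)})")
    return "\n".join(lines)


def code_to_workflow(code: str) -> Workflow:
    """Parse the code back into the graph without executing it."""
    workflow: Workflow = {}
    for stmt in ast.parse(code).body:
        # Each statement must look like: n<id> = ClassType(key=value, ...)
        if not (isinstance(stmt, ast.Assign) and isinstance(stmt.value, ast.Call)):
            raise ValueError(f"unsupported statement: {ast.dump(stmt)}")
        node_id = stmt.targets[0].id[1:]  # strip the "n" prefix
        inputs: dict[str, Any] = {}
        for kw in stmt.value.keywords:
            if isinstance(kw.value, ast.Subscript):
                # n<src>[k] -> link to output k of node <src>
                src = kw.value.value.id[1:]
                inputs[kw.arg] = [src, ast.literal_eval(kw.value.slice)]
            else:
                inputs[kw.arg] = ast.literal_eval(kw.value)
        workflow[node_id] = {"class_type": stmt.value.func.id, "inputs": inputs}
    return workflow


# Illustrative, trimmed workflow (not a complete ComfyUI prompt).
EXAMPLE: Workflow = {
    "4": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": "model.safetensors"}},
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "a red fox", "clip": ["4", 1]}},
    "5": {"class_type": "EmptyLatentImage", "inputs": {"width": 512, "height": 512, "batch_size": 1}},
    "3": {
        "class_type": "KSampler",
        "inputs": {"seed": 42, "steps": 20, "cfg": 7.0,
                   "model": ["4", 0], "positive": ["6", 0], "latent_image": ["5", 0]},
    },
}

if __name__ == "__main__":
    code = workflow_to_code(EXAMPLE)
    print(code)
    assert code_to_workflow(code) == EXAMPLE  # the round trip is lossless
```

Emitting nodes in topological order guarantees every variable is defined before it is used, so the text reads as a straight-line program; parsing it with `ast` rather than `exec` recovers the graph without running anything, which is what makes the conversion reversible in this sketch.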