arxiv_cs_ai April 24, 2026

The Last Harness You'll Ever Build: A Self-Evolving Framework for Automated Harness Design

Translated: 2026/4/24 20:14:56

automated-harness-engineering, ai-agents, meta-learning, reinforcement-learning, task-automation

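Translation

arXiv:2604.21003v1 Announce Type: new Abstract: AI agents are increasingly deployed in complex, domain-specific workflows: navigating enterprise web applications that require dozens of clicks and form fills, coordinating multi-step research pipelines spanning search, extraction, and synthesis, automating code review in unfamiliar repositories, and handling customer escalations that demand deep domain knowledge. **Each new task domain requires painstaking, expert-driven harness engineering**: designing the prompts, tools, orchestration logic, and evaluation criteria that make a foundation model effective. We present a two-level framework that automates this process. The first level, the **Harness Evolution Loop**, optimizes the harness $\mathcal{H}$ of a worker agent for a single task: the Worker Agent $W_{\mathcal{H}}$ executes the task, the Evaluator Agent $V$ adversarially diagnoses failures and scores performance, and the Evolution Agent $E$ modifies the harness based on the history of prior attempts. The second level, the **Meta-Evolution Loop**, optimizes the evolution protocol $\Lambda = (W_{\mathcal{H}}, \mathcal{H}^{(0)}, V, E)$ itself across diverse tasks, **learning a protocol $\Lambda^{(\text{best})}$ that enables rapid harness convergence on any new task**, so that adapting an agent to a new domain requires no human harness engineering at all. We formalize the correspondence to meta-learning and present both algorithms. The framework **turns manual harness engineering into automated harness engineering** and goes one step further by **automating the design of the automation itself**.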

Original Content

arXiv:2604.21003v1 Announce Type: new Abstract: AI agents are increasingly deployed on complex, domain-specific workflows -- navigating enterprise web applications that require dozens of clicks and form fills, orchestrating multi-step research pipelines that span search, extraction, and synthesis, automating code review across unfamiliar repositories, and handling customer escalations that demand nuanced domain knowledge. \textbf{Each new task domain requires painstaking, expert-driven harness engineering}: designing the prompts, tools, orchestration logic, and evaluation criteria that make a foundation model effective. We present a two-level framework that automates this process. At the first level, the \textbf{Harness Evolution Loop} optimizes a worker agent's harness $\mathcal{H}$ for a single task: a Worker Agent $W_{\mathcal{H}}$ executes the task, an Evaluator Agent $V$ adversarially diagnoses failures and scores performance, and an Evolution Agent $E$ modifies the harness based on the full history of prior attempts. At the second level, the \textbf{Meta-Evolution Loop} optimizes the evolution protocol $\Lambda = (W_{\mathcal{H}}, \mathcal{H}^{(0)}, V, E)$ itself across diverse tasks, \textbf{learning a protocol $\Lambda^{(\text{best})}$ that enables rapid harness convergence on any new task -- so that adapting an agent to a novel domain requires no human harness engineering at all.} We formalize the correspondence to meta-learning and present both algorithms. The framework \textbf{shifts manual harness engineering into automated harness engineering}, and takes one step further -- \textbf{automating the design of the automation itself}.
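
The two loops described in the abstract can be illustrated with a small toy sketch. This is not the paper's algorithm: the abstract does not specify how the agents are implemented or how the protocol is searched. Everything below is an assumption made for illustration. The "task" is a hidden integer, the harness is a single `guess` setting, the evaluator returns a score plus a coarse diagnosis, the evolver edits the harness using the full attempt history, and the meta level simply picks the candidate protocol that converges in the fewest total iterations over a set of training tasks. The names `Protocol`, `make_evolver`, `harness_evolution_loop`, and `meta_evolution_loop` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Harness = Dict[str, int]              # toy harness: one tunable setting
Attempt = Tuple[Harness, float, str]  # (harness, score, diagnosis)


@dataclass
class Protocol:
    """Evolution protocol Lambda = (W_H, H^(0), V, E), mirroring the abstract's tuple."""
    label: str                                            # hypothetical, for display only
    worker: Callable[[Harness, int], int]                 # W_H: runs the task under H
    initial_harness: Harness                              # H^(0)
    evaluator: Callable[[int, int], Tuple[float, str]]    # V: score and diagnosis
    evolver: Callable[[Harness, List[Attempt]], Harness]  # E: edits H from history


def worker(harness: Harness, task: int) -> int:
    """Toy worker: its output is fully determined by the harness."""
    return harness["guess"]


def evaluator(output: int, task: int) -> Tuple[float, str]:
    """Toy evaluator: scores the output and names the failure mode."""
    score = float(-abs(output - task))
    if output < task:
        return score, "too_low"
    if output > task:
        return score, "too_high"
    return score, "ok"


def make_evolver(step: int) -> Callable[[Harness, List[Attempt]], Harness]:
    """Build a toy evolver that reads the full history of failed attempts."""
    def evolve(harness: Harness, history: List[Attempt]) -> Harness:
        lows = [h["guess"] for h, _, d in history if d == "too_low"]
        highs = [h["guess"] for h, _, d in history if d == "too_high"]
        if lows and highs:               # bracketed: bisect the known bounds
            guess = (max(lows) + min(highs)) // 2
        elif lows:                       # only undershoots so far: step up
            guess = max(lows) + step
        else:                            # only overshoots so far: step down
            guess = min(highs) - step
        return {"guess": guess}
    return evolve


def harness_evolution_loop(protocol: Protocol, task: int,
                           max_iters: int = 50) -> Tuple[Harness, int]:
    """Level 1: evolve one harness for one task; return (harness, iterations used)."""
    harness = dict(protocol.initial_harness)
    history: List[Attempt] = []
    for iteration in range(1, max_iters + 1):
        output = protocol.worker(harness, task)
        score, diagnosis = protocol.evaluator(output, task)
        history.append((harness, score, diagnosis))
        if diagnosis == "ok":
            return harness, iteration
        harness = protocol.evolver(harness, history)
    return harness, max_iters            # budget exhausted without converging


def meta_evolution_loop(candidates: List[Protocol], tasks: List[int]) -> Protocol:
    """Level 2 (stand-in): pick the protocol with the fewest total iterations."""
    def total_iterations(protocol: Protocol) -> int:
        return sum(harness_evolution_loop(protocol, t)[1] for t in tasks)
    return min(candidates, key=total_iterations)


candidates = [
    Protocol(f"step={s}", worker, {"guess": 0}, evaluator, make_evolver(s))
    for s in (1, 8, 64)
]
best = meta_evolution_loop(candidates, tasks=[3, 40, 100])
harness, iterations = harness_evolution_loop(best, task=77)  # unseen task
print(best.label, harness, iterations)
```

The design choice worth noting is the separation of concerns: the inner loop only ever changes the harness, while the outer loop only ever changes the protocol that drives the inner loop. In this toy the outer loop is a one-shot selection over a fixed candidate list; an actual meta-evolution procedure would presumably modify the protocol iteratively, which the abstract leaves unspecified.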