arxiv_cs_ai April 20, 2026

ARC-AGI-3: A New Challenge for Frontier Agentic Intelligence

Translated: 2026/4/20 11:16:33

agentic-intelligence, arc-benchmark, reinforcement-learning, ai-agency, human-ai-comparison

Original Content

arXiv:2603.24621v2 (Announce Type: replace)

Abstract: We introduce ARC-AGI-3, an interactive benchmark for studying agentic intelligence through novel, abstract, turn-based environments in which agents must explore, infer goals, build internal models of environment dynamics, and plan effective action sequences without explicit instructions. Like its predecessors ARC-AGI-1 and 2, ARC-AGI-3 focuses entirely on evaluating fluid adaptive efficiency on novel tasks, while avoiding language and external knowledge. ARC-AGI-3 environments leverage only Core Knowledge priors and are difficulty-calibrated via extensive testing with human test-takers. Our testing shows that humans can solve 100% of the environments, whereas frontier AI systems, as of March 2026, score below 1%. In this paper, we present the benchmark design, its efficiency-based scoring framework grounded in human action baselines, and the methodology used to construct, validate, and calibrate the environments.