arxiv_cs_lg April 24, 2026

MIRROR: A Hierarchical Benchmark for Metacognitive Calibration in Large Language Models

mllm, machine-learning, llm-evaluation, agentic-ai, metacognition

Original Content

arXiv:2604.19809v1
Announce Type: cross
Abstract: We introduce MIRROR, a benchmark comprising eight experiments across four metacognitive levels that evaluates whether large language models can use self-knowledge to make better decisions. We evaluate 16 models from 8 labs across approximately 250,000 evaluation instances using five independent behavioral measurement channels. Core experiments are run across the full model roster; experiments with specialized infrastructure requirements report explicitly marked model subsets. We find two phenomena with direct implications for agentic deployment: (1) compositional self-prediction fails universally -- the Compositional Calibration Error ranges from 0.500 to 0.943 on the original 15-model Exp3-v1 set (and 0.434 to 0.758 on the balanced 16-model Exp3-v2 expansion), indicating that models cannot predict their own performance on multi-domain tasks, and (2) models exhibit above-chance but imperfect domain-specific self-knowledge yet systematically fail to translate even this partial awareness into appropriate agentic action-selection -- external metacognitive control reduces the Confident Failure Rate from 0.600 to 0.143 (76% reduction at temperature 0; mean 70% at temperature 0.7 across 5 models from 4 labs). Providing models with their own calibration scores produces no significant improvement (p > 0.05); only architectural constraint is effective. This suggests that external metacognitive scaffolding -- not improved self-knowledge -- is the path to safer autonomous AI systems. Code, data, and Croissant metadata will be released publicly with the benchmark.
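
The 76% figure quoted in the abstract appears to be a relative reduction of the Confident Failure Rate, and it can be checked directly from the two reported rates. The sketch below is only that arithmetic check; the function and variable names are illustrative assumptions and are not taken from the MIRROR code, which the abstract does not show.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the fraction by which `after` is lower than `before`."""
    if before <= 0:
        raise ValueError("`before` must be positive")
    return (before - after) / before


# Confident Failure Rate values reported in the abstract (temperature 0).
# The variable names are hypothetical; only the two numbers come from the source.
cfr_without_control = 0.600
cfr_with_control = 0.143

reduction = relative_reduction(cfr_without_control, cfr_with_control)
print(f"Relative reduction: {reduction:.1%}")  # prints "Relative reduction: 76.2%"
```

The result, about 76.2%, rounds to the 76% stated in the abstract. The mean 70% figure at temperature 0.7 is averaged over 5 models, so it cannot be reproduced from the abstract alone.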