arxiv_cs_ai April 24, 2026

Structural Quality Gaps in Practitioner AI Governance Prompts: An Empirical Study Using a Five-Principle Evaluation Framework

Translated: 2026/4/24 20:23:22

Tags: ai-governance, prompt-engineering, agents.md, computability-theory, requirements-engineering

arXiv:2604.21090v1 (announce type: cross)

Abstract: AI governance programmes increasingly rely on natural language prompts to constrain and direct AI agent behaviour. These prompts function as executable specifications: they define the agent's mandate, scope, and quality criteria. Despite this role, no systematic framework exists for evaluating whether a governance prompt is structurally complete. We introduce a five-principle evaluation framework grounded in computability theory, proof theory, and Bayesian epistemology, and apply it to an empirical corpus of 34 publicly available AGENTS.md governance files sourced from GitHub. Our evaluation reveals that 37% of evaluated file-model pairs score below the structural completeness threshold, with data classification and assessment rubric criteria most frequently absent. These results suggest that practitioner-authored governance prompts exhibit consistent structural patterns that automated static analysis could detect and remediate. We discuss implications for requirements engineering practice in AI-assisted development contexts, identify a previously undocumented artefact classification gap in the AGENTS.md convention, and propose directions for tool support.
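To make the idea of a static check concrete, the following is a minimal Python sketch, not the paper's method. The abstract does not list the five principles or say how scores are computed, so the criterion names, keyword patterns, helper functions, and threshold arithmetic here are all assumptions chosen for illustration. Only "data classification" and "assessment rubric" are named in the abstract.

```python
import re

# Hypothetical criteria and keyword patterns. The abstract names only
# "data classification" and "assessment rubric" as frequently absent;
# the patterns are illustrative guesses, not the paper's detection rules.
CRITERIA = {
    "data_classification": re.compile(
        r"\b(data classification|confidential|sensitive data|pii)\b", re.I
    ),
    "assessment_rubric": re.compile(
        r"\b(rubric|acceptance criteria|definition of done)\b", re.I
    ),
}


def missing_criteria(text: str) -> list[str]:
    """Return the names of criteria whose pattern never matches the text."""
    return [name for name, pattern in CRITERIA.items() if not pattern.search(text)]


def share_below_threshold(scores: list[float], threshold: float) -> float:
    """Return the fraction of scores strictly below the threshold."""
    if not scores:
        return 0.0
    return sum(score < threshold for score in scores) / len(scores)


if __name__ == "__main__":
    # Made-up governance file content, for demonstration only.
    sample = (
        "# AGENTS.md\n"
        "Run tests before committing.\n"
        "Acceptance criteria: all checks pass.\n"
    )
    print(missing_criteria(sample))
    # Made-up scores, one per hypothetical file-model pair.
    print(share_below_threshold([0.2, 0.9, 0.5, 0.8], 0.6))
```

Keyword matching is a deliberately crude stand-in: it can flag that a topic is never mentioned, but it cannot judge whether a mention is adequate, which is presumably where a principled framework would do the real work.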