arxiv_cs_ai February 10, 2026

The Use of AI Tools to Develop and Validate Q-Matrices

Translated: 2026/3/7 11:15:58

ai-tool cognitive-diagnostic-model q-matrix-development

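Japanese Translation (rendered in English)

Developing a Q-matrix is a critical but labor-intensive step in building a cognitive diagnostic model (CDM). This study examines whether AI tools (general language models) can support Q-matrix development for cognitive diagnostic modeling by comparing AI-generated Q-matrices with the validated Q-matrix from Li and Suen (2013) for a reading comprehension test. In May 2025, multiple AI models were given the same training materials as the human experts and produced Q-matrices. Agreement among the AI-generated Q-matrices, the validated Q-matrix, and the human raters' Q-matrices was assessed using Cohen's kappa. Results varied substantially across AI models; Google Gemini 2.5 Pro achieved the highest agreement with the validated Q-matrix (Kappa = 0.63), exceeding that of all human experts. A follow-up analysis in January 2026 using newer AI versions, however, showed lower agreement with the validated Q-matrix. Implications and directions for future research are discussed.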

Original Content

arXiv:2602.08796v1 Announce Type: new

Abstract: Constructing a Q-matrix is a critical but labor-intensive step in cognitive diagnostic modeling (CDM). This study investigates whether AI tools (i.e., general language models) can support Q-matrix development by comparing AI-generated Q-matrices with a validated Q-matrix from Li and Suen (2013) for a reading comprehension test. In May 2025, multiple AI models were provided with the same training materials as human experts. Agreement among AI-generated Q-matrices, the validated Q-matrix, and human raters' Q-matrices was assessed using Cohen's kappa. Results showed substantial variation across AI models, with Google Gemini 2.5 Pro achieving the highest agreement (Kappa = 0.63) with the validated Q-matrix, exceeding that of all human experts. A follow-up analysis in January 2026 using newer AI versions, however, revealed lower agreement with the validated Q-matrix. Implications and directions for future research are discussed.
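
To make the agreement measure concrete, here is a minimal sketch of Cohen's kappa computed between two binary Q-matrices. It assumes each Q-matrix is an items-by-attributes grid of 0/1 entries and that agreement is measured cell by cell over the flattened matrices; the abstract does not state how the authors set up the computation, so this is an illustration rather than their procedure. The function names and the toy matrices are hypothetical and not taken from the paper.

```python
from typing import List, Sequence


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two equal-length sequences of binary (0/1) codes."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: share of cells where both sources give the same code.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each source's marginal rate of coding a 1.
    p_a1 = sum(a) / n
    p_b1 = sum(b) / n
    expected = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    if expected == 1.0:
        # Both sources are constant and identical; kappa is undefined (0/0).
        # Returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


def flatten(q_matrix: List[List[int]]) -> List[int]:
    """Flatten an items x attributes Q-matrix row by row."""
    return [cell for row in q_matrix for cell in row]


# Hypothetical toy data: 4 items x 3 attributes (not from the study).
validated = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
candidate = [[1, 0, 0], [0, 1, 1], [1, 0, 0], [0, 0, 1]]

kappa = cohen_kappa(flatten(validated), flatten(candidate))
print(round(kappa, 3))  # 0.657
```

In this toy case the two matrices agree on 10 of 12 cells (observed agreement 0.833), while chance agreement from the marginals is about 0.514, which gives a kappa of 46/70, roughly 0.657. Because kappa corrects for chance, it is lower than the raw agreement rate, which matters for Q-matrices where most cells are 0.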