arxiv_cs_ai April 24, 2026

The Effect of Idea Elaboration on the Automatic Assessment of Idea Originality

creative-assessment, large-language-models, artificial-intelligence, self-preference-bias, idea-elaboration

Original Content

arXiv:2604.20569v1 Announce Type: cross Abstract: Automatic systems are increasingly used to assess the originality of responses in creative tasks. They offer a potential solution to key limitations of human assessment (cost, fatigue, and subjectivity), but there is preliminary evidence of a self-preference bias: automatic systems tend to prefer outputs that are closer to their own style than to a human one. In this paper, we investigated how Large Language Models (LLMs) align with human raters in assessing the originality of responses in a divergent thinking task. We analysed 4,813 responses to the Alternate Uses Task (AUT) produced by more and less creative humans and by ChatGPT-4o. Human raters were two university students who underwent intensive training. Machine raters were two specialised systems fine-tuned on AUT responses and corresponding human ratings (OCSAI and CLAUS), plus ChatGPT-4o, which was prompted with the same instructions as the human raters. Results confirmed the presence of a self-preference bias in LLMs: automatic systems tended to favour AI-generated responses. However, this bias disappeared when the analyses controlled for idea elaboration. We discuss the theoretical and methodological implications of these findings and highlight future directions for research on creativity assessment.
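
To make "controlling for idea elaboration" concrete, here is a minimal sketch in plain Python. It is not the authors' analysis, which the abstract does not specify. It assumes elaboration is proxied by response word count, uses a tiny made-up dataset in which ratings depend only on word count, and compares the raw AI-versus-human rating gap with the gap left after partialling out word count (the Frisch-Waugh-Lovell residual approach). All function names and numbers are hypothetical.

```python
from statistics import mean


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def residuals(x, y):
    """Residuals of y after removing its linear dependence on x."""
    a, b = simple_ols(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def raw_gap(is_ai, rating):
    """Mean rating of AI responses minus mean rating of human responses."""
    ai = [r for s, r in zip(is_ai, rating) if s == 1]
    human = [r for s, r in zip(is_ai, rating) if s == 0]
    return mean(ai) - mean(human)


def adjusted_gap(is_ai, word_count, rating):
    """Coefficient on the AI indicator in rating ~ is_ai + word_count.

    Computed by regressing the word-count residuals of the rating on the
    word-count residuals of the AI indicator (Frisch-Waugh-Lovell).
    """
    r_rating = residuals(word_count, rating)
    r_source = residuals(word_count, is_ai)
    _, slope = simple_ols(r_source, r_rating)
    return slope


# Toy data (assumption): AI responses are longer, and the rater's score
# depends only on length, not on who wrote the response.
word_count = [2, 3, 4, 5, 4, 6, 7, 9]
is_ai = [0, 0, 0, 0, 1, 1, 1, 1]
rating = [1.0 + 0.4 * w for w in word_count]

print("raw gap:", raw_gap(is_ai, rating))
print("gap controlling for word count:", adjusted_gap(is_ai, word_count, rating))
```

In this toy setup the raw gap is positive only because the AI responses are longer; once word count is partialled out, the source coefficient is numerically zero. That is the pattern the abstract reports, though real data would of course be noisier and the authors may have used a different model.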