arxiv_cs_ai April 24, 2026

Identifying Bias in Machine-Generated Text: A Look at Detection Systems

Identifying Bias in Machine-generated Text Detection

Translated: 2026/4/24 20:33:40

machine-generated-text, bias, text-generation, discrimination, ai-safety

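Japanese Translation (rendered in English)

arXiv:2512.09292v2 Announce Type: replace-cross
Abstract: As text generation capabilities have improved rapidly, interest in machine-generated text detection has grown as well. Detection here means the ability to identify whether a given text was generated by a model or written by a person. Detection models show strong performance, but they also have the potential to cause significant negative impacts. We explore potential biases in English machine-generated text detection systems. Using a dataset of student essays that we curated, we assess 16 different detection systems for bias across four attributes: gender, race/ethnicity, English-language learner (ELL) status, and economic status. We evaluate these attributes with regression-based models to determine the significance and power of the effects, and we also perform subgroup analysis. We find that while biases are generally inconsistent across systems, several key issues emerge: several models tend to classify disadvantaged groups as machine-generated, ELL essays are more likely to be classified as machine-generated, and essays by economically disadvantaged students are less likely to be classified as machine-generated. In addition, non-White ELL essays are disproportionately classified as machine-generated relative to those of their White counterparts. Finally, we perform human annotation and find that while humans generally perform poorly at the detection task, they show no significant biases on the studied attributes.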

Original Content

arXiv:2512.09292v2 Announce Type: replace-cross
Abstract: The meteoric rise in text generation capability has been accompanied by parallel growth in interest in machine-generated text detection: the capability to identify whether a given text was generated using a model or written by a person. While detection models show strong performance, they have the capacity to cause significant negative impacts. We explore potential biases in English machine-generated text detection systems. We curate a dataset of student essays and assess 16 different detection systems for bias across four attributes: gender, race/ethnicity, English-language learner (ELL) status, and economic status. We evaluate these attributes using regression-based models to determine the significance and power of the effects, as well as performing subgroup analysis. We find that while biases are generally inconsistent across systems, there are several key issues: several models tend to classify disadvantaged groups as machine-generated, ELL essays are more likely to be classified as machine-generated, economically disadvantaged students' essays are less likely to be classified as machine-generated, and non-White ELL essays are disproportionately classified as machine-generated relative to their White counterparts. Finally, we perform human annotation and find that while humans perform generally poorly at the detection task, they show no significant biases on the studied attributes.
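To make the subgroup comparison mentioned in the abstract concrete, here is a minimal sketch of one way such a check could look. It is not the paper's method: the authors report regression-based models, whereas this uses a simpler two-proportion z-test. The record layout (a `flagged` field plus one key per attribute) and the function names `flag_rates` and `two_proportion_z` are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def flag_rates(records, attribute):
    """Per-subgroup share of essays a detector flagged as machine-generated.

    `records` is a list of dicts (hypothetical layout) with a boolean
    "flagged" field and one key per demographic attribute.
    Returns {group: (rate, flagged_count, total_count)}.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [flagged, total]
    for rec in records:
        group = rec[attribute]
        counts[group][0] += int(rec["flagged"])
        counts[group][1] += 1
    return {g: (f / n, f, n) for g, (f, n) in counts.items()}


def two_proportion_z(flagged_a, n_a, flagged_b, n_b):
    """Two-sided two-proportion z-test; returns (z, p_value).

    Uses the pooled standard error and the normal approximation, so it is
    only reasonable for fairly large subgroups.
    """
    pooled = (flagged_a + flagged_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # no variation at all: nothing to test
    z = (flagged_a / n_a - flagged_b / n_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value
```

In use, one would call `flag_rates(records, "ell")` for a single detector and pass the two subgroup counts to `two_proportion_z`. A regression with all four attributes as predictors, as the abstract describes, would additionally control for overlap between attributes, which this pairwise test does not.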