arxiv_cs_ai April 24, 2026

Who Defines Fairness? Target-Based Prompting for Demographic Representation in Generative Models

generative-ai, fairness, prompt-engineering, bias-mitigation, demographic-representation

Original Content

arXiv:2604.21036v1 Announce Type: new
Abstract: Text-to-image (T2I) models like Stable Diffusion and DALL-E have made generative AI widely accessible, yet recent studies reveal that these systems often replicate societal biases, particularly in how they depict demographic groups across professions. Prompts such as 'doctor' or 'CEO' frequently yield lighter-skinned outputs, while lower-status roles like 'janitor' show more diversity, reinforcing stereotypes. Existing mitigation methods typically require retraining or curated datasets, making them inaccessible to most users. We propose a lightweight, inference-time framework that mitigates representational bias through prompt-level intervention without modifying the underlying model. Instead of assuming a single definition of fairness, our approach allows users to select among multiple fairness specifications, ranging from simple choices such as a uniform distribution to more complex definitions informed by a large language model (LLM) that cites sources and provides confidence estimates. These distributions guide the construction of demographic-specific prompt variants in the corresponding proportions, and we evaluate alignment by auditing adherence to the declared target and measuring the resulting skin-tone distribution rather than assuming uniformity as 'fairness'. Across 36 prompts spanning 30 occupations and 6 non-occupational contexts, our method shifts observed skin-tone outcomes in directions consistent with the declared target, and reduces deviation from targets when the target is defined directly in skin-tone space (fallback). This work demonstrates how fairness interventions can be made transparent, controllable, and usable at inference time, directly empowering users of generative AI.
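
The target-based idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the group labels, the prompt template, the function names, the largest-remainder rounding, and the use of total variation distance as the deviation measure are all assumptions chosen for illustration, since the abstract does not specify them.

```python
from collections import Counter


def allocate_counts(target, n_images):
    """Split n_images across groups in proportion to the target weights.

    Uses largest-remainder rounding (an assumption) so counts sum to n_images.
    """
    total = sum(target.values())
    quotas = {g: n_images * w / total for g, w in target.items()}
    counts = {g: int(q) for g, q in quotas.items()}
    leftover = n_images - sum(counts.values())
    # Hand the remaining images to the groups with the largest fractional parts.
    by_remainder = sorted(target, key=lambda g: quotas[g] - counts[g], reverse=True)
    for g in by_remainder[:leftover]:
        counts[g] += 1
    return counts


def build_prompts(base_prompt, target, n_images):
    """Build demographic-specific prompt variants in the target proportions.

    The template below is hypothetical, not taken from the paper.
    """
    counts = allocate_counts(target, n_images)
    return [
        f"a photo of a {group} {base_prompt}"
        for group, count in counts.items()
        for _ in range(count)
    ]


def total_variation(observed_labels, target):
    """Total variation distance between observed label shares and the target."""
    n = len(observed_labels)
    freq = Counter(observed_labels)
    total = sum(target.values())
    groups = set(target) | set(freq)
    return 0.5 * sum(
        abs(freq.get(g, 0) / n - target.get(g, 0) / total) for g in groups
    )


if __name__ == "__main__":
    # Hypothetical user-declared target over three illustrative skin-tone groups.
    target = {"light-skinned": 1, "medium-skinned": 1, "dark-skinned": 1}
    prompts = build_prompts("doctor", target, n_images=6)
    for p in prompts:
        print(p)
    # In a real audit the labels would come from classifying generated images;
    # here they are made-up placeholders.
    observed = ["light-skinned"] * 4 + ["dark-skinned"] * 2
    print(total_variation(observed, target))
```

In this sketch the target is passed as weights rather than probabilities, so a uniform target and a non-uniform one are declared the same way, and the audit step compares outcomes against whatever the user declared rather than against uniformity.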