arxiv_cs_cv, April 20, 2026

When Cultures Meet: Multicultural Text-to-Image Generation

text-to-image-generation, multicultural-ai, llm-personas, ai-benchmark, cultural-bias

Original Content

arXiv:2502.15972v2 (announce type: replace)

Abstract: Text-to-image generation models have achieved strong performance in culturally homogeneous settings, yet their ability to generate multicultural scenes, where people and landmarks originate from different cultures, remains largely unexplored. We introduce multicultural text-to-image generation as a new task and present the first benchmark designed to study this setting. Our dataset contains 9,000 images spanning five countries, three age groups, two genders, 25 historical landmarks, and five languages. Using this benchmark, we analyze the behavior of state-of-the-art text-to-image models across multiple dimensions, including alignment, image quality, aesthetics, knowledge, and fairness. As one strategy for composing cultural and demographic information, we explore MosAIG, a Multi-Agent framework that enhances multicultural Image Generation by leveraging LLMs with distinct cultural personas. Our analysis shows that richer prompt composition can improve image quality and cultural grounding compared to simple prompts, while revealing substantial disparities across languages and demographic groups. We release our dataset and code at https://github.com/AIM-SCU/MosAIG.
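
To make the attribute space in the abstract concrete, the following is a minimal Python sketch that enumerates "simple prompt" combinations over the stated counts (five countries, three age groups, two genders, 25 landmarks, five languages). It is an illustration only, not the authors' pipeline: every category value, the `simple_prompt` template, and the `build_grid` helper are hypothetical placeholders, since the abstract gives counts but not names. How the reported 9,000 images are distributed across these attributes is not stated in the abstract, and the sketch does not try to reproduce that figure.

```python
from itertools import product

# Hypothetical placeholder values: the abstract states only the counts,
# not the actual countries, age groups, landmarks, or languages.
COUNTRIES = [f"country_{i}" for i in range(1, 6)]      # 5 countries
AGE_GROUPS = ["age_group_1", "age_group_2", "age_group_3"]
GENDERS = ["gender_1", "gender_2"]
LANDMARKS = [f"landmark_{i}" for i in range(1, 26)]    # 25 landmarks
LANGUAGES = [f"language_{i}" for i in range(1, 6)]     # 5 languages


def simple_prompt(country, age_group, gender, landmark):
    """Build a plain caption; this template is an assumption, not the paper's."""
    return (
        f"A person ({age_group}, {gender}) from {country} "
        f"standing in front of {landmark}"
    )


def build_grid():
    """Enumerate the full cross product of the placeholder attributes."""
    return [
        {"language": lang, "prompt": simple_prompt(c, a, g, lm)}
        for c, a, g, lm, lang in product(
            COUNTRIES, AGE_GROUPS, GENDERS, LANDMARKS, LANGUAGES
        )
    ]


grid = build_grid()
print(len(grid))  # 5 * 3 * 2 * 25 * 5 = 3750 combinations
```

A multi-agent approach such as the one the abstract describes would presumably replace `simple_prompt` with a richer, persona-informed composition step. That step is left out here because the abstract does not describe its mechanics.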