arxiv_cs_cv April 20, 2026

An Empirical Study of Validating Synthetic Data for Text-Based Person Retrieval

Translated: 2026/4/20 10:49:17

text-based-person-retrieval, synthetic-data, data-synthesis, image-generation, deep-learning

Original Content

arXiv:2503.22171v2 Announce Type: replace Abstract: Data plays a pivotal role in Text-Based Person Retrieval (TBPR) research. The mainstream research paradigm requires real-world person images with manual textual annotations for training models, which raises privacy concerns and imposes an annotation burden. Several pioneering efforts explore synthetic data generation, yet they still depend on real data as a foundation and so inherit the same limitations. The feasibility of purely synthetic TBPR data remains unexplored, and there is currently no systematic study of the effectiveness boundaries of synthetic data across various real-world scenarios. In this work, we present the first comprehensive empirical study of synthetic data for TBPR, with two key aspects. (1) We propose a unified data synthesis pipeline that can operate entirely without real person data. It combines an inter-class image generation module, which produces diverse identity-centric images by means of an automatic prompt construction strategy, with an intra-class augmentation module, which enhances identity variation through text-driven image editing. (2) Leveraging this pipeline and automatic textual description generation, we explore the effectiveness of synthetic data in diverse scenarios through extensive experiments, to reveal its practical utility as either a standalone replacement for real data or a complementary augmentation to it.
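The two-module structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the attribute vocabulary (`ATTRIBUTES`), the edit instructions (`EDITS`), the prompt template, and the function names are all hypothetical, and the sketch only builds text prompts. In a real pipeline each base prompt would presumably go to a text-to-image model and each edit instruction to a text-driven image editor, neither of which the abstract names.

```python
import random
from dataclasses import dataclass

# Hypothetical attribute vocabulary; the abstract does not list the attributes used.
ATTRIBUTES = {
    "age": ["young", "middle-aged", "elderly"],
    "gender": ["man", "woman"],
    "top": ["red jacket", "white t-shirt", "grey hoodie"],
    "bottom": ["blue jeans", "black trousers", "khaki shorts"],
    "accessory": ["backpack", "handbag", "no bag"],
}

# Hypothetical edit instructions that vary context while keeping identity attributes.
EDITS = [
    "seen from behind",
    "walking on a street at night",
    "standing in a shopping mall",
]


@dataclass(frozen=True)
class Sample:
    identity_id: int  # samples sharing an id belong to the same synthetic identity
    prompt: str       # text that would be passed to a generator or editor


def build_identity_prompt(rng: random.Random) -> str:
    """Inter-class step: sample one value per attribute to describe a new identity."""
    a = {name: rng.choice(values) for name, values in ATTRIBUTES.items()}
    carrying = "" if a["accessory"] == "no bag" else f", carrying a {a['accessory']}"
    return (
        f"full-body photo of a person: {a['age']} {a['gender']}, "
        f"{a['top']}, {a['bottom']}{carrying}"
    )


def synthesize(num_identities: int, edits_per_identity: int, seed: int = 0) -> list[Sample]:
    """Return one base prompt plus `edits_per_identity` edited variants per identity."""
    rng = random.Random(seed)
    samples = []
    for identity_id in range(num_identities):
        base = build_identity_prompt(rng)
        samples.append(Sample(identity_id, base))
        # Intra-class step: same identity description, different edit instruction.
        for edit in rng.sample(EDITS, edits_per_identity):
            samples.append(Sample(identity_id, f"{base}, {edit}"))
    return samples


if __name__ == "__main__":
    for sample in synthesize(num_identities=2, edits_per_identity=2, seed=1):
        print(sample.identity_id, sample.prompt)
```

With a vocabulary this small, two identities can draw the same attribute combination. A practical version would need a much larger attribute space or explicit de-duplication to keep identities distinct.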