arxiv_cs_cv April 24, 2026

Exploring the Role of Synthetic Data Augmentation in Controllable Human-Centric Video Generation

Translated: 2026/4/24 19:42:25

synthetic-data human-video-generation diffusion-models sim2real-gap embodied-ai

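Translation (from Japanese)

arXiv:2604.21291v1 Announce Type: new Abstract: Controllable human video generation aims to produce highly realistic videos of humans with explicitly guided motion and appearance, and it underpins digital humans, animation, and embodied AI. However, the scarcity of large-scale, diverse, and privacy-conscious human video datasets is a major bottleneck, especially for rare identities and complex actions. Synthetic data offers a scalable and controllable alternative, but its practical contribution remains underexplored because of the persistent Sim2Real (simulation-to-real) gap. In this work, we systematically investigate how synthetic data affects controllable human video generation. The diffusion-based framework we propose enables fine-grained control over appearance and motion while providing a unified testbed for analyzing how synthetic data interacts with real data. Through extensive experiments, we reveal the complementary roles of synthetic and real data and show how to select synthetic samples efficiently to improve motion realism, temporal consistency, and identity preservation. Our study offers the first comprehensive exploration of the role of synthetic data in human-centric video synthesis and provides practical insights for building data-efficient, highly general generative models.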

Original Content

arXiv:2604.21291v1 Announce Type: new Abstract: Controllable human video generation aims to produce realistic videos of humans with explicitly guided motions and appearances, serving as a foundation for digital humans, animation, and embodied AI. However, the scarcity of large-scale, diverse, and privacy-safe human video datasets poses a major bottleneck, especially for rare identities and complex actions. Synthetic data provides a scalable and controllable alternative, yet its actual contribution to generative modeling remains underexplored due to the persistent Sim2Real gap. In this work, we systematically investigate the impact of synthetic data on controllable human video generation. We propose a diffusion-based framework that enables fine-grained control over appearance and motion while providing a unified testbed to analyze how synthetic data interacts with real-world data during training. Through extensive experiments, we reveal the complementary roles of synthetic and real data and demonstrate possible methods for efficiently selecting synthetic samples to enhance motion realism, temporal consistency, and identity preservation. Our study offers the first comprehensive exploration of synthetic data's role in human-centric video synthesis and provides practical insights for building data-efficient and generalizable generative models.