arxiv_cs_cv April 24, 2026

Sculpt4D: Generating 4D Shapes via Sparse-Attention Diffusion Transformers

Translated: 2026/4/24 19:45:14

diffusion-transformers, generative-ai, 3d-generative-modeling, sparse-attention, 4d-generation

Original Content

arXiv:2604.21592v1 Announce Type: new

Abstract: Recent breakthroughs in 3D generative modeling have yielded remarkable progress in static shape synthesis, yet high-fidelity dynamic 4D generation remains elusive, hindered by temporal artifacts and prohibitive computational demand. We present Sculpt4D, a native 4D generative framework that seamlessly integrates efficient temporal modeling into a pretrained 3D Diffusion Transformer (Hunyuan3D 2.1), thereby mitigating the scarcity of 4D training data. At its core lies a Block Sparse Attention mechanism that preserves object identity by anchoring to the initial frame while capturing rich motion dynamics via a time-decaying sparse mask. This design models complex spatiotemporal dependencies with high fidelity while sidestepping the quadratic overhead of full attention, reducing total network computation by 56%. Consequently, Sculpt4D establishes a new state-of-the-art in temporally coherent 4D synthesis and charts a path toward efficient and scalable 4D generation.
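To make the two ingredients of the Block Sparse Attention idea concrete, here is a minimal NumPy sketch: a frame-level mask that anchors every frame to frame 0, and a mask that thins out with temporal distance. It rests on several assumptions. Frames are treated as contiguous blocks of `T` latent tokens. The abstract does not specify the decay schedule, so the "attend at offsets 1, 2, 4, 8, ..." rule is a hypothetical stand-in for the paper's time-decaying mask. The names `frame_block_mask` and `block_sparse_attention` are illustrative, not Sculpt4D's API.

```python
import numpy as np


def frame_block_mask(num_frames: int) -> np.ndarray:
    """Return a boolean [F, F] mask; True means query frame i may attend to key frame j.

    Hypothetical rule (an assumption, not the paper's exact mask):
      * each frame attends to itself;
      * each frame attends to the anchor frame 0 (identity preservation);
      * each frame attends to frames at offsets 1, 2, 4, 8, ... in both
        directions, so connections thin out as temporal distance grows.
    """
    mask = np.zeros((num_frames, num_frames), dtype=bool)
    for i in range(num_frames):
        mask[i, i] = True  # self
        mask[i, 0] = True  # anchor to the initial frame
        offset = 1
        while offset < num_frames:
            for j in (i - offset, i + offset):
                if 0 <= j < num_frames:
                    mask[i, j] = True
            offset *= 2  # doubling offsets: fewer links at larger distances
    return mask


def block_sparse_attention(q, k, v, frame_mask):
    """Masked softmax attention over F frames of T tokens each.

    q, k, v have shape [F * T, d], ordered frame by frame. This dense reference
    still computes every score; a real block-sparse kernel would skip the
    masked (frame, frame) blocks entirely instead of filling them with -inf.
    """
    num_frames = frame_mask.shape[0]
    tokens_per_frame = q.shape[0] // num_frames
    # Expand the frame-level mask to token level: each entry becomes a T x T block.
    token_mask = np.repeat(np.repeat(frame_mask, tokens_per_frame, axis=0),
                           tokens_per_frame, axis=1)
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores = np.where(token_mask, scores, -np.inf)
    # Stable softmax; the always-True diagonal keeps each row max finite.
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v


if __name__ == "__main__":
    m = frame_block_mask(16)
    # Fraction of (frame, frame) blocks kept, relative to full attention.
    print(f"kept blocks: {m.sum()} / {m.size} ({m.mean():.1%})")
```

The saving in a real implementation comes from never computing the masked blocks. Attention cost then scales with the number of kept blocks rather than with `F * F`. The density printed here reflects only this toy mask. It is not meant to reproduce the paper's reported 56% reduction, which refers to total network computation under the authors' own mask.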