arxiv_cs_cv February 10, 2026


VideoNeuMat: Neural Material Extraction from Generative Video Models

Translated: 2026/3/15 18:02:21

videogeneration, neuralradiancefields, 3drendering, materialsciences, arxiv2025

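Translation (rendered from Japanese)

arXiv:2602.07272v1 Announce Type: new

Abstract: Creating photorealistic materials for 3D rendering demands exceptional artistic skill. Generative models for materials could help, but they are currently held back by a lack of high-quality training data. Recent video generation models readily produce realistic material appearance, yet that knowledge stays tightly entangled with geometry and lighting. We propose VideoNeuMat, a two-stage pipeline for extracting reusable neural material assets from video diffusion models. In the first stage, we finetune a large video model (Wan 2.1 14B) to generate material sample videos under controlled camera and lighting trajectories. The model learns a structured measurement pattern while keeping its material realism, which in effect yields a "virtual gonioreflectometer". In the second stage, we reconstruct compact neural materials from these videos with a Large Reconstruction Model (LRM) finetuned from the smaller Wan 1.3B backbone. Given 17 generated video frames, the LRM runs inference in a single pass and predicts neural material parameters that generalize to novel viewing and lighting conditions. The resulting materials far exceed the limited synthetic training data in realism and diversity, showing that material knowledge can be transferred from internet-scale video models into standalone, reusable 3D assets.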

Original Content

arXiv:2602.07272v1 Announce Type: new

Abstract: Creating photorealistic materials for 3D rendering requires exceptional artistic skill. Generative models for materials could help, but are currently limited by the lack of high-quality training data. While recent video generative models effortlessly produce realistic material appearances, this knowledge remains entangled with geometry and lighting. We present VideoNeuMat, a two-stage pipeline that extracts reusable neural material assets from video diffusion models. First, we finetune a large video model (Wan 2.1 14B) to generate material sample videos under controlled camera and lighting trajectories, effectively creating a "virtual gonioreflectometer" that preserves the model's material realism while learning a structured measurement pattern. Second, we reconstruct compact neural materials from these videos through a Large Reconstruction Model (LRM) finetuned from a smaller Wan 1.3B video backbone. From 17 generated video frames, our LRM performs single-pass inference to predict neural material parameters that generalize to novel viewing and lighting conditions. The resulting materials exhibit realism and diversity far exceeding the limited synthetic training data, demonstrating that material knowledge can be successfully transferred from internet-scale video models into standalone, reusable neural 3D assets.
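
To make the two ideas in the abstract concrete, here is a minimal Python sketch of (a) a 17-frame measurement schedule that pairs one view direction with one light direction per frame, and (b) a toy neural material that can be queried under any view and light direction. Only the frame count of 17 comes from the abstract. The spiral trajectories, the latent-texture-plus-MLP layout, and every name below (`measurement_schedule`, `ToyNeuralMaterial`, `shade`) are illustrative assumptions, not the paper's actual trajectories, representation, or API. The weights are random, so the output is meaningless as appearance; the sketch only shows the shape of the interface.

```python
import numpy as np

NUM_FRAMES = 17  # frame count stated in the abstract


def hemisphere_dir(theta, phi):
    """Unit vector at polar angle theta (measured from the normal +z) and azimuth phi."""
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])


def measurement_schedule(num_frames=NUM_FRAMES, max_theta=np.deg2rad(60.0)):
    """Hypothetical schedule: one (view, light) direction pair per frame.

    Assumption: the camera spirals outward from the surface normal and the
    light follows the same spiral on the opposite azimuth. The abstract does
    not specify the real trajectories.
    """
    t = np.linspace(0.0, 1.0, num_frames)
    view = np.stack([hemisphere_dir(max_theta * s, 2 * np.pi * s) for s in t])
    light = np.stack([hemisphere_dir(max_theta * s, 2 * np.pi * s + np.pi) for s in t])
    return view, light


class ToyNeuralMaterial:
    """Illustrative neural material: a latent texture plus a tiny MLP decoder.

    Assumption: this layout is a common one for neural materials, chosen here
    for illustration only. Weights are random rather than predicted by an LRM.
    """

    def __init__(self, res=8, latent_dim=4, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.latents = rng.normal(size=(res, res, latent_dim))
        self.w1 = rng.normal(scale=0.5, size=(latent_dim + 6, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(scale=0.5, size=(hidden, 3))
        self.b2 = np.zeros(3)

    def shade(self, u, v, view, light):
        """Return positive RGB values at texel (u, v) for one view/light pair."""
        x = np.concatenate([self.latents[v, u], view, light])
        h = np.maximum(x @ self.w1 + self.b1, 0.0)       # ReLU hidden layer
        return np.logaddexp(0.0, h @ self.w2 + self.b2)  # softplus keeps output > 0


if __name__ == "__main__":
    views, lights = measurement_schedule()
    material = ToyNeuralMaterial()
    rgb = np.stack([material.shade(3, 5, v, l) for v, l in zip(views, lights)])
    print(rgb.shape)  # (17, 3)
```

Because `shade` takes arbitrary unit directions, the same object can be evaluated at view and light pairs that never appear in the 17-frame schedule, which is the sense in which a reconstructed material is reusable under novel conditions.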