arxiv_cs_cv April 24, 2026

PercHead: A Perceptual Model Based on DINOv2 and SAM 2.1 for Single-Image 3D Head Reconstruction and Editing

PercHead: Perceptual Head Model for Single-Image 3D Head Reconstruction & Editing

Translated: 2026/4/24 19:50:24

3d-head-reconstruction, vision-transformers, perceptual-loss, single-image-3d, deep-learning

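Translated Abstract (from Japanese)

arXiv:2511.02777v2 Announce Type: replace
Abstract: We propose PercHead, a model for single-image 3D head reconstruction and disentangled 3D editing. Both tasks are inherently difficult because multiple plausible explanations exist for the same input. At the core of our approach is a novel perceptual loss based on DINOv2 and SAM 2.1. Unlike widely adopted low-level losses such as LPIPS, SSIM, or L1, we rely on a deep visual understanding of images and the generalized supervision signals it yields. We show that the new loss works as a drop-in replacement for standard losses and improves visual quality in high-frequency regions. Our model architecture is based on Vision Transformers (ViTs), which lets us decouple the 3D representation from the 2D input. We train the model on multi-view images for view consistency and on in-the-wild images for strong transferability to new environments. Our model achieves state-of-the-art performance in novel-view synthesis and also shows exceptional robustness to extreme viewing angles. We extend the base model to disentangled 3D editing by swapping the encoder and fine-tuning the network: a segmentation map controls geometry, while a text prompt or a reference image controls appearance. We demonstrate these intuitive and powerful 3D editing capabilities through an interactive GUI.
Project page: https://antoniooroz.github.io/PercHead
Video: https://www.youtube.com/watch?v=4hFybgTk4kE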

Original Content

arXiv:2511.02777v2 Announce Type: replace
Abstract: We present PercHead, a model for single-image 3D head reconstruction and disentangled 3D editing, two tasks that are inherently challenging due to ambiguity in plausible explanations for the same input. At the heart of our approach lies our novel perceptual loss based on DINOv2 and SAM 2.1. Unlike widely adopted low-level losses like LPIPS, SSIM, or L1, we rely on deep visual understanding of images and the resulting generalized supervision signals. We show that our new loss can be a drop-in replacement for standard losses and can be used to improve visual quality in high-frequency areas. We base our model architecture on Vision Transformers (ViTs), allowing us to decouple the 3D representation from the 2D input. We train our method on multi-view images for view-consistency and in-the-wild images for strong transferability to new environments. Our model achieves state-of-the-art performance in novel-view synthesis and, furthermore, exhibits exceptional robustness to extreme viewing angles. We also extend our base model to disentangled 3D editing by swapping the encoder and fine-tuning the network. A segmentation map controls geometry and either a text prompt or a reference image specifies appearance. We highlight the intuitive and powerful 3D editing capabilities through an interactive GUI.
Project Page: https://antoniooroz.github.io/PercHead
Video: https://www.youtube.com/watch?v=4hFybgTk4kE
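
Illustrative sketch (not from the paper): the abstract does not specify how the DINOv2 and SAM 2.1 features are combined into a loss, so the snippet below only illustrates the general idea of a feature-space perceptual loss, in which a rendered image and its target are compared through a frozen encoder rather than pixel by pixel. Everything here is an assumption made for illustration: `FrozenEncoder` is a hypothetical stand-in (a fixed random projection of image patches) for a pretrained backbone, and the cosine-distance formulation, patch size, and feature dimension are arbitrary choices rather than the authors' method.

```python
import numpy as np


def patchify(image, patch=8):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image size must be a multiple of patch"
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    grid = grid.transpose(0, 2, 1, 3, 4)  # (rows, cols, patch, patch, C)
    return grid.reshape(-1, patch * patch * c)


class FrozenEncoder:
    """Hypothetical stand-in for a frozen pretrained backbone.

    A real setup would extract patch tokens from a pretrained network;
    here a fixed random projection plays that role so the example runs
    with NumPy alone.
    """

    def __init__(self, patch=8, channels=3, dim=32, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = patch * patch * channels
        self.patch = patch
        # Weights are created once and never updated ("frozen").
        self.weight = rng.standard_normal((in_dim, dim)) / np.sqrt(in_dim)

    def __call__(self, image):
        return np.tanh(patchify(image, self.patch) @ self.weight)


def perceptual_loss(pred, target, encoder, eps=1e-8):
    """Mean cosine distance between per-patch features of two images."""
    f_pred, f_tgt = encoder(pred), encoder(target)
    num = (f_pred * f_tgt).sum(axis=1)
    den = np.linalg.norm(f_pred, axis=1) * np.linalg.norm(f_tgt, axis=1) + eps
    return float(np.mean(1.0 - num / den))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    target = rng.random((32, 32, 3))
    rendered = np.clip(target + 0.1 * rng.standard_normal(target.shape), 0.0, 1.0)
    encoder = FrozenEncoder()
    print("loss(target, target):  ", perceptual_loss(target, target, encoder))
    print("loss(rendered, target):", perceptual_loss(rendered, target, encoder))
```

Because the loss takes only a predicted image and a target image, it can be swapped in wherever an L1 or LPIPS term would be called, which is the sense of "drop-in replacement" illustrated here. In a training pipeline the encoder would have to be differentiable with respect to the predicted image; this NumPy version computes the loss value only.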