arxiv_cs_cv February 10, 2026

PAL-Net: A Point-Wise CNN with Patch-Attention for 3D Facial Landmark Localization

pal-net, facial-landmark-localization, 3d-facial-analysis, pointwise-cnn, patch-attention

Original Content

arXiv:2510.00910v2 Announce Type: replace Abstract: Manual annotation of anatomical landmarks on 3D facial scans is a time-consuming and expertise-dependent task, yet it remains critical for clinical assessments, morphometric analysis, and craniofacial research. While several deep learning methods have been proposed for facial landmark localization, most focus on pseudo-landmarks or require complex input representations, limiting their clinical applicability. This study presents a fully automated deep learning pipeline (PAL-Net) for localizing 50 anatomical landmarks on stereo-photogrammetry facial models. The method combines coarse alignment, region-of-interest filtering, and an initial approximation of landmarks with a patch-based pointwise CNN enhanced by attention mechanisms. Trained and evaluated on 214 annotated scans from healthy adults, PAL-Net achieved a mean localization error of 3.686 mm and preserved relevant anatomical distances with a 2.822 mm average error, comparable to intra-observer variability. To assess generalization, the model was further evaluated on 700 subjects from the FaceScape dataset, achieving a point-wise error of 0.41 mm and a distance-wise error of 0.38 mm. Compared to existing methods, PAL-Net offers a favorable trade-off between accuracy and computational cost. While performance degrades in regions with poor mesh quality (e.g., ears, hairline), the method demonstrates consistent accuracy across most anatomical regions. PAL-Net generalizes effectively across datasets and facial regions, outperforming existing methods in both point-wise and structural evaluations. It provides a lightweight, scalable solution for high-throughput 3D anthropometric analysis, with potential to support clinical workflows and reduce reliance on manual annotation. Source code can be found at https://github.com/Ali5hadman/PAL-Net-A-Point-Wise-CNN-with-Patch-Attention
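
The abstract reports two kinds of error: a point-wise localization error and a distance-wise error on anatomical distances. The abstract does not define either one, so the following is only a minimal sketch of how such metrics are commonly computed, not the authors' evaluation code. The function names `pointwise_error` and `distance_error` are hypothetical. The choice to compare every landmark pair by default is also an assumption; the paper may use a selected set of clinically relevant distances.

```python
import math
from itertools import combinations

def pointwise_error(pred, true):
    """Mean Euclidean distance (e.g. in mm) between corresponding landmarks."""
    if len(pred) != len(true) or not pred:
        raise ValueError("pred and true must be non-empty and of equal length")
    return sum(math.dist(p, t) for p, t in zip(pred, true)) / len(pred)

def distance_error(pred, true, pairs=None):
    """Mean absolute difference between inter-landmark distances.

    Assumption: when `pairs` is None, every landmark pair is compared.
    """
    if len(pred) != len(true):
        raise ValueError("pred and true must have equal length")
    if pairs is None:
        pairs = list(combinations(range(len(true)), 2))
    if not pairs:
        raise ValueError("at least one landmark pair is required")
    diffs = [
        abs(math.dist(pred[i], pred[j]) - math.dist(true[i], true[j]))
        for i, j in pairs
    ]
    return sum(diffs) / len(diffs)

# Toy data: three landmarks, prediction shifted rigidly by 1 unit along z.
true_landmarks = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
pred_landmarks = [(0.0, 0.0, 1.0), (3.0, 0.0, 1.0), (0.0, 4.0, 1.0)]

print(pointwise_error(pred_landmarks, true_landmarks))
print(distance_error(pred_landmarks, true_landmarks))
```

In the toy data every predicted landmark is offset by the same vector, so the point-wise error is nonzero while the inter-landmark distances are unchanged. This illustrates why a structural (distance-wise) metric can be reported alongside a point-wise one: the two capture different kinds of deviation.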