arxiv_cs_cv February 10, 2026

On Geometry-Enhanced Parameter-Efficient Fine-Tuning for 3D Scene Segmentation

Translated: 2026/3/15 6:02:02

geometric-encoding-mixer, point-cloud, peft, 3d-segmentation, computer-vision

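Translated Abstract

arXiv:2505.22444v3 Abstract: The emergence of large-scale pre-trained point cloud models has dramatically advanced 3D scene understanding, but adapting them to specific downstream tasks usually requires full fine-tuning, which incurs high computational and storage costs. Parameter-efficient fine-tuning (PEFT) techniques, which have been applied successfully in natural language processing and 2D vision tasks, can lose performance when applied naively to 3D point cloud models because of significant shifts in geometry and spatial distribution. Existing PEFT methods typically treat points as orderless tokens and ignore the local spatial structure and global geometric context that matter in 3D modeling. To bridge this gap, we propose the Geometric Encoding Mixer (GEM), a novel geometry-aware PEFT module designed specifically for 3D point cloud transformers. GEM explicitly integrates fine-grained local positional encodings and uses a lightweight latent attention mechanism to capture comprehensive global context, effectively resolving the mismatch in spatial and geometric distributions. Large-scale experiments show that GEM achieves performance comparable to, and sometimes exceeding, full fine-tuning while updating only 1.6% of the parameters, fewer than other PEFT methods. With substantially reduced training time and memory requirements, our approach sets a new benchmark for efficient, scalable, and geometry-aware fine-tuning of large-scale 3D point cloud models. Code is available at https://github.com/LiyaoTang/GEM.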
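To make the storage argument concrete, the following is a rough back-of-the-envelope sketch rather than a figure from the paper. Only the 1.6% fraction comes from the abstract; the backbone size, the number of tasks, and the helper `stored_params` are hypothetical. It also assumes that each task stores its trainable subset separately on top of one shared frozen backbone.

```python
def stored_params(backbone_params: int, num_tasks: int, trainable_fraction: float) -> int:
    """Parameters kept on disk when all tasks share one frozen backbone
    and each task stores only its own trainable subset."""
    per_task = round(backbone_params * trainable_fraction)
    return backbone_params + num_tasks * per_task


BACKBONE_PARAMS = 100_000_000  # hypothetical backbone size, not from the paper
NUM_TASKS = 10                 # hypothetical number of downstream tasks
TRAINABLE_FRACTION = 0.016     # the 1.6% figure reported in the abstract

# Full fine-tuning keeps one complete copy of the model per task.
full_ft = BACKBONE_PARAMS * NUM_TASKS
peft = stored_params(BACKBONE_PARAMS, NUM_TASKS, TRAINABLE_FRACTION)

print(f"full fine-tuning: {full_ft:,} parameters stored")
print(f"1.6% per task:    {peft:,} parameters stored")
print(f"reduction:        {full_ft / peft:.2f}x")
```

Under these assumptions the saving grows with the number of tasks, because the shared backbone is stored only once.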

Original Content

arXiv:2505.22444v3 Abstract: The emergence of large-scale pre-trained point cloud models has significantly advanced 3D scene understanding, but adapting these models to specific downstream tasks typically demands full fine-tuning, incurring high computational and storage costs. Parameter-efficient fine-tuning (PEFT) techniques, successful in natural language processing and 2D vision tasks, would underperform when naively applied to 3D point cloud models due to significant geometric and spatial distribution shifts. Existing PEFT methods commonly treat points as orderless tokens, neglecting important local spatial structures and global geometric contexts in 3D modeling. To bridge this gap, we introduce the Geometric Encoding Mixer (GEM), a novel geometry-aware PEFT module specifically designed for 3D point cloud transformers. GEM explicitly integrates fine-grained local positional encodings with a lightweight latent attention mechanism to capture comprehensive global context, thereby effectively addressing the spatial and geometric distribution mismatch. Extensive experiments demonstrate that GEM achieves performance comparable to or sometimes even exceeding full fine-tuning, while only updating 1.6% of the model's parameters, fewer than other PEFT methods. With significantly reduced training time and memory requirements, our approach thus sets a new benchmark for efficient, scalable, and geometry-aware fine-tuning of large-scale 3D point cloud models. Code is available at https://github.com/LiyaoTang/GEM.
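The abstract does not spell out how the latent attention mechanism is built, so the following is only an illustrative sketch of one common form of latent attention and should not be read as GEM's implementation. In it, a small set of M latent tokens first gathers context from all N points and then hands that context back to each point. The function names (`attend`, `latent_mixer`) and the toy inputs are hypothetical. Learned projections and the local positional encoding branch are left out for brevity.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the maximum for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(queries, keys, values):
    """Scaled dot-product attention: each query gets a weighted mean of the values."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        w = softmax([dot(q, k) * scale for k in keys])
        out.append([sum(wi * v[d] for wi, v in zip(w, values))
                    for d in range(len(values[0]))])
    return out


def latent_mixer(point_feats, latents):
    """Gather global context into M latents, then broadcast it back to N points.
    The cost is O(N*M) rather than the O(N^2) of full self-attention."""
    summary = attend(latents, point_feats, point_feats)  # M x C: latents read the points
    context = attend(point_feats, summary, summary)      # N x C: points read the latents
    # Residual add, so the (frozen) input features pass through unchanged.
    return [[f + c for f, c in zip(pf, ctx)]
            for pf, ctx in zip(point_feats, context)]


points = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]  # 4 points, 2 channels
latents = [[0.5, 0.5], [-0.5, 0.5]]                        # 2 latent tokens (learned in practice)
mixed = latent_mixer(points, latents)
print(len(mixed), len(mixed[0]))  # 4 2
```

Because M is fixed and small, a module of this shape adds few parameters and little compute per point, which is the general appeal of latent attention in a parameter-efficient setting.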