arxiv_cs_cv April 24, 2026

Efficient Multi-Source Knowledge Transfer by Model Merging

Translated: 2026/4/24 19:52:52

model-merging, multi-source-learning, singular-value-decomposition, transfer-learning, fine-tuning

Original Content

arXiv:2508.19353v2 Announce Type: replace-cross

Abstract: While transfer learning is an effective strategy, it often overlooks the opportunity to leverage knowledge from numerous available models online. Addressing this multi-source transfer learning problem is a promising path to boost adaptability and cut re-training costs. However, existing methods remain inherently coarse-grained: they lack the precision needed for fine-grained knowledge extraction as well as the scalability required to aggregate knowledge from either large numbers of source models or models with high parameter counts. We address these limitations by leveraging Singular Value Decomposition (SVD) to first decompose each source model into its elementary, rank-one components. A subsequent aggregation stage then selects only the most salient components from all sources, thereby overcoming the previous efficiency and precision limitations. To best preserve and leverage the synthesized knowledge base, our method adapts to the target task by fine-tuning only the principal singular values of the merged matrix. In essence, this process recalibrates the importance of top SVD components. The proposed framework allows for efficient and scalable multi-source transfer learning in both vision and language domains, while remaining robust to perturbations in both the input space and the parameter space.
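
The three stages named in the abstract (rank-one decomposition, selection of salient components, and fine-tuning of singular values only) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Several choices here are assumptions made for illustration: each source model is a single weight matrix, saliency is measured by raw singular value magnitude, the merged matrix is re-decomposed with a second SVD before adaptation, and the target task is a toy least-squares fit. The function names `rank_one_components`, `merge_sources`, and `finetune_singular_values` are hypothetical.

```python
import numpy as np

def rank_one_components(W):
    """Split W into (sigma, u, v) triples with W == sum(sigma * outer(u, v))."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return [(s[j], U[:, j], Vt[j]) for j in range(len(s))]

def merge_sources(sources, k):
    """Pool rank-one components from all sources and keep the k largest.

    Assumption: saliency is the singular value itself; the abstract does
    not state the selection criterion.
    """
    pool = [c for W in sources for c in rank_one_components(W)]
    pool.sort(key=lambda c: c[0], reverse=True)
    return sum(sigma * np.outer(u, v) for sigma, u, v in pool[:k])

def finetune_singular_values(W, X, Y, r, lr=0.1, steps=200):
    """Fit Y ~ X @ W.T by updating only the top-r singular values of W.

    U and Vt stay frozen; only the vector s is trained (plain gradient
    descent on a mean squared error, chosen here for illustration).
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U, s, Vt = U[:, :r], s[:r].copy(), Vt[:r]
    losses = []
    for _ in range(steps):
        W_cur = (U * s) @ Vt                    # U @ diag(s) @ Vt
        err = X @ W_cur.T - Y                   # residuals, shape (n, out)
        losses.append(0.5 * np.mean(np.sum(err ** 2, axis=1)))
        G = err.T @ X / len(X)                  # dLoss/dW, shape (out, in)
        grad_s = np.einsum("ik,ij,kj->k", U, G, Vt)  # u_k^T G v_k for each k
        s -= lr * grad_s
    return U, s, Vt, losses

# Toy usage: three random "source models" and a random linear target task.
rng = np.random.default_rng(0)
sources = [rng.standard_normal((4, 6)) for _ in range(3)]
merged = merge_sources(sources, k=3)
X = rng.standard_normal((256, 6))
Y = X @ rng.standard_normal((4, 6)).T
U, s, Vt, losses = finetune_singular_values(merged, X, Y, r=3)
```

Because the singular vectors are frozen, the sketch trains only `r` scalars per matrix instead of the full weight matrix, which is one way to read the abstract's description of recalibrating the importance of the top SVD components.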