arxiv_cs_ai April 24, 2026

HyperAdapt: Simple High-Rank Adaptation

Translated: 2026/4/24 20:32:19

hyperadapt, peft, fine-tuning, foundation-models, llm-optimization

Original Content

arXiv:2509.18629v3 Announce Type: replace-cross

Abstract: Foundation models excel across diverse tasks, but adapting them to specialized applications often requires fine-tuning, an approach that is memory- and compute-intensive. Parameter-efficient fine-tuning (PEFT) methods mitigate this by updating only a small subset of weights. In this paper, we introduce HyperAdapt, a parameter-efficient fine-tuning method that significantly reduces the number of trainable parameters compared to state-of-the-art methods like LoRA. Specifically, HyperAdapt adapts a pre-trained weight matrix by applying row- and column-wise scaling through diagonal matrices, thereby inducing a high-rank update while requiring only $n+m$ trainable parameters for an $n \times m$ matrix. Theoretically, we establish an upper bound on the rank of HyperAdapt's updates, and empirically, we confirm that it consistently induces high-rank transformations across model layers. Experiments on GLUE, arithmetic reasoning, and commonsense reasoning benchmarks with models up to 14B parameters demonstrate that HyperAdapt matches or nearly matches the performance of full fine-tuning and state-of-the-art PEFT methods while using orders of magnitude fewer trainable parameters.
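The row- and column-wise scaling described in the abstract can be illustrated with a small sketch. It assumes the adapted weight has the form $W' = A W B$ with $A = \mathrm{diag}(a)$ of size $n \times n$ and $B = \mathrm{diag}(b)$ of size $m \times m$, which is one reading of the abstract and may differ from the paper's exact parameterization or initialization. The function names (`scale_rows_cols`, `update`, `rank`, `trainable_params`, `lora_params`) are hypothetical and chosen for illustration only. The LoRA count $r(n+m)$ is the standard figure for a rank-$r$ adapter on an $n \times m$ matrix. The sketch uses plain Python with exact rational arithmetic so that the rank computation is not affected by floating-point tolerance.

```python
from fractions import Fraction


def scale_rows_cols(W, a, b):
    """Return diag(a) @ W @ diag(b); entry (i, j) is a[i] * W[i][j] * b[j]."""
    return [[a[i] * W[i][j] * b[j] for j in range(len(b))] for i in range(len(a))]


def update(W, a, b):
    """Return the induced update Delta W = diag(a) @ W @ diag(b) - W."""
    scaled = scale_rows_cols(W, a, b)
    return [[s - w for s, w in zip(srow, wrow)] for srow, wrow in zip(scaled, W)]


def rank(M):
    """Exact matrix rank via Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0  # number of pivots found so far
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def trainable_params(n, m):
    """Scaling vectors a (length n) and b (length m): n + m parameters."""
    return n + m


def lora_params(n, m, r):
    """Rank-r LoRA factors of shapes (n, r) and (r, m): r * (n + m) parameters."""
    return r * (n + m)


if __name__ == "__main__":
    # Illustrative values only (assumption): a 3 x 4 weight and arbitrary scales.
    W = [[2, -1, 0, 3], [1, 4, -2, 0], [0, 5, 1, -3]]
    a = [2, 3, 5]
    b = [1, 2, 3, 4]
    delta = update(W, a, b)
    print("rank of W:", rank(W))
    print("rank of Delta W:", rank(delta))
    print("scaling parameters:", trainable_params(3, 4))
    print("rank-8 LoRA parameters at 4096 x 4096:", lora_params(4096, 4096, 8))
```

As a hand-checkable case, take $W$ as the $4 \times 4$ identity with $a = (2, 3, 4, 5)$ and $b = (1, 1, 1, 1)$: the update is $\mathrm{diag}(1, 2, 3, 4)$, which has full rank 4 while using only 8 scaling parameters. In general, rank subadditivity gives $\mathrm{rank}(\Delta W) \le \min(n, m, 2\,\mathrm{rank}(W))$ for this form; this is a generic bound and not necessarily the one the paper proves.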