arxiv_cs_lg, February 10, 2026

Convex Dominance in Deep Learning I: A Scaling Law of Loss and Learning Rate

deep-learning, optimization, scaling-laws, loss-landscape, convex-optimization

Original Content

arXiv:2602.07145v1
Announce Type: new
Abstract: Deep learning has a non-convex loss landscape, and its optimization dynamics are hard to analyze or control. Nevertheless, the dynamics can be empirically convex-like across various tasks, models, optimizers, hyperparameters, etc. In this work, we examine the applicability of convexity and Lipschitz continuity in deep learning, in order to precisely control the loss dynamics via learning rate schedules. We illustrate that deep learning quickly becomes weakly convex after a short period of training, and that the loss is predictable by an upper bound on the last iterate, which further informs the scaling of the optimal learning rate. Through the lens of convexity, we build scaling laws of learning rates and losses that extrapolate as much as 80X across training horizons and 70X across model sizes.
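To make concrete how a convex upper bound can inform learning-rate scaling, here is a minimal sketch. It is not the paper's last-iterate bound, which the abstract does not state. It uses the textbook bound for constant-step subgradient descent on a convex, G-Lipschitz objective, which holds for the averaged iterate: f(x_avg) - f* <= D^2 / (2 * lr * T) + lr * G^2 / 2. The function names and the values of D (initial distance to the optimum) and G (Lipschitz constant) are hypothetical, chosen only for illustration.

```python
import math


def convex_bound(lr, steps, dist, lipschitz):
    """Textbook suboptimality bound for the averaged iterate of
    constant-step subgradient descent on a convex, Lipschitz objective.
    (Illustrative stand-in; not the bound derived in the paper.)"""
    return dist**2 / (2 * lr * steps) + lr * lipschitz**2 / 2


def optimal_lr(steps, dist, lipschitz):
    """Learning rate that minimizes convex_bound for a fixed horizon."""
    return dist / (lipschitz * math.sqrt(steps))


if __name__ == "__main__":
    D, G = 1.0, 1.0  # hypothetical problem constants
    for T in (1_000, 80_000):  # an 80x longer training horizon
        lr = optimal_lr(T, D, G)
        print(f"T={T:>6}  lr*={lr:.5f}  bound={convex_bound(lr, T, D, G):.5f}")
```

Under this bound the optimal learning rate shrinks as 1/sqrt(T) and the bound itself falls as D*G/sqrt(T), so both can be extrapolated from a short horizon to a longer one. The exponents in the paper's scaling laws may differ.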