arxiv_cs_lg February 10, 2026

Incorruptible Neural Networks: Training Models that can Generalize to Large Internal Perturbations

Translated: 2026/3/15 13:06:11

neural-networks, machine-learning, robustness, optimization, generalization

Original Content

arXiv:2602.07320v1 Announce Type: new Abstract: Flat regions of the neural network loss landscape have long been hypothesized to correlate with better generalization properties. A closely related but distinct problem is training models that are robust to internal perturbations to their weights, which may be an important need for future low-power hardware platforms. In this paper, we explore the usage of two methods, sharpness-aware minimization (SAM) and random-weight perturbation (RWP), to find minima robust to a variety of random corruptions to weights. We consider the problem from two angles: generalization (how do we reduce the noise-robust generalization gap) and optimization (how do we maximize performance from optimizers when subject to strong perturbations). First, we establish, both theoretically and empirically, that an over-regularized RWP training objective is optimal for noise-robust generalization. For small-magnitude noise, we find that SAM's adversarial objective further improves performance over any RWP configuration, but performs poorly for large-magnitude noise. We link the cause of this to a vanishing-gradient effect, caused by unevenness in the loss landscape, affecting both SAM and RWP. Lastly, we demonstrate that dynamically adjusting the perturbation strength to match the evolution of the loss landscape improves optimization for these perturbed objectives.
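The abstract names SAM and RWP without stating their update rules. As a rough illustration only, the sketch below assumes the commonly used formulations: SAM evaluates the gradient at the first-order worst-case point w + rho * g / ||g||, and RWP averages gradients at weights perturbed by isotropic Gaussian noise of scale sigma. The toy logistic-regression loss, the Gaussian noise model, and the names `rwp_grad`, `sam_grad`, `noisy_loss`, `sigma`, `rho`, and `n_samples` are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def loss(w, X, y):
    """Logistic loss for labels y in {-1, +1}; a toy stand-in for a network loss."""
    margins = y * (X @ w)
    return float(np.mean(np.logaddexp(0.0, -margins)))

def grad(w, X, y):
    """Analytic gradient of `loss` with respect to w."""
    margins = y * (X @ w)
    s = 0.5 * (1.0 - np.tanh(0.5 * margins))  # numerically stable sigmoid(-margins)
    return -(X.T @ (y * s)) / len(y)

def rwp_grad(w, X, y, sigma, rng, n_samples=4):
    """RWP (assumed form): average the gradient at w + sigma * eps, eps ~ N(0, I)."""
    g = np.zeros_like(w)
    for _ in range(n_samples):
        eps = rng.standard_normal(w.shape)
        g += grad(w + sigma * eps, X, y)
    return g / n_samples

def sam_grad(w, X, y, rho):
    """SAM (assumed form): gradient at the first-order worst case within radius rho."""
    g = grad(w, X, y)
    norm = np.linalg.norm(g)
    if norm == 0.0:
        return g  # no ascent direction at a stationary point
    return grad(w + rho * g / norm, X, y)

def noisy_loss(w, X, y, sigma, rng, n_samples=256):
    """Monte Carlo estimate of the expected loss under Gaussian weight noise."""
    draws = [loss(w + sigma * rng.standard_normal(w.shape), X, y)
             for _ in range(n_samples)]
    return float(np.mean(draws))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 10))
    y = np.sign(X @ rng.standard_normal(10))  # linearly separable toy labels
    steps, lr, sigma, rho = 300, 0.5, 0.5, 0.5
    variants = {
        "plain": lambda w: grad(w, X, y),
        "rwp": lambda w: rwp_grad(w, X, y, sigma, rng),
        "sam": lambda w: sam_grad(w, X, y, rho),
    }
    for name, step_grad in variants.items():
        w = np.zeros(10)
        for _ in range(steps):
            w -= lr * step_grad(w)  # plain gradient descent on the chosen objective
        print(name, "clean:", loss(w, X, y), "noisy:", noisy_loss(w, X, y, sigma, rng))
```

Both perturbation strengths are held fixed here. The dynamic adjustment of perturbation strength mentioned in the abstract is not sketched, since the abstract does not describe the schedule.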