arxiv_cs_lg February 10, 2026

Noise Stability of Transformer Models

Translated: 2026/3/15 15:04:00

transformer, noise-stability, deep-learning, generalization, interpretability

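Japanese Translation (rendered in English)

arXiv:2602.08287v1 Announce Type: new Abstract: Understanding simplicity bias in deep learning offers a promising path toward developing reliable AI. One common metric for this is average sensitivity, which is inspired by Boolean function analysis and captures a model's robustness to single-token perturbations. We argue that average sensitivity has two main limitations: it lacks a natural generalization to real-valued domains, and it cannot explain the "junta-like" input dependence observed empirically in modern LLMs (large language models). To address these limitations, we propose noise stability as a more comprehensive simplicity metric. Noise stability expresses a model's robustness to correlated noise applied to all input coordinates simultaneously. We provide a theoretical analysis of noise stability for single-layer attention and ReLU MLP layers, and we tackle the multi-layer propagation problem with a covariance interval propagation approach. Building on this theory, we develop a practical noise stability regularization method. Experiments on algorithmic and next-token-prediction tasks show that our regularizer consistently promotes grokking (the sudden onset of understanding) and accelerates training by approximately $35\%$ and $75\%$ respectively. Our results form a new connection between signal propagation in neural networks and interpretability, with noise stability emerging as a powerful tool for understanding and improving modern Transformers.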

Original Content

arXiv:2602.08287v1 Announce Type: new Abstract: Understanding simplicity biases in deep learning offers a promising path toward developing reliable AI. A common metric for this, inspired by Boolean function analysis, is average sensitivity, which captures a model's robustness to single-token perturbations. We argue that average sensitivity has two key limitations: it lacks a natural generalization to real-valued domains and fails to explain the "junta-like" input dependence we empirically observe in modern LLMs. To address these limitations, we propose noise stability as a more comprehensive simplicity metric. Noise stability expresses a model's robustness to correlated noise applied to all input coordinates simultaneously. We provide a theoretical analysis of noise stability for single-layer attention and ReLU MLP layers and tackle the multi-layer propagation problem with a covariance interval propagation approach. Building on this theory, we develop a practical noise stability regularization method. Experiments on algorithmic and next-token-prediction tasks show that our regularizer consistently catalyzes grokking and accelerates training by approximately $35\%$ and $75\%$ respectively. Our results sculpt a new connection between signal propagation in neural networks and interpretability, with noise stability emerging as a powerful tool for understanding and improving modern Transformers.
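
The abstract describes noise stability as robustness to correlated noise applied to all input coordinates at once, but it does not give the paper's formal definition. As a rough illustration only, the sketch below uses the standard Gaussian notion from Boolean function analysis: draw x from a standard normal, form a rho-correlated copy y = rho * x + sqrt(1 - rho^2) * z, and estimate E[f(x) f(y)] by Monte Carlo. The function names (`gaussian_noise_stability`, `junta`, `parity_like`), the choice of Gaussian inputs, and the toy functions are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def gaussian_noise_stability(f, n, rho, num_samples=100_000, seed=0):
    """Monte Carlo estimate of E[f(x) * f(y)] for rho-correlated Gaussians.

    Assumption: this is the textbook Gaussian noise stability, which may
    differ from the definition used in the paper.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((num_samples, n))
    z = rng.standard_normal((num_samples, n))
    # Every coordinate of y is perturbed simultaneously, and each y_i has
    # correlation rho with the matching x_i.
    y = rho * x + np.sqrt(1.0 - rho**2) * z
    return float(np.mean(f(x) * f(y)))


def junta(x):
    """Hypothetical 'junta-like' function: depends on one coordinate only."""
    return np.sign(x[:, 0])


def parity_like(x):
    """Hypothetical highly sensitive function: product of all coordinate signs."""
    return np.prod(np.sign(x), axis=1)


if __name__ == "__main__":
    n, rho = 8, 0.9
    print(f"junta:       {gaussian_noise_stability(junta, n, rho):.3f}")
    print(f"parity-like: {gaussian_noise_stability(parity_like, n, rho):.3f}")
```

For the sign of a single Gaussian coordinate, the exact value is (2/pi) * arcsin(rho) by Sheppard's formula, and for the product of n independent signs it is that quantity raised to the n-th power. The one-coordinate function therefore stays highly stable under the noise, while the parity-like function decorrelates quickly as n grows. This is the sense in which a function that depends on few coordinates counts as "simple" under this kind of metric.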