arxiv_cs_lg 2026年2月10日

線形次元における確率的凸最適化におけるすべての経験リスク最小化器は失敗する：線形次元における下界

All ERMs Can Fail in Stochastic Convex Optimization Lower Bounds in Linear Dimension

Translated: 2026/3/15 16:05:42

convex-optimizationoverfittingstochastic-convex-optimizationgradient-descentsample-complexity

Japanese Translation

arXiv:2602.08350v1 発表タイプ：新規要旨: 我々は、確率的凸最適化の setting において、最も良いケースの経験リスク最小化器 (ERM) のサンプル複雑性を研究した。我々は、サンプルサイズが次元に比例し、学習が可能であるにもかかわらず、経験リスク最小化器が一意になり、過剰適合する可能性があるようなインスタンスが存在することを示した。これは Feldman の残された質問を解決した。我々はまた、近似 ERM にもこの結果を拡張した。我々の構築に基づき、我々はさらに、時間数と学習率がサンプルサイズに対して増大する際に、(制約付き) 勾配降下法が過剰適合する可能性があることを示した。具体的には、勾配降下法について $\ heta\left(\\sqrt{\\\eta T/m^{1.5}}\ ight)$ の新しい一般化下界を提供し、ここで $\\eta$ は学習率、$T$ は時間数、$m$ はサンプルサイズを表す。これにより、最良の既知の上限 $\\theta(\\\eta T/m)$$$ と既存の以前の構築から得られた下界とのギャップを指数関数的に狭め、明確にする。

Original Content

arXiv:2602.08350v1 Announce Type: new Abstract: We study the sample complexity of the best-case Empirical Risk Minimizer in the setting of stochastic convex optimization. We show that there exists an instance in which the sample size is linear in the dimension, learning is possible, but the Empirical Risk Minimizer is likely to be unique and to overfit. This resolves an open question by Feldman. We also extend this to approximate ERMs. Building on our construction we also show that (constrained) Gradient Descent potentially overfits when horizon and learning rate grow w.r.t sample size. Specifically we provide a novel generalization lower bound of $\Omega\left(\sqrt{\eta T/m^{1.5}}\right)$ for Gradient Descent, where $\eta$ is the learning rate, $T$ is the horizon and $m$ is the sample size. This narrows down, exponentially, the gap between the best known upper bound of $O(\eta T/m)$ and existing lower bounds from previous constructions.