arxiv_cs_ai April 24, 2026

Geometric Monomial (GEM): a family of C^{2N}-smooth activation functions with ReLU-like performance based on purely rational arithmetic

Geometric Monomial (GEM): a family of rational 2N-differentiable activation functions

Translated: 2026/4/24 20:28:08

activation-functions, neural-networks, deep-learning, optimization, machine-learning

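Translated Abstract

arXiv:2604.21677v1 Announce Type: cross

Abstract: The choice of activation function plays a decisive role in the optimization and performance of deep neural networks. The Rectified Linear Unit (ReLU) remains dominant because of its simplicity and effectiveness, but its lack of smoothness can hinder gradient-based optimization in deep architectures. This work proposes a family of C^{2N}-smooth activation functions whose gate follows a log-logistic CDF and which achieve ReLU-like performance using purely rational arithmetic. Three variants are introduced: GEM (the base family), E-GEM (an epsilon-parameterized generalization that allows arbitrary L^p-approximation of ReLU), and SE-GEM (a piecewise variant that eliminates dead neurons while keeping C^{2N} smoothness at the junction).

An ablation study over N establishes N=1 as optimal for standard-depth networks, reducing the GELU deficit on CIFAR-100 + ResNet-56 from 6.10% to 2.12%. The smoothness parameter N also reveals a CNN-transformer tradeoff: N=1 is preferred for deep CNNs and N=2 for transformers. On MNIST, E-GEM ties the best baseline (99.23%). On CIFAR-10 + ResNet-56, SE-GEM (epsilon=10^{-4}) surpasses GELU (92.51% vs 92.44%), making it the first GEM-family activation to outperform GELU. On CIFAR-100 + ResNet-56, E-GEM reduces the GELU deficit from 6.10% (GEM N=2) to just 0.62%. On GPT-2 (124M), GEM achieves the lowest perplexity (72.57 vs 73.76 for GELU), and GEM N=1 also beats GELU (73.32). On BERT-small, E-GEM (epsilon=10) achieves the best validation loss (6.656) of all activations tested.

The epsilon parameterization reveals a scale-dependent optimum: small epsilon (10^{-4} to 10^{-6}) suits deep CNNs and larger transformers, whereas small transformers (BERT-small), with their limited depth and unconstrained gradients, benefit from large epsilon (epsilon=10).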

Original Content

arXiv:2604.21677v1 Announce Type: cross

Abstract: The choice of activation function plays a crucial role in the optimization and performance of deep neural networks. While the Rectified Linear Unit (ReLU) remains the dominant choice due to its simplicity and effectiveness, its lack of smoothness may hinder gradient-based optimization in deep architectures. In this work we propose a family of $C^{2N}$-smooth activation functions whose gate follows a log-logistic CDF, achieving ReLU-like performance with purely rational arithmetic. We introduce three variants: GEM (the base family), E-GEM (an $\epsilon$-parameterized generalization enabling arbitrary $L^p$-approximation of ReLU), and SE-GEM (a piecewise variant eliminating dead neurons with $C^{2N}$ junction smoothness). An $N$-ablation study establishes $N=1$ as optimal for standard-depth networks, reducing the GELU deficit on CIFAR-100 + ResNet-56 from 6.10% to 2.12%. The smoothness parameter $N$ further reveals a CNN-transformer tradeoff: $N=1$ is preferred for deep CNNs, while $N=2$ is preferred for transformers. On MNIST, E-GEM ties the best baseline (99.23%). On CIFAR-10 + ResNet-56, SE-GEM ($\epsilon=10^{-4}$) surpasses GELU (92.51% vs 92.44%) -- the first GEM-family activation to outperform GELU. On CIFAR-100 + ResNet-56, E-GEM reduces the GELU deficit from 6.10% (GEM $N=2$) to just 0.62%. On GPT-2 (124M), GEM achieves the lowest perplexity (72.57 vs 73.76 for GELU), with GEM $N=1$ also beating GELU (73.32). On BERT-small, E-GEM ($\epsilon=10$) achieves the best validation loss (6.656) across all activations. The $\epsilon$-parameterization reveals a scale-dependent optimum: small $\epsilon$ ($10^{-4}$--$10^{-6}$) for deep CNNs and larger transformers, with the special case of small transformers (BERT-small) benefiting from large $\epsilon$ ($\epsilon=10$) due to its limited depth and unconstrained gradients.
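
The abstract does not state the GEM formula, so the Python sketch below is only one possible reading of "a gate that follows a log-logistic CDF". It multiplies the input by the log-logistic CDF F(x) = x^beta / (alpha^beta + x^beta) for x > 0 (and 0 otherwise). The function names, the scale alpha = 1, and the shape beta = 2N are illustrative assumptions rather than the paper's definitions, and E-GEM and SE-GEM are not modelled. Under these assumptions the positive branch is x^{2N+1} / (1 + x^{2N}). Near zero it behaves like x^{2N+1}, so its derivatives up to order 2N vanish there and match the zero branch. For large x it approaches x. Only additions, multiplications, and one division are needed.

```python
from fractions import Fraction


def log_logistic_cdf(x, beta, alpha=1):
    """Log-logistic CDF: x^beta / (alpha^beta + x^beta) for x > 0, else 0."""
    if x <= 0:
        return 0
    xb = x ** beta
    return xb / (alpha ** beta + xb)


def gated_relu_sketch(x, n=1):
    """Hypothetical gated activation x * F(x) with assumed shape beta = 2n.

    This is an illustration of a log-logistic gate, not the paper's GEM.
    """
    return x * log_logistic_cdf(x, beta=2 * n)


# Exact rational inputs give exact rational outputs.
print(gated_relu_sketch(Fraction(2), n=1))   # 8/5
print(gated_relu_sketch(Fraction(-3), n=1))  # 0

# For large inputs the sketch stays close to ReLU(x) = x.
print(abs(gated_relu_sketch(100.0, n=1) - 100.0) < 0.02)  # True
```

Using `Fraction` inputs makes the "purely rational arithmetic" property visible: no exponential or error function is evaluated, in contrast to GELU.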