Large-width asymptotics and training dynamics of α-Stable ReLU neural networks

Favaro, Stefano; Fortini, Sandra; Peluchetti, Stefano
2024

Abstract

Large-width asymptotic properties of neural networks (NNs) with Gaussian distributed weights have been extensively investigated in the literature, with major results characterizing their large-width asymptotic behavior in terms of Gaussian processes and their large-width training dynamics in terms of the neural tangent kernel (NTK). In this paper, we study large-width asymptotics and training dynamics of α-Stable ReLU-NNs, namely NNs with ReLU activation function and α-Stable distributed weights, with α ∈ (0, 2). For α ∈ (0, 2], α-Stable distributions form a broad class of heavy-tailed distributions, with the special case α = 2 corresponding to the Gaussian distribution. Firstly, we show that, as the NN's width goes to infinity, a rescaled α-Stable ReLU-NN converges weakly (in distribution) to an α-Stable process, which generalizes the Gaussian process. In contrast to the Gaussian setting, our result shows that the activation function affects the scaling of the α-Stable NN; more precisely, to achieve the infinite-width α-Stable process, the ReLU activation requires an additional logarithmic term in the scaling relative to sub-linear activations. Secondly, we characterize the large-width training dynamics of α-Stable ReLU-NNs in terms of an infinite-width random kernel, referred to as the α-Stable NTK, and we show that gradient descent achieves zero training error at a linear rate, for a sufficiently large width, with high probability. Unlike the NTK arising in the Gaussian setting, the α-Stable NTK is a random kernel; more precisely, the randomness of the α-Stable ReLU-NN at initialization does not vanish in the large-width training dynamics.
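To make the objects in the abstract concrete, the sketch below samples a one-hidden-layer ReLU network with symmetric α-Stable weights at initialization, using SciPy's levy_stable. The (n log n)^(-1/α) rescaling is our hedged, illustrative reading of the "additional logarithmic term" mentioned above, not the paper's exact statement; function names and constants here are ours, chosen for illustration only.

```python
import numpy as np
from scipy.stats import levy_stable

# Minimal sketch: one-hidden-layer alpha-Stable ReLU network at initialization.
# The (n * log n)**(-1/alpha) rescaling is an illustrative reading of the
# abstract's "additional logarithmic term"; the paper's exact normalization
# and constants may differ.

def stable_relu_nn(x, n, alpha, seed=0):
    """Rescaled output of a width-n ReLU network with symmetric
    alpha-Stable weights (beta=0), evaluated at scalar input x."""
    rng = np.random.default_rng(seed)
    # beta=0 gives the symmetric alpha-Stable distribution
    w1 = levy_stable.rvs(alpha, 0.0, size=n, random_state=rng)  # input -> hidden
    b1 = levy_stable.rvs(alpha, 0.0, size=n, random_state=rng)  # hidden biases
    w2 = levy_stable.rvs(alpha, 0.0, size=n, random_state=rng)  # hidden -> output
    hidden = np.maximum(w1 * x + b1, 0.0)       # ReLU activation
    scale = (n * np.log(n)) ** (-1.0 / alpha)   # assumed rescaling (see lead-in)
    return scale * np.sum(w2 * hidden)

# As the width n grows, the rescaled output should remain non-degenerate and
# heavy-tailed, consistent with the alpha-Stable infinite-width limit above.
samples = [stable_relu_nn(1.0, n=5000, alpha=1.5, seed=s) for s in range(200)]
print(np.percentile(np.abs(samples), [50, 90, 99]))
```

Empirically, the gap between the median and the upper percentiles printed above is much wider than it would be for Gaussian weights, reflecting the heavy tails that persist in the infinite-width limit.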
Files in this record:

2651_Large_width_asymptotics_a.pdf (open access)

Description: Article
Type: Publisher's PDF (publisher's layout)
License: Creative Commons
Size: 5.48 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11565/4069436