
Neural networks: from the perceptron to deep nets

Lucibello, Carlo; Zecchina, Riccardo
2023

Abstract

Artificial neural networks have been studied through the prism of statistical mechanics as disordered systems since the 1980s, starting from the simple models of Hopfield's associative memory and the single-neuron perceptron classifier. Assuming that the data are generated by a teacher model, asymptotic generalisation predictions were originally derived using the replica method, and the online learning dynamics have been described in the large-system limit. In this chapter, we review the key original ideas of this literature along with their heritage in the ongoing quest to understand the efficiency of modern deep learning algorithms. One goal of current and future research is to characterize the bias of learning algorithms toward well-generalising minima in complex overparametrized loss landscapes with many solutions perfectly interpolating the training data. Works on perceptrons, two-layer committee machines and kernel-like learning machines shed light on the benefits of overparametrization. Another goal is to understand the advantage of depth, as models now commonly feature tens or hundreds of layers. While replica computations apparently fall short of describing learning in general deep neural networks, studies of simplified linear or untrained models, as well as the derivation of scaling laws, provide first elements of an answer.
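The teacher-student setup mentioned in the abstract can be sketched as a minimal numerical experiment: a teacher perceptron labels random inputs, a student perceptron is trained on those labels, and the generalisation error is estimated from the teacher-student overlap. All parameter values and variable names below are illustrative, not taken from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 200              # input dimension
alpha = 3.0          # ratio of training examples to dimension
p = int(alpha * n)

# Teacher perceptron: labels are the sign of a random linear function.
w_teacher = rng.standard_normal(n)
X = rng.standard_normal((p, n))
y = np.sign(X @ w_teacher)

# Student perceptron trained with the classic mistake-driven perceptron rule.
w_student = np.zeros(n)
for _ in range(100):                       # epochs
    for x, t in zip(X, y):
        if np.sign(x @ w_student) != t:    # update only on misclassified examples
            w_student += t * x

# In the large-n limit the generalisation error of a perceptron student
# is arccos(overlap) / pi, where overlap is the teacher-student cosine.
overlap = (w_teacher @ w_student) / (
    np.linalg.norm(w_teacher) * np.linalg.norm(w_student)
)
eps = np.arccos(overlap) / np.pi
print(f"teacher-student overlap: {overlap:.3f}, generalisation error: {eps:.3f}")
```

The data are linearly separable by construction, so the perceptron rule converges; the overlap grows (and the generalisation error shrinks) as alpha increases, which is the qualitative behaviour the replica analyses of this literature quantify exactly.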
ISBN: 9789811273919; 9789811273926
Charbonneau, Patrick; Marinari, Enzo; Mézard, Marc; Parisi, Giorgio; Ricci-Tersenghi, Federico; Sicuro, Gabriele; Zamponi, Francesco
Spin glass theory and far beyond: replica symmetry breaking after 40 years
Gabrié, Marylou; Ganguli, Surya; Lucibello, Carlo; Zecchina, Riccardo
Files in this item:
There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11565/4061099
Citations
  • Scopus: 0