IRIS Università Commerciale Luigi Bocconihttps://iris.unibocconi.itIl sistema di repository digitale IRIS acquisisce, archivia, indicizza, conserva e rende accessibili prodotti digitali della ricerca.Mon, 17 Jan 2022 06:27:46 GMT2022-01-17T06:27:46Z10131Properties of the geometry of solutions and capacity of multilayer neural networks with rectified linear unit activationshttp://hdl.handle.net/11565/4022084Titolo: Properties of the geometry of solutions and capacity of multilayer neural networks with rectified linear unit activations
Abstract: Rectified linear units (ReLUs) have become the main model for the neural units in current deep learning systems. This choice was originally suggested as a way to compensate for the so-called vanishing gradient problem which can undercut stochastic gradient descent learning in networks composed of multiple layers. Here we provide analytical results on the effects of ReLUs on the capacity and on the geometrical landscape of the solution space in two-layer neural networks with either binary or real-valued weights. We study the problem of storing an extensive number of random patterns and find that, quite unexpectedly, the capacity of the network remains finite as the number of neurons in the hidden layer increases, at odds with the case of threshold units in which the capacity diverges. Possibly more important, a large deviation approach allows us to find that the geometrical landscape of the solution space has a peculiar structure: While the majority of solutions are close in distance but still isolated, there exist rare regions of solutions which are much more dense than the similar ones in the case of threshold units. These solutions are robust to perturbations of the weights and can tolerate large perturbations of the inputs. The analytical results are corroborated by numerical findings.
Tue, 01 Jan 2019 00:00:00 GMThttp://hdl.handle.net/11565/40220842019-01-01T00:00:00ZPlastic number and possible optimal solutions for an Euclidean 2-matching in one dimensionhttp://hdl.handle.net/11565/4029514Titolo: Plastic number and possible optimal solutions for an Euclidean 2-matching in one dimension
Abstract: In this work we consider the problem of finding the minimumweight loop cover of an undirected graph. This combinatorial optimization problem is called 2-matching and can be seen as a relaxation of the traveling salesman problem since one does not have a unique loop condition. We consider this problem both on the complete bipartite and complete graph embedded in a one dimensional (1D) interval, the weights are chosen as a convex function of the Euclidean distance between each couple of points. Randomness is introduced throwing the points in space independently and uniformly. We derive the average optimal cost in the limit of a large number of points. We prove that the possible solutions are characterized by the presence of shoelace loops containing two or three points of each type in the complete bipartite case, and three, four or five points in the complete one. This gives rise to an exponential number of possible solutions scaling as pN, where p is the plastic constant. This is at variance to what happens in the previously studied 1D models, such as the matching and the traveling salesman problem, where for every instance of the disorder there is only one possible solution.
Mon, 01 Jan 2018 00:00:00 GMThttp://hdl.handle.net/11565/40295142018-01-01T00:00:00ZSelberg integrals in 1D random Euclidean optimization problemshttp://hdl.handle.net/11565/4029510Titolo: Selberg integrals in 1D random Euclidean optimization problems
Abstract: We consider a set of Euclidean optimization problems in one dimension, where the cost function associated to the couple of points x and y is the Euclidean distance between them to an arbitrary power , the points are chosen at random with uniform measure. We derive the exact average cost for the random assignment problem, for any number of points using Selberg's integrals. Some variants of these integrals enable the exact average cost for the bipartite travelling salesman problem to be derived.
Tue, 01 Jan 2019 00:00:00 GMThttp://hdl.handle.net/11565/40295102019-01-01T00:00:00ZFluctuations in the random-link matching problemhttp://hdl.handle.net/11565/4029524Titolo: Fluctuations in the random-link matching problem
Abstract: Using the replica approach and the cavity method, we study the fluctuations of the optimal cost in the random-link matching problem. By means of replica arguments, we derive the exact expression of its variance. Moreover, we study the large deviation function, deriving its expression in two different ways, namely using both the replica method and the cavity method.
Tue, 01 Jan 2019 00:00:00 GMThttp://hdl.handle.net/11565/40295242019-01-01T00:00:00ZTwo-loop corrections to the large-order behavior of correlation functions in the one-dimensional N -vector modelhttp://hdl.handle.net/11565/4029526Titolo: Two-loop corrections to the large-order behavior of correlation functions in the one-dimensional N -vector model
Abstract: For a long time, the predictive limits of perturbative quantum field theory have been limited by our inability to carry out loop calculations to an arbitrarily high order, which become increasingly complex as the order of perturbation theory is increased. This problem is exacerbated by the fact that perturbation series derived from loop diagram (Feynman diagram) calculations represent asymptotic (divergent) series which limits the predictive power of perturbative quantum field theory. Here, we discuss an ansatz that could overcome these limits, based on the observations that (i) for many phenomenologically relevant field theories, one can derive dispersion relations which relate the large-order growth (the asymptotic limit of "infinite loop order") with the imaginary part of arbitrary correlation functions, for negative coupling ("unstable vacuum"), and (ii) one can analyze the imaginary part for negative coupling in terms of classical field configurations (instantons). Unfortunately, the perturbation theory around instantons, which could lead to much more accurate predictions for the large-order behavior of Feynman diagrams, poses a number of technical as well as computational difficulties. Here, we study, to further the above-mentioned ansatz, correlation functions in a one-dimensional (1D) field theory with a quartic self-interaction and an O(N) internal symmetry group, otherwise known as the 1D N-vector model. Our focus is on corrections to the large-order growth of perturbative coefficients, i.e., the limit of a large number of loops in the Feynman diagram expansion. We evaluate, in momentum space, the two-loop corrections for the two-point correlation function, and its derivative with respect to the momentum, as well as the two-point correlation function with a wigglet insertion. Also, we study the four-point function. These quantities, computed at zero momentum transfer, enter the renormalization-group functions (Callan-Symanzik equation) of the model. Our calculations pave the way for further development of related methods in field theory and for a better understanding of field-theoretical expansions at large order.
Wed, 01 Jan 2020 00:00:00 GMThttp://hdl.handle.net/11565/40295262020-01-01T00:00:00ZWide flat minima and optimal generalization in classifying high-dimensional Gaussian mixtureshttp://hdl.handle.net/11565/4033612Titolo: Wide flat minima and optimal generalization in classifying high-dimensional Gaussian mixtures
Abstract: We analyze the connection between minimizers with good generalizing properties and high local entropy regions of a threshold-linear classifier in Gaussian mixtures with the mean squared error loss function. We show that there exist configurations that achieve the Bayes-optimal generalization error, even in the case of unbalanced clusters. We explore analytically the error-counting loss landscape in the vicinity of a Bayes-optimal solution, and show that the closer we get to such configurations, the higher the local entropy, implying that the Bayes-optimal solution lays inside a wide flat region. We also consider the algorithmically relevant case of targeting wide flat minima of the (differentiable) mean squared error loss. Our analytical and numerical results show not only that in the balanced case the dependence on the norm of the weights is mild, but also, in the unbalanced case, that the performances can be improved.
Wed, 01 Jan 2020 00:00:00 GMThttp://hdl.handle.net/11565/40336122020-01-01T00:00:00ZThe random fractional matching problemhttp://hdl.handle.net/11565/4020240Titolo: The random fractional matching problem
Abstract: We consider two formulations of the random-link fractional matching problem, a relaxed version of the more standard random-link (integer) matching problem. In one formulation, we allow each node to be linked to itself in the optimal matching configuration. In the other one, on the contrary, such a link is forbidden. Both problems have the same asymptotic average optimal cost of the random-link matching problem on the complete graph. Using a replica approach and previous results of Wastlund (2010 Acta Mathematica 204 91-150), we analytically derive the finite-size corrections to the asymptotic optimal cost. We compare our results with numerical simulations and we discuss the main differences between random-link fractional matching problems and the random-link matching problem.
Mon, 01 Jan 2018 00:00:00 GMThttp://hdl.handle.net/11565/40202402018-01-01T00:00:00ZAverage optimal cost for the Euclidean TSP in one dimensionhttp://hdl.handle.net/11565/4029520Titolo: Average optimal cost for the Euclidean TSP in one dimension
Abstract: The traveling salesman problem is one of the most studied combinatorial optimization problems, because of the simplicity of its statement and the difficulty in its solution. We study the traveling salesman problem when the positions of the cities are chosen at random in the unit interval and the cost associated with the travel between two cities is their distance elevated to an arbitrary power p ∈ R. We characterize the optimal cycle and compute theaverage optimal cost for every number of cities when the measure used to choose the position of the cities is flat and asymptotically for large number of cities in the other cases. We also show that the optimal cost is a self-averaging quantity, and we test our analytical results with extensive simulations.
Tue, 01 Jan 2019 00:00:00 GMThttp://hdl.handle.net/11565/40295202019-01-01T00:00:00ZSolution for a bipartite Euclidean traveling-salesman problem in one dimensionhttp://hdl.handle.net/11565/4029522Titolo: Solution for a bipartite Euclidean traveling-salesman problem in one dimension
Abstract: The traveling-salesman problem is one of the most studied combinatorial optimization problems, because of the simplicity in its statement and the difficulty in its solution. We characterize the optimal cycle for every convex and increasing cost function when the points are thrown independently and with an identical probability distribution in a compact interval. We compute the average optimal cost for every number of points when the distance function is the square of the Euclidean distance. We also show that the average optimal cost is not a self-averaging quantity by explicitly computing the variance of its distribution in the thermodynamic limit. Moreover, we prove that the cost of the optimal cycle is not smaller than twice the cost of the optimal assignment of the same set of points. Interestingly, this bound is saturated in the thermodynamic limit.
Mon, 01 Jan 2018 00:00:00 GMThttp://hdl.handle.net/11565/40295222018-01-01T00:00:00ZFinite-size corrections in the random assignment problemhttp://hdl.handle.net/11565/4029518Titolo: Finite-size corrections in the random assignment problem
Abstract: We analytically derive, in the context of the replica formalism, the first finite-size corrections to the average optimal cost in the random assignment problem for a quite generic distribution law for the costs. We show that, when moving from a power-law distribution to a Γ distribution, the leading correction changes both in sign and in its scaling properties. We also examine the behavior of the corrections when approaching a δ-function distribution. By using a numerical solution of the saddle-point equations, we provide predictions that are confirmed by numerical simulations.
Sun, 01 Jan 2017 00:00:00 GMThttp://hdl.handle.net/11565/40295182017-01-01T00:00:00Z