---
res:
bibo_abstract:
- "The optimization of multilayer neural networks typically leads to a solution\r\nwith
zero training error, yet the landscape can exhibit spurious local minima\r\nand
the minima can be disconnected. In this paper, we shed light on this\r\nphenomenon:
we show that the combination of stochastic gradient descent (SGD)\r\nand over-parameterization
makes the landscape of multilayer neural networks\r\napproximately connected and
thus more favorable to optimization. More\r\nspecifically, we prove that SGD solutions
are connected via a piecewise linear\r\npath, and the increase in loss along this
path vanishes as the number of\r\nneurons grows large. This result is a consequence
of the fact that the\r\nparameters found by SGD are increasingly dropout stable
as the network becomes\r\nwider. We show that, if we remove part of the neurons
(and suitably rescale the\r\nremaining ones), the change in loss is independent
of the total number of\r\nneurons, and it depends only on how many neurons are
left. Our results exhibit\r\na mild dependence on the input dimension: they are
dimension-free for two-layer\r\nnetworks and depend linearly on the dimension
for multilayer networks. We\r\nvalidate our theoretical findings with numerical
experiments for different\r\narchitectures and classification tasks.@eng"
bibo_authorlist:
- foaf_Person:
foaf_givenName: Alexander
foaf_name: Shevchenko, Alexander
foaf_surname: Shevchenko
- foaf_Person:
foaf_givenName: Marco
foaf_name: Mondelli, Marco
foaf_surname: Mondelli
foaf_workInfoHomepage: http://www.librecat.org/personId=27EB676C-8706-11E9-9510-7717E6697425
orcid: 0000-0002-3242-7020
bibo_volume: 119
dct_date: 2020^xs_gYear
dct_language: eng
dct_publisher: Proceedings of Machine Learning Research
dct_title: Landscape connectivity and dropout stability of SGD solutions for over-parameterized
  neural networks
...