Landscape connectivity and dropout stability of SGD solutions for over-parameterized neural networks

Shevchenko A, Mondelli M. 2020. Landscape connectivity and dropout stability of SGD solutions for over-parameterized neural networks. Proceedings of the 37th International Conference on Machine Learning. vol. 119, 8773–8784.

Download
2020_PMLR_Shevchenko.pdf (Open Access, 5.34 MB)
Conference Paper | Published | English
Author
Shevchenko, Alexander; Mondelli, Marco (IST Austria)
Abstract
The optimization of multilayer neural networks typically leads to a solution with zero training error, yet the landscape can exhibit spurious local minima and the minima can be disconnected. In this paper, we shed light on this phenomenon: we show that the combination of stochastic gradient descent (SGD) and over-parameterization makes the landscape of multilayer neural networks approximately connected and thus more favorable to optimization. More specifically, we prove that SGD solutions are connected via a piecewise linear path, and the increase in loss along this path vanishes as the number of neurons grows large. This result is a consequence of the fact that the parameters found by SGD are increasingly dropout stable as the network becomes wider. We show that, if we remove part of the neurons (and suitably rescale the remaining ones), the change in loss is independent of the total number of neurons, and it depends only on how many neurons are left. Our results exhibit a mild dependence on the input dimension: they are dimension-free for two-layer networks and depend linearly on the dimension for multilayer networks. We validate our theoretical findings with numerical experiments for different architectures and classification tasks.
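For the two-layer case, the link between dropout stability and connectivity can be illustrated with a small sketch. It is not the authors' code. It assumes a mean-field network f(x) = (1/N) * sum_i a_i * tanh(<w_i, x>), square loss, an even number N of neurons, and a hypothetical five-segment path, and every function name in it (forward, dropout, connecting_path, losses_along) is made up for illustration. Dropping out half of the neurons means zeroing their output weights a_i and doubling the rest. The path first moves A's output weights to its dropped-out version. It then moves the hidden weights of the silenced neurons, which cannot change the output, swaps which half is active, and mirrors these steps to reach B. The output is linear in the output weights and the square loss is convex in the output, so the loss on each segment is at most the larger of its two endpoint losses. The whole path therefore costs no more than the worst of A, dropped-out A, dropped-out B, and B. The parameters below are random rather than found by SGD, so the example checks only this bound, not the dropout stability the paper proves for SGD solutions. The paper's own construction, in particular for multilayer networks, may differ.

```python
import math
import random


def forward(a, W, x):
    """Mean-field two-layer net (assumed form): f(x) = (1/N) * sum_i a_i * tanh(<w_i, x>)."""
    return sum(a_i * math.tanh(sum(w * v for w, v in zip(w_i, x)))
               for a_i, w_i in zip(a, W)) / len(a)


def loss(a, W, data):
    """Mean squared error over (x, y) pairs; convex in the network output."""
    return sum((forward(a, W, x) - y) ** 2 for x, y in data) / len(data)


def dropout(a, keep):
    """Zero output weights outside `keep` and rescale the kept ones by N / |keep|.

    The result computes (1/|keep|) * sum_{i in keep} a_i * tanh(<w_i, x>)."""
    scale = len(a) / len(keep)
    return [a_i * scale if i in keep else 0.0 for i, a_i in enumerate(a)]


def lerp(u, v, t):
    """Pointwise linear interpolation between two equal-length lists."""
    return [(1 - t) * p + t * q for p, q in zip(u, v)]


def connecting_path(aA, WA, aB, WB):
    """Vertices of a piecewise linear path from (aA, WA) to (aB, WB).

    Hypothetical construction: assumes an even number of neurons N, keeps
    the first half of A and the second half of B when dropping out."""
    N = len(aA)
    M = N // 2
    aA_half = dropout(aA, set(range(M)))     # [2*aA[:M], 0, ..., 0]
    aB_half = dropout(aB, set(range(M, N)))  # [0, ..., 0, 2*aB[M:]]
    W_mid = WA[:M] + WB[M:]                  # rows: A's first half, B's second half
    return [
        (aA, WA),
        (aA_half, WA),     # 1) move only output weights: A -> dropped-out A
        (aA_half, W_mid),  # 2) move silent neurons (a_i = 0) to B's weights
        (aB_half, W_mid),  # 3) move only output weights: swap the active half
        (aB_half, WB),     # 4) move the now-silent first half to B's weights
        (aB, WB),          # 5) move only output weights: dropped-out B -> B
    ]


def losses_along(path, data, steps=10):
    """Evaluate the loss at `steps + 1` evenly spaced points on every segment."""
    values = []
    for (a0, W0), (a1, W1) in zip(path, path[1:]):
        for k in range(steps + 1):
            t = k / steps
            W = [lerp(r0, r1, t) for r0, r1 in zip(W0, W1)]
            values.append(loss(lerp(a0, a1, t), W, data))
    return values


# Usage with random (not SGD-trained) parameters and synthetic data.
random.seed(0)
N, d = 8, 3


def random_params():
    return ([random.gauss(0, 1) for _ in range(N)],
            [[random.gauss(0, 1) for _ in range(d)] for _ in range(N)])


data = [([random.gauss(0, 1) for _ in range(d)], random.gauss(0, 1)) for _ in range(20)]
aA, WA = random_params()
aB, WB = random_params()
path = connecting_path(aA, WA, aB, WB)
path_losses = losses_along(path, data)
vertex_losses = [loss(a, W, data) for a, W in path]
# The path never exceeds its worst vertex, up to floating-point rounding.
print(max(path_losses) <= max(vertex_losses) + 1e-12)  # prints True
```

In this reading, dropout stability of A and B is what keeps the two middle vertices, and hence the whole path, close to the endpoint losses.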
Publishing Year
2020
Date Published
2020-07-13
Proceedings Title
Proceedings of the 37th International Conference on Machine Learning
Acknowledgement
M. Mondelli was partially supported by the 2019 Lopez-Loreta Prize. The authors thank Phan-Minh Nguyen for helpful discussions and the IST Distributed Algorithms and Systems Lab for providing computational resources.
Volume
119
Page
8773-8784

Cite this

Shevchenko A, Mondelli M. Landscape connectivity and dropout stability of SGD solutions for over-parameterized neural networks. In: Proceedings of the 37th International Conference on Machine Learning. Vol 119. Proceedings of Machine Learning Research; 2020:8773-8784.
Shevchenko, A., & Mondelli, M. (2020). Landscape connectivity and dropout stability of SGD solutions for over-parameterized neural networks. In Proceedings of the 37th International Conference on Machine Learning (Vol. 119, pp. 8773–8784). Proceedings of Machine Learning Research.
Shevchenko, Alexander, and Marco Mondelli. “Landscape Connectivity and Dropout Stability of SGD Solutions for Over-Parameterized Neural Networks.” In Proceedings of the 37th International Conference on Machine Learning, 119:8773–84. Proceedings of Machine Learning Research, 2020.
A. Shevchenko and M. Mondelli, “Landscape connectivity and dropout stability of SGD solutions for over-parameterized neural networks,” in Proceedings of the 37th International Conference on Machine Learning, 2020, vol. 119, pp. 8773–8784.
Shevchenko, Alexander, and Marco Mondelli. “Landscape Connectivity and Dropout Stability of SGD Solutions for Over-Parameterized Neural Networks.” Proceedings of the 37th International Conference on Machine Learning, vol. 119, Proceedings of Machine Learning Research, 2020, pp. 8773–84.
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]
Main File(s)
File Name
2020_PMLR_Shevchenko.pdf
Access Level
Open Access
Date Uploaded
2021-03-02
MD5 Checksum
f042c8d4316bd87c6361aa76f1fbdbbe


Sources

arXiv 1912.10095
