Does SGD implicitly optimize for smoothness?

Volhejn V, Lampert C. 2021. Does SGD implicitly optimize for smoothness? 42nd German Conference on Pattern Recognition. DAGM GCPR: German Conference on Pattern Recognition, LNCS, vol. 12544 LNCS, 246–259.


Conference Paper | Published | English

Scopus indexed
Series Title
LNCS
Abstract
Modern neural networks can easily fit their training set perfectly. Surprisingly, despite being “overfit” in this way, they tend to generalize well to future data, thereby defying the classic bias–variance trade-off of machine learning theory. Of the many possible explanations, a prevalent one is that training by stochastic gradient descent (SGD) imposes an implicit bias that leads it to learn simple functions, and these simple functions generalize well. However, the specifics of this implicit bias are not well understood. In this work, we explore the smoothness conjecture which states that SGD is implicitly biased towards learning functions that are smooth. We propose several measures to formalize the intuitive notion of smoothness, and we conduct experiments to determine whether SGD indeed implicitly optimizes for these measures. Our findings rule out the possibility that smoothness measures based on first-order derivatives are being implicitly enforced. They are supportive, though, of the smoothness conjecture for measures based on second-order derivatives.
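As an illustration of what a first-order smoothness measure can look like (this sketch is not part of the record and is not necessarily one of the paper's exact measures), a common choice is the average norm of the gradient of the learned function with respect to its input, estimated over a sample of points by automatic differentiation. The snippet below is a minimal sketch assuming PyTorch, a scalar-output network, and hypothetical names:

import torch
import torch.nn as nn

def mean_input_gradient_norm(model: nn.Module, inputs: torch.Tensor) -> float:
    # Estimate a first-order smoothness measure: the mean Euclidean norm of
    # d f(x) / d x over the given batch (illustrative, not the paper's definition).
    inputs = inputs.clone().requires_grad_(True)
    outputs = model(inputs).squeeze(-1)                  # assume one scalar output per sample
    grads, = torch.autograd.grad(outputs.sum(), inputs)  # batched input gradients
    return grads.flatten(1).norm(dim=1).mean().item()

# Hypothetical usage: a small fully connected network on random data.
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
x = torch.randn(256, 10)
print(mean_input_gradient_norm(model, x))

A second-order analogue would replace the input-gradient norm with a curvature quantity such as a Hessian norm of f with respect to x, which can be estimated in the same framework via double backpropagation.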
Publishing Year
2021
Date Published
2021-03-17
Proceedings Title
42nd German Conference on Pattern Recognition
Volume
12544 LNCS
Page
246-259
Conference
DAGM GCPR: German Conference on Pattern Recognition
Conference Location
Virtual
Conference Date
2020-09-28 – 2020-10-01

Cite this

Volhejn V, Lampert C. Does SGD implicitly optimize for smoothness? In: 42nd German Conference on Pattern Recognition. Vol 12544 LNCS. Springer; 2021:246-259. doi:10.1007/978-3-030-71278-5_18
Volhejn, V., & Lampert, C. (2021). Does SGD implicitly optimize for smoothness? In 42nd German Conference on Pattern Recognition (Vol. 12544 LNCS, pp. 246–259). Virtual: Springer. https://doi.org/10.1007/978-3-030-71278-5_18
Volhejn, Vaclav, and Christoph Lampert. “Does SGD Implicitly Optimize for Smoothness?” In 42nd German Conference on Pattern Recognition, 12544 LNCS:246–59. Springer, 2021. https://doi.org/10.1007/978-3-030-71278-5_18.
V. Volhejn and C. Lampert, “Does SGD implicitly optimize for smoothness?,” in 42nd German Conference on Pattern Recognition, Virtual, 2021, vol. 12544 LNCS, pp. 246–259.
Volhejn V, Lampert C. 2021. Does SGD implicitly optimize for smoothness? 42nd German Conference on Pattern Recognition. DAGM GCPR: German Conference on Pattern Recognition, LNCS, vol. 12544 LNCS, 246–259.
Volhejn, Vaclav, and Christoph Lampert. “Does SGD Implicitly Optimize for Smoothness?” 42nd German Conference on Pattern Recognition, vol. 12544 LNCS, Springer, 2021, pp. 246–59, doi:10.1007/978-3-030-71278-5_18.
