What can flatness teach us: understanding generalisation in Deep Neural Networks

#artificialintelligence 

This is the third post in a series summarising work that seeks to provide a theory of generalisation in Deep Neural Networks (DNNs). Briefly, the first post summarises evidence that DNNs trained with stochastic optimisers (like SGD) find functions with probability proportional to their volume in parameter space, and the second post argues that these high-volume functions are 'simple', thus explaining why DNNs generalise. In this post, we summarise results in [1] which explain why the 'flatness of the loss landscape' has been shown to correlate with generalisation, a well-known empirical result. The authors provide substantial empirical evidence that this correlation is actually a combination of (1) a weak correlation between local flatness and the parameter-space volume of the surrounding function, and (2) a strong correlation between volume and generalisation. This combination produces the observed weak correlation between 'flatness' and generalisation.
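To make the 'flatness' side of this story concrete, here is a minimal, self-contained sketch (my own illustration, not the authors' code) of one common flatness proxy: how much the training loss rises when the trained weights are jittered by random perturbations of a given radius. Flat minima tolerate larger perturbations before the loss rises, which is loosely related to the volume picture above. The toy model, data, and perturbation radii are placeholder assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy data and model (stand-ins; any trained network would do).
X = torch.randn(256, 20)
y = (X[:, 0] > 0).long()
model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))

# Train with plain SGD to reach a low-loss region of parameter space.
opt = torch.optim.SGD(model.parameters(), lr=0.1)
for _ in range(200):
    opt.zero_grad()
    F.cross_entropy(model(X), y).backward()
    opt.step()

@torch.no_grad()
def train_loss():
    return F.cross_entropy(model(X), y).item()

@torch.no_grad()
def perturbed_loss_increase(radius, n_samples=20):
    """Mean rise in training loss when each weight tensor is jittered
    by random noise of norm `radius` (a simple flatness proxy)."""
    base = train_loss()
    params = list(model.parameters())
    originals = [p.detach().clone() for p in params]
    increases = []
    for _ in range(n_samples):
        for p, p0 in zip(params, originals):
            noise = torch.randn_like(p)
            noise *= radius / noise.norm().clamp_min(1e-12)
            p.copy_(p0 + noise)
        increases.append(train_loss() - base)
    # Restore the trained weights.
    for p, p0 in zip(params, originals):
        p.copy_(p0)
    return sum(increases) / len(increases)

for r in (0.01, 0.1, 0.5):
    print(f"radius {r}: mean loss increase {perturbed_loss_increase(r):.4f}")
```

A small mean loss increase at a given radius indicates a flatter minimum. The point of [1] is that proxies like this correlate with generalisation only weakly, and largely because they correlate (weakly) with the volume of the surrounding function.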
