Partial local entropy and anisotropy in deep weight spaces
Recent studies of the weight space of deep neural networks [1, 2] have highlighted the existence of rare, subdominant clusters of configurations that yield high test accuracy. Although these clusters deviate from typical configurations, stochastic gradient descent (SGD) algorithms find them efficiently, and they correspond to wide valleys of suitable loss functions, such as cross entropy [3]. An analogous situation occurs in constraint satisfaction problems, where the search for clusters of solutions improves when the loss function is supplemented by a term that encourages a high local density of solutions [4]. To count the solutions in the vicinity of a specific weight configuration, one can define a local solution-counting functional, namely a local entropy. Classification tasks performed by quantized neural networks (where the weights are discrete) can be interpreted as constraint satisfaction problems. There are, however, two reasons to generalize the concept of local entropy: first, classification problems typically require a high but not necessarily perfect accuracy; second, they are often approached with machines that have continuous weights.
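As a rough illustration of the solution-counting idea in the discrete setting, the sketch below enumerates all weight configurations within a given Hamming distance of a reference configuration and takes the logarithm of the number that classify every pattern correctly. It assumes a single-layer perceptron with ±1 weights and a teacher-generated toy dataset; the names `is_solution`, `local_entropy`, and `make_toy_problem` are hypothetical and not taken from the paper. It does not cover the generalizations to imperfect accuracy or continuous weights that the abstract motivates.

```python
import itertools
import math
import random


def is_solution(w, patterns, labels):
    """Return True if the sign perceptron with weights w classifies every pattern correctly."""
    for x, y in zip(patterns, labels):
        field = sum(wi * xi for wi, xi in zip(w, x))
        if field * y <= 0:
            return False
    return True


def local_entropy(w_ref, patterns, labels, d):
    """Log of the number of solutions within Hamming distance d of w_ref.

    Brute-force enumeration, so only feasible for small n and d.
    Returns -inf if no solution lies in the ball.
    """
    n = len(w_ref)
    count = 0
    for k in range(d + 1):
        for flips in itertools.combinations(range(n), k):
            w = list(w_ref)
            for i in flips:
                w[i] = -w[i]  # flip the selected weights
            if is_solution(w, patterns, labels):
                count += 1
    return math.log(count) if count > 0 else float("-inf")


def make_toy_problem(n=9, num_patterns=6, seed=0):
    """Assumed toy setup: random +/-1 patterns labeled by a random +/-1 teacher.

    n is odd, so the pre-activation field is never zero.
    """
    rng = random.Random(seed)
    teacher = [rng.choice([-1, 1]) for _ in range(n)]
    patterns = [[rng.choice([-1, 1]) for _ in range(n)] for _ in range(num_patterns)]
    labels = [1 if sum(t * x for t, x in zip(teacher, p)) > 0 else -1 for p in patterns]
    return teacher, patterns, labels


if __name__ == "__main__":
    teacher, patterns, labels = make_toy_problem()
    for d in range(4):
        print(d, local_entropy(teacher, patterns, labels, d))
```

Because the teacher is a solution by construction, the value at distance 0 is log(1) = 0, and the count can only grow as the radius increases. A reference configuration sitting in a dense cluster would show a faster growth with d than an isolated one.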
Sep-10-2020