Reviews: SGD on Neural Networks Learns Functions of Increasing Complexity
–Neural Information Processing Systems
Update: I have read the author's response and am keeping my score Originality: The idea that SGD biases networks towards simple classifiers has been discussed in the community extensively, but lacked a crisp formalization. This paper proposes a formal description for this conjecture (and also extends it -- Conjecture 1 and Claim 2) using a rich and simple information-theoretic framework. This work is roughly related to recent work on understanding the inductive bias of SGD in function space, e.g. Valle-Perez et al, Savarese et al: both only analyze the solution returned by SGD (under different assumptions), and not the intermediate iterates like this paper does. By defining the'simplicity' of a neural network solely on the mutual information between its predictions and another classifier leads to objects which are invariant to reparameterization of the network, and only depend on the function that it implements.
Neural Information Processing Systems
Jan-26-2025, 14:32:49 GMT
- Technology: