The Implicit Bias of Minima Stability: A View from Function Space
Supplementary Material

Neural Information Processing Systems 

This document contains supplementary material for the article 'The Implicit Bias of Minima Stability: A View from Function Space', and includes the following parts:

I. Experimental details and additional experiments
II. Switching system formulation for single hidden layer ReLU networks
VIII. Generalization of Lemma 4 to global minima that are not twice-differentiable
XI.

For the experiment in Sec. 5 we generated a different dataset of

This learning rate warm-up was used in all training runs.

Specifically, we used PyTorch's standard initialization, and multiplied it by different factors.
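As a rough illustration of multiplying PyTorch's standard initialization by a factor, the following minimal sketch builds a single hidden layer ReLU network and rescales its freshly initialized parameters in place. The function name `make_scaled_relu_net`, the scalar input and output dimensions, the hidden width, the seed, and the choice to scale biases together with weights are all assumptions made for this example; the text above does not specify them.

```python
import torch
from torch import nn


def make_scaled_relu_net(hidden_width: int, init_scale: float, seed: int = 0) -> nn.Sequential:
    """Single hidden layer ReLU network whose default PyTorch
    initialization is multiplied by `init_scale`.

    Hypothetical helper: the 1-D input/output and the decision to scale
    biases as well as weights are assumptions, not taken from the paper.
    """
    torch.manual_seed(seed)  # same base draw for every scale factor
    net = nn.Sequential(
        nn.Linear(1, hidden_width),   # default nn.Linear initialization
        nn.ReLU(),
        nn.Linear(hidden_width, 1),
    )
    with torch.no_grad():
        for param in net.parameters():
            param.mul_(init_scale)    # rescale the default init in place
    return net


# Two networks sharing the same base draw, differing only by the factor.
base = make_scaled_relu_net(hidden_width=16, init_scale=1.0)
scaled = make_scaled_relu_net(hidden_width=16, init_scale=3.0)
```

Fixing the seed before construction makes every factor start from the same random draw, so differences between runs can be attributed to the scale alone. Whether the original experiments shared a seed across factors is not stated.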