Supplementary material: Neural Anisotropy Directions

Neural Information Processing Systems 

D.1 NADs obtained through the eigendecomposition of the gradient covariance
D.2 NADs obtained through the SVD of the mixed second derivative

Regarding the construction of the synthetic datasets used for the experiments of Sec. 2 and Sec.

Regarding the setup and parameters for training the networks used for the experiments of Sec. 2 and Sec.

We provide Fig. S2 as a validation of this.

Fig. S3 illustrates the test accuracies of various architectures under different noise levels σ.

As mentioned in Sec. 3, trying to identify the NADs of an architecture by measuring its performance

To demonstrate this, we repeat the same experiment performed in Sec. 2, but instead of

The results of this experiment are illustrated in Fig. S4.

For more information about the properties of the 2D-DFT, we refer the reader to [1].

Figure S3: Test accuracies using different training sets drawn from D(v) (ε = 1, with 10,000 training samples and 10,000 test samples) for different levels of σ.