A Experiment Details and Complete Results
A.2 Model Architectures
In this section we describe in detail each of the model architectures we use in our experiments.
Our small ConvNet consists of the following layers: A convolutional layer with 32 kernels of size 3 × 3 and ReLU activation. A max pooling layer with pool size 2 × 2. A flatten layer. For inputs of shape 32 × 32 × 3, this model has 21,697 parameters.
Our large ConvNet model consists of the following layers: A convolutional layer with 32 kernels of size 3 × 3, padding, and ReLU activation.
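To make the shape bookkeeping for the first layers of the small ConvNet concrete, the following minimal sketch traces a 32 × 32 × 3 input through a 3 × 3 convolution with 32 kernels, a 2 × 2 max pooling layer, and a flatten layer. It assumes stride 1, no padding, a bias term per kernel, and non-overlapping pooling; none of these are stated in the text, and the helper names are hypothetical.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters, use_bias=True):
    """Trainable parameters of a 2D convolution (bias per filter is an assumption)."""
    weights = kernel_h * kernel_w * in_channels * filters
    return weights + (filters if use_bias else 0)


def conv2d_output_shape(height, width, kernel_h, kernel_w, filters):
    """Output shape assuming stride 1 and no padding ('valid' convolution)."""
    return (height - kernel_h + 1, width - kernel_w + 1, filters)


def max_pool_output_shape(height, width, channels, pool_h, pool_w):
    """Output shape assuming non-overlapping pooling (stride equal to pool size)."""
    return (height // pool_h, width // pool_w, channels)


input_shape = (32, 32, 3)

# Convolutional layer: 32 kernels of size 3 x 3.
conv_params = conv2d_params(3, 3, input_shape[2], 32)
conv_shape = conv2d_output_shape(input_shape[0], input_shape[1], 3, 3, 32)

# Max pooling layer with pool size 2 x 2 (no trainable parameters).
pool_shape = max_pool_output_shape(*conv_shape, 2, 2)

# Flatten layer (no trainable parameters).
flat_size = pool_shape[0] * pool_shape[1] * pool_shape[2]

print("conv parameters:", conv_params)
print("conv output shape:", conv_shape)
print("pool output shape:", pool_shape)
print("flattened size:", flat_size)
```

Under these assumptions the convolution is the only one of the three layers with trainable parameters, since pooling and flattening add none. Any further parameters in the stated total would therefore have to come from layers beyond these three.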
Appendix: On the Modularity of Hypernetworks
As an additional experiment, we repeated the same experiment (i.e., varying the number of layers of […]
The experiments with Type I functions are presented in the main text. In all of the experiments, the weights of y are set using the He uniform initialization [11].
Figure 1: (a-b) The error obtained by hypernetworks and the embedding method with a varying number of layers (x-axis).
For the purpose of comparison, we considered the following setting. In the rotations prediction experiment in Sec. 5, we did not apply any regularization or normalization. We compared the two models in the configuration of Sec. 5 when fixing the MNIST classification with a varying number of hidden neurons.
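As an illustration of the He uniform initialization mentioned above, the following minimal sketch draws each weight of a single fully connected layer from the uniform distribution on [-limit, limit] with limit = sqrt(6 / fan_in), which is the standard definition of the scheme. The layer sizes, the seed, and the function name are assumptions chosen for the example and do not come from the experiments.

```python
import math
import random


def he_uniform(fan_in, fan_out, seed=0):
    """Return a fan_in x fan_out weight matrix drawn with He uniform initialization.

    Each entry is sampled from U(-limit, limit) with limit = sqrt(6 / fan_in),
    which gives the weights a variance of 2 / fan_in.
    """
    limit = math.sqrt(6.0 / fan_in)
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    return [
        [rng.uniform(-limit, limit) for _ in range(fan_out)]
        for _ in range(fan_in)
    ]


# Hypothetical layer sizes, chosen only for illustration.
weights = he_uniform(fan_in=64, fan_out=32)
limit = math.sqrt(6.0 / 64)

flat = [w for row in weights for w in row]
empirical_var = sum(w * w for w in flat) / len(flat)
print("limit:", limit)
print("target variance 2 / fan_in:", 2.0 / 64)
print("empirical second moment:", empirical_var)
```

The bound depends only on the number of inputs to the layer, so wider layers receive proportionally smaller initial weights.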
Towards Understanding Why Lookahead Generalizes Better Than SGD and Beyond (Supplementary File)
Pan Zhou, Hanshu Yan
It is structured as follows. […] Then Appendix D gives the proofs of the main results in Sec. 4, including […] Finally, Appendix E provides the proofs of the results in Sec. 5, including Theorems 5 and 6, which analyze the optimization error, generalization error and excess risk error of the […]
The main limitation of this work is that its analysis does not apply to general nonconvex problems. This is because, as explained in Sec. […] But as shown in Sec. 3, to bound the excess risk error, one needs to first bound […] In this way, our analysis does not apply to general nonconvex problems.
Due to space limitations, we defer more experimental results and details to this appendix.