A Proofs of Theorems 1 and 2

Neural Information Processing Systems 

The assumption of Gaussian distributed weights is inspired by [67-69]. This analysis can be extended to convolutional neural networks (CNNs), and also to the residual blocks used in ResNet.

Figure 6: Kernel density estimation plot of the weight matrix for the adversarially trained WRN-34-10.

The learning rate is divided by 10 at the 75th and 90th epochs. Within the same stage, the same type of residual block, containing two convolution operations, is used.
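A plot such as Figure 6 can be produced by comparing a kernel density estimate of the flattened weights against a Gaussian fitted to their moments. The sketch below uses a synthetic weight matrix as a stand-in (the paper's weights come from an adversarially trained WRN-34-10, which is not reproduced here); the matrix size and standard deviation are illustrative assumptions.

```python
# Sketch: KDE of a network's weights vs. a fitted Gaussian.
# The weight matrix here is synthetic (assumed shape and scale);
# substitute a real trained layer's weights to reproduce Figure 6.
import numpy as np
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.05, size=(200, 200))  # stand-in for a trained weight matrix

w = W.ravel()
kde = gaussian_kde(w)                 # nonparametric density estimate
mu, sigma = w.mean(), w.std()         # moments of the fitted Gaussian

xs = np.linspace(w.min(), w.max(), 200)
kde_density = kde(xs)
gauss_density = norm.pdf(xs, loc=mu, scale=sigma)

# If the Gaussian-weight assumption holds, the two curves should be close.
max_gap = float(np.max(np.abs(kde_density - gauss_density)))
print(max_gap < 1.0)
```

Plotting `kde_density` and `gauss_density` against `xs` (e.g. with matplotlib) yields the kind of density comparison shown in Figure 6.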