Appendix A Deferred proofs

Neural Information Processing Systems 

In this appendix, we present the proofs omitted from Sec. 3 and Sec. 4, together with experimental details and supporting visualizations.

A.1 Proof of Lemma 1

We restate Lemma 1 from Sec. 3 and present its proof. First, note that Jensen's inequality yields a convenient upper bound. For this purpose, in Figure 1 we plot:

Figure 9: Visualization of the key quantities involved in Lemma 2.

We list the detailed evaluation and training setup below. The single-layer CNN that we study in Sec. 4 has 4 convolutional filters, each of size

We describe here supporting experiments and visualizations related to Sec. 3 and Sec. 4.

C.1 Quality of the linear approximation for ReLU networks

The phenomenon is even more pronounced for FGSM perturbations, as the linearization error is much higher there.

C.2 Catastrophic overfitting in a single-layer CNN

We describe here figures complementary to Sec. 4 that relate to the single-layer CNN. The Laplace filter is very sensitive to noise.
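To make the linearization error from C.1 concrete, the following is a minimal numpy sketch, not the paper's exact setup: a toy two-layer ReLU network with illustrative random weights, where we compare the error of the first-order Taylor approximation for an FGSM-like perturbation versus a random sign perturbation of the same l-infinity norm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ReLU network f(x) = w2 @ relu(W1 @ x) with a scalar output.
# Dimensions and weights are illustrative, not the paper's architecture.
d, h = 20, 50
W1 = rng.standard_normal((h, d)) / np.sqrt(d)
w2 = rng.standard_normal(h) / np.sqrt(h)

def f(x):
    return w2 @ np.maximum(W1 @ x, 0.0)

def grad_f(x):
    # Gradient of f at x: W1^T diag(1[W1 x > 0]) w2
    mask = (W1 @ x > 0).astype(float)
    return W1.T @ (w2 * mask)

x = rng.standard_normal(d)
g = grad_f(x)
eps = 0.5

# FGSM-like perturbation vs. a random sign perturbation, same l_inf norm.
delta_fgsm = eps * np.sign(g)
delta_rand = eps * np.sign(rng.standard_normal(d))

def lin_err(delta):
    # |f(x + delta) - f(x) - <grad f(x), delta>|
    return abs(f(x + delta) - f(x) - g @ delta)

print(f"linearization error, FGSM-like: {lin_err(delta_fgsm):.4f}")
print(f"linearization error, random:    {lin_err(delta_rand):.4f}")
```

The linearization error is nonzero exactly because ReLU activations flip sign under the perturbation; the larger the perturbation, the more units cross zero and the worse the linear approximation becomes.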
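To illustrate why the Laplace filter mentioned in C.2 is so sensitive to noise, here is a small self-contained numpy demo on synthetic data (not the paper's experiment): the Laplacian of a smooth linear ramp is essentially zero, while adding small per-pixel noise produces a large response.

```python
import numpy as np

# Discrete 3x3 Laplace kernel: a second-difference operator whose strong
# high-frequency response makes it very sensitive to per-pixel noise.
LAPLACE = np.array([[0.,  1., 0.],
                    [1., -4., 1.],
                    [0.,  1., 0.]])

def conv2d_valid(img, kernel):
    # Naive 'valid' 2D cross-correlation; sufficient for a 3x3 kernel demo.
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)

# A smooth horizontal ramp: the Laplacian of a linear image is ~0.
xs = np.linspace(0, 1, 32)
smooth = np.tile(xs, (32, 1))
noisy = smooth + 0.05 * rng.standard_normal(smooth.shape)

r_smooth = np.abs(conv2d_valid(smooth, LAPLACE)).mean()
r_noisy = np.abs(conv2d_valid(noisy, LAPLACE)).mean()
print(f"mean |response| smooth: {r_smooth:.2e}, noisy: {r_noisy:.2e}")
```

Even noise with a standard deviation of only 0.05 dominates the filter output, since the kernel's weights amplify pixel-to-pixel differences rather than average them out.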