It is true that NPF learning happens when we train a standard DNN with ReLU and it is only ...

Neural Information Processing Systems 

We thank all the reviewers for their detailed comments. Related work: We will include a comparison with [Fiat et al., 2019] in the main section. Most analysis of DNNs with ReLU concerns what happens at initialisation. In a DNN with ReLU, NPV and NPF are not statistically independent at initialisation, i.e., Assumption 5.1 does not ... Hence, though Assumption 5.1 may not hold ... However, we will move relevant work (example [Srivastava et al., ... However, studying these in our framework is future work. MNIST and CIFAR-10 are used as standard datasets in most analytical works such as ours; see [Arora et al., 2019] for ... By generalisation we mean performance on test data.