Reviews: Time Matters in Regularizing Deep Networks: Weight Decay and Data Augmentation Affect Early Learning Dynamics, Matter Little Near Convergence

Neural Information Processing Systems 

The paper is well written, and the authors are clear about their claims. The idea of critical periods during training with respect to regularization is interesting; if true, it would offer a different way to think about generalization. The authors have performed a number of experiments with different configurations. However, there are deficiencies, as detailed below.