Reviews: Time Matters in Regularizing Deep Networks: Weight Decay and Data Augmentation Affect Early Learning Dynamics, Matter Little Near Convergence

Neural Information Processing Systems 

The paper describes how regularization matters in different ways during different parts of the training process, i.e., the timing is important for the regularization to be effective. The reviewers have several suggestions, which should be incorporated to the extent possible, but the ideas and results should be of interest to members of the community.