Reviews: On Lazy Training in Differentiable Programming

Neural Information Processing Systems 

The paper provides some interesting understanding, but it is not significant enough to explain the issues of real interest in deep learning. The paper shows that lazy training can be induced by parameter scaling and is therefore not special to overparameterized neural networks. What does this tell us about overparameterized neural networks? Does the result imply that the lazy regime of overparameterized neural networks is necessarily due to parameter scaling? If not, then the lazy regime of overparameterized neural networks cannot be explained by parameter scaling alone.
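The scaling mechanism under discussion can be checked numerically: rescale a centered model's output by a factor alpha and the loss by 1/alpha^2, and the parameters barely move during training even though the fit improves at the same rate. A minimal sketch of this effect, assuming a toy two-layer tanh network and illustrative data and hyperparameters (none of which are taken from the paper):

```python
import numpy as np

# Toy regression data: scalar inputs, smooth target (illustrative choice).
n = 20
x = np.linspace(-1.0, 1.0, n)
y = np.sin(3.0 * x)

m = 50  # hidden width


def init_params():
    rng = np.random.default_rng(1)
    return rng.standard_normal(m), rng.standard_normal(m)  # W, a


def forward(W, a, x):
    # f(x) = a . tanh(W x) / sqrt(m)
    H = np.tanh(np.outer(x, W))          # (n, m) hidden activations
    return H @ a / np.sqrt(m), H


def train(alpha, lr=0.3, steps=800):
    """Gradient descent on L = mean((alpha*(f - f_init) - y)^2) / (2 alpha^2)."""
    W, a = init_params()
    W0, a0 = W.copy(), a.copy()
    f_init, _ = forward(W, a, x)         # centering so alpha*f(w0) stays bounded
    theta0 = np.sqrt(np.sum(W0**2) + np.sum(a0**2))
    losses = []
    for _ in range(steps):
        f, H = forward(W, a, x)
        r = alpha * (f - f_init) - y     # residual of the scaled, centered model
        losses.append(0.5 * np.mean(r**2) / alpha**2)
        grad_f = r / (n * alpha)                     # dL/df_i
        grad_a = H.T @ grad_f / np.sqrt(m)
        S = (1.0 - H**2) * (a / np.sqrt(m))          # tanh' times output weights
        grad_W = S.T @ (grad_f * x)                  # chain rule through tanh(W x)
        W -= lr * grad_W
        a -= lr * grad_a
    move = np.sqrt(np.sum((W - W0)**2) + np.sum((a - a0)**2))
    return losses, move / theta0                     # relative parameter movement


losses_1, move_1 = train(alpha=1.0)      # ordinary training
losses_big, move_big = train(alpha=100.0)  # lazy regime
print(f"rel. movement: alpha=1 -> {move_1:.3f}, alpha=100 -> {move_big:.5f}")
```

Under this rescaling the function-space dynamics are identical for every alpha, so the loss decreases at the same rate in both runs, while the relative parameter movement shrinks roughly like 1/alpha. This is how scaling alone, with no change in width, produces the lazy regime.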