Review for NeurIPS paper: Theory-Inspired Path-Regularized Differential Network Architecture Search

Neural Information Processing Systems 

The authors theoretically prove that "more skip connections the faster convergence" and that "shallow cells benefit faster convergence rate than deep cells". Is there any experimental evidence to verify these claims? In addition, do pooling operations also have a slower convergence rate than skip connections? In lines 175-178, the authors mention that a skip connection in the shared path and a convolution in the private path can benefit the singularity of the network's Gram matrix, so that the convergence rate can be greatly improved.