Review for NeurIPS paper: Optimization and Generalization of Shallow Neural Networks with Quadratic Activation Functions
Neural Information Processing Systems
For random initialization, I also believe that the analysis still needs considerable work. The upper bound on E(A(t)) clearly depends on the condition number of A(0), so the analysis should track this dependence rather than simply splitting into the full-rank and rank-deficient cases. Moreover, rather than focusing only on the full-rank case, the authors could treat the problem uniformly and continuously. For example, the Marchenko-Pastur (MP) law from random matrix theory (RMT) may help provide an asymptotic analysis for random initialization, since the universal limiting distribution of the eigenvalues is known. A non-asymptotic version may also exist, but it would require additional perturbation bounds. Finally, owing to my research background, I had overlooked the line of work on shallow neural networks with random Gaussian inputs. I apologize for that and have raised my score.
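To make the MP-law suggestion concrete, here is a minimal numerical sketch. It assumes, purely for illustration, that A(0) = W^T W / m with W an m x d matrix of i.i.d. standard Gaussian entries and d < m; this form of A(0), the function names, and the sizes d = 200, m = 800 are my assumptions and are not taken from the paper. Under that assumption the MP law places the limiting spectrum on [(1 - sqrt(d/m))^2, (1 + sqrt(d/m))^2], which gives an asymptotic estimate of the condition number of A(0).

```python
import numpy as np


def mp_edges(gamma):
    """Marchenko-Pastur support edges for aspect ratio gamma = d/m in (0, 1), unit variance."""
    lower = (1.0 - np.sqrt(gamma)) ** 2
    upper = (1.0 + np.sqrt(gamma)) ** 2
    return lower, upper


def sample_initial_spectrum(d, m, seed=0):
    """Eigenvalues (ascending) of A0 = W^T W / m for a Gaussian W.

    Assumption: this Wishart-type form of A(0) is hypothetical and chosen
    only to illustrate the MP law; it is not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((m, d))
    A0 = W.T @ W / m
    return np.linalg.eigvalsh(A0)


d, m = 200, 800  # illustrative sizes, gamma = d/m = 0.25
eigs = sample_initial_spectrum(d, m)
lower, upper = mp_edges(d / m)

empirical_cond = eigs[-1] / eigs[0]  # finite-size condition number
asymptotic_cond = upper / lower      # MP prediction as d, m -> infinity

print(f"empirical condition number:  {empirical_cond:.3f}")
print(f"asymptotic (MP) prediction:  {asymptotic_cond:.3f}")
```

The gap between the two printed numbers is the finite-size effect that a non-asymptotic analysis would have to control with perturbation bounds on the extreme eigenvalues.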
Jan-26-2025, 23:38:02 GMT