Review for NeurIPS paper: Asymptotic normality and confidence intervals for derivatives of 2-layers neural network in the random features model
Neural Information Processing Systems
Additional Feedback: I will increase my score if my concerns are addressed and if the authors could correct my potential misunderstanding.

1. I find the "double descent" phenomenon in the CI length interesting. Intuitively, the uncertainty of the model could relate to the variance of the prediction, which we know can blow up at the interpolation threshold due to the variance from label noise or from initialization. Can the authors comment on a plausible mechanism for this observation? In that case, what would be the motivation for considering a nonlinear perturbation, which would essentially amount to adding noise?

3. The result in Section 2.4 (based on Mei and Montanari, 2019) seems to rely on the assumption of an i.i.d. weight matrix W. I might have missed something, but is there a place where the authors discuss whether this characterization also holds for an arbitrary W (independent of X) with bounded spectral norm?

4. (minor) Does the characterization also hold in the ridgeless limit (\lambda \to 0)?

5. (minor) In Figure 2 (Left), why is there a discrepancy between the predicted and the simulated boxplot?

6. (minor) Although this is not the motivation of the work, the mentioned connection between NNs and RF models typically requires significant overparameterization, so the current proportional scaling of n and d might not be the right setup.
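The mechanism alluded to in point 1 can be illustrated numerically. The sketch below is not from the paper under review; all parameter choices (`n`, `d`, `sigma`, the ReLU feature map, the linear target) are illustrative assumptions. It estimates, by Monte Carlo over data and weight draws, the variance of a ridgeless (min-norm) random-features prediction at a fixed test point as the number of features p sweeps past the interpolation threshold p = n, where the variance is expected to spike:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 30            # samples, input dimension (arbitrary choices)
sigma = 0.5               # label-noise level

# Fixed test point at which we track prediction variance.
x_test = rng.standard_normal(d) / np.sqrt(d)

def pred_variance(p, reps=100):
    """Monte-Carlo variance of the min-norm random-features prediction at x_test."""
    preds = []
    for _ in range(reps):
        X = rng.standard_normal((n, d)) / np.sqrt(d)
        W = rng.standard_normal((p, d)) / np.sqrt(d)    # iid feature weights
        F = np.maximum(X @ W.T, 0.0)                    # ReLU random features
        y = X[:, 0] + sigma * rng.standard_normal(n)    # linear target + noise
        # lstsq returns the least-squares fit for p < n and the
        # minimum-norm interpolant for p > n (ridgeless limit).
        a = np.linalg.lstsq(F, y, rcond=None)[0]
        preds.append(np.maximum(W @ x_test, 0.0) @ a)
    return float(np.var(preds))

# Underparameterized, at the threshold, and overparameterized regimes.
variances = {p: pred_variance(p) for p in (20, 100, 300)}
```

If prediction variance drives the CI length, this spike at p = n would produce the non-monotone (double-descent-like) CI-length curve; a direct comment from the authors on whether this is the operative mechanism would still be helpful.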
Feb-6-2025, 21:45:03 GMT