Reviews: MetaInit: Initializing learning by learning to initialize
Neural Information Processing Systems
Update: The authors have addressed my questions. I hope the camera-ready includes a clear discussion of the Taylor expansion vs. finite-difference approximation (at least in the appendix). I also second the other reviewer on the importance of a comparison in the batch-norm case, since the method is meant to serve as a general-purpose initialization scheme.

Longer Summary:
- The authors introduce the GradientDeviation criterion, which characterizes how much the gradient changes after a gradient step. It is simple to compute and avoids forming the full Hessian.
- They use meta-learning to learn the scale of the initialization such that GradientDeviation is minimized.
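To make the two points in the summary concrete, here is a minimal sketch (not the authors' code) of (1) a finite-difference criterion measuring how much the gradient changes after one gradient step, with no explicit Hessian, and (2) meta-learning an initialization scale to minimize that criterion. The toy model, the function names (`toy_loss`, `gradient_deviation`, `meta_init_scale`), and the sign-based scale update are all illustrative assumptions, not the paper's exact procedure.

```python
# Hypothetical sketch of the review's two points on a toy linear least-squares problem.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 10))   # toy inputs
y = rng.normal(size=(64, 1))    # toy targets

def toy_loss(W):
    """Squared error of a one-layer linear model (for reference)."""
    return 0.5 * np.mean((X @ W - y) ** 2)

def grad(W):
    """Analytic gradient of the toy loss."""
    return X.T @ (X @ W - y) / X.shape[0]

def gradient_deviation(W, lr=0.1, eps=1e-8):
    """Finite-difference criterion: relative change of the gradient
    after a single gradient step (no Hessian is ever formed)."""
    g = grad(W)
    g_next = grad(W - lr * g)
    return np.linalg.norm(g_next - g) / (np.linalg.norm(g) + eps)

def meta_init_scale(W0, meta_lr=0.05, steps=200, h=1e-3):
    """Meta-learn a scalar scale s for the initialization W = s * W0,
    using a central finite-difference estimate of the criterion's
    derivative in s and a sign-based update (assumed, for illustration)."""
    s = 1.0
    for _ in range(steps):
        d_plus = gradient_deviation((s + h) * W0)
        d_minus = gradient_deviation((s - h) * W0)
        s -= meta_lr * np.sign(d_plus - d_minus)
        s = max(s, 1e-3)  # keep the scale positive
    return s

W0 = rng.normal(size=(10, 1))
s = meta_init_scale(W0)
print("learned scale:", s, "criterion:", gradient_deviation(s * W0))
```

The finite-difference form above (compare the gradient before and after a step) is the counterpart of the Taylor-expansion view raised in the Update: to first order, the change in the gradient after a step of size lr is -lr * H g, so the criterion probes Hessian-gradient products without materializing H.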