Reviews: MetaInit: Initializing learning by learning to initialize
Neural Information Processing Systems
Update: The authors have addressed my questions. I hope the camera-ready includes a clear discussion of the Taylor expansion vs. finite-difference approximation (at least in the appendix). I also second the other reviewer on the importance of a comparison in the batch-norm case, since the method is meant to serve as a general-purpose initialization scheme.

Longer Summary:
- The authors introduce the GradientDeviation criterion, which characterizes how much the gradient changes after a gradient step. It is simple to compute and avoids forming the full Hessian.
- They use meta-learning to learn the scale of the initialization such that GradientDeviation is minimized.
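To make the two points in the summary concrete, here is a minimal sketch (not the authors' code) of (1) a finite-difference criterion measuring how much the gradient changes after one gradient step, with no explicit Hessian, and (2) meta-learning an initialization scale to minimize that criterion. The toy model, the function names (`toy_loss`, `gradient_deviation`, `meta_init_scale`), and the sign-based scale update are all illustrative assumptions, not the paper's exact procedure.

```python
# Hypothetical sketch of the review's two points on a toy linear least-squares problem.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 10))   # toy inputs
y = rng.normal(size=(64, 1))    # toy targets

def toy_loss(W):
    """Squared error of a one-layer linear model (for reference)."""
    return 0.5 * np.mean((X @ W - y) ** 2)

def grad(W):
    """Analytic gradient of the toy loss."""
    return X.T @ (X @ W - y) / X.shape[0]

def gradient_deviation(W, lr=0.1, eps=1e-8):
    """Finite-difference criterion: relative change of the gradient
    after a single gradient step (no Hessian is ever formed)."""
    g = grad(W)
    g_next = grad(W - lr * g)
    return np.linalg.norm(g_next - g) / (np.linalg.norm(g) + eps)

def meta_init_scale(W0, meta_lr=0.05, steps=200, h=1e-3):
    """Meta-learn a scalar scale s for the initialization W = s * W0,
    using a central finite-difference estimate of the criterion's
    derivative in s and a sign-based update (assumed, for illustration)."""
    s = 1.0
    for _ in range(steps):
        d_plus = gradient_deviation((s + h) * W0)
        d_minus = gradient_deviation((s - h) * W0)
        s -= meta_lr * np.sign(d_plus - d_minus)
        s = max(s, 1e-3)  # keep the scale positive
    return s

W0 = rng.normal(size=(10, 1))
s = meta_init_scale(W0)
print("learned scale:", s, "criterion:", gradient_deviation(s * W0))
```

The finite-difference form above (compare the gradient before and after a step) is the counterpart of the Taylor-expansion view raised in the Update: to first order, the change in the gradient after a step of size lr is -lr * H g, so the criterion probes Hessian-gradient products without materializing H.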