Review for NeurIPS paper: Practical Quasi-Newton Methods for Training Deep Neural Networks
–Neural Information Processing Systems
Weaknesses: QN methods applied to deep neural networks have some general weaknesses that also limit the applicability of the proposed algorithm. First, compared to first-order methods, there are additional hyperparameters to tune beyond the learning rate: the damping terms (two for this algorithm), the decay parameter of the moving average used to stabilize the BFGS updates, and the memory parameter p for L-BFGS. The authors merged the two damping terms into a single hyperparameter by assuming a relation between them and performed a sensitivity analysis; however, systematically tuning all of these hyperparameters for a new application remains challenging. Second, the generalization performance of deep networks trained with second-order methods might lag behind that of first-order methods such as small-batch SGD and Adam, especially without careful hyperparameter tuning. The authors chose not to include generalization results in the experiments, arguing that the focus of the paper is on comparing optimization techniques.
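To make the hyperparameter concern concrete, the following is a minimal sketch of where a damping threshold and a moving-average decay typically enter a BFGS-style update. It uses a generic Powell-style damping rule written for the inverse-Hessian approximation and is not claimed to be the authors' exact scheme. The names `damped_bfgs_update`, `ema`, `mu`, and `decay` are hypothetical and chosen only for illustration.

```python
import numpy as np


def ema(old, new, decay=0.9):
    """Exponential moving average; `decay` is a hypothetical name for the decay hyperparameter."""
    return decay * old + (1.0 - decay) * new


def damped_bfgs_update(H, s, y, mu=0.2):
    """One BFGS update of an inverse-Hessian approximation H with Powell-style damping.

    Assumption: this is a generic textbook-style rule, not the reviewed paper's method.
    H must be symmetric positive definite and y nonzero.
    `mu` in (0, 1) is the damping threshold, one of the extra hyperparameters to tune.
    """
    Hy = H @ y
    yHy = float(y @ Hy)
    sy = float(s @ y)
    # If the measured curvature s^T y is too small (or negative), blend s with H y
    # so that the modified pair satisfies s_tilde^T y = mu * y^T H y > 0.
    if sy < mu * yHy:
        theta = (1.0 - mu) * yHy / (yHy - sy)
    else:
        theta = 1.0
    s_tilde = theta * s + (1.0 - theta) * Hy
    rho = 1.0 / float(s_tilde @ y)
    V = np.eye(len(s)) - rho * np.outer(s_tilde, y)
    # Standard BFGS inverse update applied to the damped pair (s_tilde, y).
    H_new = V @ H @ V.T + rho * np.outer(s_tilde, s_tilde)
    return H_new, s_tilde


# Usage: a curvature pair with s^T y < 0, which an undamped update could not use safely.
H0 = np.eye(2)
s = np.array([1.0, 0.0])
y = np.array([-1.0, 0.5])
H1, s_tilde = damped_bfgs_update(H0, s, y, mu=0.2)
```

Even in this small sketch, `mu` and `decay` are two knobs on top of the learning rate, and a limited-memory variant would add the memory size p as a third, which illustrates why tuning cost grows relative to first-order methods.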
Jan-22-2025, 04:14:07 GMT