Reviews: Online Normalization for Training Neural Networks
–Neural Information Processing Systems
The paper is well motivated and quite clear. I like the distinction between statistical, functional and heuristic methods of normalization. Also, investigating normalization techniques that do not rely on mini-batch statistics is an important research direction. I do, however, have a few remarks concerning ON: 1) How does it compare to Batch Renormalization (BRN)? Both methods rely on running averages of statistics, so I think it would be fair to state clearly what the differences between the two methods are and to compare thoroughly against BRN in the experimental setup, especially because BRN introduces 1 extra hyper-parameter, while one needs to tune 2 of them in ON. 2) How difficult is it to tune the two decay-rate hyper-parameters?
Jan-27-2025, 01:49:00 GMT