Review for NeurIPS paper: Stochastic Normalization

Neural Information Processing Systems 

Though the reviewers remark that the paper offers no theoretical insights or analysis, it was on average well received as an empirical architecture design idea that addresses an important problem. The experimental validation is conducted to the standards of the field and shows that the method is empirically useful. The combination of BSS and StochNorm is particularly promising. The authors are invited to submit the final version, considering the following improvements:

- The paper can be densified to avoid self-repetition and redundancy: the definitions of the normalizations, the descriptions of the contribution (some appearing roughly three times), the overview of existing methods, and the algorithm together with its description and Fig. 1.
- The space gained, along with the 9th page, could be used to clarify important details of the experimental setup that are needed to understand the basis of comparison: how the hyperparameters are chosen per method, and whether the 5 trials include random train-validation splits. The additional results from the rebuttal should be included, and the discussion of the points below relating to the literature should be expanded.

As pointed out by the reviewers, using moving averages differs in principle from using batch statistics: the moving average is treated as a constant during back-propagation, so no gradients flow through the normalization statistics.
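To make this distinction concrete, here is a minimal sketch (in PyTorch, with illustrative names such as `batch_stat_norm`; this is not the authors' implementation) contrasting normalization by batch statistics, through which gradients flow, with normalization by detached moving-average statistics, which autograd treats as constants.

```python
import torch

def batch_stat_norm(x, eps=1e-5):
    # Batch statistics: mean and var are functions of x, so
    # gradients flow through them during back-propagation.
    mean = x.mean(dim=0)
    var = x.var(dim=0, unbiased=False)
    return (x - mean) / torch.sqrt(var + eps)

def moving_avg_norm(x, running_mean, running_var, eps=1e-5):
    # Moving averages: detached, so autograd treats them as constants;
    # the per-element gradient is simply 1 / sqrt(running_var + eps).
    return (x - running_mean.detach()) / torch.sqrt(running_var.detach() + eps)

x = torch.randn(8, 4, requires_grad=True)

# With batch statistics, the output is invariant to a uniform shift of the
# batch (centering cancels it), so the gradient sums to ~0 per feature.
batch_stat_norm(x).sum().backward()
print(x.grad.sum(dim=0))  # ~0 per feature

x.grad = None
running_mean, running_var = torch.zeros(4), torch.ones(4)

# With moving-average statistics, no such cancellation occurs: each input
# contributes 1 / sqrt(running_var + eps) independently.
moving_avg_norm(x, running_mean, running_var).sum().backward()
print(x.grad.sum(dim=0))  # ~batch_size / sqrt(running_var + eps)
```

The differing gradient sums in this toy example illustrate why the two choices of statistics are not interchangeable: the batch-statistic path couples the examples in a batch through the normalization, whereas the moving-average path does not.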