Review for NeurIPS paper: Stochastic Normalization

Neural Information Processing Systems 

Summary and Contributions: This paper introduces a novel method to prevent overfitting when fine-tuning a pre-trained network on a new task with a small training set. It proposes a hybrid batch normalization layer, called stochastic normalization, that randomly switches the normalization statistics between those computed from the current mini-batch and the moving-average statistics. The authors replace the standard batch normalization layers of several network architectures, such as VGG-16, Inception-V3, and ResNet-50, with the proposed stochastic normalization and show empirically that fine-tuning the modified architectures outperforms multiple existing methods for mitigating overfitting during fine-tuning. Overall, the paper studies a very important problem, and the proposed method seems to work in practice. The major problem I have with this paper is the lack of consistency in the experimental setup.
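
To make the switching behavior concrete, here is a minimal NumPy sketch of how I read the layer. It is based only on the description above and is not the authors' implementation. The function name `stochastic_norm`, the selection probability `p`, the per-channel (rather than per-layer) selection, the `momentum` update rule, and the omission of the learnable scale and shift are all my assumptions for illustration.

```python
import numpy as np

def stochastic_norm(x, running_mean, running_var, p=0.5,
                    momentum=0.1, eps=1e-5, rng=None):
    """Training-time forward pass of a hypothetical stochastic normalization.

    x has shape (batch, channels). Returns the normalized activations and
    the updated moving-average statistics.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Statistics of the current mini-batch, one value per channel.
    batch_mean = x.mean(axis=0)
    batch_var = x.var(axis=0)

    # Assumption: each channel independently uses the moving-average
    # statistics with probability p, and the mini-batch statistics otherwise.
    use_moving = rng.random(x.shape[1]) < p
    mean = np.where(use_moving, running_mean, batch_mean)
    var = np.where(use_moving, running_var, batch_var)

    y = (x - mean) / np.sqrt(var + eps)

    # Assumption: moving averages are updated as in standard batch norm.
    new_mean = (1 - momentum) * running_mean + momentum * batch_mean
    new_var = (1 - momentum) * running_var + momentum * batch_var
    return y, new_mean, new_var
```

Under these assumptions, `p=0` reduces to ordinary batch normalization in training mode, and `p=1` normalizes with the moving-average statistics only, as batch normalization does at inference time.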