[D] Training with Batch Normalization • r/MachineLearning
Hi, despite all the alchemy attributed to Batch Norm and covariate shift, I understand it simply as a normalization layer that tries to keep all activations within some prior distribution. This is especially helpful at the beginning of training, where a badly chosen initialization may lead to a vanishing or exploding signal in the network. Once the network is trained, batch norm at test time uses moving averages of the estimated mean and variance of the training population, i.e. it applies a simple linear (affine) transformation. Did anyone try to compare these estimated values of mean and variance with those computed from the whole training set? I wonder how this would affect test accuracy.
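To make the question concrete, here is a rough NumPy sketch of the comparison I have in mind. Everything in it is an assumption for illustration: the activations are synthetic Gaussian data standing in for one layer's pre-normalization outputs, `momentum` is the weight on the old running value (some frameworks define it the other way around), the batch variance is the biased estimate, and the helper names `running_stats`, `population_stats` and `bn_inference` are made up. It is not how any particular framework implements batch norm.

```python
import numpy as np


def running_stats(x, batch_size=64, momentum=0.9):
    """Exponential moving average of per-batch mean and (biased) variance.

    Assumption: `momentum` weights the OLD running value, and the running
    statistics start at mean 0 and variance 1.
    """
    run_mean = np.zeros(x.shape[1])
    run_var = np.ones(x.shape[1])
    for start in range(0, len(x), batch_size):
        batch = x[start:start + batch_size]
        run_mean = momentum * run_mean + (1 - momentum) * batch.mean(axis=0)
        run_var = momentum * run_var + (1 - momentum) * batch.var(axis=0)
    return run_mean, run_var


def population_stats(x):
    """Mean and variance computed once over the whole training set."""
    return x.mean(axis=0), x.var(axis=0)


def bn_inference(x, mean, var, gamma=1.0, beta=0.0, eps=1e-5):
    """Test-time batch norm: a fixed affine transform per feature."""
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


# Synthetic stand-in for a layer's activations: 6400 samples, 4 features.
rng = np.random.default_rng(0)
acts = rng.normal(loc=2.0, scale=3.0, size=(6400, 4))

ema_mean, ema_var = running_stats(acts)
pop_mean, pop_var = population_stats(acts)

# Gap between the two sets of statistics, per feature.
mean_gap = np.abs(ema_mean - pop_mean)
var_gap = np.abs(ema_var - pop_var)

# Largest difference in the normalized output caused by that gap.
out_gap = np.abs(
    bn_inference(acts, ema_mean, ema_var) - bn_inference(acts, pop_mean, pop_var)
).max()

print("mean gap per feature:", mean_gap)
print("variance gap per feature:", var_gap)
print("max output difference:", out_gap)
```

In a real network you would collect the activations by running the trained model over the training set with the weights frozen, then evaluate test accuracy once with the moving averages and once with the whole-set statistics swapped in. The moving average weights the last few batches most heavily, so the gap should depend on the momentum and on how much the weights were still changing at the end of training.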
Dec-10-2017, 09:10:13 GMT