Batch Normalization (BN or BatchNorm) is a technique used to normalize the layer inputs by re-centering and re-scaling. This is done by evaluating the mean and the standard deviation of each input channel (across the whole batch), then normalizing these inputs (check this video) and, finally, both a scaling and a shifting take place through two learnable parameters β and γ. Batch Normalization is quite effective but the real reasons behind this effectiveness remain unclear. Initially, as it was proposed by Sergey Ioffe and Christian Szegedy in their 2015 article, the purpose of BN was to mitigate the internal covariate shift (ICS), defined as "the change in the distribution of network activations due to the change in network parameters during training". In fact, a reason to scale inputs is to get stable training; unfortunately this may be true in the beginning but as the network trains and the weights move away from their initial values there is no guarantee of stability.
May-22-2022, 12:50:21 GMT