[Deep Learning] Batch Normalization


Neural networks were difficult to train. Currently, we have a lot of tricks to achieve faster training and to solve the troubles that arise during model training. On the following, let's explore one of those tricks: batch normalization. When training a model, if the scale of two features (x1 1,2…, x2 100, 200 …) are quite different, the weight adjust in the small-scale features would be small. On the graph above, we observe that w1 just is adjusted slowly compared to w2 during the training.

PhaseDNN - A Parallel Phase Shift Deep Neural Network for Adaptive Wideband Learning

arXiv.org Machine Learning

In this paper, we propose a phase shift deep neural network (PhaseDNN) which provides a wideband convergence in approximating a high dimensional function during its training of the network. The PhaseDNN utilizes the fact that many DNN achieves convergence in the low frequency range first, thus, a series of moderately-sized of DNNs are constructed and trained in parallel for ranges of higher frequencies. With the help of phase shifts in the frequency domain, implemented through a simple phase factor multiplication on the training data, each DNN in the series will be trained to approximate the target function's higher frequency content over a specific range. Due to the phase shift, each DNN achieves the speed of convergence as in the low frequency range. As a result, the proposed PhaseDNN system is able to convert wideband frequency learning to low frequency learning, thus allowing a uniform learning to wideband high dimensional functions with frequency adaptive training. Numerical results have demonstrated the capability of PhaseDNN in learning information of a target function from low to high frequency uniformly.

How Does Batch Normalization Help Optimization?

Neural Information Processing Systems

Batch Normalization (BatchNorm) is a widely adopted technique that enables faster and more stable training of deep neural networks (DNNs). Despite its pervasiveness, the exact reasons for BatchNorm's effectiveness are still poorly understood. The popular belief is that this effectiveness stems from controlling the change of the layers' input distributions during training to reduce the so-called "internal covariate shift". In this work, we demonstrate that such distributional stability of layer inputs has little to do with the success of BatchNorm. Instead, we uncover a more fundamental impact of BatchNorm on the training process: it makes the optimization landscape significantly smoother. This smoothness induces a more predictive and stable behavior of the gradients, allowing for faster training.

US Steps up Winter-Warfare Training as Global Threat Shifts

U.S. News

After 17 years of war against Taliban and al-Qaida-linked insurgents, the military is shifting its focus to better prepare for great-power competition with Russia and China, and against unpredictable foes such as North Korea and Iran. U.S. forces must be able to survive and fight while countering drones, sophisticated jamming equipment and other electronic and cyber warfare that can track them, disrupt communications and kill them -- technology they didn't routinely face over the last decade.

Gradient Descent - Batch Normalization in Neural Networks


Batch Normalization basically means that we normalize each activation individually. Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift. Their paper is a fascinting deep dive into the math of how layers are affected by the input, and how this covariate shift can be reduced by applying batch normalizations.