Convolutional Normalization

Esposito, Massimiliano, Ganaba, Nader

arXiv.org Machine Learning 

During the past few years, there has been considerable success of applying deep learning to complex problems ranging from speech synthesis (cite WaveNet) to deep reinforcement learning achieving victories in complex games such as Go [19]. This, in turn, has fueled massive interest in the field. At the core of deep learning are training algorithms which allow neural networks to learn from the data they are presented or learn how to extremize a target quantity. One such algorithm is the stochastic gradient descent (SGD) (cite) and it is one of most widely used training algorithms. However, this does not come without issues as many training algorithms suffer from being slow at converging to the optimal state and lack stability when approaching the optimum (citation needed).One common approach is to transform the data into a manageable form through centring and scaling the data and it is the base of many normalization techniques. In fact, the success of "Batch Normalization" by Sgzedy and Ioffe [8] sparked an interest in such techniques. In [8], they noticed that the learning procedure, which uses stochastic gradient descent (SGD), or derived adaptive algorithms like Adam [12], can be hindered and they claimed that a phenomenon known as internal covariance shift is the cause. This phenomenon depends on the change of the network's weights during backpropagation that can lead to the shift of data-distribution towards the saturation regime of the activation

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found