Convolutional Normalization

Feb-18-2021–arXiv.org Machine Learning

During the past few years, deep learning has been applied with considerable success to complex problems ranging from speech synthesis (WaveNet) to deep reinforcement learning, which has achieved victories in complex games such as Go [19]. This, in turn, has fueled massive interest in the field. At the core of deep learning are training algorithms that allow neural networks to learn from the data they are presented with, or to learn how to extremize a target quantity. One such algorithm is stochastic gradient descent (SGD), one of the most widely used training algorithms. However, it is not without issues: many training algorithms are slow to converge to the optimal state and become unstable as they approach the optimum.

One common approach is to transform the data into a manageable form by centring and scaling it, and this idea is the basis of many normalization techniques. In fact, the success of "Batch Normalization" by Ioffe and Szegedy [8] sparked interest in such techniques. In [8], the authors observed that the learning procedure, which uses SGD or derived adaptive algorithms such as Adam [12], can be hindered, and they claimed that a phenomenon known as internal covariate shift is the cause. This phenomenon arises from changes in the network's weights during backpropagation, which can shift the distribution of the data towards the saturation regime of the activation
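To make "centring and scaling" concrete, here is a minimal sketch of standard Batch Normalization as described in [8], not of the Convolutional Normalization method this work proposes. It assumes a fully connected layer input of shape (batch_size, num_features), covers training mode only (no running statistics for inference), and uses the hypothetical helper name `batch_norm`; `gamma`, `beta` and `eps` follow the usual naming convention for the learnable scale, learnable shift and numerical-stability constant.

```python
import math
from typing import List, Sequence


def batch_norm(
    batch: Sequence[Sequence[float]],
    gamma: Sequence[float],
    beta: Sequence[float],
    eps: float = 1e-5,
) -> List[List[float]]:
    """Training-mode batch normalization (sketch, hypothetical helper).

    Assumes `batch` has shape (batch_size, num_features). Each feature column
    is centred by its mini-batch mean, scaled by its mini-batch standard
    deviation, and then passed through the affine map gamma * x_hat + beta.
    """
    n = len(batch)
    num_features = len(batch[0])
    out = [[0.0] * num_features for _ in range(n)]
    for j in range(num_features):
        column = [row[j] for row in batch]
        # Centring: subtract the mini-batch mean of this feature.
        mean = sum(column) / n
        # Biased (divide-by-n) variance over the mini-batch, as in training mode.
        var = sum((x - mean) ** 2 for x in column) / n
        # Scaling: divide by the standard deviation; eps guards against var == 0.
        inv_std = 1.0 / math.sqrt(var + eps)
        for i, x in enumerate(column):
            x_hat = (x - mean) * inv_std
            # Learnable scale and shift restore the layer's representational power.
            out[i][j] = gamma[j] * x_hat + beta[j]
    return out


x = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
y = batch_norm(x, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

With `gamma = 1` and `beta = 0`, each column of `y` has mean approximately 0 and variance approximately 1 (slightly below 1 because of `eps`), regardless of the very different scales of the two input features.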

batch normalization, neural network, standard deviation

Country:
- North America > United States
  - Hawaii > Honolulu County
    - Honolulu (0.04)
  - California > San Diego County
    - San Diego (0.04)
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning > Gradient Descent (0.94)
  - Neural Networks > Deep Learning (0.69)