On Optimization of Deep Neural Networks
The aforementioned tools provide the elements needed to obtain proper gradients for the network parameter updates. Ultimately, we needed to devise an effective strategy for using these gradients. This time, the inspiration came from physics, in the form of momentum. One of the most commonly used optimizers is stochastic gradient descent (SGD). Unfortunately, SGD is inherently limited, as it relies on first-order (gradient) information only and takes no account of curvature.
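To make the momentum idea concrete, here is a minimal sketch of a gradient descent update with classical (heavy-ball) momentum. It is an illustration under stated assumptions, not the article's own implementation: the function names `sgd_momentum` and `toy_grad`, the hyperparameter values, and the toy quadratic objective are all hypothetical, and the gradient is computed exactly rather than from a random minibatch as it would be in true SGD.

```python
def sgd_momentum(grad_fn, w0, lr=0.05, beta=0.9, steps=300):
    """Gradient descent with heavy-ball momentum (illustrative sketch).

    grad_fn: callable returning the gradient as a list, given parameters w.
    In real SGD this would be a noisy minibatch gradient; here it is exact.
    """
    w = list(w0)
    v = [0.0] * len(w)  # velocity: decaying sum of past gradients
    for _ in range(steps):
        g = grad_fn(w)
        # The velocity keeps a fraction beta of its previous value
        # and is pushed by the current gradient.
        v = [beta * vi + gi for vi, gi in zip(v, g)]
        # Parameters move against the accumulated velocity.
        w = [wi - lr * vi for wi, vi in zip(w, v)]
    return w


def toy_grad(w):
    # Gradient of the assumed toy objective f(w) = 0.5 * (w[0]**2 + 10 * w[1]**2).
    return [w[0], 10.0 * w[1]]


w_final = sgd_momentum(toy_grad, [1.0, 1.0])
print(w_final)  # both coordinates end up close to the minimum at the origin
```

The physical analogy is that `v` behaves like the velocity of a ball rolling on the loss surface: gradients act as forces, and `beta` plays the role of friction. Setting `beta=0` recovers plain gradient descent, which still uses only first-order information; momentum does not change that, it only smooths and accelerates the updates.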
Jun-14-2020, 15:06:21 GMT