It's Only Natural: An Excessively Deep Dive Into Natural Gradient Optimization


I'm going to tell a story: one you've almost certainly heard before, but with a different emphasis than you're used to. To a first (order) approximation, all modern deep learning models are trained using gradient descent. At each step of gradient descent, your parameter values begin at some starting point, and you move them in the direction of greatest loss reduction. You do this by taking the derivative of your loss with respect to your whole vector of parameters, otherwise called the gradient (for a scalar loss, this is the Jacobian written as a vector), and stepping in the opposite direction. However, this is just the first derivative of your loss, and it doesn't tell you anything about curvature, that is, how quickly your first derivative is changing.
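To make the update rule concrete, here is a minimal sketch of plain gradient descent in pure Python. The quadratic loss, its coefficients `a` and `b`, the learning rate, and the function names are all illustrative assumptions rather than anything from a particular library or model. The two coefficients give the loss very different curvature along each axis, which the first derivative alone does not account for.

```python
# Hypothetical toy loss: L(w) = 0.5 * (a * w0^2 + b * w1^2).
# a and b control the curvature along each axis (assumed values).

def loss(w, a=1.0, b=10.0):
    return 0.5 * (a * w[0] ** 2 + b * w[1] ** 2)

def grad(w, a=1.0, b=10.0):
    # First derivative of the loss with respect to each parameter.
    return [a * w[0], b * w[1]]

def gradient_descent(w, lr=0.05, steps=100):
    # Repeatedly step against the gradient with a fixed learning rate.
    for _ in range(steps):
        g = grad(w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

w_start = [1.0, 1.0]
w_final = gradient_descent(w_start)
print(loss(w_start), loss(w_final))
```

With the same learning rate on both axes, the steep direction (`w[1]`) collapses toward zero almost immediately while the shallow direction (`w[0]`) shrinks slowly. That mismatch is the kind of thing curvature information would let you correct for.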
