Neural nets - learning with total gradient rather than stochastic gradients? • /r/MachineLearning


The gradient estimated from a mini-batch is usually good enough to point you in the right descent direction, so it rarely makes sense to pay the extra computation for a marginally better estimate. Moreover, the noise introduced by the mini-batch approximation can act as a regularizer. There is an interesting paper that performs statistical tests during optimization: if the gradient estimate is not statistically significant, more samples are added to the mini-batch.
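To make that idea concrete, here is a minimal NumPy sketch of adaptive batch sizing on a toy least-squares problem. The significance test and all helper names (`per_sample_grads`, `gradient_is_significant`) are simplified illustrations invented for this sketch, not the cited paper's actual procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem (hypothetical, for illustration only):
# least-squares loss f(w) = mean_i (x_i . w - y_i)^2 / 2
X = rng.normal(size=(10_000, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.5 * rng.normal(size=10_000)

def per_sample_grads(w, idx):
    """Per-sample gradients of the squared loss for the samples in idx."""
    residuals = X[idx] @ w - y[idx]     # shape (B,)
    return residuals[:, None] * X[idx]  # shape (B, d)

def gradient_is_significant(grads, z=1.96):
    """Crude per-coordinate check: does the mean gradient stand out from
    its standard error? (an illustrative stand-in for the paper's test)"""
    B = grads.shape[0]
    mean = grads.mean(axis=0)
    stderr = grads.std(axis=0, ddof=1) / np.sqrt(B)
    # Require the signal to dominate the noise in most coordinates.
    return np.mean(np.abs(mean) > z * stderr) > 0.5

w = np.zeros(5)
batch_size, lr = 32, 0.05
for step in range(200):
    idx = rng.choice(len(X), size=batch_size, replace=False)
    grads = per_sample_grads(w, idx)
    # If the mini-batch estimate is too noisy, enlarge the batch instead
    # of taking a step in an essentially random direction.
    if not gradient_is_significant(grads) and batch_size < len(X):
        batch_size = min(2 * batch_size, len(X))
        continue
    w -= lr * grads.mean(axis=0)

print("batch size at exit:", batch_size)
print("parameter error:", np.linalg.norm(w - w_true))
```

Early on, when the true gradient is large, a small batch passes the test and steps stay cheap; near the optimum the estimate drowns in noise and the batch grows, which is exactly the adaptive trade-off the comment describes.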
