[R] "Unbiasing Truncated Backpropagation Through Time", Tallec & Ollivier 2017 • r/MachineLearning

@machinelearnbot 

The big point here is that we improve the optimization procedure by adding carefully chosen noise to the gradient. By sampling different truncation lengths, the gradient estimate we obtain becomes stochastic, and it is no surprise that added noise slows down training. However, as mentioned, the noise we introduce is not arbitrary: it buys unbiasedness. Notably, this means ARTBP can recognize some minima that Truncated Backprop, being biased, does not even see as minima.
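The core mechanism can be illustrated without any RNN at all: randomly truncate a sum of per-step contributions, then divide each kept term by the probability of having reached that step, so the estimator's expectation equals the full sum. This is a minimal sketch of that reweighting idea, not the paper's exact compensation scheme; the function name and geometric stopping rule are our own assumptions.

```python
import random

def unbiased_truncated_sum(contribs, p_continue, rng):
    """One ARTBP-style sample (illustrative sketch, not the paper's scheme).

    At each step we keep going with probability p_continue, so
    P(reaching step t) = p_continue**t.  Dividing each kept term by
    that survival probability makes the estimator unbiased:
    E[estimate] = sum(contribs), even though most samples are truncated.
    """
    total = 0.0
    survival = 1.0  # probability of having reached the current step
    for g in contribs:
        total += g / survival       # reweight to compensate for truncation
        if rng.random() >= p_continue:
            break                   # random truncation point
        survival *= p_continue
    return total

# Averaging many samples recovers the full (untruncated) sum in expectation,
# which is exactly the property a fixed truncation length destroys.
rng = random.Random(0)
contribs = [1.0, 0.5, 0.25, 0.125]
n = 200_000
estimate = sum(unbiased_truncated_sum(contribs, 0.7, rng) for _ in range(n)) / n
```

Averaging `estimate` over many draws converges to `sum(contribs)`, whereas always cutting at a fixed length would systematically drop the tail terms — the bias the paper removes.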
