[R] "Unbiasing Truncated Backpropagation Through Time", Tallec & Ollivier 2017 • r/MachineLearning
The big point here is that we improve the optimization approach by adding clever noise to the gradient. By sampling different truncation lengths, the gradient estimate we obtain becomes stochastic. It is no surprise that adding noise slows down training. However, as mentioned, the noise we introduce is not just any noise: it provides unbiasedness. Notably, this means that ARTBP recognizes some minima that Truncated Backprop, being biased, does not see as minima.
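To see why sampled truncation lengths can stay unbiased, here is a toy sketch (not the paper's exact compensation scheme): truncate at a random step and reweight each surviving per-step gradient contribution by the inverse of its inclusion probability. The per-step contributions `g`, the geometric continuation probability `p_continue`, and the sequence length `T` are all made up for the example.

```python
import random

# Hypothetical per-step gradient contributions of a length-T sequence.
T = 8
g = [0.5 * (t + 1) for t in range(T)]
true_grad = sum(g)  # what full (untruncated) backprop would compute

# P(step t survives truncation) under geometric truncation with p_continue.
p_continue = 0.7
survive = [p_continue ** t for t in range(T)]

def reweighted_truncated_estimate(rng):
    """One stochastic gradient estimate: truncate at a random step and
    divide each included term by its inclusion probability, so the
    expectation over truncation points equals the full gradient."""
    est = 0.0
    for t in range(T):
        if t > 0 and rng.random() >= p_continue:
            break  # truncation happens here
        est += g[t] / survive[t]  # compensation factor keeps it unbiased
    return est

rng = random.Random(0)
n = 200_000
avg = sum(reweighted_truncated_estimate(rng) for _ in range(n)) / n
print(avg, true_grad)  # the average estimate approaches the full gradient
```

Each individual estimate is noisy (hence the slower training the comment mentions), but averaged over many draws it matches the full-backprop gradient, whereas plain truncation would systematically drop the long-range terms.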
May-27-2017, 14:21:56 GMT