How to Code and Understand DeepMind's Neural Stack Machine - i am trask
For more on derivatives and differentiability, see the rest of that tutorial.) Why do we care that the stack (as a function) is differentiable? Well, we used the "derivative" of the function to move the error around (more specifically... to backpropagate). For more on this, please see the Tutorial I Wrote on Basic Neural Networks, Gradient Descent, and Recurrent Neural Networks. I particularly recommend the last one because it demontrates backpropgating through somewhat more arbitrary vector operations... kindof like what we're going to do here.
Dec-28-2017, 16:06:01 GMT