How to Code and Understand DeepMind's Neural Stack Machine - i am trask
For more on derivatives and differentiability, see the rest of that tutorial.) Why do we care that the stack (as a function) is differentiable? Well, we used the "derivative" of the function to move the error around (more specifically... to backpropagate). For more on this, please see the Tutorial I Wrote on Basic Neural Networks, Gradient Descent, and Recurrent Neural Networks. I particularly recommend the last one because it demontrates backpropgating through somewhat more arbitrary vector operations... kindof like what we're going to do here.
Apr-8-2016, 14:55:34 GMT