[Research][1610.06258] Using Fast Weights to Attend to the Recent Past • /r/MachineLearning
This is my preliminary understanding of the paper after reading it a few times. Please correct me if I'm wrong. At each timestep of the RNN, they run an extra S inner iterations on the hidden state, using an augmented weight matrix. The augmentation (the "fast weights") is a sum of helper terms built from the hidden states of previous timesteps, each one an outer product h h^T. Each outer product is scaled by a coefficient that decays exponentially with its age, so the contributions of older states vanish toward zero. Using the outer products of previous states to augment the weight matrix during inference is sort of like making the weight matrix "learn" during inference in a Hebbian way, like in a Hopfield network.
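To make the description above concrete, here is a rough NumPy sketch of one timestep as I read it. This is not the authors' code. The function name `fast_weights_step`, the argument names, the default values for `lam` (decay), `eta` (fast learning rate) and `S`, and the choice of tanh as the nonlinearity are all my own assumptions, and I left out the layer normalization the paper applies.

```python
import numpy as np

def fast_weights_step(h_prev, x, W, C, A_prev, lam=0.95, eta=0.5, S=1):
    """One outer timestep of a fast-weights RNN (illustrative sketch only)."""
    # Decay the old fast weights, then add the outer product of the latest
    # hidden state. Unrolled over time, a state that is k steps old ends up
    # weighted by eta * lam**k, which is the exponential vanishing.
    A = lam * A_prev + eta * np.outer(h_prev, h_prev)

    # Slow-weight contribution; it stays fixed during the inner loop.
    boundary = W @ h_prev + C @ x

    # Preliminary hidden state, then S inner refinement steps that also
    # use the fast weights A (tanh is an assumption, see the note above).
    h = np.tanh(boundary)
    for _ in range(S):
        h = np.tanh(boundary + A @ h)
    return h, A

# Hypothetical usage with small random slow weights.
rng = np.random.default_rng(0)
n_hidden, n_input = 4, 3
W = 0.1 * rng.normal(size=(n_hidden, n_hidden))
C = 0.1 * rng.normal(size=(n_hidden, n_input))
h = np.zeros(n_hidden)
A = np.zeros((n_hidden, n_hidden))
for _ in range(5):
    h, A = fast_weights_step(h, rng.normal(size=n_input), W, C, A, S=2)
```

Note that A is never trained by backprop here. It is rebuilt from the hidden states on every forward pass, which is why it behaves like a short-term Hebbian memory rather than an ordinary weight matrix.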
Oct-22-2016, 02:35:20 GMT