vsml rnn
Figure 10: We optimize VSML RNNs to implement neural forward computation such that, for different inputs and weights, a tanh-activated multiplicative interaction is produced (left), with different lines for different w. Next, we use a deep network and provide intermediate errors from a ground truth network. Finally, we remove intermediate errors and use the RNN's intermediate predictions that are now close to the ground truth. All 6 meta test tasks are unseen. The bottom plot shows the same dataset processed by SGD with Adam, which learns significantly slower by following the gradient.
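As a rough illustration of the target behaviour described in the caption, the sketch below evaluates a tanh-activated product of an input and a weight, giving one curve per weight value. It assumes the "multiplicative interaction" means tanh(x · w). The function names, input grid, and weight values are hypothetical and not taken from the paper.

```python
import math


def target_forward(x: float, w: float) -> float:
    """Assumed ground-truth target: tanh applied to the product of input and weight."""
    return math.tanh(x * w)


def target_curves(weights, inputs):
    """Return one curve (outputs over the input grid) per weight value."""
    return {w: [target_forward(x, w) for x in inputs] for w in weights}


# Hypothetical grid and weights, chosen only for illustration.
inputs = [i / 10 for i in range(-20, 21)]
weights = [-1.0, 0.5, 2.0]
curves = target_curves(weights, inputs)
```

Each entry of `curves` corresponds to one line in a plot of output against input. A learned forward computation could be compared against these values point by point.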
A Derivations
To achieve learning in deeper networks we have used a curriculum on random and MNIST data. Next, we use a deep network and provide intermediate errors by a ground truth network. Finally, we remove intermediate errors and use the RNN's intermediate predictions that are now close to the ground truth. Figure 12 provides the entire meta test training trajectories for a subset of all configurations. Furthermore, in Figure 13 we show the cumulative accuracy on the first 100 examples.
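One plausible reading of "cumulative accuracy on the first 100 examples" is the running fraction of correct predictions after each example. The sketch below computes that quantity under this assumption. The function name and the `limit` parameter are illustrative and not taken from the paper.

```python
def cumulative_accuracy(correct, limit=100):
    """Running fraction of correct predictions over the first `limit` examples.

    `correct` is a sequence of truthy/falsy flags, one per example, in the
    order the learner saw them (assumed interpretation of the metric).
    """
    running = []
    hits = 0
    for n, flag in enumerate(correct[:limit], start=1):
        hits += int(bool(flag))
        running.append(hits / n)
    return running
```

For example, `cumulative_accuracy([1, 0, 1, 1])` yields `[1.0, 0.5, 2/3, 0.75]`. The curve rises quickly when the learner adapts within the first few examples.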