PMNIST classification task
Appendix for Integrating Momentum into Recurrent Neural Networks
As in Section 3.1, we flatten each image and process it pixel by pixel as a sequence of length 784. The baseline LSTM models consist of a single LSTM cell with either 128 or 256 hidden units. Input-to-hidden weights use orthogonal initialization, while hidden-to-hidden weights are initialized to identity matrices. Gradient norms are clipped to 1 during training. The log-magnitude of these sequences is fed into the models as the input data.
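The setup above can be sketched in NumPy. This is a minimal, hedged illustration, not the authors' code. Several details are assumptions not stated in the text: a fixed random pixel permutation shared by all images (the usual permuted-MNIST convention), orthogonal initialization computed from a QR decomposition, one identity block per LSTM gate in the hidden-to-hidden matrix, and clipping of the global gradient norm (as PyTorch's `clip_grad_norm_` does). All helper names are hypothetical. Random arrays stand in for MNIST images, and only the forward pass is shown, without training or the momentum-augmented cells.

```python
import numpy as np

SEQ_LEN = 28 * 28  # 784 pixels per MNIST image


def to_pixel_sequence(images, perm=None):
    """Flatten (N, 28, 28) images into (N, 784, 1) pixel sequences.

    Assumption: for PMNIST, the same fixed permutation `perm` of pixel
    positions is applied to every image.
    """
    seqs = images.reshape(images.shape[0], SEQ_LEN)
    if perm is not None:
        seqs = seqs[:, perm]
    return seqs[:, :, None]  # one pixel per time step


def orthogonal(shape, rng):
    """Orthogonal init via QR of a Gaussian matrix (assumed scheme)."""
    rows, cols = shape
    a = rng.standard_normal((max(rows, cols), min(rows, cols)))
    q, r = np.linalg.qr(a)  # q has orthonormal columns
    q *= np.sign(np.diag(r))  # sign fix so q is uniformly distributed
    return q if rows >= cols else q.T


def init_lstm(input_size, hidden_size, rng):
    """Hypothetical LSTM parameters; gate order (i, f, g, o) is assumed."""
    h = hidden_size
    return {
        "W_x": orthogonal((4 * h, input_size), rng),  # input-to-hidden: orthogonal
        "W_h": np.tile(np.eye(h), (4, 1)),  # hidden-to-hidden: identity per gate
        "b": np.zeros(4 * h),
    }


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_forward(params, x_seq):
    """Run one LSTM cell over x_seq of shape (T, N, D); return final h (N, H)."""
    steps, n, _ = x_seq.shape
    hidden = params["W_h"].shape[1]
    h = np.zeros((n, hidden))
    c = np.zeros((n, hidden))
    for t in range(steps):
        z = x_seq[t] @ params["W_x"].T + h @ params["W_h"].T + params["b"]
        i, f, g, o = np.split(z, 4, axis=1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h


def clip_grad_norm(grads, max_norm=1.0):
    """Rescale gradients so their global L2 norm is at most max_norm."""
    total = np.sqrt(sum(float(np.sum(g * g)) for g in grads))
    if total > max_norm:
        scale = max_norm / (total + 1e-6)
        grads = [g * scale for g in grads]
    return grads, total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.random((2, 28, 28))  # synthetic stand-in for MNIST images
    perm = rng.permutation(SEQ_LEN)  # fixed pixel permutation for PMNIST
    x = to_pixel_sequence(images, perm).transpose(1, 0, 2)  # (784, N, 1)
    params = init_lstm(input_size=1, hidden_size=128, rng=rng)
    h_final = lstm_forward(params, x)
    print(h_final.shape)  # (2, 128)
    fake_grads = [rng.standard_normal(p.shape) for p in params.values()]
    clipped, norm_before = clip_grad_norm(fake_grads, max_norm=1.0)
```

The sequence is transposed to time-major `(784, N, 1)` so that each step feeds a single pixel to the cell. In a real training loop, `clip_grad_norm` would be applied to the backpropagated gradients before each optimizer update. Here random arrays only exercise the function.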