Neural Information Processing Systems 

We use a batch size of 64. The TensorFlow graph contains 9,430 operations. [...] LSTM cells possible given enough hardware resources. The corresponding TensorFlow graph contains 9,021 operations for a 2-layer model. [...] RNNLM, but its many hidden states make it far more computationally expensive than RNNLM.