Explain My Surprise: Learning Efficient Long-Term Memory by Predicting Uncertain Outcomes

Neural Information Processing Systems 

In many sequential tasks, a model needs to remember relevant events from the distant past to make correct predictions.