Review for NeurIPS paper: Latent World Models For Intrinsically Motivated Exploration

Summary and Contributions: The paper proposes a novel method to address the problem of exploration in RL. It is a known problem in RL that sparse rewards make random exploration _very_ inefficient. One approach to overcoming this limitation is intrinsic motivation: constructing an auxiliary reward signal that encourages the agent to seek novel or rare states, for example a bonus proportional to inverse visit counts or, as proposed in this paper, to a prediction error. Prediction error as a measure of novelty can be heavily affected by three sources of uncertainty: 1. novelty (epistemic) -- this is the signal we are typically after. The paper proposes a belief-state formulation that the authors claim is not overly sensitive to stochasticity and is able to extrapolate the state dynamics, so that the prediction error can serve as a genuine measure of novelty.
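To make the two bonus types mentioned above concrete, here is a minimal sketch of count-based and prediction-error intrinsic rewards. This is an illustrative toy, not the paper's method: the function names, the `beta` scale, and the squared-error form are my own assumptions.

```python
import numpy as np

def count_bonus(visit_counts, state, beta=0.1):
    # Count-based bonus: proportional to inverse (sqrt) visit count,
    # so frequently visited states yield a smaller reward.
    # (beta and the sqrt schedule are illustrative choices.)
    visit_counts[state] = visit_counts.get(state, 0) + 1
    return beta / np.sqrt(visit_counts[state])

def prediction_error_bonus(model, state, action, next_state):
    # Prediction-error bonus: the dynamics model's squared error on the
    # observed transition. High error is read as "novel" -- which is exactly
    # where stochastic transitions can confound the signal.
    predicted = model(state, action)
    return float(np.sum((predicted - next_state) ** 2))
```

Usage: with a dictionary of counts, repeated visits to the same state shrink the count bonus; an identity "model" that predicts no state change incurs error exactly when the state actually moves.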