Review for NeurIPS paper: Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model

Neural Information Processing Systems 

Weaknesses: - The paper's narrative is built around POMDPs, but the experimental evaluation does not really stress the method's capabilities in that respect. Evaluation is done on pixel-based control, which is partially observable (PO) of course, but we know that a short history of lagged observations spanning a few time steps can quickly make the state fully observable. Hence, we do not know how the method fares in environments where the agent has to actively reduce state uncertainty. I therefore think the paper overstates its results. It is easy to get out of this, however, since one can just drop the POMDP claim. For me personally (and for the optimal control community), it is obvious that we want some kind of state estimation when we use control, as most, if not all, practical problems are PO.