Review for NeurIPS paper: Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model
–Neural Information Processing Systems
The method targets a model-based approach to solve POMDPs with high-dimensional observation spaces. This problem is tackle by learning jointly about the dynamics of the POMDP and the optimal policy by maximum likelihood using an "RL as inference" type objective. In more detail, the latent space transitions are predicted by an inference model that is trained to maximise an evidence lower bound. The reviewers are mostly positive about the paper. They mention the theoretical soundness of the approach and the quality of writing as well as the empirical set-up and usefulness of the ablations.
Neural Information Processing Systems
Jan-21-2025, 08:11:51 GMT
- Technology: