Review for NeurIPS paper: Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model

Jan-21-2025, 08:11:59 GMT–Neural Information Processing Systems

Weaknesses: - The paper's narrative is built around POMDPs, but the experimental evaluation does not really stress the method's capability in that respect. Evaluation is done on pixel-based control, which is of course partially observable (PO), but we know that a lagged observation of a few time steps can quickly make the state fully observable. Hence, we do not know how the method fares in environments where the agent has to actively reduce its state uncertainty. I therefore think the paper overstates its results. This is easy to fix, however: one can simply drop the POMDP claim. For me personally (and for the optimal control community) it is obvious that we want some kind of state estimation when we do control, as most, if not all, practical problems are PO.
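To illustrate the point about lagged observations, here is a minimal sketch on a toy problem that is not taken from the paper: a point mass whose observation is its position only, so the velocity is hidden in any single observation. The names `FrameStack` and `estimate_velocity` are hypothetical helpers introduced for this illustration. Stacking just two observations is enough to recover the velocity by finite differences, without the agent taking any information-gathering action.

```python
from collections import deque

import numpy as np


class FrameStack:
    """Keep the last k observations and expose them as one stacked array."""

    def __init__(self, k):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, obs):
        # Fill the buffer with copies of the first observation.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(np.asarray(obs, dtype=float))
        return self.stacked()

    def step(self, obs):
        # Appending to a full deque drops the oldest observation.
        self.frames.append(np.asarray(obs, dtype=float))
        return self.stacked()

    def stacked(self):
        return np.stack(self.frames)


def estimate_velocity(stacked, dt):
    """Finite-difference velocity from the two most recent position observations."""
    return (stacked[-1] - stacked[-2]) / dt


# Toy point mass (an assumption for illustration): position is observed, velocity is hidden.
dt = 0.1
true_velocity = 2.0
positions = [true_velocity * dt * t for t in range(5)]

stack = FrameStack(k=2)
stack.reset([positions[0]])
for p in positions[1:]:
    window = stack.step([p])

# The stacked window carries enough information to recover the hidden velocity.
velocity_estimate = estimate_velocity(window, dt)
```

In this toy case the stacked window is a sufficient statistic for the full state, which is the sense in which such benchmarks are only mildly partially observable. A task where the agent must act to reduce its uncertainty would not be solved by stacking alone.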

deep reinforcement learning, latent variable model, stochastic latent actor-critic, (2 more...)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Learning Graphical Models (1.00)
  - Reinforcement Learning (0.89)