Reviews: Planning with Goal-Conditioned Policies
Neural Information Processing Systems
Post rebuttal: My suggestions and comments were not addressed in the rebuttal, so I keep my score as is.

Others have proposed this type of two-step optimization, in which one first learns a compact representation with a VAE on randomly collected samples and then applies various RL or planning methods on top of that representation. However, this approach does not work well in high-dimensional spaces, where random data collection does not yield enough samples for learning the representation -- especially samples from the optimal policy. This work does not address that issue: it evaluates only on environments with very small state spaces, where random sampling to train the VAE is feasible.

Originality: The idea of planning using TDMs over a latent representation is novel, and a promising direction for goal-directed planning in high-dimensional observation spaces.
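For concreteness, the two-step pipeline the review describes can be sketched as follows. This is a hypothetical toy illustration, not the paper's method: a PCA projection stands in for the VAE encoder, random-shooting over candidate actions stands in for TDM-based planning, and the additive "dynamics" are an assumption made purely for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1: collect observations with a random policy and fit a
# compact representation on them (PCA stands in for the VAE here).
obs = rng.normal(size=(500, 16))             # randomly collected 16-D observations
obs_mean = obs.mean(axis=0)
_, _, vt = np.linalg.svd(obs - obs_mean, full_matrices=False)
encoder = vt[:4].T                           # project 16-D observations to a 4-D latent

def encode(x):
    return (x - obs_mean) @ encoder

# Stage 2: goal-conditioned planning in the learned latent space.
def plan(start, goal, n_candidates=256):
    """Pick the candidate action whose predicted next latent is closest to the latent goal."""
    z_goal = encode(goal)
    actions = rng.normal(size=(n_candidates, 16))
    z_next = encode(start + actions)         # assumed toy dynamics: next obs = obs + action
    dists = np.linalg.norm(z_next - z_goal, axis=1)
    return actions[np.argmin(dists)]

start, goal = rng.normal(size=16), rng.normal(size=16)
action = plan(start, goal)
before = np.linalg.norm(encode(start) - encode(goal))
after = np.linalg.norm(encode(start + action) - encode(goal))
assert after <= before                       # best candidate moves toward the goal in latent space
```

The sketch also makes the review's concern visible: the encoder in stage 1 is fit only on random-policy data, so states visited by a good goal-reaching policy may be poorly represented in the latent space used for planning.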
Jan-27-2025, 00:42:26 GMT