Review for NeurIPS paper: Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning
Additional Feedback: This paper introduces a method for efficient exploration in RL. The proposed method assumes an MDP whose high-dimensional states are generated by an underlying lower-dimensional process, so that the states can be compressed via an unsupervised learning algorithm/oracle. The method then (1) defines an MDP over the resulting low-dimensional state space, and (2) learns a policy by generating trajectories in that low-dimensional space, which arguably facilitates exploration. At each iteration, the algorithm gathers data both to compute a policy and to improve the embedding model produced by the unsupervised algorithm. The authors show that, as long as the unsupervised algorithm and the tabular RL algorithm each have polynomial sample complexity, a near-optimal policy can be found with sample complexity polynomial in the number of latent states, which is much smaller than the number of high-dimensional states.
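For concreteness, here is a minimal sketch of the alternating scheme as I understand it from the paper's description; the interfaces (`env`, `unsupervised_oracle`, `tabular_rl`) are hypothetical placeholders of my own, not the authors' actual API or pseudocode.

```python
def train(env, unsupervised_oracle, tabular_rl, num_iters, num_rollouts, horizon):
    """Alternate between (a) fitting a decoder that maps high-dimensional
    observations to discrete latent states and (b) running a tabular RL
    algorithm on the induced latent-state MDP. All interfaces are assumed."""
    observations = []   # raw high-dimensional observations collected so far
    decoder = None      # maps an observation to a latent state index
    policy = None       # tabular policy defined over latent states

    for _ in range(num_iters):
        # (1) Gather data with the current policy (random before the first fit).
        for _ in range(num_rollouts):
            obs = env.reset()
            for _ in range(horizon):
                action = policy(decoder(obs)) if policy else env.sample_action()
                observations.append(obs)
                obs, reward, done = env.step(action)
                if done:
                    break

        # (2) Improve the embedding with the unsupervised learning oracle.
        decoder = unsupervised_oracle.fit(observations)

        # (3) Solve the induced tabular MDP over the (few) latent states.
        policy = tabular_rl.solve(env, decoder)

    return decoder, policy
```

Under the paper's stated assumptions, each component call above has polynomial sample complexity, which is what drives the overall guarantee in the number of latent states.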