Review for NeurIPS paper: Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning
Additional Feedback: This paper introduces a method for efficient exploration in RL. The proposed method assumes an MDP whose high-dimensional states are generated by an underlying lower-dimensional process, so that the states can be compressed via an unsupervised learning algorithm/oracle. The method then (1) defines an MDP over the resulting low-dimensional state space, and (2) learns a policy by generating trajectories in that low-dimensional space, which arguably facilitates exploration. At each iteration, the algorithm gathers data both to compute a policy and to improve the embedding model produced by the unsupervised algorithm. The authors show that, as long as the unsupervised algorithm and the tabular RL algorithm each have polynomial sample complexity, a near-optimal policy can be found with sample complexity polynomial in the number of latent states, which is much smaller than the number of high-dimensional states.
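For concreteness, here is a minimal sketch of the alternating scheme as I understand it from the paper's description; the interfaces (`env`, `unsupervised_oracle`, `tabular_rl`) are hypothetical placeholders of my own, not the authors' actual API or pseudocode.

```python
def train(env, unsupervised_oracle, tabular_rl, num_iters, num_rollouts, horizon):
    """Alternate between (a) fitting a decoder that maps high-dimensional
    observations to discrete latent states and (b) running a tabular RL
    algorithm on the induced latent-state MDP. All interfaces are assumed."""
    observations = []   # raw high-dimensional observations collected so far
    decoder = None      # maps an observation to a latent state index
    policy = None       # tabular policy defined over latent states

    for _ in range(num_iters):
        # (1) Gather data with the current policy (random before the first fit).
        for _ in range(num_rollouts):
            obs = env.reset()
            for _ in range(horizon):
                action = policy(decoder(obs)) if policy else env.sample_action()
                observations.append(obs)
                obs, reward, done = env.step(action)
                if done:
                    break

        # (2) Improve the embedding with the unsupervised learning oracle.
        decoder = unsupervised_oracle.fit(observations)

        # (3) Solve the induced tabular MDP over the (few) latent states.
        policy = tabular_rl.solve(env, decoder)

    return decoder, policy
```

Under the paper's stated assumptions, each component call above has polynomial sample complexity, which is what drives the overall guarantee in the number of latent states.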