Review for NeurIPS paper: On Reward-Free Reinforcement Learning with Linear Function Approximation
–Neural Information Processing Systems
The authors study sequential decision processes without a reward function. The goal is to learn the transition dynamics so that various reward functions can be optimised efficiently in the future. The authors extend recent work to the linear function approximation case. They analyse the sample complexity and show that, while it is polynomial for linear MDPs, this is not true for MDPs with linear optimal value functions, which provides insight into the hardness of this second class of problems. The strengths of the paper are the theoretical development of the algorithm and the lower bound for MDPs with linear optimal Q-functions.
Feb-6-2025, 10:58:16 GMT