Review for NeurIPS paper: Provably adaptive reinforcement learning in metric spaces
–Neural Information Processing Systems
Summary and Contributions: This paper studies reinforcement learning (RL) problems on large state and action spaces that are endowed with a metric. The key assumption is that the optimal state-action value function, Q*, is Lipschitz continuous with respect to that metric. The setting is an episodic, H-stage Markov decision process, in which the learner must choose an action at each stage of each episode while achieving low regret against an optimal policy. Previously, an algorithm was proposed for this problem that learns the Q* function on an adaptive discretization of the state-action space, one that becomes steadily finer in important regions of the space. The regret of this algorithm was thought to depend on the packing number of the state-action space.
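For reference, the Lipschitz assumption and the regret objective described above can be written roughly as follows. The symbols are assumed notation, not taken from the review: $\mathcal{D}$ for the metric on state-action pairs, $L$ for the Lipschitz constant, $K$ for the number of episodes, $x_1^k$ for the initial state of episode $k$, and $\pi_k$ for the policy played in episode $k$. The stage index $h$ on $Q^*$ is likewise an assumption about how the finite-horizon problem is indexed.

```latex
% Sketch under assumed notation (D, L, K, x_1^k, \pi_k, and the index h are not from the review).
\[
  \bigl| Q_h^*(x,a) - Q_h^*(x',a') \bigr|
  \;\le\; L \, \mathcal{D}\bigl((x,a),(x',a')\bigr)
  \quad \text{for all } (x,a),\,(x',a') \text{ and } h \in \{1,\dots,H\},
\]
\[
  \mathrm{Regret}(K) \;=\; \sum_{k=1}^{K} \Bigl( V_1^*(x_1^k) - V_1^{\pi_k}(x_1^k) \Bigr).
\]
```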
Jan-25-2025, 13:55:50 GMT