Reviews: Regret Bounds for Learning State Representations in Reinforcement Learning
Neural Information Processing Systems
This paper proposes a natural extension of UCRL2 to the problem of learning state representations. The proposed algorithm selects optimistically among a finite set of candidate MDPs and their corresponding policies. The authors analyze the algorithm and obtain regret bounds that improve on existing ones. The paper was discussed, and all reviewers agree that this natural extension of UCRL2 deserves to be published.
Jan-25-2025, 23:58:45 GMT