Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection
Neural Information Processing Systems
We study the role of the representation of state-action value functions in regret minimization in finite-horizon Markov Decision Processes (MDPs) with linear structure.
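The abstract relies on two standard notions without defining them: a linear representation of state-action value functions and regret over episodes. The block below gives the usual textbook form of each. It is a sketch under assumed notation, not taken from this excerpt. The feature map $\phi_h$, dimension $d$, horizon $H$, number of episodes $K$, and policies $\pi_k$ are illustrative symbols, and the paper's own definitions may differ in detail.

```latex
% Assumed notation: feature map \phi_h : \mathcal{S}\times\mathcal{A} \to \mathbb{R}^d,
% horizon H, episodes k = 1,\dots,K, initial states s_1^k, policies \pi_k.
% Linear representation of state-action values at each step h:
Q_h^{\pi}(s,a) = \phi_h(s,a)^{\top}\theta_h^{\pi}, \qquad \theta_h^{\pi}\in\mathbb{R}^d,\; h = 1,\dots,H.
% Cumulative regret after K episodes:
R(K) = \sum_{k=1}^{K}\Big( V_1^{\star}(s_1^k) - V_1^{\pi_k}(s_1^k) \Big).
% "Constant regret": R(K) \le C for some C that does not grow with K.
```

Under this reading, the representation is the choice of $\phi_h$. Different feature maps can realize the same $Q$-functions while inducing different regret behaviour, which is what representation selection refers to.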
Aug-15-2025, 17:10:59 GMT