Local Linearity: the Key for No-regret Reinforcement Learning in Continuous MDPs
–Neural Information Processing Systems
Achieving the no-regret property for Reinforcement Learning (RL) problems in continuous state and action-space environments is one of the major open problems in the field. Existing solutions either work under very specific assumptions or achieve bounds that are vacuous in some regimes. Furthermore, many structural assumptions are known to suffer from a provably unavoidable exponential dependence on the time horizon H in the regret, which renders any solution infeasible in practice. In this paper, we identify local linearity as the feature that makes Markov Decision Processes (MDPs) both learnable (sublinear regret) and feasible (regret that is polynomial in H). We define a novel MDP representation class, namely Locally Linearizable MDPs, generalizing other representation classes such as Linear MDPs and MDPs with low inherent Bellman error.
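For context, a brief sketch of the standard Linear MDP structure that the proposed class generalizes (the notation $\varphi$, $\mu_h$, $\theta_h$ is illustrative and not taken from this paper): an MDP is linear if there is a known feature map $\varphi : \mathcal{S} \times \mathcal{A} \to \mathbb{R}^d$ such that, at every stage $h$,
$$
P_h(s' \mid s, a) = \langle \varphi(s, a), \mu_h(s') \rangle, \qquad r_h(s, a) = \langle \varphi(s, a), \theta_h \rangle,
$$
for unknown measures $\mu_h$ and parameter vectors $\theta_h$. The low inherent Bellman error condition relaxes this by requiring only that Bellman backups of linear value functions remain approximately linear in $\varphi$; local linearity, as used here, asks for such structure to hold only locally rather than globally.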