Local Linearity: the Key for No-regret Reinforcement Learning in Continuous MDPs

Neural Information Processing Systems

Existing solutions either work under very specific assumptions or achieve bounds that are vacuous in some regimes.