RL: Efficient Exploration for Nonepisodic RL

Neural Information Processing Systems

We study the problem of nonepisodic reinforcement learning (RL) for nonlinear dynamical systems, where the system dynamics are unknown and the agent must learn from a single trajectory, i.e., adapt online without resets. This setting is ubiquitous in the real world, where resetting a system is often impossible or requires human intervention.