Reinforcement Learning with Logarithmic Regret and Policy Switches

Neural Information Processing Systems 

In this paper, we study the problem of regret minimization for episodic Reinforcement Learning (RL) in both the model-free and the model-based settings.
