Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity

Zhang, Zihan; Zhou, Yuan; Ji, Xiangyang

arXiv.org Machine Learning 

Reinforcement learning (RL) [5] studies how to make sequential decisions in order to learn and act in unknown environments, usually modeled as Markov Decision Processes (MDPs), so as to maximize the collected rewards. There are two main types of algorithms for RL problems: model-based and model-free. Model-based RL algorithms keep an explicit description of the learned model and make decisions based on it. In contrast, model-free algorithms maintain only a group of value functions instead of a complete model of the system dynamics. Due to their space and time efficiency, model-free RL algorithms have become popular in a wide range of practical tasks (e.g., DQN [16], TRPO [18], and A3C [15]). In RL theory, model-free algorithms are explicitly defined as those whose space complexity is always sublinear in the space required to store the MDP parameters [12]. For tabular MDPs (i.e., MDPs with a finite number of states and actions, usually denoted by S and A respectively), this requires the space complexity to be o(S
