Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity
Zhang, Zihan; Zhou, Yuan; Ji, Xiangyang
Reinforcement learning (RL) [5] studies how to make sequential decisions that learn and act in an unknown environment (usually modeled as a Markov Decision Process, MDP) while maximizing the collected rewards. RL algorithms fall mainly into two families: model-based and model-free. Model-based algorithms maintain an explicit description of the learned model and make decisions based on it. In contrast, model-free algorithms maintain only a set of value functions rather than a complete model of the system dynamics. Owing to their space and time efficiency, model-free algorithms have become popular in practice, as exemplified by DQN [16], TRPO [18], and A3C [15]. In RL theory, model-free algorithms are formally defined as those whose space complexity is always sublinear in the space required to store the MDP parameters [12]. For tabular MDPs (i.e., MDPs with finitely many states and actions, whose numbers are usually denoted by S and A, respectively), this requires the space complexity to be o(S
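To make the space distinction concrete, the following is a minimal Python sketch under illustrative assumptions. It counts the table entries of a tabular model (transition kernel P(s' | s, a), which alone has S·S·A entries, plus mean rewards) against a single Q-table with S·A entries, and it includes a plain discounted Q-learning update as a generic stand-in for a model-free method. The names `model_based_entries`, `model_free_entries`, and `TabularQLearner`, as well as the discounted setting, are hypothetical choices for illustration; this is not the authors' algorithm, which this excerpt does not describe.

```python
from typing import List


def model_based_entries(num_states: int, num_actions: int) -> int:
    """Entries in a tabular model: transition kernel P(s' | s, a) plus mean rewards R(s, a)."""
    return num_states * num_states * num_actions + num_states * num_actions


def model_free_entries(num_states: int, num_actions: int) -> int:
    """Entries in a single Q-table Q(s, a)."""
    return num_states * num_actions


class TabularQLearner:
    """Hypothetical model-free learner that stores only a Q-table (S * A floats).

    It never estimates P(s' | s, a); each transition is used once to update Q
    and is then discarded.
    """

    def __init__(self, num_states: int, num_actions: int,
                 learning_rate: float = 0.1, discount: float = 0.9) -> None:
        self.q: List[List[float]] = [[0.0] * num_actions for _ in range(num_states)]
        self.learning_rate = learning_rate
        self.discount = discount

    def update(self, state: int, action: int, reward: float, next_state: int) -> None:
        # One-step Q-learning target: r + gamma * max_a' Q(s', a').
        target = reward + self.discount * max(self.q[next_state])
        # Move Q(s, a) a fraction `learning_rate` of the way toward the target.
        self.q[state][action] += self.learning_rate * (target - self.q[state][action])

    def num_entries(self) -> int:
        # Total number of stored values: exactly S * A.
        return sum(len(row) for row in self.q)


if __name__ == "__main__":
    S, A = 1000, 10
    print("model-based entries:", model_based_entries(S, A))
    print("model-free entries: ", model_free_entries(S, A))
    # The ratio equals 1 / (S + 1), which tends to 0 as S grows (sublinear storage).
    print("ratio:", model_free_entries(S, A) / model_based_entries(S, A))
```

With S = 1000 and A = 10, the model tables hold 10,010,000 entries against 10,000 for the Q-table, so the model-free learner's storage shrinks relative to the model's as S grows.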
Oct-12-2020