Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity

Oct-12-2020–arXiv.org Machine Learning

Reinforcement learning (RL) [5] studies how to make sequential decisions to learn and act in unknown environments (usually modeled as a Markov Decision Process (MDP)) so as to maximize the collected rewards. There are two main types of algorithms for RL problems: model-based algorithms and model-free algorithms. Model-based RL algorithms keep an explicit description of the learned model and make decisions based on this model. In contrast, model-free algorithms maintain only a group of value functions instead of the complete model of the system dynamics. Due to their space and time efficiency, model-free RL algorithms have become popular in a wide range of practical tasks (e.g., DQN [16], TRPO [18], and A3C [15]). In RL theory, model-free algorithms are explicitly defined to be the ones whose space complexity is always sublinear relative to the space required to store the MDP parameters [12]. For tabular MDPs (i.e., MDPs with a finite number of states and actions, usually denoted by S and A respectively), this requires the space complexity to be o(S...
