Logarithmic Regret for Reinforcement Learning with Linear Function Approximation

Nov-23-2020–arXiv.org Machine Learning

Designing efficient algorithms that learn and plan in sequential decision-making tasks with large state and action spaces has become a central goal of modern reinforcement learning (RL) in recent years. RL typically models the environment as a Markov Decision Process (MDP), described by a tuple consisting of a state space, an action space, a reward function, and a transition probability function. Because the number of possible states and actions is large, traditional tabular reinforcement learning methods such as Q-learning (Watkins, 1989), which directly access each state-action pair, are computationally intractable. A common approach to coping with high-dimensional state and action spaces is to use feature mappings, such as linear functions or neural networks, to map states and actions to a low-dimensional space. Recently, a large body of literature has been devoted to providing regret bounds for online RL with linear function approximation. These works can be divided into two main categories. The first category is model-free: it directly parameterizes the action-value function as a linear function of some given feature mapping. For instance, Jin et al. (2020) studied episodic MDPs under the linear MDP assumption, which assumes that both the transition probability function and the reward function can be represented as linear functions of a given feature mapping.
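To make the model-free parameterization above concrete, here is a minimal sketch of an action-value function that is linear in a feature mapping, Q(s, a) = <phi(s, a), w>. The one-hot feature map, the weight vector, and all function names are hypothetical choices for illustration and are not taken from the paper. One-hot features simply recover the tabular case; the approach pays off when the feature dimension is much smaller than the number of state-action pairs.

```python
from typing import List


def dot(u: List[float], v: List[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def one_hot_features(state: int, action: int,
                     n_states: int, n_actions: int) -> List[float]:
    """Hypothetical feature map phi(s, a): one coordinate per (s, a) pair.

    This choice makes the linear model equivalent to a lookup table.
    """
    phi = [0.0] * (n_states * n_actions)
    phi[state * n_actions + action] = 1.0
    return phi


def linear_q(phi: List[float], weights: List[float]) -> float:
    """Action value as a linear function of the features: <phi, w>."""
    return dot(phi, weights)


def greedy_action(state: int, weights: List[float],
                  n_states: int, n_actions: int) -> int:
    """Pick the action with the largest linear action value in `state`."""
    return max(
        range(n_actions),
        key=lambda a: linear_q(
            one_hot_features(state, a, n_states, n_actions), weights
        ),
    )


# Illustrative weights for 2 states x 2 actions (assumed, not learned).
weights = [0.1, 0.5, 0.9, 0.2]
best_in_state_0 = greedy_action(0, weights, n_states=2, n_actions=2)
best_in_state_1 = greedy_action(1, weights, n_states=2, n_actions=2)
```

With these assumed weights, the greedy rule selects action 1 in state 0 and action 0 in state 1. A learning algorithm would estimate `weights` from data rather than fix them by hand.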



Country:
- North America > United States
  - California > Los Angeles County > Los Angeles (0.28)
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.14)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report (0.82)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Uncertainty
    - Fuzzy Logic (0.62)
  - Machine Learning
    - Reinforcement Learning (1.00)
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.54)
