Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations
Dann, Christoph, Mansour, Yishay, Mohri, Mehryar, Sekhari, Ayush, Sridharan, Karthik
–arXiv.org Artificial Intelligence
Reinforcement Learning (RL) has achieved several remarkable empirical successes in the last decade, including playing Atari 2600 video games at superhuman levels (Mnih et al., 2015), AlphaGo and AlphaGo Zero surpassing champions in Go (Silver et al., 2018), AlphaStar's victory over top-ranked professional players in StarCraft (Vinyals et al., 2019), and practical self-driving cars. These applications all correspond to the setting of rich observations, where the state space is very large and where observations may be images, text, or audio data. In contrast, most provably efficient RL algorithms are still limited to the classical tabular setting where the state space is small (Kearns and Singh, 2002; Brafman and Tennenholtz, 2002; Azar et al., 2017; Dann et al., 2019) and do not scale to the rich observation setting. To derive guarantees for large state spaces, much of the existing work in RL theory relies on realizability and low-rank assumptions (Krishnamurthy et al., 2016; Jiang et al., 2017; Dann et al., 2018; Du et al., 2019a; Misra et al., 2020; Agarwal et al., 2020b). Different notions of rank have been adopted in the literature, including a low-rank transition matrix (Jin et al., 2020a), low Bellman rank (Jiang et al., 2017), Witness rank (Sun et al., 2019), Eluder dimension (Osband and Van Roy, 2014), Bellman-Eluder dimension (Jin et al., 2021), and bilinear classes (Du et al., 2021).
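As a point of reference, the low-rank transition assumption cited above (Jin et al., 2020a) is commonly formalized by requiring the transition kernel to factorize through d-dimensional feature maps; the sketch below uses illustrative symbols \phi, \mu, and d that do not appear in this summary:

  P(s' \mid s, a) = \langle \phi(s, a), \mu(s') \rangle, \qquad \phi(s, a), \mu(s') \in \mathbb{R}^d,

so that the effective complexity of learning and planning scales with the rank d rather than with the size of the (possibly enormous) observation space.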
Jun-21-2021