Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction

Li, Gen, Wei, Yuting, Chi, Yuejie, Gu, Yuantao, Chen, Yuxin

Sep-28-2020–arXiv.org Machine Learning

Model-free algorithms such as Q-learning (Watkins and Dayan, 1992) play a central role in recent breakthroughs of reinforcement learning (RL) (Mnih et al., 2015). In contrast to model-based algorithms, which decouple model estimation and planning, model-free algorithms attempt to directly interact with the environment, in the form of a policy that selects actions based on perceived states of the environment, using the collected data samples and without modeling the environment explicitly. As a result, model-free algorithms can process data in an online fashion and are often memory-efficient. Understanding and improving the sample efficiency of model-free algorithms lies at the core of recent research activity (Dulac-Arnold et al., 2019), and its importance is particularly evident in RL applications where data collection is costly and time-consuming (such as clinical trials and online advertisements).

The current paper concentrates on Q-learning, an off-policy model-free algorithm that seeks to learn the optimal action-value function by observing what happens under a behavior policy. The off-policy feature makes it appealing in various RL applications where it is infeasible to change the policy under evaluation on the fly.

There are two basic update models in Q-learning. The first is the synchronous setting, which assumes the existence of a simulator (or a generative model): at each time step, the simulator generates an independent sample for every state-action pair, and the estimates are updated simultaneously across all state-action pairs. The second is the asynchronous setting, where only a single sample trajectory following a behavior policy is accessible: at each time step, the algorithm updates its estimate of a single state-action pair using one state transition from the trajectory.
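The difference between the two update models can be made concrete with a minimal sketch. It assumes a small tabular, discounted MDP and a constant learning rate; the function names, the list-of-lists Q-table layout, and the parameters `gamma` and `lr` are illustrative choices, not taken from the paper.

```python
import random


def sync_q_step(Q, sample_next_state, reward, gamma, lr):
    """One synchronous update: every (s, a) pair gets its own simulator sample.

    `sample_next_state(s, a)` plays the role of the simulator (generative model).
    All entries are updated from the same old table Q, and a new table is returned.
    """
    new_Q = [row[:] for row in Q]
    for s in range(len(Q)):
        for a in range(len(Q[s])):
            s_next = sample_next_state(s, a)
            target = reward[s][a] + gamma * max(Q[s_next])
            new_Q[s][a] = (1 - lr) * Q[s][a] + lr * target
    return new_Q


def async_q_step(Q, s, a, r, s_next, gamma, lr):
    """One asynchronous update: only the visited (s, a) entry changes, in place."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] = (1 - lr) * Q[s][a] + lr * target


def run_async(Q, sample_next_state, reward, gamma, lr, num_steps, seed=0):
    """Follow one trajectory under a uniformly random behavior policy (an assumption)."""
    rng = random.Random(seed)
    s = 0
    for _ in range(num_steps):
        a = rng.randrange(len(Q[s]))      # behavior policy picks the action
        s_next = sample_next_state(s, a)  # single observed transition
        async_q_step(Q, s, a, reward[s][a], s_next, gamma, lr)
        s = s_next
    return Q
```

In the synchronous step, one iteration consumes a sample for every state-action pair; in the asynchronous step, one iteration consumes a single transition, so how often each entry is refreshed depends on how the behavior policy visits the state-action space.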

artificial intelligence, machine learning, reinforcement learning

Country:
- North America > United States
  - Pennsylvania > Allegheny County
    - Pittsburgh (0.04)
  - New Jersey > Mercer County
    - Princeton (0.04)
- Asia
  - Middle East > Jordan (0.04)
  - China > Beijing
    - Beijing (0.04)

Genre:
- Research Report > New Finding (0.66)

Industry:
- Health & Medicine > Pharmaceuticals & Biotechnology (0.34)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
