Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction

Li, Gen, Wei, Yuting, Chi, Yuejie, Gu, Yuantao, Chen, Yuxin

arXiv.org Machine Learning 

Model-free algorithms such as Q-learning (Watkins and Dayan, 1992) play a central role in recent breakthroughs in reinforcement learning (RL) (Mnih et al., 2015). In contrast to model-based algorithms, which decouple model estimation and planning, model-free algorithms interact with the environment directly, in the form of a policy that selects actions based on perceived states of the environment, and learn from the collected data samples without modeling the environment explicitly. Model-free algorithms can therefore process data in an online fashion and are often memory-efficient. Understanding and improving the sample efficiency of model-free algorithms lies at the core of recent research activity (Dulac-Arnold et al., 2019); its importance is particularly evident for RL applications in which data collection is costly and time-consuming (such as clinical trials and online advertisements).

The current paper concentrates on Q-learning, an off-policy model-free algorithm that seeks to learn the optimal action-value function by observing what happens under a behavior policy. The off-policy feature makes it appealing in RL applications where it is infeasible to change the policy under evaluation on the fly.

There are two basic update models in Q-learning. The first is the synchronous setting, which assumes access to a simulator (or generative model): at each time step, the simulator generates an independent sample for every state-action pair, and the estimates are updated simultaneously across all state-action pairs. The second is the asynchronous setting, where only a single sample trajectory following a behavior policy is accessible: at each time step, the algorithm updates its estimate of a single state-action pair using one state transition from the trajectory.
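
The difference between the two update models can be made concrete with a minimal tabular sketch. It is an illustration under stated assumptions, not the paper's algorithm or analysis: the function names, the nested-list representation of the transition kernel `P` and reward table `R`, the constant learning rate `lr`, and the uniform behavior policy are all hypothetical choices made here for brevity.

```python
import random


def sample_next_state(P, s, a, rng):
    """Draw s' ~ P(. | s, a) from a tabular transition kernel."""
    return rng.choices(range(len(P[s][a])), weights=P[s][a], k=1)[0]


def synchronous_step(Q, P, R, gamma, lr, rng):
    """Synchronous setting: a simulator gives one fresh sample for
    every (s, a) pair, and all entries of Q are updated at once."""
    Q_old = [row[:] for row in Q]  # every update uses the previous iterate
    for s in range(len(Q)):
        for a in range(len(Q[s])):
            s_next = sample_next_state(P, s, a, rng)
            target = R[s][a] + gamma * max(Q_old[s_next])
            Q[s][a] = (1.0 - lr) * Q_old[s][a] + lr * target


def asynchronous_step(Q, P, R, s, gamma, lr, rng):
    """Asynchronous setting: one transition from a single trajectory,
    and only the visited (s, a) entry of Q is updated."""
    a = rng.randrange(len(Q[s]))  # uniform behavior policy (assumption)
    s_next = sample_next_state(P, s, a, rng)
    target = R[s][a] + gamma * max(Q[s_next])
    Q[s][a] = (1.0 - lr) * Q[s][a] + lr * target
    return a, s_next  # the trajectory continues from s_next


# A made-up MDP with 2 states and 2 actions, for illustration only.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.1, 0.9]]]
R = [[1.0, 0.0],
     [0.0, 2.0]]

if __name__ == "__main__":
    rng = random.Random(0)
    Q_sync = [[0.0, 0.0], [0.0, 0.0]]
    Q_async = [[0.0, 0.0], [0.0, 0.0]]
    state = 0
    for _ in range(1000):
        synchronous_step(Q_sync, P, R, gamma=0.9, lr=0.1, rng=rng)
        _, state = asynchronous_step(Q_async, P, R, state, gamma=0.9, lr=0.1, rng=rng)
    print("synchronous estimate: ", Q_sync)
    print("asynchronous estimate:", Q_async)
```

One synchronous iteration consumes as many samples as there are state-action pairs, whereas one asynchronous step consumes a single transition, and which entry it refreshes is dictated by where the trajectory happens to be.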
