Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping

Dongruo Zhou, Jiafan He, Quanquan Gu

arXiv.org Artificial Intelligence 

Designing efficient algorithms that learn and plan in sequential decision-making tasks with large state and action spaces has become the central goal of modern reinforcement learning (RL) in recent years. Because there are so many possible states and actions, traditional tabular reinforcement learning methods (Watkins, 1989; Jaksch et al., 2010; Azar et al., 2017), which directly access each state-action pair, are computationally intractable. A common way to design reinforcement learning algorithms for large-scale state and action spaces is to use feature mappings, such as linear functions or neural networks, that map states and actions to a low-dimensional space, and then to solve the decision-making problem in that feature space. Despite the empirical success of feature-mapping-based reinforcement learning methods (Singh et al., 1995; Kwok and Fox, 2004; Bertsekas, 2018), the theoretical understanding and fundamental limits of these methods remain largely understudied. In this paper, we aim to develop provably efficient reinforcement learning algorithms with feature mapping for discounted Markov Decision Processes (MDPs). The discounted MDP is one of the most widely used models for formulating modern reinforcement learning tasks, such as Atari games (Mnih et al., 2015) and deep recommendation systems (Zheng et al., 2018).
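To make the idea of "solving the decision-making problem in the feature space" concrete, the following is a minimal sketch of a generic linear value-function approximation. It is not the algorithm proposed in this paper. It assumes a hypothetical three-dimensional feature map `phi(s, a)` and a weight vector `theta`, so that the action-value function is approximated as Q(s, a) = θᵀφ(s, a). A greedy policy then chooses the action with the largest approximate value. The helper `discounted_return` shows the discounted objective Σ γᵗ rₜ that a discounted MDP optimizes.

```python
from typing import Callable, List, Sequence

FeatureMap = Callable[[int, int], List[float]]

def phi(state: int, action: int) -> List[float]:
    # Hypothetical feature map: encodes a (state, action) pair as a 3-dimensional vector.
    return [1.0, float(state), float(state * action)]

def linear_q(theta: Sequence[float], state: int, action: int,
             feature_map: FeatureMap = phi) -> float:
    # Approximate Q(s, a) as the inner product theta^T phi(s, a).
    return sum(w * f for w, f in zip(theta, feature_map(state, action)))

def greedy_action(theta: Sequence[float], state: int, actions: Sequence[int],
                  feature_map: FeatureMap = phi) -> int:
    # Choose the action that maximizes the approximate Q-value in feature space.
    return max(actions, key=lambda a: linear_q(theta, state, a, feature_map))

def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    # Compute sum_t gamma^t * r_t by folding backward over the reward sequence.
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total

theta = [0.5, -0.1, 0.2]
print(greedy_action(theta, 3, [0, 1, 2]))       # prints 2
print(discounted_return([1.0, 1.0, 1.0], 0.5))  # prints 1.75
```

The only quantity learned in this sketch is the d-dimensional vector `theta`, not a table indexed by every state-action pair. This is why feature mappings can scale to large state and action spaces where tabular methods cannot.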
