On learning history based policies for controlling Markov decision processes

Patil, Gandharv, Mahajan, Aditya, Precup, Doina

arXiv.org Artificial Intelligence 

State abstraction and function approximation are vital components used by reinforcement learning (RL) algorithms to efficiently solve complex control problems when exact computations are intractable due to large state and action spaces. Over the past few decades, state abstraction in RL has evolved from the use of pre-determined and problem-specific features [18, 74, 9, 69, 64, 42, 58] to the use of adaptive basis functions learnt by solving an isolated regression problem [53, 47, 39, 56], and more recently to the use of neural network-based Deep-RL algorithms that embed state abstraction in successive layers of a neural network [5, 7]. Feature abstraction results in information loss, and the resulting state features might not satisfy the controlled Markov property, even if this property is satisfied by the corresponding state [70]. One approach to counteract the loss of the Markov property is to generate the features using the history of state-action pairs, and empirical evidence suggests that using such history-based features is beneficial in practice [52]. However, a theoretical characterisation of history-based Deep-RL algorithms for fully observed Markov Decision Processes (MDPs) is largely absent from the literature.
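The loss of the Markov property under feature abstraction can be seen in a toy example. The sketch below is not taken from the paper: the three-state deterministic chain, the feature map `phi`, and the helpers `feature_trajectory` and `successors` are all hypothetical, and actions are omitted for brevity. It shows a feature map under which the current feature alone does not determine the next feature, while a window of the last two features does.

```python
from collections import defaultdict

# Hypothetical toy chain (an assumption for illustration): 0 -> 1 -> 2 -> 0.
def next_state(s):
    return (s + 1) % 3

# Hypothetical lossy feature map: states 0 and 1 share the feature "A".
phi = {0: "A", 1: "A", 2: "B"}

def feature_trajectory(start, steps):
    """Return the sequence of features seen along the chain."""
    s, out = start, []
    for _ in range(steps):
        out.append(phi[s])
        s = next_state(s)
    return out

def successors(traj, k):
    """Map each length-k window of features to the set of features that follow it."""
    table = defaultdict(set)
    for t in range(k - 1, len(traj) - 1):
        window = tuple(traj[t - k + 1 : t + 1])
        table[window].add(traj[t + 1])
    return dict(table)

traj = feature_trajectory(start=0, steps=12)
one_step = successors(traj, k=1)  # current feature only
two_step = successors(traj, k=2)  # history of the last two features

print(one_step)
print(two_step)
```

With the current feature alone, "A" is followed by either "A" or "B", so the feature process is not Markov even though the underlying state process is. With a window of two features, every window has a single successor in this toy chain, which is the intuition behind history-based features.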
