Sufficient Markov Decision Processes with Alternating Deep Neural Networks

Wang, Longshaokan; Laber, Eric B.; Witkiewitz, Katie

arXiv.org Machine Learning 

Markov decision processes (MDPs) (Bellman, 1957; Puterman, 2014) are the primary mathematical model for representing sequential decision problems with an indefinite time horizon (Bertsekas and Tsitsiklis, 1996; Sutton and Barto, 1998; Bather, 2000; Si, 2004; Powell, 2007; Wiering and Van Otterlo, 2012). This class of models is quite general: almost any decision process can be made into an MDP by concatenating data over multiple decision points (see Section 2 for a precise statement). However, coercing a decision process into the MDP framework in this way can produce high-dimensional system state information that is difficult to model effectively. One common approach to constructing a low-dimensional decision process from a high-dimensional MDP is to create a finite discretization of the space of possible system states and to treat the resulting process as a finite MDP (Gordon, 1995; Murao and Kitamura, 1997; Sutton and Barto, 1998; Kamio et al., 2004; Whiteson et al., 2007). However, such discretization can result in a significant loss of information and can be difficult to apply when the system state information is continuous and high-dimensional. Another common approach to dimension reduction is to construct a low-dimensional summary of the underlying system states, e.g., by applying principal components analysis (Jolliffe, 1986) or multidimensional scaling (Borg and Groenen, 1997), or by constructing a locally linear embedding (Roweis and Saul, 2000).
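To make the dimensionality problem concrete, the following minimal Python sketch concatenates the most recent observations into a single state vector and then applies a uniform grid discretization. It is an illustration only, not the authors' method: the function names `stack_history` and `discretize`, the toy observations, the window length `k`, and the number of bins are all assumptions chosen for the example.

```python
def stack_history(observations, k):
    """Concatenate the k most recent observations into one state vector per time step."""
    states = []
    for t in range(k - 1, len(observations)):
        window = observations[t - k + 1 : t + 1]
        # Flatten the window of k observations into a single vector.
        states.append([x for obs in window for x in obs])
    return states


def discretize(state, low, high, bins):
    """Map each coordinate in [low, high] to one of `bins` equal-width cells."""
    width = (high - low) / bins
    return tuple(min(bins - 1, max(0, int((x - low) / width))) for x in state)


# Hypothetical toy data: four time steps, two measurements per step.
observations = [[0.1, 0.9], [0.4, 0.5], [0.8, 0.2], [0.3, 0.7]]

states = stack_history(observations, k=3)   # each state has 3 * 2 = 6 coordinates
cells = [discretize(s, 0.0, 1.0, bins=4) for s in states]

# The number of grid cells grows exponentially in the state dimension.
n_cells = 4 ** len(states[0])
print(len(states[0]), n_cells, cells[0])
```

Even in this toy setting, stacking three two-dimensional observations with four bins per coordinate yields 4^6 = 4096 possible cells, which suggests why a finite discretization becomes hard to apply as the concatenated state grows.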
