43207fd5e34f87c48d584fc5c11befb8-Supplemental.pdf

Neural Information Processing Systems 

Is Plug-in Solver Sample Efficient for Feature-based Reinfocement Learning? Thus, we will use Φ and φ to represent Λ and λ in Appendix C and Appendix D. We use P We use P (s, a) to denote the row vector of P that corresponds to (s, a). A detailed description is provided in [1]. We use 1 to denote a column vector with all components to be 1. We use [H] to denote {0, 1,, H 1}. Finite Horizon Markov Decision Process A Finite Horizon Markov decision process (FHMDP) is described by the tuple M = (S, A, P, r, H), which differs from DMDP only in that the discount factor γ is replaced by the horizon H. It is a generalized version of DMDP which includes two players competing with each other.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found