A Unifying View of Optimism in Episodic Reinforcement Learning

Neural Information Processing Systems 

A finite episodic Markov decision process (MDP) is a tuple $(\mathcal{S}, \mathcal{A}, H, \alpha, P, r)$, where $\mathcal{S}$ and $\mathcal{A}$ are the finite sets of states and actions with $S = |\mathcal{S}|$ and $A = |\mathcal{A}|$, $H$ is the (fixed) episode length, and $\alpha$ is the initial state distribution.
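To make the tuple concrete, the following is a minimal Python sketch of a finite episodic MDP as a data structure. It is an illustration under stated assumptions, not the paper's implementation: the text above does not define P and r, so the sketch assumes the common convention that P is a transition kernel and r a reward function, both taken to be independent of the step within the episode. The class name `EpisodicMDP`, the methods `validate` and `rollout`, and the `policy(h, s)` signature are all hypothetical names introduced here.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class EpisodicMDP:
    S: int                      # number of states, S = |S|
    A: int                      # number of actions, A = |A|
    H: int                      # fixed episode length
    alpha: List[float]          # initial state distribution, length S
    P: List[List[List[float]]]  # ASSUMPTION: P[s][a][s2] = transition probability
    r: List[List[float]]        # ASSUMPTION: r[s][a] = deterministic reward

    def validate(self) -> None:
        """Check that alpha and every P[s][a] are probability vectors."""
        assert len(self.alpha) == self.S
        assert abs(sum(self.alpha) - 1.0) < 1e-9
        for s in range(self.S):
            for a in range(self.A):
                assert len(self.P[s][a]) == self.S
                assert abs(sum(self.P[s][a]) - 1.0) < 1e-9

    def rollout(
        self, policy: Callable[[int, int], int], rng: random.Random
    ) -> List[Tuple[int, int, float]]:
        """Sample one episode of exactly H steps as (state, action, reward) triples."""
        states = list(range(self.S))
        s = rng.choices(states, weights=self.alpha)[0]  # draw initial state from alpha
        trajectory = []
        for h in range(self.H):
            a = policy(h, s)  # hypothetical policy: maps (step, state) to an action
            trajectory.append((s, a, self.r[s][a]))
            s = rng.choices(states, weights=self.P[s][a])[0]
        return trajectory


# Toy instance: two states, two actions; action 0 leads to state 0, action 1 to state 1.
mdp = EpisodicMDP(
    S=2, A=2, H=3,
    alpha=[1.0, 0.0],
    P=[[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]],
    r=[[0.0, 0.0], [1.0, 1.0]],
)
mdp.validate()
trajectory = mdp.rollout(lambda h, s: 1, random.Random(0))
```

In this toy instance every transition is deterministic, so the policy that always plays action 1 starts in state 0 and collects reward only after it has moved to state 1. The fixed horizon H is what makes the process episodic: each rollout ends after exactly H steps, and the next episode restarts from a fresh draw from α.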
