
Neural Information Processing Systems 

We assume the environment is modeled as a discrete-time factored-action MDP (FA-MDP) $M = \langle S, A, P, R, \gamma \rangle$. Here $S$ is the set of states $s$, and $A$ is the set of vector-represented actions $\mathbf{a} = (a^1, \ldots, a^m)$. $P(s' \mid s, \mathbf{a}) = \Pr(s_{t+1} = s' \mid s_t = s, \mathbf{a}_t = \mathbf{a})$ is the transition probability, and $R(s, \mathbf{a}) \in \mathbb{R}$ is the immediate reward for taking action $\mathbf{a}$ in state $s$. Finally, $\gamma \in [0, 1)$ is the discount factor.
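The tuple $\langle S, A, P, R, \gamma \rangle$ can be made concrete with a small tabular sketch. It assumes that every action component $a^i$ takes values in a finite set $\{0, \ldots, n_i - 1\}$, so the joint action set $A$ is the Cartesian product of these per-dimension sets. The names `FactoredActionMDP`, `action_dims`, and `bellman_backup` are hypothetical and exist only for illustration. The backup shown is ordinary value iteration, which is not necessarily the method the paper proposes. The sketch mainly shows that $|A| = \prod_i n_i$ grows exponentially in $m$, and that a naive backup must enumerate all of these joint actions.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = int
Action = Tuple[int, ...]  # a = (a^1, ..., a^m): one entry per action dimension


@dataclass
class FactoredActionMDP:
    """Hypothetical tabular FA-MDP M = <S, A, P, R, gamma> (illustrative sketch)."""
    states: List[State]
    action_dims: List[int]  # assumption: n_i discrete choices for component a^i
    transition: Callable[[State, Action], Dict[State, float]]  # s' -> P(s' | s, a)
    reward: Callable[[State, Action], float]  # R(s, a)
    gamma: float

    def __post_init__(self) -> None:
        # The definition requires gamma in [0, 1).
        if not 0.0 <= self.gamma < 1.0:
            raise ValueError("gamma must lie in [0, 1)")

    def actions(self) -> List[Action]:
        # Joint action set A = product of per-dimension sets, so |A| = prod(action_dims).
        return list(itertools.product(*(range(n) for n in self.action_dims)))

    def bellman_backup(self, v: Dict[State, float]) -> Dict[State, float]:
        # One value-iteration step:
        # V(s) <- max_a [ R(s, a) + gamma * sum_{s'} P(s' | s, a) V(s') ]
        acts = self.actions()
        return {
            s: max(
                self.reward(s, a)
                + self.gamma * sum(p * v[s2] for s2, p in self.transition(s, a).items())
                for a in acts
            )
            for s in self.states
        }


# Toy example (made-up dynamics): a^1 flips the state, and the reward in state 0 is a^1 + a^2.
mdp = FactoredActionMDP(
    states=[0, 1],
    action_dims=[2, 3],
    transition=lambda s, a: {(s + a[0]) % 2: 1.0},
    reward=lambda s, a: float(sum(a)) if s == 0 else 0.0,
    gamma=0.9,
)
print(len(mdp.actions()))  # 6 joint actions (2 * 3)
v = {0: 0.0, 1: 0.0}
for _ in range(2):
    v = mdp.bellman_backup(v)
print(v)
```

With $m$ binary components, the same enumeration would visit $2^m$ joint actions per state. This exponential growth is the cost that a factored action representation is intended to address.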
