Neural Information Processing Systems
We assume the environment is modeled as a discrete-time factored-action MDP (FA-MDP) $M = \langle S, A, P, R, \gamma \rangle$, where $S$ is the set of states $s$, $A$ is the set of vector-represented actions $a = (a_1, \dots, a_m)$, $P(s' \mid s, a) = \Pr(s_{t+1} = s' \mid s_t = s, a_t = a)$ is the transition probability, $R(s, a) \in \mathbb{R}$ is the immediate reward for taking action $a$ in state $s$, and $\gamma \in [0, 1)$ is the discount factor.
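As an illustration only, the tuple defined above can be written down as a small tabular data structure. The sketch below is not taken from the paper: the class name `FactoredActionMDP`, the field `sub_action_sizes`, the helper `toy_mdp`, and the toy dynamics are all hypothetical, and it assumes finite state and sub-action sets so that P and R fit in dictionaries.

```python
import random
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Tuple

State = int
Action = Tuple[int, ...]  # a = (a_1, ..., a_m), one entry per sub-action


@dataclass
class FactoredActionMDP:
    """Hypothetical tabular container for M = <S, A, P, R, gamma>."""
    states: List[State]
    sub_action_sizes: List[int]  # assumed: number of choices for each a_i
    P: Dict[Tuple[State, Action], Dict[State, float]]  # P(s' | s, a)
    R: Dict[Tuple[State, Action], float]               # R(s, a)
    gamma: float

    def __post_init__(self) -> None:
        # The discount factor must lie in [0, 1).
        if not 0.0 <= self.gamma < 1.0:
            raise ValueError("gamma must lie in [0, 1)")
        # Each P(. | s, a) must be a probability distribution.
        for key, dist in self.P.items():
            if abs(sum(dist.values()) - 1.0) > 1e-9:
                raise ValueError(f"P(. | {key}) does not sum to 1")

    def actions(self) -> List[Action]:
        # A is the Cartesian product of the m sub-action sets.
        return list(product(*(range(n) for n in self.sub_action_sizes)))

    def step(self, s: State, a: Action, rng: random.Random) -> Tuple[State, float]:
        # Sample s' from P(. | s, a) and return it with the reward R(s, a).
        dist = self.P[(s, a)]
        next_states = list(dist)
        weights = [dist[x] for x in next_states]
        s_next = rng.choices(next_states, weights=weights)[0]
        return s_next, self.R[(s, a)]


def toy_mdp() -> FactoredActionMDP:
    """Made-up example: 2 states and m = 2 binary sub-actions."""
    states = [0, 1]
    sizes = [2, 2]
    P: Dict[Tuple[State, Action], Dict[State, float]] = {}
    R: Dict[Tuple[State, Action], float] = {}
    for s in states:
        for a in product(range(2), range(2)):
            target = a[0] ^ a[1]  # arbitrary toy dynamics
            P[(s, a)] = {target: 0.9, 1 - target: 0.1}
            R[(s, a)] = 1.0 if a[0] == s else 0.0
    return FactoredActionMDP(states, sizes, P, R, gamma=0.95)
```

With m sub-actions the joint action set has size equal to the product of the sub-action set sizes (here 2 × 2 = 4), which is why factored representations matter once m grows.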
Feb-7-2026, 19:03:33 GMT