 Reinforcement Learning






216f44e2d28d4e175a194492bde9148f-Paper.pdf

Neural Information Processing Systems

We assume the environment is modeled as a discrete-time factored-action MDP (FA-MDP) $M = \langle S, A, P, R, \gamma \rangle$, where $S$ is the set of states $s$, $A$ is the set of vector-represented actions $a = (a_1, \dots, a_m)$, $P(s' \mid s, a) = \Pr(s_{t+1} = s' \mid s_t = s, a_t = a)$ is the transition probability, $R(s, a) \in \mathbb{R}$ is the immediate reward for taking action $a$ in state $s$, and $\gamma \in [0, 1)$ is the discount factor.
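As a rough illustration of the FA-MDP tuple described above, the following Python sketch stores the five components in one container and enumerates the vector-valued actions as the Cartesian product of the per-factor domains. The class name `FactoredActionMDP`, its field names, and the two-state toy environment are all hypothetical and chosen only for this example; they are not taken from the paper.

```python
import random
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Tuple

State = int
Action = Tuple[int, ...]  # a = (a_1, ..., a_m), one entry per action factor


@dataclass
class FactoredActionMDP:
    """Hypothetical container for the tuple (S, A, P, R, gamma)."""
    states: List[State]                                        # S
    action_factors: List[List[int]]                            # domain of each a_i
    transition: Callable[[State, Action], Dict[State, float]]  # P(. | s, a)
    reward: Callable[[State, Action], float]                   # R(s, a)
    gamma: float                                               # discount factor

    def __post_init__(self) -> None:
        # The definition requires gamma in [0, 1).
        if not 0.0 <= self.gamma < 1.0:
            raise ValueError("gamma must lie in [0, 1)")

    def actions(self) -> List[Action]:
        # A is the Cartesian product of the per-factor domains.
        return list(product(*self.action_factors))

    def step(self, s: State, a: Action, rng: random.Random) -> Tuple[State, float]:
        # Sample s' from P(. | s, a) and return it with the reward R(s, a).
        dist = self.transition(s, a)
        next_states = list(dist.keys())
        weights = list(dist.values())
        s_next = rng.choices(next_states, weights=weights, k=1)[0]
        return s_next, self.reward(s, a)


# Toy environment (an assumption for illustration only): two states and
# two binary action factors.
def toy_transition(s: State, a: Action) -> Dict[State, float]:
    # State 1 is likely to be reached when at least one sub-action is on.
    p = 0.9 if sum(a) >= 1 else 0.1
    return {1: p, 0: 1.0 - p}


def toy_reward(s: State, a: Action) -> float:
    # Reward 1 for acting in state 1, else 0.
    return 1.0 if s == 1 else 0.0


mdp = FactoredActionMDP(
    states=[0, 1],
    action_factors=[[0, 1], [0, 1]],
    transition=toy_transition,
    reward=toy_reward,
    gamma=0.95,
)
```

With $m$ factors the joint action set grows as the product of the factor sizes, which is why this sketch keeps the per-factor domains and builds the joint set only on request.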


Information Directed Reward Learning for Reinforcement Learning

Neural Information Processing Systems

From such expensive feedback, we aim to learn a model of the reward that allows standard RL algorithms to achieve high expected returns with as few expert queries as possible.