A Maximum-Entropy Approach to Off-Policy Evaluation in Average-Reward MDPs

Neural Information Processing Systems 

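However, b is non-zero. Similarly, the rewards are linear in the features: $r(s, a) = \phi(s, a)^\top w$.

Assumption A3 (Feature excitation). For a policy $\pi$ with stationary distribution $d_\pi(s, a)$, define $\Sigma_\pi = \mathbb{E}_{(s,a) \sim d_\pi}[\phi(s, a)\,\phi(s, a)^\top]$.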
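As a minimal sketch of the two quantities in this excerpt, the snippet below builds linear rewards from features and estimates the feature second-moment matrix from samples. The function name `empirical_feature_moment`, the Gaussian features, the dimension, and the sample size are all illustrative assumptions, not taken from the paper. The samples stand in for state-action pairs drawn from the stationary distribution. The excerpt does not state the exact excitation condition, so the smallest eigenvalue is reported only as one common diagnostic.

```python
import numpy as np


def empirical_feature_moment(features):
    """Estimate E[phi phi^T] by averaging outer products over sampled rows.

    `features` is an (n, d) array whose rows are feature vectors phi(s, a)
    for sampled state-action pairs (hypothetical data in this sketch).
    """
    features = np.asarray(features, dtype=float)
    n = features.shape[0]
    return features.T @ features / n


# Hypothetical data: 1000 sampled state-action pairs with 4-dimensional features.
rng = np.random.default_rng(0)
phi = rng.normal(size=(1000, 4))

# Linear rewards r(s, a) = phi(s, a)^T w for an arbitrary weight vector w.
w = rng.normal(size=4)
rewards = phi @ w

# Sample estimate of the second-moment matrix and its smallest eigenvalue.
sigma_hat = empirical_feature_moment(phi)
min_eig = np.linalg.eigvalsh(sigma_hat).min()
print("smallest eigenvalue of the estimate:", min_eig)
```

A smallest eigenvalue close to zero would suggest that some direction in feature space is rarely visited under the sampled distribution.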
