A Maximum-Entropy Approach to Off-Policy Evaluation in Average-Reward MDPs
–Neural Information Processing Systems
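However, b is non-zero. Similarly, the rewards $r(s, a)$ are linear in the features: $r(s, a) = \phi(s, a)^\top w$. Assumption A3 (Feature excitation). For a policy $\pi$ with stationary distribution $d_\pi(s, a)$, define $\Sigma_\pi = \mathbb{E}_{(s,a) \sim d_\pi}[\phi(s, a)\phi(s, a)^\top]$.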
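To make the two definitions in this excerpt concrete, here is a minimal sketch in plain Python. It evaluates a linear reward from features and forms the expected outer product of the features under a stationary distribution. The toy features, distribution, weight vector, and all function names are hypothetical and not taken from the paper. The excerpt ends before stating what Assumption A3 requires of this matrix, so the rank check at the end is an assumption suggested by the name "feature excitation", not a restatement of the paper's condition.

```python
# Hypothetical toy problem: 3 state-action pairs with 2-dimensional features.
PHI = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # row k holds the features of pair k
D_PI = [0.5, 0.25, 0.25]                    # stationary distribution over pairs
W = [2.0, -1.0]                             # reward weight vector


def linear_rewards(phi, w):
    """Reward of each pair as the inner product of its features with w."""
    return [sum(f * wi for f, wi in zip(row, w)) for row in phi]


def feature_second_moment(phi, d):
    """Expected outer product of the features under the distribution d."""
    m = len(phi[0])
    sigma = [[0.0] * m for _ in range(m)]
    for row, p in zip(phi, d):
        for i in range(m):
            for j in range(m):
                sigma[i][j] += p * row[i] * row[j]
    return sigma


def matrix_rank(matrix, tol=1e-10):
    """Rank via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]  # work on a copy
    n_rows, n_cols = len(a), len(a[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Choose the remaining row with the largest entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < tol:
            continue  # no usable pivot in this column
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(rank + 1, n_rows):
            factor = a[r][col] / a[rank][col]
            for c in range(col, n_cols):
                a[r][c] -= factor * a[rank][c]
        rank += 1
    return rank


rewards = linear_rewards(PHI, W)
sigma = feature_second_moment(PHI, D_PI)
# Average reward under the stationary distribution.
average_reward = sum(p * r for p, r in zip(D_PI, rewards))

print("rewards:", rewards)
print("second-moment matrix:", sigma)
print("rank:", matrix_rank(sigma))
print("average reward:", average_reward)
```

Because the reward is linear in the features, the average reward under the stationary distribution depends on that distribution only through the expected feature vector, which is why the example computes it as a weighted sum of the per-pair rewards.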
Feb-19-2026, 04:31:28 GMT