A Maximum-Entropy Approach to Off-Policy Evaluation in Average-Reward MDPs

Neural Information Processing Systems 

This work focuses on off-policy evaluation (OPE) with function approximation in infinite-horizon undiscounted Markov decision processes (MDPs). For MDPs that are ergodic and linear (i.e., where rewards and dynamics are linear in some known features), we provide a finite-sample OPE error bound, extending existing results beyond the episodic and discounted cases. In a more general setting, when the feature dynamics are approximately linear and the rewards are arbitrary, we propose a new approach for estimating stationary distributions with function approximation. We formulate this problem as finding the maximum-entropy distribution subject to matching feature expectations under the empirical dynamics. We show that this results in an exponential-family distribution whose sufficient statistics are the features, paralleling maximum-entropy approaches in supervised learning.
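
To make the formulation concrete, the following is a minimal sketch of the maximum-entropy problem described above, written in notation introduced here for illustration (it is not taken from the paper): d denotes the estimated stationary state distribution, \varphi the feature map, \hat{P} the empirical transition operator so that (\hat{P}\varphi)(s) is the empirical expectation of next-state features from s, and \theta the dual (natural) parameters. The paper's exact objective, constraints, and parameterization may differ.

% Sketch: maximum-entropy estimate of the stationary distribution d,
% constrained so that feature expectations under d are invariant
% under the empirical dynamics \hat{P} (illustrative notation).
\begin{align*}
\max_{d \in \Delta(\mathcal{S})} \;& -\sum_{s} d(s)\,\log d(s) \\
\text{s.t.} \;& \sum_{s} d(s)\,(\hat{P}\varphi)(s) \;=\; \sum_{s} d(s)\,\varphi(s),
\qquad (\hat{P}\varphi)(s) \;=\; \mathbb{E}_{s' \sim \hat{P}(\cdot \mid s)}\big[\varphi(s')\big].
\end{align*}
% Lagrangian duality yields an exponential-family solution whose
% sufficient statistics are a linear transformation of the features:
\begin{align*}
d_{\theta}(s) \;\propto\; \exp\!\Big(\theta^{\top}\big[(\hat{P}\varphi)(s) - \varphi(s)\big]\Big),
\end{align*}
% with \theta chosen so that the feature-matching constraint holds,
% e.g., by minimizing the log-partition function
% \log \sum_{s} \exp\big(\theta^{\top}[(\hat{P}\varphi)(s) - \varphi(s)]\big).

In this sketch the sufficient statistics are the features transformed by the empirical dynamics, which is one way the exponential-family form described in the abstract can arise; the paper's precise statement should be consulted for the exact form.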