A Maximum-Entropy Approach to Off-Policy Evaluation in Average-Reward MDPs

Dec-24-2025, 07:27:31 GMT–Neural Information Processing Systems

This work focuses on off-policy evaluation (OPE) with function approximation in infinite-horizon undiscounted Markov decision processes (MDPs). For MDPs that are ergodic and linear (i.e.

maximum-entropy approach, name change, off-policy evaluation, (5 more...)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (0.88)