A Maximum-Entropy Approach to Off-Policy Evaluation in Average-Reward MDPs
Neural Information Processing Systems
This work focuses on off-policy evaluation (OPE) with function approximation in infinite-horizon undiscounted Markov decision processes (MDPs). For MDPs that are ergodic and linear (i.e.
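In an ergodic average-reward MDP, the quantity that OPE estimates for a target policy π is its long-run average reward. This equals the stationary state distribution under π weighted by the expected per-step reward: ρ(π) = Σ_s d_π(s) Σ_a π(a|s) r(s, a). The sketch below is not the paper's maximum-entropy estimator. It computes this ground-truth ρ(π) for a small tabular MDP whose model is fully known. The MDP, the policy, and the function name `average_reward` are all hypothetical. An OPE method would have to estimate the same number using only data collected under a different behavior policy.

```python
import numpy as np

# Hypothetical 2-state, 2-action ergodic MDP (all numbers are illustrative).
# P[s, a, s'] = transition probability, R[s, a] = expected reward.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.1, 0.9]],
])
R = np.array([[1.0, 0.0], [0.0, 2.0]])

# Hypothetical target policy: pi[s, a] = probability of action a in state s.
pi = np.array([[0.5, 0.5], [0.3, 0.7]])


def average_reward(P, R, pi):
    """Return (rho, d): the average reward of pi and its stationary distribution."""
    # State-to-state transition matrix under pi: P_pi[s, t] = sum_a pi[s, a] P[s, a, t].
    P_pi = np.einsum("sa,sat->st", pi, P)
    # Expected one-step reward in each state under pi.
    r_pi = (pi * R).sum(axis=1)
    n = P_pi.shape[0]
    # Stationary distribution: d^T P_pi = d^T with sum(d) = 1, solved as a
    # consistent overdetermined linear system via least squares.
    A = np.vstack([P_pi.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    d, *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(d @ r_pi), d


rho, d = average_reward(P, R, pi)
print(rho, d)
```

For these illustrative numbers, the stationary distribution is (0.22, 0.45)/0.67 and ρ(π) = 0.74/0.67 ≈ 1.104. Ergodicity is what makes d_π unique, so ρ(π) does not depend on the starting state.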
Jan-26-2025, 15:04:13 GMT