A Maximum-Entropy Approach to Off-Policy Evaluation in Average-Reward MDPs
Neural Information Processing Systems
This work focuses on off-policy evaluation (OPE) with function approximation in infinite-horizon undiscounted Markov decision processes (MDPs). For MDPs that are ergodic and linear (i.e.
Dec-24-2025, 07:27:31 GMT
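In an infinite-horizon undiscounted (average-reward) MDP, OPE targets the long-run average reward of the target policy π, ρ(π) = Σ_s d_π(s) r_π(s). Here d_π is the stationary distribution of the Markov chain that π induces, and r_π(s) is the expected one-step reward under π. The sketch below computes ρ(π) exactly for a small tabular MDP whose model is fully known. It is only meant to illustrate the quantity being estimated. It is not the paper's maximum-entropy estimator, it uses no off-policy data or function approximation, and all names (`average_reward`, `stationary_distribution`, and so on) and the two-state example are hypothetical. Power iteration assumes the induced chain is ergodic (irreducible and aperiodic).

```python
from typing import List

# P[s][a][s2]: probability of moving from s to s2 under action a (hypothetical layout)
# R[s][a]: expected one-step reward; pi[s][a]: target-policy action probabilities


def policy_transition(P: List[List[List[float]]], pi: List[List[float]]) -> List[List[float]]:
    """State-to-state transition matrix P_pi induced by following pi."""
    n = len(P)
    return [
        [sum(pi[s][a] * P[s][a][s2] for a in range(len(pi[s]))) for s2 in range(n)]
        for s in range(n)
    ]


def policy_reward(R: List[List[float]], pi: List[List[float]]) -> List[float]:
    """Expected one-step reward r_pi(s) under pi."""
    return [sum(pi[s][a] * R[s][a] for a in range(len(pi[s]))) for s in range(len(R))]


def stationary_distribution(P_pi: List[List[float]], tol: float = 1e-13,
                            max_iter: int = 100_000) -> List[float]:
    """Power iteration d <- d P_pi; converges only if the chain is ergodic."""
    n = len(P_pi)
    d = [1.0 / n] * n
    for _ in range(max_iter):
        new_d = [sum(d[s] * P_pi[s][s2] for s in range(n)) for s2 in range(n)]
        if max(abs(x - y) for x, y in zip(new_d, d)) < tol:
            return new_d
        d = new_d
    raise RuntimeError("power iteration did not converge; is the chain ergodic?")


def average_reward(P, R, pi) -> float:
    """rho(pi) = sum_s d_pi(s) * r_pi(s)."""
    d = stationary_distribution(policy_transition(P, pi))
    return sum(ds * rs for ds, rs in zip(d, policy_reward(R, pi)))


# Hypothetical two-state, two-action MDP with a uniform target policy.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.1, 0.9]],
]
R = [[1.0, 0.0], [0.0, 2.0]]
pi = [[0.5, 0.5], [0.5, 0.5]]

if __name__ == "__main__":
    print(average_reward(P, R, pi))
```

For this toy example, the induced chain has stationary distribution (0.4, 0.6), so the computed average reward is approximately 0.8. An OPE method would have to estimate this value from data logged under a different behavior policy, without access to P.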