Review for NeurIPS paper: A Maximum-Entropy Approach to Off-Policy Evaluation in Average-Reward MDPs

Jan-26-2025, 15:04:13 GMT–Neural Information Processing Systems

Correctness: The main technical content seems to be correct. I have the following questions, though. Under the linear assumption on the reward and the dynamics, the choice of features is crucial. The paper mentions that features can be pre-trained to relax the linear assumption; what would be the recommended way to pre-learn them? And if the assumptions are violated, how would this affect the results in practice?

average-reward mdp, maximum-entropy approach, off-policy evaluation


