Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning
Neural Information Processing Systems
Existing approaches based on off-policy evaluation (OPE) often require additional function approximation and hence hyperparameters, creating a chicken-and-egg situation.
Feb-9-2026, 05:29:20 GMT