Bi-Level Offline Policy Optimization with Limited Exploration

Neural Information Processing Systems 

Subsequently, at the upper level, the policy aims to maximize a conservative value estimate from the confidence set formed at the lower level.
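
The max-min structure described above can be illustrated with a toy sketch. This is not the paper's algorithm: it assumes, purely for illustration, that the lower level has already produced a finite confidence set of plausible value estimates for each candidate policy. The function names, policy names, and numbers are all hypothetical.

```python
from typing import Dict, List


def conservative_value(confidence_set: List[float]) -> float:
    """Toy lower-level output: the most pessimistic estimate in the confidence set."""
    return min(confidence_set)


def select_policy(confidence_sets: Dict[str, List[float]]) -> str:
    """Toy upper level: choose the policy whose conservative value is largest."""
    return max(
        confidence_sets,
        key=lambda name: conservative_value(confidence_sets[name]),
    )


# Hypothetical confidence sets for two candidate policies (made-up numbers).
confidence_sets = {
    "policy_a": [0.9, 0.2, 0.7],   # high upside, but poorly supported by the data
    "policy_b": [0.6, 0.5, 0.55],  # lower upside, tighter confidence set
}

best = select_policy(confidence_sets)
print(best)
```

In this toy setting the upper level prefers the policy with the tighter confidence set, because its worst-case estimate is higher even though its best-case estimate is lower.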
