On Gap-dependent Bounds for Offline Reinforcement Learning

Neural Information Processing Systems 

In offline reinforcement learning, we cannot interact with the environment to collect new data. Instead, we have access to a dataset generated from some past suboptimal policies.
