Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage

Neural Information Processing Systems 

Typically, deep RL algorithms learn a policy through online trial and error, using millions to billions of samples.
