When to Trust Your Simulator: Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning

Neural Information Processing Systems 

H2O introduces a dynamics-aware policy evaluation scheme that adaptively penalizes Q-function learning on simulated state-action pairs with large dynamics gaps, while simultaneously allowing learning from a fixed real-world dataset.
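To make the idea of a gap-weighted penalty concrete, here is a minimal NumPy sketch. It is not the paper's objective: the function names, the softmax weighting over gap estimates, the `beta` coefficient, and the way the two Bellman terms are averaged are all assumptions chosen for illustration. It only shows the general shape of pushing Q-values down on simulated samples in proportion to an estimated dynamics gap, pushing them up on real samples, and fitting TD targets on both.

```python
import numpy as np


def gap_weighted_penalty(q_sim, q_real, gap, beta=1.0):
    """Hypothetical regularizer: penalize Q on simulated samples with large gaps.

    q_sim  : Q-values on simulated state-action pairs, shape (n_sim,)
    q_real : Q-values on real-dataset state-action pairs, shape (n_real,)
    gap    : estimated dynamics gap per simulated pair, shape (n_sim,)
    beta   : penalty strength (assumed hyperparameter)
    """
    q_sim = np.asarray(q_sim, dtype=float)
    q_real = np.asarray(q_real, dtype=float)
    gap = np.asarray(gap, dtype=float)

    # Softmax over gaps (shifted by the max for numerical stability):
    # simulated samples with larger gaps receive larger weights.
    w = np.exp(gap - gap.max())
    w /= w.sum()

    # Lower Q where the simulator is least trusted, raise it on real data.
    return beta * (np.sum(w * q_sim) - np.mean(q_real))


def hybrid_q_loss(q_sim, td_sim, q_real, td_real, gap, beta=1.0):
    """Assumed total loss: Bellman error on both sources plus the penalty."""
    q_sim = np.asarray(q_sim, dtype=float)
    q_real = np.asarray(q_real, dtype=float)
    bellman = 0.5 * np.mean((q_real - np.asarray(td_real, dtype=float)) ** 2)
    bellman += 0.5 * np.mean((q_sim - np.asarray(td_sim, dtype=float)) ** 2)
    return bellman + gap_weighted_penalty(q_sim, q_real, gap, beta)


if __name__ == "__main__":
    q_sim = [1.0, 5.0]
    q_real = [2.0]
    # Same Q-values, but the second simulated sample has a much larger gap.
    print(gap_weighted_penalty(q_sim, q_real, gap=[0.0, 0.0]))
    print(gap_weighted_penalty(q_sim, q_real, gap=[0.0, 10.0]))
```

In this sketch, raising the gap estimate of a simulated sample shifts more of the penalty weight onto its Q-value, so overestimated values in poorly simulated regions are penalized most.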
