

Reinforcement Learning

When to Trust Your Simulator: Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning

Neural Information Processing Systems

H2O introduces a dynamics-aware policy evaluation scheme that adaptively penalizes Q-function learning on simulated state-action pairs with large dynamics gaps, while simultaneously learning from a fixed real-world dataset.
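As a rough illustration of the penalty idea, the Python sketch below computes a conservative, CQL-style regularizer in which simulated state-action pairs with larger estimated dynamics gaps receive more weight. Minimizing it therefore pushes their Q-values down harder, while Q-values on real-data pairs are pushed up. The function name `dynamics_aware_penalty`, the softmax weighting over gap estimates, and the push-up term on real data are all assumptions made for illustration. This is not H2O's exact objective, and how the dynamics gap itself is estimated is left out.

```python
import math
from typing import Sequence


def dynamics_aware_penalty(
    q_sim: Sequence[float],
    gap_sim: Sequence[float],
    q_real: Sequence[float],
) -> float:
    """Hypothetical CQL-style regularizer (illustrative, not the paper's objective).

    q_sim:   Q(s, a) values on simulated state-action pairs
    gap_sim: assumed per-pair dynamics-gap estimates (larger = less trustworthy)
    q_real:  Q(s, a) values on pairs from the fixed real-world dataset
    """
    if len(q_sim) != len(gap_sim) or not q_sim or not q_real:
        raise ValueError("need matching, non-empty simulated inputs and real Q-values")

    # Softmax over the gap estimates (shifted by the max for numerical stability):
    # simulated pairs with larger gaps get larger weights.
    m = max(gap_sim)
    exps = [math.exp(g - m) for g in gap_sim]
    z = sum(exps)
    weights = [e / z for e in exps]

    # Gap-weighted average of simulated Q-values: minimizing this term
    # pushes Q down most on the least trustworthy simulated pairs.
    push_down = sum(w * q for w, q in zip(weights, q_sim))

    # Plain average of real-data Q-values: subtracting it pushes Q up on real data.
    push_up = sum(q_real) / len(q_real)

    return push_down - push_up


if __name__ == "__main__":
    # Equal gaps reduce to mean(simulated Q) - mean(real Q).
    print(dynamics_aware_penalty([1.0, 3.0], [0.0, 0.0], [2.0]))
    # A large gap on the second pair concentrates the penalty on it.
    print(dynamics_aware_penalty([1.0, 3.0], [0.0, 100.0], [0.0]))
```

In a full learner, a term like this would presumably be scaled by a coefficient and added to the usual Bellman error; the sketch only shows how gap-dependent weighting changes which simulated Q-values get penalized.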