When to Trust Your Simulator: Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning
Neural Information Processing Systems
H2O introduces a dynamics-aware policy evaluation scheme that adaptively penalizes Q-function learning on simulated state-action pairs with large dynamics gaps, while still allowing learning from a fixed real-world dataset.
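As a rough illustration of the idea in the summary above, the sketch below computes a gap-weighted penalty on Q-values. It is a simplified stand-in written for this page, not the paper's exact objective: the function name `dynamics_aware_penalty`, the `beta` coefficient, and the linear normalization of the gap estimates are all assumptions, and the per-pair dynamics gap is taken as given rather than estimated.

```python
import numpy as np

def dynamics_aware_penalty(q_sim, q_real, gap, beta=1.0):
    """Hypothetical regularizer (illustrative only, not the paper's formula).

    q_sim  : Q-values on simulated state-action pairs
    q_real : Q-values on pairs from the fixed real-world dataset
    gap    : non-negative estimated dynamics gap for each simulated pair
    beta   : assumed penalty strength
    """
    q_sim = np.asarray(q_sim, dtype=float)
    q_real = np.asarray(q_real, dtype=float)
    gap = np.asarray(gap, dtype=float)

    # Turn the gap estimates into weights that sum to 1; fall back to
    # uniform weights when every gap is zero.
    total = gap.sum()
    if total > 0:
        weights = gap / total
    else:
        weights = np.full_like(gap, 1.0 / gap.size)

    # Minimizing this term lowers Q on simulated pairs with large gaps
    # and raises Q on real-data pairs.
    return beta * (np.sum(weights * q_sim) - q_real.mean())
```

In a training loop, a term like this would be added to the usual Bellman error computed on the mixed simulated and real batch, so that simulated samples with small estimated gaps are penalized little and still contribute to learning.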
Aug-19-2025, 17:14:27 GMT
- Country:
- Asia
- North America > United States (0.05)
- Genre:
- Instructional Material > Online (0.41)
- Research Report (0.46)
- Technology: