Doubly Mild Generalization for Offline Reinforcement Learning Yixiu Mao 1, Qi Wang 1, Y un Qu
–Neural Information Processing Systems
Offline Reinforcement Learning (RL) suffers from the extrapolation error and value overestimation. From a generalization perspective, this issue can be attributed to the over-generalization of value functions or policies towards out-of-distribution (OOD) actions.
Neural Information Processing Systems
Mar-21-2025, 03:13:09 GMT
- Genre:
- Research Report > Experimental Study (1.00)
- Industry:
- Education > Educational Setting > Online (0.46)
- Technology: