Doubly Mild Generalization for Offline Reinforcement Learning Yixiu Mao 1, Qi Wang 1, Y un Qu
–Neural Information Processing Systems
Offline Reinforcement Learning (RL) suffers from the extrapolation error and value overestimation. From a generalization perspective, this issue can be attributed to the over-generalization of value functions or policies towards out-of-distribution (OOD) actions.
Neural Information Processing Systems
Feb-14-2026, 12:29:01 GMT
- Country:
- Asia > China (0.04)
- Europe
- Czechia > Prague (0.04)
- United Kingdom > England
- Cambridgeshire > Cambridge (0.04)
- Genre:
- Research Report > Experimental Study (1.00)
- Industry:
- Education > Educational Setting > Online (0.46)
- Technology: