Mildly Conservative Q-Learning for Offline Reinforcement Learning
Neural Information Processing Systems
This paper explores a level of conservatism that is mild yet sufficient for offline learning, without harming generalization.
Nov-13-2025, 08:02:12 GMT
- Country:
- Asia > China
- Guangdong Province > Shenzhen (0.04)
- Europe > United Kingdom
- England > Cambridgeshire > Cambridge (0.04)
- Genre:
- Research Report (0.46)
- Technology: