ReDS: Offline RL With Heteroskedastic Datasets via Support Constraints
Neural Information Processing Systems
Offline reinforcement learning (RL) learns policies entirely from static datasets. Practical applications of offline RL will inevitably require learning from datasets where the variability of demonstrated behaviors changes non-uniformly across the state space. For example, at a red light, nearly all human drivers behave similarly by stopping, but when merging onto a highway, some drivers merge quickly, efficiently, and safely, while many hesitate or merge dangerously.
Dec-25-2025, 02:01:35 GMT