Constrained Policy Optimization with Explicit Behavior Density for Offline Reinforcement Learning
Neural Information Processing Systems
In offline RL, a critical challenge is distribution shift (also called "extrapolation error" in the literature).
Feb-8-2026, 00:47:52 GMT
- Country:
  - Asia > China
    - Guangdong Province > Guangzhou (0.04)
  - Europe
    - Hungary (0.04)
    - United Kingdom > England
      - Cambridgeshire > Cambridge (0.04)
  - North America > United States
    - Illinois > Cook County > Chicago (0.04)
- Technology: