Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage
Neural Information Processing Systems
Typically, deep RL algorithms learn a policy in an online trial-and-error fashion using millions to billions of data points.
Feb-17-2026, 07:16:53 GMT
- Country:
  - Asia
    - China > Hong Kong (0.04)
    - Middle East > Jordan (0.04)
  - Europe > United Kingdom
    - England > Cambridgeshire > Cambridge (0.04)
  - North America > United States
    - California
      - San Mateo County > Menlo Park (0.04)
      - Santa Clara County > Palo Alto (0.04)
    - Massachusetts > Middlesex County
      - Cambridge (0.04)
    - Washington > King County
      - Seattle (0.04)