Distributionally Robust Reinforcement Learning with Interactive Data Collection: Fundamental Hardness and Near-Optimal Algorithms
Miao Lu, Han Zhong, Tong Zhang, Jose Blanchet
Neural Information Processing Systems
Distributionally robust reinforcement learning (DRRL), often framed as a robust Markov decision process (RMDP), seeks a robust policy that performs well under the worst-case environment within a prespecified uncertainty set centered around the training environment. Unlike previous work, which relies on a generative model or a pre-collected offline dataset with good coverage of the deployment environment, we tackle robust RL via interactive data collection, where the learner interacts only with the training environment and refines the policy through trial and error. This robust RL paradigm poses two main challenges: managing distributional robustness while balancing exploration and exploitation during data collection. First, we establish that sample-efficient learning without additional assumptions is unattainable owing to the curse of support shift, i.e., the potential disjointedness of the distributional supports of the training and testing environments. To circumvent this hardness result, we introduce the vanishing minimal value assumption for RMDPs with a total-variation (TV) distance robust set, postulating that the minimal value of the optimal robust value function is zero. This assumption effectively eliminates the support shift issue for RMDPs with a TV distance robust set, and we present an algorithm with a provable sample complexity guarantee. Our work takes an initial step toward uncovering the inherent difficulty of robust RL via interactive data collection and identifying sufficient conditions for sample-efficient algorithms with sharp sample complexity.
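To make the worst-case objective concrete, the following is a minimal sketch (not from the paper) of the inner minimization in a one-step robust backup over a TV-distance ball. For a finite, known state space, the adversary's worst case has a simple greedy form: move up to ρ probability mass from the highest-value states onto the lowest-value state. The function name and the restriction to a fixed finite support are illustrative assumptions; the support shift issue discussed above concerns precisely the states the training distribution never reaches.

```python
import numpy as np

def tv_robust_expectation(p, v, rho):
    """Worst-case E_{p'}[v] over {p' : TV(p', p) <= rho},
    with p' supported on the same finite state space as p.

    The adversary strips up to rho probability mass from the
    highest-value states and places it on the lowest-value state,
    which attains the infimum for the TV ball (TV = 0.5 * L1).
    """
    p = np.asarray(p, dtype=float)
    v = np.asarray(v, dtype=float)
    q = p.copy()
    budget = rho
    # Remove mass from states in decreasing order of value.
    for i in np.argsort(-v):
        take = min(q[i], budget)
        q[i] -= take
        budget -= take
        if budget <= 0:
            break
    # Re-deposit all removed mass on the minimum-value state.
    q[np.argmin(v)] += rho - budget
    return float(q @ v)
```

For example, with `p = [0.5, 0.5]`, `v = [0.0, 1.0]`, and `rho = 0.2`, the adversary shifts 0.2 mass from the high-value state to the low-value one, reducing the expectation from 0.5 to 0.3. Under the vanishing minimal value assumption, the value the adversary can exploit at the minimum is zero, which is what removes the incentive to shift mass onto unseen states.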