Provably Efficient Exploration in Inverse Constrained Reinforcement Learning
Yue, Bo, Li, Jian, Liu, Guiliang – arXiv.org Artificial Intelligence
To obtain optimal constraints in complex environments, Inverse Constrained Reinforcement Learning (ICRL) seeks to recover these constraints from expert demonstrations in a data-driven manner. Existing ICRL algorithms collect training samples from an interactive environment, but the efficacy and efficiency of these sampling strategies remain unknown. To bridge this gap, we introduce a strategic exploration framework with guaranteed efficiency. Specifically, we define a feasible constraint set for ICRL problems and investigate how the expert policy and environmental dynamics influence the optimality of constraints. Motivated by our findings, we propose two exploratory algorithms that achieve efficient constraint inference by 1) dynamically reducing the bounded aggregate error of cost estimation and 2) strategically constraining the exploration policy. Both algorithms are theoretically grounded with tractable sample complexity. We empirically demonstrate the performance of our algorithms in various environments.

Constrained Reinforcement Learning (CRL) addresses sequential decision-making problems under safety constraints and has achieved considerable success in various safety-critical applications (Gu et al., 2022). However, in many real-world environments, such as robot control (García & Shafie, 2020; Thomas et al., 2021) and autonomous driving (Krasowski et al., 2020), specifying an exact constraint that consistently guarantees safe control is challenging, and the difficulty is further exacerbated when the ground-truth constraint is time-varying and context-dependent.
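The exploration idea described above can be illustrated with a small toy example. The Python snippet below is a minimal sketch, not the paper's algorithm: it assumes a hypothetical chain-world CMDP, uses a generic count-based bonus as a stand-in for the cost-estimation uncertainty that the paper bounds, and infers constrained states with a crude visitation-comparison rule. All names and numbers (chain length, episode counts, thresholds) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, horizon = 6, 2, 20

# Toy chain dynamics: action 0 moves left, action 1 moves right; with probability
# `noise` the move is flipped. The rightmost state is the (unknown) unsafe state.
def step(s, a, noise=0.1):
    move = 1 if a == 1 else -1
    if rng.random() < noise:
        move = -move
    return int(np.clip(s + move, 0, n_states - 1))

true_unsafe = {n_states - 1}

# Expert demonstrations: the expert knows the constraint and never enters the
# unsafe state (collected under noiseless dynamics to keep the sketch simple).
expert_visits = np.zeros(n_states)
for _ in range(200):
    s = 0
    for _ in range(horizon):
        expert_visits[s] += 1
        a = 1 if s < n_states - 2 else 0
        s = step(s, a, noise=0.0)

# Exploration phase: a count-based bonus stands in for the uncertainty of the
# cost estimate, steering the learner toward rarely visited state-action pairs.
learner_counts = np.zeros((n_states, n_actions))
for _ in range(300):
    s = 0
    for _ in range(horizon):
        bonus = 1.0 / np.sqrt(np.maximum(learner_counts[s], 1.0))
        a = int(np.argmax(bonus + 1e-3 * rng.random(n_actions)))  # random tie-break
        learner_counts[s, a] += 1
        s = step(s, a)

# Crude constraint inference: states the learner reaches often but the expert
# never visits are flagged as likely constrained (cost 1); everything else gets cost 0.
visited = learner_counts.sum(axis=1)
cost_hat = ((visited > 5) & (expert_visits < 1)).astype(float)

print("estimated constrained states:", np.flatnonzero(cost_hat).tolist())
print("ground-truth constrained states:", sorted(true_unsafe))
```

In the paper's algorithms the exploration signal is derived from the bounded aggregate error of the cost estimate rather than raw visit counts, but the high-level loop the sketch illustrates is the same: explore where the constraint estimate is most uncertain, then update the inferred cost from the mismatch with expert behavior.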
Sep-30-2024
- Country:
- Asia (0.27)
- North America > United States (0.46)
- Genre:
- Research Report > New Finding (0.47)
- Industry:
- Information Technology (0.34)
- Transportation (0.34)