Provably Efficient Exploration in Inverse Constrained Reinforcement Learning
Yue, Bo, Li, Jian, Liu, Guiliang – arXiv.org Artificial Intelligence
To obtain optimal constraints in complex environments, Inverse Constrained Reinforcement Learning (ICRL) seeks to recover these constraints from expert demonstrations in a data-driven manner. Existing ICRL algorithms collect training samples from an interactive environment, but the efficacy and efficiency of these sampling strategies remain unknown. To bridge this gap, we introduce a strategic exploration framework with guaranteed efficiency. Specifically, we define a feasible constraint set for ICRL problems and investigate how the expert policy and environmental dynamics influence the optimality of constraints. Motivated by our findings, we propose two exploratory algorithms that achieve efficient constraint inference by 1) dynamically reducing the bounded aggregate error of cost estimation and 2) strategically constraining the exploration policy. Both algorithms are theoretically grounded with tractable sample complexity. We empirically demonstrate the performance of our algorithms in various environments.

Constrained Reinforcement Learning (CRL) addresses sequential decision-making problems under safety constraints and has achieved considerable success in various safety-critical applications (Gu et al., 2022). However, in many real-world environments, such as robot control (García & Shafie, 2020; Thomas et al., 2021) and autonomous driving (Krasowski et al., 2020), specifying an exact constraint that consistently guarantees safe control is challenging, and the difficulty is further exacerbated when the ground-truth constraint is time-varying and context-dependent.
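The exploration idea described above can be illustrated with a small toy example. The Python snippet below is a minimal sketch, not the paper's algorithm: it assumes a hypothetical chain-world CMDP, uses a generic count-based bonus as a stand-in for the cost-estimation uncertainty that the paper bounds, and infers constrained states with a crude visitation-comparison rule. All names and numbers (chain length, episode counts, thresholds) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, horizon = 6, 2, 20

# Toy chain dynamics: action 0 moves left, action 1 moves right; with probability
# `noise` the move is flipped. The rightmost state is the (unknown) unsafe state.
def step(s, a, noise=0.1):
    move = 1 if a == 1 else -1
    if rng.random() < noise:
        move = -move
    return int(np.clip(s + move, 0, n_states - 1))

true_unsafe = {n_states - 1}

# Expert demonstrations: the expert knows the constraint and never enters the
# unsafe state (collected under noiseless dynamics to keep the sketch simple).
expert_visits = np.zeros(n_states)
for _ in range(200):
    s = 0
    for _ in range(horizon):
        expert_visits[s] += 1
        a = 1 if s < n_states - 2 else 0
        s = step(s, a, noise=0.0)

# Exploration phase: a count-based bonus stands in for the uncertainty of the
# cost estimate, steering the learner toward rarely visited state-action pairs.
learner_counts = np.zeros((n_states, n_actions))
for _ in range(300):
    s = 0
    for _ in range(horizon):
        bonus = 1.0 / np.sqrt(np.maximum(learner_counts[s], 1.0))
        a = int(np.argmax(bonus + 1e-3 * rng.random(n_actions)))  # random tie-break
        learner_counts[s, a] += 1
        s = step(s, a)

# Crude constraint inference: states the learner reaches often but the expert
# never visits are flagged as likely constrained (cost 1); everything else gets cost 0.
visited = learner_counts.sum(axis=1)
cost_hat = ((visited > 5) & (expert_visits < 1)).astype(float)

print("estimated constrained states:", np.flatnonzero(cost_hat).tolist())
print("ground-truth constrained states:", sorted(true_unsafe))
```

In the paper's algorithms the exploration signal is derived from the bounded aggregate error of the cost estimate rather than raw visit counts, but the high-level loop the sketch illustrates is the same: explore where the constraint estimate is most uncertain, then update the inferred cost from the mismatch with expert behavior.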
Sep-30-2024
- Country:
- Asia (0.27)
- North America > United States (0.46)
- Genre:
- Research Report > New Finding (0.47)
- Industry:
- Information Technology (0.34)
- Transportation (0.34)