A PAC Learning Algorithm for LTL and Omega-regular Objectives in MDPs
Mateo Perez, Fabio Somenzi, Ashutosh Trivedi
arXiv.org Artificial Intelligence
Linear temporal logic (LTL) and omega-regular objectives -- a superset of LTL -- have seen recent use as a way to express non-Markovian objectives in reinforcement learning. We introduce a model-based probably approximately correct (PAC) learning algorithm for omega-regular objectives in Markov decision processes (MDPs). As part of the development of our algorithm, we introduce the epsilon-recurrence time: a measure of the speed at which a policy converges to the satisfaction of the omega-regular objective in the limit. We prove that our algorithm only requires a polynomial number of samples in the relevant parameters, and perform experiments which confirm our theory.
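The epsilon-recurrence time introduced in the abstract concerns how quickly a fixed policy keeps revisiting the states that witness satisfaction of the objective. As a loose, hypothetical illustration (not the paper's algorithm), the sketch below estimates the empirical mean return time to an "accepting" state in the Markov chain induced by a fixed policy on a two-state MDP; the transition matrix `P` and the choice of accepting state are invented for the example.

```python
import random

# Hypothetical two-state Markov chain induced by a fixed policy.
# State 0 plays the role of an accepting state; we estimate the
# expected number of steps between consecutive visits to it, a
# quantity in the spirit of a recurrence time.
P = {0: [(0, 0.9), (1, 0.1)],   # transition distribution from state 0
     1: [(0, 0.5), (1, 0.5)]}   # transition distribution from state 1

def step(s, rng):
    """Sample the successor of state s under the induced chain."""
    r, acc = rng.random(), 0.0
    for t, p in P[s]:
        acc += p
        if r < acc:
            return t
    return P[s][-1][0]

def mean_return_time(n_visits=10000, seed=0):
    """Empirical mean time between visits to the accepting state 0."""
    rng = random.Random(seed)
    s, steps, visits, total = 0, 0, 0, 0
    while visits < n_visits:
        s = step(s, rng)
        steps += 1
        if s == 0:               # returned to the accepting state
            visits += 1
            total += steps
            steps = 0
    return total / n_visits

print(mean_return_time())
```

For this chain the stationary probability of state 0 is 5/6, so by Kac's formula the mean return time should be close to 1.2; the simulation lets one check that the empirical estimate converges to it.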
Jan-15-2024