A PAC Learning Algorithm for LTL and Omega-regular Objectives in MDPs
Mateo Perez, Fabio Somenzi, Ashutosh Trivedi
arXiv.org Artificial Intelligence
Linear temporal logic (LTL) and omega-regular objectives -- a superset of LTL -- have seen recent use as a way to express non-Markovian objectives in reinforcement learning. We introduce a model-based probably approximately correct (PAC) learning algorithm for omega-regular objectives in Markov decision processes (MDPs). As part of the development of our algorithm, we introduce the epsilon-recurrence time: a measure of the speed at which a policy converges to the satisfaction of the omega-regular objective in the limit. We prove that our algorithm only requires a polynomial number of samples in the relevant parameters, and perform experiments which confirm our theory.
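The epsilon-recurrence time introduced in the abstract concerns how quickly a fixed policy keeps revisiting the states that witness satisfaction of the objective. As a loose, hypothetical illustration (not the paper's algorithm), the sketch below estimates the empirical mean return time to an "accepting" state in the Markov chain induced by a fixed policy on a two-state MDP; the transition matrix `P` and the choice of accepting state are invented for the example.

```python
import random

# Hypothetical two-state Markov chain induced by a fixed policy.
# State 0 plays the role of an accepting state; we estimate the
# expected number of steps between consecutive visits to it, a
# quantity in the spirit of a recurrence time.
P = {0: [(0, 0.9), (1, 0.1)],   # transition distribution from state 0
     1: [(0, 0.5), (1, 0.5)]}   # transition distribution from state 1

def step(s, rng):
    """Sample the successor of state s under the induced chain."""
    r, acc = rng.random(), 0.0
    for t, p in P[s]:
        acc += p
        if r < acc:
            return t
    return P[s][-1][0]

def mean_return_time(n_visits=10000, seed=0):
    """Empirical mean time between visits to the accepting state 0."""
    rng = random.Random(seed)
    s, steps, visits, total = 0, 0, 0, 0
    while visits < n_visits:
        s = step(s, rng)
        steps += 1
        if s == 0:               # returned to the accepting state
            visits += 1
            total += steps
            steps = 0
    return total / n_visits

print(mean_return_time())
```

For this chain the stationary probability of state 0 is 5/6, so by Kac's formula the mean return time should be close to 1.2; the simulation lets one check that the empirical estimate converges to it.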
Jan-15-2024