Policy Synthesis and Reinforcement Learning for Discounted LTL
Rajeev Alur, Osbert Bastani, Kishor Jothimurugan, Mateo Perez, Fabio Somenzi, Ashutosh Trivedi
arXiv.org Artificial Intelligence
The difficulty of manually specifying reward functions has led to an interest in using linear temporal logic (LTL) to express objectives for reinforcement learning (RL). However, LTL has the downside that it is sensitive to small perturbations in the transition probabilities, which prevents probably approximately correct (PAC) learning without additional assumptions. Time discounting provides a way of removing this sensitivity, while retaining the high expressivity of the logic. We study the use of discounted LTL for policy synthesis in Markov decision processes with unknown transition probabilities, and show how to reduce discounted LTL to discounted-sum reward via a reward machine when all discount factors are identical.
May 29, 2023