Non-Markovian Rewards Expressed in LTL: Guiding Search Via Reward Shaping

Camacho, Alberto (University of Toronto) | Chen, Oscar (University of Cambridge) | Sanner, Scott (University of Toronto) | McIlraith, Sheila A. (University of Toronto)

AAAI Conferences 

We propose an approach to solving Markov Decision Processes with non-Markovian rewards specified in Linear Temporal Logic interpreted over finite traces (LTL-f). Our approach integrates automata representations of LTL-f formulae into compiled MDPs that can be solved by off-the-shelf MDP planners, exploiting reward shaping to help guide search. Experiments with PROST, a state-of-the-art UCT-based MDP planner, show that automata-based reward shaping is an effective method to guide search, producing solutions of superior quality while maintaining policy optimality guarantees.
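To make the idea of automata-based reward shaping concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not give the compilation or the shaping function, so this sketch assumes standard potential-based shaping, where the shaping term is gamma * phi(q') - phi(q) and the potential phi is the negated distance from automaton state q to an accepting state. The automaton (for the hypothetical task "eventually a, and afterwards eventually b"), the labels, and all function names are illustrative assumptions.

```python
from collections import deque

# Hypothetical DFA for the task "eventually a, and afterwards eventually b".
# State 0: nothing seen; state 1: a seen; state 2: accepting (a, then b).
TRANSITIONS = {
    (0, "a"): 1,
    (1, "b"): 2,
}
ACCEPTING = {2}
STATES = {0, 1, 2}


def step(q, label):
    """Advance the automaton; unlisted (state, label) pairs self-loop."""
    return TRANSITIONS.get((q, label), q)


def distances_to_accepting():
    """Backward breadth-first search from the accepting states."""
    dist = {q: 0 for q in ACCEPTING}
    frontier = deque(ACCEPTING)
    while frontier:
        target = frontier.popleft()
        for (src, _label), dst in TRANSITIONS.items():
            if dst == target and src not in dist:
                dist[src] = dist[target] + 1
                frontier.append(src)
    return dist


DIST = distances_to_accepting()


def potential(q):
    """Assumed potential: negated distance to acceptance.

    States that cannot reach acceptance get the worst value, -|STATES|.
    """
    return -float(DIST.get(q, len(STATES)))


def shaped_reward(q, label, gamma=1.0, task_reward=1.0):
    """Return (next automaton state, task reward plus shaping term)."""
    q_next = step(q, label)
    # The non-Markovian reward is paid once, on first reaching acceptance.
    base = task_reward if (q_next in ACCEPTING and q not in ACCEPTING) else 0.0
    shaping = gamma * potential(q_next) - potential(q)
    return q_next, base + shaping


def run(trace, gamma=1.0):
    """Feed a sequence of labels through the automaton, collecting rewards."""
    q, rewards = 0, []
    for label in trace:
        q, r = shaped_reward(q, label, gamma)
        rewards.append(r)
    return q, rewards
```

For example, `run(["c", "a", "c", "b"])` gives an intermediate reward as soon as `a` is observed, rather than only at the end of the trace. With gamma = 1 the shaping terms telescope to phi(final) - phi(initial), which depends only on the start and end automaton states, so the ranking of policies is unchanged. This is the usual argument for why potential-based shaping preserves optimality.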
