Non-Markovian Rewards Expressed in LTL: Guiding Search Via Reward Shaping

Camacho, Alberto (University of Toronto) | Chen, Oscar (University of Cambridge) | Sanner, Scott (University of Toronto) | McIlraith, Sheila A. (University of Toronto)

Jun-13-2017–AAAI Conferences

We propose an approach to solving Markov Decision Processes (MDPs) with non-Markovian rewards specified in Linear Temporal Logic interpreted over finite traces (LTL-f). Our approach integrates automata representations of LTL-f formulae into compiled MDPs that can be solved by off-the-shelf MDP planners, exploiting reward shaping to help guide search. Experiments with the state-of-the-art UCT-based MDP planner PROST show that automata-based reward shaping is an effective way to guide search, producing higher-quality solutions while maintaining policy optimality guarantees.
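As a rough illustration of the idea in the abstract, the sketch below tracks progress toward a temporal goal with a small automaton and adds a potential-based shaping term to the reward. Everything in it is an assumption made for illustration, not the paper's construction: the three-state automaton (for "a occurs, and b occurs at some later step"), the one-event-per-step alphabet, the choice of potential as negative distance to an accepting state, and the names `DELTA`, `potential`, and `shaped_rewards`.

```python
from collections import deque

# Hypothetical DFA for "a happens, and b happens at some later step".
# One event per step is assumed; state 2 is the only accepting state.
DELTA = {
    (0, "a"): 1, (0, "b"): 0, (0, "none"): 0,
    (1, "a"): 1, (1, "b"): 2, (1, "none"): 1,
    (2, "a"): 2, (2, "b"): 2, (2, "none"): 2,
}
ACCEPTING = {2}
EVENTS = ("a", "b", "none")


def distance_to_accepting(q):
    """Fewest DFA transitions from q to an accepting state (breadth-first search)."""
    seen, frontier = {q}, deque([(q, 0)])
    while frontier:
        state, dist = frontier.popleft()
        if state in ACCEPTING:
            return dist
        for event in EVENTS:
            nxt = DELTA[(state, event)]
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return float("inf")


def potential(q):
    # Assumed potential: the closer to acceptance, the higher the potential.
    return -distance_to_accepting(q)


def shaped_rewards(events, gamma=1.0, goal_reward=1.0):
    """Per-step rewards along a trace: base reward plus the shaping term
    gamma * potential(next) - potential(current)."""
    q, out = 0, []
    for event in events:
        nxt = DELTA[(q, event)]
        # The base reward is paid once, when the automaton first accepts.
        base = goal_reward if (nxt in ACCEPTING and q not in ACCEPTING) else 0.0
        out.append(base + gamma * potential(nxt) - potential(q))
        q = nxt
    return out


print(shaped_rewards(["none", "a", "none", "b"]))
```

In this sketch the step that observes `a` already receives a positive shaped reward, even though the base reward is only paid once `b` follows. That intermediate signal is the kind of guidance a sampling-based planner could exploit; because the shaping terms telescope along a trace, their total depends only on the first and last automaton states.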

nmrdp, non-markovian reward expressed, prost uct null, (10 more...)

Country:
- North America
  - United States (0.05)
  - Canada > Ontario
    - Toronto (0.18)
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.05)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Planning & Scheduling (0.47)
  - Machine Learning > Learning Graphical Models (0.35)
