Multi-Objective POMDPs with Lexicographic Reward Preferences

Wray, Kyle Hollins (University of Massachusetts, Amherst) | Zilberstein, Shlomo (University of Massachusetts, Amherst)

Jul-15-2015 | AAAI Conferences

We propose a model, the Lexicographic Partially Observable Markov Decision Process (LPOMDP), which extends POMDPs with lexicographic preferences over multiple value functions. It allows for slack (slightly less-than-optimal values) in higher-priority preferences so that lower-priority value functions can be improved. Many real-life situations are naturally captured by LPOMDPs with slack. We consider a semi-autonomous driving scenario in which time spent on the road is minimized while time spent driving autonomously is maximized. We propose two solutions to LPOMDPs, Lexicographic Value Iteration (LVI) and Lexicographic Point-Based Value Iteration (LPBVI), and establish convergence results and correctness within strong slack bounds. We test the algorithms using real-world road data from 10 major cities, provided by OpenStreetMap (OSM). Finally, we present GPU-based optimizations for point-based solvers and demonstrate that they enable us to quickly solve vastly larger LPOMDPs and other variations of POMDPs.
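To make the role of slack concrete, here is a minimal sketch of lexicographic action selection at a single belief. It is not the paper's LVI or LPBVI algorithm, which compute the value functions themselves. It assumes that per-objective value functions are already available as sets of alpha vectors for each action (the standard piecewise-linear POMDP representation). Objectives are ordered by priority. For each objective except the last, only actions within that objective's slack of the best remaining action are kept, and the lowest-priority objective then chooses among the survivors. The function names, action names ("fast_road", "auto_road"), and all numbers are hypothetical, loosely modeled on the driving scenario.

```python
from typing import Dict, List, Sequence

Belief = Sequence[float]
# Hypothetical representation: action name -> alpha vectors for one objective.
AlphaSet = Dict[str, List[Sequence[float]]]


def q_value(alphas: List[Sequence[float]], belief: Belief) -> float:
    """Value of a belief under a piecewise-linear convex function: max over alpha of alpha . b."""
    return max(sum(a * p for a, p in zip(alpha, belief)) for alpha in alphas)


def lexicographic_action(objectives: List[AlphaSet], slacks: List[float], belief: Belief) -> str:
    """Choose an action by filtering objectives in priority order (objectives[0] is highest).

    slacks[i] is how far below the best remaining value objective i may fall.
    The last objective has no slack; it simply maximizes over the surviving actions.
    """
    if len(slacks) != len(objectives) - 1:
        raise ValueError("need one slack per objective except the last")
    candidates = list(objectives[0])
    for alpha_set, slack in zip(objectives[:-1], slacks):
        values = {a: q_value(alpha_set[a], belief) for a in candidates}
        best = max(values.values())
        # Keep every action that is within `slack` of the best for this objective.
        candidates = [a for a in candidates if values[a] >= best - slack]
    last = objectives[-1]
    return max(candidates, key=lambda a: q_value(last[a], belief))


if __name__ == "__main__":
    # Hypothetical numbers: objective 0 is negative travel time, objective 1 is autonomous time.
    travel = {"fast_road": [[-10.0, -12.0]], "auto_road": [[-11.0, -13.0]]}
    autonomy = {"fast_road": [[2.0, 1.0]], "auto_road": [[8.0, 6.0]]}
    belief = [0.5, 0.5]
    print(lexicographic_action([travel, autonomy], [0.0], belief))  # fast_road
    print(lexicographic_action([travel, autonomy], [1.5], belief))  # auto_road
```

With no slack, the faster route wins outright. With a slack of 1.5 time units on travel time, the slightly slower route also becomes admissible, and the lower-priority autonomy objective selects it. This is the trade-off the abstract describes.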

lpomdp, slack, value function

Country:
- North America > United States
  - Massachusetts > Hampshire County
    - Amherst (0.14)
  - Illinois > Cook County
    - Chicago (0.04)
- Europe > Iceland
  - Capital Region > Reykjavik (0.04)

Industry:
- Health & Medicine (0.94)
- Transportation > Ground
  - Road (0.35)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
