Robust Policy Computation in Reward-Uncertain MDPs Using Nondominated Policies

Regan, Kevin (University of Toronto) | Boutilier, Craig (University of Toronto)

Jul-15-2010–AAAI Conferences

The precise specification of reward functions for Markov decision processes (MDPs) is often extremely difficult, motivating research into both reward elicitation and the robust solution of MDPs with imprecisely specified reward (IRMDPs). We develop new techniques for the robust optimization of IRMDPs, using the minimax regret decision criterion, that exploit the set of nondominated policies, i.e., policies that are optimal for some instantiation of the imprecise reward function. Drawing parallels to POMDP value functions, we devise a Witness-style algorithm for identifying nondominated policies. We also examine several new algorithms for computing minimax regret using the nondominated set, and examine both practically and theoretically the impact of approximating this set. Our results suggest that a small subset of the nondominated set can greatly speed up computation, yet yield very tight approximations to minimax regret.
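The two central notions in the abstract, nondominated policies and minimax regret, can be illustrated on a toy instance. The sketch below is not the paper's algorithm. It assumes a small, explicitly listed set of candidate policies, each summarized by a hypothetical occupancy-frequency vector, and a finite list of candidate reward vectors standing in for the imprecise reward set, so that a policy's value is a dot product. All names (`nondominated`, `max_regret`, `minimax_regret`) and all numbers are illustrative assumptions. Because it checks only the listed reward vectors, it can miss policies that are optimal only for rewards not in the list.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Value of a policy (occupancy vector) under a reward vector."""
    return sum(x * y for x, y in zip(a, b))


def nondominated(policies: List[Vector], rewards: List[Vector]) -> List[int]:
    """Indices of policies that are optimal for at least one listed reward."""
    keep = []
    for i, f in enumerate(policies):
        if any(all(dot(r, f) >= dot(r, g) for g in policies) for r in rewards):
            keep.append(i)
    return keep


def max_regret(f: Vector, adversaries: List[Vector], rewards: List[Vector]) -> float:
    """Worst-case loss of f against the best adversary policy, over all listed rewards."""
    return max(dot(r, g) - dot(r, f) for r in rewards for g in adversaries)


def minimax_regret(policies: List[Vector], rewards: List[Vector]) -> Tuple[int, float]:
    """Pick the candidate policy whose max regret is smallest.

    The adversary is restricted to the nondominated set: for every listed
    reward, the best response is optimal for that reward and is therefore
    already in the set, so nothing is lost by the restriction.
    """
    adversaries = [policies[i] for i in nondominated(policies, rewards)]
    regrets = [max_regret(f, adversaries, rewards) for f in policies]
    best = min(range(len(policies)), key=lambda i: regrets[i])
    return best, regrets[best]


# Hypothetical toy data: four policies over two state-action pairs,
# and two candidate reward vectors.
policies = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5), (0.2, 0.2)]
rewards = [(1.0, 0.0), (0.0, 1.0)]

print(nondominated(policies, rewards))    # [0, 1]
print(minimax_regret(policies, rewards))  # (2, 0.5)
```

In this toy instance only the first two policies are nondominated, yet the policy minimizing max regret is the third one. This reflects the role the nondominated set plays in the regret computation: it constrains the adversary's choice of policy, not the choice of the robust policy itself.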

artificial intelligence, machine learning, nondominated policy


Country:
- North America
  - United States > New York (0.04)
  - Canada
    - Ontario > Toronto (0.29)
    - British Columbia > East Kootenay Region
      - Fernie (0.04)

Genre:
- Research Report > New Finding (0.54)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning > Learning Graphical Models
    - Undirected Networks > Markov Models (0.71)
