Bayesian Real-time Dynamic Programming

Sanner, Scott (National ICT Australia) | Goetschalckx, Robby (Catholic University of Leuven) | Driessens, Kurt (Catholic University of Leuven) | Shani, Guy (Microsoft Research)

Jun-23-2009–AAAI Conferences

Real-time dynamic programming (RTDP) solves Markov decision processes (MDPs) when the initial state is restricted, by focusing dynamic programming on the envelope of states reachable from an initial state set. RTDP often provides performance guarantees without visiting the entire state space. Building on RTDP, recent work has sought to improve its efficiency through various optimizations, including maintaining upper and lower bounds to both govern trial termination and prioritize state exploration. In this work, we take a Bayesian perspective on these upper and lower bounds and use a value of perfect information (VPI) analysis to govern trial termination and exploration in a novel algorithm we call VPI-RTDP. VPI-RTDP leads to an improvement over state-of-the-art RTDP methods, empirically yielding up to a three-fold reduction in the amount of time and number of visited states required to achieve comparable policy performance.
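
The role that upper and lower bounds play in trial termination and exploration can be illustrated with a minimal sketch of a generic bounded RTDP loop. This is not VPI-RTDP and not the authors' implementation: the toy MDP, the discount factor `GAMMA`, the gap-weighted successor sampling, and all function names are assumptions made for illustration. It only shows the bound-based machinery that the abstract says VPI-RTDP builds on.

```python
import random

# Hypothetical toy cost-minimizing MDP (an assumption for illustration).
# TRANSITIONS[state][action] = list of (probability, next_state, cost)
GOAL = 3
GAMMA = 0.9       # assumed discount factor
MAX_COST = 2.0    # largest one-step cost in the toy MDP
TRANSITIONS = {
    0: {"a": [(1.0, 1, 1.0)], "b": [(0.5, 2, 1.0), (0.5, 0, 1.0)]},
    1: {"a": [(1.0, 3, 2.0)]},
    2: {"a": [(1.0, 3, 0.5)]},
}


def q_value(V, s, a):
    """Expected discounted cost of taking action a in state s under values V."""
    return sum(p * (c + GAMMA * V[s2]) for p, s2, c in TRANSITIONS[s][a])


def backup(V, s):
    """Bellman backup of V at state s (minimizing expected cost)."""
    V[s] = min(q_value(V, s, a) for a in TRANSITIONS[s])


def greedy_action(V, s):
    """Action with the lowest expected cost in state s under values V."""
    return min(TRANSITIONS[s], key=lambda a: q_value(V, s, a))


def bounded_rtdp(start, eps=1e-3, max_trials=1000, max_depth=1000, seed=0):
    rng = random.Random(seed)
    states = list(TRANSITIONS) + [GOAL]
    # Lower bound 0 and upper bound MAX_COST / (1 - GAMMA) bracket the
    # optimal discounted cost; the absorbing goal has value 0.
    lower = {s: 0.0 for s in states}
    upper = {s: 0.0 if s == GOAL else MAX_COST / (1 - GAMMA) for s in states}
    for _ in range(max_trials):
        if upper[start] - lower[start] < eps:
            break  # bounds at the start state have converged
        s = start
        for _ in range(max_depth):
            # Trial termination: stop at the goal or once the gap is small.
            if s == GOAL or upper[s] - lower[s] < eps:
                break
            backup(lower, s)
            backup(upper, s)
            a = greedy_action(lower, s)
            # Exploration: prefer successors whose value is still uncertain.
            succ = TRANSITIONS[s][a]
            weights = [p * (upper[s2] - lower[s2]) for p, s2, _ in succ]
            if sum(weights) <= 0:
                break  # every successor is already resolved
            s = rng.choices([s2 for _, s2, _ in succ], weights=weights)[0]
    return lower, upper


if __name__ == "__main__":
    lo, hi = bounded_rtdp(start=0)
    print(lo[0], hi[0], greedy_action(lo, 0))
```

Only states reachable from the start state are ever backed up, which is the sense in which RTDP avoids sweeping the whole state space. Going by the abstract, VPI-RTDP would replace the gap-based termination and sampling rules in this sketch with decisions driven by a value of perfect information analysis.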

algorithm, dynamic programming, rtdp

Country:
- Oceania > Australia
  - Australian Capital Territory > Canberra (0.04)
- North America
  - Mexico (0.04)
  - United States > Washington
    - King County > Redmond (0.04)
  - Canada > Quebec
    - Montreal (0.04)
- Europe
  - Italy > Trentino-Alto Adige/Südtirol
    - Trentino Province > Trento (0.04)
  - Germany > North Rhine-Westphalia
    - Cologne Region > Bonn (0.04)
  - Belgium > Flanders
    - Flemish Brabant > Leuven (0.04)

Technology:
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.83)
