Approximate Dynamic Programming by Minimizing Distributionally Robust Bounds
Large Markov decision processes (MDPs) are common in reinforcement learning and operations research and are often solved by approximate dynamic programming (ADP). Many ADP algorithms have been developed and studied, often with impressive empirical performance. However, because many ADP methods must be carefully tuned to work well and offer only weak theoretical guarantees, it is important to develop new methods that have both good theoretical guarantees and good empirical performance. Approximate linear programming (ALP)--an ADP method--was developed with the goal of achieving convergence and good theoretical guarantees (de Farias & van Roy, 2003). Approximate bilinear programming (ABP) improves on the theoretical properties of ALP at the cost of additional computational complexity (Petrik & Zilberstein, 2009, 2011).
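For readers unfamiliar with ALP: it replaces exact dynamic programming with a linear program of the form min_w c^T Phi w subject to Phi w >= T(Phi w), where Phi is a feature matrix and T the Bellman operator. The sketch below solves the exact special case (identity features), in which the LP recovers the optimal value function of a small MDP; the two-state, two-action MDP, its rewards, and the state-relevance weights are all illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.optimize import linprog

gamma = 0.9  # discount factor (assumed)

# Hypothetical 2-state, 2-action MDP.
# P[a, s, s'] = transition probability; r[a, s] = reward.
P = np.array([[[0.8, 0.2], [0.1, 0.9]],   # action 0
              [[0.5, 0.5], [0.3, 0.7]]])  # action 1
r = np.array([[1.0, 0.0],
              [0.5, 2.0]])
n_states = P.shape[1]

# Exact LP for MDPs (ALP with identity features):
#   min  c^T v
#   s.t. v(s) >= r(s, a) + gamma * sum_s' P(s'|s, a) v(s')  for all s, a
# Rewritten for linprog's A_ub @ v <= b_ub form: (gamma * P_a - I) v <= -r_a.
c = np.ones(n_states)  # uniform state-relevance weights (assumed)
A_ub = np.vstack([gamma * P[a] - np.eye(n_states) for a in range(P.shape[0])])
b_ub = np.concatenate([-r[a] for a in range(P.shape[0])])

res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * n_states)
v = res.x  # optimal value function for this MDP
```

With a low-dimensional feature matrix Phi in place of the identity, the same LP has far fewer variables but the constraints only guarantee an upper bound on the true value function, which is the gap ABP was designed to tighten.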
May 21, 2012