Better Be Lucky than Good: Exceeding Expectations in MDP Evaluation
Keller, Thomas (University of Freiburg) | Geißer, Florian (University of Freiburg)
Markov Decision Processes (MDPs) offer a general framework to describe probabilistic planning problems of varying complexity. The development of algorithms that act successfully in MDPs is important to many AI applications. Since it is often impossible or intractable to evaluate MDP algorithms based on a theoretical analysis alone, the International Probabilistic Planning Competition (IPPC) was introduced to allow a comparison based on experimental evaluation. The idea is to approximate the quality of an MDP solver by performing a sequence of runs on a problem instance, and by using the average of the obtained results as an approximation of the expected reward.

Two other algorithms require the knowledge of the optimal policy and its expected reward. We show that the expected reward of the optimal policy is a lower bound for the expected performance of both strategies. Our final algorithm switches between the application of the optimal policy and the policy with the highest possible outcome, which can be computed without notable overhead in the Trial-based Heuristic Tree Search (THTS) framework (Keller and Helmert 2013). We show theoretically and empirically that all algorithms outperform the naïve base approach that ignores the potential of optimizing evaluation runs in hindsight, and that it pays off to take suboptimal base
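The run-averaging evaluation described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper or from the IPPC setup: the two-state toy MDP, the horizon of 5, the reward values, and the names `step`, `run_once`, `evaluate`, `always_safe`, and `always_risky` are all hypothetical.

```python
import random
from statistics import mean

HORIZON = 5  # hypothetical finite horizon, chosen only for this sketch


def step(state, action, rng):
    """Sample (next_state, reward) for a hypothetical two-state toy MDP."""
    if state == "broken":
        return "broken", 0.0      # absorbing state, no further reward
    if action == "safe":
        return "ok", 1.0          # deterministic small reward
    # "risky": reward 2 with probability 0.8, otherwise break down
    if rng.random() < 0.8:
        return "ok", 2.0
    return "broken", 0.0


def run_once(policy, rng):
    """Execute one evaluation run and return its accumulated reward."""
    state, total = "ok", 0.0
    for _ in range(HORIZON):
        state, reward = step(state, policy(state), rng)
        total += reward
    return total


def evaluate(policy, num_runs, seed=0):
    """Approximate the expected reward by the average over num_runs runs."""
    rng = random.Random(seed)
    return mean(run_once(policy, rng) for _ in range(num_runs))


def always_safe(state):
    return "safe"


def always_risky(state):
    return "risky"


if __name__ == "__main__":
    # Exact expected reward of always_risky in this toy model:
    # 2 * (0.8 + 0.8**2 + ... + 0.8**5)
    exact_risky = 2 * sum(0.8 ** t for t in range(1, HORIZON + 1))
    print("safe  (average of runs):", evaluate(always_safe, 1000))
    print("risky (average of runs):", evaluate(always_risky, 20000))
    print("risky (exact expectation):", exact_risky)
```

With few runs, the average for the risky policy can land well above or below its expectation. That gap between a finite sample of runs and the true expected reward is what the evaluation scheme leaves open.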
Mar-6-2015
- Country:
  - Oceania > Australia > Australian Capital Territory > Canberra (0.04)
  - North America > United States > New York > New York County > New York City (0.04)
  - North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
  - Europe > Switzerland > Basel-City > Basel (0.04)
  - Europe > Germany > Baden-Württemberg > Freiburg (0.04)
- Genre:
  - Research Report (0.46)