We thank the reviewers for their insightful comments. On the common point that our experimentation is limited to a single-decision setting: our intent was to isolate the effects of candidate generation. By removing the influence of other components of search, such as rollout policies and state-value function approximations, we can focus the evaluation on the generator itself. We recognize that the sequential-decision setting requires additional reasoning. We would argue, however, that the other learned components of search algorithms aim precisely to reduce that reasoning; indeed, learning a perfect value function approximation would essentially reduce a sequential-decision problem to a single-decision problem. We do plan to examine our ideas in a full MCTS setting, which we believe is a problem deserving its own investigation.
Marginal Utility for Planning in Continuous or Large Discrete Action Spaces
Zaheen Farraz Ahmad, Levi H. S. Lelis, Michael Bowling
Sample-based planning is a powerful family of algorithms for generating intelligent behavior from a model of the environment. Generating good candidate actions is critical to the success of sample-based planners, particularly in continuous or large action spaces. Typically, candidate action generation exhausts the action space, uses domain knowledge, or, more recently, involves learning a stochastic policy to provide such search guidance. In this paper we explore explicitly learning a candidate action generator by optimizing a novel objective, marginal utility. The marginal utility of an action generator measures the increase in value of an action over previously generated actions. We validate our approach in both curling, a challenging stochastic domain with continuous state and action spaces, and a location game with a discrete but large action space. We show that a generator trained with the marginal utility objective outperforms hand-coded schemes built on substantial domain knowledge, trained stochastic policies, and other natural objectives for generating actions for sample-based planners.
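The abstract's definition of marginal utility can be sketched concretely. A minimal illustration, assuming (as the abstract states) that an action's marginal utility is its improvement in value over the best previously generated action; the function name and the convention that the first action's utility is its full value are illustrative, not taken from the paper:

```python
def marginal_utilities(action_values):
    """For a sequence of candidate-action values, return each action's
    marginal utility: the increase over the best value seen so far.
    By convention here, the first action's marginal utility is its value."""
    best = float("-inf")
    utilities = []
    for v in action_values:
        # Improvement over previously generated actions, clipped at zero
        utilities.append(v if best == float("-inf") else max(v - best, 0.0))
        best = max(best, v)
    return utilities

# Example: the second action improves the best value by 0.2;
# the third adds nothing over the first two.
print(marginal_utilities([0.3, 0.5, 0.4]))  # [0.3, 0.2, 0.0]
```

Under this reading, a generator trained to maximize marginal utility is rewarded for proposing actions that are valuable *and* different from what it has already proposed, rather than repeatedly sampling the single highest-value action.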