Introduction

In an online planning problem, an agent must select and execute a sequence of actions to maximise some objective criterion, where the time and resources used in action selection count in assessing overall solution quality. Such problems are representative of a broad class of bounded-rational real-world planning problems (Russell & Wefald 1989), such as the control of autonomous vehicles and of chemical plants, in which an agent must perform a task as well as possible within its limited physical and computational abilities. Thus, to maximise overall performance, the agent must trade off the cost of resources used in planning against the possible benefits of improved action selections. Incremental search (also called agent-centred search (Koenig 2001)) is one way of making this trade-off: actions are selected based upon a heuristically guided partial search through the space of possible future action sequences. As partial search provides incomplete information about the best action, action selection becomes a problem of decision-making under uncertainty.
Lemons, Seth (University of New Hampshire) | Benton, J. (University of Arizona) | Ruml, Wheeler (University of New Hampshire) | Do, Minh (Palo Alto Research Center) | Yoon, Sungwook (Palo Alto Research Center)
This paper presents an approach to integrating planning and execution in time-sensitive environments. We present a simple setting in which to consider the issue, which we call continual on-line planning: new goals arrive stochastically during execution, the agent issues actions for execution one at a time, and the environment is otherwise deterministic. We take the objective to be a form of time-dependent partial satisfaction planning reminiscent of discounted MDPs: goals offer reward that decays over time, actions incur fixed costs, and the agent attempts to maximize net utility. We argue that this setting highlights the central challenge of time-aware planning while excluding the complexity of non-deterministic actions. Our approach to this problem is based on real-time heuristic search. We view the two central issues as the decision of which partial plans to elaborate during search and the decision of when to issue an action for execution. We propose an extension of Russell and Wefald's decision-theoretic A* algorithm that can cope with our inadmissible heuristic. Our algorithm, DTOCS, handles the complexities of the on-line setting by balancing deliberative planning and real-time response.
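The objective described above can be made concrete with a small sketch of the utility model: each goal offers a base reward that decays geometrically with the time at which it is achieved, actions incur fixed costs, and net utility is the decayed reward collected minus the total cost. All names here (`Goal`, `net_utility`, the decay parameter) are illustrative assumptions, not part of the DTOCS implementation.

```python
from dataclasses import dataclass

@dataclass
class Goal:
    base_reward: float
    decay: float  # per-time-step discount factor in (0, 1), as in discounted MDPs

    def reward_at(self, t: float) -> float:
        # reward for achieving this goal decays over time
        return self.base_reward * (self.decay ** t)

def net_utility(achieved, action_costs):
    """achieved: list of (Goal, achievement_time) pairs;
    action_costs: list of fixed per-action costs."""
    return sum(g.reward_at(t) for g, t in achieved) - sum(action_costs)

# Example: one goal worth 10 with decay 0.9, achieved at time 2,
# using two actions costing 1 and 2: utility = 10 * 0.81 - 3 = 5.1
u = net_utility([(Goal(10.0, 0.9), 2)], [1.0, 2.0])
```

Under this model, waiting to plan longer lowers the reward still attainable, which is what forces the trade-off between deliberation and acting.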
Search methods based on Monte-Carlo simulation have recently led to breakthrough performance improvements in difficult game-playing domains such as Go and General Game Playing. Monte-Carlo Random Walk (MRW) planning applies Monte-Carlo ideas to deterministic classical planning. In the forward-chaining planner Arvand, Monte-Carlo random walks are used to explore the local neighborhood of a search state for action selection. In contrast to the stochastic local search approach used in the recent planner Identidem, random walks yield a larger and unbiased sample of the search neighborhood, and require state evaluations only at the endpoints of each walk. On IPC-4 competition problems, the performance of Arvand is competitive with state-of-the-art systems.
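The core MRW loop can be sketched as follows: run many random walks from the current state, evaluate only each walk's endpoint with a heuristic, and commit to the first action of the best walk. This is a minimal illustration of the idea, not Arvand's actual code; the function and parameter names are assumptions.

```python
import random

def mrw_select_action(state, applicable, apply_action, h,
                      num_walks=100, walk_length=10, rng=random):
    """Pick an action by Monte-Carlo random walks.
    applicable(s) -> list of actions; apply_action(s, a) -> next state;
    h(s) -> heuristic value (lower is better)."""
    best_h, best_first_action = float("inf"), None
    for _ in range(num_walks):
        s, first = state, None
        for _ in range(walk_length):
            actions = applicable(s)
            if not actions:
                break  # dead end: stop this walk early
            a = rng.choice(actions)  # unbiased sample of the neighborhood
            if first is None:
                first = a
            s = apply_action(s, a)
        # state evaluation happens only at the walk's endpoint
        if first is not None and h(s) < best_h:
            best_h, best_first_action = h(s), first
    return best_first_action

# Toy usage: states are integers, actions are +1/-1, goal is 0 (h = abs)
a = mrw_select_action(5, lambda s: [1, -1], lambda s, x: s + x, abs,
                      num_walks=50, walk_length=8, rng=random.Random(0))
```

Evaluating only endpoints is what keeps the per-walk cost low compared with local search methods that evaluate every successor along the way.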
Relaxed plans are used in the heuristic search planner FF to compute a numerical heuristic and to extract helpful actions. We present a novel way of extracting information from the relaxed plan and of dealing with helpful actions, exploiting the high quality of relaxed plans in numerous domains. For each evaluated state, we employ actions from these plans to find the beginning of a valid plan that can lead to a reachable state. We use this lookahead strategy in a complete best-first search algorithm, modified to take helpful actions into account. In numerous planning domains, the performance of heuristic search planning and the size of the problems that can be handled have been drastically improved.
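The lookahead strategy described above can be sketched as: walk through the relaxed plan in order, executing each action that is actually applicable in the real (unrelaxed) state, to reach a deeper state for the best-first search to evaluate. This is a simplified illustration assuming a monotone add-only model; the names `applicable` and `apply_action` are placeholders, and real STRIPS application would also handle delete effects.

```python
def lookahead(state, relaxed_plan, applicable, apply_action):
    """Return the state reached by greedily executing the applicable
    actions of the relaxed plan, plus the prefix actually executed."""
    executed = []
    for action in relaxed_plan:
        if applicable(state, action):
            state = apply_action(state, action)
            executed.append(action)
        # inapplicable actions are skipped: the relaxed plan ignores
        # delete effects, so not all of it is executable in reality
    return state, executed

# Toy usage: facts are frozensets, an action is (preconditions, add-effects)
plan = [(frozenset({"a"}), frozenset({"b"})),
        (frozenset({"x"}), frozenset({"y"})),   # never applicable here
        (frozenset({"b"}), frozenset({"c"}))]
s2, ex = lookahead(frozenset({"a"}), plan,
                   lambda s, act: act[0] <= s,
                   lambda s, act: s | act[1])
# s2 now contains a, b, c; two of the three actions were executed
```

When the relaxed plan is of high quality, the executed prefix is often a genuine prefix of a solution, which lets the search skip many node expansions.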
We present an extension of the planning framework based on action graphs and local search to deal with PDDL2.1 temporal problems requiring concurrency; previously, the approach could only solve problems admitting a sequential solution. The paper introduces a revised plan representation supporting concurrency and some new search techniques using it, which are implemented in a new version of the LPG planner. An experimental analysis indicates that the proposed approach is suitable for temporal planning problems requiring concurrency and is competitive with state-of-the-art planners.