identity action
Planning Time to Think: Metareasoning for On-Line Planning with Durative Actions
Cserna, Bence (University of New Hampshire) | Ruml, Wheeler (University of New Hampshire) | Frank, Jeremy (NASA Ames Research Center)
When minimizing makespan during off-line planning, the fastest action sequence to reach a particular state is, by definition, preferred. When trying to reach a goal quickly in on-line planning, previous work has inherited that assumption: the faster of two paths that both reach the same state is usually considered to dominate the slower one. In this short paper, we point out that, when planning happens concurrently with execution, selecting a slower action can allow additional time for planning, leading to better plans. We present Slo'RTS, a metareasoning planning algorithm that estimates whether the expected improvement in future decision-making from this increased planning time is enough to make up for the increased duration of the selected action. Using simple benchmarks, we show that Slo'RTS can yield shorter time-to-goal than a conventional planner. This generalizes previous work on metareasoning in on-line planning and highlights the inherent uncertainty present in an on-line setting.
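The trade-off described in this abstract can be illustrated with a minimal sketch. This is not the Slo'RTS algorithm itself: the diminishing-returns benefit model and the names `expected_benefit`, `choose_action`, `max_gain`, and `rate` are assumptions introduced purely for illustration. The sketch only shows the comparison at the heart of the idea: the slower action is preferred when the expected reduction in time-to-goal from the extra planning time exceeds the extra duration it costs.

```python
import math


def expected_benefit(extra_planning_time, max_gain, rate):
    """Hypothetical model (an assumption, not from the paper): expected
    reduction in time-to-goal from extra planning, with diminishing returns."""
    return max_gain * (1.0 - math.exp(-rate * extra_planning_time))


def choose_action(fast_duration, slow_duration, max_gain, rate):
    """Return "slow" when the extra planning time bought by the slower
    action is expected to save more time than it costs, else "fast"."""
    extra = slow_duration - fast_duration
    benefit = expected_benefit(extra, max_gain, rate)
    return "slow" if benefit > extra else "fast"


if __name__ == "__main__":
    # Large potential gain: the slower action is expected to pay for itself.
    print(choose_action(1.0, 2.0, max_gain=5.0, rate=1.0))
    # Small potential gain: the faster action is kept.
    print(choose_action(1.0, 2.0, max_gain=0.5, rate=1.0))
```

In a real planner the benefit estimate would have to come from the search itself rather than from a fixed closed-form curve.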
Metareasoning in Real-Time Heuristic Search
O'Ceallaigh, Dylan (University of New Hampshire) | Ruml, Wheeler (University of New Hampshire)
Real-time heuristic search addresses the setting in which planning and acting can proceed concurrently. We explore the use of metareasoning at two decision points within a real-time heuristic search. First, if the domain has an 'identity action' that allows the agent to remain in the same state and deliberate further, when should this action be taken? Second, given a partial plan that extends to the lookahead frontier, to how many actions should the agent commit? We show that considering these decisions carefully can reduce the agent's total time taken to arrive at a goal in several benchmark domains, relative to the current state of the art. The resulting algorithm can dynamically adjust the way it interleaves planning and acting, between greedy hill-climbing and A*, depending on the problem instance.
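The second decision point, how many actions of a partial plan to commit to, can be sketched as follows. This is an illustrative rule under stated assumptions, not the authors' method: `commitment_length`, `plan_margins`, and `threshold` are hypothetical names, and each margin stands for an assumed estimate of how much better the planned action looks than its best alternative at that step. The sketch ignores the identity-action option and always commits to at least one action.

```python
def commitment_length(plan_margins, threshold):
    """Return how many leading actions of a partial plan to commit to.

    plan_margins[i] is a hypothetical confidence margin for the planned
    action at step i. The agent commits to the longest prefix whose
    margins all reach the threshold, and to at least one action whenever
    the plan is non-empty.
    """
    if not plan_margins:
        return 0
    committed = 0
    for margin in plan_margins:
        if margin < threshold:
            break  # confidence drops here, so stop and replan from this point
        committed += 1
    return max(committed, 1)


if __name__ == "__main__":
    # Confident about the first two steps only.
    print(commitment_length([3.0, 2.5, 0.2, 4.0], threshold=1.0))
```

Committing to a single action at a time resembles greedy hill-climbing, while committing to the whole plan sits at the A*-like end of the spectrum the abstract mentions.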