Planning & Scheduling
Impact of Modeling Languages on the Theory and Practice in Planning Research
Rintanen, Jussi ( Aalto University )
We propose revisions to the research agenda in Automated Planning. The proposal is based on a review of the role of the Planning Domain Definition Language (PDDL) in the activities of the AI planning community and the impact of PDDL on parts of its research agenda. We specifically show how specific properties of PDDL have impacted research on planning, by putting emphasis on certain research topics and complicating others. We argue that the development of more advanced modeling languages would be โ analogously to the impact PDDL has had โ a low overhead and smooth route for the ICAPS community shift its research focus to increasingly promising and relevant research topics.
Chance-Constrained Scheduling via Conflict-Directed Risk Allocation
Wang, Andrew J. (Massachusetts Institute of Technology) | Williams, Brian C. (Massachusetts Institute of Technology)
Temporal uncertainty in large-scale logistics forces one to trade off between lost efficiency through built-in slack and costly replanning when deadlines are missed. Due to the difficulty of reasoning about such likelihoods and consequences, a computational framework is needed to quantify and bound the risk of violating scheduling requirements. This work addresses the chance-constrained scheduling problem, where actions' durations are modeled probabilistically. Our solution method uses conflict-directed risk allocation to efficiently compute a scheduling policy. The key insight, compared to previous work in probabilistic scheduling, is to decouple the reasoning about temporal and risk constraints. This decomposes the problem into a separate master and subproblem, which can be iteratively solved much quicker. Through a set of simulated car-sharing scenarios, it is empirically shown that conflict-directed risk allocation computes solutions nearly an order of magnitude faster than prior art, which considers all constraints in a single lump-sum optimization.
On Interruptible Pure Exploration in Multi-Armed Bandits
Shleyfman, Alexander (Technion โ Israel Institute of Technology) | Komenda, Antonรญn (Czech Technical University in Prague) | Domshlak, Carmel (Technion โ Israel Institute of Technology)
Interruptible pure exploration in multi-armed bandits (MABs) is a key component of Monte-Carlo tree search algorithms for sequential decision problems. We introduce Discriminative Bucketing (DB), a novel family of strategies for pure exploration in MABs, which allows for adapting recent advances in non-interruptible strategies to the interruptible setting, while guaranteeing exponential-rate performance improvement over time. Our experimental evaluation demonstrates that the corresponding instances of DB favorably compete both with the currently popular strategies UCB1 and Epsilon-Greedy, as well as with the conservative uniform sampling.
Multi-Objective MDPs with Conditional Lexicographic Reward Preferences
Wray, Kyle Hollins (University of Massachusetts, Amherst) | Zilberstein, Shlomo (University of Massachusetts, Amherst) | Mouaddib, Abdel-Illah (University of Caen)
Sequential decision problems that involve multiple objectives are prevalent. Consider for example a driver of a semi-autonomous car who may want to optimize competing objectives such as travel time and the effort associated with manual driving. We introduce a rich model called Lexicographic MDP (LMDP) and a corresponding planning algorithm called LVI that generalize previous work by allowing for conditional lexicographic preferences with slack. We analyze the convergence characteristics of LVI and establish its game theoretic properties. The performance of LVI in practice is tested within a realistic benchmark problem in the domain of semi-autonomous driving. Finally, we demonstrate how GPU-based optimization can improve the scalability of LVI and other value iteration algorithms for MDPs.
Preference Planning for Markov Decision Processes
Li, Meilun (Beihang University) | She, Zhikun (Beihang University) | Turrini, Andrea (Institute of Software, Chinese Academy of Sciences) | Zhang, Lijun (Institute of Software, Chinese Academy of Sciences)
The classical planning problem can be enriched with quantitative and qualitative user-defined preferences on how the system behaves on achieving the goal. In this paper, we propose the probabilistic preference planning problem for Markov decision processes, where the preferences are based on an enriched probabilistic LTL-style logic. We develop P4Solver, an SMT-based planner computing the preferred plan by reducing the problem to quadratic programming problem, which can be solved using SMT solvers such as Z3. We illustrate the framework by applying our approach on two selected case studies.
Agnostic System Identification for Monte Carlo Planning
Talvitie, Erik (Franklin and Marshall College)
While model-based reinforcement learning is often studied under the assumption that a fully accurate model is contained within the model class, this is rarely true in practice. When the model class may be fundamentally limited, it can be difficult to obtain theoretical guarantees. Under some conditions the DAgger algorithm promises a policy nearly as good as the plan obtained from the most accurate model in the class, but only if the planning algorithm is near-optimal, which is also rarely the case in complex problems. This paper explores the interaction between DAgger and Monte Carlo planning, specifically showing that DAgger may perform poorly when coupled with a sub-optimal planner. A novel variation of DAgger specifically for use with Monte Carlo planning is derived and is shown to behave far better in some cases where DAgger fails.
Crowdsourced Action-Model Acquisition for Planning
Zhuo, Hankz Hankui (Sun Yat-sen University)
AI planning techniques often require a given set of action models provided as input. Creating action models is, however, a difficult task that costs much manual effort. The problem of action-model acquisition has drawn a lot of interest from researchers in the past. Despite the success of the previous systems, they are all based on the assumption that there are enough training examples for learning high-quality action models. In many real-world applications, e.g., military operation, collecting a large amount of training examples is often both difficult and costly. Instead of collecting training examples, we assume there are abundant annotators, i.e., the crowd, available to provide information learning action models. Specifically, we first build a set of soft constraints based on the labels (true or false) given by the crowd or annotators. We then builds a set of soft constraints based on the input plan traces. After that we put all the constraints together and solve them using a weighted MAX-SAT solver, and convert the solution of the solver to action models. We finally exhibit that our approach is effective in the experiment.
Tractable Cost-Optimal Planning over Restricted Polytree Causal Graphs
Aghighi, Meysam (Linkรถping University) | Jonsson, Peter (Linkรถping University) | Stรฅhlberg, Simon (Linkรถping University)
Causal graphs are widely used to analyze the complexity of planning problems. Many tractable classes have been identified with their aid and state-of-the-art heuristics have been derived by exploiting such classes. In particular, Katz and Keyder have studied causal graphs that are hourglasses (which is a generalization of forks and inverted-forks) and shown that the corresponding cost-optimal planning problem is tractable under certain restrictions. We continue this work by studying polytrees (which is a generalization of hourglasses) under similar restrictions. We prove tractability of cost-optimal planning by providing an algorithm based on a novel notion of variable isomorphism. Our algorithm also sheds light on the k-consistency procedure for identifying unsolvable planning instances. We speculate that this may, at least partially, explain why merge-and-shrink heuristics have been successful for recognizing unsolvable instances.
HVAC-Aware Occupancy Scheduling
Lim, BoonPing (NICTA and Australian National University) | Briel, Menkes van den (NICTA and Australian National University) | Thiebaux, Sylvie (NICTA and Australian National University) | Backhaus, Scott (Los Alamos National Laboratory) | Bent, Russell (Los Alamos National Laboratory)
Energy consumption in commercial and educational buildings is impacted by group activities such as meetings, workshops, classes and exams, and can be reduced by scheduling these activities to take place at times and locations that are favorable from an energy standpoint. This paper improves on the effectiveness of energy-aware room-booking and occupancy scheduling approaches, by allowing the scheduling decisions to rely on an explicit model of the building's occupancy-based HVAC control. The core component of our approach is a mixed-integer linear programming (MILP) model which optimally solves the joint occupancy scheduling and occupancy-based HVAC control problem. To scale up to realistic problem sizes, we embed this MILP model into a large neighbourhood search (LNS). We obtain substantial energy reduction in comparison with occupancy-based HVAC control using arbitrary schedules or using schedules obtained by existing heuristic energy-aware scheduling approaches.
Interactive Narrative Planning in The Best Laid Plans
Ware, Stephen G. (University of New Orleans) | Young, R. Michael (North Carolina State University) | Stith, Christian (Clemson University) | Wright, Phillip (North Carolina State University)
The Best Laid Plans is an interactive narrative video game that uses cognitive-inspired fast planning techniques to generate stories with conflict during play. Players alternate between acting out a plan and seeing that plan thwarted by non-player characters. The Glaive narrative planner combines causal-link-based computational models of narrative with the speed of fast heuristic search techniques to adapt the story each time the player attempts a new plan.