Planning & Scheduling
Software Architecture for Next-Generation AI Planning Systems
Graef, Sebastian, Georgievski, Ilche
Artificial Intelligence (AI) planning is a flourishing research and development discipline that provides powerful tools for searching a course of action that achieves some user goal. While these planning tools show excellent performance on benchmark planning problems, they represent challenging software systems when it comes to their use and integration in real-world applications. In fact, even in-depth understanding of their internal mechanisms does not guarantee that one can successfully set up, use and manipulate existing planning tools. We contribute toward alleviating this situation by proposing a service-oriented planning architecture to be at the core of the ability to design, develop and use next-generation AI planning systems. We collect and classify common planning capabilities to form the building blocks of the planning architecture. We incorporate software design principles and patterns into the architecture to allow for usability, interoperability and reusability of the planning capabilities. Our prototype planning system demonstrates the potential of our approach for rapid prototyping and flexibility of system composition. Finally, we provide insight into the qualitative advantages of our approach when compared to a typical planning tool.
Improving Artificial Teachers by Considering How People Learn and Forget
Nioche, Aurélien, Murena, Pierre-Alexandre, de la Torre-Ortiz, Carlos, Oulasvirta, Antti
Applications for self-regulated teaching are very popular (e.g., with Duolingo estimates of 100M downloads from Google Play at the time of writing). One of the central challenges for research on intelligent user interfaces is to identify algorithmic principles that can pick the best interventions for reliably improving human learning toward stated objectives in light of realistically obtainable data on the user. The computational problem we study is how, when given some learning materials, we can organize them into lessons and reviews such that, over time, human learning is maximized with respect to a set learning objective. Predicting the effects of teaching interventions on human learning is challenging, however. Firstly, the state of user memory is both latent (that is, not directly observable) and non-stationary (that is, evolving over time, on account of such effects as loss of activation and interference), and an intervention that is ideal for one user may be a poor choice for another user -- there are large individual-to-individual differences in forgetting and recall.
iX-BSP: Incremental Belief Space Planning
Farhi, Elad I., Indelman, Vadim
Deciding what's next? is a fundamental problem in robotics and Artificial Intelligence. Under belief space planning (BSP), in a partially observable setting, it involves calculating the expected accumulated belief-dependent reward, where the expectation is with respect to all future measurements. Since solving this general un-approximated problem quickly becomes intractable, state of the art approaches turn to approximations while still calculating planning sessions from scratch. In this work we propose a novel paradigm, Incremental BSP (iX-BSP), based on the key insight that calculations across planning sessions are similar in nature and can be appropriately re-used. We calculate the expectation incrementally by utilizing Multiple Importance Sampling techniques for selective re-sampling and re-use of measurement from previous planning sessions. The formulation of our approach considers general distributions and accounts for data association aspects. We demonstrate how iX-BSP could benefit existing approximations of the general problem, introducing iML-BSP, which re-uses calculations across planning sessions under the common Maximum Likelihood assumption. We evaluate both methods and demonstrate a substantial reduction in computation time while statistically preserving accuracy. The evaluation includes both simulation and real-world experiments considering autonomous vision-based navigation and SLAM. As a further contribution, we introduce to iX-BSP the non-integral wildfire approximation, allowing one to trade accuracy for computational performance by averting from updating re-used beliefs when they are "close enough". We evaluate iX-BSP under wildfire demonstrating a substantial reduction in computation time while controlling the accuracy sacrifice. We also provide analytical and empirical bounds of the effect wildfire holds over the objective value.
Making the most of your day: online learning for optimal allocation of time
Boursier, Etienne, Garrec, Tristan, Perchet, Vianney, Scarsini, Marco
We study online learning for optimal allocation when the resource to be allocated is time. Examples of possible applications include a driver filling a day with rides, a landlord renting an estate, etc. Following our initial motivation, a driver receives ride proposals sequentially according to a Poisson process and can either accept or reject a proposed ride. If she accepts the proposal, she is busy for the duration of the ride and obtains a reward that depends on the ride duration. If she rejects it, she remains on hold until a new ride proposal arrives. We study the regret incurred by the driver first when she knows her reward function but does not know the distribution of the ride duration, and then when she does not know her reward function, either. Faster rates are finally obtained by adding structural assumptions on the distribution of rides or on the reward function. This natural setting bears similarities with contextual (one-armed) bandits, but with the crucial difference that the normalized reward associated to a context depends on the whole distribution of contexts.
Robust and Efficient Planning using Adaptive Entropy Tree Search
Kozakowski, Piotr, Pacek, Mikołaj, Miłoś, Piotr
In this paper, we present the Adaptive EntropyTree Search (ANTS) algorithm. ANTS builds on recent successes of maximum entropy planning while mitigating its arguably major drawback - sensitivity to the temperature setting. We endow ANTS with a mechanism, which adapts the temperature to match a given range of action selection entropy in the nodes of the planning tree. With this mechanism, the ANTS planner enjoys remarkable hyper-parameter robustness, achieves high scores on the Atari benchmark, and is a capable component of a planning-learning loop akin to AlphaZero. We believe that all these features make ANTS a compelling choice for a general planner for complex tasks.
Hedging of Financial Derivative Contracts via Monte Carlo Tree Search
The construction of approximate replication strategies for derivative contracts in incomplete markets is a key problem of financial engineering. Recently Reinforcement Learning algorithms for pricing and hedging under realistic market conditions have attracted significant interest. While financial research mostly focused on variations of Q-learning, in Artificial Intelligence Monte Carlo Tree Search is the recognized stateof-the-art method for various planning problems, such as the games of Hex, Chess, Go,... This article introduces Monte Carlo Tree Search as a method to solve the stochastic optimal control problem underlying the pricing and hedging of financial derivatives. As compared to Q-learning it combines reinforcement learning with tree search techniques. As a consequence Monte Carlo Tree Search has higher sample efficiency, is less prone to over-fitting to specific market models and generally learns stronger policies faster. In our experiments we find that Monte Carlo Tree Search, being the world-champion in games like Chess and Go, is easily capable of directly maximizing the utility of investor's terminal wealth without an intermediate mathematical theory.
Deep Reinforcement Agent for Scheduling in HPC
Fan, Yuping, Lan, Zhiling, Childers, Taylor, Rich, Paul, Allcock, William, Papka, Michael E.
Cluster scheduler is crucial in high-performance computing (HPC). It determines when and which user jobs should be allocated to available system resources. Existing cluster scheduling heuristics are developed by human experts based on their experience with specific HPC systems and workloads. However, the increasing complexity of computing systems and the highly dynamic nature of application workloads have placed tremendous burden on manually designed and tuned scheduling heuristics. More aggressive optimization and automation are needed for cluster scheduling in HPC. In this work, we present an automated HPC scheduling agent named DRAS (Deep Reinforcement Agent for Scheduling) by leveraging deep reinforcement learning. DRAS is built on a novel, hierarchical neural network incorporating special HPC scheduling features such as resource reservation and backfilling. A unique training strategy is presented to enable DRAS to rapidly learn the target environment. Once being provided a specific scheduling objective given by system manager, DRAS automatically learns to improve its policy through interaction with the scheduling environment and dynamically adjusts its policy as workload changes. The experiments with different production workloads demonstrate that DRAS outperforms the existing heuristic and optimization approaches by up to 45%.
Regarding Goal Bounding and Jump Point Search
Hu, Yue | Harabor, Daniel (Monash University) | Qin, Long (National University of Defense Technology, China) | Yin, Quanjun (National University of Defense Technology, China)
Jump Point Search (JPS) is a well known symmetry-breaking algorithm that can substantially improve performance for grid-based optimal pathfinding. When the input grid is static further speedups can be obtained by combining JPS with goal bounding techniques such as Geometric Containers (instantiated as Bounding Boxes) and Compressed Path Databases. Two such methods, JPS+BB and Two-Oracle Path PlannING (Topping), are currently among the fastest known approaches for computing shortest paths on grids. The principal drawback for these algorithms is the overhead costs: each one requires an all-pairs precomputation step, the running time and subsequent storage costs of which can be prohibitive. In this work we consider an alternative approach where we precompute and store goal bounding data only for grid cells which are also jump points. Since the number of jump points is usually much smaller than the total number of grid cells, we can save up to orders of magnitude in preprocessing time and space. Considerable precomputation savings do not necessarily mean performance degradation. For a second contribution we show how canonical orderings, partial expansion strategies and enhanced intermediate pruning can be leveraged to improve online query performance despite a reduction in preprocessed data. The combination of faster preprocessing and stronger online reasoning leads to three new and highly performant algorithms: JPS+BB+ and Two-Oracle Pathfinding Search (TOPS) based on search, and Topping+ based on path extraction. We give a theoretical analysis showing that each method is complete and optimal. We also report convincing gains in a comprehensive empirical evaluation that includes almost all current and cutting-edge algorithms for grid-based pathfinding.
Risk-Averse Bayes-Adaptive Reinforcement Learning
Rigter, Marc, Lacerda, Bruno, Hawes, Nick
In this work, we address risk-averse Bayesadaptive reinforcement learning. We pose the problem of optimising the conditional value at risk (CVaR) of the total return in Bayes-adaptive Markov decision processes (MDPs). We show that a policy optimising CVaR in this setting is risk-averse to both the parametric uncertainty due to the prior distribution over MDPs, and the internal uncertainty due to the inherent stochasticity of MDPs. We reformulate the problem as a two-player stochastic game and propose an approximate algorithm based on Monte Carlo tree search and Bayesian optimisation. Our experiments demonstrate that our approach significantly outperforms baseline approaches for this problem.
Uncertainty Quantification and Propagation for Airline Disruption Management
Ogunsina, Kolawole, Papamichalis, Marios, DeLaurentis, Daniel
Disruption management during the airline scheduling process can be compartmentalized into proactive and reactive processes depending upon the time of schedule execution. The state of the art for decision-making in airline disruption management involves a heuristic human-centric approach that does not categorically study uncertainty in proactive and reactive processes for managing airline schedule disruptions. Hence, this paper introduces an uncertainty transfer function model (UTFM) framework that characterizes uncertainty for proactive airline disruption management before schedule execution, reactive airline disruption management during schedule execution, and proactive airline disruption management after schedule execution to enable the construction of quantitative tools that can allow an intelligent agent to rationalize complex interactions and procedures for robust airline disruption management. Specifically, we use historical scheduling and operations data from a major U.S. airline to facilitate the development and assessment of the UTFM, defined by hidden Markov models (a special class of probabilistic graphical models) that can efficiently perform pattern learning and inference on portions of large data sets.