Planning & Scheduling
Adaptive Informative Path Planning with Multimodal Sensing
Choudhury, Shushman, Gruver, Nate, Kochenderfer, Mykel J.
Adaptive Informative Path Planning (AIPP) problems model an agent tasked with obtaining information subject to resource constraints in unknown, partially observable environments. Existing work on AIPP has focused on representing observations about the world as a result of agent movement. We formulate the more general setting where the agent may choose between different sensors at the cost of some energy, in addition to traversing the environment to gather information. We call this problem AIPPMS (MS for Multimodal Sensing). AIPPMS requires reasoning jointly about the effects of sensing and movement in terms of both energy expended and information gained. We frame AIPPMS as a Partially Observable Markov Decision Process (POMDP) and solve it with online planning. Our approach is based on the Partially Observable Monte Carlo Planning framework with modifications to ensure constraint feasibility and a heuristic rollout policy tailored for AIPPMS. We evaluate our method on two domains: a simulated search-and-rescue scenario and a challenging extension to the classic RockSample problem. We find that our approach outperforms a classic AIPP algorithm that is modified for AIPPMS, as well as online planning using a random rollout policy.
Maximum Entropy Monte-Carlo Planning
Xiao, Chenjun, Huang, Ruitong, Mei, Jincheng, Schuurmans, Dale, Müller, Martin
We develop a new algorithm for online planning in large scale sequential decision problems that improves upon the worst case efficiency of UCT. The idea is to augment Monte-Carlo Tree Search (MCTS) with maximum entropy policy optimization, evaluating each search node by softmax values back-propagated from simulation. To establish the effectiveness of this approach, we first investigate the single-step decision problem, stochastic softmax bandits, and show that softmax values can be estimated at an optimal convergence rate in terms of mean squared error. We then extend this approach to general sequential decision making by developing a general MCTS algorithm, Maximum Entropy for Tree Search (MENTS). We prove that the probability of MENTS failing to identify the best decision at the root decays exponentially, which fundamentally improves the polynomial convergence rate of UCT.
Solving Delete Free Planning with Relaxed Decision Diagram Based Heuristics
Castro, Margarita Paz (University of Toronto) | Piacentini, Chiara (Autodesk Research) | Cire, Andre Augusto (Dept. of Management, University of Toronto Scarborough and Rotman School of Management) | Beck, J. Christopher (Department of Mechanical and Industrial Engineering, University of Toronto)
We investigate the use of relaxed decision diagrams (DDs) for computing admissible heuristics for the cost-optimal delete-free planning (DFP) problem. Our main contributions are the introduction of two novel DD encodings for a DFP task: a multivalued decision diagram that includes the sequencing aspect of the problem and a binary decision diagram representation of its sequential relaxation. We present construction algorithms for each DD that leverage these different perspectives of the DFP task and provide theoretical and empirical analyses of the associated heuristics. We further show that relaxed DDs can be used beyond heuristic computation to extract delete-free plans, find action landmarks, and identify redundant actions. Our empirical analysis shows that while DD-based heuristics trail the state of the art, even small relaxed DDs are competitive with the linear programming heuristic for the DFP task, thus, revealing novel ways of designing admissible heuristics.
Decentralized MCTS via Learned Teammate Models
Czechowski, Aleksander, Oliehoek, Frans
A key difficulty of cooperative decentralized planning lies in making accurate predictions about the decisions of other agents. In this paper we present a policy improvement operator for learning to plan in iterated cooperative multi-agent scenarios. At each application of our method, a selected agent learns an approximation of policies of its teammates from data from past simulations. Under the assumption of ideal function approximation, successive iterations of our algorithm are guaranteed to improve the policies, and eventually lead to convergence to a Nash equilibrium in a coordinate ascent manner. We combine the policy improvement operator with the decentralized Monte Carlo Tree Search planning method and demonstrate the application of the algorithm on several scenarios in the spatial task allocation problem introduced in (Claes et al., 2015). We show that deep learning and convolutional neural networks can be efficiently employed to produce policy approximators which exploit the spatial features of the problem, and that the proposed algorithm improves over the baseline planning performance for particularly challenging domain configurations.
Train Scheduling with Hybrid Answer Set Programming
Abels, Dirk, Jordi, Julian, Ostrowski, Max, Schaub, Torsten, Toletti, Ambra, Wanko, Philipp
We present a solution to real-world train scheduling problems, involving routing, scheduling, and optimization, based on Answer Set Programming (ASP). To this end, we pursue a hybrid approach that extends ASP with difference constraints to account for a fine-grained timing. More precisely, we exemplarily show how the hybrid ASP system clingo[DL] can be used to tackle demanding planning-and-scheduling problems. In particular, we investigate how to boost performance by combining distinct ASP solving techniques, such as approximations and heuristics, with preprocessing and encoding techniques for tackling large-scale, real-world train scheduling instances.
Westworld, ethics and maltreating robots Journal of Medical Ethics blog
This week saw the return, for a third season, of the critically acclaimed HBO series Westworld. WW's central premise in its first 2 seasons was a theme park, sometime in the near future, populated by highly realistic robots or'hosts'. Human guests can pay exorbitant sums to interact with these robots, in a huge range of ways. In the'western' themed area – after which the show is named – guests can choose to be white-hatted heroes or black-hatted villains. The good guys get to be brave, chivalrous, honourable and generally decent.
Simulated annealing based heuristic for multiple agile satellites scheduling under cloud coverage uncertainty
Han, Chao, Gu, Yi, Wu, Guohua, Wang, Xinwei
Agile satellites are the new generation of Earth observation satellites (EOSs) with stronger attitude maneuvering capability. Since optical remote sensing instruments equipped on satellites cannot see through the cloud, the cloud coverage has a significant influence on the satellite observation missions. We are the first to address multiple agile EOSs scheduling problem under cloud coverage uncertainty where the objective aims to maximize the entire observation profit. The chance constraint programming model is adopted to describe the uncertainty initially, and the observation profit under cloud coverage uncertainty is then calculated via sample approximation method. Subsequently, an improved simulated annealing based heuristic combining a fast insertion strategy is proposed for large-scale observation missions. The experimental results show that the improved simulated annealing heuristic outperforms other algorithms for the multiple AEOSs scheduling problem under cloud coverage uncertainty, which verifies the efficiency and effectiveness of the proposed algorithm.
Agile Earth observation satellite scheduling over 20 years: formulations, methods and future directions
Wang, Xinwei, Wu, Guohua, Xing, Lining, Pedrycz, Witold
Agile satellites with advanced attitude maneuvering capability are the new generation of Earth observation satellites (EOSs). The continuous improvement in satellite technology and decrease in launch cost have boosted the development of agile EOSs (AEOSs). To efficiently employ the increasing orbiting AEOSs, the AEOS scheduling problem (AEOSSP) aiming to maximize the entire observation profit while satisfying all complex operational constraints, has received much attention over the past 20 years. The objectives of this paper are thus to summarize current research on AEOSSP, identify main accomplishments and highlight potential future research directions. To this end, general definitions of AEOSSP with operational constraints are described initially, followed by its three typical variations including different definitions of observation profit, multi-objective function and autonomous model. A detailed literature review from 1997 up to 2019 is then presented in line with four different solution methods, i.e., exact method, heuristic, metaheuristic and machine learning. Finally, we discuss a number of topics worth pursuing in the future.
Overview of Tools Supporting Planning for Automated Driving
Tong, Kailin, Ajanovic, Zlatan, Stettinger, Georg
Planning is an essential topic in the realm of automated driving. Besides planning algorithms that are widely covered in the literature, planning requires different software tools for its development, validation, and execution. This paper presents a survey of such tools including map representations, communication, traffic rules, open-source planning stacks and middleware, simulation, and visualization tools as well as benchmarks. We start by defining the planning task and different supporting tools. Next, we provide a comprehensive review of state-of-the-art developments and analysis of relations among them. Finally, we discuss the current gaps and suggest future research directions.
Integrating Acting, Planning and Learning in Hierarchical Operational Models
Patra, Sunandita, Mason, James, Kumar, Amit, Ghallab, Malik, Traverso, Paolo, Nau, Dana
We present new planning and learning algorithms for RAE, the Refinement Acting Engine. RAE uses hierarchical operational models to perform tasks in dynamically changing environments. Our planning procedure, UPOM, does a UCT-like search in the space of operational models in order to find a near-optimal method to use for the task and context at hand. Our learning strategies acquire, from online acting experiences and/or simulated planning results, a mapping from decision contexts to method instances as well as a heuristic function to guide UPOM. Our experimental results show that UPOM and our learning strategies significantly improve RAE's performance in four test domains using two different metrics: efficiency and success ratio.