Planning & Scheduling
Solving Hard AI Planning Instances Using Curriculum-Driven Deep Reinforcement Learning
Feng, Dieqiao, Gomes, Carla P., Selman, Bart
Despite significant progress in general AI planning, certain domains remain out of reach of current AI planning systems. Sokoban is a PSPACE-complete planning task and represents one of the hardest domains for current AI planners. Even domain-specific specialized search methods fail quickly due to the exponential search complexity on hard instances. Our approach based on deep reinforcement learning augmented with a curriculum-driven method is the first one to solve hard instances within one day of training while other modern solvers cannot solve these instances within any reasonable time limit. In contrast to prior efforts, which use carefully handcrafted pruning techniques, our approach automatically uncovers domain structure. Our results reveal that deep RL provides a promising framework for solving previously unsolved AI planning problems, provided a proper training curriculum can be devised.
Kernel Taylor-Based Value Function Approximation for Continuous-State Markov Decision Processes
Xu, Junhong, Yin, Kai, Liu, Lantao
We propose a principled kernel-based policy iteration algorithm to solve the continuous-state Markov Decision Processes (MDPs). In contrast to most decision-theoretic planning frameworks, which assume fully known state transition models, we design a method that eliminates such a strong assumption, which is oftentimes extremely difficult to engineer in reality. To achieve this, we first apply the second-order Taylor expansion of the value function. The Bellman optimality equation is then approximated by a partial differential equation, which only relies on the first and second moments of the transition model. By combining the kernel representation of value function, we then design an efficient policy iteration algorithm whose policy evaluation step can be represented as a linear system of equations characterized by a finite set of supporting states. We have validated the proposed method through extensive simulations in both simplified and realistic planning scenarios, and the experiments show that our proposed approach leads to a much superior performance over several baseline methods.
Revisiting Bounded-Suboptimal Safe Interval Path Planning
Yakovlev, Konstantin, Andreychuk, Anton, Stern, Roni
Safe-interval path planning (SIPP) is a powerful algorithm for finding a path in the presence of dynamic obstacles. SIPP returns provably optimal solutions. However, in many practical applications of SIPP such as path planning for robots, one would like to trade-off optimality for shorter planning time. In this paper we explore different ways to build a bounded-suboptimal SIPP and discuss their pros and cons. We compare the different bounded-suboptimal versions of SIPP experimentally. While there is no universal winner, the results provide insights into when each method should be used.
Innovative Applications of AI: The SURTRAC Application
Real-time traffic signal control presents a challenging multiagent planning problem, particularly in urban road networks where, unlike simpler arterial settings, there are competing dominant traffic flows that shift through the day. Further complicating matters, urban environments require attention to multimodal traffic flows (vehicles, pedestrians, bicyclists, buses) that move at different speeds and may be given different priorities. For the past several years, my research group has been developing and refining a real-time, adaptive traffic signal control system to address these challenges, referred to as scalable urban traffic control (Surtrac). Combining principles from automated planning and scheduling, multiagent systems, and traffic theory, Surtrac treats traffic signal control as a decentralized online planning process. In operation, each intersection repeatedly generates and executes (in rolling horizon fashion) signal-timing plans that optimize the movement of currently sensed approaching traffic through the intersection.
ProTuner: Tuning Programs with Monte Carlo Tree Search
Haj-Ali, Ameer, Genc, Hasan, Huang, Qijing, Moses, William, Wawrzynek, John, Asanović, Krste, Stoica, Ion
We explore applying the Monte Carlo Tree Search (MCTS) algorithm in a notoriously difficult task: tuning programs for high-performance deep learning and image processing. We build our framework on top of Halide and show that MCTS can outperform the state-of-the-art beam-search algorithm. Unlike beam search, which is guided by greedy intermediate performance comparisons between partial and less meaningful schedules, MCTS compares complete schedules and looks ahead before making any intermediate scheduling decision. We further explore modifications to the standard MCTS algorithm as well as combining real execution time measurements with the cost model. Our results show that MCTS can outperform beam search on a suite of 16 real benchmarks.
Online Mapping and Motion Planning under Uncertainty for Safe Navigation in Unknown Environments
Pairet, Èric, Hernández, Juan David, Carreras, Marc, Petillot, Yvan, Lahijanian, Morteza
Safe autonomous navigation is an essential and challenging problem for robots operating in highly unstructured or completely unknown environments. Under these conditions, not only robotic systems must deal with limited localisation information, but also their manoeuvrability is constrained by their dynamics and often suffer from uncertainty. In order to cope with these constraints, this manuscript proposes an uncertainty-based framework for mapping and planning feasible motions online with probabilistic safety-guarantees. The proposed approach deals with the motion, probabilistic safety, and online computation constraints by: (i) incrementally mapping the surroundings to build an uncertainty-aware representation of the environment, and (ii) iteratively (re)planning trajectories to goal that are kinodynamically feasible and probabilistically safe through a multi-layered sampling-based planner in the belief space. In-depth empirical analyses illustrate some important properties of this approach, namely, (a) the multi-layered planning strategy enables rapid exploration of the high-dimensional belief space while preserving asymptotic optimality and completeness guarantees, and (b) the proposed routine for probabilistic collision checking results in tighter probability bounds in comparison to other uncertainty-aware planners in the literature. Furthermore, real-world in-water experimental evaluation on a non-holonomic torpedo-shaped autonomous underwater vehicle and simulated trials in the Stairwell scenario of the DARPA Subterranean Challenge 2019 on a quadrotor unmanned aerial vehicle demonstrate the efficacy of the method as well as its suitability for systems with limited on-board computational power.
How South Korea turned an urban planning system into a coronavirus tracking database
SEOUL – When a man in Seoul tested positive for coronavirus in May, South Korean authorities were able to confirm his wide-ranging movements in and outside the city in minutes, including five bars and clubs he visited on a recent night out. The fast response -- well ahead of many other countries facing outbreaks -- was the result of merging South Korea's already advanced methods of collecting information and tracking the virus into a new data sharing system that patches together cellphone location data and credit card records. The Epidemic Investigation Support System (EISS), introduced in late March, effectively removed technological barriers to sharing that information between authorities, by building on the country's "Smart City" data system. That platform was originally designed to let local authorities share urban planning information, from population to traffic and pollution, by uploading data in Excel spreadsheets and other formats. Now it forms the foundation for a data clearinghouse that has turbocharged South Korea's response to the virus.
Single-Agent Optimization Through Policy Iteration Using Monte-Carlo Tree Search
The combination of Monte-Carlo Tree Search (MCTS) and deep reinforcement learning is state-of-the-art in two-player perfect-information games. In this paper, we describe a search algorithm that uses a variant of MCTS which we enhanced by 1) a novel action value normalization mechanism for games with potentially unbounded rewards (which is the case in many optimization problems), 2) defining a virtual loss function that enables effective search parallelization, and 3) a policy network, trained by generations of self-play, to guide the search. We gauge the effectiveness of our method in "SameGame"---a popular single-player test domain. Our experimental results indicate that our method outperforms baseline algorithms on several board sizes. Additionally, it is competitive with state-of-the-art search algorithms on a public set of positions.
Monte Carlo Inverse Folding
Cazenave, Tristan, Fournier, Thomas
The RNA Inverse Folding problem comes from computational biology. The goal is to find a molecule that has a given folding. It is important for scientific fields such as bioengineering, pharmaceutical research, biochemistry, synthetic biology and RNA nanostructures. Nested Monte Carlo Search has given excellent results for this problem. We propose to adapt and evaluate different Monte Carlo Search algorithms for the RNA Inverse Folding problem.
On Modelling Multi-Agent Path Finding as a Classical Planning Problem
Vodrážka, Jindřich (Charles University) | Barták, Roman (Charles University) | Švancara, Jiří (Charles University)
Multi-Agent Path Finding (MAPF) deals with the problem of finding collision-free paths for a set of agents, where each agent wants to move from its start location to its goal location on a shared graph. The paper addresses the question of how to model MAPF as a classical planning problem, specifically, how to encode various collision constraints. Several models in the PDDL modeling language are proposed and empirically compared.