Planning & Scheduling
STRIPS Action Discovery
Suárez-Hernández, Alejandro, Segovia-Aguas, Javier, Torras, Carme, Alenyà, Guillem
The problem of specifying high-level knowledge bases for planning becomes a hard task in realistic environments. This knowledge is usually handcrafted and is hard to keep updated, even for system experts. Recent approaches have shown the success of classical planning at synthesizing action models even when all intermediate states are missing. These approaches can synthesize action schemas in Planning Domain Definition Language (PDDL) from a set of execution traces each consisting, at least, of an initial and final state. In this paper, we propose a new algorithm to unsupervisedly synthesize STRIPS action models with a classical planner when action signatures are unknown. In addition, we contribute with a compilation to classical planning that mitigates the problem of learning static predicates in the action model preconditions, exploits the capabilities of SAT planners with parallel encodings to compute action schemas and validate all instances. Our system is flexible in that it supports the inclusion of partial input information that may speed up the search. We show through several experiments how learned action models generalize over unseen planning instances.
Learning Non-Markovian Reward Models in MDPs
Rens, Gavin, Raskin, Jean-François
There are situations in which an agent should receive rewards only after having accomplished a series of previous tasks. In other words, the reward that the agent receives is non-Markovian. One natural and quite general way to represent history-dependent rewards is via a Mealy machine; a finite state automaton that produces output sequences (rewards in our case) from input sequences (state/action observations in our case). In our formal setting, we consider a Markov decision process (MDP) that models the dynamic of the environment in which the agent evolves and a Mealy machine synchronised with this MDP to formalise the non-Markovian reward function. While the MDP is known by the agent, the reward function is unknown from the agent and must be learnt. Learning non-Markov reward functions is a challenge. Our approach to overcome this challenging problem is a careful combination of the Angluin's L* active learning algorithm to learn finite automata, testing techniques for establishing conformance of finite model hypothesis and optimisation techniques for computing optimal strategies in Markovian (immediate) reward MDPs. We also show how our framework can be combined with classical heuristics such as Monte Carlo Tree Search. We illustrate our algorithms and a preliminary implementation on two typical examples for AI.
Stacked Auto Encoder Based Deep Reinforcement Learning for Online Resource Scheduling in Large-Scale MEC Networks
Jiang, Feibo, Wang, Kezhi, Dong, Li, Pan, Cunhua, Yang, Kun
An online resource scheduling framework is proposed for minimizing the sum of weighted task latency for all the mobile users, by optimizing offloading decision, transmission power, and resource allocation in the mobile edge computing (MEC) system. Towards this end, a deep reinforcement learning (DRL) method is proposed to obtain an online resource scheduling policy. Firstly, a related and regularized stacked auto encoder (2r-SAE) with unsupervised learning is proposed to perform data compression and representation for high dimensional channel quality information (CQI) data, which can reduce the state space for DRL. Secondly, we present an adaptive simulated annealing based approach (ASA) as the action search method of DRL, in which an adaptive h-mutation is used to guide the search direction and an adaptive iteration is proposed to enhance the search efficiency during the DRL process. Thirdly, a preserved and prioritized experience replay (2p-ER) is introduced to assist the DRL to train the policy network and find the optimal offloading policy. Numerical results are provided to demonstrate that the proposed algorithm can achieve near-optimal performance while significantly decreasing the computational time compared with existing benchmarks. It also shows that the proposed framework is suitable for resource scheduling problem in large-scale MEC networks, especially in the dynamic environment.
On Algorithmic Decision Procedures in Emergency Response Systems in Smart and Connected Communities
Pettet, Geoffrey, Mukhopadhyay, Ayan, Kochenderfer, Mykel, Vorobeychik, Yevgeniy, Dubey, Abhishek
Emergency Response Management (ERM) is a critical problem faced by communities across the globe. Despite its importance, it is common for ERM systems to follow myopic and straight-forward decision policies in the real world. Principled approaches to aid decision-making under uncertainty have been explored in this context but have failed to be accepted into real systems. We identify a key issue impeding their adoption - algorithmic approaches to emergency response focus on reactive, post-incident dispatching actions, i.e. optimally dispatching a responder after incidents occur. However, the critical nature of emergency response dictates that when an incident occurs, first responders always dispatch the closest available responder to the incident. We argue that the crucial period of planning for ERM systems is not post-incident, but between incidents. However, this is not a trivial planning problem - a major challenge with dynamically balancing the spatial distribution of responders is the complexity of the problem. An orthogonal problem in ERM systems is to plan under limited communication, which is particularly important in disaster scenarios that affect communication networks. We address both the problems by proposing two partially decentralized multi-agent planning algorithms that utilize heuristics and the structure of the dispatch problem. We evaluate our proposed approach using real-world data, and find that in several contexts, dynamic re-balancing the spatial distribution of emergency responders reduces both the average response time as well as its variance.
The 2^k Neighborhoods for Grid Path Planning
Rivera, Nicolás (University of Cambridge) | Hernández, Carlos (Universidad Andrés Bello) | Hormazábal, Nicolás (Universidad Andrés Bello) | Baier, Jorge A (Pontificia Universidad Católica de Chile)
Grid path planning is an important problem in AI. Its understanding has been key for the development of autonomous navigation systems. An interesting and rather surprising fact about the vast literature on this problem is that only a few neighborhoods have been used when evaluating these algorithms. Indeed, only the 4- and 8-neighborhoods are usually considered, and rarely the 16-neighborhood. This paper describes three contributions that enable the construction of effective grid path planners for extended 2k-neighborhoods; that is, neighborhoods that admit 2k neighbors per state, where k is a parameter. First, we provide a simple recursive definition of the 2k-neighborhood in terms of the 2k-1-neighborhood. Second, we derive distance functions, for any k ≥ 2, which allow us to propose admissible heuristics that are perfect for obstacle-free grids, which generalize the well-known Manhattan and Octile distances. Third, we define the notion of canonical path for the 2k-neighborhood; this allows us to incorporate our neighborhoods into two versions of A*, namely Canonical A* and Jump Point Search (JPS), whose performance, we show, scales well when increasing k. Our empirical evaluation shows that, when increasing k, the cost of the solution found improves substantially. Used with the 2k-neighborhood, Canonical A* and JPS, in many configurations, are also superior to the any-angle path planner Theta* both in terms of solution quality and runtime. Our planner is competitive with one implementation of the any-angle path planner, ANYA in some configurations. Our main practical conclusion is that standard, well-understood grid path planning technology may provide an effective approach to any-angle grid path planning.
2019 in Review: 10 AI Papers That Made an Impact
The volume of peer-reviewed AI research papers has grown by more than 300 percent over the past three decades (Stanford AI Index 2019), and the top AI conferences in 2019 saw a deluge of paper. CVPR submissions spiked to 5,165, a 56 percent increase over 2018; ICLR received 1,591 main conference paper submissions, up 60 percent over last year; ACL reported a record-breaking 2,906 submissions, almost doubling last year's 1,544; and ICCV 2019 received 4,303 submissions, more than twice the 2017 total. As part of our year-end series, Synced spotlights 10 artificial intelligence papers that garnered extraordinary attention and accolades in 2019. Abstract: Finite-horizon lookahead policies are abundantly used in Reinforcement Learning and demonstrate impressive empirical success. Usually, the lookahead policies are implemented with specific planning methods such as Monte Carlo Tree Search (e.g. in AlphaZero).
Optimal by Design: Model-Driven Synthesis of Adaptation Strategies for Autonomous Systems
Elrakaiby, Yehia, Spoletini, Paola, Nuseibeh, Bashar
--Many software systems have become too large and complex to be managed efficiently by human administrators, particularly when they operate in uncertain and dynamic environments and require frequent changes. Requirements-driven adaptation techniques have been proposed to endow systems with the necessary means to autonomously decide ways to satisfy their requirements. However, many current approaches rely on general-purpose languages, models and/or frameworks to design, develop and analyze autonomous systems. Unfortunately, these tools are not tailored towards the characteristics of adaptation problems in autonomous systems. D proposes a model (and a language) for the high-level description of the basic elements of self-adaptive systems, namely the system, capabilities, requirements and environment. Based on those elements, a Markov Decision Process (MDP) is constructed to compute the optimal strategy or the most rewarding system behavior . Furthermore, this defines a reflex controller that can ensure timely responses to changes. One novel feature of the framework is that it benefits both from goal-oriented techniques, developed for requirement elicitation, refinement and analysis, and synthesis capabilities and extensive research around MDPs, their extensions and tools. Our preliminary evaluation results demonstrate the practicality and advantages of the framework. Autonomous systems such as unmanned vehicles and robots play an increasingly relevant role in our societies. Many factors contribute to the complexity in the design and development of those systems. First, they typically operate in dynamic and uncontrollable environments [1]-[5]. Therefore, they must continuously adapt their configuration in response to changes, both in their operating environment and in themselves. Since the frequency of change cannot be controlled, decision-making must be almost instantaneous to ensure timely responses. From a design and management perspective, it is desirable to minimize the effort needed to design the system and to enable its runtime updating and maintenance. A promising technique to address those challenges is requirements-driven adaptation that endow systems with the necessary means to autonomously operate based on their requirements. Requirements are prescriptive statements of intent to be satisfied by cooperation of the agents forming the system [6]. They say what the system will do and not how it will do it [7].
Modeling and solving the multimodal car- and ride-sharing problem
Enzi, Miriam, Parragh, Sophie N., Pisinger, David, Prandtstetter, Matthias
We introduce the multimodal car- and ride-sharing problem (MMCRP), in which a pool of cars is used to cover a set of ride requests, while uncovered requests are assigned to other modes of transport (MOT). A car's route consists of one or more trips. Each trip must have a specific but non-predetermined driver, start in a depot and finish in a (possibly different) depot. Ride-sharing between users is allowed, even when two rides do not have the same origin and/or destination. A user has always the option of using other modes of transport according to an individual list of preferences. The problem can be formulated as a vehicle scheduling problem. In order to solve the problem, an auxiliary graph is constructed in which each trip starting and ending in a depot, and covering possible ride-shares, is modeled as an edge in a time-space graph. We propose a two-layer decomposition algorithm based on column generation, where the master problem ensures that each request can only be covered at most once, and the pricing problem generates new promising routes by solving a kind of shortest path problem in a time-space network. Computational experiments based on realistic instances are reported. The benchmark instances are based on demographic, spatial, and economic data of Vienna, Austria. We solve large instances with the column generation based approach to near optimality in reasonable time, and we further investigate various exact and heuristic pricing schemes.
Offline Grid-Based Coverage path planning for guards in games
Enezi, Wael Al, Verbrugge, Clark
Algorithmic approaches to exhaustive coverage have application in video games, enabling automatic game level exploration. Current designs use simple heuristics that frequently result in poor performance or exhibit unnatural behaviour. In this paper, we introduce a novel algorithm for covering a 2D polygonal (with holes) area. We assume prior knowledge of the map layout and use a grid-based world representation. Experimental analysis over several scenarios ranging from simple layouts to more complex maps used in actual games show good performance. This work serves as an initial step towards building a more efficient coverage path planning algorithm for non-player characters.
Towards Evaluating Plan Generation Approaches with Instructional Texts
Chowdhury, Debajyoti Paul, Biswas, Arghya, Sosnowski, Tomasz, Yordanova, Kristina
Recent research in behaviour understanding through language grounding has shown it is possible to automatically generate behaviour models from textual instructions. These models usually have goal-oriented structure and are modelled with different formalisms from the planning domain such as the Planning Domain Definition Language. One major problem that still remains is that there are no benchmark datasets for comparing the different model generation approaches, as each approach is usually evaluated on domain-specific application. To allow the objective comparison of different methods for model generation from textual instructions, in this report we introduce a dataset consisting of 83 textual instructions in English language, their refinement in a more structured form as well as manually developed plans for each of the instructions. The dataset is publicly available to the community.