"Planning is the process of generating (possibly partial) representations of future behavior prior to the use of such plans to constrain or control that behavior. The outcome is usually a set of actions, with temporal and other constraints on them, for execution by some agent or agents. As a core aspect of human intelligence, planning has been studied since the earliest days of AI and cognitive science. Planning research has led to many useful tools for real-world applications, and has yielded significant insights into the organization of behavior and the nature of reasoning about actions."
– Planning entry by Austin Tate in the MIT Encyclopedia of Cognitive Science.
In this paper, we discuss the learning of generalised policies for probabilistic and classical planning problems using Action Schema Networks (ASNets). The ASNet is a neural network architecture that exploits the relational structure of (P)PDDL planning problems to learn a common set of weights that can be applied to any problem in a domain. By mimicking the actions chosen by a traditional, non-learning planner on a handful of small problems in a domain, ASNets are able to learn a generalised reactive policy that can quickly solve much larger instances from the domain. This work extends the ASNet architecture to make it more expressive, while still remaining invariant to a range of symmetries that exist in PPDDL problems. We also present a thorough experimental evaluation of ASNets, including a comparison with heuristic search planners on seven probabilistic and deterministic domains, an extended evaluation on over 18,000 Blocksworld instances, and an ablation study. Finally, we show that sparsity-inducing regularisation can produce ASNets that are compact enough for humans to understand, yielding insights into how the structure of ASNets allows them to generalise across a domain.
Markov decision processes are of major interest in the planning community as well as in the model checking community. But in spite of the similarity in the considered formal models, the development of new techniques and methods happened largely independently in both communities. This work is intended as a beginning to unite the two research branches. We consider goal-reachability analysis as a common basis between both communities. The core of this paper is the translation from Jani, an overarching input language for quantitative model checkers, into the probabilistic planning domain definition language (PPDDL), and vice versa from PPDDL into Jani. These translations allow the creation of an overarching benchmark collection, including existing case studies from the model checking community, as well as benchmarks from the international probabilistic planning competitions (IPPC). We use this benchmark set as a basis for an extensive empirical comparison of various approaches from the model checking community, variants of value iteration, and MDP heuristic search algorithms developed by the AI planning community. On a per benchmark domain basis, techniques from one community can achieve state-ofthe-art performance in benchmarks of the other community. Across all benchmark domains of one community, the performance comparison is however in favor of the solvers and algorithms of that particular community. Reasons are the design of the benchmarks, as well as tool-related limitations. Our translation methods and benchmark collection foster crossfertilization between both communities, pointing out specific opportunities for widening the scope of solvers to different kinds of models, as well as for exchanging and adopting algorithms across communities.
Agile satellites with advanced attitude maneuvering capability are the new generation of Earth observation satellites (EOSs). The continuous improvement in satellite technology and decrease in launch cost have boosted the development of agile EOSs (AEOSs). To efficiently employ the increasing orbiting AEOSs, the AEOS scheduling problem (AEOSSP) aiming to maximize the entire observation profit while satisfying all complex operational constraints, has received much attention over the past 20 years. The objectives of this paper are thus to summarize current research on AEOSSP, identify main accomplishments and highlight potential future research directions. To this end, general definitions of AEOSSP with operational constraints are described initially, followed by its three typical variations including different definitions of observation profit, multi-objective function and autonomous model. A detailed literature review from 1997 up to 2019 is then presented in line with four different solution methods, i.e., exact method, heuristic, metaheuristic and machine learning. Finally, we discuss a number of topics worth pursuing in the future.
Planning is an essential topic in the realm of automated driving. Besides planning algorithms that are widely covered in the literature, planning requires different software tools for its development, validation, and execution. This paper presents a survey of such tools including map representations, communication, traffic rules, open-source planning stacks and middleware, simulation, and visualization tools as well as benchmarks. We start by defining the planning task and different supporting tools. Next, we provide a comprehensive review of state-of-the-art developments and analysis of relations among them. Finally, we discuss the current gaps and suggest future research directions.
In this paper, we provide a comprehensive outline of the different threads of work in Explainable AI Planning (XAIP) that has emerged as a focus area in the last couple of years and contrast that with earlier efforts in the field in terms of techniques, target users, and delivery mechanisms. We hope that the survey will provide guidance to new researchers in automated planning towards the role of explanations in the effective design of human-in-the-loop systems, as well as provide the established researcher with some perspective on the evolution of the exciting world of explainable planning.
There are situations in which an agent should receive rewards only after having accomplished a series of previous tasks. In other words, the reward that the agent receives is non-Markovian. One natural and quite general way to represent history-dependent rewards is via a Mealy machine; a finite state automaton that produces output sequences (rewards in our case) from input sequences (state/action observations in our case). In our formal setting, we consider a Markov decision process (MDP) that models the dynamic of the environment in which the agent evolves and a Mealy machine synchronised with this MDP to formalise the non-Markovian reward function. While the MDP is known by the agent, the reward function is unknown from the agent and must be learnt. Learning non-Markov reward functions is a challenge. Our approach to overcome this challenging problem is a careful combination of the Angluin's L* active learning algorithm to learn finite automata, testing techniques for establishing conformance of finite model hypothesis and optimisation techniques for computing optimal strategies in Markovian (immediate) reward MDPs. We also show how our framework can be combined with classical heuristics such as Monte Carlo Tree Search. We illustrate our algorithms and a preliminary implementation on two typical examples for AI.
--Many software systems have become too large and complex to be managed efficiently by human administrators, particularly when they operate in uncertain and dynamic environments and require frequent changes. Requirements-driven adaptation techniques have been proposed to endow systems with the necessary means to autonomously decide ways to satisfy their requirements. However, many current approaches rely on general-purpose languages, models and/or frameworks to design, develop and analyze autonomous systems. Unfortunately, these tools are not tailored towards the characteristics of adaptation problems in autonomous systems. D proposes a model (and a language) for the high-level description of the basic elements of self-adaptive systems, namely the system, capabilities, requirements and environment. Based on those elements, a Markov Decision Process (MDP) is constructed to compute the optimal strategy or the most rewarding system behavior . Furthermore, this defines a reflex controller that can ensure timely responses to changes. One novel feature of the framework is that it benefits both from goal-oriented techniques, developed for requirement elicitation, refinement and analysis, and synthesis capabilities and extensive research around MDPs, their extensions and tools. Our preliminary evaluation results demonstrate the practicality and advantages of the framework. Autonomous systems such as unmanned vehicles and robots play an increasingly relevant role in our societies. Many factors contribute to the complexity in the design and development of those systems. First, they typically operate in dynamic and uncontrollable environments -. Therefore, they must continuously adapt their configuration in response to changes, both in their operating environment and in themselves. Since the frequency of change cannot be controlled, decision-making must be almost instantaneous to ensure timely responses. From a design and management perspective, it is desirable to minimize the effort needed to design the system and to enable its runtime updating and maintenance. A promising technique to address those challenges is requirements-driven adaptation that endow systems with the necessary means to autonomously operate based on their requirements. Requirements are prescriptive statements of intent to be satisfied by cooperation of the agents forming the system . They say what the system will do and not how it will do it .
We introduce the multimodal car- and ride-sharing problem (MMCRP), in which a pool of cars is used to cover a set of ride requests, while uncovered requests are assigned to other modes of transport (MOT). A car's route consists of one or more trips. Each trip must have a specific but non-predetermined driver, start in a depot and finish in a (possibly different) depot. Ride-sharing between users is allowed, even when two rides do not have the same origin and/or destination. A user has always the option of using other modes of transport according to an individual list of preferences. The problem can be formulated as a vehicle scheduling problem. In order to solve the problem, an auxiliary graph is constructed in which each trip starting and ending in a depot, and covering possible ride-shares, is modeled as an edge in a time-space graph. We propose a two-layer decomposition algorithm based on column generation, where the master problem ensures that each request can only be covered at most once, and the pricing problem generates new promising routes by solving a kind of shortest path problem in a time-space network. Computational experiments based on realistic instances are reported. The benchmark instances are based on demographic, spatial, and economic data of Vienna, Austria. We solve large instances with the column generation based approach to near optimality in reasonable time, and we further investigate various exact and heuristic pricing schemes.
Recent research in behaviour understanding through language grounding has shown it is possible to automatically generate behaviour models from textual instructions. These models usually have goal-oriented structure and are modelled with different formalisms from the planning domain such as the Planning Domain Definition Language. One major problem that still remains is that there are no benchmark datasets for comparing the different model generation approaches, as each approach is usually evaluated on domain-specific application. To allow the objective comparison of different methods for model generation from textual instructions, in this report we introduce a dataset consisting of 83 textual instructions in English language, their refinement in a more structured form as well as manually developed plans for each of the instructions. The dataset is publicly available to the community.
The semantics of this compilation is that by invoking the service the planner is expecting to get back (n)one or more of the promised outputs. Composition T echnique The inputs to the composition step is thus a process and a set of skills and the output is a optimized process wherein the original process has been composed with skills wherever possible to maximize automation, as shown in Figure 3. Once the skills have been compiled to the standard D3WA form the rest of the process remains same as in D3WA. This means we get all the rest of its features for free, including being able to visualize, debug, and iterate on the composed process once it has been computed. Specifically with regards to business process management, we illustrate some key capabilities next. The reason declarative works well in this setting is twofold: First, the sheer size of these composed processes, and the need to be able to be flexible with their management, makes it imperative that they are not written and maintained by hand. Furthermore, as we mentioned before, the source of skills and processes may be different. The declarative framework allows developers of either to develop without having to worry about how they relate to each other.