Planning & Scheduling
Mixed-Initiative Goal Manipulation
Mixed-initiative planning systems attempt to integrate human and AI planners so that the synthesis results in high-quality plans. In the AI community, the dominant model of planning is search. In state-space planning, search consists of backward and forward chaining through the effects and preconditions of operator representations. Although search is an acceptable mechanism for automated planning, we offer an alternative model to present to the user at the interface of a mixed-initiative planning assistant. That is, we propose to model planning as a goal-manipulation task.
Smart Infrastructure for Future Urban Mobility
Real-time traffic signal control presents a challenging multiagent planning problem, particularly in urban road networks where, unlike simpler arterial settings, there are competing dominant traffic flows that shift through the day. Further complicating matters, urban environments require attention to multimodal traffic flows (vehicles, pedestrians, bicyclists, buses) that move at different speeds and may be given different priorities. For the past several years, my research group has been developing and refining a real-time, adaptive traffic signal control system to address these challenges, referred to as scalable urban traffic control (Surtrac). Combining principles from automated planning and scheduling, multiagent systems, and traffic theory, Surtrac treats traffic signal control as a decentralized online planning process. In operation, each intersection repeatedly generates and executes (in rolling horizon fashion) signal-timing plans that optimize the movement of currently sensed approaching traffic through the intersection.
Norm Identification through Plan Recognition
Societal rules, as exemplified by norms, aim to provide a degree of behavioural stability to multi-agent societies. Norms regulate a society using the deontic concepts of permissions, obligations and prohibitions to specify what can, must and must not occur in a society. Many implementations of normative systems make various combinations of the following assumptions: that the set of norms is static and defined at design time; that agents joining a society are instantly informed of the complete set of norms; that the set of agents within a society does not change; and that all agents are aware of the existing norms. When any one of these assumptions is dropped, agents need a mechanism to identify the set of norms currently present within a society, or risk unwittingly violating the norms. In this paper, we develop a norm identification mechanism that uses a combination of parsing-based plan recognition and Hierarchical Task Network (HTN) planning mechanisms, which operates by analysing the actions performed by other agents. While our basic mechanism cannot learn in situations where norm violations take place, we describe an extension which is able to operate in the presence of violations.
AI and Data Gaps: The Plot Thickens - The AI Journal
Four weeks ago, my last article for The AI Journal pointed out that COVID-19 "creates data deficits … Relaxed or suspended regulatory reporting requirements deliver less data to governments for aggregation, which in turn delivers less complete data sets to markets." It did not take long before the official sector started pointing out crucial breaks in key data sets used for global macro and credit risk analysis. De Nederlandsche Bank (the Dutch central bank) recently released research showing that key inputs for HICP inflation measures were not collected during March and April. Because the HICP data is only updated annually, when central bankers sought to plug the gap in order to conduct monetary policy, their only alternative was to use same-month 2019 data as a proxy for 2020 prices in key sectors disrupted by the pandemic. While the data gap was resolved, actual price data diverged significantly from the estimated data.
Deliberative Acting, Online Planning and Learning with Hierarchical Operational Models
Patra, Sunandita, Mason, James, Ghallab, Malik, Nau, Dana, Traverso, Paolo
The most common representation formalisms for automated planning are descriptive models that abstractly describe what the actions do and are tailored for efficiently computing the next state(s) in a state-transition system. However, real-world acting requires operational models that describe how to do things, with rich control structures for closed-loop online decision-making in a dynamic environment. Using a different action model for planning than the one used for acting causes problems with combining acting and planning, in particular for the development and consistency verification of the different models. As an alternative, we define and implement an integrated acting-and-planning system in which both planning and acting use the same operational models, which are written in a general-purpose hierarchical task-oriented language offering rich control structures. The acting component, called Reactive Acting Engine (RAE), is inspired by the well-known PRS system, except that instead of being purely reactive, it can get advice from the planner. Our planner uses a UCT-like Monte Carlo Tree Search procedure, called UPOM (UCT Procedure for Operational Models), whose rollouts are simulations of the actor's operational models. We also present learning strategies for use with RAE and UPOM that acquire, from online acting experiences and/or simulated planning results, a mapping from decision contexts to method instances as well as a heuristic function to guide UPOM. Our experimental results show that UPOM and our learning strategies significantly improve the acting efficiency and robustness of RAE. We discuss the asymptotic convergence of UPOM by mapping its search space to an MDP.
Manipulation of Articulated Objects using Dual-arm Robots via Answer Set Programming
Bertolucci, Riccardo, Capitanelli, Alessio, Dodaro, Carmine, Leone, Nicola, Maratea, Marco, Mastrogiovanni, Fulvio, Vallati, Mauro
The manipulation of articulated objects is of primary importance in Robotics, and can be considered as one of the most complex manipulation tasks. Traditionally, this problem has been tackled by developing ad-hoc approaches, which lack flexibility and portability. In this paper we present a framework based on Answer Set Programming (ASP) for the automated manipulation of articulated objects in a robot control architecture. In particular, ASP is employed for representing the configuration of the articulated object, for checking the consistency of this representation in the knowledge base, and for generating the sequence of manipulation actions. The framework is exemplified and validated on the Baxter dual-arm manipulator in a first, simple scenario. Then, we extend this scenario to improve the overall setup accuracy, and to introduce a few constraints on the execution of robot actions to enforce their feasibility. The extended scenario entails a high number of possible actions that can be fruitfully combined. Therefore, we exploit macro actions from automated planning in order to provide more effective plans. We validate the overall framework in the extended scenario, thereby confirming the applicability of ASP also in more realistic Robotics settings, and showing the usefulness of macro actions for the robot-based manipulation of articulated objects.
Efficient Black-Box Planning Using Macro-Actions with Focused Effects
Allen, Cameron, Katz, Michael, Klinger, Tim, Konidaris, George, Riemer, Matthew, Tesauro, Gerald
The difficulty of classical planning increases exponentially with search-tree depth. Heuristic search can make planning more efficient, but good heuristics can be expensive to compute or may require domain-specific information, and such information may not even be available in the more general case of black-box planning. Rather than treating a given planning problem as fixed and carefully constructing a heuristic to match it, we instead rely on the simple and general-purpose "goal-count" heuristic and construct macro-actions to make it more accurate. Our approach searches for macro-actions with focused effects (i.e. macros that modify only a small number of state variables), which align well with the assumptions made by the goal-count heuristic. Our method discovers macros that dramatically improve black-box planning efficiency across a wide range of planning domains, including Rubik's cube, where it generates fewer states than the state-of-the-art LAMA planner with access to the full SAS$^+$ representation.
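To make the two notions in this abstract concrete, here is a minimal sketch of a goal-count heuristic and of a measure of how "focused" an action's effect is. It assumes states are plain dictionaries from variable names to values; the function names `goal_count` and `effect_size` and the toy variables are illustrative only and are not taken from the paper.

```python
from typing import Dict, Hashable

# Assumption: a state is a mapping from variable name to value.
State = Dict[str, Hashable]


def goal_count(state: State, goal: State) -> int:
    """Count the goal variables whose value in `state` differs from the goal."""
    return sum(1 for var, val in goal.items() if state.get(var) != val)


def effect_size(before: State, after: State) -> int:
    """Count the variables changed by an action or macro (smaller = more focused)."""
    variables = set(before) | set(after)
    return sum(1 for var in variables if before.get(var) != after.get(var))


# Toy example (hypothetical variables).
state = {"a": 1, "b": 0, "c": 2}
goal = {"a": 1, "b": 1, "c": 3}
after_macro = {"a": 1, "b": 1, "c": 2}  # a macro that only changes "b"

h_before = goal_count(state, goal)
h_after = goal_count(after_macro, goal)
changed = effect_size(state, after_macro)
```

In this toy case the macro changes a single variable and lowers the goal count by exactly one, which is the kind of alignment between focused effects and the goal-count heuristic that the abstract describes.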
Machine Learning in Airline Crew Pairing to Construct Initial Clusters for Dynamic Constraint Aggregation
Yaakoubi, Yassine, Soumis, François, Lacoste-Julien, Simon
The crew pairing problem (CPP) is generally modelled as a set partitioning problem where the flights have to be partitioned into pairings. A pairing is a sequence of flight legs separated by connection time and rest periods that starts and ends at the same base. Because of the extensive list of complex rules and regulations, determining whether a sequence of flights constitutes a feasible pairing can be quite difficult by itself, making CPP one of the hardest of the airline planning problems. In this paper, we first propose to improve the prototype Baseline solver of Desaulniers et al. (2020) by adding dynamic control strategies to obtain an efficient solver for large-scale CPPs: Commercial-GENCOL-DCA. These solvers are designed to aggregate the flight-covering constraints to reduce the size of the problem. Then, we use machine learning (ML) to produce clusters of flights having a high probability of being performed consecutively by the same crew. The solver combines several advanced Operations Research techniques to assemble and modify these clusters, when necessary, to produce a good solution. We show, on monthly CPPs with up to 50 000 flights, that Commercial-GENCOL-DCA with clusters produced by ML-based heuristics outperforms Baseline fed by initial clusters that are pairings of a solution obtained by rolling horizon with GENCOL. The reduction of solution cost averages between 6.8% and 8.52%, which is mainly due to a reduction of between 69.79% and 78.11% in the cost of global constraints.
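For readers unfamiliar with the set partitioning model mentioned above, its textbook form can be written as follows. The notation is an assumption made here for illustration and is not taken from the paper: $F$ is the set of flights, $P$ the set of feasible pairings, $c_p$ the cost of pairing $p$, $a_{fp} = 1$ if pairing $p$ covers flight $f$ (0 otherwise), and $x_p = 1$ if pairing $p$ is selected. The global constraints mentioned in the abstract are not included in this basic form.

```latex
\begin{align*}
\min_{x} \quad & \sum_{p \in P} c_p \, x_p \\
\text{s.t.} \quad & \sum_{p \in P} a_{fp} \, x_p = 1 && \forall f \in F, \\
& x_p \in \{0, 1\} && \forall p \in P.
\end{align*}
```

Each equality is one flight-covering constraint: every flight must belong to exactly one selected pairing.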
Online Learning of Non-Markovian Reward Models
Rens, Gavin, Raskin, Jean-François, Reynouad, Raphaël, Marra, Giuseppe
There are situations in which an agent should receive rewards only after having accomplished a series of previous tasks, that is, rewards are non-Markovian. One natural and quite general way to represent history-dependent rewards is via a Mealy machine, a finite state automaton that produces output sequences from input sequences. In our formal setting, we consider a Markov decision process (MDP) that models the dynamics of the environment in which the agent evolves and a Mealy machine synchronized with this MDP to formalize the non-Markovian reward function. While the MDP is known by the agent, the reward function is unknown to the agent and must be learned. Our approach to overcome this challenge is to use Angluin's $L^*$ active learning algorithm to learn a Mealy machine representing the underlying non-Markovian reward machine (MRM). Formal methods are used to determine the optimal strategy for answering so-called membership queries posed by $L^*$. Moreover, we prove that the expected reward achieved will eventually be at least as much as a given, reasonable value provided by a domain expert. We evaluate our framework on three problems. The results show that using $L^*$ to learn an MRM in a non-Markovian reward decision process is effective.
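As a rough illustration of how a Mealy machine can encode a history-dependent reward, the sketch below pays a reward only when a delivery follows a pickup. The class name `MealyRewardMachine`, the node names, and the observation labels are all hypothetical; the sketch covers only the reward machine itself, not the MDP or the $L^*$ learning procedure.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Node = Hashable
Obs = Hashable


class MealyRewardMachine:
    """Finite-state transducer: (node, observation) -> (next node, reward)."""

    def __init__(self, initial: Node,
                 transitions: Dict[Tuple[Node, Obs], Tuple[Node, float]]) -> None:
        self.initial = initial
        self.transitions = transitions

    def run(self, observations: Iterable[Obs]) -> List[float]:
        """Return the reward emitted at each step of the observation sequence."""
        node = self.initial
        rewards = []
        for obs in observations:
            node, reward = self.transitions[(node, obs)]
            rewards.append(reward)
        return rewards


# Hypothetical task: "deliver" is rewarded only if a "pickup" happened earlier.
transitions = {
    ("empty", "pickup"): ("loaded", 0.0),
    ("empty", "deliver"): ("empty", 0.0),
    ("empty", "idle"): ("empty", 0.0),
    ("loaded", "pickup"): ("loaded", 0.0),
    ("loaded", "deliver"): ("empty", 1.0),
    ("loaded", "idle"): ("loaded", 0.0),
}
machine = MealyRewardMachine("empty", transitions)
rewards = machine.run(["deliver", "pickup", "idle", "deliver", "deliver"])
```

The same observation ("deliver") yields different rewards depending on the machine's current node, which is what makes the reward non-Markovian with respect to the environment state alone.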
Planning High-Level Paths in Hostile, Dynamic, and Uncertain Environments
Banfi, Jacopo (Cornell University) | Shree, Vikram (Cornell University) | Campbell, Mark (Cornell University)
This paper introduces and studies a graph-based variant of the path planning problem arising in hostile environments. We consider a setting where an agent (e.g. a robot) must reach a given destination while avoiding being intercepted by probabilistic entities which exist in the graph with a given probability and move according to a probabilistic motion pattern known a priori. Given a goal vertex and a deadline to reach it, the agent must compute the path to the goal that maximizes its chances of survival. We study the computational complexity of the problem, and present two algorithms for computing high-quality solutions in the general case: an exact algorithm based on Mixed-Integer Nonlinear Programming, which works well on instances of moderate size, and a pseudo-polynomial time heuristic algorithm that can solve large-scale problems in reasonable time. We also consider the two limit cases where the agent can survive with probability 0 or 1, and provide specialized algorithms to detect these kinds of situations more efficiently.