Planning & Scheduling
Combining Subgoal Graphs with Reinforcement Learning to Build a Rational Pathfinder
Zeng, Junjie, Qin, Long, Hu, Yue, Hu, Cong, Yin, Quanjun
In this paper, we present a hierarchical path-planning framework called SG-RL (subgoal graphs-reinforcement learning) that plans rational paths for agents maneuvering in continuous and uncertain environments. By "rational", we mean paths that are (1) planned efficiently, eliminating first-move lags, and (2) collision-free and smooth, satisfying the agent's kinematic constraints. SG-RL works at two levels. At the first level, SG-RL uses a geometric path-planning method, Simple Subgoal Graphs (SSG), to efficiently find optimal abstract paths, also called subgoal sequences. At the second level, SG-RL uses an RL method, Least-Squares Policy Iteration (LSPI), to learn near-optimal motion-planning policies that generate kinematically feasible and collision-free trajectories between adjacent subgoals. The first advantage of the proposed method is that SSG mitigates the sparse-reward and local-minimum problems that RL agents face, so LSPI can be used to generate paths in complex environments. The second advantage is that, when the environment changes slightly (e.g., unexpected obstacles appear), SG-RL does not need to reconstruct the subgoal graphs or replan subgoal sequences with SSG, since LSPI can exploit its generalization ability to handle such changes. Simulation experiments in representative scenarios demonstrate that, compared with existing methods, SG-RL works well on large-scale maps, with relatively low action-switching frequencies and shorter path lengths, and that it can deal with small changes in the environment. We further demonstrate that the design of the reward function and the type of training environment are important factors for learning feasible policies.
Watching and Acting Together: Concurrent Plan Recognition and Adaptation for Human-Robot Teams
Levine, Steven James, Williams, Brian Charles
There is a huge demand for robots to work alongside humans in heterogeneous teams. To achieve a high degree of fluidity, robots must be able to (1) recognize their human co-worker's intent, and (2) adapt to this intent accordingly, providing useful aid as a teammate. The literature to date has made great progress in these two areas, recognition and adaptation, but largely as separate research activities. In this work, we present a unified approach to these two problems, in which recognition and adaptation occur concurrently and holistically within the same framework. We introduce Pike, an executive for human-robot teams that allows the robot to continuously and concurrently reason about what a human is doing as execution proceeds, as well as adapt appropriately. The result is a mixed-initiative execution in which humans and robots interact fluidly to complete task goals. Key to our approach is our task model: a contingent, temporally-flexible team plan with explicit choices for both the human and the robot. This allows a single set of algorithms to find implicit constraints between sets of choices for the human and robot (as determined via causal link analysis and temporal reasoning), narrowing the possible decisions a rational human would take (hence achieving intent recognition) as well as the possible actions a robot could consistently take (hence achieving adaptation). Pike makes choices based on the preconditions of actions in the plan, temporal constraints, unanticipated disturbances, and choices made previously (by either agent). Innovations of this work include (1) a framework for concurrent intent recognition and adaptation for contingent, temporally-flexible plans, (2) the generalization of causal links to contingent, temporally-flexible plans along with related extraction algorithms, and (3) extensions to a state-of-the-art dynamic execution system that use these causal links for decision making.
Learning Classical Planning Strategies with Policy Gradient
Gomoluch, Pawel, Alrajeh, Dalal, Russo, Alessandra
A common paradigm in classical planning is heuristic forward search. Forward search planners often rely on a relatively simple best-first search algorithm, which remains fixed throughout the search process. In this paper, we introduce a novel search framework capable of alternating between several forward search approaches while solving a particular planning problem. The approach is selected using a trainable stochastic policy. This makes it possible to tailor the search strategy to a particular distribution of planning problems and a selected performance metric, such as the IPC score or running time. We construct a strategy space using five search algorithms and a two-dimensional representation of the planner's state. Strategies are then trained on randomly generated planning problems using policy gradient. Experimental results show that the learner is able to discover domain-specific search strategies, thus improving the planner's performance with respect to the chosen metric.
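The policy-gradient idea can be sketched as a softmax policy that picks one of several search approaches from a small state vector and is updated with REINFORCE. This is a minimal sketch under stated assumptions: the linear softmax parameterization, the function names, and the scalar return are hypothetical and are not taken from the paper, and no baseline is used for variance reduction.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def action_probs(theta, features):
    """Hypothetical linear softmax policy: theta holds one weight vector per search approach."""
    logits = [sum(w * f for w, f in zip(row, features)) for row in theta]
    return softmax(logits)

def sample_action(theta, features, rng):
    """Draw the index of the search approach to run next."""
    probs = action_probs(theta, features)
    return rng.choices(range(len(probs)), weights=probs)[0]

def reinforce_update(theta, episode, ret, lr=0.1):
    """One REINFORCE step: theta += lr * ret * sum_t grad log pi(a_t | s_t).

    episode: list of (features, action) pairs recorded while solving one problem.
    ret: scalar return for that episode (assumed to be derived from the chosen metric).
    """
    # Accumulate gradients with the policy as it was during the episode.
    grads = [[0.0] * len(row) for row in theta]
    for features, action in episode:
        probs = action_probs(theta, features)
        for k in range(len(theta)):
            # Linear softmax: d log pi(a|s) / d theta_k = (1[k == a] - pi(k|s)) * s
            coeff = (1.0 if k == action else 0.0) - probs[k]
            for j, f in enumerate(features):
                grads[k][j] += coeff * f
    # Apply the accumulated update in place.
    for k, row in enumerate(theta):
        for j in range(len(row)):
            row[j] += lr * ret * grads[k][j]
    return theta
```

With five approaches and a two-dimensional state, `theta` could start as `[[0.0, 0.0] for _ in range(5)]`, which gives a uniform policy. A positive return then raises the probability of the approaches chosen in that episode.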
A Fully Instantiated Approach to HTN Planning
Ramoul, Abdeldjalil, Pellier, Damien, Fiorino, Humbert, Pesty, Sylvie
Many planning techniques have been developed to allow autonomous systems to act and make decisions based on their perceptions of the environment. Among these techniques, HTN (Hierarchical Task Network) planning is one of the most widely used in practice. Unlike classical planning approaches, HTN planning operates by decomposing tasks into subtasks until each subtask can be achieved by an action. This hierarchical representation provides a richer representation of planning problems, better guides the plan search, and provides more knowledge to the underlying algorithms. In this paper, we propose a new approach to HTN planning in which, as in classical planning, all planning operators are instantiated before the search process starts. This approach has proven effective in classical planning and is necessary for developing effective heuristics and for encoding planning problems in other formalisms such as CSP or SAT. Instantiation is currently used by most modern planners but has never been applied in an HTN-based planning framework. We present a generic instantiation algorithm that implements many simplification techniques, inspired by those used in classical planning, to reduce the complexity of the process. Finally, we present experimental results on a range of problems from the International Planning Competitions, obtained with a modified version of the SHOP planner that uses fully instantiated problems.
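Full instantiation (grounding) can be illustrated with a small sketch. It enumerates every type-consistent binding of an operator's parameters, then discards bindings whose static preconditions (facts that no action ever changes) are false in the initial state. The same enumeration would apply to the parameters of HTN methods. The operator format, the function name `ground_operator`, and this single pruning rule are illustrative assumptions. This is not the paper's algorithm, which applies further simplification techniques.

```python
from itertools import product

def ground_operator(name, params, static_pre, objects_by_type, static_facts):
    """Yield (name, binding) for every type-consistent binding whose static
    preconditions hold in the initial state (hypothetical format).

    params: list of (variable, type) pairs, e.g. [("?from", "room"), ("?to", "room")]
    static_pre: list of atoms such as ("adjacent", "?from", "?to")
    static_facts: set of ground atoms that never change during planning
    """
    variables = [v for v, _ in params]
    domains = [objects_by_type[t] for _, t in params]
    # Cartesian product of the typed object domains = all candidate bindings.
    for values in product(*domains):
        binding = dict(zip(variables, values))
        # Substitute variables; predicate names and constants pass through unchanged.
        grounded = [tuple(binding.get(term, term) for term in atom) for atom in static_pre]
        # Prune bindings whose static preconditions can never become true.
        if all(atom in static_facts for atom in grounded):
            yield name, binding
```

For a `move(?from, ?to)` operator over three rooms, the product yields nine candidate bindings. Only those whose `adjacent` fact appears in the initial state survive.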
Real-Time Planning with a Goal Agenda and Jumps
Pellier, Damien, Bouzy, Bruno, Métivier, Marc
In the context of real-time planning, this paper investigates the contributions of two enhancements for selecting actions. First, the agenda-driven planning enhancement ranks relevant atomic goals and solves them incrementally in a best-first manner. Second, the committed jump enhancement commits to a sequence of actions to be executed at the following time steps. To assess these two enhancements, we developed a real-time planning algorithm in which action selection can be driven by a goal agenda and committed jumps can be made. Experimental results on classical planning problems show that agenda-driven planning and committed jumps are clear advantages in the real-time context. Used together, they make the planner several orders of magnitude faster and yield shorter solution plans.
A Review on Learning Planning Action Models for Socio-Communicative HRI
Arora, Ankuj, Fiorino, Humbert, Pellier, Damien, Pesty, Sylvie
For social robots to be brought into more widespread use in companionship, caretaking, and domestic help, they must be capable of demonstrating social intelligence. To be acceptable, they must exhibit socio-communicative skills. Classic approaches that program HRI from observed human-human interactions fail to capture the subtlety of multimodal interactions as well as the key structural differences between robots and humans. The former arises from the difficulty of quantifying and coding multimodal behaviours, the latter from the difference in degrees of freedom between a robot and a human. However, reverse engineering the robot's underlying behavioral blueprint from multimodal HRI traces seems an option worth exploring. In this spirit, the entire HRI can be seen as a sequence of speech acts exchanged between the robot and the human, each act treated as an action, bearing in mind that the entire sequence is goal-driven. This interaction can thus be treated as a sequence of actions propelling it from its initial state to its goal state, which is known as a plan in AI planning. In the same domain, the action sequence that stems from plan execution can be represented as a trace. AI techniques such as machine learning can be used to learn behavioral models (known in AI as symbolic action models), intended to be reusable for AI planning, from such multimodal traces. This article reviews recent machine learning techniques for learning planning action models that can be applied to HRI with the aim of making robots socio-communicative.
Mining useful Macro-actions in Planning
Castellanos-Paez, Sandra, Pellier, Damien, Fiorino, Humbert, Pesty, Sylvie
Planning has achieved significant progress in recent years. Among the various approaches to scaling up plan synthesis, the use of macro-actions has been widely explored. As a first step towards learning macro-actions online, we propose an algorithm that identifies useful macro-actions using data mining techniques. Integrating these learned macro-actions into the planning search yields significant improvements on six classical planning benchmarks. Automated planning is an area of Artificial Intelligence concerned with devising systems that can autonomously find a plan to reach a set of goals. In classical planning, a problem is composed of an initial state, a goal specification, and a set of actions. Starting from the initial state, an action is applicable in the current state if its preconditions are satisfied there.
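The abstract does not detail the mining technique. As a much simpler stand-in, and explicitly not the authors' algorithm, one can count contiguous action subsequences that recur across solution plans and keep those whose support (the number of plans containing them) reaches a threshold. The function name, the plan format (lists of action names), and the default thresholds are hypothetical.

```python
from collections import Counter

def frequent_macros(plans, length=2, min_support=2):
    """Return contiguous action sequences of the given length that occur in at
    least `min_support` distinct plans, most supported first (illustrative only)."""
    support = Counter()
    for plan in plans:
        # Collect each candidate once per plan, so support counts plans, not occurrences.
        seen = {tuple(plan[i:i + length]) for i in range(len(plan) - length + 1)}
        support.update(seen)
    return [(seq, n) for seq, n in support.most_common() if n >= min_support]
```

On three Blocksworld-style plans in which `unstack` is always followed by `putdown`, that pair would be returned with support 3 and become a candidate macro-action to add to the domain.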
Mean-based Heuristic Search for Real-Time Planning
Pellier, Damien, Bouzy, Bruno, Métivier, Marc
In this paper, we introduce MHSP, a new heuristic search algorithm based on mean values for real-time planning. It combines the principles of UCT, a bandit-based algorithm that has given very good results in computer games, especially Computer Go, with heuristic search in order to obtain a real-time planner in the context of classical planning. MHSP is evaluated on different planning problems and compared to existing algorithms that perform online search and learning. Our results also show that MHSP can return plans in real time that tend toward an optimal plan over time, faster and with better quality than existing algorithms in the literature.
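The UCT ingredient mentioned above rests on the standard UCB1 rule. At each node, the search descends into the child maximizing its mean value plus an exploration bonus, and values are backed up as running means. The sketch below shows only that rule. How MHSP converts heuristic estimates into the values it backs up is not described in the abstract, so the `value` passed to `update_mean` is an assumption, as is the default constant `c = sqrt(2)`.

```python
import math

def ucb1_select(children, c=math.sqrt(2)):
    """Return the index of the child maximizing mean + c * sqrt(ln N / n_i).

    children: list of (mean_value, visit_count); N is the total visit count.
    Unvisited children are selected first so every child is tried once.
    """
    total = sum(n for _, n in children)
    best, best_score = None, -math.inf
    for i, (mean, n) in enumerate(children):
        if n == 0:
            return i
        score = mean + c * math.sqrt(math.log(total) / n)
        if score > best_score:
            best, best_score = i, score
    return best

def update_mean(mean, count, value):
    """Incremental running mean used when backing up a value (assumed heuristic-derived)."""
    count += 1
    return mean + (value - mean) / count, count
```

With children `[(0.5, 100), (0.45, 1)]`, the rarely visited second child wins despite its lower mean, because its exploration bonus is much larger. This is the exploration/exploitation trade-off UCT relies on.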
3 Ways AI Simplifies Workforce Management And Improves Team Morale
In today's digital world, most enterprises handle huge volumes of data. Sifting through it to locate the one nugget of information you need can be so onerous that many managers don't even try. They're already busy coordinating hectic employee time-off requests, making last-minute schedules, sorting out performance reviews, and completing hundreds of other tasks to keep the business running day to day. They simply don't have time. To make their company's data work for them rather than against them, teams are increasingly turning to artificial intelligence (AI). These systems dive into mountains of data and streamline some of the most time-consuming aspects of workforce management.
A Framework for Robot Programming in Cobotic Environments: First user experiments
Liang, Ying Siu, Pellier, Damien, Fiorino, Humbert, Pesty, Sylvie
The increasing presence of robots in industry has not gone unnoticed. Large industrial players have incorporated them into their production lines, but smaller companies hesitate due to high initial costs and a lack of programming expertise. In this work, we introduce a framework that combines two disciplines, Programming by Demonstration and Automated Planning, to allow users without any programming knowledge to program a robot. The user teaches the robot atomic actions together with their semantic meaning, and these actions are represented in terms of preconditions and effects. Using these atomic actions, the robot can autonomously generate action sequences to reach any goal given by the user. We evaluated the usability of our framework through user experiments with a Baxter Research Robot and showed that it is well adapted to users without any programming experience.
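Representing taught actions by preconditions and effects is what lets a planner chain them toward a user's goal. A minimal sketch of that idea uses STRIPS-style actions with add and delete lists and a breadth-first forward planner. The `Action` type, the predicate strings, and `plan_bfs` are hypothetical, and the framework's actual planner may differ.

```python
from collections import deque
from typing import FrozenSet, NamedTuple

class Action(NamedTuple):
    """Hypothetical STRIPS-style action, e.g. one taught by demonstration."""
    name: str
    pre: FrozenSet[str]     # facts that must hold for the action to be applicable
    add: FrozenSet[str]     # facts made true by the action
    delete: FrozenSet[str]  # facts made false by the action

def plan_bfs(init, goal, actions):
    """Breadth-first forward search: return a shortest list of action names
    whose execution from `init` satisfies every fact in `goal`, or None."""
    start = frozenset(init)
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, plan = frontier.popleft()
        if set(goal) <= state:
            return plan
        for a in actions:
            if a.pre <= state:  # applicable: all preconditions hold
                nxt = (state - a.delete) | a.add
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, plan + [a.name]))
    return None
```

With a taught `pick_cube` action (requires `hand_empty` and `on_table_cube`) and a `place_cube_in_box` action (requires `holding_cube`), asking for the goal `{"in_box_cube"}` would return the two-step sequence in that order.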