Planning & Scheduling
Learning to Schedule DAG Tasks
Hua, Zhigang, Qi, Feng, Liu, Gan, Yang, Shuang
Scheduling computational tasks represented by directed acyclic graphs (DAGs) is challenging because of its complexity. Conventional scheduling algorithms rely heavily on simple heuristics such as shortest job first (SJF) and critical path (CP), and are often lacking in scheduling quality. In this paper, we present a novel learning-based approach to scheduling DAG tasks. The algorithm employs a reinforcement learning agent to iteratively add directed edges to the DAG, one at a time, to enforce ordering (i.e., priorities of execution and resource allocation) of "tricky" job nodes. By doing so, the original DAG scheduling problem is dramatically reduced to a much simpler proxy problem, on which heuristic scheduling algorithms such as SJF and CP can be efficiently improved. Our approach can be easily applied to any existing heuristic scheduling algorithms. On the benchmark dataset of TPC-H, we show that our learning based approach can significantly improve over popular heuristic algorithms and consistently achieves the best performance among several methods under a variety of settings.
Efficient UAV Trajectory-Planning using Economic Reinforcement Learning
Khalil, Alvi Ataur, Byrne, Alexander J, Rahman, Mohammad Ashiqur, Manshaei, Mohammad Hossein
Advances in unmanned aerial vehicle (UAV) design have opened up applications as varied as surveillance, firefighting, cellular networks, and delivery applications. Additionally, due to decreases in cost, systems employing fleets of UAVs have become popular. The uniqueness of UAVs in systems creates a novel set of trajectory or path planning and coordination problems. Environments include many more points of interest (POIs) than UAVs, with obstacles and no-fly zones. This system revolves around an economic theory, in particular an auction mechanism where UAVs trade assigned POIs. We formulate the path planning problem as a multi-agent economic game, where agents can cooperate and compete for resources. We then translate the problem into a Partially Observable Markov decision process (POMDP), which is solved using a reinforcement learning (RL) model deployed on each agent. As the system computes task distributions via UAV cooperation, it is highly resilient to any change in the swarm size. Our proposed network and economic game architecture can effectively coordinate the swarm as an emergent phenomenon while maintaining the swarm's operation. Unmanned aerial vehicles (UAVs) are applicable to a wide-ranging set of problems such as fire fighting, security monitoring, agriculture, edge computing, 3D mapping, and network support [1]. Fire fighting problems center around tracking and finding fires, whereas security applications focus on monitoring and finding targets. On the other hand, agricultural problems center around field monitoring and data harvesting, while edge computing and network support are focused on data harvesting and load reaction. All of these problems can be abstracted to a set of partially observed points and must be traveled to in the shortest amount of time possible, and then some task must be carried out in the vicinity of this point. Swarm surveillance missions are essential in both civilian and military contexts, where solutions must be secure, reliable, and efficient.
Self-play Learning Strategies for Resource Assignment in Open-RAN Networks
Wang, Xiaoyang, Thomas, Jonathan D, Piechocki, Robert J, Kapoor, Shipra, Santos-Rodriguez, Raul, Parekh, Arjun
Open Radio Access Network (ORAN) is being developed with an aim to democratise access and lower the cost of future mobile data networks, supporting network services with various QoS requirements, such as massive IoT and URLLC. In ORAN, network functionality is dis-aggregated into remote units (RUs), distributed units (DUs) and central units (CUs), which allows flexible software on Commercial-Off-The-Shelf (COTS) deployments. Furthermore, the mapping of variable RU requirements to local mobile edge computing centres for future centralized processing would significantly reduce the power consumption in cellular networks. In this paper, we study the RU-DU resource assignment problem in an ORAN system, modelled as a 2D bin packing problem. A deep reinforcement learning-based self-play approach is proposed to achieve efficient RU-DU resource management, with AlphaGo Zero inspired neural Monte-Carlo Tree Search (MCTS). Experiments on representative 2D bin packing environment and real sites data show that the self-play learning strategy achieves intelligent RU-DU resource assignment for different network conditions.
Cost Optimal Planning as Satisfiability
We investigate upper bounds on the length of cost optimal plans that are valid for problems with 0-cost actions. We employ these upper bounds as horizons for a SAT-based encoding of planning with costs. Given an initial upper bound on the cost of the optimal plan, we experimentally show that this SAT-based approach is able to compute plans with better costs, and in many cases it can match the optimal cost. Also, in multiple instances, the approach is successful in proving that a certain cost is the optimal plan cost.
Single and Parallel Machine Scheduling with Variable Release Dates
Mohr, Felix, Mejía, Gonzalo, Yuraszeck, Francisco
In this paper, we address the identical parallel machine scheduling problem with variable release dates and a common deadline for arrival. This problem occurs in several settings in which the release dates themselves are decision variables with the constraint that all jobs must arrive before or on a common fixed deadline. This deadline can be interpreted as a maximum release date for all jobs. To our knowledge, this problem has not been studied before in spite of many important applications. A first example is a manufacturing facility which uses a Just-In-Time discipline: jobs are released to the shop floor as late as possible to avoid cluttering the system but due to accounting restrictions, mostly related to the MRP (Materials Requirements Planning) logic, all work orders in a time bucket must be released before a fixed deadline. A second example is the receiving area of a warehouse which restricts the arrival of trucks within a time window. The warehouse may schedule its suppliers' trucks so to avoid congestion and provide them with an arrival time, but again, the warehouse's opening hours or external constraints such as circulation bans at certain hours, restrict the arrival of trucks. In these two examples, the deadline constraint cannot be violated, and a central controller must guarantee that all jobs meet such a constraint.
Learning Symbolic Operators for Task and Motion Planning
Silver, Tom, Chitnis, Rohan, Tenenbaum, Joshua, Kaelbling, Leslie Pack, Lozano-Perez, Tomas
Robotic planning problems in hybrid state and action spaces can be solved by integrated task and motion planners (TAMP) that handle the complex interaction between motion-level decisions and task-level plan feasibility. TAMP approaches rely on domain-specific symbolic operators to guide the task-level search, making planning efficient. In this work, we formalize and study the problem of operator learning for TAMP. Central to this study is the view that operators define a lossy abstraction of the transition model of the underlying domain. We then propose a bottom-up relational learning method for operator learning and show how the learned operators can be used for planning in a TAMP system. Experimentally, we provide results in three domains, including long-horizon robotic planning tasks. We find our approach to substantially outperform several baselines, including three graph neural network-based model-free approaches based on recent work. Video: https://youtu.be/iVfpX9BpBRo
Multi-Agent Path Planning based on MPC and DDPG
Xue, Junxiao, Kong, Xiangyan, Dong, Bowei, Xu, Mingliang
The problem of mixed static and dynamic obstacle avoidance is essential for path planning in highly dynamic environment. However, the paths formed by grid edges can be longer than the true shortest paths in the terrain since their headings are artificially constrained. Existing methods can hardly deal with dynamic obstacles. To address this problem, we propose a new algorithm combining Model Predictive Control (MPC) with Deep Deterministic Policy Gradient (DDPG). Firstly, we apply the MPC algorithm to predict the trajectory of dynamic obstacles. Secondly, the DDPG with continuous action space is designed to provide learning and autonomous decision-making capability for robots. Finally, we introduce the idea of the Artificial Potential Field to set the reward function to improve convergence speed and accuracy. We employ Unity 3D to perform simulation experiments in highly uncertain environment such as aircraft carrier decks and squares. The results show that our method has made great improvement on accuracy by 7%-30% compared with the other methods, and on the length of the path and turning angle by reducing 100 units and 400-450 degrees compared with DQN (Deep Q Network), respectively.
Motion Planning for a Pair of Tethered Robots
Teshnizi, Reza H., Shell, Dylan A.
Considering an environment containing polygonal obstacles, we address the problem of planning motions for a pair of planar robots connected to one another via a cable of limited length. Much like prior problems with a single robot connected via a cable to a fixed base, straight line-of-sight visibility plays an important role. The present paper shows how the reduced visibility graph provides a natural discretization and captures the essential topological considerations very effectively for the two robot case as well. Unlike the single robot case, however, the bounded cable length introduces considerations around coordination (or equivalently, when viewed from the point of view of a centralized planner, relative timing) that complicates the matter. Indeed, the paper has to introduce a rather more involved formalization than prior single-robot work in order to establish the core theoretical result -- a theorem permitting the problem to be cast as one of finding paths rather than trajectories. Once affirmed, the planning problem reduces to a straightforward graph search with an elegant representation of the connecting cable, demanding only a few extra ancillary checks that ensure sufficiency of cable to guarantee feasibility of the solution. We describe our implementation of A${}^\star$ search, and report experimental results. Lastly, we prescribe an optimal execution for the solutions provided by the algorithm.
The Logical Options Framework
Araki, Brandon, Li, Xiao, Vodrahalli, Kiran, DeCastro, Jonathan, Fry, Micah J., Rus, Daniela
Learning composable policies for environments with complex rules and tasks is a challenging problem. We introduce a hierarchical reinforcement learning framework called the Logical Options Framework (LOF) that learns policies that are satisfying, optimal, and composable. LOF efficiently learns policies that satisfy tasks by representing the task as an automaton and integrating it into learning and planning. We provide and prove conditions under which LOF will learn satisfying, optimal policies. And lastly, we show how LOF's learned policies can be composed to satisfy unseen tasks with only 10-50 retraining steps. We evaluate LOF on four tasks in discrete and continuous domains, including a 3D pick-and-place environment.
Inferring Agents Preferences as Priors for Probabilistic Goal Recognition
Gusmão, Kin Max, Pereira, Ramon Fraga, Meneguzzi, Felipe
Recent approaches to goal recognition have leveraged planning landmarks to achieve high-accuracy with low runtime cost. These approaches, however, lack a probabilistic interpretation. Furthermore, while most probabilistic models to goal recognition assume that the recognizer has access to a prior probability representing, for example, an agent's preferences, virtually no goal recognition approach actually uses the prior in practice, simply assuming a uniform prior. In this paper, we provide a model to both extend landmark-based goal recognition with a probabilistic interpretation and allow the estimation of such prior probability and its usage to compute posterior probabilities after repeated interactions of observed agents. We empirically show that our model can not only recognize goals effectively but also successfully infer the correct prior probability distribution representing an agent's preferences.