 Planning & Scheduling


Semiconductor Fab Scheduling with Self-Supervised and Reinforcement Learning

arXiv.org Artificial Intelligence

Semiconductor manufacturing is a notoriously complex and costly multi-step process involving a long sequence of operations on expensive and quantity-limited equipment. Recent chip shortages and their impacts have highlighted the importance of semiconductors in global supply chains and how heavily our daily lives rely on them. Due to the investment cost, environmental impact, and time scale needed to build new factories, it is difficult to ramp up production when demand spikes. This work introduces a method that learns to schedule a semiconductor manufacturing facility more efficiently using deep reinforcement learning and self-supervised learning. We propose the first adaptive scheduling approach to handle complex, continuous, stochastic, dynamic, modern semiconductor manufacturing models. Our method outperforms the traditional hierarchical dispatching strategies typically used in semiconductor manufacturing plants, substantially reducing each order's tardiness and time until completion. As a result, our method yields a better allocation of resources in the semiconductor manufacturing process.


Efficient Offline Policy Optimization with a Learned Model

arXiv.org Artificial Intelligence

MuZero Unplugged presents a promising approach for offline policy learning from logged data. It conducts Monte-Carlo Tree Search (MCTS) with a learned model and leverages the Reanalyze algorithm to learn purely from offline data. For good performance, MCTS requires accurate learned models and a large number of simulations, and therefore consumes a great deal of computing time. This paper investigates several hypotheses about when MuZero Unplugged may not work well in offline RL settings, including 1) learning with limited data coverage; 2) learning from offline data of stochastic environments; 3) improperly parameterized models given the offline data; 4) running with a low compute budget. We propose to use a regularized one-step look-ahead approach to tackle the above issues. Instead of planning with the expensive MCTS, we use the learned model to construct an advantage estimate based on a one-step rollout. The policy is improved in the direction that maximizes the estimated advantage, regularized toward the dataset. We conduct extensive empirical studies with BSuite environments to verify the hypotheses and then run our algorithm on the RL Unplugged Atari benchmark. Experimental results show that our proposed approach achieves stable performance even with an inaccurate learned model. On the large-scale Atari benchmark, the proposed method outperforms MuZero Unplugged by 43%. Most significantly, it uses only 5.6% of the wall-clock time (i.e., 1 hour) compared to MuZero Unplugged (i.e., 17.8 hours) to achieve a 150% IQM normalized score with the same hardware and software stacks. Our implementation is open-sourced at https://github.com/sail-sg/rosmo.
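
As a rough illustration of a one-step look-ahead advantage, the sketch below computes A(s, a) = r(s, a) + discount * v(s') - v(s) for each action and then reweights a prior policy by exp(A). The exponential reweighting, the function names, and the numbers are assumptions made for this example; the abstract does not specify the exact form of the regularized update, so this should not be read as the authors' implementation.

```python
import math

def one_step_advantages(rewards, next_values, state_value, discount=0.99):
    """Advantage of each action from a single model step:
    A(s, a) = r(s, a) + discount * v(s') - v(s)."""
    return [r + discount * v_next - state_value
            for r, v_next in zip(rewards, next_values)]

def regularized_target(prior, advantages):
    """Assumed update rule: target(a) is proportional to prior(a) * exp(A(s, a)).
    Staying close to the prior stands in for regularization toward the dataset."""
    weights = [p * math.exp(a) for p, a in zip(prior, advantages)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical model outputs for one state with three actions.
prior = [0.5, 0.3, 0.2]
adv = one_step_advantages(rewards=[0.0, 1.0, 0.0],
                          next_values=[1.0, 1.0, 0.5],
                          state_value=1.0, discount=1.0)
target = regularized_target(prior, adv)
```

Here the second action has the largest estimated advantage, so the target shifts probability mass toward it while the prior keeps the other actions from collapsing to zero.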


A Survey on Active Simultaneous Localization and Mapping: State of the Art and New Frontiers

arXiv.org Artificial Intelligence

Active Simultaneous Localization and Mapping (SLAM) is the problem of planning and controlling the motion of a robot to build the most accurate and complete model of the surrounding environment. Since the first foundational work in active perception appeared, more than three decades ago, this field has received increasing attention across different scientific communities. This has brought about many different approaches and formulations, and makes a review of the current trends necessary and extremely valuable for both new and experienced researchers. In this work, we survey the state-of-the-art in active SLAM and take an in-depth look at the open challenges that still require attention to meet the needs of modern applications. After providing a historical perspective, we present a unified problem formulation and review the well-established modular solution scheme, which decouples the problem into three stages that identify, select, and execute potential navigation actions. We then analyze alternative approaches, including belief-space planning and deep reinforcement learning techniques, and review related work on multi-robot coordination. The manuscript concludes with a discussion of new research directions, addressing reproducible research, active spatial perception, and practical applications, among other topics.


Bringing Diversity to Autonomous Vehicles: An Interpretable Multi-vehicle Decision-making and Planning Framework

arXiv.org Artificial Intelligence

With the development of autonomous driving, it is becoming increasingly common for autonomous vehicles (AVs) and human-driven vehicles (HVs) to travel on the same roads. Existing single-vehicle planning algorithms on board struggle to handle sophisticated social interactions in the real world. Decisions made by these methods are difficult to understand for humans, raising the risk of crashes and making them unlikely to be applied in practice. Moreover, vehicle flows produced by open-source traffic simulators suffer from being overly conservative and lacking behavioral diversity. We propose a hierarchical multi-vehicle decision-making and planning framework with several advantages. The framework jointly makes decisions for all vehicles within the flow and reacts promptly to the dynamic environment through a high-frequency planning module. The decision module produces interpretable action sequences that can explicitly communicate self-intent to the surrounding HVs. We also present the cooperation factor and trajectory weight set, bringing diversity to autonomous vehicles in traffic at both the social and individual levels. The superiority of our proposed framework is validated through experiments with multiple scenarios, and the diverse behaviors in the generated vehicle trajectories are demonstrated through closed-loop simulations.


Graph Learning Based Decision Support for Multi-Aircraft Take-Off and Landing at Urban Air Mobility Vertiports

arXiv.org Artificial Intelligence

The majority of aircraft under the Urban Air Mobility (UAM) concept are expected to be of the electric vertical takeoff and landing (eVTOL) vehicle type, which will operate out of vertiports. While this is akin to the relationship between general aviation aircraft and airports, the conceived location of vertiports within dense urban environments presents unique challenges in managing the air traffic served by a vertiport. This challenge becomes more pronounced with the increasing frequency of scheduled landings and take-offs. This paper assumes a centralized air traffic controller (ATC) to explore the performance of a new AI-driven ATC approach to managing the eVTOLs served by the vertiport. Minimum-separation-driven safety and delays are the two important considerations in this case. The ATC problem is modeled as a task allocation problem, and uncertainties due to communication disruptions (e.g., poor link quality) and inclement weather (e.g., high gust effects) are added as a small probability of action failures. To learn the vertiport ATC policy, a novel graph-based reinforcement learning (RL) solution called "Urban Air Mobility - Vertiport Schedule Management (UAM-VSM)" is developed. This approach uses graph convolutional networks (GCNs) to abstract the vertiport space and eVTOL space as graphs, and aggregates information for a centralized ATC agent to help generalize over the environment. Unreal Engine combined with AirSim is used as the simulation environment in which training and testing occur. Uncertainties are considered only during testing, due to the high cost of Monte Carlo (MC) sampling over such realistic simulations. The proposed graph RL method demonstrates significantly better performance on the test scenarios when compared against a feasible random decision-making baseline and a first-come, first-served (FCFS) baseline, including the ability to generalize to unseen scenarios and to scenarios with uncertainties.
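
To make the FCFS baseline and the two stated considerations (minimum separation and delay) concrete, here is a minimal sketch for a single landing pad. The single-pad setting, the function name, and the separation value are assumptions for illustration only; the abstract does not describe how the paper's baseline is implemented.

```python
def fcfs_schedule(request_times, min_separation):
    """Serve requests in order of arrival on one pad (an assumed setting),
    keeping at least `min_separation` time units between consecutive slots."""
    order = sorted(range(len(request_times)), key=lambda i: request_times[i])
    slots = [0.0] * len(request_times)
    last_slot = None
    for i in order:
        if last_slot is None:
            start = request_times[i]
        else:
            # Never earlier than requested, never closer than the separation.
            start = max(request_times[i], last_slot + min_separation)
        slots[i] = start
        last_slot = start
    return slots

# Hypothetical request times (in minutes) for four eVTOLs.
requests = [0.0, 1.0, 1.5, 10.0]
slots = fcfs_schedule(requests, min_separation=2.0)
delays = [s - r for s, r in zip(slots, requests)]
```

With these numbers the second and third vehicles are pushed back to keep the separation, which is the kind of delay a learned policy would try to reduce.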


Optimal Allocation of Many Robot Guards for Sweep-Line Coverage

arXiv.org Artificial Intelligence

We study the problem of allocating many mobile robots for the execution of a pre-defined sweep schedule in a known two-dimensional environment, with applications toward search and rescue, coverage, surveillance, monitoring, pursuit-evasion, and so on. The mobile robots (or agents) are assumed to have one-dimensional sensing capability with probabilistic guarantees that deteriorate as the sensing distance increases. In solving such tasks, a time-parameterized distribution of robots along the sweep frontier must be computed, with the objective of either minimizing the number of robots used to achieve some desired coverage quality guarantee or maximizing the probabilistic guarantee for a given number of robots. We propose a max-flow based algorithm for solving the allocation task, which builds on a workspace decomposition technique that generalizes the well-known boustrophedon decomposition. Our proposed algorithm has a very low polynomial running time and completes in under two seconds for polygonal environments with over $10^5$ vertices. Simulation experiments are carried out on three realistic use cases with randomly generated obstacles of varying shapes, sizes, and spatial distributions, which demonstrate the applicability and scalability of our proposed method.
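
For readers unfamiliar with max-flow, the sketch below is a generic Edmonds-Karp implementation applied to a toy assignment of two robots to two frontier cells. The graph, node names, and capacities are hypothetical, and the construction is not the paper's decomposition-based network; it only shows the kind of primitive such an allocation can be reduced to.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp: repeatedly augment along shortest residual paths."""
    residual = {u: dict(vs) for u, vs in capacity.items()}
    for u, vs in capacity.items():
        for v in vs:
            # Make sure every node and every reverse edge exists.
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # Breadth-first search for an augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push flow and update residual capacities.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

# Hypothetical toy network: robots r1, r2 can cover cells c1, c2.
toy = {
    "s": {"r1": 1, "r2": 1},
    "r1": {"c1": 1, "c2": 1},
    "r2": {"c1": 1},
    "c1": {"t": 1},
    "c2": {"t": 1},
}
covered = max_flow(toy, "s", "t")
```

In this toy case both cells can be covered, but only if r1 takes c2 and r2 takes c1, which the augmenting-path step discovers by rerouting flow.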


Plan-Based Derivation of General Functional Structures in Product Design

arXiv.org Artificial Intelligence

In product design, a decomposition of the overall product function into a set of smaller, interacting functions is usually considered a crucial first step for any computer-supported design tool. Here, we propose a new approach for the decomposition of functions that is especially suited to later solutions based on Artificial Intelligence. The presented approach defines the decomposition problem in terms of a planning problem, a well-established field in Artificial Intelligence. For the planning problem, logic-based solvers can be used to find solutions that compute a useful function structure for the design process. Well-known function libraries from engineering are used as atomic planning steps. The algorithms are evaluated using two different application examples to ensure the transferability of a general function decomposition.
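
The idea of treating library functions as atomic planning steps can be sketched with a small STRIPS-style forward search. The breadth-first search used here is a stand-in for the logic-based solvers mentioned in the abstract, and the function names and flow labels are invented for illustration rather than taken from any real function library.

```python
from collections import deque

def plan(initial, goal, actions):
    """Breadth-first forward search over sets of available flows.
    Each action is (preconditions, added flows, removed flows)."""
    start = frozenset(initial)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        if goal <= state:
            return steps
        for name, (pre, add, delete) in actions.items():
            if pre <= state:
                nxt = frozenset((state - delete) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + [name]))
    return None  # no sequence of functions reaches the goal

# Hypothetical atomic functions acting on energy flows.
library = {
    "convert_electrical_to_rotational": (
        {"electrical_energy"}, {"rotational_energy"}, {"electrical_energy"}),
    "convert_rotational_to_pneumatic": (
        {"rotational_energy"}, {"pneumatic_energy"}, {"rotational_energy"}),
}
structure = plan({"electrical_energy"}, {"pneumatic_energy"}, library)
```

The returned step sequence plays the role of a simple function structure: an ordered chain of atomic functions that transforms the input flow into the required output flow.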


Southwest To Tell U.S. Lawmakers 'We Messed Up' During Holiday Meltdown

International Business Times

Southwest Airlines Chief Operating Officer Andrew Watterson will apologize on Thursday before a U.S. Senate committee over the holiday meltdown that led to the cancellation of 16,700 flights and pledge changes to ensure that there will be no repeats. "Let me be clear: we messed up. In hindsight, we did not have enough winter operational resilience," Watterson's written testimony for a U.S. Senate Commerce Committee hearing seen by Reuters says. In other written testimony seen by Reuters, Southwest Airlines Pilots Association (SWAPA) President Casey Murray will tell the committee that the low-cost carrier's "overconfidence" in planning and a "systemic failure to provide modern tools" were responsible for the December meltdown that the union said stranded 2 million passengers and is estimated to have cost it more than $1 billion. Murray will tell the committee that pilots "have been sounding the alarm about (Southwest's) inadequate crew scheduling technology and outdated operational processes for years. Unfortunately, those warnings were summarily ignored."


Goal Alignment: A Human-Aware Account of Value Alignment Problem

arXiv.org Artificial Intelligence

Value alignment problems arise in scenarios where the specified objectives of an AI agent don't match the true underlying objective of its users. The problem has been widely argued to be one of the central safety problems in AI. Unfortunately, most existing works in value alignment tend to focus on issues that are primarily related to the fact that reward functions are an unintuitive mechanism to specify objectives. However, the complexity of the objective specification mechanism is just one of many reasons why the user may have misspecified their objective. A foundational cause for misalignment that is being overlooked by these works is the inherent asymmetry between human expectations about the agent's behavior and the behavior generated by the agent for the specified objective. To address this lacuna, we propose a novel formulation for the value alignment problem, named goal alignment, that focuses on a few central challenges related to value alignment. In doing so, we bridge the currently disparate research areas of value alignment and human-aware planning. Additionally, we propose a first-of-its-kind interactive algorithm that is capable of using information generated under incorrect beliefs about the agent to determine the true underlying goal of the user.


Two-Step Online Trajectory Planning of a Quadcopter in Indoor Environments with Obstacles

arXiv.org Artificial Intelligence

This paper presents a two-step algorithm for online trajectory planning in indoor environments with unknown obstacles. In the first step, sampling-based path planning techniques such as the optimal Rapidly exploring Random Tree (RRT*) algorithm and the Line-of-Sight (LOS) algorithm are employed to generate a collision-free path consisting of multiple waypoints. Then, in the second step, constrained quadratic programming is utilized to compute a smooth trajectory that passes through all computed waypoints. The main contribution of this work is the development of a flexible trajectory planning framework that can detect changes in the environment, such as new obstacles, and compute alternative trajectories in real time. The proposed algorithm actively considers all changes in the environment and performs the replanning process only on waypoints that are occupied by new obstacles. This helps to reduce the computation time and realize the proposed approach in real time. The feasibility of the proposed algorithm is evaluated using the Intel Aero Ready-to-Fly (RTF) quadcopter in simulation and in a real-world experiment.
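
The first step, reducing a sampled path to a few waypoints by line-of-sight checks, can be sketched as follows. Circular obstacles, the greedy farthest-visible-waypoint rule, and all coordinates are assumptions made for this example; the abstract does not give the details of the LOS algorithm used, and the quadratic-programming smoothing step is not shown.

```python
import math

def segment_clear(p, q, obstacles):
    """True if segment p-q stays outside every circular obstacle (cx, cy, r)."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    length_sq = dx * dx + dy * dy
    for cx, cy, r in obstacles:
        if length_sq == 0:
            t = 0.0
        else:
            # Parameter of the closest point on the segment, clamped to [0, 1].
            t = ((cx - p[0]) * dx + (cy - p[1]) * dy) / length_sq
            t = max(0.0, min(1.0, t))
        nearest = (p[0] + t * dx, p[1] + t * dy)
        if math.hypot(nearest[0] - cx, nearest[1] - cy) <= r:
            return False
    return True

def prune_path(path, obstacles):
    """Greedily jump to the farthest waypoint visible from the current one."""
    pruned = [path[0]]
    i = 0
    while i < len(path) - 1:
        j = len(path) - 1
        while j > i + 1 and not segment_clear(path[i], path[j], obstacles):
            j -= 1
        pruned.append(path[j])
        i = j
    return pruned

# Hypothetical collision-free path around one circular obstacle.
raw_path = [(0, 0), (1, 0), (3, 0), (4, 1), (4, 3)]
obstacles = [(2.0, 1.5, 1.0)]
waypoints = prune_path(raw_path, obstacles)
```

If a new obstacle later blocks one of the remaining segments, only the affected waypoints would need to be replanned, which is the property the abstract highlights for real-time use.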