Planning & Scheduling
Adaptive Robotic Information Gathering via Non-Stationary Gaussian Processes
Chen, Weizhe, Khardon, Roni, Liu, Lantao
Robotic Information Gathering (RIG) is a foundational research topic that answers how a robot (team) collects informative data to efficiently build an accurate model of an unknown target function under robot embodiment constraints. RIG has many applications, including but not limited to autonomous exploration and mapping, 3D reconstruction or inspection, search and rescue, and environmental monitoring. A RIG system relies on a probabilistic model's prediction uncertainty to identify critical areas for informative data collection. Gaussian Processes (GPs) with stationary kernels have been widely adopted for spatial modeling. However, real-world spatial data is typically non-stationary -- different locations do not have the same degree of variability. As a result, the prediction uncertainty does not accurately reveal prediction error, limiting the success of RIG algorithms. We propose a family of non-stationary kernels named Attentive Kernel (AK), which is simple, robust, and can extend any existing kernel to a non-stationary one. We evaluate the new kernel in elevation mapping tasks, where AK provides better accuracy and uncertainty quantification over the commonly used stationary kernels and the leading non-stationary kernels. The improved uncertainty quantification guides the downstream informative planner to collect more valuable data around the high-error area, further increasing prediction accuracy. A field experiment demonstrates that the proposed method can guide an Autonomous Surface Vehicle (ASV) to prioritize data collection in locations with significant spatial variations, enabling the model to characterize salient environmental features.
Bayesian Optimisation Against Climate Change: Applications and Benchmarks
Hellan, Sigrid Passano, Lucas, Christopher G., Goddard, Nigel H.
Bayesian optimisation is a powerful method for optimising black-box functions, popular in settings where the true function is expensive to evaluate and no gradient information is available. Bayesian optimisation can improve responses to many optimisation problems within climate change for which simulator models are unavailable or expensive to sample from. While there have been several feasibility demonstrations of Bayesian optimisation in climate-related applications, there has been no unifying review of applications and benchmarks. We provide such a review here, to encourage the use of Bayesian optimisation in important and well-suited application domains. We identify four main application domains: material discovery, wind farm layout, optimal renewable control and environmental monitoring. For each domain we identify a public benchmark or data set that is easy to use and evaluate systems against, while being representative of real-world problems. Due to the lack of a suitable benchmark for environmental monitoring, we propose LAQN-BO, based on air pollution data. Our contributions are: a) identifying a representative range of benchmarks, providing example code where necessary; b) introducing a new benchmark, LAQN-BO; and c) promoting a wider use of climate change applications among Bayesian optimisation practitioners.
Geometry-Aware Coverage Path Planning for Depowdering on Complex 3D Surfaces
Do, Van-Thach, Pham, Quang-Cuong
This paper presents a new approach to obtaining nearly complete coverage paths (CP) with low overlapping on 3D general surfaces using mesh models. The CP is obtained by segmenting the mesh model into a given number of clusters using constrained centroidal Voronoi tessellation (CCVT) and finding the shortest path from cluster centroids using the geodesic metric efficiently. We introduce a new cost function to harmoniously achieve uniform areas of the obtained clusters and a restriction on the variation of triangle normals during the construction of CCVTs. The obtained clusters can be used to construct high-quality viewpoints (VP) for visual coverage tasks. Here, we utilize the planned VPs as cleaning configurations to perform residual powder removal in additive manufacturing using manipulator robots. The self-occlusion of VPs and ensuring collision-free robot configurations are addressed by integrating a proposed optimization-based strategy to find a set of candidate rays for each VP into the motion planning phase. CP planning benchmarks and physical experiments are conducted to demonstrate the effectiveness of the proposed approach. We show that our approach can compute the CPs and VPs of various mesh models with a massive number of triangles within a reasonable time.
Policy-Based Self-Competition for Planning Problems
Pirnay, Jonathan, Gรถttl, Quirin, Burger, Jakob, Grimm, Dominik Gerhard
AlphaZero-type algorithms may stop improving on single-player tasks in case the value network guiding the tree search is unable to approximate the outcome of an episode sufficiently well. One technique to address this problem is transforming the single-player task through self-competition. The main idea is to compute a scalar baseline from the agent's historical performances and to reshape an episode's reward into a binary output, indicating whether the baseline has been exceeded or not. However, this baseline only carries limited information for the agent about strategies how to improve. We leverage the idea of self-competition and directly incorporate a historical policy into the planning process instead of its scalar performance. Based on the recently introduced Gumbel AlphaZero (GAZ), we propose our algorithm GAZ 'Play-to-Plan' (GAZ PTP), in which the agent learns to find strong trajectories by planning against possible strategies of its past self. We show the effectiveness of our approach in two well-known combinatorial optimization problems, the Traveling Salesman Problem and the Job-Shop Scheduling Problem. With only half of the simulation budget for search, GAZ PTP consistently outperforms all selected single-player variants of GAZ.
Collision-free Motion Generation Based on Stochastic Optimization and Composite Signed Distance Field Networks of Articulated Robot
Liu, Baolin, Jiang, Gedong, Zhao, Fei, Mei, Xuesong
Safe robot motion generation is critical for practical applications from manufacturing to homes. In this work, we proposed a stochastic optimization-based motion generation method to generate collision-free and time-optimal motion for the articulated robot represented by composite signed distance field (SDF) networks. First, we propose composite SDF networks to learn the SDF for articulated robots. The learned composite SDF networks combined with the kinematics of the robot allow for quick and accurate estimates of the minimum distance between the robot and obstacles in a batch fashion. Then, a stochastic optimization-based trajectory planning algorithm generates a spatial-optimized and collision-free trajectory offline with the learned composite SDF networks. This stochastic trajectory planner is formulated as a Bayesian Inference problem with a time-normalized Gaussian process prior and exponential likelihood function. The Gaussian process prior can enforce initial and goal position constraints in Configuration Space. Besides, it can encode the correlation of waypoints in time series. The likelihood function aims at encoding task-related cost terms, such as collision avoidance, trajectory length penalty, boundary avoidance, etc. The kernel updating strategies combined with model-predictive path integral (MPPI) is proposed to solve the maximum a posteriori inference problems. Lastly, we integrate the learned composite SDF networks into the trajectory planning algorithm and apply it to a Franka Emika Panda robot. The simulation and experiment results validate the effectiveness of the proposed method.
Alexa Arena: A User-Centric Interactive Platform for Embodied AI
Gao, Qiaozi, Thattai, Govind, Shakiah, Suhaila, Gao, Xiaofeng, Pansare, Shreyas, Sharma, Vasu, Sukhatme, Gaurav, Shi, Hangjie, Yang, Bofei, Zheng, Desheng, Hu, Lucy, Arumugam, Karthika, Hu, Shui, Wen, Matthew, Guthy, Dinakar, Chung, Cadence, Khanna, Rohan, Ipek, Osman, Ball, Leslie, Bland, Kate, Rocker, Heather, Rao, Yadunandana, Johnston, Michael, Ghanadan, Reza, Mandal, Arindam, Tur, Dilek Hakkani, Natarajan, Prem
We introduce Alexa Arena, a user-centric simulation platform for Embodied AI (EAI) research. Alexa Arena provides a variety of multi-room layouts and interactable objects, for the creation of human-robot interaction (HRI) missions. With user-friendly graphics and control mechanisms, Alexa Arena supports the development of gamified robotic tasks readily accessible to general human users, thus opening a new venue for high-efficiency HRI data collection and EAI system evaluation. Along with the platform, we introduce a dialog-enabled instruction-following benchmark and provide baseline results for it. We make Alexa Arena publicly available to facilitate research in building generalizable and assistive embodied agents.
Coverage Path Planning with Budget Constraints for Multiple Unmanned Ground Vehicles
Tran, Vu Phi, Perera, Asanka, Garratt, Matthew A., Kasmarik, Kathryn, Anavatti, Sreenatha
This paper proposes a state-machine model for a multi-modal, multi-robot environmental sensing algorithm. This multi-modal algorithm integrates two different exploration algorithms: (1) coverage path planning using variable formations and (2) collaborative active sensing using multi-robot swarms. The state machine provides the logic for when to switch between these different sensing algorithms. We evaluate the performance of the proposed approach on a gas source localisation and mapping task. We use hardware-in-the-loop experiments and real-time experiments with a radio source simulating a real gas field. We compare the proposed approach with a single-mode, state-of-the-art collaborative active sensing approach. Our results indicate that our multi-modal switching approach can converge more rapidly than single-mode active sensing.
DL-DRL: A double-level deep reinforcement learning approach for large-scale task scheduling of multi-UAV
Mao, Xiao, Cao, Zhiguang, Fan, Mingfeng, Wu, Guohua, Pedrycz, Witold
Exploiting unmanned aerial vehicles (UAVs) to execute tasks is gaining growing popularity recently. To solve the underlying task scheduling problem, the deep reinforcement learning (DRL) based methods demonstrate notable advantage over the conventional heuristics as they rely less on hand-engineered rules. However, their decision space will become prohibitively huge as the problem scales up, thus deteriorating the computation efficiency. To alleviate this issue, we propose a double-level deep reinforcement learning (DL-DRL) approach based on a divide and conquer framework (DCF), where we decompose the task scheduling of multi-UAV into task allocation and route planning. Particularly, we design an encoder-decoder structured policy network in our upper-level DRL model to allocate the tasks to different UAVs, and we exploit another attention based policy network in our lower-level DRL model to construct the route for each UAV, with the objective to maximize the number of executed tasks given the maximum flight distance of the UAV. To effectively train the two models, we design an interactive training strategy (ITS), which includes pre-training, intensive training and alternate training. Experimental results show that our DL-DRL performs favorably against the learning-based and conventional baselines including the OR-Tools, in terms of solution quality and computation efficiency. We also verify the generalization performance of our approach by applying it to larger sizes of up to 1000 tasks. Moreover, we also show via an ablation study that our ITS can help achieve a balance between the performance and training efficiency.
Agents Explore the Environment Beyond Good Actions to Improve Their Model for Better Decisions
Improving the decision-making capabilities of agents is a key challenge on the road to artificial intelligence. To improve the planning skills needed to make good decisions, MuZero's agent combines prediction by a network model and planning by a tree search using the predictions. MuZero's learning process can fail when predictions are poor but planning requires them. We use this as an impetus to get the agent to explore parts of the decision tree in the environment that it otherwise would not explore. The agent achieves this, first by normal planning to come up with an improved policy. Second, it randomly deviates from this policy at the beginning of each training episode. And third, it switches back to the improved policy at a random time step to experience the rewards from the environment associated with the improved policy, which is the basis for learning the correct value expectation. The simple board game Tic-Tac-Toe is used to illustrate how this approach can improve the agent's decision-making ability. The source code, written entirely in Java, is available at https://github.com/enpasos/muzero.
Potential memorial designs for Las Vegas massacre unveiled, major step in planning process
Fox News Flash top headlines are here. Check out what's clicking on Foxnews.com. A series of white angel wings rise up from the earth bathed in a warm glow of light, their sweeping forms creating a long covered pathway surrounded by trees in a possible centerpiece for the memorial to modern America's deadliest mass shooting. It's one of five potential designs unveiled Monday for a permanent monument on the Las Vegas Strip where 58 people were shot and killed and hundreds more injured at a country music festival on Oct. 1, 2017. Two survivors later died from their gunshot wounds.