Planning & Scheduling
A deep Q-Learning based Path Planning and Navigation System for Firefighting Environments
Bhattarai, Manish, Martinez-Ramon, Manel
Live fire creates a dynamic, rapidly changing environment that presents a worthy challenge for deep learning and artificial intelligence methodologies to assist firefighters with scene comprehension, helping them maintain situational awareness and track and relay the important features needed for key decisions as they tackle these catastrophic events. We propose a deep Q-learning based agent that is immune to stress-induced disorientation and anxiety and is thus able to make clear navigation decisions based on observed and stored facts in live fire environments. As a proof of concept, we simulate a structural fire in the Unreal Engine game engine, which enables the agent to interact with the environment. The agent is trained with a deep Q-learning algorithm based on a set of rewards and penalties assigned to its actions in the environment. We exploit experience replay to accelerate the learning process and augment the agent's learning with human-derived experiences. The agent trained under this deep Q-learning approach outperforms agents trained through alternative path planning systems and demonstrates that this methodology is a promising foundation on which to build a path planning navigation assistant capable of safely guiding firefighters through live fire environments.
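The reward-and-penalty training loop with experience replay described in this abstract can be illustrated with a small sketch. This is not the authors' system: a tabular Q-table stands in for the deep network, and the 4x4 grid, the fire cells, the reward values, and all function names are assumptions made purely for illustration.

```python
import random
from collections import deque

# Hypothetical toy environment (not from the paper): a 4x4 grid with two fire cells.
SIZE = 4
START, GOAL = (0, 0), (3, 3)
FIRE = {(1, 1), (2, 1)}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Apply an action; return (next_state, reward, done)."""
    dr, dc = ACTIONS[action]
    r = min(max(state[0] + dr, 0), SIZE - 1)  # moves off the grid leave the agent in place
    c = min(max(state[1] + dc, 0), SIZE - 1)
    nxt = (r, c)
    if nxt == GOAL:
        return nxt, 10.0, True    # assumed reward for reaching the exit
    if nxt in FIRE:
        return nxt, -10.0, True   # assumed penalty for entering fire
    return nxt, -1.0, False       # small cost per step favors short routes


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, batch=16, seed=0):
    """Tabular Q-learning with a replay buffer (a stand-in for a deep Q-network)."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * 4 for r in range(SIZE) for c in range(SIZE)}
    replay = deque(maxlen=2000)
    for _ in range(episodes):
        state, done, steps = START, False, 0
        while not done and steps < 50:
            if rng.random() < epsilon:                      # explore
                action = rng.randrange(4)
            else:                                           # exploit
                action = max(range(4), key=lambda a: q[state][a])
            nxt, reward, done = step(state, action)
            replay.append((state, action, reward, nxt, done))
            # Experience replay: relearn from a random minibatch of stored transitions.
            sample = rng.sample(list(replay), min(batch, len(replay)))
            for s, a, r, s2, d in sample:
                target = r if d else r + gamma * max(q[s2])
                q[s][a] += alpha * (target - q[s][a])
            state, steps = nxt, steps + 1
    return q


def greedy_path(q, limit=20):
    """Follow the learned greedy policy from START."""
    state, path = START, [START]
    for _ in range(limit):
        action = max(range(4), key=lambda a: q[state][a])
        state, _, done = step(state, action)
        path.append(state)
        if done:
            break
    return path
```

Calling `greedy_path(train())` returns the sequence of cells the trained agent visits. In a sketch like this, human-derived experiences could be modeled by appending demonstration transitions to `replay` before training starts, though how the paper does this is not specified in the abstract.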
Machine Learning Based Path Planning for Improved Rover Navigation (Pre-Print Version)
Abcouwer, Neil, Daftry, Shreyansh, Venkatraman, Siddarth, del Sesto, Tyler, Toupet, Olivier, Lanka, Ravi, Song, Jialin, Yue, Yisong, Ono, Masahiro
Enhanced AutoNav (ENav), the baseline surface navigation software for NASA's Perseverance rover, sorts a list of candidate paths for the rover to traverse, then uses the Approximate Clearance Evaluation (ACE) algorithm to evaluate whether the most highly ranked paths are safe. ACE is crucial for maintaining the safety of the rover, but is computationally expensive. If the most promising candidates in the list of paths are all found to be infeasible, ENav must continue to search the list and run time-consuming ACE evaluations until a feasible path is found. In this paper, we present two heuristics that, given a terrain heightmap around the rover, produce cost estimates that more effectively rank the candidate paths before ACE evaluation. The first heuristic uses Sobel operators and convolution to incorporate the cost of traversing high-gradient terrain. The second heuristic uses a machine learning (ML) model to predict areas that will be deemed untraversable by ACE. We used physics simulations to collect training data for the ML model and to run Monte Carlo trials to quantify navigation performance across a variety of terrains with various slopes and rock distributions. Compared to ENav's baseline performance, integrating the heuristics can lead to a significant reduction in ACE evaluations and average computation time per planning cycle, increase path efficiency, and maintain or improve the rate of successful traverses. This strategy of targeting specific bottlenecks with ML while maintaining the original ACE safety checks provides an example of how ML can be infused into planetary science missions and other safety-critical software.
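The first heuristic, a Sobel-based gradient cost used to rank candidate paths before the expensive safety check, can be sketched as follows. This is a minimal illustration and not ENav's implementation: the heightmap layout, the way a path's cost is summed over its cells, and the function names `gradient_cost` and `rank_paths` are all assumptions.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def gradient_cost(heightmap):
    """Return a per-cell cost equal to the Sobel gradient magnitude.

    The kernels are applied without flipping (cross-correlation); for Sobel
    kernels this only changes the sign of each component, not the magnitude.
    Border cells are left at zero cost for simplicity (an assumption).
    """
    rows, cols = len(heightmap), len(heightmap[0])
    cost = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0.0
            for i in range(3):
                for j in range(3):
                    h = heightmap[r + i - 1][c + j - 1]
                    gx += SOBEL_X[i][j] * h
                    gy += SOBEL_Y[i][j] * h
            cost[r][c] = math.hypot(gx, gy)
    return cost


def rank_paths(paths, cost):
    """Sort candidate paths (lists of (row, col) cells) by summed gradient cost."""
    return sorted(paths, key=lambda path: sum(cost[r][c] for r, c in path))


# Hypothetical 5x5 terrain: flat on the left, a one-unit step up on the right.
heightmap = [[0, 0, 0, 1, 1] for _ in range(5)]
cost = gradient_cost(heightmap)
flat_path = [(1, 1), (2, 1), (3, 1)]
step_path = [(1, 2), (2, 2), (3, 2)]
ranked = rank_paths([step_path, flat_path], cost)
```

Here `ranked[0]` is `flat_path`, so a downstream safety check such as ACE would be run on the low-gradient candidate first.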
Using Machine Learning for Decreasing State Uncertainty in Planning
Krivic, Senka (King's College London) | Cashmore, Michael | Magazzeni, Daniele | Szedmak, Sandor | Piater, Justus
We present a novel approach for decreasing state uncertainty in planning prior to solving the planning problem. This is done by making predictions about the state based on currently known information, using machine learning techniques. For domains where uncertainty is high, we define an active learning process for identifying which information, once sensed, will best improve the accuracy of predictions. We demonstrate that an agent is able to solve problems with uncertainties in the state with less planning effort compared to standard planning techniques. Moreover, agents can solve problems for which they could not find valid plans without using predictions. Experimental results also demonstrate that using our active learning process for identifying information to be sensed leads to gathering information that improves the prediction process.
Thermal Prediction for Efficient Energy Management of Clouds using Machine Learning
Ilager, Shashikant, Ramamohanarao, Kotagiri, Buyya, Rajkumar
Thermal management in hyper-scale cloud data centers is a critical problem. Increased host temperature creates hotspots, which significantly increase cooling cost and affect reliability. Accurate prediction of host temperature is crucial for managing the resources effectively. Temperature estimation is a non-trivial problem due to thermal variations in the data center. Existing solutions for temperature estimation are inefficient due to their computational complexity and lack of accurate prediction. However, data-driven machine learning methods are a promising approach to temperature prediction. In this regard, we collect and study data from a private cloud and show the presence of thermal variations. We investigate several machine learning models to accurately predict the host temperature. Specifically, we propose a gradient boosting machine learning model for temperature prediction. The experimental results show that our model accurately predicts the temperature with an average RMSE value of 0.05 or an average prediction error of 2.38 degrees Celsius, which is 6 degrees Celsius less than that of an existing theoretical model. In addition, we propose a dynamic scheduling algorithm to minimize the peak temperature of hosts. The results show that our algorithm reduces the peak temperature by 6.5 degrees Celsius and consumes 34.5% less energy than the baseline algorithm.
Deep Reactive Planning in Dynamic Environments
Ota, Kei, Jha, Devesh K., Onishi, Tadashi, Kanezaki, Asako, Yoshiyasu, Yusuke, Sasaki, Yoko, Mariyama, Toshisada, Nikovski, Daniel
The main novelty of the proposed approach is that it allows a robot to learn an end-to-end policy which can adapt to changes in the environment during execution. While goal conditioning of policies has been studied in the RL literature, such approaches are not easily extended to cases where the robot's goal can change during execution. This is something that humans are naturally able to do. However, it is difficult for robots to learn such reflexes (i.e., to naturally respond to dynamic environments), especially when the goal location is not explicitly provided to the robot, and instead needs to be perceived through a vision sensor. In the current work, we present a method that can achieve such behavior by combining traditional kinematic planning, deep learning, and deep reinforcement learning in a synergistic fashion to generalize to arbitrary environments. We demonstrate the proposed approach for several reaching and pick-and-place tasks in simulation, as well as on a real system of a 6-DoF industrial manipulator. A video describing our work can be found at https://youtu.be/hE-Ew59GRPQ.
The 7 Day +1 Supercharge Your Life Challenge. Goal Setting
Many people want to change but they don't know how, or think it is not possible. However, I am here to tell you that you can change your life within 8 days. I know this may sound unbelievable but it is true! All it takes is being able to identify what is holding you back, create goals, have strength and a desire to keep on going. Richard Butler is going to guide you through the process of making significant changes in your life. Are you ready to start a whole new, successful life in the next 8 days?
Adaptive Stress Testing of Trajectory Predictions in Flight Management Systems
Moss, Robert J., Lee, Ritchie, Visser, Nicholas, Hochwarth, Joachim, Lopez, James G., Kochenderfer, Mykel J.
To find failure events and their likelihoods in flight-critical systems, we investigate the use of an advanced black-box stress testing approach called adaptive stress testing. We analyze a trajectory predictor from a developmental commercial flight management system which takes as input a collection of lateral waypoints and en-route environmental conditions. Our aim is to search for failure events relating to inconsistencies in the predicted lateral trajectories. The intention of this work is to find likely failures and report them back to the developers so they can address and potentially resolve shortcomings of the system before deployment. To improve search performance, this work extends the adaptive stress testing formulation to be applied more generally to sequential decision-making problems with episodic reward by collecting the state transitions during the search and evaluating at the end of the simulated rollout. We use a modified Monte Carlo tree search algorithm with progressive widening as our adversarial reinforcement learner. The performance is compared to direct Monte Carlo simulations and to the cross-entropy method as an alternative importance sampling baseline. The goal is to find potential problems otherwise not found by traditional requirements-based testing. Results indicate that our adaptive stress testing approach finds more failures and finds failures with higher likelihood relative to the baseline approaches.
Maximizing Store Revenues using Tabu Search for Floor Space Optimization
Xu, Jiefeng, Gul, Evren, Lim, Alvin
Floor space optimization is a critical revenue management problem commonly encountered by retailers. It maximizes store revenue by optimally allocating floor space to product categories which are assigned to their most appropriate planograms. We formulate the problem as a connected multi-choice knapsack problem with an additional global constraint and propose a tabu search based meta-heuristic that exploits the multiple special neighborhood structures. We also incorporate a mechanism to determine how to combine the multiple neighborhood moves. A candidate list strategy based on learning from prior search history is also employed to improve the search quality. The results of computational testing with a set of test problems show that our tabu search heuristic can solve all problems within a reasonable amount of time. Analyses of individual contributions of relevant components of the algorithm were conducted with computational experiments.
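The core idea of a tabu search over a multi-choice knapsack can be sketched in a few lines. This is a simplified illustration, not the authors' algorithm: it uses a single "change one category's planogram" neighborhood, omits the connectivity and global constraints, the combination of multiple neighborhoods, and the learned candidate list, and the instance data and the name `tabu_search` are hypothetical.

```python
def tabu_search(options, capacity, iterations=100, tenure=3):
    """Pick one (space, revenue) option per category to maximize revenue.

    options[i] is the list of candidate planograms for category i.
    Assumes the smallest-footprint choice for every category fits in capacity.
    """
    def space(sol):
        return sum(options[i][k][0] for i, k in enumerate(sol))

    def revenue(sol):
        return sum(options[i][k][1] for i, k in enumerate(sol))

    # Start from the smallest-footprint option in every category.
    current = [min(range(len(o)), key=lambda k: o[k][0]) for o in options]
    if space(current) > capacity:
        raise ValueError("no feasible starting solution")
    best, best_rev = current[:], revenue(current)
    tabu = {}  # (category, choice) -> last iteration at which the choice is forbidden

    for it in range(iterations):
        candidates = []
        for i, opts in enumerate(options):
            for k in range(len(opts)):
                if k == current[i]:
                    continue
                neighbor = current[:]
                neighbor[i] = k
                if space(neighbor) > capacity:
                    continue
                rev = revenue(neighbor)
                is_tabu = tabu.get((i, k), -1) >= it
                # Aspiration: a tabu move is allowed only if it beats the best so far.
                if is_tabu and rev <= best_rev:
                    continue
                candidates.append((rev, i, k))
        if not candidates:
            break
        # Take the best admissible move, even if it lowers revenue.
        rev, i, k = max(candidates)
        tabu[(i, current[i])] = it + tenure  # forbid undoing the move for a while
        current[i] = k
        if rev > best_rev:
            best, best_rev = current[:], rev
    return best, best_rev


# Hypothetical instance: three categories, floor capacity of 7 units.
options = [
    [(1, 2), (2, 5), (3, 6)],
    [(1, 3), (2, 4), (4, 9)],
    [(1, 1), (3, 7)],
]
best, best_rev = tabu_search(options, capacity=7)
```

Accepting the best admissible move even when it reduces revenue, while forbidding the reverse move for `tenure` iterations, is what lets the search leave local optima instead of cycling back into them.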
Domain-independent generation and classification of behavior traces
Borrajo, Daniel, Veloso, Manuela
Financial institutions mostly deal with people. Therefore, characterizing different kinds of human behavior can greatly help institutions improve their relations with customers and with regulatory offices. In many such interactions, humans have some internal goals, and execute some actions within the financial system that lead them to achieve their goals. In this paper, we tackle these tasks as a behavior-traces classification task. An observer agent tries to learn to characterize other agents by observing their behavior when taking actions in a given environment. The other agents can be of several types and the goal of the observer is to identify the type of the other agent given a trace of observations. We present CABBOT, a learning technique that allows the agent to perform on-line classification of the type of planning agent whose behavior it is observing. In this work, the observer agent has partial and noisy observability of the environment (state and actions of the other agents). In order to evaluate the performance of the learning technique, we have generated a domain-independent goal-based simulator of agents. We present experiments in several (both financial and non-financial) domains with promising results.
Rearrangement: A Challenge for Embodied AI
Batra, Dhruv, Chang, Angel X., Chernova, Sonia, Davison, Andrew J., Deng, Jia, Koltun, Vladlen, Levine, Sergey, Malik, Jitendra, Mordatch, Igor, Mottaghi, Roozbeh, Savva, Manolis, Su, Hao
We describe a framework for research and evaluation in Embodied AI. Our proposal is based on a canonical task: Rearrangement. A standard task can focus the development of new techniques and serve as a source of trained models that can be transferred to other settings. In the rearrangement task, the goal is to bring a given physical environment into a specified state. The goal state can be specified by object poses, by images, by a description in language, or by letting the agent experience the environment in the goal state. We characterize rearrangement scenarios along different axes and describe metrics for benchmarking rearrangement performance. To facilitate research and exploration, we present experimental testbeds of rearrangement scenarios in four different simulation environments. We anticipate that other datasets will be released and new simulation platforms will be built to support training of rearrangement agents and their deployment on physical systems.