Reinforcement Learning
Batch-Augmented Multi-Agent Reinforcement Learning for Efficient Traffic Signal Optimization
Wu, Yueh-Hua, Yeh, I-Hau, Hu, David, Liao, Hong-Yuan Mark
The goal of this work is to provide a viable reinforcement learning-based solution for traffic signal control problems. Although state-of-the-art reinforcement learning approaches have achieved great success in a variety of domains, directly applying them to alleviate traffic congestion can be challenging, given the requirement of high sample efficiency and the way training data is gathered. In this work, we address several challenges that we encountered when attempting to mitigate serious traffic congestion in a metropolitan area. Specifically, we are required to provide a solution that is able to (1) handle traffic signal control when certain surveillance cameras that retrieve information for reinforcement learning are down, (2) learn from batch data without a traffic simulator, and (3) make control decisions without shared information across intersections. We present a two-stage framework to deal with these situations. The framework consists of an Evolution Strategies approach that produces a fixed-time traffic signal control schedule and a multi-agent off-policy reinforcement learning method that is capable of learning from batch data with the aid of three proposed components: bounded action, batch augmentation, and surrogate reward clipping. Our experiments show that the proposed framework reduces traffic congestion by 36% in terms of waiting time compared with the currently used fixed-time traffic signal plan. Furthermore, the framework requires only 600 queries to a simulator to achieve this result.
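The first stage uses Evolution Strategies to search for a fixed-time plan. As a rough illustration only, the sketch below runs a generic antithetic-sampling ES loop over per-phase green times. `toy_waiting_time` is a hypothetical quadratic stand-in for one traffic simulator query, and the three phases, step sizes, and population size are made-up values, not the authors' settings. The query counter shows how such a search could be budgeted against simulator calls.

```python
import random


def toy_waiting_time(green_times):
    """Hypothetical stand-in for one simulator query: a quadratic penalty
    around made-up 'ideal' green times (seconds) for three signal phases."""
    ideal = [30.0, 20.0, 25.0]
    return sum((g - t) ** 2 for g, t in zip(green_times, ideal))


def es_optimize(cost, x0, sigma=0.5, lr=0.1, pop=10, iters=30, seed=0):
    """Minimize `cost` with antithetic-sampling Evolution Strategies.

    Each iteration evaluates `pop` perturbed plans (pop // 2 mirrored pairs)
    and steps along the estimated gradient of the Gaussian-smoothed cost.
    Returns the final plan and the number of cost queries used.
    """
    rng = random.Random(seed)
    x = list(x0)
    pairs = pop // 2
    queries = 0
    for _ in range(iters):
        grad = [0.0] * len(x)
        for _ in range(pairs):
            eps = [rng.gauss(0.0, 1.0) for _ in x]
            f_plus = cost([xi + sigma * e for xi, e in zip(x, eps)])
            f_minus = cost([xi - sigma * e for xi, e in zip(x, eps)])
            queries += 2
            for i, e in enumerate(eps):
                grad[i] += (f_plus - f_minus) * e
        # Antithetic gradient estimate: accumulated sum / (2 * sigma * pairs).
        x = [xi - lr * g / (2.0 * sigma * pairs) for xi, g in zip(x, grad)]
    return x, queries


plan, used = es_optimize(toy_waiting_time, [25.0, 25.0, 25.0])
print([round(g, 1) for g in plan], used)
```

Mirrored perturbations cancel the even-order terms of the cost, which lowers the variance of the gradient estimate for the same number of queries; with a real simulator, each `cost` call would be one (expensive) simulation run.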
Learning To Navigate The Synthetically Accessible Chemical Space Using Reinforcement Learning
Gottipati, Sai Krishna, Sattarov, Boris, Niu, Sufeng, Pathak, Yashaswi, Wei, Haoran, Liu, Shengchao, Thomas, Karam M. J., Blackburn, Simon, Coley, Connor W., Tang, Jian, Chandar, Sarath, Bengio, Yoshua
Over the last decade, there has been significant progress in the field of machine learning for de novo drug design, particularly in deep generative models. However, current generative approaches face a significant limitation: they neither ensure that the proposed molecular structures can be feasibly synthesized nor provide synthesis routes for the proposed small molecules, which seriously limits their practical applicability. In this work, we propose a novel forward synthesis framework powered by reinforcement learning (RL) for de novo drug design, Policy Gradient for Forward Synthesis (PGFS), that addresses this limitation by embedding the concept of synthetic accessibility directly into the de novo drug design system. In this setup, the agent learns to navigate through the immense synthetically accessible chemical space by subjecting commercially available small molecule building blocks to valid chemical reactions at every time step of the iterative virtual multi-step synthesis process. The proposed environment for drug discovery provides a highly challenging test-bed for RL algorithms owing to the large state space and high-dimensional continuous action space with hierarchical actions. PGFS achieves state-of-the-art performance in generating structures with high QED and penalized clogP. Moreover, we validate PGFS in an in-silico proof-of-concept associated with three HIV targets. Finally, we describe how the end-to-end training conceptualized in this study represents an important paradigm for radically expanding the synthesizable chemical space and automating the drug discovery process.
Artificial Intelligence for Business
Udemy course: Solve Real World Business Problems with AI Solutions. Created by Hadelin de Ponteves, Kirill Eremenko, and the SuperDataScience Team. Structure of the course: Part 1 - Optimizing Business Processes (Case Study: Optimizing the Flows in an E-Commerce Warehouse; AI Solution: Q-Learning). Part 2 - Minimizing Costs (Case Study: Minimizing the Costs in Energy Consumption of a Data Center; AI Solution: Deep Q-Learning). Part 3 - Maximizing Revenues (Case Study: Maximizing Revenue of an Online Retail Business; AI Solution: Thompson Sampling). Real World Business Applications: With Artificial Intelligence, you can do three main things for any business: optimize business processes, minimize costs, and maximize revenues. We will show you exactly how to succeed in these applications through real-world business case studies, and for each application we will build a separate AI to solve the challenge. In Part 1 - Optimizing Processes, we will build an AI that optimizes the flows in an e-commerce warehouse. In Part 2 - Minimizing Costs, we will build a more advanced AI that reduces the energy consumption costs of a data center by more than 50%, just as Google did last year thanks to DeepMind.
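Part 3 of the course pairs revenue maximization with Thompson Sampling. Below is a minimal sketch of Bernoulli Thompson Sampling, assuming each "arm" is an offer or ad variant with an unknown purchase probability. The conversion rates are invented and only drive a simulated customer; this is not code from the course.

```python
import random


def thompson_sampling(conversion_rates, rounds=2000, seed=0):
    """Bernoulli Thompson Sampling over offer variants ("arms").

    `conversion_rates` are hypothetical true purchase probabilities used only
    by the simulated customer below; arm selection relies solely on the
    observed success/failure counts. Returns per-arm (successes, failures).
    """
    rng = random.Random(seed)
    k = len(conversion_rates)
    successes = [0] * k
    failures = [0] * k
    for _ in range(rounds):
        # Sample a plausible rate for each arm from its Beta(1 + s, 1 + f) posterior.
        draws = [rng.betavariate(1 + successes[i], 1 + failures[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: draws[i])
        # Simulated customer: converts with the chosen arm's hidden probability.
        if rng.random() < conversion_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


s, f = thompson_sampling([0.05, 0.20, 0.10])
pulls = [a + b for a, b in zip(s, f)]
print(pulls)
```

Because each round acts on a posterior sample rather than the current best estimate, weaker arms still get occasional trials early on, and play concentrates on the best-converting variant as evidence accumulates.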
Dampen the Stop-and-Go Traffic with Connected and Automated Vehicles -- A Deep Reinforcement Learning Approach
Jiang, Liming, Xie, Yuanchang, Chen, Danjue, Li, Tienan, Evans, Nicholas G.
Stop-and-go traffic poses many challenges to transportation systems, but its formation and mechanism are still being explored. However, it has been demonstrated that introducing Connected and Automated Vehicles (CAVs) with carefully designed controllers can dampen stop-and-go waves in a vehicle fleet. Instead of using an analytical model, this study adopts reinforcement learning to control the behavior of a CAV and places a single CAV in the second position of a vehicle fleet, with the purpose of dampening the speed oscillation from the fleet leader and helping the following human drivers adopt smoother driving behavior. The results show that our controller decreases the speed oscillation of the CAV by 54% and that of the following human-driven vehicles by 8%-28%. Significant fuel consumption savings are also observed. Additionally, the results suggest that CAVs may act as traffic stabilizers if they choose to behave slightly altruistically.
AI and Machine Learning for Healthcare - KDnuggets
The 21st century is only two decades old, and it is certain that one of the biggest transformative technologies and enablers for human society in this century is going to be artificial intelligence (AI). It is a well-established idea that AI and associated services and platforms are set to transform global productivity, working patterns, and lifestyles, and to create enormous wealth. For example, McKinsey sees it delivering global economic activity of around $13 trillion by 2030. In the short term, research firm Gartner expects global AI-based economic activity to increase from about $1.2 trillion in 2018 to about $3.9 trillion by 2022. It is no secret that this transformation is being fueled, to a large extent, by powerful Machine Learning (ML) tools and techniques such as Deep Convolutional Networks, Generative Adversarial Networks (GANs), Gradient-Boosted Tree models (GBMs), and Deep Reinforcement Learning (DRL). However, traditional business and technology sectors are not the only fields being impacted by AI.
Discovering Hierarchies for Reinforcement Learning Using Data Mining
Mobley, Dave (University of Kentucky) | Goldsmith, Judy (University of Kentucky) | Harrison, Brent (University of Kentucky)
A limitation of Reinforcement Learning is that problems quickly become too large to solve. Dividing a problem into a hierarchy of subtasks enables a divide-and-conquer strategy, which is why Hierarchical Reinforcement Learning (HRL) algorithms often find solutions faster than more naive approaches. One of the biggest challenges in HRL is constructing the hierarchy used by the algorithm; hierarchies are often designed by hand, using a person's own knowledge of the problem. We propose a method for automatically discovering task hierarchies based on a data mining technique, Association Rule Learning (ARL). These hierarchies can then be applied to Semi-Markov Decision Process (SMDP) problems using the options technique.
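The abstract names Association Rule Learning as the mining step. The sketch below shows only the core support/confidence computation for single-item rules, assuming (hypothetically) that each episode is logged as the set of subgoals it achieved. The item names, thresholds, and that encoding are illustrative; the paper's own mapping from rules to options is not described here.

```python
from itertools import combinations


def pairwise_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Mine single-item association rules lhs -> rhs from a list of sets.

    support(X)       = fraction of transactions containing every item in X
    confidence(A->B) = support({A, B}) / support({A})
    Returns (lhs, rhs, support, confidence) tuples.
    """
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue  # skip pairs that co-occur too rarely to yield a rule
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs})
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


# Hypothetical episodes, each logged as the set of subgoals the agent achieved.
episodes = [
    {"get_key", "open_door", "reach_goal"},
    {"get_key", "open_door"},
    {"get_key", "open_door", "reach_goal"},
    {"get_key"},
]
for rule in pairwise_rules(episodes):
    print(rule)
```

On these episodes the sketch yields `open_door -> get_key`, `reach_goal -> get_key`, and `reach_goal -> open_door`, each with confidence 1.0. A rule such as `open_door -> get_key` hints that getting the key could sit beneath opening the door as a subtask, but turning such rules into an options hierarchy is the paper's contribution and is not reproduced here.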
Lifelong Control of Off-grid Microgrid with Model Based Reinforcement Learning
Totaro, Simone, Boukas, Ioannis, Jonsson, Anders, Cornélusse, Bertrand
The lifelong control problem of an off-grid microgrid is composed of two tasks, namely estimating the condition of the microgrid devices and operational planning that accounts for uncertainties by forecasting future consumption and renewable production. The main challenge for effective control arises from the various changes that take place over time. In this paper, we present an open-source reinforcement learning framework for modeling an off-grid microgrid for rural electrification. The lifelong control problem of an isolated microgrid is formulated as a Markov Decision Process (MDP). We categorize the changes that can occur into progressive and abrupt changes. We propose a novel model-based reinforcement learning algorithm that is able to address both types of changes. In particular, the proposed algorithm demonstrates generalisation properties, transfer capabilities, and better robustness in the case of fast-changing system dynamics. The proposed algorithm is compared against a rule-based policy and a model predictive controller with look-ahead. The results show that the trained agent outperforms both benchmarks in the lifelong setting, where the system dynamics change over time.
Model-Augmented Actor-Critic: Backpropagating through Paths
Clavera, Ignasi, Fu, Violet, Abbeel, Pieter
Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator to augment the data for policy optimization or value function learning. In this paper, we show how to make more effective use of the model by exploiting its differentiability. We construct a policy optimization algorithm that uses the pathwise derivative of the learned model and policy across future timesteps. Instabilities of learning across many timesteps are prevented by using a terminal value function, learning the policy in an actor-critic fashion. Furthermore, we present a derivation of the monotonic improvement of our objective in terms of the gradient error in the model and value function. We show that our approach (i) is consistently more sample efficient than existing state-of-the-art model-based algorithms, (ii) matches the asymptotic performance of model-free algorithms, and (iii) scales to long horizons, a regime where past model-based approaches have typically struggled.
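The key idea is to differentiate the H-step return through the model and policy, then close the horizon with a terminal value function. The sketch below computes that objective and its pathwise derivative for a toy 1-D problem; the linear model, quadratic reward, and quadratic terminal value are assumed stand-ins for learned networks, not the paper's setup. With a single scalar policy parameter, forward-mode accumulation of the chain rule gives the same derivative that backpropagation through the path would.

```python
def path_objective_and_grad(k, s0, a=0.9, b=0.5, p=1.0, gamma=0.99, horizon=5):
    """H-step objective J(k) = sum_t gamma^t r(s_t, u_t) + gamma^H V(s_H)
    and its pathwise derivative dJ/dk for a 1-D linear policy u = k * s.

    Toy assumptions (hypothetical, for illustration only):
      model  s' = a * s + b * u   (stands in for a learned differentiable model)
      reward r  = -(s**2 + u**2)
      value  V  = -p * s**2       (stands in for a learned terminal critic)
    """
    s, ds = s0, 0.0  # state and ds/dk (the initial state does not depend on k)
    J, dJ = 0.0, 0.0
    discount = 1.0
    for _ in range(horizon):
        u = k * s
        du = s + k * ds  # product rule through the policy
        J += discount * -(s * s + u * u)
        dJ += discount * -(2 * s * ds + 2 * u * du)
        # Chain rule through the model: the derivative flows along the trajectory.
        s, ds = a * s + b * u, a * ds + b * du
        discount *= gamma
    J += discount * -p * s * s  # terminal value bootstraps beyond the horizon
    dJ += discount * -2 * p * s * ds
    return J, dJ


# Plain gradient ascent on the policy parameter using the pathwise gradient.
k = 0.0
for _ in range(100):
    _, g = path_objective_and_grad(k, s0=1.0)
    k += 0.05 * g
print(k, path_objective_and_grad(k, s0=1.0)[0])
```

A central finite difference of `J` should agree with `dJ` to high precision, which is a quick way to check a hand-written pathwise derivative before replacing the toy pieces with learned models.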
Mutual Information Maximization for Robust Plannable Representations
Ding, Yiming, Clavera, Ignasi, Abbeel, Pieter
Extending the capabilities of robotics to complex, unstructured real-world environments requires developing better perception systems while maintaining low sample complexity. When dealing with high-dimensional state spaces, current methods are either model-free or model-based with reconstruction objectives. The sample inefficiency of the former constitutes a major barrier to applying them in the real world. The latter, while having low sample complexity, learn latent spaces that must reconstruct every single detail of the scene. In real environments, the task typically occupies only a small fraction of the scene, and reconstruction objectives suffer in such scenarios because they capture all the unnecessary components. In this work, we present MIRO, an information-theoretic representation learning algorithm for model-based reinforcement learning. We design a latent space that maximizes the mutual information with future information while capturing all the information needed for planning. We show that our approach is more robust than reconstruction objectives in the presence of distractors and cluttered scenes.
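The abstract does not state which mutual information estimator MIRO uses. As an assumption, the sketch below shows InfoNCE, a widely used contrastive lower bound on mutual information (popularized by contrastive predictive coding), scoring current-latent/future-latent pairs by dot product with in-batch negatives. The embeddings are toy vectors, not outputs of a trained encoder.

```python
import math


def info_nce(anchors, positives):
    """InfoNCE loss for a batch of paired embeddings.

    Row i of `positives` (e.g. an encoding of a future observation) is the
    match for row i of `anchors` (e.g. the current latent state); the other
    rows in the batch act as negatives. Scores are plain dot products.
    Returns (loss, mi_lower_bound) with mi_lower_bound = log(N) - loss.
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        scores = [sum(x * y for x, y in zip(anchors[i], positives[j])) for j in range(n)]
        m = max(scores)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_sum_exp - scores[i]  # -log softmax probability of the true pair
    loss = total / n
    return loss, math.log(n) - loss


aligned = info_nce([[5.0, 0.0], [0.0, 5.0]], [[5.0, 0.0], [0.0, 5.0]])
uninformative = info_nce([[0.0, 0.0]] * 3, [[0.0, 0.0]] * 3)
print(aligned, uninformative)
```

When every pair is indistinguishable the loss equals log N and the bound is zero; when each anchor clearly matches its own future encoding the loss approaches zero and the bound approaches its ceiling of log N, which is why larger batches allow tighter estimates. Unlike a reconstruction loss, nothing here asks the encoder to reproduce pixels, only to make matching pairs score higher than mismatched ones.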
Data Driven Aircraft Trajectory Prediction with Deep Imitation Learning
Bastas, Alevizos, Kravaris, Theocharis, Vouros, George A.
The current Air Traffic Management (ATM) system worldwide has reached its limits in terms of predictability, efficiency, and cost effectiveness. Different initiatives worldwide propose trajectory-oriented transformations that require high-fidelity aircraft trajectory planning and prediction capabilities, efficiently supporting the trajectory life cycle at all stages. Recently proposed data-driven trajectory prediction approaches provide promising results. In this paper, we approach the data-driven trajectory prediction problem as an imitation learning task, where we aim to imitate experts "shaping" the trajectory. Towards this goal, we present a comprehensive framework that places the state-of-the-art Generative Adversarial Imitation Learning (GAIL) method in a pipeline with trajectory clustering and classification methods. Compared to state-of-the-art approaches, this approach can provide accurate predictions for the whole trajectory (i.e., with a prediction horizon until reaching the destination) at both the pre-tactical (i.e., starting at the departure airport at a specific time instant) and the tactical (i.e., from any state while flying) stages.
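In GAIL, a discriminator learns to tell expert state-action pairs from the policy's, and its output becomes the policy's reward. The sketch below covers only that discriminator/reward step, using logistic regression instead of a neural network and two made-up normalized features per pair (a hypothetical encoding; the paper's trajectory features are not given here). The policy update (TRPO in the original GAIL formulation) is omitted, and reward sign conventions differ between implementations.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_discriminator(expert, policy, epochs=200, lr=0.1, seed=0):
    """Logistic-regression discriminator D(x) = P(x comes from the expert),
    trained by SGD on binary cross-entropy. Returns (weights, bias)."""
    rng = random.Random(seed)
    w = [0.0] * len(expert[0])
    bias = 0.0
    data = [(x, 1.0) for x in expert] + [(x, 0.0) for x in policy]
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            d = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + bias)
            g = d - y  # gradient of binary cross-entropy w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            bias -= lr * g
    return w, bias


def gail_reward(x, w, bias):
    """Surrogate reward -log(1 - D(x)): larger when the pair looks expert-like."""
    d = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + bias)
    return -math.log(max(1.0 - d, 1e-12))


# Hypothetical normalized (state, action) features, e.g. climb rate and heading change.
expert_pairs = [[1.0, 0.9], [0.8, 1.1], [1.2, 1.0]]
policy_pairs = [[-1.0, -0.8], [-0.9, -1.2], [-1.1, -1.0]]
w, bias = train_discriminator(expert_pairs, policy_pairs)
print(gail_reward([1.0, 1.0], w, bias), gail_reward([-1.0, -1.0], w, bias))
```

In the full adversarial loop the discriminator is retrained as the policy improves, so the reward keeps pushing generated trajectories toward the expert distribution rather than toward a fixed target.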