Reinforcement Learning
Machine Learning Tutorial Part 1 Machine Learning For Beginners
This Machine Learning tutorial will introduce you to the different areas of Machine Learning and Artificial Intelligence. In this part of the course you will learn about the three different learning types (Unsupervised Learning, Supervised Learning and Reinforcement Learning). For more see: https://www.Vinsloev.com Remember to subscribe to the channel to see the upcoming parts of this tutorial as well.
Evaluating Generalisation in General Video Game Playing
Balla, Martin, Lucas, Simon M., Perez-Liebana, Diego
The General Video Game Artificial Intelligence (GVGAI) competition has been running for several years with various tracks. This paper focuses on the challenge of the GVGAI learning track in which 3 games are selected and 2 levels are given for training, while 3 hidden levels are left for evaluation. This setup poses a difficult challenge for current Reinforcement Learning (RL) algorithms, as they typically require much more data. This work investigates 3 versions of the Advantage Actor-Critic (A2C) algorithm trained on a maximum of 2 levels from the available 5 from the GVGAI framework and compares their performance on all levels. The selected subset of games has different characteristics, such as stochasticity, reward distribution and objectives. We found that stochasticity improves generalisation, but too much can cause the algorithms to fail to learn the training levels. The quality of the training levels also matters: different sets of training levels can boost generalisation over all levels. In the GVGAI competition agents are scored based on their win rates and then their scores achieved in the games. We found that solely using the rewards provided by the game might not encourage winning.
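As a rough illustration of the quantity that gives Advantage Actor-Critic its name, the sketch below computes discounted returns and advantages for one short rollout. It is a minimal sketch, not the paper's implementation: the function names, the discount factor and the toy numbers are assumptions made for illustration.

```python
def discounted_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted returns for one rollout, bootstrapped from the critic's
    value estimate of the state that follows the last reward."""
    returns = []
    running = bootstrap_value
    # Walk backwards so each return reuses the one computed after it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Advantage = discounted return minus the critic's value estimate."""
    returns = discounted_returns(rewards, bootstrap_value, gamma)
    return [g - v for g, v in zip(returns, values)]


if __name__ == "__main__":
    # Hypothetical 3-step rollout that ends in a terminal state (bootstrap 0).
    print(advantages([1.0, 0.0, 2.0], [1.0, 1.0, 1.0], 0.0, gamma=0.5))
```

In an actor-critic update, a positive advantage raises the probability of the action taken and a negative one lowers it, while the critic is regressed towards the returns.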
Reinforcement learning with human advice. A survey
Najar, Anis, Chetouani, Mohamed
In this paper, we provide an overview of the existing methods for integrating human advice into a Reinforcement Learning process. We propose a taxonomy of different types of teaching signals, and present them according to three main aspects: how they can be provided to the learning agent, how they can be integrated into the learning process, and how they can be interpreted by the agent if their meaning is not determined beforehand. Finally, we compare the benefits and limitations of using each type of teaching signals, and propose a unified view of interactive learning methods.
Attention Routing: track-assignment detailed routing using attention-based reinforcement learning
Liao, Haiguang, Dong, Qingyi, Dong, Xuliang, Zhang, Wentai, Zhang, Wangyang, Qi, Weiyi, Fallon, Elias, Kara, Levent Burak
In the physical design of integrated circuits, global and detailed routing are critical stages involving the determination of the interconnected paths of each net on a circuit while satisfying the design constraints. Existing routers as well as routability predictors either have to resort to expensive approaches that lead to high computational times, or use heuristics that do not generalize well. Even though new, learning-based routing methods have been proposed to address this need, requirements on labelled data and difficulties in addressing complex design rule constraints have limited their adoption in advanced technology node physical design problems. In this work, we propose a new router: attention router, which is the first attempt to solve the track-assignment detailed routing problem using reinforcement learning. Complex design rule constraints are encoded into the routing algorithm and an attention-model-based REINFORCE algorithm is applied to solve the most critical step in routing: sequencing device pairs to be routed. The attention router and its baseline genetic router are applied to solve analog circuit problem sets from different commercial advanced technologies. The attention router demonstrates generalization ability to unseen problems and is also able to achieve more than 100 times acceleration over the genetic router without significantly compromising the routing solution quality. We also discover a similarity between the attention router and the baseline genetic router in terms of positive correlations in cost and routing patterns, which demonstrates the attention router's ability to be utilized not only as a detailed router but also as a predictor for routability and congestion.
Towards Automated Safety Coverage and Testing for Autonomous Vehicles with Reinforcement Learning
The kind of closed-loop verification likely to be required for autonomous vehicle (AV) safety testing is beyond the reach of traditional test methodologies and discrete verification. Validation puts the autonomous vehicle system to the test in scenarios or situations that the system would likely encounter in everyday driving after its release. These scenarios can either be controlled directly in a physical (closed-course proving ground) or virtual (simulation of predefined scenarios) environment, or they can arise spontaneously during operation in the real world (open-road testing or simulation of randomly generated scenarios). In AV testing, simulation serves primarily two purposes: to assist the development of a robust autonomous vehicle and to test and validate the AV before release. A challenge arises from the sheer number of scenario variations that can be constructed from each of the above sources due to the high number of variables involved (most of which are continuous). Even with continuous variables discretized, the possible number of combinations becomes practically infeasible to test. To overcome this challenge we propose using reinforcement learning (RL) to generate failure examples and unexpected traffic situations for the AV software implementation. Although reinforcement learning algorithms have achieved notable results in games and some robotic manipulations, this technique has not been widely scaled up to the more challenging real world applications like autonomous driving.
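The combinatorial growth described above can be made concrete with a little arithmetic. The sketch below is only an illustration: the number of variables, the number of bins per variable and the one-second cost per simulated scenario are hypothetical values, not figures from the paper.

```python
def scenario_count(num_variables, bins_per_variable):
    """Number of distinct scenarios when every variable is discretized
    into the same number of bins and all combinations are allowed."""
    return bins_per_variable ** num_variables


def exhaustive_test_years(num_variables, bins_per_variable, seconds_per_scenario=1.0):
    """Wall-clock years needed to run every scenario once, sequentially."""
    seconds = scenario_count(num_variables, bins_per_variable) * seconds_per_scenario
    return seconds / (365 * 24 * 3600)


if __name__ == "__main__":
    # Hypothetical setting: 10 scenario variables, 10 bins each.
    print(scenario_count(10, 10))
    print(round(exhaustive_test_years(10, 10), 1))
```

Under these assumed numbers, exhaustive testing already takes centuries of sequential simulation, which is the motivation for searching the scenario space with a learning agent instead of enumerating it.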
Physically realistic attacks on deep reinforcement learning
Deep reinforcement learning (RL) has achieved superhuman performance in problems ranging from data center cooling to video games. RL policies may soon be widely deployed, with research underway in autonomous driving, negotiation and automated trading. Many potential applications are safety-critical: automated trading failures caused Knight Capital to lose USD 460M, while faulty autonomous vehicles have resulted in loss of life. Consequently, it is critical that RL policies are robust: both to naturally occurring distribution shift, and to malicious attacks by adversaries. Unfortunately, we find that RL policies which perform at a high level in normal situations can harbor serious vulnerabilities which can be exploited by an adversary. Prior work has shown deep RL policies are vulnerable to small adversarial perturbations to their observations, similar to adversarial examples in image classifiers.
Distributed Resource Scheduling for Large-Scale MEC Systems: A Multi-Agent Ensemble Deep Reinforcement Learning with Imitation Acceleration
Jiang, Feibo, Dong, Li, Wang, Kezhi, Yang, Kun, Pan, Cunhua
We consider the optimization of distributed resource scheduling to minimize the sum of task latency and energy consumption for all the Internet of things devices (IoTDs) in a large-scale mobile edge computing (MEC) system. To address this problem, we propose a distributed intelligent resource scheduling (DIRS) framework, which includes centralized training relying on the global information and distributed decision making by each agent deployed in each MEC server. More specifically, we first introduce a novel multi-agent ensemble-assisted distributed deep reinforcement learning (DRL) architecture, which can simplify the overall neural network structure of each agent by partitioning the state space and also improve the performance of a single agent by combining decisions of all the agents. Secondly, we apply action refinement to enhance the exploration ability of the proposed DIRS framework, where the near-optimal state-action pairs are obtained by a novel Lévy flight search. Finally, an imitation acceleration scheme is presented to pre-train all the agents, which can significantly accelerate the learning process of the proposed framework through learning the professional experience from a small amount of demonstration data. Extensive simulations are conducted to demonstrate that the proposed DIRS framework is efficient and outperforms the existing benchmark schemes.
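The abstract does not spell out how its Lévy flight search works, so the following is only a generic sketch of the idea: heavy-tailed steps, drawn here with Mantegna's algorithm, perturb a candidate action so that most moves are small but occasional moves are large. The function names, the step scale and the action bounds are assumptions for illustration and are not taken from the paper.

```python
import math
import random


def mantegna_sigma(beta):
    """Scale of the numerator Gaussian in Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_step(beta=1.5, rng=random):
    """Draw one heavy-tailed step with stability index beta (0 < beta <= 2)."""
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # guard against division by zero
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def refine_action(action, low, high, scale=0.1, beta=1.5, rng=random):
    """Hypothetical refinement: perturb each action component with a Levy
    step and clip the result to the interval [low, high]."""
    return [min(high, max(low, a + scale * levy_step(beta, rng))) for a in action]


if __name__ == "__main__":
    rng = random.Random(0)
    print(refine_action([0.2, 0.5, 0.9], 0.0, 1.0, rng=rng))
```

A search built on such steps would typically evaluate each refined action and keep it only if it improves the objective; that acceptance rule is left out here because the abstract does not describe it.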
A clustering-based reinforcement learning approach for tailored personalization of e-Health interventions
Hassouni, Ali el, Hoogendoorn, Mark, van Otterlo, Martijn, Eiben, A. E., Muhonen, Vesa, Barbaro, Eduardo
Personalization is very powerful in improving the effectiveness of health interventions. Reinforcement learning (RL) algorithms are suitable for learning these tailored interventions from sequential data collected about individuals. However, learning can be very fragile. The time to learn intervention policies is limited, as user disengagement can occur quickly. Also, in e-Health, intervention timing can be crucial: an intervention must arrive before the optimal window passes. We present an approach that learns tailored personalization policies for groups of users by combining RL and clustering. The benefits are two-fold: speeding up the learning to prevent disengagement while maintaining a high level of personalization. Our clustering approach utilizes dynamic time warping to compare user trajectories consisting of states and rewards. We apply online and batch RL to learn policies over clusters of individuals and introduce our self-developed and publicly available simulator for e-Health interventions to evaluate our approach. We compare our methods with an e-Health intervention benchmark. We demonstrate that batch learning outperforms online learning for our setting. Furthermore, our proposed clustering approach for RL finds near-optimal clusterings which lead to significantly better policies in terms of cumulative reward compared to learning a policy per individual or learning one non-personalized policy across all individuals. Our findings also indicate that the learned policies accurately learn to send interventions at the right moments and that the users work out more and at the right times of the day.
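Dynamic time warping, which the abstract uses to compare user trajectories, can be sketched with the classic dynamic-programming recurrence. The version below is a minimal sketch that assumes scalar sequences and an absolute-difference cost; the trajectories in the paper consist of states and rewards, so a real comparison would need a distance over those richer elements.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences,
    using absolute difference as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a, step in b, step in both.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


if __name__ == "__main__":
    # The same shape at different speeds aligns with zero cost.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
    print(dtw_distance([0, 0], [1, 1]))
```

Pairwise distances computed this way can be passed to any clustering method that accepts a precomputed distance matrix, which is how users with similar but time-shifted behaviour end up in the same group.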
Decentralized Deep Reinforcement Learning for a Distributed and Adaptive Locomotion Controller of a Hexapod Robot
Schilling, Malte, Konen, Kai, Ohl, Frank W., Korthals, Timo
Locomotion is a prime example of adaptive behavior in animals, and biological control principles have inspired control architectures for legged robots. While machine learning has been successfully applied to many tasks in recent years, Deep Reinforcement Learning approaches still appear to struggle when applied to real-world robots in continuous control tasks and, in particular, do not appear to be robust solutions that can handle uncertainties well. Therefore, there is new interest in incorporating biological principles into such learning architectures. While inducing a hierarchical organization as found in motor control has already shown some success, we here propose a decentralized organization as found in insect motor control for the coordination of different legs. A decentralized and distributed architecture is introduced on a simulated hexapod robot and the details of the controller are learned through Deep Reinforcement Learning. We first show that such a concurrent local structure is able to learn better walking behavior. Second, we show that the simpler organization is learned faster than holistic approaches.