Reinforcement Learning
Learning to Play Soccer by Reinforcement and Applying Sim-to-Real to Compete in the Real World
Bassani, Hansenclever F., Delgado, Renie A., Junior, Jose Nilton de O. Lima, Medeiros, Heitor R., Braga, Pedro H. M., Tapp, Alain
This work presents an application of Reinforcement Learning (RL) for the complete control of real soccer robots of the IEEE Very Small Size Soccer (VSSS) [1], a traditional league in the Latin American Robotics Competition (LARC). In the VSSS league, two teams of three small robots play against each other. We propose a simulated environment in which continuous or discrete control policies can be trained, and a Sim-to-Real method to allow using the obtained policies to control a robot in the real world. The results show that the learned policies display a broad repertoire of behaviors which are difficult to specify by hand. This approach, called VSSS-RL, was able to beat the human-designed policy for the striker of the team ranked 3rd place in the 2018 LARC, in 1-vs-1 matches.
Putting artificial intelligence to work in the lab: Automated scanning probe microscopy controlled by artificial intelligence/machine learning
The new system, dubbed DeepSPM, bridges the gap between nanoscience, automation and artificial intelligence (AI), and firmly establishes the use of machine learning for experimental scientific research. "Optimising SPM data acquisition can be very tedious. This optimisation process is usually performed by the human experimentalist, and is rarely reported," says FLEET Chief Investigator Dr Agustin Schiffrin (Monash University). "Our new AI-driven system can operate and acquire optimal SPM data autonomously, for multiple straight days, and without any human supervision." The advance brings advanced SPM methodologies such as atomically-precise nanofabrication and high-throughput data acquisition closer to a fully automated turnkey application.
Deep Reinforcement Learning with Smooth Policy
Shen, Qianli, Li, Yan, Jiang, Haoming, Wang, Zhaoran, Zhao, Tuo
Deep neural networks have been widely adopted in modern reinforcement learning (RL) algorithms with great empirical successes in various domains. However, the large search space of training a neural network requires a significant amount of data, which makes the current RL algorithms not sample efficient. Motivated by the fact that many environments with continuous state space have smooth transitions, we propose to learn a smooth policy that behaves smoothly with respect to states. In contrast to policies parameterized by linear/reproducing kernel functions, where simple regularization techniques suffice to control smoothness, for neural network based reinforcement learning algorithms, there is no readily available solution to learn a smooth policy. In this paper, we develop a new training framework --- $\textbf{S}$mooth $\textbf{R}$egularized $\textbf{R}$einforcement $\textbf{L}$earning ($\textbf{SR}^2\textbf{L}$), where the policy is trained with smoothness-inducing regularization. Such regularization effectively constrains the search space of the learning algorithms and enforces smoothness in the learned policy. We apply the proposed framework to both on-policy (TRPO) and off-policy algorithm (DDPG). Through extensive experiments, we demonstrate that our method achieves improved sample efficiency.
Multi-Agent Reinforcement Learning for Problems with Combined Individual and Team Reward
Sheikh, Hassam Ullah, Bรถlรถni, Ladislau
Many cooperative multi-agent problems require agents to learn individual tasks while contributing to the collective success of the group. This is a challenging task for current state-of-the-art multi-agent reinforcement algorithms that are designed to either maximize the global reward of the team or the individual local rewards. The problem is exacerbated when either of the rewards is sparse leading to unstable learning. To address this problem, we present Decomposed Multi-Agent Deep Deterministic Policy Gradient (DE-MADDPG): a novel cooperative multi-agent reinforcement learning framework that simultaneously learns to maximize the global and local rewards. We evaluate our solution on the challenging defensive escort team problem and show that our solution achieves a significantly better and more stable performance than the direct adaptation of the MADDPG algorithm.
Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning
Long, Qian, Zhou, Zihan, Gupta, Abhibav, Fang, Fei, Wu, Yi, Wang, Xiaolong
In multi-agent games, the complexity of the environment can grow exponentially as the number of agents increases, so it is particularly challenging to learn good policies when the agent population is large. In this paper, we introduce Evolutionary Population Curriculum (EPC), a curriculum learning paradigm that scales up Multi-Agent Reinforcement Learning (MARL) by progressively increasing the population of training agents in a stage-wise manner. Furthermore, EPC uses an evolutionary approach to fix an objective misalignment issue throughout the curriculum: agents successfully trained in an early stage with a small population are not necessarily the best candidates for adapting to later stages with scaled populations. Concretely, EPC maintains multiple sets of agents in each stage, performs mix-and-match and fine-tuning over these sets and promotes the sets of agents with the best adaptability to the next stage. We implement EPC on a popular MARL algorithm, MADDPG, and empirically show that our approach consistently outperforms baselines by a large margin as the number of agents grows exponentially. The project page is https://sites.google.com/view/epciclr2020.
Using Deep Reinforcement Learning Methods for Autonomous Vessels in 2D Environments
Etemad, Mohammad, Zare, Nader, Sarvmaili, Mahtab, Soares, Amilcar, Machado, Bruno Brandoli, Matwin, Stan
Unmanned Surface Vehicles technology (USVs) is an exciting topic that essentially deploys an algorithm to safely and efficiently performs a mission. Although reinforcement learning is a well-known approach to modeling such a task, instability and divergence may occur when combining off-policy and function approximation. In this work, we used deep reinforcement learning combining Q-learning with a neural representation to avoid instability. Our methodology uses deep q-learning and combines it with a rolling wave planning approach on agile methodology. Our method contains two critical parts in order to perform missions in an unknown environment. The first is a path planner that is responsible for generating a potential effective path to a destination without considering the details of the root. The latter is a decision-making module that is responsible for short-term decisions on avoiding obstacles during the near future steps of USV exploitation within the context of the value function. Simulations were performed using two algorithms: a basic vanilla vessel navigator (VVN) as a baseline and an improved one for the vessel navigator with a planner and local view (VNPLV). Experimental results show that the proposed method enhanced the performance of VVN by 55.31 on average for long-distance missions. Our model successfully demonstrated obstacle avoidance by means of deep reinforcement learning using planning adaptive paths in unknown environments.
Google algorithm teaches robot how to walk in mere hours
A new robot has overcome a fundamental challenge of locomotion by teaching itself how to walk. Researchers from Google developed algorithms that helped the four-legged bot to learn how to walk across a range of surfaces within just hours of practice, annihilating the record times set by its human overlords. Their system uses deep reinforcement learning, a form of AI that teaches through trial and error by providing rewards for certain actions. This technique is typically evaluated in virtual environments. However, building simulations that could replicate the robot walking on various surfaces would be highly complex and time-consuming, so the researchers chose to train their system in the real world.
Deep Reinforcement Learning Course
Some of the agents you'll implement during this course: This course is a series of articles and videos where you'll master the skills and architectures you need, to become a deep reinforcement learning expert. You'll build a strong professional portfolio by implementing awesome agents with Tensorflow that learns to play Space invaders, Doom, Sonic the hedgehog and more!
Comprehensive Review of Deep Reinforcement Learning Methods and Applications in Economics
Mosavi, Amir, Ghamisi, Pedram, Faghan, Yaser, Duan, Puhong
The popularity of deep reinforcement learning (DRL) methods in economics have been exponentially increased. DRL through a wide range of capabilities from reinforcement learning (RL) and deep learning (DL) for handling sophisticated dynamic business environments offers vast opportunities. DRL is characterized by scalability with the potential to be applied to high-dimensional problems in conjunction with noisy and nonlinear patterns of economic data. In this work, we first consider a brief review of DL, RL, and deep RL methods in diverse applications in economics providing an in-depth insight into the state of the art. Furthermore, the architecture of DRL applied to economic applications is investigated in order to highlight the complexity, robustness, accuracy, performance, computational tasks, risk constraints, and profitability. The survey results indicate that DRL can provide better performance and higher accuracy as compared to the traditional algorithms while facing real economic problems at the presence of risk parameters and the ever-increasing uncertainties.