Reinforcement Learning
Hacking Google reCAPTCHA v3 using Reinforcement Learning
Akrout, Ismail, Feriani, Amal, Akrout, Mohamed
We present a Reinforcement Learning (RL) methodology to bypass Google reCAPTCHA v3. We formulate the problem as a grid world where the agent learns how to move the mouse and click on the reCAPTCHA button to receive a high score. We study the performance of the agent when we vary the cell size of the grid world and show that the performance drops when the agent takes big steps toward the goal. Finally, we use a divide-and-conquer strategy to defeat the reCAPTCHA system for any grid resolution. Our proposed method achieves a success rate of 97.4% on a 100x100 grid and 96.7% on a 1000x1000 screen resolution.
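To make the grid-world formulation concrete, here is a minimal sketch of a mouse-movement grid in which the cell size is also the step length. It is not the authors' environment: the class name `MouseGridWorld`, the four-move action set, the starting position, and the termination rule are all assumptions chosen for illustration, and the reCAPTCHA score that serves as the reward in the paper is not modeled.

```python
from dataclasses import dataclass

# Hypothetical action set: each action moves the mouse by one cell.
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


@dataclass
class MouseGridWorld:
    width: int           # screen width in pixels
    height: int          # screen height in pixels
    cell: int            # cell size in pixels, which is also the step length
    goal: tuple          # assumed pixel position of the button
    pos: tuple = (0, 0)  # assumed starting position of the mouse

    def _cell_of(self, point):
        # Map a pixel position to the grid cell that contains it.
        return (point[0] // self.cell, point[1] // self.cell)

    def step(self, action):
        # Move one cell in the chosen direction, clamped to the screen.
        dx, dy = ACTIONS[action]
        x = min(max(self.pos[0] + dx * self.cell, 0), self.width - 1)
        y = min(max(self.pos[1] + dy * self.cell, 0), self.height - 1)
        self.pos = (x, y)
        # The episode ends once the mouse is in the same cell as the goal.
        done = self._cell_of(self.pos) == self._cell_of(self.goal)
        return self.pos, done
```

With a larger `cell`, each step covers more pixels, which is the "big steps" regime in which the abstract reports that performance drops.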
Deep Multi-Agent Reinforcement Learning with Discrete-Continuous Hybrid Action Spaces
Fu, Haotian, Tang, Hongyao, Hao, Jianye, Lei, Zihan, Chen, Yingfeng, Fan, Changjie
Deep Reinforcement Learning (DRL) has been applied to address a variety of cooperative multi-agent problems with either discrete action spaces or continuous action spaces. However, to the best of our knowledge, no previous work has succeeded in applying DRL to multi-agent problems with discrete-continuous hybrid (or parameterized) action spaces, which are very common in practice. Our work fills this gap by proposing two novel algorithms: Deep Multi-Agent Parameterized Q-Networks (Deep MAPQN) and Deep Multi-Agent Hierarchical Hybrid Q-Networks (Deep MAHHQN). We follow the centralized training but decentralized execution paradigm: different levels of communication between different agents are used to facilitate the training process, while each agent executes its policy independently based on local observations during execution. Our empirical results on several challenging tasks (simulated RoboCup Soccer and game Ghost Story) show that both Deep MAPQN and Deep MAHHQN are effective and significantly outperform the existing independent deep parameterized Q-learning method.
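A discrete-continuous hybrid action pairs a discrete choice with continuous parameters for that choice. The sketch below shows one common way a single agent can pick such an action, in the style of parameterized Q-learning: propose parameters for every discrete action, then take the discrete action whose pair scores highest. It is not Deep MAPQN or Deep MAHHQN; the function name `select_hybrid_action` and the toy parameter and value functions are hypothetical stand-ins for learned networks.

```python
def select_hybrid_action(state, param_fns, q_fn):
    """Pick a hybrid action (k, x_k) for one agent.

    param_fns[k](state) proposes continuous parameters for discrete action k.
    q_fn(state, k, x_k) scores the resulting hybrid action.
    Both are assumed to be learned elsewhere; here they are plain callables.
    """
    best = None
    for k, param_fn in enumerate(param_fns):
        x_k = param_fn(state)        # continuous part for discrete action k
        q = q_fn(state, k, x_k)      # value of the full hybrid action
        if best is None or q > best[0]:
            best = (q, k, x_k)
    _, k, x_k = best
    return k, x_k


# Toy stand-ins for learned networks (assumptions for illustration only).
toy_param_fns = [lambda s: 0.5 * s, lambda s: s + 1.0]
toy_q_fn = lambda s, k, x: -(x - 3.0) ** 2 + k

action = select_hybrid_action(2.0, toy_param_fns, toy_q_fn)
```

In a multi-agent setting each agent would run a selection step of this kind on its own local observation at execution time, which is consistent with the decentralized execution described in the abstract.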
Task-oriented Design through Deep Reinforcement Learning
Choi, Junyoung, Hyun, Minsung, Kwak, Nojun
We propose a new low-cost machine-learning-based methodology which assists designers in reducing the gap between the problem and the solution in the design process. Our work applies reinforcement learning (RL) to find the optimal task-oriented design solution through the construction of the design action for each task. For this task-oriented design, the 3D design process in product design is assigned to an action space in Deep RL, and the desired 3D model is obtained by training each design action according to the task. By showing that this method achieves satisfactory design even when applied to a task pursuing multiple goals, we suggest the direction of how machine learning can contribute to the design process. Also, we have validated with product designers that this methodology can assist the creative part in the process of design.
Deep learning for molecular generation and optimization - a review of the state of the art
Elton, Daniel C., Boukouvalas, Zois, Fuge, Mark D., Chung, Peter W.
In the space of only a few years, deep generative modeling has revolutionized how we think of artificial creativity, yielding autonomous systems which produce original images, music, and text. Inspired by these successes, researchers are now applying deep generative modeling techniques to the generation and optimization of molecules - in our review we found 45 papers on the subject published in the past two years. These works point to a future where such systems will be used to generate lead molecules, greatly reducing resources spent downstream synthesizing and characterizing bad leads in the lab. In this review we survey the increasingly complex landscape of models and representation schemes that have been proposed. The four classes of techniques we describe are recursive neural networks, autoencoders, generative adversarial networks, and reinforcement learning. After first discussing some of the mathematical fundamentals of each technique, we draw high level connections and comparisons with other techniques and expose the pros and cons of each. Several important high level themes emerge as a result of this work, including the shift away from the SMILES string representation of molecules towards more sophisticated representations such as graph grammars and 3D representations, the importance of reward function design, the need for better standards for benchmarking and testing, and the benefits of adversarial training and reinforcement learning over maximum likelihood based training.
Stroke-based Artistic Rendering Agent with Deep Reinforcement Learning
Huang, Zhewei, Heng, Wen, Zhou, Shuchang
Excellent painters can use only a few strokes to create a fantastic painting, which is a symbol of human intelligence and art. Reversing the simulator to interpret images is also a challenging task of computer vision in recent years. In this paper, we present SARA, a stroke-based artistic rendering agent that combines the neural renderer and deep reinforcement learning (DRL), allowing the machine to learn the ability to deconstruct images using strokes and create amazing visual effects. Our agent is an end-to-end program that converts natural images into paintings. The training process does not require the experience of human painting or stroke tracking data.
Investigating Reinforcement Learning Agents for Continuous State Space Environments
Abstract: Given an environment with continuous state spaces and discrete actions, we investigate using a Double Deep Q-learning reinforcement learning agent to find optimal policies in the LunarLander-v2 OpenAI Gym environment.
I. INTRODUCTION
For this study, we examine the performance of reinforcement learning (RL) algorithms for continuous state space MDPs, specifically OpenAI Gym's LunarLander-v2. In this environment, the goal is for the RL agent to learn to land successfully on a landing pad located at coordinates (0,0) in the frame. The agent receives -0.03 points for each frame in which it fires its main engine, and landing on the landing pad is worth 100-140 points, which can be lost if the agent moves away from the pad. Each leg contact with the ground is worth 10 points.
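For readers unfamiliar with Double Deep Q-learning, the sketch below shows the standard Double DQN bootstrap target for a single transition: the online network chooses the next action and the target network evaluates it. This is the textbook rule, not code from the study; the function name and the use of plain lists of Q-values in place of network outputs are assumptions made for illustration.

```python
def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN target for one transition (s, a, r, s').

    q_online_next: Q-values of the online network at s', one per action.
    q_target_next: Q-values of the target network at s', one per action.
    """
    if done:
        # Terminal transitions have no bootstrap term.
        return reward
    # The online network selects the action ...
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    # ... and the target network evaluates it.
    return reward + gamma * q_target_next[best_action]
```

Separating action selection from action evaluation is what distinguishes this target from the plain DQN target, which takes the maximum of the target network's own values.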
A Survey on Transfer Learning for Multiagent Reinforcement Learning Systems
Silva, Felipe Leno Da, Costa, Anna Helena Reali
Multiagent Reinforcement Learning (RL) solves complex tasks that require coordination with other agents through autonomous exploration of the environment. However, learning a complex task from scratch is impractical due to the huge sample complexity of RL algorithms. For this reason, reusing knowledge that can come from previous experience or other agents is indispensable to scale up multiagent RL algorithms. This survey provides a unifying view of the literature on knowledge reuse in multiagent RL. We define a taxonomy of solutions for the general knowledge reuse problem, providing a comprehensive discussion of recent progress on knowledge reuse in Multiagent Systems (MAS) and of techniques for knowledge reuse across agents (that may be actuating in a shared environment or not). We aim at encouraging the community to work towards reusing all the knowledge sources available in a MAS. For that, we provide an in-depth discussion of current lines of research and open questions.
Multi-Agent Deep Reinforcement Learning for Large-scale Traffic Signal Control
Chu, Tianshu, Wang, Jie, Codecà, Lara, Li, Zhaojian
Reinforcement learning (RL) is a promising data-driven approach for adaptive traffic signal control (ATSC) in complex urban traffic networks, and deep neural networks further enhance its learning power. However, centralized RL is infeasible for large-scale ATSC due to the extremely high dimension of the joint action space. Multi-agent RL (MARL) overcomes the scalability issue by distributing the global control to each local RL agent, but it introduces new challenges: now the environment becomes partially observable from the viewpoint of each local agent due to limited communication among agents. Most existing studies in MARL focus on designing efficient communication and coordination among traditional Q-learning agents. This paper presents, for the first time, a fully scalable and decentralized MARL algorithm for the state-of-the-art deep RL agent: advantage actor critic (A2C), within the context of ATSC. In particular, two methods are proposed to stabilize the learning procedure, by improving the observability and reducing the learning difficulty of each local agent. The proposed multi-agent A2C is compared against independent A2C and independent Q-learning algorithms, in both a large synthetic traffic grid and a large real-world traffic network of Monaco city, under simulated peak-hour traffic dynamics. Results demonstrate its optimality, robustness, and sample efficiency over other state-of-the-art decentralized MARL algorithms.
Sample-Efficient Model-Free Reinforcement Learning with Off-Policy Critics
Steckelmacher, Denis, Plisnier, Hélène, Roijers, Diederik M., Nowé, Ann
We argue that actor-critic algorithms are currently limited by their need for an on-policy critic, which severely constrains how the critic is learned. We propose Bootstrapped Dual Policy Iteration (BDPI), a novel model-free actor-critic reinforcement-learning algorithm for continuous states and discrete actions, with off-policy critics. Off-policy critics are compatible with experience replay, ensuring high sample-efficiency, without the need for off-policy corrections. The actor, by slowly imitating the average greedy policy of the critics, leads to high-quality and state-specific exploration, which we show approximates Thompson sampling. Because the actor and critics are fully decoupled, BDPI is remarkably stable and, contrary to other state-of-the-art algorithms, unusually forgiving for poorly-configured [...]
PGQL (O'Donoghue et al., 2017) allows for an off-policy V function, but requires it to be combined with on-policy advantage values. Notable examples of algorithms without an on-policy critic are AlphaGo Zero (Silver et al., 2017), that replaces the critic with a slow-moving target policy learned with tree search, and the Actor-Mimic (Parisotto et al., 2016), that minimizes the cross-entropy between an actor and the Softmax policies of critics (see Section 4.2). The need of most actor-critic algorithms for an on-policy critic makes them incompatible with state-of-the-art value-based algorithms of the Q-Learning family (Arjona-Medina et al., 2018; Hessel et al., 2017), that are all highly sample-efficient but off-policy. In a discrete-actions setting, where off-policy value-based methods can be used, this raises two questions: 1. Can we use off-policy value-based algorithms in an actor-critic setting?
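The phrase "slowly imitating the average greedy policy of the critics" can be read as a small step from the current action probabilities toward the mean of the critics' greedy policies. The sketch below illustrates that reading for a single state with tabular Q-values. It is an interpretation of the abstract, not the published BDPI update; the function names and the step size `lr` are assumptions.

```python
def greedy_policy(q_values):
    # One-hot distribution that puts all probability on the best action.
    best = max(range(len(q_values)), key=q_values.__getitem__)
    return [1.0 if a == best else 0.0 for a in range(len(q_values))]


def actor_update(pi, critics_q, lr=0.05):
    """Move the actor's distribution for one state toward the critics.

    pi: current action probabilities of the actor in this state.
    critics_q: one list of Q-values per critic for the same state.
    lr: assumed imitation rate; small values make the actor move slowly.
    """
    greedy = [greedy_policy(q) for q in critics_q]
    n = len(greedy)
    # Average of the critics' greedy policies.
    avg = [sum(g[a] for g in greedy) / n for a in range(len(pi))]
    # Convex combination, so the result is still a probability distribution.
    return [(1 - lr) * p + lr * g for p, g in zip(pi, avg)]
```

Because the update only reads the critics' Q-values, the critics themselves can be trained by any off-policy method, which matches the decoupling of actor and critics that the abstract emphasizes.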
Learning to navigate in cities without a map
DeepMind
We depart from the traditional approaches which rely on explicit mapping and exploration (like a cartographer who tries to localise themselves and draw a map at the same time). Our approach, in contrast, is to learn to navigate as humans used to do, without maps, GPS localisation, or other aids, using only visual observations. We build a neural network agent that inputs images observed from the environment and predicts the next action it should take in that environment. We train it end-to-end using deep reinforcement learning, similarly to some recent work on learning to navigate in complex 3D mazes and reinforcement learning with unsupervised auxiliary tasks for playing games. Unlike those studies, which were conducted on small-scale simulated maze environments, we utilise city-scale real-world data, including complex intersections, footpaths, tunnels, and diverse topology across London, Paris, and New York City.