AITopics

1909.11373

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(2 more...)

Genre: Research Report (0.73)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

#artificialintelligenceSep-24-2019, 00:56:31 GMT

Counterfactual Policy Evaluation: A better alternative for quick A/B tests

A/B tests are a crucial tool for evaluating experiments in marketing domain. It is performed by subjecting different population samples to variations of an offer. The results are observed at the end of experiment to select the best performing variant. Marketing teams need to spend a lot of time in crafting these experiments. It also takes time and money to wait for results to iterate and improve.

better alternative, counterfactual policy evaluation, experiment, (4 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.32)

Folkers, Andreas, Rick, Matthias, Büskens, Christof

Controlling an Autonomous Vehicle with Deep Reinforcement Learning

-- We present a control approach for autonomous vehicles based on deep reinforcement learning. A neural network agent is trained to map its estimated state to acceleration and steering commands given the objective of reaching a specific target state while considering detected obstacles. Learning is performed using state-of-the-art proximal policy optimization in combination with a simulated environment. Training from scratch takes five to nine hours. The resulting agent is evaluated within simulation and subsequently applied to control a full-size research vehicle. For this, the autonomous exploration of a parking lot is considered, including turning maneuvers and obstacle avoidance. Altogether, this work is among the first examples to successfully apply deep reinforcement learning to a real vehicle. I. INTRODUCTION Self driving cars have the potential to sustainably change modern societies which are heavily based on mobility. The benefits of such a technology range from self-providing car sharing to platooning approaches, which ultimately yield a much more effective usage of vehicles and roads [1]. In recent years, great progress has been made in the development of these systems, with a major factor being the results achieved through deep learning methods.

agent, controller, vehicle, (14 more...)

doi: 10.1109/IVS.2019.8814124

1909.12153

Country:

Europe > Germany > Bremen > Bremen (0.28)
Asia > Middle East > Jordan (0.04)
North America > United States (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)

Genre: Research Report (0.50)

Industry:

Transportation > Ground > Road (1.00)
Transportation > Passenger (0.86)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

arXiv.org Machine LearningSep-24-2019

Avoidance Learning Using Observational Reinforcement Learning

Venuto, David, Boussioux, Leonard, Wang, Junhao, Dali, Rola, Chakravorty, Jhelum, Bengio, Yoshua, Precup, Doina

Imitation learning seeks to learn an expert policy from sampled demonstrations. However, in the real world, it is often difficult to find a perfect expert and avoiding dangerous behaviors becomes relevant for safety reasons. We present the idea of \textit{learning to avoid}, an objective opposite to imitation learning in some sense, where an agent learns to avoid a demonstrator policy given an environment. We define avoidance learning as the process of optimizing the agent's reward while avoiding dangerous behaviors given by a demonstrator. In this work we develop a framework of avoidance learning by defining a suitable objective function for these problems which involves the \emph{distance} of state occupancy distributions of the expert and demonstrator policies. We use density estimates for state occupancy measures and use the aforementioned distance as the reward bonus for avoiding the demonstrator. We validate our theory with experiments using a wide range of partially observable environments. Experimental results show that we are able to improve sample efficiency during training compared to state of the art policy optimization and safety methods.

agent, demonstrator, demonstrator trajectory, (12 more...)

arXiv.org Machine Learning

1909.11228

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Zhong, Yuren, Deshmukh, Aniket Anand, Scott, Clayton

PAC Reinforcement Learning without Real-World Feedback

arXiv.org Machine LearningSep-24-2019

This work studies reinforcement learning in the Sim-to-Real setting, in which an agent is first trained on a number of simulators before being deployed in the real world, with the aim of decreasing the real-world sample complexity requirement. Using a dynamic model known as a rich observation Markov decision process (ROMDP), we formulate a theoretical framework for Sim-to-Real in the situation where feedback in the real world is not available. We establish real-world sample complexity guarantees that are smaller than what is currently known for directly (i.e., without access to simulators) learning a ROMDP with feedback.

pac reinforcement learning, probability, real world, (13 more...)

arXiv.org Machine Learning

1909.10449

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > California > Santa Clara County > Sunnyvale (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)

Duisterhof, Bardienus P., Krishnan, Srivatsan, Cruz, Jonathan J., Banbury, Colby R., Fu, William, Faust, Aleksandra, de Croon, Guido C. H. E., Reddi, Vijay Janapa

Learning to Seek: Autonomous Source Seeking with Deep Reinforcement Learning Onboard a Nano Drone Microcontroller

-- Fully autonomous navigation using nano drones has numerous application in the real world, ranging from search and rescue to source seeking. Nano drones are well-suited for source seeking because of their agility, low price, and ubiquitous character . Unfortunately, their constrained form factor limits flight time, sensor payload, and compute capability. These challenges are a crucial limitation for the use of source-seeking nano drones in GPSdenied and highly cluttered environments. Hereby, we introduce a fully autonomous deep reinforcement learning-based light-seeking nano drone. We present the method for efficiently training, converting, and utilizing deep reinforcement learning policies. Our training methodology and novel quantization scheme allow fitting the trained policy in 3 kB of memory. The quantization scheme uses representative input data and input scaling to arrive at a full 8-bit model. Finally, we evaluate the approach in simulation and flight tests using a Bitcraze CrazyFlie, achieving 80% success rate on average in a highly cluttered and randomized test environment. Even more, the drone finds the light source in 29% fewer steps compared to a baseline simulation (obstacle avoidance without source information). T o our knowledge, this is the first deep reinforcement learning method that enables source seeking within a highly constrained nano drone demonstrating robust flight behavior . Our general methodology is suitable for any (source seeking) highly constrained platform using deep reinforcement learning. In recent years, nano drones have gained traction in the robotics community. Their agility, maneuverability, and low price make them suitable for a wide range of applications, especially in GPSdenied and cluttered environments.

deep reinforcement, nano drone, reinforcement, (14 more...)

1909.11236

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > Puerto Rico > San Juan > San Juan (0.04)
Europe > Netherlands > South Holland > Delft (0.04)

Genre: Research Report (0.51)

Industry:

Transportation > Air (0.50)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Recurrent Independent Mechanisms

Goyal, Anirudh, Lamb, Alex, Hoffmann, Jordan, Sodhani, Shagun, Levine, Sergey, Bengio, Yoshua, Schölkopf, Bernhard

Learning modular structures which reflect the dynamics of the environment can lead to better generalization and robustness to changes which only affect a few of the underlying causes. We propose Recurrent Independent Mechanisms (RIMs), a new recurrent architecture in which multiple groups of recurrent cells operate with nearly independent transition dynamics, communicate only sparingly through the bottleneck of attention, and are only updated at time steps where they are most relevant. We show that this leads to specialization amongst the RIMs, which in turn allows for dramatically improved generalization on tasks where some factors of variation differ systematically between training and evaluation.

arxiv preprint arxiv, baseline, mechanism, (12 more...)

1909.10893

Country:

North America > Canada > Quebec > Montreal (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(3 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
(2 more...)

Invariant Transform Experience Replay

Lin, Yijiong, Huang, Jiancong, Zimmer, Matthieu, Rojas, Juan, Weng, Paul

Yijiong Lin 1, Jiancong Huang 1, Matthieu Zimmer 2, Juan Rojas 1, Paul Weng 2 Abstract -- Deep reinforcement learning (DRL) is a promising approach for adaptive robot control, but its current application to robotics is currently hindered by high sample requirements. We propose two novel data augmentation techniques for DRL based on invariant transformations of trajectories in order to reuse more efficiently observed interaction. The first one called Kaleidoscope Experience Replay exploits reflectional symmetries, while the second called Goal-augmented Experience Replay takes advantage of lax goal definitions. In the Fetch tasks from OpenAI Gym, our experimental results show a large increase in learning speed. I. INTRODUCTION Deep reinforcement learning (DRL) has demonstrated great promise in recent years [1], [2]. However, despite being shown to be a viable approach in robotics [3], [4], DRL still suffers from low sample efficiency in practice--an acute issue in robot learning. Given how critical this issue is, many diverse propositions have been presented. For brevity, we only recall the most related to our work.

experience replay, symmetry, trajectory, (15 more...)

1909.10707

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Guangdong Province (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Attraction-Repulsion Actor-Critic for Continuous Control Reinforcement Learning

Doan, Thang, Mazoure, Bogdan, Durand, Audrey, Pineau, Joelle, Hjelm, R Devon

Continuous control tasks in reinforcement learning are important because they provide an important framework for learning in high-dimensional state spaces with deceptive rewards, where the agent can easily become trapped into suboptimal solutions. One way to avoid local optima is to use a population of agents to ensure coverage of the policy space, yet learning a population with the "best" coverage is still an open problem. In this work, we present a novel approach to population-based RL in continuous control that leverages properties of normalizing flows to perform attractive and repulsive operations between current members of the population and previously observed policies. Empirical results on the MuJoCo suite demonstrate a high performance gain for our algorithm compared to prior work, including Soft-Actor Critic (SAC).

agent, algorithm, exploration, (16 more...)

1909.07543

Country:

North America > Canada > Quebec > Montreal (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Illinois (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Stooke, Adam, Abbeel, Pieter

rlpyt: A Research Code Base for Deep Reinforcement Learning in PyTorch

Since the recent advent of deep reinforcement learning for game play and simulated robotic control, a multitude of new algorithms have flourished. Most are model-free algorithms which can be categorized into three families: deep Q-learning, policy gradients, and Q-value policy gradients. These have developed along separate lines of research, such that few, if any, code bases incorporate all three kinds. Yet these algorithms share a great depth of common deep reinforcement learning machinery. We are pleased to share rlpyt, which implements all three algorithm families on top of a shared, optimized infrastructure, in a single repository. It contains modular implementations of many common deep RL algorithms in Python using PyTorch, a leading deep learning library. rlpyt is designed as a high-throughput code base for small- to medium-scale research in deep RL. This white paper summarizes its features, algorithms implemented, and relation to prior work, and concludes with detailed implementation and usage notes. rlpyt is available at https://github.com/astooke/rlpyt.

algorithm, arxiv preprint arxiv, rlpyt, (12 more...)

1909.015

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)