AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

AI Learns to Park - Deep Reinforcement Learning

#artificialintelligence | Oct-14-2019, 19:28:28 GMT

An AI learns to park a car in a parking lot in a 3D physics simulation. The AI consists of a deep Neural Network with 3 hidden layers of 128 neurons each. It is trained with the Proximal Policy Optimization (PPO) algorithm, which is a Reinforcement Learning approach. The inputs of the Neural Network are the readings of eight depth sensors, the car's current speed and position, and its position relative to the target. The outputs of the Neural Network are interpreted as engine force, braking force and turning force.
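
The network shape described above can be sketched as a plain forward pass. This is a minimal illustration, not the project's actual code: the description gives the list of inputs but not their sizes, so the scalar speed, the two-component position and the two-component target offset are assumptions, as are the tanh activations, the weight initialization and the function names `init_policy` and `policy_forward`. PPO training itself is not shown.

```python
import numpy as np

# Observation layout. Only the eight depth sensors are given in the text;
# the remaining sizes are assumptions made for this sketch.
N_DEPTH = 8    # eight depth-sensor readings (from the description)
N_SPEED = 1    # assumption: scalar speed
N_POS = 2      # assumption: planar position
N_TARGET = 2   # assumption: planar offset to the target
OBS_DIM = N_DEPTH + N_SPEED + N_POS + N_TARGET

HIDDEN = 128   # neurons per hidden layer (from the description)
N_ACTIONS = 3  # engine force, braking force, turning force


def init_policy(rng, obs_dim=OBS_DIM, hidden=HIDDEN, n_actions=N_ACTIONS):
    """Create weights for 3 hidden layers of `hidden` units plus an output layer."""
    sizes = [obs_dim, hidden, hidden, hidden, n_actions]
    return [
        (rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]


def policy_forward(params, obs):
    """Map one observation vector to three action values in [-1, 1]."""
    x = np.asarray(obs, dtype=float)
    for w, b in params[:-1]:
        x = np.tanh(x @ w + b)   # hidden layers (tanh is an assumption)
    w, b = params[-1]
    return np.tanh(x @ w + b)    # squashed outputs (also an assumption)


rng = np.random.default_rng(0)
params = init_policy(rng)
engine, brake, turn = policy_forward(params, rng.uniform(-1.0, 1.0, size=OBS_DIM))
```

In a PPO setup these three outputs would typically serve as the mean of an action distribution, with a separate value estimate trained alongside the policy.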

deep reinforcement learning, neural network, parking spot

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

covariant.ai

#artificialintelligence | Oct-14-2019, 00:37:33 GMT

Drawing on recent advances in Deep Imitation Learning and Deep Reinforcement Learning, covariant.ai is developing AI software that makes it easy to teach robots new, complex skills. Founded by Pieter Abbeel, Peter Chen, Rocky Duan and Tianhao Zhang, the company is based in Emeryville, CA and backed by venture funding.

covariant

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)

Weakly Labeled Sound Event Detection Using Tri-training and Adversarial Learning

Park, Hyoungwoo, Yun, Sungrack, Eum, Jungyun, Cho, Janghoon, Hwang, Kyuwoong

arXiv.org Machine Learning | Oct-14-2019

This paper considers a semi-supervised learning framework for weakly labeled polyphonic sound event detection in Task 4 of the DCASE 2019 challenge, combining tri-training and adversarial learning. The goal of Task 4 is to detect the onsets and offsets of multiple sound events in a single audio clip. The dataset consists of synthetic data with strong labels (sound event labels with boundaries) and real data that is either weakly labeled (sound event labels only) or unlabeled. Given this dataset, we apply tri-training, in which two different classifiers produce pseudo labels for the weakly labeled and unlabeled data, and the final classifier is trained on the strongly labeled data together with the pseudo-labeled weak and unlabeled data. We also apply adversarial learning to reduce the domain gap between the real and synthetic data. We evaluated the framework on the validation set of the Task 4 dataset, where it shows a considerable performance improvement over the baseline model.

adversarial learning, dataset, international conference

arXiv.org Machine Learning

1910.0679

Country: North America > United States (0.05)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)

Reinforcement learning with spiking coagents

Aenugu, Sneha, Sharma, Abhishek, Yelamarthi, Sasikiran, Hazan, Hananel, Thomas, Philip S., Kozma, Robert

arXiv.org Machine Learning | Oct-14-2019

Neuroscientific theory suggests that dopaminergic neurons broadcast global reward prediction errors to large areas of the brain, influencing the synaptic plasticity of the neurons in those regions. We build on this theory to propose a multi-agent learning framework that uses spiking neurons in the generalized linear model (GLM) formulation as agents to solve reinforcement learning (RL) tasks. We show that a network of GLM spiking agents connected in a hierarchical fashion, where each spiking agent modulates its firing policy based on local information and a global prediction error, can learn complex action representations to solve RL tasks. We further show how leveraging principles of modularity and population coding inspired by the brain can help reduce variance in the learning updates, making it a viable optimization technique.

agent, architecture, neuron

arXiv.org Machine Learning

1910.06489

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
North America > Canada (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

A unified view of likelihood ratio and reparameterization gradients and an optimal importance sampling scheme

Parmas, Paavo, Sugiyama, Masashi

arXiv.org Machine Learning | Oct-14-2019

Reparameterization (RP) and likelihood ratio (LR) gradient estimators are used throughout machine and reinforcement learning; however, they are usually explained as simple mathematical tricks without providing any insight into their nature. We use a first principles approach to explain LR and RP, and show a connection between the two via the divergence theorem. The theory motivated us to derive optimal importance sampling schemes to reduce LR gradient variance. Our newly derived distributions have analytic probability densities and can be directly sampled from. The improvement for Gaussian target distributions was modest, but for other distributions such as a Beta distribution, our method could lead to arbitrarily large improvements, and was crucial to obtain competitive performance in evolution strategies experiments.
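
To make the two estimators named above concrete, here is a minimal sketch that estimates the gradient of E[f(x)] with respect to the mean of a Gaussian, once with the likelihood ratio (LR) estimator and once with the reparameterization (RP) estimator. The test function f(x) = x^2, the parameter values and the function names are assumptions chosen for illustration; the sketch does not implement the paper's importance sampling scheme.

```python
import numpy as np


def f(x):
    """Illustrative test function (an assumption, not from the paper)."""
    return x ** 2


def df(x):
    """Derivative of f, needed only by the RP estimator."""
    return 2.0 * x


def lr_gradient_samples(mu, sigma, n, rng):
    """LR estimator: f(x) * d/dmu log N(x; mu, sigma^2), with x ~ N(mu, sigma^2)."""
    x = rng.normal(mu, sigma, size=n)
    return f(x) * (x - mu) / sigma ** 2


def rp_gradient_samples(mu, sigma, n, rng):
    """RP estimator: write x = mu + sigma * eps and differentiate through f."""
    eps = rng.standard_normal(n)
    return df(mu + sigma * eps)


rng = np.random.default_rng(0)
mu, sigma, n = 1.5, 1.0, 200_000
lr = lr_gradient_samples(mu, sigma, n, rng)
rp = rp_gradient_samples(mu, sigma, n, rng)

# Both sample means estimate the exact gradient d/dmu E[x^2] = 2 * mu.
print("exact:", 2.0 * mu)
print("LR mean, variance:", lr.mean(), lr.var())
print("RP mean, variance:", rp.mean(), rp.var())
```

For this particular f and Gaussian, the per-sample variance of the LR estimator is mu^4 + 18 mu^2 + 15 - 4 mu^2 (with sigma = 1), which is much larger than the RP variance of 4; this is the kind of gap that variance reduction for LR gradients targets.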

gradient, gradient estimator, variance

arXiv.org Machine Learning

1910.06419

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Japan > Kyūshū & Okinawa > Okinawa (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)

On the Reduction of Variance and Overestimation of Deep Q-Learning

Sabry, Mohammed, Khalifa, Amr M. A.

arXiv.org Machine Learning | Oct-14-2019

The breakthrough of deep Q-Learning on different types of environments revolutionized the algorithmic design of Reinforcement Learning and motivated more stable and robust algorithms. To that end, many extensions to the deep Q-Learning algorithm have been proposed to reduce the variance of the target values and the overestimation phenomenon. In this paper, we examine a new methodology to address these issues: we propose using Dropout techniques in the deep Q-Learning algorithm as a way to reduce variance and overestimation. We further present experiments on some benchmark environments that demonstrate a significant improvement in the stability of performance and a reduction in variance and overestimation.
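
The abstract does not say where or how Dropout is applied, so the following is only one plausible reading: a small Q-network whose hidden layer uses inverted dropout during training, combined with the standard one-step Q-learning target. The network size, the dropout placement, the dropout-free target and all function names (`init_q_network`, `q_values`, `td_target`) are assumptions for this sketch, not the authors' method.

```python
import numpy as np


def init_q_network(rng, state_dim, hidden, n_actions):
    """One-hidden-layer Q-network parameters (sizes are illustrative)."""
    w1 = rng.normal(0.0, 1.0 / np.sqrt(state_dim), size=(state_dim, hidden))
    w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=(hidden, n_actions))
    return w1, np.zeros(hidden), w2, np.zeros(n_actions)


def q_values(params, state, rng=None, drop_prob=0.0):
    """Return Q(state, a) for every action.

    If `rng` is given and `drop_prob` > 0, inverted dropout is applied to the
    hidden layer (training mode); otherwise the pass is deterministic.
    """
    w1, b1, w2, b2 = params
    h = np.maximum(0.0, state @ w1 + b1)           # ReLU hidden layer
    if rng is not None and drop_prob > 0.0:
        keep = rng.random(h.shape) >= drop_prob    # units that stay active
        h = h * keep / (1.0 - drop_prob)           # rescale to keep the expectation
    return h @ w2 + b2


def td_target(params, reward, next_state, done, gamma=0.99):
    """Standard Q-learning target, computed without dropout (an assumption)."""
    if done:
        return float(reward)
    return float(reward + gamma * np.max(q_values(params, next_state)))


rng = np.random.default_rng(0)
params = init_q_network(rng, state_dim=4, hidden=16, n_actions=2)
state = rng.normal(size=4)
q_train = q_values(params, state, rng=rng, drop_prob=0.5)  # stochastic pass
q_eval = q_values(params, state)                            # deterministic pass
```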

algorithm, arxiv preprint arxiv, dropout method

arXiv.org Machine Learning

1910.05983

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Africa > Sudan > Khartoum State > Khartoum (0.05)
Africa > Sudan > Khartoum (0.05)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Multi-step Greedy Policies in Model-Free Deep Reinforcement Learning

Tomar, Manan, Efroni, Yonathan, Ghavamzadeh, Mohammad

arXiv.org Machine Learning | Oct-14-2019

Multi-step greedy policies have been used extensively in model-based Reinforcement Learning (RL) and in cases where a model of the environment is available (e.g., in the game of Go). In this work, we explore the benefits of multi-step greedy policies in model-free RL when employed in the framework of multi-step Dynamic Programming (DP): multi-step Policy and Value Iteration. These algorithms iteratively solve short-horizon decision problems and converge to the optimal solution of the original one. By using model-free algorithms as solvers of the short-horizon problems, we derive fully model-free algorithms that are instances of the multi-step DP framework. As model-free algorithms are prone to instabilities w.r.t. the decision problem horizon, this simple approach can help mitigate these instabilities and results in improved model-free algorithms. We test this approach and show results on both discrete and continuous control problems.
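
As a reminder of what a multi-step greedy policy is, the sketch below computes an h-step greedy action on a small tabular MDP: it solves the h-horizon problem whose terminal values come from a given value estimate V, then acts greedily on the first step. It uses a known transition model purely for illustration, whereas the paper's algorithms are model-free; the toy MDP, the array layout and the function name are assumptions.

```python
import numpy as np


def h_step_greedy_action(P, R, V, state, h, gamma=0.9):
    """Return the h-step greedy action at `state` with respect to V.

    P has shape (A, S, S) with P[a, s, t] = Pr(t | s, a); R has shape (S, A);
    V has shape (S,). h = 1 gives the usual one-step greedy policy.
    """
    values = np.asarray(V, dtype=float)
    for _ in range(h - 1):
        # One optimal Bellman backup: Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * values[t]
        q = R + gamma * np.einsum("ast,t->sa", P, values)
        values = q.max(axis=1)
    q_state = R[state] + gamma * P[:, state, :] @ values
    return int(np.argmax(q_state))


# Toy 3-state chain (illustrative). Action 0 stays, action 1 moves right.
P = np.array([
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],   # action 0: stay
    [[0, 1, 0], [0, 0, 1], [0, 0, 1]],   # action 1: move right (state 2 absorbs)
], dtype=float)
R = np.array([[0.1, 0.0],    # state 0: small reward for staying
              [0.0, 1.0],    # state 1: large reward for moving right
              [0.0, 0.0]])   # state 2: no reward
V = np.zeros(3)

one_step = h_step_greedy_action(P, R, V, state=0, h=1)   # picks "stay"
two_step = h_step_greedy_action(P, R, V, state=0, h=2)   # picks "move right"
```

With V = 0, the one-step greedy policy takes the small immediate reward, while looking two steps ahead reveals the larger reward reachable through state 1.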

algorithm, conference paper, iteration

arXiv.org Machine Learning

1910.02919

Country:

Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Federated Transfer Reinforcement Learning for Autonomous Driving

Liang, Xinle, Liu, Yang, Chen, Tianjian, Liu, Ming, Yang, Qiang

arXiv.org Artificial Intelligence | Oct-14-2019

Reinforcement learning (RL) is widely used in autonomous driving tasks, and training RL models typically involves a multi-step process: pre-training RL models on simulators, uploading the pre-trained model to real-life robots, and fine-tuning the weight parameters on robot vehicles. This sequential process is extremely time-consuming and, more importantly, knowledge from the fine-tuned model stays local and cannot be reused or leveraged collaboratively. To tackle this problem, we present an online federated RL transfer process for real-time knowledge extraction, in which all participant agents take actions using the knowledge learned by others, even when they are acting in very different environments. To validate the effectiveness of the proposed approach, we constructed a real-life collision avoidance system with the Microsoft AirSim simulator and NVIDIA Jetson TX2 car agents, which cooperatively learn from scratch to avoid collisions in an indoor environment with obstacle objects. We demonstrate that with the proposed framework, the simulator car agents can transfer knowledge to the RC cars in real time, with a 27% increase in the average distance from obstacles and a 42% decrease in collision counts.

I. INTRODUCTION

Recent reinforcement learning (RL) research in autonomous robots has achieved significant performance improvements by employing distributed architectures for decentralized agents [1], [2], an approach termed Distributed Reinforcement Learning (DRL). However, most existing DRL frameworks consider only synchronous learning with a constant environment.

agent, knowledge, learning

arXiv.org Artificial Intelligence

1910.06001

Country:

Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.50)

Industry:

Information Technology > Robotics & Automation (0.86)
Transportation > Ground > Road (0.72)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Bootstrapping the Expressivity with Model-based Planning

Dong, Kefan, Luo, Yuping, Ma, Tengyu

arXiv.org Artificial Intelligence | Oct-14-2019

We compare model-free reinforcement learning with model-based approaches through the lens of the expressive power of neural networks for policies, $Q$-functions, and dynamics. We show, theoretically and empirically, that even for a one-dimensional continuous state space, there are many MDPs whose optimal $Q$-functions and policies are much more complex than the dynamics. We hypothesize that many real-world MDPs have a similar property. For these MDPs, model-based planning is a favorable algorithm, because the resulting policies can approximate the optimal policy significantly better than a neural network parameterization can, and both model-free and model-based policy optimization rely on policy parameterization. Motivated by the theory, we apply a simple multi-step model-based bootstrapping planner (BOOTS) to bootstrap a weak $Q$-function into a stronger policy. Empirical results show that applying BOOTS on top of model-based or model-free policy optimization algorithms at test time improves performance on MuJoCo benchmark tasks.
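
The idea of bootstrapping a weak $Q$-function into a stronger policy through planning can be sketched as a k-step lookahead: roll a dynamics model forward for k steps, add up the discounted rewards, and score the final state with the $Q$-function. This is a simplified illustration under stated assumptions (deterministic dynamics, a small discrete candidate action set, exhaustive search, and the hypothetical name `boots_action`), not the paper's implementation, which targets continuous control.

```python
def boots_action(state, dynamics, reward, q_fn, actions, k, gamma=0.99):
    """Pick an action by k-step lookahead with q_fn at the leaves.

    dynamics(s, a) -> next state, reward(s, a) -> float, q_fn(s, a) -> float.
    With k = 0 this reduces to acting greedily on q_fn alone.
    """
    def value(s, depth):
        # Best achievable return from s with `depth` planning steps left.
        if depth == 0:
            return max(q_fn(s, a) for a in actions)
        return max(
            reward(s, a) + gamma * value(dynamics(s, a), depth - 1)
            for a in actions
        )

    def score(a):
        if k == 0:
            return q_fn(state, a)
        return reward(state, a) + gamma * value(dynamics(state, a), k - 1)

    return max(actions, key=score)


# Toy problem (illustrative): integer states, reward for arriving at state 2,
# and a deliberately misleading Q-function that prefers moving left.
actions = [-1, +1]
dynamics = lambda s, a: s + a
reward = lambda s, a: 1.0 if s + a == 2 else 0.0
weak_q = lambda s, a: 0.1 if a == -1 else 0.0

greedy = boots_action(0, dynamics, reward, weak_q, actions, k=0)    # follows weak_q
planned = boots_action(0, dynamics, reward, weak_q, actions, k=2)   # looks ahead
```

In this toy case the planner overrides the weak $Q$-function because the model reveals a reward two steps away that the $Q$-function fails to represent.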

algorithm, neural network, q-function

arXiv.org Artificial Intelligence

1910.05927

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.61)
