AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning

Foerster, Jakob, Nardelli, Nantas, Farquhar, Gregory, Afouras, Triantafyllos, Torr, Philip H. S., Kohli, Pushmeet, Whiteson, Shimon

arXiv.org Artificial Intelligence, Jun-12-2017

Many real-world problems, such as network packet routing and urban traffic control, are naturally modeled as multi-agent reinforcement learning (RL) problems. However, existing multi-agent RL methods typically scale poorly in the problem size. Therefore, a key challenge is to translate the success of deep learning on single-agent RL to the multi-agent setting. A major stumbling block is that independent Q-learning, the most popular multi-agent RL method, introduces nonstationarity that makes it incompatible with the experience replay memory on which deep Q-learning relies. This paper proposes two methods that address this problem: 1) using a multi-agent variant of importance sampling to naturally decay obsolete data and 2) conditioning each agent's value function on a fingerprint that disambiguates the age of the data sampled from the replay memory. Results on a challenging decentralised variant of StarCraft unit micromanagement confirm that these methods enable the successful combination of experience replay with multi-agent RL.
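The second idea in the abstract, tagging replayed data so that an agent can tell how old it is, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the class name `FingerprintReplay`, its methods, and the choice of exploration rate plus training iteration as the fingerprint are assumptions made for this example.

```python
import random
from collections import deque


class FingerprintReplay:
    """Replay memory that appends a fingerprint to every stored observation.

    Hypothetical sketch: the fingerprint here is (epsilon, iteration),
    an assumption chosen to indicate the age of the data.
    """

    def __init__(self, capacity):
        # A bounded deque drops the oldest transition once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def add(self, obs, action, reward, next_obs, epsilon, iteration):
        fingerprint = (epsilon, iteration)
        # The value function would be conditioned on observation + fingerprint.
        self.buffer.append(
            (tuple(obs) + fingerprint, action, reward, tuple(next_obs) + fingerprint)
        )

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))


memory = FingerprintReplay(capacity=2)
memory.add([0.1, 0.2], 1, 0.0, [0.2, 0.3], epsilon=0.9, iteration=10)
batch = memory.sample(1)
```

Because the fingerprint travels with the observation, a network trained on sampled batches can in principle learn to discount transitions generated when the other agents behaved differently.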

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

1702.08887

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Artificial Intelligence: Reinforcement Learning in Python

#artificialintelligence, Jun-11-2017, 12:10:17 GMT

When people talk about artificial intelligence, they usually don't mean supervised and unsupervised machine learning. These tasks are pretty trivial compared to what we think of AIs doing - playing chess and Go, driving cars, and beating video games at a superhuman level. Reinforcement learning has recently become popular for doing all of that and more. Much like deep learning, a lot of the theory was discovered in the 70s and 80s, but only recently have we been able to observe firsthand the amazing results that are possible. In 2016 we saw Google's AlphaGo beat the world champion in Go.

artificial intelligence, machine learning, reinforcement learning, (5 more...)

#artificialintelligence

Genre:

Instructional Material > Course Syllabus & Notes (0.52)
Instructional Material > Online (0.40)

Industry:

Leisure & Entertainment > Games > Computer Games (0.41)
Education > Educational Technology > Educational Software > Computer Based Training (0.40)
Education > Educational Setting > Online (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Meta learning Framework for Automated Driving

Sallab, Ahmad El, Saeed, Mahmoud, Tawab, Omar Abdel, Abdou, Mohammed

arXiv.org Machine Learning, Jun-11-2017

The success of automated driving deployment depends heavily on the ability to develop an efficient and safe driving policy. The problem is well formulated under the framework of optimal control as a cost optimization problem. Model-based solutions using traditional planning are efficient, but require knowledge of the environment model. On the other hand, model-free solutions suffer from sample inefficiency and require too many interactions with the environment, which is infeasible in practice. Methods under the Reinforcement Learning framework usually require the notion of a reward function, which is not available in the real world. Imitation learning helps improve sample efficiency by introducing prior knowledge obtained from the demonstrated behavior, at the risk of exact behavior cloning without generalizing to unseen environments. In this paper we propose a Meta learning framework, based on data set aggregation, to improve generalization of imitation learning algorithms. Under the proposed framework, we propose MetaDAgger, a novel algorithm which tackles the generalization issues in traditional imitation learning. We use The Open Race Car Simulator (TORCS) to test our algorithm. Results on unseen test tracks show significant improvement over traditional imitation learning algorithms, improving the learning time and sample efficiency at the same time. The results are also supported by visualization of the learnt features to prove generalization of the captured details.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

1706.04038

Country: Europe > United Kingdom > England (0.28)

Genre: Research Report (0.64)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)
Information Technology > Robotics & Automation (0.87)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Symmetry Learning for Function Approximation in Reinforcement Learning

Mahajan, Anuj, Tulabandhula, Theja

arXiv.org Machine Learning, Jun-9-2017

Reinforcement Learning (RL) is the task of training an agent to perform optimally in an environment using the reward and observation signals perceived upon taking actions which change the environment dynamics. Learning optimal behavior is inherently difficult because of challenges like credit assignment and exploration-exploitation trade-offs that need to be made while converging to a solution. In many scenarios, like training a rover to move on a Martian surface, the cost of obtaining samples for learning can be high (in terms of the robot's energy expenditure etc.), and so sample efficiency is an important subproblem which deserves special attention. Very often the environment has intrinsic symmetries which can be leveraged by the agent to improve performance and learn more efficiently. For example, in the Cart-Pole domain [1, 2] the state action space is symmetric with respect to reflection about the plane perpendicular to the direction of motion of the cart (Figure 1). In fact, in many environments, the number of symmetry relations tends to increase with the dimensionality of the state space. For instance, for the simple case of grid world of dimension d (Figure 1) there exist O(d!2
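The Cart-Pole reflection symmetry mentioned in the excerpt can be illustrated with a small sketch. It uses hand-coded mirroring as data augmentation, which is a simpler stand-in and not the symmetry-learning method the paper proposes. It assumes the common layout where the state is (cart position, cart velocity, pole angle, pole angular velocity) and the actions are 0 (push left) and 1 (push right); the function names are hypothetical.

```python
def mirror_transition(state, action, reward, next_state):
    """Reflect one Cart-Pole transition about the vertical plane.

    Assumes state = (x, x_dot, theta, theta_dot) and actions in {0, 1}.
    """
    mirrored_state = tuple(-v for v in state)
    mirrored_next = tuple(-v for v in next_state)
    mirrored_action = 1 - action  # swap push-left (0) and push-right (1)
    # The reward is unchanged because the mirrored situation is equivalent.
    return mirrored_state, mirrored_action, reward, mirrored_next


def augment(transitions):
    """Return each transition followed by its mirror image."""
    out = []
    for t in transitions:
        out.append(t)
        out.append(mirror_transition(*t))
    return out


sample = [((0.1, -0.5, 0.02, 0.3), 1, 1.0, (0.09, -0.3, 0.026, 0.1))]
augmented = augment(sample)
```

Each real interaction then yields two training samples, which is one way a known symmetry can reduce the number of environment steps needed.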

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Machine Learning

1706.02999

Country: North America > United States (1.00)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.42)

Stochastic Variance Reduction Methods for Policy Evaluation

Du, Simon S., Chen, Jianshu, Li, Lihong, Xiao, Lin, Zhou, Dengyong

arXiv.org Artificial Intelligence, Jun-9-2017

Policy evaluation is a crucial step in many reinforcement-learning procedures, which estimates a value function that predicts states' long-term value under a given policy. In this paper, we focus on policy evaluation with linear function approximation over a fixed dataset. We first transform the empirical policy evaluation problem into a (quadratic) convex-concave saddle point problem, and then present a primal-dual batch gradient method, as well as two stochastic variance reduction methods for solving the problem. These algorithms scale linearly in both sample size and feature dimension. Moreover, they achieve linear convergence even when the saddle-point problem has only strong concavity in the dual variables but no strong convexity in the primal variables. Numerical experiments on benchmark problems demonstrate the effectiveness of our methods.
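The transformation into a saddle-point problem can be made concrete with a short derivation. The notation below is an assumption for illustration and is not quoted from the abstract: $\phi_t$ and $\phi'_t$ are feature vectors of the current and next state, $r_t$ is the reward, $\gamma$ is the discount factor, $\theta$ are the value-function weights, and $w$ is the dual variable. Regularization terms are omitted.

```latex
% Empirical quantities over a fixed dataset of n transitions
\hat{A} = \frac{1}{n}\sum_{t=1}^{n} \phi_t(\phi_t - \gamma\phi'_t)^{\top}, \qquad
\hat{b} = \frac{1}{n}\sum_{t=1}^{n} r_t\,\phi_t, \qquad
\hat{C} = \frac{1}{n}\sum_{t=1}^{n} \phi_t\phi_t^{\top}

% Quadratic objective in the primal variable
\min_{\theta}\; \tfrac{1}{2}\,\bigl\|\hat{A}\theta - \hat{b}\bigr\|^{2}_{\hat{C}^{-1}}

% Equivalent convex-concave saddle-point form
\min_{\theta}\max_{w}\; \Bigl( w^{\top}(\hat{b} - \hat{A}\theta) - \tfrac{1}{2}\, w^{\top}\hat{C}\,w \Bigr)

% Check: the inner maximum is attained at w^{*} = \hat{C}^{-1}(\hat{b} - \hat{A}\theta),
% and substituting it back gives \tfrac{1}{2}(\hat{A}\theta - \hat{b})^{\top}\hat{C}^{-1}(\hat{A}\theta - \hat{b}).
```

The saddle-point form is linear in $\theta$ and strongly concave in $w$ when $\hat{C}$ is positive definite, which matches the setting the abstract describes: strong concavity in the dual variables without strong convexity in the primal variables.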

artificial intelligence, machine learning, reinforcement learning, (12 more...)

arXiv.org Artificial Intelligence

1702.07944

Country: North America > United States > Pennsylvania (0.28)

Genre:

Research Report (0.64)
Workflow (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

aleju/self-driving-truck

@machinelearnbot, Jun-8-2017, 16:50:07 GMT

This repository contains code to train and run a self-driving truck in Euro Truck Simulator 2. The resulting AI will automatically steer, accelerate and brake. It is trained (mostly) via reinforcement learning and only has access to the buttons W, A, S and D (i.e. it cannot directly set the steering wheel angle). The basic training method follows the standard reinforcement learning approach from the original Atari paper. Additionally, a separation of Q-values into V (value) and A (advantage) - as described in Dueling Network Architectures for Deep Reinforcement Learning - is used. Further, the model tries to predict future states and rewards, similar to the description in Deep Successor Reinforcement Learning.
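The separation of Q-values into V and A can be shown with a small numeric sketch. It uses the mean-subtracted aggregation from the dueling architecture paper; whether the repository uses exactly this form is not stated in the summary, and the function name and the ordering of the four actions are assumptions for illustration.

```python
def dueling_q_values(value, advantages):
    """Combine a state value V and per-action advantages A into Q-values.

    Uses Q(s, a) = V(s) + A(s, a) - mean(A(s, .)), so that V and A are
    identifiable from Q.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


# Hypothetical advantages for four actions, e.g. the buttons W, A, S, D.
q = dueling_q_values(2.0, [1.0, 0.0, -1.0, 0.0])
print(q)  # [3.0, 2.0, 1.0, 2.0]
```

Subtracting the mean advantage leaves the ranking of actions unchanged while pinning the average Q-value to V.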

artificial intelligence, machine learning, reinforcement learning, (8 more...)

@machinelearnbot

Industry:

Transportation > Passenger (0.61)
Transportation > Ground > Road (0.61)
Information Technology > Robotics & Automation (0.61)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Efficient Reinforcement Learning via Initial Pure Exploration

Putta, Sudeep Raja, Tulabandhula, Theja

arXiv.org Machine Learning, Jun-7-2017

In several realistic situations, an interactive learning agent can practice and refine its strategy before going on to be evaluated. For instance, consider a student preparing for a series of tests. She would typically take a few practice tests to know which areas she needs to improve upon. Based on the scores she obtains in these practice tests, she would formulate a strategy for maximizing her scores in the actual tests. We treat this scenario in the context of an agent exploring a fixed-horizon episodic Markov Decision Process (MDP), where the agent can practice on the MDP for some number of episodes (not necessarily known in advance) before starting to incur regret for its actions. During practice, the agent's goal must be to maximize the probability of following an optimal policy. This is akin to the problem of Pure Exploration (PE). We extend the PE problem of Multi Armed Bandits (MAB) to MDPs and propose a Bayesian algorithm called Posterior Sampling for Pure Exploration (PSPE), which is similar to its bandit counterpart. We show that the Bayesian simple regret converges at an optimal exponential rate when using PSPE. When the agent starts being evaluated, its goal would be to minimize the cumulative regret incurred. This is akin to the problem of Reinforcement Learning (RL). The agent uses the Posterior Sampling for Reinforcement Learning algorithm (PSRL) initialized with the posteriors of the practice phase. We hypothesize that this PSPE + PSRL combination is an optimal strategy for minimizing regret in RL problems with an initial practice phase. We show empirical results which prove that having a lower simple regret at the end of the practice phase results in having lower cumulative regret during evaluation.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Machine Learning

1706.02237

Country:

North America > United States > Illinois (0.14)
Asia > India (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Project Magenta: Music and Art with Machine Learning (Google I/O '17)

#artificialintelligence, Jun-6-2017, 03:20:21 GMT

Google Brain researcher Douglas Eck will discuss Magenta, a project using TensorFlow to generate art and music with deep nets and reinforcement learning. He'll also talk about how artists and musicians fit into the effort. We'll dive into some of the technical details and challenges faced in building generative models, but no machine learning expertise is required to follow the session. See all the talks from Google I/O '17 here: https://goo.gl/D0D4VE Subscribe to the Google Developers channel: http://goo.gl/mQyv5L

artificial intelligence, machine learning, reinforcement learning, (3 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.32)

Intrinsically motivated model learning for developing curious robots

#artificialintelligence, Jun-2-2017, 15:30:13 GMT

Reinforcement Learning (RL) agents are typically deployed to learn a specific, concrete task based on a pre-defined reward function. However, in some cases an agent may be able to gain experience in the domain prior to being given a task. In such cases, intrinsic motivation can be used to enable the agent to learn a useful model of the environment that is likely to help it learn its eventual tasks more efficiently. This paradigm fits robots particularly well, as they need to learn about their own dynamics and affordances which can be applied to many different tasks. The algorithm learns models of the transition dynamics of a domain using random forests.

artificial intelligence, machine learning, reinforcement learning, (5 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.78)

[R] [1705.06366] Automatic Goal Generation for Reinforcement Learning Agents • r/MachineLearning

@machinelearnbot, Jun-1-2017, 05:00:19 GMT

I wanted to like this paper. Curriculum learning is an area that needs more research, and automating the curriculum process is a good idea. However, the use of a GAN here is completely overkill -- the space of goals here is low dimensional and relatively unstructured. I'd wager any generative modeling technique would've worked for their experiments.

artificial intelligence, automatic goal generation, machine learning, (3 more...)

@machinelearnbot

Industry: Media > News (0.40)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)
