AITopics | Reinforcement Learning


Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.


3 ways AI can Advance Advertising

#artificialintelligence | Mar-30-2018, 02:33:25 GMT

Claudia Collu, Chief Commercial Officer, Claudia, writes about AI's impact and how it is transforming advertising. Machines with a mind of their own are now an accepted part of the world we inhabit, with the artificial intelligence (AI) market predicted to grow to more than $47 billion by 2020. According to the Gartner Hype Cycle, machine learning (ML) – a subfield of AI – is currently at the 'peak of inflated expectation', but is just two to five years away from mainstream adoption. Nowhere is the ubiquity of AI more apparent than the advertising industry, where machine learning has already revolutionized media trading, enabling programmatic algorithms to make decisions in real time, based on huge volumes of data. But this is a broad discipline with many intricacies and the advertising industry is already moving from traditional rule-based ML to more fluid algorithms inspired by the psychology of human behavior. A particular strain of machine learning known as deep learning (DL) uses neural networks to mimic human decision-making. The ad industry is increasingly using reinforcement learning (RL) – a specific part of DL – to leverage trial and error, learning just as humans do based on rewards and penalties.

advertiser, machine learning, reinforcement learning, (12 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.38)


Learning to Adapt: Meta-Learning for Model-Based Control

Clavera, Ignasi, Nagabandi, Anusha, Fearing, Ronald S., Abbeel, Pieter, Levine, Sergey, Finn, Chelsea

arXiv.org Machine Learning | Mar-30-2018

Although reinforcement learning methods can achieve impressive results in simulation, the real world presents two major challenges: generating samples is exceedingly expensive, and unexpected perturbations can cause proficient but narrowly-learned policies to fail at test time. In this work, we propose to learn how to quickly and effectively adapt online to new situations as well as to perturbations. To enable sample-efficient meta-learning, we consider learning online adaptation in the context of model-based reinforcement learning. Our approach trains a global model such that, when combined with recent data, the model can be rapidly adapted to the local context. Our experiments demonstrate that our approach can enable simulated agents to adapt their behavior online to novel terrains, to a crippled leg, and in highly-dynamic environments.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

1803.11347

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
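
The online adaptation step described in this abstract can be illustrated with a toy sketch. It covers only the idea of updating a dynamics model from a few recent transitions; it does not implement the paper's meta-training procedure. The one-dimensional model, the parameter `theta`, the learning rate, and all data values are hypothetical choices made for illustration.

```python
def predict(theta, state, action):
    """Hypothetical 1-D dynamics model: next_state = state + theta * action."""
    return state + theta * action


def adapt(theta, recent, lr=0.1, steps=1):
    """Take gradient steps on the mean squared prediction error over recent transitions."""
    for _ in range(steps):
        grad = sum(
            2.0 * (predict(theta, s, a) - s_next) * a for s, a, s_next in recent
        ) / len(recent)
        theta -= lr * grad
    return theta


prior_theta = 1.0  # stands in for parameters learned before deployment (assumption)
true_theta = 0.5   # the environment has changed, e.g. an actuator lost half its strength

# A short window of recent (state, action, next_state) transitions from the changed environment.
recent = [(s, a, s + true_theta * a) for s, a in [(0.0, 1.0), (1.0, -1.0), (0.5, 2.0)]]

adapted_theta = adapt(prior_theta, recent)
print(prior_theta, adapted_theta)  # the adapted value moves toward true_theta
```

A single gradient step already moves the model toward the changed dynamics; a planner using the adapted model would then act on more accurate predictions.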


QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

Rashid, Tabish, Samvelyan, Mikayel, de Witt, Christian Schroeder, Farquhar, Gregory, Foerster, Jakob, Whiteson, Shimon

arXiv.org Machine Learning | Mar-30-2018

In many real-world settings, a team of agents must coordinate their behaviour while acting in a decentralised way. At the same time, it is often possible to train the agents in a centralised fashion in a simulated or laboratory setting, where global state information is available and communication constraints are lifted. Learning joint action-values conditioned on extra state information is an attractive way to exploit centralised learning, but the best strategy for then extracting decentralised policies is unclear. Our solution is QMIX, a novel value-based method that can train decentralised policies in a centralised end-to-end fashion. QMIX employs a network that estimates joint action-values as a complex non-linear combination of per-agent values that condition only on local observations. We structurally enforce that the joint-action value is monotonic in the per-agent values, which allows tractable maximisation of the joint action-value in off-policy learning, and guarantees consistency between the centralised and decentralised policies. We evaluate QMIX on a challenging set of StarCraft II micromanagement tasks, and show that QMIX significantly outperforms existing value-based multi-agent reinforcement learning methods.

artificial intelligence, machine learning, reinforcement learning, (11 more...)

arXiv.org Machine Learning

1803.11485

Country: Europe > United Kingdom > England (0.68)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (0.51)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Simple Reinforcement Learning with Tensorflow Part 0: Q-Learning with Tables and Neural Networks

#artificialintelligence | Mar-29-2018, 20:01:00 GMT

For this tutorial in my Reinforcement Learning series, we are going to be exploring a family of RL algorithms called Q-Learning algorithms. These are a little different than the policy-based algorithms that will be looked at in the following tutorials (Parts 1–3). Instead of starting with a complex and unwieldy deep neural network, we will begin by implementing a simple lookup-table version of the algorithm, and then show how to implement a neural-network equivalent using Tensorflow. Given that we are going back to basics, it may be best to think of this as Part-0 of the series. It will hopefully give an intuition into what is really happening in Q-Learning that we can then build on going forward when we eventually combine the policy gradient and Q-learning approaches to build state-of-the-art RL agents (If you are more interested in Policy Networks, or already have a grasp on Q-Learning, feel free to start the tutorial series here instead).

artificial intelligence, machine learning, reinforcement learning, (17 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.36)
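
The lookup-table version of Q-learning mentioned in this excerpt can be sketched in a few lines. This is not the tutorial's code: the five-state chain environment, the `step` function, and the hyperparameters below are assumptions chosen so the example runs with the standard library alone.

```python
import random

N_STATES, GOAL = 5, 4
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Hypothetical chain environment: reward 1.0 only on reaching the goal state."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # the Q-table: one row per state
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action selection, with random tie-breaking.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
print(greedy_policy)
```

After training, the greedy policy read from the table moves right in every non-terminal state. A neural-network variant replaces the table lookup with a function approximator but keeps the same update target.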


5 Things You Need to Know about Reinforcement Learning

@machinelearnbot | Mar-29-2018, 09:16:20 GMT

Reinforcement Learning is one of the hottest research topics currently and its popularity is only growing day by day. Let's look at 5 useful things to know about RL. Reinforcement Learning (RL) is a type of machine learning technique that enables an agent to learn in an interactive environment by trial and error using feedback from its own actions and experiences. Though both supervised and reinforcement learning use a mapping between input and output, unlike supervised learning, where the feedback provided to the agent is the correct set of actions for performing a task, reinforcement learning uses rewards and punishments as signals for positive and negative behavior. As compared to unsupervised learning, reinforcement learning is different in terms of goals. While the goal in unsupervised learning is to find similarities and differences between data points, in reinforcement learning the goal is to find a suitable action model that would maximize the total cumulative reward of the agent.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

@machinelearnbot

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
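
As a rough illustration of the "total cumulative reward" objective named in this excerpt, the sketch below computes the return of one episode. The discount factor `gamma` is an assumption that the excerpt does not mention; setting it to 1.0 gives the plain sum of rewards.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where a reward k steps in the future is weighted by gamma**k."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


episode_rewards = [1.0, 0.0, 2.0]  # hypothetical rewards from one episode
plain_sum = discounted_return(episode_rewards, gamma=1.0)
discounted = discounted_return(episode_rewards, gamma=0.5)
print(plain_sum, discounted)  # 3.0 1.5
```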


Deep Reinforcement Learning for Traffic Light Control in Vehicular Networks

Liang, Xiaoyuan, Du, Xunsheng, Wang, Guiling, Han, Zhu

arXiv.org Machine Learning | Mar-29-2018

Existing inefficient traffic light control causes numerous problems, such as long delay and waste of energy. To improve efficiency, taking real-time traffic information as an input and dynamically adjusting the traffic light duration accordingly is a must. In terms of how to dynamically adjust traffic signals' duration, existing works either split the traffic signal into equal duration or extract limited traffic information from the real data. In this paper, we study how to decide the traffic signals' duration based on the collected data from different sensors and vehicular networks. We propose a deep reinforcement learning model to control the traffic light. In the model, we quantify the complex traffic scenario as states by collecting data and dividing the whole intersection into small grids. The timing changes of a traffic light are the actions, which are modeled as a high-dimension Markov decision process. The reward is the cumulative waiting time difference between two cycles. To solve the model, a convolutional neural network is employed to map the states to rewards. The proposed model is composed of several components to improve the performance, such as dueling network, target network, double Q-learning network, and prioritized experience replay. We evaluate our model via simulation in the Simulation of Urban MObility (SUMO) in a vehicular network, and the simulation results show the efficiency of our model in controlling traffic lights.

machine learning, reinforcement, reinforcement learning, (17 more...)

arXiv.org Machine Learning

1803.11115

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.34)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
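
Two ingredients named in this abstract, the waiting-time reward and the double Q-learning target, can be sketched as follows. This is not the authors' implementation: the sign convention (positive reward when waiting time drops), the function names, and all numbers are assumptions, and plain lists stand in for the outputs of the online and target networks.

```python
def cycle_reward(prev_cycle_waits, curr_cycle_waits):
    """Reward as the drop in cumulative vehicle waiting time between two cycles
    (assumed sign: positive when waiting time decreases)."""
    return sum(prev_cycle_waits) - sum(curr_cycle_waits)


def double_q_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double Q-learning target: the online values pick the next action,
    the target values evaluate it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]


# Hypothetical per-vehicle waiting times (seconds) in two consecutive cycles.
reward = cycle_reward([12.0, 30.0, 8.0], [10.0, 20.0, 5.0])

# Hypothetical action-values for the next state from the two networks.
target = double_q_target(
    reward,
    next_q_online=[1.0, 3.0, 2.0],
    next_q_target=[0.5, 2.0, 4.0],
    gamma=0.5,
)
print(reward, target)  # 15.0 16.0
```

Note that the target uses the value 2.0 (the target network's estimate for the action the online network prefers) rather than the target network's own maximum of 4.0; decoupling selection from evaluation is what reduces overestimation.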


How an Electrical Engineer Became an Artificial Intelligence Researcher, a Multiphase Active Contours Analysis

Varshney, Kush R.

arXiv.org Machine Learning | Mar-29-2018

This essay examines how what is considered to be artificial intelligence (AI) has changed over time and come to intersect with the expertise of the author. Initially, AI developed on a separate trajectory, both topically and institutionally, from pattern recognition, neural information processing, decision and control systems, and allied topics by focusing on symbolic systems within computer science departments rather than on continuous systems in electrical engineering departments. The separate evolutions continued throughout the author's lifetime, with some crossover in reinforcement learning and graphical models, but were shocked into converging by the virality of deep learning, thus making an electrical engineer into an AI researcher. Now that this convergence has happened, opportunity exists to pursue an agenda that combines learning and reasoning bridged by interpretable machine learning models.

machine learning, reinforcement learning, varshney, (14 more...)

arXiv.org Machine Learning

1803.11261

Country: North America > United States > California (0.14)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.36)


Unsupervised Predictive Memory in a Goal-Directed Agent

Wayne, Greg, Hung, Chia-Chun, Amos, David, Mirza, Mehdi, Ahuja, Arun, Grabska-Barwinska, Agnieszka, Rae, Jack, Mirowski, Piotr, Leibo, Joel Z., Santoro, Adam, Gemici, Mevlana, Reynolds, Malcolm, Harley, Tim, Abramson, Josh, Mohamed, Shakir, Rezende, Danilo, Saxton, David, Cain, Adam, Hillier, Chloe, Silver, David, Kavukcuoglu, Koray, Botvinick, Matt, Hassabis, Demis, Lillicrap, Timothy

arXiv.org Machine Learning | Mar-28-2018

Animals execute goal-directed behaviours despite the limited range and scope of their sensors. To cope, they explore environments and store memories maintaining estimates of important information that is not presently available. Recently, progress has been made with artificial intelligence (AI) agents that learn to perform tasks from sensory input, even at a human level, by merging reinforcement learning (RL) algorithms with deep neural networks, and the excitement surrounding these results has led to the pursuit of related ideas as explanations of non-human animal learning. However, we demonstrate that contemporary RL algorithms struggle to solve simple tasks when enough information is concealed from the sensors of the agent, a property called "partial observability". An obvious requirement for handling partially observed tasks is access to extensive memory, but we show memory is not enough; it is critical that the right information be stored in the right format. We develop a model, the Memory, RL, and Inference Network (MERLIN), in which memory formation is guided by a process of predictive modeling. MERLIN facilitates the solution of tasks in 3D virtual reality environments for which partial observability is severe and memories must be maintained over long durations. Our model demonstrates a single learning agent architecture that can solve canonical behavioural tasks in psychology and neurobiology without strong simplifying assumptions about the dimensionality of sensory input or the duration of experiences.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Machine Learning

1803.10760

Country: North America > Canada (0.28)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry:

Health & Medicine (0.68)
Leisure & Entertainment (0.67)
Media (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Constructing Temporal Abstractions Autonomously in Reinforcement Learning

Bacon, Pierre-Luc (McGill University) | Precup, Doina (McGill University)

AI Magazine | Mar-27-2018

The idea of temporal abstraction, i.e. learning, planning and representing the world at multiple time scales, has been a constant thread in AI research, spanning sub-fields from classical planning and search to control and reinforcement learning. For example, programming a robot typically involves making decisions over a set of controllers, rather than working at the level of motor torques. While temporal abstraction is a very natural concept, learning such abstractions with no human input has proved quite daunting. In this paper, we present a general architecture, called option-critic, which allows learning temporal abstractions automatically, end-to-end, simply from the agent’s experience. This approach allows continual learning and provides interesting qualitative and quantitative results in several tasks.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

AI Magazine

Country:

Europe (1.00)
North America > United States > California (0.29)
North America > Canada > Quebec > Montreal (0.14)

Industry:

Education (1.00)
Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


Deep Q-Learning for Self-Organizing Networks Fault Management and Radio Performance Improvement

Mismar, Faris B., Evans, Brian L.

arXiv.org Machine Learning | Mar-27-2018

We propose a method to improve the radio link performance in a wireless network using a deep Q-Learning based algorithm. In this paper, we use this reinforcement learning model to allow the wireless network cluster to self-heal by performing certain fault management actions that improve the radio link performance of this wireless network. The main contributions of this paper are: 1) introduce a radio performance tuning algorithm that self-organizing networks can implement in polynomial runtime, 2) employ deep reinforcement learning to perform fault management, and 3) show that this fault management method can improve the radio link performance in a realistic network setup. Simulation results show that an optimal action sequence to clear alarms is feasible even against the randomness of the network faults and user movements.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Machine Learning

1707.02329

Country: North America > United States > Texas (0.28)

Genre: Research Report (0.70)

Industry: Telecommunications (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
