AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Sad robot: Expert says that robots could become so life-like that they will develop mental illnesses too

#artificialintelligenceMay-9-2018, 11:06:34 GMT

It's fair to say that our world has reached a point where technology is so advanced that robots are almost expected to be lifelike – but what about robots that develop mental illnesses, hallucinations and depression like human beings do? Is this just science fiction, or can we really expect artificial intelligence to grow even more similar to humans in the not-so-distant future? Back in March, New York University hosted a symposium in New York City called Canonical Computations in Brains and Machines, where a group of neuroscientists and experts in the field of artificial intelligence spoke about overlaps in the ways in which human beings and machines think and process information. According to one of these neuroscientists – Zachary Mainen of the Champalimaud Centre for the Unknown – we might expect advanced machines to soon be able to experience some of the same mental problems that people do. "I'm drawing on the field of computational psychiatry, which assumes we can learn about a patient who's depressed or hallucinating from studying AI algorithms like reinforcement learning. If you reverse the arrow, why wouldn't an AI be subject to the sort of things that go wrong with patients?"

artificial intelligence, machine learning, reinforcement learning, (7 more...)

#artificialintelligence

Country: North America > United States > New York (0.47)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.95)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.38)

Add feedback

Deep Reinforcement Learning for Optimal Control of Space Heating

Nagy, Adam, Kazmi, Hussain, Cheaib, Farah, Driesen, Johan

arXiv.org Machine LearningMay-9-2018

Classical methods to control heating systems are often marred by suboptimal performance, inability to adapt to dynamic conditions and unreasonable assumptions e.g. existence of building models. This paper presents a novel deep reinforcement learning algorithm which can control space heating in buildings in a computationally efficient manner, and benchmarks it against other known techniques. The proposed algorithm outperforms rule based control by between 5-10% in a simulation environment for a number of price signals. We conclude that, while not optimal, the proposed algorithm offers additional practical advantages such as faster computation times and increased robustness to non-stationarities in building dynamics.

artificial intelligence, controller, energy conservation, (19 more...)

arXiv.org Machine Learning

1805.03777

Country: Europe > Belgium (0.14)

Genre: Research Report (1.00)

Industry:

Energy > Oil & Gas (0.96)
Construction & Engineering > HVAC (0.89)
Energy > Renewable > Geothermal (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Deep Reinforcement Learning and GANs LiveLessons: Advanced Topics in Deep Learning

#artificialintelligenceMay-8-2018, 21:27:38 GMT

Overview Deep Reinforcement Learning and GANs LiveLessons is an introduction to two of the most exciting topics in Deep Learning today.

artificial intelligence, deep learning, machine learning, (7 more...)

#artificialintelligence

Industry: Leisure & Entertainment > Games (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Race for the Galaxy AI

#artificialintelligenceMay-8-2018, 00:41:16 GMT

What makes a game replayable over time? It offers new challenges over and over again. One way to do that is to include an AI opponent that is so skilled, even advanced players will continue to be challenged after hundreds of hours of play. Race has been one of the top selling boardgames this year partly because of the neural network that powers its AI. Race for the Galaxy uses a temporal difference neural network.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

#artificialintelligence

Industry: Leisure & Entertainment > Games > Backgammon (0.33)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Reward Estimation for Variance Reduction in Deep Reinforcement Learning

Romoff, Joshua, Piché, Alexandre, Henderson, Peter, Francois-Lavet, Vincent, Pineau, Joelle

arXiv.org Artificial IntelligenceMay-8-2018

In reinforcement learning (RL), stochastic environments can make learning a policy difficult due to high degrees of variance. As such, variance reduction methods have been investigated in other works, such as advantage estimation and control-variates estimation. Here, we propose to learn a separate reward estimator to train the value function, to help reduce variance caused by a noisy reward signal. This results in theoretical reductions in variance in the tabular case, as well as empirical improvements in both the function approximation and tabular settings in environments where rewards are stochastic. To do so, we use a modified version of Advantage Actor Critic (A2C) on variations of Atari games.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

1805.03359

Country:

North America > Canada > Quebec > Montreal (0.15)
North America > United States > New Jersey > Mercer County > Princeton (0.04)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games > Computer Games (0.56)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Robust Deep Reinforcement Learning for Security and Safety in Autonomous Vehicle Systems

Ferdowsi, Aidin, Challita, Ursula, Saad, Walid, Mandayam, Narayan B.

arXiv.org Artificial IntelligenceMay-8-2018

To operate effectively in tomorrow's smart cities, autonomous vehicles (AVs) must rely on intra-vehicle sensors such as camera and radar as well as inter-vehicle communication. Such dependence on sensors and communication links exposes AVs to cyber-physical (CP) attacks by adversaries that seek to take control of the AVs by manipulating their data. Thus, to ensure safe and optimal AV dynamics control, the data processing functions at AVs must be robust to such CP attacks. To this end, in this paper, the state estimation process for monitoring AV dynamics, in presence of CP attacks, is analyzed and a novel adversarial deep reinforcement learning (RL) algorithm is proposed to maximize the robustness of AV dynamics control to CP attacks. The attacker's action and the AV's reaction to CP attacks are studied in a game-theoretic framework. In the formulated game, the attacker seeks to inject faulty data to AV sensor readings so as to manipulate the inter-vehicle optimal safe spacing and potentially increase the risk of AV accidents or reduce the vehicle flow on the roads. Meanwhile, the AV, acting as a defender, seeks to minimize the deviations of spacing so as to ensure robustness to the attacker's actions. Since the AV has no information about the attacker's action and due to the infinite possibilities for data value manipulations, the outcome of the players' past interactions are fed to long-short term memory (LSTM) blocks. Each player's LSTM block learns the expected spacing deviation resulting from its own action and feeds it to its RL algorithm. Then, the the attacker's RL algorithm chooses the action which maximizes the spacing deviation, while the AV's RL algorithm tries to find the optimal action that minimizes such deviation.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

1805.00983

Country: North America > United States (1.00)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

On Improving Deep Reinforcement Learning for POMDPs

Zhu, Pengfei, Li, Xin, Poupart, Pascal, Miao, Guanghui

arXiv.org Machine LearningMay-8-2018

Deep Reinforcement Learning (RL) recently emerged as one of the most competitive approaches for learning in sequential decision making problems with fully observable environments, e.g., computer Go. However, very little work has been done in deep RL to handle partially observable environments. We propose a new architecture called Action-specific Deep Recurrent Q-Network (ADRQN) to enhance learning performance in partially observable domains. Actions are encoded by a fully connected layer and coupled with a convolutional observation to form an action-observation pair. The time series of action-observation pairs are then integrated by an LSTM layer that learns latent states based on which a fully connected layer computes Q-values as in conventional Deep Q-Networks (DQNs). We demonstrate the effectiveness of our new architecture in several partially observable domains, including flickering Atari games.

artificial intelligence, deep reinforcement learning, machine learning, (1 more...)

arXiv.org Machine Learning

1804.06309

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games > Computer Games (0.53)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)

Add feedback

Advanced AI: Deep Reinforcement Learning in Python

@machinelearnbotMay-7-2018, 14:05:10 GMT

This course is all about the application of deep learning and neural networks to reinforcement learning. If you've taken my first reinforcement learning class, then you know that reinforcement learning is on the bleeding edge of what we can do with AI. Specifically, the combination of deep learning with reinforcement learning has led to AlphaGo beating a world champion in the strategy game Go, it has led to self-driving cars, and it has led to machines that can play video games at a superhuman level. Reinforcement learning has been around since the 70s but none of this has been possible until now. The world is changing at a very fast pace. The state of California is changing their regulations so that self-driving car companies can test their cars without a human in the car to supervise.

machine learning, reinforcement, reinforcement learning, (11 more...)

@machinelearnbot

Country: North America > United States > California (0.26)

Genre: Instructional Material > Course Syllabus & Notes (0.71)

Industry:

Leisure & Entertainment > Games (0.92)
Information Technology (0.78)
Automobiles & Trucks (0.78)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.65)

Add feedback

Planning and Learning with Stochastic Action Sets

Boutilier, Craig, Cohen, Alon, Daniely, Amit, Hassidim, Avinatan, Mansour, Yishay, Meshi, Ofer, Mladenov, Martin, Schuurmans, Dale

arXiv.org Artificial IntelligenceMay-7-2018

In many practical uses of reinforcement learning (RL) the set of actions available at a given state is a random variable, with realizations governed by an exogenous stochastic process. Somewhat surprisingly, the foundations for such sequential decision processes have been unaddressed. In this work, we formalize and investigate MDPs with stochastic action sets (SAS-MDPs) to provide these foundations. We show that optimal policies and value functions in this model have a structure that admits a compact representation. From an RL perspective, we show that Q-learning with sampled action sets is sound. In model-based settings, we consider two important special cases: when individual actions are available with independent probabilities; and a sampling-based model for unknown distributions. We develop poly-time value and policy iteration methods for both cases; and in the first, we offer a poly-time linear programming solution.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

arXiv.org Artificial Intelligence

1805.02363

Country: North America > United States (0.67)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)

Add feedback

Ranking for Relevance and Display Preferences in Complex Presentation Layouts

Oosterhuis, Harrie, de Rijke, Maarten

arXiv.org Artificial IntelligenceMay-7-2018

Learning to Rank has traditionally considered settings where given the relevance information of objects, the desired order in which to rank the objects is clear. However, with today's large variety of users and layouts this is not always the case. In this paper, we consider so-called complex ranking settings where it is not clear what should be displayed, that is, what the relevant items are, and how they should be displayed, that is, where the most relevant items should be placed. These ranking settings are complex as they involve both traditional ranking and inferring the best display order. Existing learning to rank methods cannot handle such complex ranking settings as they assume that the display order is known beforehand. To address this gap we introduce a novel Deep Reinforcement Learning method that is capable of learning complex rankings, both the layout and the best ranking given the layout, from weak reward signals. Our proposed method does so by selecting documents and positions sequentially, hence it ranks both the documents and positions, which is why we call it the Double-Rank Model (DRM). Our experiments show that DRM outperforms all existing methods in complex ranking settings, thus it leads to substantial ranking improvements in cases where the display order is not known a priori.

display order, machine learning, reinforcement learning, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3209978.3209992

1805.02404

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback