AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Machine Learning: Build a Ml/AI E-Mail Spam Classifier

@machinelearnbotMay-21-2018, 09:26:07 GMT

When people talk about artificial intelligence, they usually don't mean supervised and unsupervised machine learning. These tasks are pretty trivial compared to what we think of AIs doing - playing chess and Go, driving cars, and beating video games at a superhuman level. Reinforcement learning has recently become popular for doing all of that and more. Much like deep learning, a lot of the theory was discovered in the 70s and 80s but it hasn't been until recently that we've been able to observe first hand the amazing results that are possible. In 2016 we saw Google's AlphaGo beat the world Champion in Go. We saw AIs playing video games like Doom and Super Mario.

artificial intelligence, machine learning, reinforcement learning, (5 more...)

@machinelearnbot

Genre:

Instructional Material > Course Syllabus & Notes (0.53)
Instructional Material > Online (0.40)

Industry:

Leisure & Entertainment > Games > Computer Games (0.83)
Education > Educational Technology > Educational Software > Computer Based Training (0.40)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.33)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Add feedback

Scalable Centralized Deep Multi-Agent Reinforcement Learning via Policy Gradients

Khan, Arbaaz, Zhang, Clark, Lee, Daniel D., Kumar, Vijay, Ribeiro, Alejandro

arXiv.org Artificial IntelligenceMay-21-2018

In this paper, we explore using deep reinforcement learning for problems with multiple agents. Most existing methods for deep multi-agent reinforcement learning consider only a small number of agents. When the number of agents increases, the dimensionality of the input and control spaces increase as well, and these methods do not scale well. To address this, we propose casting the multi-agent reinforcement learning problem as a distributed optimization problem. Our algorithm assumes that for multi-agent settings, policies of individual agents in a given population live close to each other in parameter space and can be approximated by a single policy. With this simple assumption, we show our algorithm to be extremely effective for reinforcement learning in multi-agent settings. We demonstrate its effectiveness against existing comparable approaches on co-operative and competitive tasks.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

1805.08776

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Pennsylvania (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.56)

Add feedback

Evolutionary Reinforcement Learning

Khadka, Shauharda, Tumer, Kagan

arXiv.org Machine LearningMay-21-2018

Deep Reinforcement Learning (DRL) algorithms have been successfully applied to a range of challenging control tasks. However, these methods typically suffer from three core difficulties: temporal credit assignment with sparse rewards, lack of effective exploration, and brittle convergence properties that are extremely sensitive to hyperparameters. Collectively, these challenges severely limit the applicability of these approaches to real world problems. Evolutionary Algorithms (EAs), a class of black box optimization techniques inspired by natural evolution, are well suited to address each of these three challenges. However, EAs typically suffer with high sample complexity and struggle to solve problems that require optimization of a large number of parameters. In this paper, we introduce Evolutionary Reinforcement Learning (ERL), a hybrid algorithm that leverages the population of an EA to provide diversified data to train an RL agent, and reinserts the RL agent into the EA population periodically to inject gradient information into the EA. ERL inherits EA's ability of temporal credit assignment with a fitness metric, effective exploration with a diverse set of policies, and stability of a population-based approach and complements it with off-policy DRL's ability to leverage gradients for higher sample efficiency and faster learning. Experiments in a range of challenging continuous control benchmark tasks demonstrate that ERL significantly outperforms prior DRL and EA methods, achieving state-of-the-art performances.

algorithm, arxiv preprint arxiv, exploration, (10 more...)

arXiv.org Machine Learning

1805.07917

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Oregon (0.04)
Europe > Italy > Apulia > Bari (0.04)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment > Games (0.46)
Transportation (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

AGI Safety Literature Review

Everitt, Tom, Lea, Gary, Hutter, Marcus

arXiv.org Artificial IntelligenceMay-21-2018

The development of Artificial General Intelligence (AGI) promises to be a major event. Along with its many potential benefits, it also raises serious safety concerns (Bostrom, 2014). The intention of this paper is to provide an easily accessible and up-to-date collection of references for the emerging field of AGI safety. A significant number of safety problems for AGI have been identified. We list these, and survey recent research on solving them. We also cover works on how best to think of AGI from the limited knowledge we have today, predictions for when AGI will first be created, and what will happen after its creation. Finally, we review the current public policy on AGI.

agent, artificial intelligence, intelligence, (15 more...)

arXiv.org Artificial Intelligence

1805.01109

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Oceania > Australia (0.14)
Asia > China (0.04)
(4 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.34)

Industry:

Leisure & Entertainment > Games (1.00)
Law (1.00)
Information Technology > Security & Privacy (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
(3 more...)

Add feedback

Monte Carlo Q-learning for General Game Playing

Wang, Hui, Emmerich, Michael, Plaat, Aske

arXiv.org Artificial IntelligenceMay-21-2018

After the recent groundbreaking results of AlphaGo, we have seen a strong interest in reinforcement learning in game playing. General Game Playing (GGP) provides a good testbed for reinforcement learning. In GGP, a specification of games rules is given. GGP problems can be solved by reinforcement learning. Q-learning is one of the canonical reinforcement learning methods, and has been used by (Banerjee & Stone, IJCAI 2007) in GGP. In this paper we implement Q-learning in GGP for three small-board games (Tic-Tac-Toe, Connect Four, Hex), to allow comparison to Banerjee et al. As expected, Q-learning converges, although much slower than MCTS. Borrowing an idea from MCTS, we enhance Q-learning with Monte Carlo Search, to give QM-learning. This enhancement improves the performance of pure Q-learning. We believe that QM-learning can also be used to improve performance of reinforcement learning further for larger games, something which we will test in future work.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

1802.05944

Country:

Europe > Netherlands (0.14)
North America > United States (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games > Go (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Verifiable Reinforcement Learning via Policy Extraction

Bastani, Osbert, Pu, Yewen, Solar-Lezama, Armando

arXiv.org Machine LearningMay-21-2018

While deep reinforcement learning has successfully solved many challenging control tasks, its real-world applicability has been limited by the inability to ensure the safety of learned policies. We propose an approach to verifiable reinforcement learning by training decision tree policies, which can represent complex policies (since they are nonparametric), yet can be efficiently verified using existing techniques (since they are highly structured). The challenge is that decision tree policies are difficult to train. We propose VIPER, an algorithm that combines ideas from model compression and imitation learning to learn decision tree policies guided by a DNN policy (called the oracle) and its Q-function, and show that it substantially outperforms two baselines. We use VIPER to (i) learn a provably robust decision tree policy for a variant of Atari Pong with a symbolic state space, (ii) learn a decision tree policy for a toy game based on Pong that provably never loses, and (iii) learn a provably stable decision tree policy for cart-pole. In each case, the decision tree policy achieves performance equal to that of the original DNN policy.

decision tree policy, machine learning, reinforcement learning, (14 more...)

arXiv.org Machine Learning

1805.08328

Country: North America > United States (0.14)

Genre: Research Report (0.50)

Industry: Transportation (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Where Do You Think You're Going?: Inferring Beliefs about Dynamics from Behavior

Reddy, Siddharth, Dragan, Anca D., Levine, Sergey

arXiv.org Machine LearningMay-21-2018

Inferring intent from observed behavior has been studied extensively within the frameworks of Bayesian inverse planning and inverse reinforcement learning. These methods infer a goal or reward function that best explains the actions of the observed agent, typically a human demonstrator. Another agent can use this inferred intent to predict, imitate, or assist the human user. However, a central assumption in inverse reinforcement learning is that the demonstrator is close to optimal. While models of suboptimal behavior exist, they typically assume that suboptimal actions are the result of some type of random noise or a known cognitive bias, like temporal inconsistency. In this paper, we take an alternative approach, and model suboptimal behavior as the result of internal model misspecification: the reason that user actions might deviate from near-optimal actions is that the user has an incorrect set of beliefs about the rules -- the dynamics -- governing how actions affect the environment. Our insight is that while demonstrated actions may be suboptimal in the real world, they may actually be near-optimal with respect to the user's internal model of the dynamics. By estimating these internal beliefs from observed behavior, we arrive at a new method for inferring intent. We demonstrate in simulation and in a user study with 12 participants that this approach enables us to more accurately model human intent, and can be used in a variety of applications, including offering assistance in a shared autonomy framework and inferring human preferences.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

1805.0801

Country: North America > United States (0.67)

Genre:

Research Report > New Finding (1.00)
Questionnaire & Opinion Survey (1.00)

Industry:

Leisure & Entertainment (0.93)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

Add feedback

Guided Feature Transformation (GFT): A Neural Language Grounding Module for Embodied Agents

Yu, Haonan, Lian, Xiaochen, Zhang, Haichao, Xu, Wei

arXiv.org Artificial IntelligenceMay-21-2018

Recently there has been a rising interest in training agents, embodied in virtual environments, to perform language-directed tasks by deep reinforcement learning. In this paper, we propose a simple but effective neural language grounding module for embodied agents that can be trained end to end from scratch taking raw pixels, unstructured linguistic commands, and sparse rewards as the inputs. We model the language grounding process as a language-guided transformation of visual features, where latent sentence embeddings are used as the transformation matrices. In several language-directed navigation tasks that feature challenging partial observation and require simple reasoning, our module significantly outperforms the state of the arts. We also release XWORLD 3D, an easy-to-customize 3D environment that can potentially be modified to evaluate a variety of embodied agents.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

1805.08329

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

Add feedback

Learning Safe Policies with Expert Guidance

Huang, Jessie, Wu, Fa, Precup, Doina, Cai, Yang

arXiv.org Artificial IntelligenceMay-21-2018

We propose a framework for ensuring safe behavior of a reinforcement learning agent when the reward function may be difficult to specify. In order to do this, we rely on the existence of demonstrations from expert policies, and we provide a theoretical framework for the agent to optimize in the space of rewards consistent with its existing knowledge. We propose two methods to solve the resulting optimization: an exact ellipsoid-based method and a method in the spirit of the "follow-the-perturbed-leader" algorithm. Our experiments demonstrate the behavior of our algorithm in both discrete and continuous problems. The trained agent safely avoids states with potential negative effects while imitating the behavior of the expert in the other states.

machine learning, reinforcement learning, separation oracle, (17 more...)

arXiv.org Artificial Intelligence

1805.08313

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Hierarchical Reinforcement Learning with Hindsight

Levy, Andrew, Platt, Robert, Saenko, Kate

arXiv.org Artificial IntelligenceMay-21-2018

Reinforcement Learning (RL) algorithms can suffer from poor sample efficiency when rewards are delayed and sparse. We introduce a solution that enables agents to learn temporally extended actions at multiple levels of abstraction in a sample efficient and automated fashion. Our approach combines universal value functions and hindsight learning, allowing agents to learn policies belonging to different time scales in parallel. We show that our method significantly accelerates learning in a variety of discrete and continuous tasks.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

1805.0818

Genre: Research Report (0.82)

Industry:

Media > Television (0.47)
Leisure & Entertainment (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback