AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Meta Reinforcement Learning with Task Embedding and Shared Policy

Lan, Lin, Li, Zhenguo, Guan, Xiaohong, Wang, Pinghui

arXiv.org Artificial IntelligenceJun-3-2019

Despite significant progress, deep reinforcement learning (RL) suffers from data-inefficiency and limited generalization. Recent efforts apply meta-learning to learn a meta-learner from a set of RL tasks such that a novel but related task could be solved quickly. Though specific in some ways, different tasks in meta-RL are generally similar at a high level. However, most meta-RL methods do not explicitly and adequately model the specific and shared information among different tasks, which limits their ability to learn training tasks and to generalize to novel tasks. In this paper, we propose to capture the shared information on the one hand and meta-learn how to quickly abstract the specific information about a task on the other hand. Methodologically, we train an SGD meta-learner to quickly optimize a task encoder for each task, which generates a task embedding based on past experience. Meanwhile, we learn a policy which is shared across all tasks and conditioned on task embeddings. Empirical results on four simulated tasks demonstrate that our method has better learning capacity on both training and novel tasks and attains up to 3 to 4 times higher returns compared to baselines.

machine learning, reinforcement learning, task encoder, (13 more...)

arXiv.org Artificial Intelligence

1905.06527

Country: Asia > China (0.69)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Extending Deep Model Predictive Control with Safety Augmented Value Estimation from Demonstrations

Thananjeyan, Brijen, Balakrishna, Ashwin, Rosolia, Ugo, Li, Felix, McAllister, Rowan, Gonzalez, Joseph E., Levine, Sergey, Borrelli, Francesco, Goldberg, Ken

arXiv.org Artificial IntelligenceJun-2-2019

Reinforcement learning (RL) for robotics is challenging due to the difficulty in hand-engineering a dense cost function, which can lead to unintended behavior, and dynamical uncertainty, which makes it hard to enforce constraints during learning. We address these issues with a new model-based reinforcement learning algorithm, safety augmented value estimation from demonstrations (SAVED), which uses supervision that only identifies task completion and a modest set of suboptimal demonstrations to constrain exploration and learn efficiently while handling complex constraints. We derive iterative improvement guarantees for SAVED under known stochastic nonlinear systems. We then compare SAVED with 3 state-of-the-art model-based and model-free RL algorithms on 6 standard simulation benchmarks involving navigation and manipulation and 2 real-world tasks on the da Vinci surgical robot. Results suggest that SAVED outperforms prior methods in terms of success rate, constraint satisfaction, and sample efficiency, making it feasible to safely learn complex maneuvers directly on a real robot in less than an hour. For tasks on the robot, baselines succeed less than 5% of the time while SAVED has a success rate of over 75% in the first 50 training iterations.

constraint, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

1905.13402

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
Asia > China > Hong Kong (0.04)

Genre:

Research Report > Promising Solution (0.34)
Research Report > New Finding (0.34)

Industry: Health & Medicine (0.54)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

On the Correctness and Sample Complexity of Inverse Reinforcement Learning

Komanduru, Abi, Honorio, Jean

arXiv.org Machine LearningJun-2-2019

Inverse reinforcement learning (IRL) is the problem of finding a reward function that generates a given optimal policy for a given Markov Decision Process. This paper looks at an algorithmic-independent geometric analysis of the IRL problem with finite states and actions. A L1-regularized Support Vector Machine formulation of the IRL problem motivated by the geometric analysis is then proposed with the basic objective of the inverse reinforcement problem in mind: to find a reward function that generates a specified optimal policy. The paper further analyzes the proposed formulation of inverse reinforcement learning with $n$ states and $k$ actions, and shows a sample complexity of $O(n^2 \log (nk))$ for recovering a reward function that generates a policy that satisfies Bellman's optimality condition with respect to the true transition probabilities.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Machine Learning

1906.00422

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback

Feature-Based Q-Learning for Two-Player Stochastic Games

Jia, Zeyu, Yang, Lin F., Wang, Mengdi

arXiv.org Machine LearningJun-2-2019

Consider a two-player zero-sum stochastic game where the transition function can be embedded in a given feature space. We propose a two-player Q-learning algorithm for approximating the Nash equilibrium strategy via sampling. The algorithm is shown to find an $\epsilon$-optimal strategy using sample size linear to the number of features. To further improve its sample efficiency, we develop an accelerated algorithm by adopting techniques such as variance reduction, monotonicity preservation and two-sided strategy approximation. We prove that the algorithm is guaranteed to find an $\epsilon$-optimal strategy using no more than $\tilde{\mathcal{O}}(K/(\epsilon^{2}(1-\gamma)^{4}))$ samples with high probability, where $K$ is the number of features and $\gamma$ is a discount factor. The sample, time and space complexities of the algorithm are independent of original dimensions of the game.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Machine Learning

1906.00423

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

An Empirical Study on Hyperparameters and their Interdependence for RL Generalization

Song, Xingyou, Du, Yilun, Jackson, Jacob

arXiv.org Artificial IntelligenceJun-2-2019

Recent results in Reinforcement Learning (RL) have shown that agents with limited training environments are susceptible to a large amount of overfitting across many domains. A key challenge for RL generalization is to quantitatively explain the effects of changing parameters on testing performance. Such parameters include architecture, regularization, and RL-dependent variables such as discount factor and action stochasticity. We provide empirical results that show complex and interdependent relationships between hyperparameters and generalization. We further show that several empirical metrics such as gradient cosine similarity and trajectory-dependent metrics serve to provide intuition towards these results.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

1906.00431

Country: North America > United States (0.68)

Genre: Research Report (0.50)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback

Learner-aware Teaching: Inverse Reinforcement Learning with Preferences and Constraints

Tschiatschek, Sebastian, Ghosh, Ahana, Haug, Luis, Devidze, Rati, Singla, Adish

arXiv.org Artificial IntelligenceJun-2-2019

Inverse reinforcement learning (IRL) enables an agent to learn complex behavior by observing demonstrations from a (near-)optimal policy. The typical assumption is that the learner's goal is to match the teacher's demonstrated behavior. In this paper, we consider the setting where the learner has her own preferences that she additionally takes into consideration. These preferences can for example capture behavioral biases, mismatched worldviews, or physical constraints. We study two teaching approaches: learner-agnostic teaching, where the teacher provides demonstrations from an optimal policy ignoring the learner's preferences, and learner-aware teaching, where the teacher accounts for the learner's preferences. We design learner-aware teaching algorithms and show that significant performance improvements can be achieved over learner-agnostic teaching.

learner, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

1906.00429

Genre: Research Report (0.64)

Industry: Education > Educational Technology (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Top 5 Programming Languages For Machine Learning

#artificialintelligenceJun-1-2019, 14:14:48 GMT

Machine learning has been defined by Andrew Ng, a computer scientist at Stanford University, as "the science of getting computers to act without being explicitly programmed." It was first conceived in the 1950s, but experienced limited progress until around the turn of the 21st century. Since then, machine learning has been a driving force behind a number of innovations, most notably artificial intelligence. Machine learning can be broken down into several categories, including supervised, unsupervised, semi-supervised and reinforcement learning. While supervised learning relies on labeled input data in order to infer its relationships with output results, unsupervised learning detects patterns among unlabeled input data. Semi-supervised learning employs a combination of both methods, and reinforcement learning motivates programs to repeat or elaborate on processes with desirable outcomes while avoiding errors.

machine learning, programming language, reinforcement learning, (9 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)

Add feedback

An AI taught itself to play a video game – for the first time, it's beating humans

#artificialintelligenceJun-1-2019, 14:04:25 GMT

Since the earliest days of virtual chess and solitaire, video games have been a playing field for developing artificial intelligence (AI). Each victory of machine against human has helped make algorithms smarter and more efficient. But in order to tackle real world problems – such as automating complex tasks including driving and negotiation – these algorithms must navigate more complex environments than board games, and learn teamwork. Teaching AI how to work and interact with other players to succeed had been an insurmountable task – until now. In a new study, researchers detailed a way to train AI algorithms to reach human levels of performance in a popular 3D multiplayer game – a modified version of Quake III Arena in Capture the Flag mode.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

#artificialintelligence

Industry: Leisure & Entertainment > Games > Computer Games (0.76)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.32)

Add feedback

Harnessing Reinforcement Learning for Neural Motion Planning

Jurgenson, Tom, Tamar, Aviv

arXiv.org Machine LearningJun-1-2019

Motion planning is an essential component in most of today's robotic applications. In this work, we consider the learning setting, where a set of solved motion planning problems is used to improve the efficiency of motion planning on different, yet similar problems. This setting is important in applications with rapidly changing environments such as in e-commerce, among others. We investigate a general deep learning based approach, where a neural network is trained to map an image of the domain, the current robot state, and a goal robot state to the next robot state in the plan. We focus on the learning algorithm, and compare supervised learning methods with reinforcement learning (RL) algorithms. We first establish that supervised learning approaches are inferior in their accuracy due to insufficient data on the boundary of the obstacles, an issue that RL methods mitigate by actively exploring the domain. We then propose a modification of the popular DDPG RL algorithm that is tailored to motion planning domains, by exploiting the known model in the problem and the set of solved plans in the data. We show that our algorithm, dubbed DDPG-MP, significantly improves the accuracy of the learned motion planning policy. Finally, we show that given enough training data, our method can plan significantly faster on novel domains than off-the-shelf sampling based motion planners. Results of our experiments are shown in https://youtu.be/wHQ4Y4mBRb8.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Machine Learning

1906.00214

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report > New Finding (0.93)

Industry: Information Technology (0.66)

Technology:

Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Add feedback

The Principle of Unchanged Optimality in Reinforcement Learning Generalization

Irpan, Alex, Song, Xingyou

arXiv.org Artificial IntelligenceJun-1-2019

Several recent papers have examined generalization in reinforcement learning (RL), by proposing new environments or ways to add noise to existing environments, then benchmarking algorithms and model architectures on those environments. We discuss subtle conceptual properties of RL benchmarks that are not required in supervised learning (SL), and also properties that an RL benchmark should possess. Chief among them is one we call the principle of unchanged optimality: there should exist a single $\pi$ that is optimal across all train and test tasks. In this work, we argue why this principle is important, and ways it can be broken or satisfied due to subtle choices in state representation or model architecture. We conclude by discussing challenges and future lines of research in theoretically analyzing generalization benchmarks.

benchmark, generalization, unchanged optimality, (12 more...)

arXiv.org Artificial Intelligence

1906.00336

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)

Add feedback