AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.
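
The trial-and-error idea in the quotation can be illustrated with a minimal sketch. It assumes a multi-armed bandit with Gaussian rewards and an epsilon-greedy learner; the function name, the reward means, and the parameter values are hypothetical and are not taken from Sutton and Barto's text.

```python
import random

def run_epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Learn action values by trial and error on a Gaussian-reward bandit."""
    rng = random.Random(seed)
    n_actions = len(true_means)
    estimates = [0.0] * n_actions   # running mean reward per action
    counts = [0] * n_actions        # how often each action was tried
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action to discover its reward.
            action = rng.randrange(n_actions)
        else:
            # Exploit: pick the action with the highest current estimate.
            action = max(range(n_actions), key=lambda a: estimates[a])
        # The learner only sees a noisy numerical reward, never the "right" action.
        reward = rng.gauss(true_means[action], 1.0)
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward
    return estimates, counts, total_reward

# Hypothetical reward means, chosen only for illustration.
estimates, counts, total = run_epsilon_greedy([0.1, 0.5, 0.9])
print(estimates, counts, total)
```

The learner is never told which action is best; it has to spend some steps on exploration to find out, which is the contrast with supervised learning that the quotation draws.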

RLgraph: Flexible Computation Graphs for Deep Reinforcement Learning

Schaarschmidt, Michael, Mika, Sven, Fricke, Kai, Yoneki, Eiko

arXiv.org Artificial Intelligence, Oct-21-2018

Reinforcement learning (RL) tasks are challenging to implement, execute and test due to algorithmic instability, hyper-parameter sensitivity, and heterogeneous distributed communication patterns. We argue for the separation of logical component composition, backend graph definition, and distributed execution. To this end, we introduce RLgraph, a library for designing and executing high performance RL computation graphs in both static graph and define-by-run paradigms. The resulting implementations yield high performance across different deep learning frameworks and distributed backends.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

1810.09028

Genre: Research Report (0.66)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

The Building Blocks of Reinforcement Learning: DeepMind Open Sources TRFL

#artificialintelligence, Oct-20-2018, 05:54:57 GMT

Deep reinforcement learning (DRL) has often been described as the future of artificial intelligence (AI). Some of the most important AI breakthroughs of the last few years, such as DeepMind's AlphaGo and OpenAI Five for Dota 2, have been based on DRL. Despite its importance, implementing DRL models remains an incredibly challenging exercise and, for the most part, we have very little idea of which pieces make up an efficient DRL solution. Earlier this week, DeepMind open sourced TRFL (pronounced "truffle", of course), a framework that collects a series of useful building blocks for DRL models. Most of the current wave of DRL methods originated in academic environments and have not been tested in real-world implementations.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Supervising strong learners by amplifying weak experts

Christiano, Paul, Shlegeris, Buck, Amodei, Dario

arXiv.org Artificial Intelligence, Oct-19-2018

Many real world learning tasks involve complex or hard-to-specify objectives, and using an easier-to-specify proxy can lead to poor performance or misaligned behavior. One solution is to have humans provide a training signal by demonstrating or judging performance, but this approach fails if the task is too complicated for a human to directly evaluate. We propose Iterated Amplification, an alternative training strategy which progressively builds up a training signal for difficult problems by combining solutions to easier subproblems. Iterated Amplification is closely related to Expert Iteration (Anthony et al., 2017; Silver et al., 2017b), except that it uses no external reward function. We present results in algorithmic environments, showing that Iterated Amplification can efficiently learn complex behaviors.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

1810.08575

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Transfer Learning versus Multi-agent Learning regarding Distributed Decision-Making in Highway Traffic

Schutera, Mark, Goby, Niklas, Neumann, Dirk, Reischl, Markus

arXiv.org Artificial Intelligence, Oct-19-2018

Transportation and traffic are currently undergoing a rapid increase in terms of both scale and complexity. At the same time, an increasing share of traffic participants are being transformed into agents driven or supported by artificial intelligence, resulting in mixed-intelligence traffic. This work explores the implications of distributed decision-making in mixed-intelligence traffic. The investigations are carried out on the basis of an online-simulated highway scenario, namely the MIT DeepTraffic simulation. In the first step, traffic agents are trained by means of a deep reinforcement learning approach, being deployed inside an elitist evolutionary algorithm for hyperparameter search. The resulting architectures and training parameters are then utilized in order to either train a single autonomous traffic agent and transfer the learned weights onto a multi-agent scenario or else to conduct multi-agent learning directly. Both learning strategies are evaluated on different ratios of mixed-intelligence traffic. The strategies are assessed according to the average speed of all agents driven by artificial intelligence. Traffic patterns that provoke a reduction in traffic flow are analyzed with respect to the different strategies.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

1810.08515

Country: Europe > Germany > Baden-Württemberg (0.15)

Genre: Research Report (1.00)

Industry: Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

ProMP: Proximal Meta-Policy Search

Rothfuss, Jonas, Lee, Dennis, Clavera, Ignasi, Asfour, Tamim, Abbeel, Pieter

arXiv.org Machine Learning, Oct-17-2018

Credit assignment in Meta-reinforcement learning (Meta-RL) is still poorly understood. Existing methods either neglect credit assignment to pre-adaptation behavior or implement it naively. This leads to poor sample-efficiency during meta-training as well as ineffective task identification strategies. This paper provides a theoretical analysis of credit assignment in gradient-based Meta-RL. Building on the gained insights, we develop a novel meta-learning algorithm that overcomes both the issue of poor credit assignment and previous difficulties in estimating meta-policy gradients. By controlling the statistical distance of both pre-adaptation and adapted policies during meta-policy search, the proposed algorithm enables efficient and stable meta-learning. Our approach leads to superior pre-adaptation policy behavior and consistently outperforms previous Meta-RL algorithms in sample-efficiency, wall-clock time, and asymptotic performance.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Machine Learning

1810.06784

Genre: Research Report > New Finding (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Multi-Agent Fully Decentralized Off-Policy Learning with Linear Convergence Rates

Cassano, Lucas, Yuan, Kun, Sayed, Ali H.

arXiv.org Machine Learning, Oct-17-2018

In this paper we develop a fully decentralized algorithm for policy evaluation with off-policy learning, linear function approximation, and $O(n)$ complexity in both computation and memory requirements. The proposed algorithm is of the variance-reduced kind and achieves linear convergence. We consider the case where a collection of agents have distinct, fixed-size datasets gathered following different behavior policies (none of which is required to explore the full state space), and they all collaborate to evaluate a common target policy. The network approach allows all agents to converge to the optimal solution even in situations where no agent can converge on its own without cooperation. We provide simulations to illustrate the effectiveness of the method.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Machine Learning

1810.07792

Country:

Europe (0.92)
North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > Canada > Quebec (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Holodeck - High Fidelity Simulator for Reinforcement Learning and Robotics Research

#artificialintelligence, Oct-16-2018, 13:36:28 GMT

This post presents the first public release of Holodeck, a high-fidelity simulator built on top of Unreal Engine 4 (UE4). Holodeck is a Python package that can be used for research, classes, or even fun. It lets users download pre-built worlds and interact with them through a simple, high-level interface. At present, the release comprises a simple sphere robot, a UAV (quadcopter), an Android, and a navigation agent. It also comes with 6 diverse default worlds. On what principles is Holodeck built?

artificial intelligence, machine learning, reinforcement learning, (6 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Robots (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.44)

Integrating kinematics and environment context into deep inverse reinforcement learning for predicting off-road vehicle trajectories

Zhang, Yanfu, Wang, Wenshan, Bonatti, Rogerio, Maturana, Daniel, Scherer, Sebastian

arXiv.org Artificial Intelligence, Oct-16-2018

Predicting the motion of a mobile agent from a third-person perspective is an important component for many robotics applications, such as autonomous navigation and tracking. With accurate motion prediction of other agents, robots can plan for more intelligent behaviors to achieve specified objectives, instead of acting in a purely reactive way. Previous work addresses motion prediction by either only filtering kinematics, or using hand-designed and learned representations of the environment. Instead of separating kinematic and environmental context, we propose a novel approach to integrate both into an inverse reinforcement learning (IRL) framework for trajectory prediction. Instead of exponentially increasing the state-space complexity with kinematics, we propose a two-stage neural network architecture that considers motion and environment together to recover the reward function. The first-stage network learns feature representations of the environment using low-level LiDAR statistics and the second-stage network combines those learned features with kinematics data. We collected over 30 km of off-road driving data and validated experimentally that our method can effectively extract useful environmental and kinematic features. We generate accurate predictions of the distribution of future trajectories of the vehicle, encoding complex behaviors such as multi-modal distributions at road intersections, and even show different predictions at the same intersection depending on the vehicle's speed.

machine learning, reinforcement learning, trajectory, (15 more...)

arXiv.org Artificial Intelligence

1810.07225

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Simple Regret Minimization for Contextual Bandits

Deshmukh, Aniket Anand, Sharma, Srinagesh, Cutler, James W., Moldwin, Mark, Scott, Clayton

arXiv.org Machine Learning, Oct-16-2018

There are two variants of the classical multi-armed bandit (MAB) problem that have received considerable attention from machine learning researchers in recent years: contextual bandits and simple regret minimization. Contextual bandits are a sub-class of MABs where, at every time step, the learner has access to side information that is predictive of the best arm. Simple regret minimization assumes that the learner only incurs regret after a pure exploration phase. In this work, we study simple regret minimization for contextual bandits. Motivated by applications where the learner has separate training and autonomous modes, we assume that the learner experiences a pure exploration phase, where feedback is received after every action but no regret is incurred, followed by a pure exploitation phase in which regret is incurred but there is no feedback. We present the Contextual-Gap algorithm and establish performance guarantees on the simple regret, i.e., the regret during the pure exploitation phase. Our experiments examine a novel application to adaptive sensor selection for magnetic field estimation in interplanetary spacecraft, and demonstrate considerable improvement over algorithms designed to minimize the cumulative regret.
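
The two-phase setting described in this abstract can be sketched as follows. This is not the Contextual-Gap algorithm: it is a plain uniform-exploration baseline, assuming a small discrete set of equally likely contexts and Gaussian rewards. The function name, the reward table, and the parameters are hypothetical.

```python
import random

def explore_then_exploit(means, explore_steps=3000, seed=0):
    """means[c][a] is a hypothetical expected reward of arm a in context c."""
    rng = random.Random(seed)
    n_contexts, n_arms = len(means), len(means[0])
    sums = [[0.0] * n_arms for _ in range(n_contexts)]
    counts = [[0] * n_arms for _ in range(n_contexts)]

    # Pure exploration: feedback after every action, no regret charged.
    for _ in range(explore_steps):
        c = rng.randrange(n_contexts)
        a = rng.randrange(n_arms)  # uniform exploration, a deliberate simplification
        sums[c][a] += rng.gauss(means[c][a], 1.0)
        counts[c][a] += 1

    def estimate(c, a):
        # Arms never tried in a context get the lowest possible estimate.
        return sums[c][a] / counts[c][a] if counts[c][a] else float("-inf")

    # Pure exploitation: commit to the empirically best arm in each context.
    policy = [max(range(n_arms), key=lambda a: estimate(c, a))
              for c in range(n_contexts)]

    # Simple regret: expected gap to the best arm, averaged over contexts.
    regret = sum(max(means[c]) - means[c][policy[c]]
                 for c in range(n_contexts)) / n_contexts
    return policy, regret

policy, regret = explore_then_exploit([[0.2, 0.8], [0.9, 0.1]])
print(policy, regret)
```

Only the quality of the final committed policy is measured here; rewards collected during exploration do not count, which is what separates simple regret from cumulative regret.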

data mining, machine learning, reinforcement learning, (20 more...)

arXiv.org Machine Learning

1810.07371

Country: North America > United States > Michigan (0.28)

Genre: Research Report > Experimental Study (0.34)

Industry: Energy (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)
Information Technology > Data Science > Data Mining > Big Data (0.66)

At Human Speed: Deep Reinforcement Learning with Action Delay

Firoiu, Vlad, Ju, Tina, Tenenbaum, Josh

arXiv.org Artificial Intelligence, Oct-16-2018

There has been a recent explosion in the capabilities of game-playing artificial intelligence. Many classes of tasks, from video games to motor control to board games, are now solvable by fairly generic algorithms, based on deep learning and reinforcement learning, that learn to play from experience with minimal prior knowledge. However, these machines often do not win through intelligence alone -- they possess vastly superior speed and precision, allowing them to act in ways a human never could. To level the playing field, we restrict the machine's reaction time to a human level, and find that standard deep reinforcement learning methods quickly drop in performance. We propose a solution to the action delay problem inspired by human perception -- to endow agents with a neural predictive model of the environment which "undoes" the delay inherent in their environment -- and demonstrate its efficacy against professional players in Super Smash Bros. Melee, a popular console fighting game.
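
The reaction-time restriction mentioned in this abstract can be pictured as a fixed-length action queue. The sketch below is an assumption about how such a delay might be imposed and is not taken from the paper; the class name, the `noop` placeholder action, and the delay of two steps are hypothetical. It does not implement the predictive model the authors propose.

```python
from collections import deque

class ActionDelay:
    """Delay every action by `delay` steps; a no-op fills the initial gap."""

    def __init__(self, delay, noop=0):
        self.delay = delay
        self.noop = noop
        self.queue = deque([noop] * delay)

    def push(self, action):
        """Submit the agent's chosen action and return the one executed now."""
        self.queue.append(action)
        return self.queue.popleft()

    def reset(self):
        """Refill the queue with no-ops at the start of an episode."""
        self.queue = deque([self.noop] * self.delay)

delayed = ActionDelay(delay=2, noop=0)
executed = [delayed.push(a) for a in [5, 6, 7, 8]]
print(executed)  # [0, 0, 5, 6]
```

With a delay of two steps, the action chosen at step t only takes effect at step t + 2, so the agent is always acting on a stale view of the game unless it predicts ahead.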

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

1810.07286

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)
