AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.
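
The trial-and-error idea in the quotation can be illustrated with a minimal epsilon-greedy bandit. This is only a sketch under stated assumptions: the function name `epsilon_greedy`, the three reward values, and the deterministic rewards are all hypothetical and are not taken from Sutton and Barto's text.

```python
import random

def epsilon_greedy(true_rewards, n_steps=500, epsilon=0.1, seed=0):
    """Estimate each action's reward purely by trying actions."""
    rng = random.Random(seed)
    n_arms = len(true_rewards)
    estimates = [0.0] * n_arms   # running mean reward per action
    counts = [0] * n_arms        # how often each action was tried
    for _ in range(n_steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                           # explore
        else:
            arm = max(range(n_arms), key=estimates.__getitem__)   # exploit
        reward = true_rewards[arm]   # assumption: rewards are noise-free
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

estimates, counts = epsilon_greedy([0.1, 0.5, 0.9])
best_arm = max(range(len(estimates)), key=estimates.__getitem__)
```

The learner is never told which action is best; it only sees the reward of the action it tried, and the occasional random action is what lets it find the higher-paying one.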

Elon Musk's lab forced bots to create their own language

#artificialintelligence | Mar-20-2017, 01:20:10 GMT

Have you ever experienced the dread of overhearing two people, speaking a language you don't understand, begin laughing wildly? You just have to wonder what it is they're talking about, and if it's a joke at your expense. Heck, maybe you even check your teeth to make sure you aren't walking around with half of your lunchtime ham sandwich stuck to your gums. As Wired reports, researchers at OpenAI have made some huge strides in getting bots to communicate with each other, and without actually telling them how to do so. The group published a research paper earlier this week explaining exactly how they were able to accomplish the complex task, and it's all based on reinforcement learning.

machine learning, natural language, reinforcement learning, (6 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.80)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Learning from the Hindsight Plan -- Episodic MPC Improvement

Tamar, Aviv, Thomas, Garrett, Zhang, Tianhao, Levine, Sergey, Abbeel, Pieter

arXiv.org Artificial Intelligence | Mar-20-2017

Model predictive control (MPC) is a popular control method that has proved effective for robotics, among other fields. MPC performs re-planning at every time step. Re-planning is done with a limited horizon per computational and real-time constraints and often also for robustness to potential model errors. However, the limited horizon leads to suboptimal performance. In this work, we consider the iterative learning setting, where the same task can be repeated several times, and propose a policy improvement scheme for MPC. The main idea is that between executions we can, offline, run MPC with a longer horizon, resulting in a hindsight plan. To bring the next real-world execution closer to the hindsight plan, our approach learns to re-shape the original cost function with the goal of satisfying the following property: short horizon planning (as realistic during real executions) with respect to the shaped cost should result in mimicking the hindsight plan. This effectively consolidates long-term reasoning into the short-horizon planning. We empirically evaluate our approach in contact-rich manipulation tasks both in simulated and real environments, such as peg insertion by a real PR2 robot.
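
To make the horizon trade-off concrete, here is a toy receding-horizon planner. It is a sketch under stated assumptions, not the authors' method: the one-dimensional world, the cost bump at position 2, and the names `plan` and `run_mpc` are hypothetical, and the cost-shaping step the abstract describes is not implemented.

```python
from itertools import product

GOAL = 5
ACTIONS = (-1, 0, 1)

def step(x, a):
    """Move along a line, clamped to the range [0, GOAL]."""
    return min(max(x + a, 0), GOAL)

def stage_cost(x):
    """Distance to the goal, plus a hypothetical cost bump at x == 2."""
    return abs(GOAL - x) + (2 if x == 2 else 0)

def plan(x, horizon):
    """Brute-force the cheapest action sequence of the given length."""
    best_cost, best_seq = None, None
    for seq in product(ACTIONS, repeat=horizon):
        cost, xi = 0, x
        for a in seq:
            xi = step(xi, a)
            cost += stage_cost(xi)
        if best_cost is None or cost < best_cost:
            best_cost, best_seq = cost, seq
    return best_seq

def run_mpc(x0, horizon, n_steps=8):
    """Re-plan at every step and execute only the first planned action."""
    x, traj = x0, [x0]
    for _ in range(n_steps):
        x = step(x, plan(x, horizon)[0])
        traj.append(x)
    return traj

short = run_mpc(0, horizon=1)   # [0, 1, 1, 1, 1, 1, 1, 1, 1]
long = run_mpc(0, horizon=3)    # [0, 1, 2, 3, 4, 5, 5, 5, 5]
```

With a horizon of 1 the planner refuses to step onto the costly cell and stalls, while a horizon of 3 sees past it. The longer-horizon run plays a role loosely analogous to the hindsight plan; the paper's contribution is reshaping the cost so that short-horizon planning reproduces such a plan.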

downstream oil & gas, mpc, neural network, (22 more...)

arXiv.org Artificial Intelligence

1609.09001

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas > Downstream (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
(2 more...)

A Survey of Available Corpora for Building Data-Driven Dialogue Systems

Serban, Iulian Vlad, Lowe, Ryan, Henderson, Peter, Charlin, Laurent, Pineau, Joelle

arXiv.org Artificial Intelligence | Mar-20-2017

During the past decade, several areas of speech and language understanding have witnessed substantial breakthroughs from the use of data-driven models. In the area of dialogue systems, the trend is less obvious, and most practical systems are still built through significant engineering and expert knowledge. Nevertheless, several recent results suggest that data-driven approaches are feasible and quite promising. To facilitate research in this area, we have carried out a wide survey of publicly available datasets suitable for data-driven learning of dialogue systems. We discuss important characteristics of these datasets, how they can be used to learn diverse dialogue strategies, and their other potential uses. We also examine methods for transfer learning between datasets and the use of external knowledge. Finally, we discuss appropriate choice of evaluation metrics for the learning objective.

information retrieval, machine learning, reinforcement learning, (25 more...)

arXiv.org Artificial Intelligence

1512.05742

Country:

North America > United States (1.00)
Europe (1.00)

Genre:

Overview (1.00)
Research Report > New Finding (0.34)

Industry:

Media > Television (1.00)
Media > Film (1.00)
Health & Medicine (1.00)
(5 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(13 more...)

Value Iteration Networks

Tamar, Aviv, Wu, Yi, Thomas, Garrett, Levine, Sergey, Abbeel, Pieter

arXiv.org Artificial Intelligence | Mar-20-2017

We introduce the value iteration network (VIN): a fully differentiable neural network with a 'planning module' embedded within. VINs can learn to plan, and are suitable for predicting outcomes that involve planning-based reasoning, such as policies for reinforcement learning. Key to our approach is a novel differentiable approximation of the value-iteration algorithm, which can be represented as a convolutional neural network, and trained end-to-end using standard backpropagation. We evaluate VIN based policies on discrete and continuous path-planning domains, and on a natural-language based search task. We show that by learning an explicit planning computation, VIN policies generalize better to new, unseen domains.
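
The link between value iteration and convolution can be seen in a plain NumPy sketch on a grid world. This shows only classical value iteration written as shift-and-max operations over a 2-D array; it is not the VIN architecture, nothing here is learned or differentiable, and the grid, reward layout, and function name are assumptions made for illustration.

```python
import numpy as np

def value_iteration_grid(reward, gamma=0.9, n_iters=50):
    """Value iteration on a grid with deterministic up/down/left/right/stay moves."""
    v = np.zeros_like(reward, dtype=float)
    for _ in range(n_iters):
        # Edge padding means that walking into a wall leaves the agent in place.
        padded = np.pad(v, 1, mode="edge")
        neighbours = np.stack([
            padded[:-2, 1:-1],    # value of the cell above
            padded[2:, 1:-1],     # value of the cell below
            padded[1:-1, :-2],    # value of the cell to the left
            padded[1:-1, 2:],     # value of the cell to the right
            padded[1:-1, 1:-1],   # value of staying put
        ])
        # Bellman backup: local reward plus the discounted best neighbour value.
        v = reward + gamma * neighbours.max(axis=0)
    return v

reward = np.zeros((4, 4))
reward[3, 3] = 1.0   # hypothetical goal cell
values = value_iteration_grid(reward)
```

Each backup reads only a fixed local neighbourhood of the value map and then takes a maximum across actions, which is the same access pattern as a convolution followed by a max over channels.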

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

1602.02867

Genre: Research Report > New Finding (0.68)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Top 10 technologies for 2017

FOX News | Mar-17-2017, 18:10:35 GMT

The technologies making waves in 2017 include brain implants and quantum computers. Here is a list of the top 10 technologies that are expected to be prevalent this year, according to MIT. At the top of the list is behavior-reinforced artificial intelligence, whether that means mastering the complex game of Go and beating a champion or learning to merge a self-driving car into traffic. The technology is based on reinforcement learning, documented more than 100 years ago by psychologist Edward Thorndike.

artificial intelligence, machine learning, reinforcement learning, (6 more...)

FOX News

Country:

Asia > Middle East > Syria (0.06)
Asia > China (0.06)

Genre: Research Report > Promising Solution (0.40)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.37)
Energy > Renewable > Solar (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.57)

[1509.03044] Recurrent Reinforcement Learning: A Hybrid Approach

@machinelearnbot | Mar-17-2017, 16:35:09 GMT

artificial intelligence, machine learning, recurrent reinforcement learning, (3 more...)

@machinelearnbot

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.52)

Revisiting stochastic off-policy action-value gradients

Okesanjo, Yemi, Kofia, Victor

arXiv.org Machine Learning | Mar-12-2017

Off-policy stochastic actor-critic methods rely on approximating the stochastic policy gradient in order to derive an optimal policy. One may also derive the optimal policy by approximating the action-value gradient. The use of action-value gradients is desirable as policy improvement occurs along the direction of steepest ascent. This has been studied extensively within the context of natural gradient actor-critic algorithms and more recently within the context of deterministic policy gradients. In this paper we briefly discuss the off-policy stochastic counterpart to deterministic action-value gradients, as well as an incremental approach for following the policy gradient in lieu of the natural gradient.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

1703.02102

Country: North America > Canada > Ontario > Toronto (0.15)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.51)

Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning

Anschel, Oron, Baram, Nir, Shimkin, Nahum

arXiv.org Artificial Intelligence | Mar-10-2017

Instability and variability of Deep Reinforcement Learning (DRL) algorithms tend to adversely affect their performance. Averaged-DQN is a simple extension to the DQN algorithm, based on averaging previously learned Q-value estimates, which leads to a more stable training procedure and improved performance by reducing approximation error variance in the target values. To understand the effect of the algorithm, we examine the source of value function estimation errors and provide an analytical comparison within a simplified model. We further present experiments on the Arcade Learning Environment benchmark that demonstrate significantly improved stability and performance due to the proposed extension.
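
The averaging idea can be sketched with tabular Q-values in place of deep networks. This is an illustration under assumptions, not the authors' implementation: the function name `averaged_target`, the tabular representation, and the example numbers are hypothetical.

```python
from collections import deque
import numpy as np

def averaged_target(snapshots, reward, next_state, gamma=0.99, done=False):
    """Bootstrap target built from the mean of several earlier Q estimates."""
    if done:
        return reward
    # Average the Q-values of the next state across all stored snapshots,
    # then take the greedy maximum over actions of that average.
    q_avg = np.mean([q[next_state] for q in snapshots], axis=0)
    return reward + gamma * q_avg.max()

# Two earlier Q tables (2 states x 2 actions) that disagree about state 1.
snapshots = deque(maxlen=3)   # keeps only the most recent estimates
snapshots.append(np.array([[0.0, 0.0], [1.0, 3.0]]))
snapshots.append(np.array([[0.0, 0.0], [3.0, 1.0]]))

target = averaged_target(snapshots, reward=0.0, next_state=1, gamma=0.5)   # 1.0
single = 0.0 + 0.5 * snapshots[-1][1].max()                                # 1.5
```

Each snapshot alone overestimates a different action at 3.0; averaging first gives 2.0 for both, so the max operator has less noise to latch onto.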

artificial intelligence, machine learning, reinforcement learning, (12 more...)

arXiv.org Artificial Intelligence

1611.01929

Country: Asia > Middle East (0.28)

Genre: Research Report (0.40)

Industry: Education (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

What can you do with a rock? Affordance extraction via word embeddings

Fulda, Nancy, Ricks, Daniel, Murdoch, Ben, Wingate, David

arXiv.org Artificial Intelligence | Mar-9-2017

Autonomous agents must often detect affordances: the set of behaviors enabled by a situation. Affordance detection is particularly helpful in domains with large action spaces, allowing the agent to prune its search space by avoiding futile behaviors. This paper presents a method for affordance extraction via word embeddings trained on a Wikipedia corpus. The resulting word vectors are treated as a common knowledge database which can be queried using linear algebra. We apply this method to a reinforcement learning agent in a text-only environment and show that affordance-based action selection improves performance most of the time. Our method increases the computational complexity of each learning step but significantly reduces the total number of steps needed. In addition, the agent's action selections begin to resemble those a human would choose.
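
One way to query word vectors with linear algebra is a vector-offset analogy ranked by cosine similarity. The sketch below assumes that style of query; the abstract does not spell out the exact procedure. The three-dimensional vectors are hand-made toy values, not embeddings trained on Wikipedia, and `rank_verbs` is a hypothetical helper.

```python
import numpy as np

# Toy embeddings (assumption): axes loosely mean [action, edible, throwable].
EMB = {
    "apple": np.array([0.0, 1.0, 0.0]),
    "rock":  np.array([0.0, 0.0, 1.0]),
    "eat":   np.array([1.0, 1.0, 0.0]),
    "throw": np.array([1.0, 0.0, 1.0]),
    "sing":  np.array([1.0, 0.0, 0.0]),
}
VERBS = ("eat", "throw", "sing")

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def rank_verbs(noun, canon_noun="apple", canon_verb="eat"):
    """Rank candidate verbs for a noun by analogy with a canonical noun-verb pair."""
    # "apple is to eat as <noun> is to ?": shift the noun by the canonical offset.
    query = EMB[noun] + (EMB[canon_verb] - EMB[canon_noun])
    return sorted(VERBS, key=lambda verb: cosine(query, EMB[verb]), reverse=True)

ranked = rank_verbs("rock")   # ['throw', 'sing', 'eat']
```

An agent could keep only the top-ranked verbs for each object it encounters, which is the kind of action-space pruning the abstract describes.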

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

1703.03429

Genre: Research Report (0.50)

Industry:

Leisure & Entertainment > Games > Computer Games (0.95)
Information Technology (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Sample Efficient Feature Selection for Factored MDPs

Guo, Zhaohan Daniel, Brunskill, Emma

arXiv.org Machine Learning | Mar-9-2017

In reinforcement learning, the state of the real world is often represented by feature vectors. However, not all of the features may be pertinent for solving the current task. We propose Feature Selection Explore and Exploit (FS-EE), an algorithm that automatically selects the necessary features while learning a Factored Markov Decision Process, and prove that under mild assumptions, its sample complexity scales with the in-degree of the dynamics of just the necessary features, rather than the in-degree of all features. This can result in a much better sample complexity when the in-degree of the necessary features is smaller than the in-degree of all features.
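
To see why in-degree matters, the sketch below counts the rows of each feature's conditional transition table in a factored model. This is a rough proxy for how much must be learned, not the FS-EE algorithm or its sample-complexity bound. The feature names, the parent sets, and the choice of binary features are all hypothetical.

```python
def cpt_rows(parents, n_values=2, n_actions=1):
    """Rows in each feature's transition table: one per (parent assignment, action)."""
    return {
        feature: n_actions * n_values ** len(feature_parents)
        for feature, feature_parents in parents.items()
    }

# Hypothetical dynamics: each feature maps to the features its next value depends on.
PARENTS = {
    "position": ["position", "velocity"],
    "velocity": ["velocity"],
    "clutter": ["clutter", "light", "noise", "position"],
    "light": ["light"],
    "noise": ["noise"],
}
NECESSARY = ("position", "velocity")   # assumption: only these affect the task

rows = cpt_rows(PARENTS)
largest_necessary = max(rows[f] for f in NECESSARY)   # 4  (in-degree 2)
largest_overall = max(rows.values())                  # 16 (in-degree 4)
```

The table size grows exponentially with in-degree, so a learner that can restrict itself to the necessary features avoids the largest tables entirely.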

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Machine Learning

1703.03454

Country: North America > United States (0.46)

Genre: Research Report (0.64)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)
