AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

Machine Theory of Mind

Rabinowitz, Neil C., Perbet, Frank, Song, H. Francis, Zhang, Chiyuan, Eslami, S. M. Ali, Botvinick, Matthew

arXiv.org Artificial Intelligence, Feb-21-2018

Theory of mind (ToM; Premack & Woodruff, 1978) broadly refers to humans' ability to represent the mental states of others, including their desires, beliefs, and intentions. We propose to train a machine to build such models too. We design a Theory of Mind neural network -- a ToMnet -- which uses meta-learning to build models of the agents it encounters, from observations of their behaviour alone. Through this process, it acquires a strong prior model for agents' behaviour, as well as the ability to bootstrap to richer predictions about agents' characteristics and mental states using only a small number of behavioural observations. We apply the ToMnet to agents behaving in simple gridworld environments, showing that it learns to model random, algorithmic, and deep reinforcement learning agents from varied populations, and that it passes classic ToM tasks such as the "Sally-Anne" test (Wimmer & Perner, 1983; Baron-Cohen et al., 1985) of recognising that others can hold false beliefs about the world. We argue that this system -- which autonomously learns how to model other agents in its world -- is an important step forward for developing multi-agent AI systems, for building intermediating technology for machine-human interaction, and for advancing the progress on interpretable AI.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

1802.0774

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Learning to Gather without Communication

Mhamdi, El Mahdi El, Guerraoui, Rachid, Maurer, Alexandre, Tempez, Vladislav

arXiv.org Machine Learning, Feb-21-2018

A standard belief on emerging collective behavior is that it emerges from simple individual rules. Most of the mathematical research on such collective behavior starts from imperative individual rules, like "always go to the center". But how could an (optimal) individual rule emerge during a short period within the group lifetime, especially if communication is not available? We argue that such rules can actually emerge in a group in a short span of time via collective (multi-agent) reinforcement learning, i.e., learning via rewards and punishments. We consider the gathering problem: several agents (social animals, swarming robots...) must gather around the same position, which is not determined in advance. They must do so without communication on their planned decision, just by looking at the position of other agents. We present the first experimental evidence that a gathering behavior can be learned without communication in a partially observable environment. The learned behavior has the same properties as a self-stabilizing distributed algorithm, as processes can gather from any initial state (and thus tolerate any transient failure). Besides, we show that it is possible to tolerate the brutal loss of up to 90% of agents without significant impact on the behavior.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

1802.07834

Country:

North America > United States > Massachusetts (0.46)
North America > United States > California (0.28)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.47)

Ordered Preference Elicitation Strategies for Supporting Multi-Objective Decision Making

Zintgraf, Luisa M, Roijers, Diederik M, Linders, Sjoerd, Jonker, Catholijn M, Nowé, Ann

arXiv.org Machine Learning, Feb-21-2018

In multi-objective decision planning and learning, much attention is paid to producing optimal solution sets that contain an optimal policy for every possible user preference profile. We argue that the step that follows, i.e., determining which policy to execute by maximising the user's intrinsic utility function over this (possibly infinite) set, is under-studied. This paper aims to fill this gap. We build on previous work on Gaussian processes and pairwise comparisons for preference modelling, extend it to the multi-objective decision support scenario, and propose new ordered preference elicitation strategies based on ranking and clustering. Our main contribution is an in-depth evaluation of these strategies using computer and human-based experiments. We show that our proposed elicitation strategies outperform the currently used pairwise methods, and find that users prefer ranking most. Our experiments further show that utilising monotonicity information in GPs, by using a linear prior mean at the start and virtual comparisons to the nadir and ideal points, increases performance. We demonstrate our decision support framework in a real-world study on traffic regulation, conducted with the city of Amsterdam.

decision support system, machine learning, reinforcement learning, (20 more...)

arXiv.org Machine Learning

1802.07606

Country:

Europe > Belgium (0.28)
Europe > Netherlands > North Holland > Amsterdam (0.25)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Decision Support Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)
(5 more...)

Clipped Action Policy Gradient

Fujita, Yasuhiro, Maeda, Shin-ichi

arXiv.org Machine Learning, Feb-21-2018

Many continuous control tasks have bounded action spaces and clip out-of-bound actions before execution. Policy gradient methods often optimize policies as if actions were not clipped. We propose clipped action policy gradient (CAPG) as an alternative policy gradient estimator that exploits the knowledge of actions being clipped to reduce the variance in estimation. We prove that CAPG is unbiased and achieves lower variance than the original estimator that ignores action bounds. Experimental results demonstrate that CAPG generally outperforms the original estimator, indicating its promise as a better policy gradient estimator for continuous control tasks.
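As a rough illustration of what "exploiting the knowledge of actions being clipped" can mean, the sketch below scores the action that is actually executed rather than the raw sample. It assumes a one-dimensional Gaussian policy and fixed bounds; the function names (`clip`, `clipped_log_prob`) and this particular likelihood are illustrative assumptions, not the estimator defined in the paper, which the abstract does not spell out.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mean, std):
    """Cumulative distribution function of the same Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def clip(action, low, high):
    """Clip a raw action into the bounded action space [low, high]."""
    return min(max(action, low), high)


def clipped_log_prob(raw_action, mean, std, low, high):
    """Log-likelihood of the executed (clipped) action.

    Assumption for illustration: every raw sample at or beyond a bound
    executes as that bound, so it is scored by the probability mass of the
    whole tail instead of the density at the raw sample.
    """
    if raw_action <= low:
        return math.log(normal_cdf(low, mean, std))
    if raw_action >= high:
        return math.log(1.0 - normal_cdf(high, mean, std))
    return math.log(normal_pdf(raw_action, mean, std))
```

With this scoring, two raw samples that clip to the same bound receive the same log-likelihood, for example `clipped_log_prob(-5.0, 0.0, 1.0, -1.0, 1.0)` and `clipped_log_prob(-2.0, 0.0, 1.0, -1.0, 1.0)`. Intuitively, that removes variation that has no effect on the executed action.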

machine learning, reinforcement learning, variance, (14 more...)

arXiv.org Machine Learning

1802.07564

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)

Convergent Actor-Critic Algorithms Under Off-Policy Training and Function Approximation

Maei, Hamid Reza

arXiv.org Artificial Intelligence, Feb-21-2018

We present the first class of policy-gradient algorithms that work with both state-value and policy function-approximation, and are guaranteed to converge under off-policy training. Our solution targets problems in reinforcement learning where the action representation adds to the curse of dimensionality; that is, with continuous or large action sets, thus making it infeasible to estimate state-action value functions (Q functions). Using state-value functions helps to lift the curse and as a result naturally turns our policy-gradient solution into a classical Actor-Critic architecture whose Actor uses the state-value function for the update. Our algorithms, Gradient Actor-Critic and Emphatic Actor-Critic, are derived based on the exact gradient of the averaged state-value function objective and thus are guaranteed to converge to its optimal solution, while maintaining all the desirable properties of classical Actor-Critic methods with no additional hyper-parameters. To our knowledge, this is the first time that convergent off-policy learning methods have been extended to classical Actor-Critic methods with function approximation.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

1802.07842

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.84)

Rover Descent: Learning to optimize by learning to navigate on prototypical loss surfaces

Faury, Louis, Vasile, Flavian

arXiv.org Machine Learning, Feb-20-2018

Learning to optimize - the idea that we can learn from data algorithms that optimize a numerical criterion - has recently been at the heart of a growing number of research efforts. One of the most challenging issues within this approach is to learn a policy that is able to optimize over classes of functions that are fairly different from the ones that it was trained on. We propose a novel way of framing learning to optimize as a problem of learning a good navigation policy on a partially observable loss surface. To this end, we develop Rover Descent, a solution that allows us to learn a fairly broad optimization policy from training on a small set of prototypical two-dimensional surfaces that encompasses the classically hard cases such as valleys, plateaus, cliffs and saddles and by using strictly zero-order information. We show that, without having access to gradient or curvature information, we achieve state-of-the-art convergence speed on optimization problems not presented at training time such as the Rosenbrock function and other hard cases in two dimensions. We extend our framework to optimize over high dimensional landscapes, while still handling only two-dimensional local landscape information and show good preliminary results.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

1801.07222

Country: Europe (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
(2 more...)

Predict Responsibly: Increasing Fairness by Learning To Defer

Madras, David, Pitassi, Toniann, Zemel, Richard

arXiv.org Machine Learning, Feb-20-2018

In many high-stakes ML applications, there are multiple decision-makers involved, both automated and human. The interaction between these agents often goes unaddressed in algorithmic development. In this work, we explore a simple version of this interaction with a two-stage framework containing an automated model and an external decision-maker. The model can choose to say IDK, and pass the decision downstream, as explored in rejection learning. We extend this concept by proposing learning to defer, which generalizes the rejection learning framework by considering the effect of the other agents in the decision-making process. We propose a learning algorithm which accounts for potential biases held by external decision-makers in a system. Experiments on real-world datasets demonstrate that learning to defer can make a system not only more accurate but also less biased. Even when operated by highly biased users, we show that deferring models can still greatly improve the fairness of the entire system.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

1711.06664

Country:

North America > United States (0.28)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)

What is Artificial General Intelligence? – Towards Data Science

#artificialintelligence, Feb-19-2018, 10:42:32 GMT

Artificial Intelligence is a branch of Computer Science (or Science) which deals with the creation of intelligent systems. Intelligent systems are those systems which possess intelligence just like humans. The science of Artificial Intelligence is not new; the term Artificial Intelligence has been mentioned in manuscripts of Ancient Greece and Egypt. The Greeks believed in the god Hephaestus, also known as the God of Blacksmiths. According to Greek mythology, Hephaestus made intelligent weapons for all the gods. In their view, the goal of Artificial Intelligence is to be helpful for people in achieving a certain goal, to be able to operate automatically, and to be programmed in advance to react in different ways depending on the situation. The term Artificial Intelligence has also become popular in the field of entertainment: we can see lots of movies based on the concept of superintelligence.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

#artificialintelligence

Country:

Europe > Greece (0.25)
Africa > Middle East > Egypt (0.25)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.41)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.36)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Reinforcement learning woes, robot doggos, Amazon's homegrown AI chips, and more

#artificialintelligence, Feb-18-2018, 22:07:56 GMT

Here's a brief roundup of some interesting news from the AI world from the past two weeks, beyond what we've already reported. TL;DR: Deep RL sucks – A Google engineer has published a long, detailed blog post explaining the current frustrations in deep reinforcement learning, and why it doesn't live up to the hype. Reinforcement learning makes good headlines. Teaching agents to play games like Go well enough to beat human experts like Ke Jie fuels the man versus machine narrative. But a closer look at deep reinforcement learning, a method of machine learning used to train computers to complete a specific task, shows the practice is riddled with problems.

amazon, machine learning, reinforcement learning, (15 more...)

#artificialintelligence

Industry: Information Technology > Services (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Introduction to Various Reinforcement Learning Algorithms. Part I (Q-Learning, SARSA, DQN, DDPG)

#artificialintelligence, Feb-18-2018, 18:05:15 GMT

Typically, an RL setup is composed of two components, an agent and an environment. The environment refers to the object that the agent is acting on (e.g., the game itself in an Atari game), while the agent represents the RL algorithm. The environment starts by sending a state to the agent, which then takes an action in response to that state based on its knowledge. After that, the environment sends a pair of next state and reward back to the agent. The agent will update its knowledge with the reward returned by the environment to evaluate its last action.
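The agent-environment loop described above can be sketched in a few lines. This is a minimal illustration using tabular Q-learning, one of the algorithms named in the article title; the `CorridorEnv` environment, the `QLearningAgent` class, and all hyper-parameter values are hypothetical choices made for this example and do not come from the article.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: walk right along a corridor to reach the goal."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        # The environment starts by sending an initial state to the agent.
        self.state = 0
        return self.state

    def step(self, action):
        # Action 0 moves left, action 1 moves right; the position stays in bounds.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        # The environment sends the next state and the reward back to the agent.
        return self.state, reward, done


class QLearningAgent:
    """Tabular Q-learning agent; its 'knowledge' is the table of action values."""

    def __init__(self, n_states, n_actions=2, alpha=0.5, gamma=0.9,
                 epsilon=0.1, seed=0):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.alpha = alpha
        self.gamma = gamma
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def act(self, state):
        # Explore with probability epsilon, otherwise act greedily.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q[state]))
        row = self.q[state]
        best = max(row)
        # Break ties between equally valued actions at random.
        return self.rng.choice([a for a, v in enumerate(row) if v == best])

    def update(self, state, action, reward, next_state, done):
        # Use the returned reward to re-evaluate the last action.
        target = reward if done else reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


def train(episodes=200, max_steps=100):
    env = CorridorEnv()
    agent = QLearningAgent(env.length)
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            action = agent.act(state)
            next_state, reward, done = env.step(action)
            agent.update(state, action, reward, next_state, done)
            state = next_state
            if done:
                break
    return agent
```

Calling `train()` returns the trained agent; inspecting `agent.q` shows, for each non-terminal state, how the agent values moving left versus moving right.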

artificial intelligence, machine learning, reinforcement learning, (17 more...)

#artificialintelligence

Industry: Leisure & Entertainment > Games > Computer Games (0.58)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
