AITopics

1705.05427

Country: North America > United States > Michigan (0.28)

Genre:

Research Report (0.64)
Workflow (0.54)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Viejo, Guillaume, Girard, Benoît, Procyk, Emmanuel, Khamassi, Mehdi

Adaptive coordination of working-memory and reinforcement learning in non-human primates performing a trial-and-error problem solving task

arXiv.org Artificial Intelligence, Nov-2-2017

Accumulating evidence suggests that human behavior in trial-and-error learning tasks based on decisions between discrete actions may involve a combination of reinforcement learning (RL) and working memory (WM). While understanding the brain activity at stake in this type of task often involves comparison with non-human primate neurophysiological results, it is not clear whether monkeys use similar combined RL and WM processes to solve these tasks. Here we analyzed the behavior of five monkeys with computational models combining RL and WM. Our model-based analysis approach makes it possible to fit not only trial-by-trial choices but also transient slowdowns in reaction times, indicative of WM use. We found that the behavior of the five monkeys was better explained in terms of a combination of RL and WM despite inter-individual differences. The same coordination dynamics we used in a previous study in humans best explained the behavior of some monkeys while the behavior of others showed the opposite pattern, revealing possibly different dynamics of the WM process. We further analyzed different variants of the tested models to open a discussion on how the long pretraining in these tasks may have favored particular coordination dynamics between RL and WM. This points towards either inter-species differences or protocol differences which could be further tested in humans.

machine learning, reaction time, reinforcement learning, (17 more...)

doi: 10.1016/j.bbr.2017.09.030

1711.00698

Country:

Europe > United Kingdom > England (0.46)
North America > Canada (0.28)
Europe > France (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial Intelligence, Nov-2-2017

Shallow Updates for Deep Reinforcement Learning

Levine, Nir, Zahavy, Tom, Mankowitz, Daniel J., Tamar, Aviv, Mannor, Shie

Deep reinforcement learning (DRL) methods such as the Deep Q-Network (DQN) have achieved state-of-the-art results in a variety of challenging, high-dimensional domains. This success is mainly attributed to the power of deep neural networks to learn rich domain representations for approximating the value function or policy. Batch reinforcement learning methods with linear representations, on the other hand, are more stable and require less hyperparameter tuning. Yet, substantial feature engineering is necessary to achieve good results. In this work we propose a hybrid approach -- the Least Squares Deep Q-Network (LS-DQN), which combines rich feature representations learned by a DRL algorithm with the stability of a linear least squares method. We do this by periodically re-training the last hidden layer of a DRL network with a batch least squares update. Key to our approach is a Bayesian regularization term for the least squares update, which prevents over-fitting to the more recent data. We tested LS-DQN on five Atari games and demonstrate significant improvement over vanilla DQN and Double-DQN. We also investigated the reasons for the superior performance of our method. Interestingly, we found that the performance improvement can be attributed to the large batch size used by the LS method when optimizing the last layer.
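The idea of refitting a linear last layer with a regularized batch least squares update can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the "Bayesian regularization term" acts like a Gaussian prior centred on the current last-layer weights, that the regression targets have already been computed elsewhere, and that a single output (one action) is fitted. The names `solve_linear`, `regularized_ls_update`, `w_prior` and `lam` are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def regularized_ls_update(features, targets, w_prior, lam):
    """Minimise ||features w - targets||^2 + lam * ||w - w_prior||^2.

    features: list of feature vectors (assumed last-hidden-layer activations)
    targets:  list of scalar regression targets (assumed precomputed)
    w_prior:  current last-layer weights, used as the prior mean (assumption)
    lam:      regularization strength, lam > 0
    """
    d = len(w_prior)
    # Normal equations: (Phi^T Phi + lam I) w = Phi^T y + lam w_prior
    a = [[sum(f[i] * f[j] for f in features) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(f[i] * y for f, y in zip(features, targets)) + lam * w_prior[i]
         for i in range(d)]
    return solve_linear(a, b)


phi = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [2.0, -1.0, 1.0]
print(regularized_ls_update(phi, y, [0.5, 0.5], 1.0))
```

With a very small `lam` the result approaches the plain least squares fit; with a very large `lam` it stays close to `w_prior`, which is how such a term can limit over-fitting to the most recent batch.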

artificial intelligence, machine learning, reinforcement learning, (17 more...)

1705.07461

Country:

North America > United States (0.46)
Asia > Middle East > Israel (0.15)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Iwaki, Ryo, Asada, Minoru

On- and Off-Policy Monotonic Policy Improvement

arXiv.org Machine Learning, Nov-1-2017

Monotonic policy improvement and off-policy learning are two main desirable properties for reinforcement learning algorithms. In this paper, by lower bounding the performance difference of two policies, we show that the monotonic policy improvement is guaranteed from on- and off-policy mixture samples. An optimization procedure which applies the proposed bound can be regarded as an off-policy natural policy gradient method. In order to support the theoretical result, we provide a trust region policy optimization method using experience replay as a naive application of our bound, and evaluate its performance in two classical benchmark problems.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Machine Learning

1710.03442

Country: Asia > Japan (0.15)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)

Lawson, Dieterich, Chiu, Chung-Cheng, Tucker, George, Raffel, Colin, Swersky, Kevin, Jaitly, Navdeep

Learning Hard Alignments with Variational Inference

arXiv.org Artificial Intelligence, Nov-1-2017

There has recently been significant interest in hard attention models for tasks such as object recognition, visual captioning and speech recognition. Hard attention can offer benefits over soft attention such as decreased computational cost, but training hard attention models can be difficult because of the discrete latent variables they introduce. Previous work used REINFORCE and Q-learning to approach these issues, but those methods can provide high-variance gradient estimates and be slow to train. In this paper, we tackle the problem of learning hard attention for a sequential task using variational inference methods, specifically the recently introduced VIMCO and NVIL. Furthermore, we propose a novel baseline that adapts VIMCO to this setting. We demonstrate our method on a phoneme recognition task in clean and noisy environments and show that our method outperforms REINFORCE, with the difference being greater for a more complicated task.
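For readers unfamiliar with VIMCO, the following minimal sketch shows the standard leave-one-out learning signal it uses to reduce gradient variance. This is the generic VIMCO baseline, not the novel baseline proposed in this paper. The function names and the toy log-weights are hypothetical, and at least two samples are assumed.

```python
import math


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def vimco_learning_signals(log_weights):
    """Return the multi-sample bound and one learning signal per sample.

    log_weights: log importance weights of K >= 2 samples.
    For sample j, the baseline is the bound recomputed with that sample's
    log-weight replaced by the mean of the other log-weights (that is,
    the geometric mean of the other weights).
    """
    k = len(log_weights)
    bound = log_mean_exp(log_weights)
    signals = []
    for j in range(k):
        others = log_weights[:j] + log_weights[j + 1:]
        substitute = sum(others) / (k - 1)
        baseline = log_mean_exp(others + [substitute])
        signals.append(bound - baseline)
    return bound, signals


bound, signals = vimco_learning_signals([0.0, 0.0, math.log(4.0)])
print(bound, signals)
```

A sample whose weight is larger than its peers receives a positive signal and the others a negative one, so each sample is credited relative to the rest of the batch rather than by the raw bound.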

artificial intelligence, machine learning, reinforcement learning, (21 more...)

1705.05524

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Sridharan, Mohan (The University of Auckland)

Integrating Knowledge Representation, Reasoning, and Learning for Human-Robot Interaction

Robots interacting with humans often have to represent and reason with different descriptions of incomplete domain knowledge and uncertainty, and revise this knowledge over time. Towards achieving these capabilities, the architecture described in this paper combines the complementary strengths of declarative programming, probabilistic graphical models, and reinforcement learning. For any given goal, non-monotonic logical reasoning with a coarse-resolution representation of the domain is used to compute a tentative plan of abstract actions. Each abstract action is implemented as a sequence of concrete actions by reasoning probabilistically over the relevant part of a fine-resolution representation tightly coupled to the coarse-resolution representation. The outcomes of executing the concrete actions are used for subsequent reasoning at the coarse resolution. Furthermore, the task of interactively learning axioms governing action capabilities, preconditions and effects is posed as a relational reinforcement learning problem, using decision tree regression and sampling to construct and generalize over candidate axioms. These capabilities are illustrated in simulation and on a physical robot moving objects to specific people or locations in an indoor domain.

human-robot interaction, integrating knowledge representation, reasoning, (1 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.44)
Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.40)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.40)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.40)

Stocco, Andrea (University of Washington)

An Integrated Computational Framework for Attention, Reinforcement Learning, and Working Memory

This paper proposes a reinterpretation of selective attention as a form of control of working memory based on self-generated reward signals and model-free reinforcement learning. In addition to being simple and parsimonious, this approach systematizes a number of classic psychological constructs without calling for additional, specific mechanisms. Finally, the paper presents the results of an empirical test of this framework, and elaborates on the implications of our findings for general models of control and intelligent behavior, as well as neurobiological models of the basal ganglia.

integrated computational framework, machine learning, reinforcement learning, (2 more...)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.60)
Information Technology > Artificial Intelligence > Cognitive Science (0.53)

A Framework Using Machine Vision and Deep Reinforcement Learning for Self-Learning Moving Objects in a Virtual Environment

Wu, Richard (University of Massachusetts Dartmouth) | Zhao, Ying (Naval Postgraduate School) | Clarke, Alan (Naval Postgraduate School) | Kendall, Anthony (Naval Postgraduate School)

In recent artificial intelligence (AI) research, convolutional neural networks (CNNs) have been used to create artificial agents capable of self-learning. Self-learning autonomous moving objects utilize machine vision techniques based on processing and recognizing objects in digital images. Afterwards, deep reinforcement learning (Deep-RL) is applied to understand and learn intelligent actions and controls. The objective of our research is to study how machine vision and deep machine learning algorithms can be implemented in a virtual world (e.g., a computer game) for moving objects (e.g., vehicles or aircraft) to improve their navigation and detection of threats in real life. In this paper, we create a framework for generating data from computer games and using it in CNNs and Deep-RL to perform intelligent actions. We show the initial results of applying the framework and identify various military applications that may benefit from this research.

artificial intelligence, machine learning, vision and deep reinforcement learning, (4 more...)

Industry: Leisure & Entertainment > Games > Computer Games (0.44)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.53)

Dasgupta, Prithviraj (University of Nebraska at Omaha) | Collins, Joseph (Naval Research Laboratory)

Position Paper: Towards a Repeated Bayesian Stackelberg Game Model for Robustness Against Adversarial Learning

machine learning, reinforcement learning, repeated bayesian stackelberg game model, (5 more...)

In this position paper, we propose a game theoretic formulation of the adversarial learning problem called a Repeated Bayesian Stackelberg Game (RBSG) that can be used by a prediction mechanism to make itself robust against adversarial examples.

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.60)

Toward Supervised Reinforcement Learning with Partial States for Social HRI

Senft, Emmanuel (Plymouth University) | Lemaignan, Séverin (Plymouth University) | Baxter, Paul (University of Lincoln) | Belpaeme, Tony (Plymouth University)

Social interaction is a complex task for which machine learning holds particular promise. However, as no sufficiently accurate simulator of human interactions exists today, the learning of social interaction strategies has to happen online in the real world. Actions executed by the robot impact on humans, and as such have to be carefully selected, making it impossible to rely on random exploration. Additionally, no clear reward function exists for social interactions. This implies that traditional approaches used for Reinforcement Learning cannot be directly applied to learning how to interact with the social world. As such, we argue that robots will profit from human expertise and guidance to learn social interactions. However, as the quantity of input a human can provide is limited, new methods have to be designed to use human input more efficiently. In this paper we describe a setup in which we combine a framework called Supervised Progressively Autonomous Robot Competencies (SPARC), which allows safer online learning with Reinforcement Learning, with the use of partial states rather than full states to accelerate generalisation and obtain a usable action policy more quickly.

artificial intelligence, machine learning, supervised reinforcement learning, (2 more...)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.80)