AITopics

Country: North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Neural Information Processing SystemsDec-31-2018

On Learning Intrinsic Rewards for Policy Gradient Methods

Zheng, Zeyu, Oh, Junhyuk, Singh, Satinder

In many sequential decision making tasks, it is challenging to design reward functions that help an RL agent efficiently learn behavior that is considered good by the agent designer. A number of different formulations of the reward-design problem, or close variants thereof, have been proposed in the literature. In this paper we build on the Optimal Rewards Framework of Singh et al. that defines the optimal intrinsic reward function as one that when used by an RL agent achieves behavior that optimizes the task-specifying or extrinsic reward function. Previous work in this framework has shown how good intrinsic reward functions can be learned for lookahead search based planning agents. Whether it is possible to learn intrinsic reward functions for learning agents remains an open problem. In this paper we derive a novel algorithm for learning intrinsic rewards for policy-gradient based learning agents. We compare the performance of an augmented agent that uses our algorithm to provide additive intrinsic rewards to an A2C-based policy learner (for Atari games) and a PPO-based policy learner (for Mujoco domains) with a baseline agent that uses the same policy learners but with only extrinsic rewards. Our results show improved performance on most but not all of the domains.

computer game, intrinsic reward, neural network, (20 more...)

Country: North America > Canada (0.14)

Genre: Research Report > New Finding (0.69)

Industry: Leisure & Entertainment > Games > Computer Games (0.57)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.47)

arXiv.org Artificial IntelligenceDec-3-2018

Generative Adversarial Self-Imitation Learning

Guo, Yijie, Oh, Junhyuk, Singh, Satinder, Lee, Honglak

This paper explores a simple regularizer for reinforcement learning by proposing Generative Adversarial Self-Imitation Learning (GASIL), which encourages the agent to imitate past good trajectories via generative adversarial imitation learning framework. Instead of directly maximizing rewards, GASIL focuses on reproducing past good trajectories, which can potentially make long-term credit assignment easier when rewards are sparse and delayed. GASIL can be easily combined with any policy gradient objective by using GASIL as a learned shaped reward function. Our experimental results show that GASIL improves the performance of proximal policy optimization on 2D Point Mass and MuJoCo environments with delayed reward and stochastic dynamics.

artificial intelligence, reinforcement learning, trajectory, (13 more...)

1812.0095

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial IntelligenceAug-24-2018

Learning End-to-End Goal-Oriented Dialog with Multiple Answers

Rajendran, Janarthanan, Ganhotra, Jatin, Singh, Satinder, Polymenakos, Lazaros

In a dialog, there can be multiple valid next utterances at any point. The present end-to-end neural methods for dialog do not take this into account. They learn with the assumption that at any time there is only one correct next utterance. In this work, we focus on this problem in the goal-oriented dialog setting where there are different paths to reach a goal. We propose a new method, that uses a combination of supervised learning and reinforcement learning approaches to address this issue. We also propose a new and more effective testbed, permuted-bAbI dialog tasks, by introducing multiple valid next utterances to the original-bAbI dialog tasks, which allows evaluation of goal-oriented dialog systems in a more realistic setting. We show that there is a significant drop in performance of existing end-to-end neural methods from 81.5% per-dialog accuracy on original-bAbI dialog tasks to 30.3% on permuted-bAbI dialog tasks. We also show that our proposed method improves the performance and achieves 47.3% per-dialog accuracy on permuted-bAbI dialog tasks.

deep learning, dialog task, neural network, (19 more...)

1808.09996

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Artificial IntelligenceJun-22-2018

Many-Goals Reinforcement Learning

Veeriah, Vivek, Oh, Junhyuk, Singh, Satinder

All-goals updating exploits the off-policy nature of Q-learning to update all possible goals an agent could have from each transition in the world, and was introduced into Reinforcement Learning (RL) by Kaelbling (1993). In prior work this was mostly explored in small-state RL problems that allowed tabular representations and where all possible goals could be explicitly enumerated and learned separately. In this paper we empirically explore 3 different extensions of the idea of updating many (instead of all) goals in the context of RL with deep neural networks (or DeepRL for short). First, in a direct adaptation of Kaelbling's approach we explore if many-goals updating can be used to achieve mastery in non-tabular visual-observation domains. Second, we explore whether many-goals updating can be used to pre-train a network to subsequently learn faster and better on a single main task of interest. Third, we explore whether many-goals updating can be used to provide auxiliary task updates in training a network to learn faster and better on a single main task of interest. We provide comparisons to baselines for each of the 3 extensions.

agent, computer game, neural network, (20 more...)

1806.09605

Country:

North America > United States > Michigan (0.28)
North America > United States > Texas (0.28)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.33)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

arXiv.org Artificial IntelligenceJun-14-2018

Self-Imitation Learning

Oh, Junhyuk, Guo, Yijie, Singh, Satinder, Lee, Honglak

This paper proposes Self-Imitation Learning (SIL), a simple off-policy actor-critic algorithm that learns to reproduce the agent's past good decisions. This algorithm is designed to verify our hypothesis that exploiting past good experiences can indirectly drive deep exploration. Our empirical results show that SIL significantly improves advantage actor-critic (A2C) on several hard exploration Atari games and is competitive to the state-of-the-art count-based exploration methods. We also show that SIL improves proximal policy optimization (PPO) on MuJoCo tasks.

artificial intelligence, computer game, self-imitation learning, (16 more...)

1806.05635

Country: Europe (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Leisure & Entertainment > Games > Computer Games (0.57)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial IntelligenceApr-22-2018

Named Entities troubling your Neural Methods? Build NE-Table: A neural approach for handling Named Entities

Rajendran, Janarthanan, Ganhotra, Jatin, Guo, Xiaoxiao, Yu, Mo, Singh, Satinder

Many natural language processing tasks require dealing with Named Entities (NEs) in the texts themselves and sometimes also in external knowledge sources. While this is often easy for humans, recent neural methods that rely on learned word embeddings for NLP tasks have difficulty with it, especially with out of vocabulary or rare NEs. In this paper, we propose a new neural method for this problem, and present empirical evaluations on a structured Question-Answering task, three related Goal-Oriented dialog tasks and a reading-comprehension-based task. They show that our proposed method can be effective in dealing with both in-vocabulary and out of vocabulary (OOV) NEs. We create extended versions of dialog bAbI tasks 1,2 and 4 and Out-of-vocabulary (OOV) versions of the CBT test set which will be made publicly available online.

deep learning, neural network, representation, (22 more...)

1804.0954

Genre: Research Report (0.82)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

arXiv.org Artificial IntelligenceApr-17-2018

On Learning Intrinsic Rewards for Policy Gradient Methods

Zheng, Zeyu, Oh, Junhyuk, Singh, Satinder

In many sequential decision making tasks, it is challenging to design reward functions that help an RL agent efficiently learn behavior that is considered good by the agent designer. A number of different formulations of the reward-design problem, or close variants thereof, have been proposed in the literature. In this paper we build on the Optimal Rewards Framework of Singh et.al. that defines the optimal intrinsic reward function as one that when used by an RL agent achieves behavior that optimizes the task-specifying or extrinsic reward function. Previous work in this framework has shown how good intrinsic reward functions can be learned for lookahead search based planning agents. Whether it is possible to learn intrinsic reward functions for learning agents remains an open problem. In this paper we derive a novel algorithm for learning intrinsic rewards for policy-gradient based learning agents. We compare the performance of an augmented agent that uses our algorithm to provide additive intrinsic rewards to an A2C-based policy learner (for Atari games) and a PPO-based policy learner (for Mujoco domains) with a baseline agent that uses the same policy learners but with only extrinsic rewards. Our results show improved performance on most but not all of the domains.

computer game, intrinsic reward, neural network, (20 more...)

1804.06459

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Games > Computer Games (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

Neural Information Processing SystemsDec-31-2017

Value Prediction Network

Oh, Junhyuk, Singh, Satinder, Lee, Honglak

This paper proposes a novel deep reinforcement learning (RL) architecture, called Value Prediction Network (VPN), which integrates model-free and model-based RL methods into a single neural network. In contrast to typical model-based RL methods, VPN learns a dynamics model whose abstract states are trained to make option-conditional predictions of future values (discounted sum of rewards) rather than of future observations. Our experimental results show that VPN has several advantages over both model-free and model-based baselines in a stochastic environment where careful planning is required but building an accurate observation-prediction model is difficult. Furthermore, VPN outperforms Deep Q-Network (DQN) on several Atari games even with short-lookahead planning, demonstrating its potential as a new way of learning a good state representation.

computer game, deep learning, vpn, (20 more...)

Country: North America > United States > Massachusetts (0.14)

Genre: Research Report > New Finding (0.48)

Industry:

Energy > Oil & Gas (0.68)
Leisure & Entertainment > Games > Computer Games (0.57)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Neural Information Processing SystemsDec-31-2017

Repeated Inverse Reinforcement Learning

Amin, Kareem, Jiang, Nan, Singh, Satinder

We introduce a novel repeated Inverse Reinforcement Learning problem: the agent has to act on behalf of a human in a sequence of tasks and wishes to minimize the number of tasks that it surprises the human by acting suboptimally with respect to how the human would have acted. Each time the human is surprised, the agent is provided a demonstration of the desired behavior by the human. We formalize this problem, including how the sequence of tasks is chosen, in a few different ways and provide some foundational results.

agent, artificial intelligence, reinforcement learning, (16 more...)

Country: North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)

Genre: Workflow (0.54)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)