AITopics

Inspired by how humans summarize long documents, we propose an accurate and fast summarization model that first selects salient sentences and then rewrites them abstractively (i.e., compresses and paraphrases) to generate a concise overall summary. We use a novel sentence-level policy gradient method to bridge the non-differentiable computation between these two neural networks in a hierarchical way, while maintaining language fluency. Empirically, we achieve the new state-of-the-art on all metrics (including human evaluation) on the CNN/Daily Mail dataset, as well as significantly higher abstractiveness scores. Moreover, by first operating at the sentence-level and then the word-level, we enable parallel decoding of our neural generative model that results in substantially faster (10-20x) inference speed as well as 4x faster training convergence than previous long-paragraph encoder-decoder models. We also demonstrate the generalization of our model on the test-only DUC-2002 dataset, where we achieve higher scores than a state-of-the-art model.

machine learning, natural language, reinforcement learning, (19 more...)

1805.1108

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.05)
North America > United States > Washington > King County > Seattle (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)
(7 more...)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (1.00)
Media > Television (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Forster, Richard, Fulop, Agnes

Hierarchical clustering with deep Q-learning

The reconstruction and analysis of high energy particle physics data is just as important as the analysis of the structure of real-world networks. A previous study explored how hierarchical clustering algorithms can be combined with kt cluster algorithms to provide a more generic clustering method. Building on that, this paper explores the possibility of involving deep learning in the process of cluster computation by applying reinforcement learning techniques. The result is a model that, by learning on a modest dataset of 10,000 nodes during 70 epochs, can reach 83.77% precision in predicting the appropriate clusters.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

1805.109

Country:

North America > United States > Oregon (0.05)
North America > United States > Texas (0.05)
North America > United States > New York (0.05)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.91)

Nardelli, Nantas, Synnaeve, Gabriel, Lin, Zeming, Kohli, Pushmeet, Torr, Philip H. S., Usunier, Nicolas

Value Propagation Networks

We present Value Propagation (VProp), a parameter-efficient differentiable planning module built on Value Iteration which can successfully be trained using reinforcement learning to solve unseen tasks, has the capability to generalize to larger map sizes, and can learn to navigate in dynamic environments. Furthermore, we show that the module enables learning to plan when the environment also includes stochastic elements, providing a cost-efficient learning system to build low-level size-invariant planners for a variety of interactive navigation problems. We evaluate on static and dynamic configurations of MazeBase grid-worlds, with randomly generated environments of several different sizes, and on a StarCraft navigation scenario, with more complex dynamics, and pixels as input.
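As background for the abstract above, the following is a minimal sketch of classical tabular Value Iteration on a tiny deterministic grid-world. It is not VProp itself, which is a learned, differentiable module; the grid layout, the step cost, the discount factor and the function name `value_iteration` are all assumptions chosen for illustration.

```python
# Plain tabular value iteration on a small deterministic grid-world.
# Illustrative only: layout, rewards and discount are assumed, not from the paper.

GAMMA = 0.9          # discount factor (assumed)
STEP_REWARD = -1.0   # cost of every move (assumed)
GRID = [
    "S.#",           # S = start, . = free cell, # = wall, G = goal
    "..#",
    "..G",
]
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def value_iteration(grid, gamma=GAMMA, tol=1e-6):
    """Return the optimal state values for every cell of the grid."""
    rows, cols = len(grid), len(grid[0])
    values = [[0.0] * cols for _ in range(rows)]
    while True:
        delta = 0.0
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] in "#G":  # walls and the goal keep value 0
                    continue
                best = float("-inf")
                for dr, dc in MOVES:
                    nr, nc = r + dr, c + dc
                    # Leaving the map or hitting a wall keeps the agent in place.
                    blocked = not (0 <= nr < rows and 0 <= nc < cols)
                    if blocked or grid[nr][nc] == "#":
                        nr, nc = r, c
                    best = max(best, STEP_REWARD + gamma * values[nr][nc])
                delta = max(delta, abs(best - values[r][c]))
                values[r][c] = best  # in-place Bellman backup
        if delta < tol:
            return values


values = value_iteration(GRID)
```

Each sweep applies the Bellman optimality backup to every free cell until the largest change falls below `tol`. A greedy planner would then move to the neighbouring cell with the highest value.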

artificial intelligence, machine learning, reinforcement learning, (17 more...)

1805.11199

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Tessler, Chen, Mankowitz, Daniel J., Mannor, Shie

Reward Constrained Policy Optimization

Teaching agents to perform tasks using Reinforcement Learning is no easy feat. As the goal of reinforcement learning agents is to maximize the accumulated reward, they often find loopholes and misspecifications in the reward signal which lead to unwanted behavior. To overcome this, regularization is often employed through the technique of reward shaping: the agent is provided an additional weighted reward signal, meant to lead it towards a desired behavior. The weight is considered a hyper-parameter and is selected through trial and error, a time-consuming and computationally intensive task. In this work, we present a novel multi-timescale approach for constrained policy optimization, called 'Reward Constrained Policy Optimization' (RCPO), which enables policy regularization without the use of reward shaping. We prove the convergence of our approach and provide empirical evidence of its ability to train constraint-satisfying policies.
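The abstract does not spell out the update rules, so the following is only a sketch of one common way to realise a multi-timescale constrained optimiser: a Lagrangian relaxation in which the policy is updated quickly and the penalty weight is adapted slowly, instead of being hand-tuned. The toy problem (a scalar "policy" `a`, its reward and cost functions, the threshold and both learning rates) is hypothetical and is not taken from the paper.

```python
# Toy constrained problem (hypothetical): scalar "policy" a in [0, 1],
# reward(a) = a - 0.5 * a**2, cost(a) = a, constraint cost(a) <= THRESHOLD.
THRESHOLD = 0.3
POLICY_LR = 0.1       # fast timescale: policy parameter
MULTIPLIER_LR = 0.01  # slow timescale: penalty weight (Lagrange multiplier)


def penalized_objective_grad(a, lam):
    """Gradient of reward(a) - lam * cost(a) with respect to a."""
    return (1.0 - a) - lam


def train(steps=5000):
    a, lam = 0.0, 0.0
    for _ in range(steps):
        # Fast step: gradient ascent on the penalized objective, clipped to [0, 1].
        a = min(1.0, max(0.0, a + POLICY_LR * penalized_objective_grad(a, lam)))
        # Slow step: raise the penalty while the constraint is violated,
        # lower it otherwise, and never let it go negative.
        lam = max(0.0, lam + MULTIPLIER_LR * (a - THRESHOLD))
    return a, lam


action, multiplier = train()
```

In this toy setting the unconstrained optimum is `a = 1`, but the run settles near `a = 0.3` with a multiplier near `0.7`, so the penalty weight is found by the optimiser rather than by trial and error.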

constraint, machine learning, reinforcement learning, (15 more...)

1805.11074

Country:

North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Genre: Research Report (0.64)

Industry: Energy (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Woof, William, Chen, Ke

Learning to Play General Video-Games via an Object Embedding Network

Deep reinforcement learning (DRL) has proven to be an effective tool for creating general video-game AI. However, most current DRL video-game agents learn end-to-end from the video output of the game, which is superfluous for many applications and creates a number of additional problems. More importantly, working directly on pixel-based raw video data is substantially different from what a human player does. In this paper, we present a novel method which enables DRL agents to learn directly from object information. This is obtained via use of an object embedding network (OEN) that compresses a set of object feature vectors of different lengths into a single fixed-length unified feature vector representing the current game-state and fulfills the DRL simultaneously. We evaluate our OEN-based DRL agent by comparing to several state-of-the-art approaches on a selection of games from the GVG-AI Competition. Experimental results suggest that our object-based DRL agent yields performance comparable to that of those approaches used in our comparative study.
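To illustrate how a variable number of objects can be mapped to one fixed-length vector, here is a minimal sketch of a permutation-invariant set embedding: a shared per-object layer followed by element-wise max pooling. The abstract does not describe the OEN architecture, so the pooling choice, the layer sizes and the names `embed_object` and `embed_set` are assumptions, not the authors' design.

```python
import random


def embed_object(features, weights):
    """Shared linear layer followed by ReLU, applied to a single object."""
    return [max(0.0, sum(w * x for w, x in zip(row, features))) for row in weights]


def embed_set(objects, weights):
    """Embed every object with the same weights, then max-pool per dimension.

    Assumes at least one object. The output length equals len(weights)
    no matter how many objects are present.
    """
    embedded = [embed_object(obj, weights) for obj in objects]
    return [max(column) for column in zip(*embedded)]


rng = random.Random(0)
# 4 output dimensions, 3 input features per object (both sizes are assumed).
weights = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]

few_objects = [[1.0, 0.0, 2.0], [0.5, 1.5, -1.0]]
many_objects = few_objects + [[-2.0, 0.3, 0.7], [0.0, 0.0, 1.0], [3.0, -1.0, 0.2]]

state_few = embed_set(few_objects, weights)
state_many = embed_set(many_objects, weights)
```

Both `state_few` and `state_many` have four entries, and reordering the objects leaves the result unchanged, which is the property a downstream DRL network needs from a game-state vector.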

machine learning, object-oriented architecture, reinforcement learning, (14 more...)

1803.05262

Genre: Research Report > Promising Solution (0.68)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)
(3 more...)

Paul, Supratik, Osborne, Michael A., Whiteson, Shimon

Contextual Policy Optimisation

arXiv.org Artificial Intelligence, May-27-2018

Policy gradient methods have been successfully applied to a variety of reinforcement learning tasks. However, while learning in a simulator, these methods do not utilise the opportunity to improve learning by adjusting certain environment variables: unobservable state features that are randomly determined by the environment in a physical setting, but that are controllable in a simulator. This can lead to slow learning, or convergence to highly suboptimal policies. In this paper, we present contextual policy optimisation (CPO). The central idea is to use Bayesian optimisation to actively select the distribution of the environment variable that maximises the improvement generated by each iteration of the policy gradient method. To make this Bayesian optimisation practical, we contribute two easy-to-compute low-dimensional fingerprints of the current policy. We apply CPO to a number of continuous control tasks of varying difficulty and show that CPO can efficiently learn policies that are robust to significant rare events, which are unlikely to be observable under random sampling but are key to learning good policies.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

1805.10662

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)

Kreutzer, Julia, Uyheng, Joshua, Riezler, Stefan

Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning

arXiv.org Machine Learning, May-27-2018

We present a study on reinforcement learning (RL) from human bandit feedback for sequence-to-sequence learning, exemplified by the task of bandit neural machine translation (NMT). We investigate the reliability of human bandit feedback, and analyze the influence of reliability on the learnability of a reward estimator, and the effect of the quality of reward estimates on the overall RL task. Our analysis of cardinal (5-point ratings) and ordinal (pairwise preferences) feedback shows that their intra- and inter-annotator $\alpha$-agreement is comparable. Best reliability is obtained for standardized cardinal feedback, and cardinal feedback is also easiest to learn and generalize from. Finally, improvements of over 1 BLEU can be obtained by integrating a regression-based reward estimator trained on cardinal feedback for 800 translations into RL for NMT. This shows that RL is possible even from small amounts of fairly reliable human feedback, pointing to a great potential for applications at larger scale.
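The abstract reports that standardized cardinal feedback is the most reliable. As a sketch of what standardization can mean here, the snippet below z-scores each annotator's 5-point ratings so that lenient and strict raters become comparable. That the standardization is done per annotator is an assumption, and the rating lists are invented for illustration.

```python
from statistics import mean, pstdev


def standardize(ratings):
    """Z-score one annotator's ratings: subtract the mean, divide by the std."""
    mu = mean(ratings)
    sigma = pstdev(ratings)
    if sigma == 0:
        # An annotator who always gives the same rating carries no signal.
        return [0.0 for _ in ratings]
    return [(r - mu) / sigma for r in ratings]


# Two hypothetical annotators who rank the same translations identically
# but use different parts of the 5-point scale.
lenient = [4, 5, 5, 4, 5]
strict = [1, 2, 2, 1, 2]

lenient_z = standardize(lenient)
strict_z = standardize(strict)
```

After standardization the two annotators yield the same scores up to floating-point error, so a reward estimator trained on them would not have to model each rater's personal offset.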

machine learning, reinforcement learning, translation, (18 more...)

1805.10627

Country:

Europe (1.00)
Asia (1.00)
North America > United States > California (0.68)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.68)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Doan, Thang, Mazoure, Bogdan, Lyle, Clare

GAN Q-learning

arXiv.org Machine Learning, May-27-2018

Distributional reinforcement learning (distributional RL) has seen empirical success in complex Markov Decision Processes (MDPs) in the setting of nonlinear function approximation. However, there are many different ways in which one can leverage the distributional approach to reinforcement learning. In this paper, we propose GAN Q-learning, a novel distributional RL method based on generative adversarial networks (GANs), and analyze its performance in simple tabular environments as well as in OpenAI Gym. We empirically show that our algorithm leverages the flexibility and black-box approach of deep learning models while providing a viable alternative to traditional methods.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

1805.04874

Country: North America (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.88)

@machinelearnbot, May-26-2018, 02:15:26 GMT

[D] Better reinforcement learning algorithms than A3C? • r/MachineLearning

This sounds like an underspecified example. I mean, A3C and DQN/Q-learning aren't even the same in terms of off- or on-policy learning. A3C has mostly been replaced by PPO, and on-policy SOTA has moved on from that to Impala/Unicorn. I'm not sure what is SOTA for off-policy learning, but Rainbow outperforms DQN and most of the DQN zoo. And progress here may be somewhat illusory, as the methodological papers have been pointing out: a lot of these tasks are not inherently difficult, there's so much variance in training runs, improvements may be due to undocumented tweaks or just somewhat better hyperparameters...

artificial intelligence, machine learning, reinforcement learning, (4 more...)

Industry: Media > News (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

#artificialintelligence, May-26-2018, 01:45:47 GMT

aslanides/aixijs

AIXIjs is a JavaScript demo for running General Reinforcement Learning (RL) agents in the browser. In particular, it provides a general and extensible framework for running experiments on Bayesian RL agents in general (partially observable, non-Markov, non-ergodic) environments. UPDATE (May 2017): I'll be presenting a conference paper containing a literature survey along with some experiments based on AIXIjs at IJCAI 2017, in Melbourne, Australia. The paper (to appear) is: J. S. Aslanides, Jan Leike, and Marcus Hutter. See the main site for more background, documentation, references, and demos.

experiment, machine learning, reinforcement learning, (7 more...)

Country: Oceania > Australia > Victoria > Melbourne (0.27)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)