
Reinforcement Learning


Transfer Reward Learning for Policy Gradient-Based Text Generation

arXiv.org Machine Learning

Task-specific scores are often used both to optimize and to evaluate conditional text generation systems. However, such scores are non-differentiable and cannot be used in the standard supervised learning paradigm. Hence, policy gradient methods are used, since the gradient can be computed without requiring a differentiable objective. We argue, however, that the n-gram overlap measures currently used as rewards can be improved upon by model-based rewards transferred from tasks that directly compare the similarity of sentence pairs. These reward models output a score of sentence-level syntactic and semantic similarity either between entire predicted and target sentences, as the expected return, or for intermediate phrases, as segmented accumulative rewards. We demonstrate that using a Transferable Reward Learner leads to improved results on semantic evaluation measures in policy-gradient models for image captioning tasks. Our InferSent actor-critic model improves over a BLEU-trained actor-critic model on MSCOCO by 6.97 points when evaluated on a Word Mover's Distance similarity measure, and by 10.48 points on a Sliding Window Cosine Similarity measure. Similar performance improvements are also obtained on the smaller Flickr-30k dataset, demonstrating the general applicability of the proposed transfer learning method.
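
To make the contrast between n-gram overlap rewards and model-based similarity rewards concrete, here is a minimal sketch under loudly stated assumptions. The 2-D word vectors in TOY_VECTORS are hypothetical stand-ins for a trained sentence encoder such as InferSent, unigram_precision is only one ingredient of BLEU (not BLEU itself), and reinforce_loss is a plain REINFORCE-with-baseline surrogate; in an actor-critic model the baseline would come from a learned critic rather than a constant. None of these function names come from the paper.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical 2-D word vectors standing in for a trained sentence encoder (e.g. InferSent).
TOY_VECTORS: Dict[str, List[float]] = {
    "a": [0.1, 0.1],
    "dog": [1.0, 0.0],
    "puppy": [0.9, 0.1],
    "runs": [0.0, 1.0],
    "sprints": [0.1, 0.9],
}


def sentence_embedding(tokens: Sequence[str]) -> List[float]:
    """Mean of known word vectors: a crude bag-of-words sentence encoder."""
    vecs = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if not vecs:
        return [0.0, 0.0]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def similarity_reward(pred: Sequence[str], ref: Sequence[str]) -> float:
    """Model-based reward: cosine similarity of predicted and reference embeddings."""
    return cosine(sentence_embedding(pred), sentence_embedding(ref))


def unigram_precision(pred: Sequence[str], ref: Sequence[str]) -> float:
    """Clipped unigram precision, a simple n-gram overlap reward."""
    ref_counts = Counter(ref)
    matched = sum(min(c, ref_counts[w]) for w, c in Counter(pred).items())
    return matched / len(pred) if pred else 0.0


def reinforce_loss(log_probs: Sequence[float], reward: float, baseline: float) -> float:
    """Policy-gradient surrogate: minimizing it raises the log-probs of above-baseline captions."""
    return -(reward - baseline) * sum(log_probs)


pred, ref = ["a", "puppy", "sprints"], ["a", "dog", "runs"]
print(unigram_precision(pred, ref), similarity_reward(pred, ref))
```

A paraphrased caption such as "a puppy sprints" shares only one unigram with "a dog runs", so the overlap reward is low, while the embedding-based reward stays close to 1.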


Neural Architecture Search in Embedding Space

arXiv.org Machine Learning

Neural architecture search (NAS) with reinforcement learning can be a powerful and novel framework for automatically discovering neural architectures. However, its application is restricted by noncontinuous, high-dimensional search spaces, which make optimization difficult. To resolve these problems, we propose NAS in embedding space (NASES), a novel framework. Unlike other NAS approaches with reinforcement learning that search over a discrete, high-dimensional architecture space, this approach enables reinforcement learning to search in an embedding space by using architecture encoders and decoders. Our experiments demonstrate that the performance of the final architecture found by NASES is comparable with that of other popular NAS approaches for image classification on CIFAR-10. NASES performs well even when only architecture-embedding search and controller pre-training are applied, without other NAS tricks such as parameter sharing. Specifically, NASES considerably reduces the search cost: on average, only 100 architectures are searched before arriving at the final architecture.

Introduction. Deep neural networks have enabled advances in image recognition, sequential pattern recognition, recommendation systems, and various other tasks in the past decades.
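
The core idea of searching a continuous embedding instead of a discrete architecture space can be sketched with a toy REINFORCE loop. This is only an illustration under assumptions: NASES uses learned architecture encoders and decoders and a trained controller, whereas here encode and decode are hand-written, the op vocabulary OPS and the toy_reward function (a stand-in for validation accuracy) are hypothetical, and the controller is a Gaussian policy whose mean is updated by the REINFORCE score-function gradient.

```python
import random
from typing import List

# Hypothetical operation vocabulary and network depth (not taken from the paper).
OPS = ["conv3x3", "conv5x5", "maxpool", "identity"]
NUM_LAYERS = 4


def encode(arch: List[str]) -> List[float]:
    """Toy encoder: map each layer's op to a scalar in [0, 1]."""
    return [OPS.index(op) / (len(OPS) - 1) for op in arch]


def decode(z: List[float]) -> List[str]:
    """Toy decoder: clip each coordinate to [0, 1] and round it to the nearest op."""
    arch = []
    for v in z:
        v = min(max(v, 0.0), 1.0)
        arch.append(OPS[round(v * (len(OPS) - 1))])
    return arch


def toy_reward(arch: List[str]) -> float:
    """Hypothetical stand-in for validation accuracy: fraction of conv3x3 layers."""
    return sum(op == "conv3x3" for op in arch) / len(arch)


def search(steps: int = 200, sigma: float = 0.2, lr: float = 0.01, seed: int = 0) -> List[str]:
    """REINFORCE with a Gaussian policy N(mu, sigma^2 I) over the embedding space."""
    rng = random.Random(seed)
    mu = [0.5] * NUM_LAYERS
    baseline = 0.0
    for _ in range(steps):
        z = [rng.gauss(m, sigma) for m in mu]    # sample a continuous embedding
        reward = toy_reward(decode(z))            # decode it and "evaluate" the architecture
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward  # moving-average baseline
        # Gradient of log N(z; mu, sigma^2) w.r.t. mu is (z - mu) / sigma^2.
        mu = [m + lr * advantage * (zi - m) / sigma ** 2 for m, zi in zip(mu, z)]
    return decode(mu)


print(search())
```

Because the controller only ever outputs real-valued vectors, its update is an ordinary gradient step; the discreteness is pushed entirely into the decoder.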


$\sqrt{n}$-Regret for Learning in Markov Decision Processes with Function Approximation and Low Bellman Rank

arXiv.org Machine Learning

In this paper, we consider the problem of online learning of Markov decision processes (MDPs) with very large state spaces. Under the assumptions of realizable function approximation and low Bellman rank, we develop an online learning algorithm that learns the optimal value function while achieving very low cumulative regret during the learning process. Our learning algorithm, Adaptive Value-function Elimination (AVE), is inspired by OLIVE, the policy elimination algorithm proposed by Jiang et al. (2017). One of our key technical contributions in AVE is to formulate the elimination steps in OLIVE as contextual bandit problems. This technique enables us to apply the active elimination and expert weighting methods of Dudik et al. (2011), instead of the random action exploration scheme used in the original OLIVE algorithm, for more efficient exploration and better control of the regret incurred in each policy elimination step. To the best of our knowledge, this is the first $\sqrt{n}$-regret result for reinforcement learning in stochastic MDPs with general value function approximation.
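
For readers unfamiliar with the term, a $\sqrt{n}$-regret result can be read through the standard episodic definition of cumulative regret below. This is the generic textbook form, assumed here for illustration: $x_t$ is the initial state of episode $t$, $\pi_t$ the policy played in that episode, and $V^{*}$ the optimal value function. The paper's actual bound also depends on quantities such as the Bellman rank, horizon, and size of the function class, which are folded into the $\tilde{O}$ here.

```latex
\mathrm{Regret}(n) = \sum_{t=1}^{n} \left( V^{*}(x_t) - V^{\pi_t}(x_t) \right),
\qquad
\mathrm{Regret}(n) = \tilde{O}\!\left(\sqrt{n}\right)
\;\Longrightarrow\;
\frac{\mathrm{Regret}(n)}{n} = \tilde{O}\!\left(\frac{1}{\sqrt{n}}\right) \xrightarrow[n \to \infty]{} 0 .
```

In words: the average per-episode shortfall relative to the optimal policy shrinks like $1/\sqrt{n}$, so quadrupling the number of episodes roughly halves it.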


Self-driving scale car trained by Deep reinforcement Learning

arXiv.org Artificial Intelligence

This paper considers the problem of building a self-driving algorithm based on deep learning. This is a hot topic because self-driving is one of the most important application fields of artificial intelligence. Existing work has focused on deep learning, which can learn end-to-end self-driving control directly from raw sensory data, but such a method only learns a mapping from images to driving actions. We instead use deep reinforcement learning to train a self-driving car in a virtual simulation environment built with Unity and then transfer the model to the real world. Deep reinforcement learning gives the machine a human-like ability to make driving decisions. The virtual-to-real training method efficiently handles the problem that reinforcement learning requires rewards from interacting with the environment, which could damage real cars. We derive a theoretical model and an analysis of how to use deep Q-learning to control a car. We evaluate performance through simulations in the Unity virtual environment. Finally, we successfully transfer the model to the real world and achieve self-driving.
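
The two pieces of deep Q-learning that any such controller needs, epsilon-greedy action selection and the Bellman target the Q-network is regressed toward, can be sketched as follows. This is a generic DQN sketch, not the paper's model: the three discrete steering actions in ACTIONS are a hypothetical action space, and the Q-values are passed in as plain lists rather than produced by a network on camera images.

```python
import random
from typing import List, Sequence

# Hypothetical discrete steering actions; the paper's actual action space may differ.
ACTIONS = ["steer_left", "straight", "steer_right"]


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Explore with probability epsilon, otherwise pick the highest-valued action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def dqn_targets(
    rewards: Sequence[float],
    next_q_values: Sequence[Sequence[float]],
    dones: Sequence[bool],
    gamma: float = 0.99,
) -> List[float]:
    """Bellman targets y = r + gamma * max_a' Q_target(s', a'); y = r at terminal states."""
    targets = []
    for r, q_next, done in zip(rewards, next_q_values, dones):
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets


rng = random.Random(0)
action = epsilon_greedy([0.1, 0.7, 0.2], epsilon=0.1, rng=rng)
print(ACTIONS[action])
print(dqn_targets([1.0, -1.0], [[0.5, 2.0, 1.0], [3.0, 0.0, 0.0]], [False, True], gamma=0.9))
```

A terminal transition (for example, a simulated crash) contributes only its immediate reward, which is one reason training in simulation first is attractive: such episodes cost nothing.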


Deep Reinforcement Learning Hands-On: Apply modern RL methods, with deep Q-networks, value iteration, policy gradients, TRPO, AlphaGo Zero and more: Maxim Lapan: 9781788834247: Amazon.com: Books

#artificialintelligence

When I started learning RL three years ago, it was really hard to get practical information about the methods and how they could be implemented. Scattered blog posts about individual methods and theoretical papers without code examples were the only sources of knowledge. Getting something to experiment with took lots of time and effort, fighting with weird bugs and struggling with the mystic math in papers. With the rising popularity of RL, the situation has improved slightly, but there is still no structured overview of modern deep RL methods with a unified code base. This book fills the gap between theory and practice, providing a structured overview of recent RL methods using clear examples written in a uniform style.


DataWorkshop Club Conf 2019 Machine Learning Conference Europe

#artificialintelligence

Recent years have seen a rising interest in developing AI algorithms for real-world big data domains ranging from autonomous cars to personalized assistants. At the core of these algorithms are architectures that combine deep neural networks, for approximating the underlying multidimensional state spaces, with reinforcement learning, for controlling agents that learn to operate in those state spaces to achieve a given objective. The talk will first outline notable past and future efforts in deep reinforcement learning and identify fundamental problems that this technology has been struggling to overcome. To mitigate these problems (and to open up an alternative path to general artificial intelligence), I will then summarize a brain-computing model of intelligence rooted in the latest findings in neuroscience. The talk will conclude with an overview of recent research efforts in the field of multi-agent systems, aimed at providing future teams of humans and agents with the tools they need to coexist safely.



What Are Major Reinforcement Learning Achievements & Papers From 2018?

#artificialintelligence

At a 2017 O'Reilly AI conference, Andrew Ng ranked reinforcement learning dead last in terms of its utility for business applications. Compared to other machine learning methods like supervised learning, transfer learning, and even unsupervised learning, deep reinforcement learning (RL) is incredibly data-hungry, often unstable, and rarely the best option in terms of performance. RL has historically been applied successfully only in arenas where mountains of simulated data can be generated on demand, such as games and robotics. Despite RL's limitations in solving business use cases, some AI experts believe this approach is the most viable strategy for achieving human or superhuman Artificial General Intelligence (AGI). The recent victory of DeepMind's AlphaStar over top-ranked professional StarCraft players suggests we might be on the cusp of applying deep RL to real-world problems with real-time demands, extraordinary complexity, and incomplete information.


Mature GAIL: Imitation Learning for Low-level and High-dimensional Input using Global Encoder and Cost Transformation

arXiv.org Machine Learning

Recently, the GAIL framework and its variants have shown remarkable potential for solving practical MDP problems. However, low-level, high-dimensional state inputs such as image sequences have not been studied in detail within this framework. Furthermore, the cost function learned in the traditional GAIL framework lies only in a negative range, so it acts as a non-penalizing reward and makes it difficult for the agent to learn the optimal policy. In this paper, we propose a new algorithm based on the GAIL framework that includes a global encoder and a reward penalization mechanism. The global encoder solves two issues that arise when applying the GAIL framework to high-dimensional image states. We also show that the penalization mechanism provides a more adequate reward to the agent, resulting in stable performance improvement. The approach is generally applicable to variants of the GAIL framework, and we conducted in-depth experiments by applying our methods to several of them. The results show that our method significantly improves performance on low-level, high-dimensional tasks.
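
The sign problem described above can be illustrated with two reward functions of a discriminator output. The convention assumed here is that D is the discriminator's probability that a state-action pair came from the expert. The first function is the commonly used GAIL-style reward, which is never negative; the second, log D - log(1 - D), is a swapped-in transformation chosen only to show what a penalizing reward looks like. It is not claimed to be the paper's cost transformation, and both function names are hypothetical.

```python
import math


def gail_reward(d: float, eps: float = 1e-8) -> float:
    """Common GAIL-style reward -log(1 - D(s, a)); always >= 0, so it never penalizes."""
    return -math.log(max(1.0 - d, eps))


def penalized_reward(d: float, eps: float = 1e-8) -> float:
    """Assumed transformation log D - log(1 - D): positive when (s, a) looks
    expert-like (D > 0.5), negative, i.e. a penalty, when it looks policy-like (D < 0.5)."""
    d = min(max(d, eps), 1.0 - eps)
    return math.log(d) - math.log(1.0 - d)


for d in (0.1, 0.5, 0.9):
    print(d, round(gail_reward(d), 3), round(penalized_reward(d), 3))
```

With the first reward, even clearly non-expert behavior (D = 0.1) earns a small positive reward, so the agent can be rewarded simply for surviving longer; the second reward lets such behavior be actively discouraged.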


Automatic Financial Trading Agent for Low-risk Portfolio Management using Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Autonomous trading agents are one of the most actively studied areas of artificial intelligence for solving the capital market portfolio management problem. The two primary goals of portfolio management are maximizing profit and restraining risk. However, most approaches to this problem consider only return maximization. Therefore, this paper proposes a deep reinforcement learning based trading agent that manages the portfolio considering not only profit maximization but also risk restraint. We also propose a new target policy that allows the trading agent to learn to prefer low-risk actions. The new target policy is incorporated into the update by adjusting, through a hyperparameter, how greedily it selects the optimal action. We evaluate the proposed trading agent on cryptocurrency market data. The cryptocurrency market is an ideal test bed for our trading agents because a huge amount of data accumulates every minute and market volatility is extremely large. In experiments, our agents achieved a return of 1800% during the test period and provided the least risky investment strategy among the existing methods. Another experiment shows that the agent maintains robust generalization performance even when market volatility is large or the training period is short.
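
One plausible way a hyperparameter can "adjust the greediness" of a target policy is an Expected-SARSA-style target, sketched below. This is an assumption for illustration, not the paper's formulation: target_policy_probs places probability greediness on the greedy action and spreads the rest uniformly, so greediness = 1.0 recovers the ordinary Q-learning max target, while smaller values average in the values of non-greedy actions and give less optimistic estimates. Both function names are hypothetical.

```python
from typing import List, Sequence


def target_policy_probs(q_values: Sequence[float], greediness: float) -> List[float]:
    """Assumed target policy: mass `greediness` on the greedy action,
    remaining mass spread uniformly over all actions."""
    n = len(q_values)
    best = max(range(n), key=lambda a: q_values[a])
    probs = [(1.0 - greediness) / n] * n
    probs[best] += greediness
    return probs


def expected_target(
    reward: float,
    next_q: Sequence[float],
    gamma: float,
    greediness: float,
    done: bool = False,
) -> float:
    """Target r + gamma * sum_a pi(a | s') Q(s', a); greediness = 1.0 gives the Q-learning target."""
    if done:
        return reward
    probs = target_policy_probs(next_q, greediness)
    return reward + gamma * sum(p * q for p, q in zip(probs, next_q))


next_q = [1.0, 3.0, 2.0]  # hypothetical Q-values for, e.g., sell / hold / buy
print(expected_target(0.5, next_q, gamma=0.9, greediness=1.0))
print(expected_target(0.5, next_q, gamma=0.9, greediness=0.4))
```

Lowering greediness from 1.0 to 0.4 lowers the target in this example, because the bootstrap no longer assumes the best action will always be taken next.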