Reinforcement Learning


Artificial Intelligence Top 10 Articles -- June 2018

#artificialintelligence

Build an AI that combines Data Science, Machine Learning and Deep Learning for real-world applications. You will also learn the story behind Artificial Intelligence. Understand reinforcement learning both through its relationship to psychology and on a technical level. Apply gradient-based supervised machine learning methods to reinforcement learning and implement 17 different reinforcement learning algorithms.


Guided Tour of Machine Learning in Finance – Coursera

#artificialintelligence

About this course: This course aims to provide a broad, introductory overview of the field of ML, with a focus on applications in finance. In the capstone project, supervised machine learning methods are used to predict bank closures. Although this course can be taken on its own, it also previews topics that are covered in more detail in later modules of the Machine Learning and Reinforcement Learning in Finance specialization. The goal of Guided Tour of Machine Learning in Finance is to convey what Machine Learning is, what it is for, and how many different financial problems it can be applied to.


Assumed Density Filtering Q-learning

arXiv.org Artificial Intelligence

While off-policy temporal difference (TD) methods have been widely used in reinforcement learning due to their efficiency and simple implementation, their Bayesian counterparts have been used far less often. One reason is that the non-linear max operation in the Bellman optimality equation makes it difficult to define conjugate distributions over the value functions. In this paper, we introduce a novel Bayesian approach to off-policy TD methods, Assumed Density Filtering Q-learning (ADFQ), which updates beliefs on state-action values (Q) through an online Bayesian inference method. Uncertainty measures in the beliefs provide a natural regularization for learning, and we show how ADFQ reduces in a limiting case to the traditional Q-learning algorithm. Our empirical results demonstrate that the proposed ADFQ algorithms outperform comparable algorithms on several task domains. Moreover, our algorithms are computationally more efficient than other existing approaches to Bayesian reinforcement learning.
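The abstract notes that ADFQ reduces, in a limiting case, to traditional Q-learning. For reference, here is a minimal sketch of that tabular Q-learning update: the off-policy TD rule whose max over next-state actions is the non-linear operation mentioned in the abstract. It is plain Q-learning, not ADFQ; the function name, the dictionary-based table, and the values of the step size α and discount γ are illustrative assumptions.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99, terminal=False):
    """Apply one tabular Q-learning (off-policy TD) update in place and return the TD error."""
    # Bellman optimality target: the max over next-state actions is the
    # non-linear operation that complicates conjugate Bayesian updates.
    best_next = 0.0 if terminal else max(Q[(s_next, b)] for b in actions)
    td_target = r + gamma * best_next
    td_error = td_target - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return td_error


# Usage with a hypothetical two-action table.
Q = defaultdict(float)  # unseen (state, action) pairs default to 0.0
actions = [0, 1]
Q[("s1", 0)], Q[("s1", 1)] = 1.0, 2.0
err = q_learning_update(Q, "s0", 1, r=1.0, s_next="s1", actions=actions, alpha=0.5, gamma=0.9)
print(err, Q[("s0", 1)])
```

Because the target uses the max rather than the action actually taken next, the update is off-policy: the behaviour policy can explore freely while Q tracks the greedy policy.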


Implicit Policy for Reinforcement Learning

arXiv.org Artificial Intelligence

We introduce Implicit Policy, a general class of expressive policies that can flexibly represent complex action distributions in reinforcement learning, with efficient algorithms to compute entropy-regularized policy gradients. We empirically show that, despite its simple implementation, entropy regularization combined with a rich policy class can attain desirable properties associated with the maximum entropy reinforcement learning framework, such as robustness and multi-modality.
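The entropy-regularized policy-gradient objective can be illustrated on a toy problem. This sketch rests on simplifying assumptions: it uses an explicit softmax policy over three actions in a single-state bandit with known expected rewards (not an implicit policy, whose entropy has no closed form) and performs exact gradient ascent on J(θ) = E_π[r] + τ·H(π). Under these assumptions the maximizer is softmax(r/τ), which keeps probability on every action instead of collapsing onto the best one.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def entropy_regularized_pg(rewards, tau=0.5, lr=0.1, steps=5000):
    """Exact gradient ascent on J(theta) = E_pi[r] + tau * H(pi) for a softmax policy
    in a one-state bandit whose expected rewards are known (a toy assumption)."""
    theta = [0.0] * len(rewards)
    for _ in range(steps):
        pi = softmax(theta)
        # "Soft" action values: reward plus the per-action entropy bonus -tau * log(pi_a).
        q = [r - tau * math.log(p) for r, p in zip(rewards, pi)]
        baseline = sum(p * qa for p, qa in zip(pi, q))
        # dJ/dtheta_k = pi_k * (q_k - baseline) for the softmax parameterization.
        theta = [t + lr * p * (qa - baseline) for t, p, qa in zip(theta, pi, q)]
    return softmax(theta)


rewards = [1.0, 0.5, 0.0]
tau = 0.5
pi = entropy_regularized_pg(rewards, tau=tau)
target = softmax([r / tau for r in rewards])  # closed-form maximizer of J
print([round(p, 4) for p in pi], [round(t, 4) for t in target])
```

Raising τ flattens the optimum toward uniform; lowering it concentrates probability on the highest-reward action.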


Distributional Advantage Actor-Critic

arXiv.org Artificial Intelligence

In traditional reinforcement learning, an agent maximizes the reward collected during its interaction with the environment by approximating the optimal policy through the estimation of value functions. Typically, given a state s and action a, the corresponding value is the expected discounted sum of rewards, and the optimal action is the action a with the largest value estimated by the value function. However, recent work has provided both theoretical and experimental evidence of superior performance when the value function is replaced with a value distribution in the context of deep Q-learning [1]. In this paper, we develop a new algorithm that combines advantage actor-critic with a value distribution estimated by quantile regression. We evaluated this algorithm, termed Distributional Advantage Actor-Critic (DA2C or QR-A2C), on a variety of tasks and observed that it performs at least as well as the baseline algorithms, and outperforms them on some tasks, with smaller variance and increased stability.
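Quantile regression, the estimator named in the abstract, can be sketched without any neural network: each of N scalar estimates θ_i is nudged by the subgradient of the quantile (pinball) loss at level τ_i = (2i + 1)/(2N). This is a stand-alone illustration with assumed data and hyperparameters, not the DA2C implementation. In an actor-critic, the θ_i would presumably be critic outputs for a given state, and their mean would serve as a scalar value estimate.

```python
import random


def quantile_midpoints(n):
    """Quantile levels tau_i = (2i + 1) / (2n), i = 0..n-1."""
    return [(2 * i + 1) / (2 * n) for i in range(n)]


def fit_quantiles(samples, n_quantiles=5, lr=0.01, epochs=200, seed=0):
    """Estimate n_quantiles quantiles of `samples` by stochastic subgradient descent
    on the pinball loss rho_tau(u) = u * (tau - 1[u < 0]), with u = z - theta."""
    rng = random.Random(seed)
    taus = quantile_midpoints(n_quantiles)
    theta = [0.0] * n_quantiles
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)
        for z in data:
            for i, tau in enumerate(taus):
                # d rho / d theta_i = -(tau - 1[z < theta_i]); step against it.
                below = 1.0 if z < theta[i] else 0.0
                theta[i] += lr * (tau - below)
    return taus, theta


# Hypothetical "returns": 101 evenly spaced values on [0, 10].
returns = [i / 10 for i in range(101)]
taus, theta = fit_quantiles(returns)
value_estimate = sum(theta) / len(theta)  # mean of the quantiles approximates the expected return
print([round(t, 2) for t in theta], round(value_estimate, 2))
```

Each θ_i settles where a fraction τ_i of the samples lies below it, so the set of θ_i describes the whole return distribution rather than only its mean.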


Reinforcement Learning from scratch – Insight Data

#artificialintelligence

Recently, I gave a talk at the O'Reilly AI conference in Beijing about some of the interesting lessons we've learned in the world of NLP. While there, I was lucky enough to attend a tutorial on Deep Reinforcement Learning (Deep RL) from scratch by Unity Technologies. I thought that the session, led by Arthur Juliani, was extremely informative, and I wanted to share some big takeaways below. In our conversations with companies, we've seen a rise in interesting Deep RL applications, tools and results. At the same time, the inner workings and applications of Deep RL, such as AlphaGo, can often seem esoteric and hard to understand.


Fidelity-based Probabilistic Q-learning for Control of Quantum Systems

arXiv.org Machine Learning

The balance between exploration and exploitation is a key problem for reinforcement learning methods, especially for Q-learning. In this paper, a fidelity-based probabilistic Q-learning (FPQL) approach is presented to address this problem naturally and is applied to learning control of quantum systems. In this approach, fidelity is adopted to help direct the learning process, and the probability of selecting each action in a given state is updated iteratively along with the learning process. This leads to a natural exploration strategy rather than a prescribed one that depends on configured parameters. A probabilistic Q-learning (PQL) algorithm is first presented to demonstrate the basic idea of probabilistic action selection. Then the FPQL algorithm is presented for learning control of quantum systems. Two examples (a spin-1/2 system and a Λ-type atomic system) are used to test the performance of the FPQL algorithm. The results show that FPQL algorithms attain a better balance between exploration and exploitation, can avoid locally optimal policies, and accelerate the learning process.
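Two ingredients of the abstract can be sketched concretely. Fidelity between pure quantum states is commonly defined as |⟨ψ|φ⟩|², and probabilistic action selection keeps a per-state probability vector that is sampled from and then shifted toward a preferred action. The specific shift rule below (`shift_toward`) and scaling its step size by fidelity are hypothetical stand-ins chosen for illustration; the paper's exact update rules may differ.

```python
import math
import random


def fidelity(psi, phi):
    """Fidelity |<psi|phi>|^2 between two normalized pure states given as complex amplitude lists."""
    overlap = sum(a.conjugate() * b for a, b in zip(psi, phi))
    return abs(overlap) ** 2


def shift_toward(probs, chosen, k):
    """Hypothetical update: move probability mass toward `chosen` by a step k in [0, 1].
    The result still sums to 1."""
    return [p + k * (1.0 - p) if i == chosen else p * (1.0 - k) for i, p in enumerate(probs)]


def sample_action(probs, rng):
    """Probabilistic action selection: draw an action index with the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Spin-1/2 example states |0> and |+>; scaling the step by fidelity is an assumption.
ket0 = [1 + 0j, 0j]
ket_plus = [1 / math.sqrt(2) + 0j, 1 / math.sqrt(2) + 0j]
F = fidelity(ket0, ket_plus)
probs = shift_toward([0.5, 0.5], chosen=0, k=0.2 * F)
action = sample_action(probs, random.Random(0))
print(F, probs, action)
```

Because actions are always sampled from the current probabilities, exploration fades gradually as mass concentrates on preferred actions, without a separately tuned ε schedule.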


Randomized Prior Functions for Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Dealing with uncertainty is essential for efficient reinforcement learning. There is a growing literature on uncertainty estimation for deep learning from fixed datasets, but many of the most popular approaches are poorly suited to sequential decision problems. Other methods, such as bootstrap sampling, have no mechanism for uncertainty that does not come from the observed data. We highlight why this can be a crucial shortcoming and propose a simple remedy: adding a randomized, untrainable 'prior' network to each ensemble member. We prove that this approach is efficient with linear representations, provide simple illustrations of its efficacy with nonlinear representations, and show that it scales to large problems far better than previous attempts.
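The randomized-prior idea can be sketched in the linear setting the abstract mentions. Each ensemble member adds a fixed, randomly drawn prior function p_k(x) to a trainable part f_k(x), and only f_k is fitted, here by ridge regression on features [1, x]. The toy data, the prior scale, the regularization strength λ and the ensemble size are assumptions, and bootstrapping is omitted for brevity. Near the data the members agree; far from it their disagreement grows, which is exactly the uncertainty an ensemble without priors (and without bootstrapping) cannot express.

```python
import random
import statistics


def ridge_fit(xs, ys, lam):
    """Closed-form ridge regression on features [1, x]; returns (w0, w1)."""
    s0, s1 = len(xs) + lam, sum(xs)
    s2 = sum(x * x for x in xs) + lam
    t0, t1 = sum(ys), sum(x * y for x, y in zip(xs, ys))
    det = s0 * s2 - s1 * s1
    return (s2 * t0 - s1 * t1) / det, (s0 * t1 - s1 * t0) / det


def train_member(xs, ys, lam, prior_scale, rng):
    """One ensemble member: trainable linear part plus a fixed random linear prior."""
    p0, p1 = rng.gauss(0.0, prior_scale), rng.gauss(0.0, prior_scale)  # never trained
    # Fit only the trainable part, to the targets minus the prior's prediction.
    residual = [y - (p0 + p1 * x) for x, y in zip(xs, ys)]
    w0, w1 = ridge_fit(xs, residual, lam)
    return lambda x: (w0 + p0) + (w1 + p1) * x


def spread(members, x):
    """Ensemble disagreement (population std of predictions) at input x."""
    return statistics.pstdev([m(x) for m in members])


rng = random.Random(0)
xs = [i / 19 for i in range(20)]  # toy inputs, all in [0, 1]
ys = [2.0 * x + 1.0 for x in xs]  # toy noise-free targets
with_prior = [train_member(xs, ys, lam=1.0, prior_scale=1.0, rng=rng) for _ in range(10)]
no_prior = [train_member(xs, ys, lam=1.0, prior_scale=0.0, rng=rng) for _ in range(10)]
print(spread(with_prior, 0.5), spread(with_prior, 10.0), spread(no_prior, 10.0))
```

The ridge penalty pulls only the trainable weights toward zero, so wherever the data do not pin the prediction down, each member falls back on its own random prior.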


Automated Curriculum Learning by Rewarding Temporally Rare Events

arXiv.org Artificial Intelligence

Reward shaping allows reinforcement learning (RL) agents to accelerate learning by receiving additional reward signals. However, these signals can be difficult to design manually, especially for complex RL tasks. We propose a simple and general approach that determines the reward of pre-defined events by their rarity alone. Here events become less rewarding as they are experienced more often, which encourages the agent to continually explore new types of events as it learns. The adaptiveness of this reward function results in a form of automated curriculum learning that does not have to be specified by the experimenter. We demonstrate that this Rarity of Events (RoE) approach enables the agent to succeed in challenging VizDoom scenarios without access to the extrinsic reward from the environment. Furthermore, the results demonstrate that RoE learns a more versatile policy that adapts well to critical changes in the environment. Rewarding events based on their rarity could help in many unsolved RL environments that are characterized by sparse extrinsic rewards but a plethora of known event types.
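A minimal sketch of a rarity-based reward, assuming for illustration that each event's reward is the inverse of its mean per-episode count over a sliding window of recent episodes, with a floor to avoid division by zero. The class name, window size, floor value and event names are hypothetical, and the paper's exact formula may differ. An agent would receive, at each step, the sum of `reward(event)` over the events that occurred in that step.

```python
from collections import deque


class RarityOfEvents:
    """Hypothetical rarity-based reward: an event's reward is the inverse of its mean
    per-episode count over a sliding window of recent episodes (floored to avoid 1/0)."""

    def __init__(self, event_names, window=100, min_mean=0.01):
        self.event_names = list(event_names)
        self.history = deque(maxlen=window)  # one dict of event counts per finished episode
        self.min_mean = min_mean

    def mean_count(self, event):
        if not self.history:
            return 0.0
        return sum(episode[event] for episode in self.history) / len(self.history)

    def reward(self, event):
        # Frequent events earn little; rare or never-seen events earn a lot.
        return 1.0 / max(self.mean_count(event), self.min_mean)

    def end_episode(self, counts):
        # Record this episode's counts; events that did not occur count as 0.
        self.history.append({e: counts.get(e, 0) for e in self.event_names})


# Usage with made-up event names.
roe = RarityOfEvents(["pickup_ammo", "kill_enemy"])
roe.end_episode({"pickup_ammo": 10, "kill_enemy": 1})
roe.end_episode({"pickup_ammo": 10})
print(roe.reward("pickup_ammo"), roe.reward("kill_enemy"))
```

Because the window slides, an event that becomes common loses value over time, which is what pushes the agent on toward other event types.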


The Effect of Planning Shape on Dyna-style Planning in High-dimensional State Spaces

arXiv.org Artificial Intelligence

Dyna is an architecture for reinforcement learning agents that interleaves planning, acting, and learning in an online setting. This architecture aims to make fuller use of limited experience to achieve better performance with fewer environmental interactions. Dyna has been well studied in problems with a tabular representation of states, and has also been extended to some settings with larger state spaces that require function approximation. However, little work has studied Dyna in environments with high-dimensional state spaces like images. In Dyna, the environment model is typically used to generate one-step transitions from selected start states. We applied one-step Dyna to several games from the Arcade Learning Environment and found that the model-based updates offered surprisingly little benefit, even with a perfect model. However, when the model was used to generate longer trajectories of simulated experience, performance improved dramatically. This observation also holds when using a model that is learned from experience; even though the learned model is flawed, it can still be used to accelerate learning.
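The effect of planning shape can be illustrated with tabular Dyna-style planning on a toy chain. This sketch assumes a perfect deterministic model, a five-state chain with a reward of 1 on reaching the last state, and a Q-learning update at every simulated transition; only the rollout length differs between the two runs. With one-step rollouts from the start state the reward is never simulated, so nothing is learned, whereas longer rollouts reach the reward and propagate value back to the start. This is a toy analogue only, not the paper's Arcade Learning Environment setup.

```python
import random
from collections import defaultdict

ACTIONS = ("left", "right")
N_STATES = 5  # hypothetical chain; reaching state N_STATES - 1 ends the episode with reward 1


def chain_model(s, a):
    """Perfect deterministic model of the toy chain: returns (reward, next_state, done)."""
    s_next = min(s + 1, N_STATES - 1) if a == "right" else max(s - 1, 0)
    done = s_next == N_STATES - 1
    return (1.0 if done else 0.0), s_next, done


def epsilon_greedy(Q, s, rng, epsilon):
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(Q[(s, b)] for b in ACTIONS)
    return rng.choice([b for b in ACTIONS if Q[(s, b)] == best])  # break ties randomly


def plan(Q, model, start_states, rollout_length, n_rollouts, rng,
         alpha=0.5, gamma=0.9, epsilon=0.1):
    """Dyna-style planning: simulate rollouts of up to `rollout_length` steps from the
    start states and apply a Q-learning update at every simulated transition."""
    for _ in range(n_rollouts):
        s = rng.choice(start_states)
        for _ in range(rollout_length):
            a = epsilon_greedy(Q, s, rng, epsilon)
            r, s_next, done = model(s, a)
            target = r if done else r + gamma * max(Q[(s_next, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            if done:
                break
            s = s_next
    return Q


one_step = plan(defaultdict(float), chain_model, [0], 1, 200, random.Random(0))
long_rollouts = plan(defaultdict(float), chain_model, [0], 20, 200, random.Random(0))
print(one_step[(0, "right")], round(long_rollouts[(0, "right")], 3))
```

Both runs use the same perfect model and the same number of rollouts; the only difference is how far each simulated trajectory extends from its start state.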