Goto

Collaborating Authors

 Markov Models


Markov Chain Monte Carlo in Python – Towards Data Science

@machinelearnbot

The past few months, I encountered one term again and again in the data science world: Markov Chain Monte Carlo. In my research lab, in podcasts, in articles, every time I heard the phrase I would nod and think that sounds pretty cool with only a vague idea of what anyone was talking about. Several times I tried to learn MCMC and Bayesian inference, but every time I started reading the books, I soon gave up. Exasperated, I turned to the best method to learn any new skill: apply it to a problem. Using some of my sleep data I had been meaning to explore and a hands-on application-based book (Bayesian Methods for Hackers, available free online), I finally learned Markov Chain Monte Carlo through a real-world project.


Why is machine learning in finance so hard?

#artificialintelligence

Financial markets have been one of the earliest adopters of machine learning (ML). People have been using ML to spot patterns in the markets since 1980s. Even though ML has had enormous successes in predicting the market outcomes in the past, the recent advances in deep learning haven't helped financial market predictions much. While deep learning and other ML techniques have finally made it possible for Alexa, Google Assistant and Google Photos to work, there hasn't been much progress when it comes to stock markets. I am not a researcher.


Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning

arXiv.org Machine Learning

We introduce SCAL, an algorithm designed to perform efficient exploration-exploitation in any unknown weakly-communicating Markov Decision Process (MDP) for which an upper bound c on the span of the optimal bias function is known. For an MDP with S states, A actions and Gamma <= S possible next states, we prove a regret bound of O(c\sqrt{Gamma SAT}), which significantly improves over existing algorithms (e.g., UCRL and PSRL), whose regret scales linearly with the MDP diameter D. In fact, the optimal bias span is finite and often much smaller than D (e.g., D=infinity in non-communicating MDPs). A similar result was originally derived by Bartlett and Tewari (2009) for REGAL.C, for which no tractable algorithm is available. In this paper, we relax the optimization problem at the core of REGAL.C, we carefully analyze its properties, and we provide the first computationally efficient algorithm to solve it. Finally, we report numerical simulations supporting our theoretical findings and showing how SCAL significantly outperforms UCRL in MDPs with large diameter and small span.


SparseMAP: Differentiable Sparse Structured Inference

arXiv.org Machine Learning

Structured prediction requires searching over a combinatorial number of structures. To tackle it, we introduce SparseMAP, a new method for sparse structured inference, together with corresponding loss functions. SparseMAP inference is able to automatically select only a few global structures: it is situated between MAP inference, which picks a single structure, and marginal inference, which assigns probability mass to all structures, including implausible ones. Importantly, SparseMAP can be computed using only calls to a MAP oracle, hence it is applicable even to problems where marginal inference is intractable, such as linear assignment. Moreover, thanks to the solution sparsity, gradient backpropagation is efficient regardless of the structure. SparseMAP thus enables us to augment deep neural networks with generic and sparse structured hidden layers. Experiments in dependency parsing and natural language inference reveal competitive accuracy, improved interpretability, and the ability to capture natural language ambiguities, which is attractive for pipeline systems.


Q-learning with Nearest Neighbors

arXiv.org Machine Learning

We consider the problem of model-free reinforcement learning for infinite-horizon discounted Markov Decision Processes (MDPs) with a continuous state space and unknown transition kernels, when only a single sample path of the system is available. We focus on the classical approach of Q-learning where the goal is to learn the optimal Q-function. We propose the Nearest Neighbor Q-Learning approach that utilizes nearest neighbor regression method to learn the Q function. We provide finite sample analysis of the convergence rate using this method. In particular, we establish that the algorithm is guaranteed to output an $\epsilon$-accurate estimate of the optimal Q-function with high probability using a number of observations that depends polynomially on $\epsilon$ and the model parameters. To establish our results, we develop a robust version of stochastic approximation results; this may be of interest in its own right.


Efficient Model-Based Deep Reinforcement Learning with Variational State Tabulation

arXiv.org Machine Learning

Modern reinforcement learning algorithms reach super-human performance in many board and video games, but they are sample inefficient, i.e. they typically require significantly more playing experience than humans to reach an equal performance level. To improve sample efficiency, an agent may build a model of the environment and use planning methods to update its policy. In this article we introduce VaST (Variational State Tabulation), which maps an environment with a high-dimensional state space (e.g. the space of visual inputs) to an abstract tabular environment. Prioritized sweeping with small backups, a highly efficient planning method, can then be used to update state-action values. We show how VaST can rapidly learn to maximize reward in tasks like 3D navigation and efficiently adapt to sudden changes in rewards or transition probabilities.


Stochastic quasi-Newton with adaptive step lengths for large-scale problems

arXiv.org Machine Learning

We provide a numerically robust and fast method capable of exploiting the local geometry when solving large-scale stochastic optimisation problems. Our key innovation is an auxiliary variable construction coupled with an inverse Hessian approximation computed using a receding history of iterates and gradients. It is the Markov chain nature of the classic stochastic gradient algorithm that enables this development. The construction offers a mechanism for stochastic line search adapting the step length. We numerically evaluate and compare against current state-of-the-art with encouraging performance on real-world benchmark problems where the number of observations and unknowns is in the order of millions.


Modeling Dynamics with Deep Transition-Learning Networks

arXiv.org Machine Learning

Markov processes, both classical and higher order, are often used to model dynamic processes, such as stock prices, molecular dynamics, and Monte Carlo methods. Previous works have shown that an autoencoder can be formulated as a specific type of Markov chain. Here, we propose a generative neural network known as a transition encoder, or transcoder, which learns such continuous-state dynamic processes. We show that the transcoder is able to learn both deterministic and stochastic dynamic processes on several systems. We explore a number of applications of the transcoder including generating unseen trajectories and examining the propensity for chaos in a dynamic system. Further, we show that the transcoder can speed up Markov Chain Monte Carlo (MCMC) sampling to a convergent distribution by training it to make several steps at a time. Finally, we show that the hidden layers of a transcoder are useful for visualization and salient feature extraction of the transition process itself.


Introduction to Learning to Trade with Reinforcement Learning

#artificialintelligence

The academic Deep Learning research community has largely stayed away from the financial markets. Maybe that's because the finance industry has a bad reputation, the problem doesn't seem interesting from a research perspective, or because data is difficult and expensive to obtain. In this post, I'm going to argue that training Reinforcement Learning agents to trade in the financial (and cryptocurrency) markets can be an extremely interesting research problem. I believe that it has not received enough attention from the research community but has the potential to push the state-of-the art of many related fields. It is quite similar to training agents for multiplayer games such as DotA, and many of the same research problems carry over. Knowing virtually nothing about trading, I have spent the past few months working on a project in this field. This is not a "price prediction using Deep Learning" post. So, if you're looking for example code and models you may be disappointed. Instead, I want to talk on a more high level about why learning to trade using Machine Learning is difficult, what some of the challenges are, and where I think Reinforcement Learning fits in. If there's enough interest in this area I may follow up with another post that includes concrete examples. I expect most readers to have no background in trading, just like I didn't, so I will start out with covering some of the basics. I'm by no means an expert, so please let me know in the comments so if you find mistakes. I will use cryptocurrencies as a running example in this post, but the same concepts apply to most of the financial markets. The reason to use cryptocurrencies is that data is free, public, and easily accessible. Anyone can sign up to trade. The barriers to trading in the financial markets are a little higher, and data can be expensive.


Sample Efficient Deep Reinforcement Learning for Dialogue Systems with Large Action Spaces

arXiv.org Machine Learning

In spoken dialogue systems, we aim to deploy artificial intelligence to build automated dialogue agents that can converse with humans. A part of this effort is the policy optimisation task, which attempts to find a policy describing how to respond to humans, in the form of a function taking the current state of the dialogue and returning the response of the system. In this paper, we investigate deep reinforcement learning approaches to solve this problem. Particular attention is given to actor-critic methods, off-policy reinforcement learning with experience replay, and various methods aimed at reducing the bias and variance of estimators. When combined, these methods result in the previously proposed ACER algorithm that gave competitive results in gaming environments. These environments however are fully observable and have a relatively small action set so in this paper we examine the application of ACER to dialogue policy optimisation. We show that this method beats the current state-of-the-art in deep learning approaches for spoken dialogue systems. This not only leads to a more sample efficient algorithm that can train faster, but also allows us to apply the algorithm in more difficult environments than before. We thus experiment with learning in a very large action space, which has two orders of magnitude more actions than previously considered. We find that ACER trains significantly faster than the current state-of-the-art.