Markov Models
Relativistic Monte Carlo
Lu, Xiaoyu, Perrone, Valerio, Hasenclever, Leonard, Teh, Yee Whye, Vollmer, Sebastian J.
Hamiltonian Monte Carlo (HMC) is a popular Markov chain Monte Carlo (MCMC) algorithm that generates proposals for a Metropolis-Hastings algorithm by simulating the dynamics of a Hamiltonian system. However, HMC is sensitive to large time discretizations and performs poorly if there is a mismatch between the spatial geometry of the target distribution and the scales of the momentum distribution. In particular the mass matrix of HMC is hard to tune well. In order to alleviate these problems we propose relativistic Hamiltonian Monte Carlo, a version of HMC based on relativistic dynamics that introduce a maximum velocity on particles. We also derive stochastic gradient versions of the algorithm and show that the resulting algorithms bear interesting relationships to gradient clipping, RMSprop, Adagrad and Adam, popular optimisation methods in deep learning. Based on this, we develop relativistic stochastic gradient descent by taking the zero-temperature limit of relativistic stochastic gradient Hamiltonian Monte Carlo. In experiments we show that the relativistic algorithms perform better than classical Newtonian variants and Adam.
Clustering Time Series and the Surprising Robustness of HMMs
Suppose that we are given a time series where consecutive samples are believed to come from a probabilistic source, that the source changes from time to time and that the total number of sources is fixed. Our objective is to estimate the distributions of the sources. A standard approach to this problem is to model the data as a hidden Markov model (HMM). However, since the data often lacks the Markov or the stationarity properties of an HMM, one can ask whether this approach is still suitable or perhaps another approach is required. In this paper we show that a maximum likelihood HMM estimator can be used to approximate the source distributions in a much larger class of models than HMMs. Specifically, we propose a natural and fairly general non-stationary model of the data, where the only restriction is that the sources do not change too often. Our main result shows that for this model, a maximum-likelihood HMM estimator produces the correct second moment of the data, and the results can be extended to higher moments.
Policy Networks with Two-Stage Training for Dialogue Systems
Fatemi, Mehdi, Asri, Layla El, Schulz, Hannes, He, Jing, Suleman, Kaheer
In this paper, we propose to use deep policy networks which are trained with an advantage actor-critic method for statistically optimised dialogue systems. First, we show that, on summary state and action spaces, deep Reinforcement Learning (RL) outperforms Gaussian Processes methods. Summary state and action spaces lead to good performance but require pre-engineering effort, RL knowledge, and domain expertise. In order to remove the need to define such summary spaces, we show that deep RL can also be trained efficiently on the original state and action spaces. Dialogue systems based on partially observable Markov decision processes are known to require many dialogues to train, which makes them unappealing for practical deployment. We show that a deep RL method based on an actor-critic architecture can exploit a small amount of data very efficiently. Indeed, with only a few hundred dialogues collected with a handcrafted policy, the actor-critic deep learner is considerably boot-strapped from a combination of supervised and batch RL. In addition, convergence to an optimal policy is significantly sped up compared to other deep RL methods initialized on the data with batch RL. All experiments are performed on a restaurant domain derived from the Dialogue State Tracking Challenge 2 (DSTC2) dataset.
Nonparametric risk bounds for time-series forecasting
McDonald, Daniel J., Shalizi, Cosma Rohilla, Schervish, Mark
Generalization error bounds are probabilistically valid, non-asymptotic tools for characterizing the predictive ability of forecasting models. This methodology is fundamentally about choosing particular prediction functions out of some class of plausible alternatives so that, with high reliability, the resulting predictions will be nearly as accurate as possible ("probably approximately correct"). While many of these results are aimed at classification problems with independent and identically distributed (i.i.d.) data, this paper adapts and extends these methods to time-series models, so that economic and financial forecasting techniques can be evaluated rigorously. In particular, these methods control the expected accuracy of future predictions from mis-specified models based on finite samples. This allows for immediate model comparisons which neither appeal to asymptotics nor make strong assumptions about the data-generating process, in stark contrast to such popular model-selection tools as AIC.
Google's DeepMind has learnt how to talk like a human
Anyone that might be concerned about computers taking over look away now, because they are a step closer to sounding just like humans. Researchers in the UK at Google's DeepMind unit have been working on making computer-generated speech sound as "natural" as humans. The technology, called WaveNet, which is focused on the area of speech synthesis, or text-to-speech, was found to sound more natural than any of Google's products. However, this was only achieved after the WaveNet artificial neural network was trained to produce English and Chinese speech which required copious amounts of computing power, so the technology probably won't be hitting the mainstream any time soon. Using a convolutional neural network, which is used for artificial intelligence in deep learning, it is trained on data and then the systems make inferences about new data, in addition to being used to generate new data.
PyData Carolinas 2016 Presentation: Deep Finch? A Continued Comparison of Machine Learning Models to Label Birdsong Syllables
Songbirds provide a model system that neuroscientists use to understand how the brain learns and controls speech and similar skills. Much like infants learning to speak from their parents, songbirds learn their song from a tutor and practice it millions of times before reaching maturity. Also like humans, songbirds have evolved special brain regions for learning and producing their vocalizations. These newly-evolved brain regions in songbirds, known as the song system, are found within broader brain areas shared by birds and humans across evolution. So by studying how the song system works, we can learn about our own brains.
Learning Boltzmann Machine with EM-like Method
We propose an expectation-maximization-like(EMlike) method to train Boltzmann machine with unconstrained connectivity. It adopts Monte Carlo approximation in the E-step, and replaces the intractable likelihood objective with efficiently computed objectives or directly approximates the gradient of likelihood objective in the M-step. The EM-like method is a modification of alternating minimization. We prove that EM-like method will be the exactly same with contrastive divergence in restricted Boltzmann machine if the M-step of this method adopts special approximation. We also propose a new measure to assess the performance of Boltzmann machine as generative models of data, and its computational complexity is O(Rmn). Finally, we demonstrate the performance of EM-like method using numerical experiments.
Difference of Convex Functions Programming Applied to Control with Expert Data
Piot, Bilal, Geist, Matthieu, Pietquin, Olivier
This paper reports applications of Difference of Convex functions (DC) programming to Learning from Demonstrations (LfD) and Reinforcement Learning (RL) with expert data. This is made possible because the norm of the Optimal Bellman Residual (OBR), which is at the heart of many RL and LfD algorithms, is DC. Improvement in performance is demonstrated on two specific algorithms, namely Reward-regularized Classification for Apprenticeship Learning (RCAL) and Reinforcement Learning with Expert Demonstrations (RLED), through experiments on generic Markov Decision Processes (MDP), called Garnets.
The 7 Best Data Science and Machine Learning Podcasts
This column is by Matt Fogel, Co-Founder, Fuzzy.io Data science and machine learning have long been interests of mine, but now that I'm working on Fuzzy.ai I need to keep on top of all the news in both fields. My preferred way to do this is through listening to podcasts. I've listened to a bunch of machine learning and data science podcasts in the last few months, so I thought I'd share my favorites: Every other week, they release a 10–15 minute episode where hosts, Kyle and Linda Polich give a short primer on topics like k-means clustering, natural language processing and decision tree learning, often using analogies related to their pet parrot, Yoshi.
Towards Bayesian Deep Learning: A Framework and Some Existing Methods
While perception tasks such as visual object recognition and text understanding play an important role in human intelligence, the subsequent tasks that involve inference, reasoning and planning require an even higher level of intelligence. The past few years have seen major advances in many perception tasks using deep learning models. For higher-level inference, however, probabilistic graphical models with their Bayesian nature are still more powerful and flexible. To achieve integrated intelligence that involves both perception and inference, it is naturally desirable to tightly integrate deep learning and Bayesian models within a principled probabilistic framework, which we call Bayesian deep learning. In this unified framework, the perception of text or images using deep learning can boost the performance of higher-level inference and in return, the feedback from the inference process is able to enhance the perception of text or images. This paper proposes a general framework for Bayesian deep learning and reviews its recent applications on recommender systems, topic models, and control. In this paper, we also discuss the relationship and differences between Bayesian deep learning and other related topics like Bayesian treatment of neural networks.