Markov Models
Statistical Inference for Generative Models with Maximum Mean Discrepancy
Briol, Francois-Xavier, Barp, Alessandro, Duncan, Andrew B., Girolami, Mark
While likelihood-based inference and its variants provide a statistically efficient and widely applicable approach to parametric inference, their application to models involving intractable likelihoods poses challenges. In this work, we study a class of minimum distance estimators for intractable generative models, that is, statistical models for which the likelihood is intractable, but simulation is cheap. The distance considered, maximum mean discrepancy (MMD), is defined through the embedding of probability measures into a reproducing kernel Hilbert space. We study the theoretical properties of these estimators, showing that they are consistent, asymptotically normal and robust to model misspecification. A main advantage of these estimators is the flexibility offered by the choice of kernel, which can be used to trade off statistical efficiency and robustness. On the algorithmic side, we study the geometry induced by MMD on the parameter space and use this to introduce a novel natural gradient descent-like algorithm for efficient implementation of these estimators. We illustrate the relevance of our theoretical results on several classes of models including a discrete-time latent Markov process and two multivariate stochastic differential equation models.
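As a rough illustration of the minimum MMD idea described in this abstract, the sketch below estimates the squared MMD between simulated and observed samples with a Gaussian kernel and picks the parameter that minimises it. The simulator (`simulate`, a unit-variance Gaussian location model), the grid search, and the kernel lengthscale are assumptions made for this example; the paper itself studies a natural gradient descent-like algorithm, which is not reproduced here.

```python
import math
import random


def gaussian_kernel(x, y, lengthscale=1.0):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((x - y) ** 2) / (2.0 * lengthscale ** 2))


def mmd_squared(xs, ys, lengthscale=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(u, v, lengthscale) for u in a for v in b)
        return total / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def simulate(theta, n, rng):
    """Hypothetical simulator: n draws from N(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def fit_by_grid_search(data, candidates, n_sim=200, seed=0):
    """Return the candidate whose simulated sample is closest to the data in MMD."""
    rng = random.Random(seed)
    return min(candidates, key=lambda t: mmd_squared(simulate(t, n_sim, rng), data))


# Observed data generated at theta = 2.0 (an assumption for this demo).
data = simulate(2.0, 200, random.Random(1))
theta_hat = fit_by_grid_search(data, [0.0, 1.0, 2.0, 3.0, 4.0])
```

Only simulation from the model is needed here, never a likelihood evaluation, which is the setting the abstract targets. The lengthscale plays the role of the kernel choice that the abstract says trades off efficiency against robustness.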
A Variational Autoencoder for Probabilistic Non-Negative Matrix Factorisation
Squires, Steven, Bennett, Adam Prügel, Niranjan, Mahesan
We introduce and demonstrate the variational autoencoder (VAE) for probabilistic non-negative matrix factorisation (PAE-NMF). We design a network which can perform non-negative matrix factorisation (NMF) and add in aspects of a VAE to make the coefficients of the latent space probabilistic. By restricting the weights in the final layer of the network to be non-negative and using the non-negative Weibull distribution we produce a probabilistic form of NMF which allows us to generate new data and find a probability distribution that effectively links the latent and input variables. We demonstrate the effectiveness of PAE-NMF on three heterogeneous datasets: images, financial time series and genomic.
DCEF: Deep Collaborative Encoder Framework for Unsupervised Clustering
Chu, Jielei, Wang, Hongjun, Liu, Jing, Yu, Zeng, Li, Tianrui
Collaborative representation is a popular feature learning approach whose encoding process is assisted by various types of information. In this paper, we propose a collaborative representation restricted Boltzmann Machine (CRRBM) for modeling binary data and a collaborative representation Gaussian restricted Boltzmann Machine (CRGRBM) for modeling real-valued data by applying a collaborative representation strategy in the encoding procedure. We utilize Locality Sensitive Hashing (LSH) to generate similar sample subsets of the instance and observed feature set simultaneously from input data. Hence, we can obtain some mini blocks, which come from the intersection of instance and observed feature subsets. Then we integrate Contrastive Divergence and Bregman Divergence methods with mini blocks to optimize our CRRBM and CRGRBM models. In their training process, the complex collaborative relationships between multiple instances and features are fused into the hidden layer encoding. Hence, these encodings have dual characteristics of concealment and cooperation. Here, we develop two deep collaborative encoder frameworks (DCEF) based on the CRRBM and CRGRBM models: one is a DCEF with Gaussian linear visible units (GDCEF) for modeling real-valued data, and the other is a DCEF with binary visible units (BDCEF) for modeling binary data. We explore the collaborative representation capability of the hidden features in every layer of the GDCEF and BDCEF frameworks, especially in the deepest hidden layer. The experimental results show that the GDCEF and BDCEF frameworks outperform the classic Autoencoder framework on unsupervised clustering tasks on the MSRA-MM2.0 and UCI datasets, respectively.
Variance-reduced $Q$-learning is minimax optimal
Markov decision processes and reinforcement learning algorithms provide a flexible framework for decision-making in dynamic settings, and have been studied for decades (e.g., [23, 27, 8, 9, 29]). Given the explosion in the amount of available data and computing power, recent years have witnessed dramatic success of reinforcement learning (RL) techniques in various application domains (e.g., [30, 19, 26, 22, 27]). In broad terms, algorithms for reinforcement learning are often separated into model-based versus model-free approaches. Model-based approaches are based on directly learning a model for the dynamics of the system, and then computing optimal policies from the learned model. In contrast, a model-free approach directly targets learning of the optimal value function or policy. Naturally, a model-free approach is more robust to model mismatch; however, model-based approaches can often be more sample efficient. Providing a firm theoretical foundation to the tradeoffs intrinsic to different classes of methods, as characterized by their access to the underlying Markov decision process, is a major open question in RL.
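To make the model-free side of this distinction concrete, here is a minimal sketch of plain synchronous tabular Q-learning with a sampling oracle. It is not the variance-reduced algorithm of the title. The two-state toy MDP, the function names, and the step-size schedule `1 / (1 + (1 - gamma) * k)` are assumptions chosen for illustration.

```python
import random


def q_learning(sample_next_state, rewards, gamma, n_iters):
    """Synchronous tabular Q-learning: every (s, a) pair is updated each round."""
    n_states, n_actions = len(rewards), len(rewards[0])
    q = [[0.0] * n_actions for _ in range(n_states)]
    for k in range(1, n_iters + 1):
        # Rescaled linear step size (an assumption for this sketch).
        step = 1.0 / (1.0 + (1.0 - gamma) * k)
        new_q = [row[:] for row in q]
        for s in range(n_states):
            for a in range(n_actions):
                # One oracle call per pair; no transition model is ever stored.
                s_next = sample_next_state(s, a)
                target = rewards[s][a] + gamma * max(q[s_next])
                new_q[s][a] = (1.0 - step) * q[s][a] + step * target
        q = new_q
    return q


# Hypothetical toy MDP: action 0 stays put, action 1 switches state w.p. 0.9.
rng = random.Random(0)


def noisy_step(s, a):
    if a == 0:
        return s
    return 1 - s if rng.random() < 0.9 else s


rewards = [[0.0, 0.0], [1.0, 1.0]]  # reward 1 whenever the agent is in state 1
q = q_learning(noisy_step, rewards, gamma=0.5, n_iters=2000)
greedy = [max(range(2), key=lambda a: q[s][a]) for s in range(2)]
```

The learner only keeps the table `q`, which is what separates it from a model-based method that would first estimate the transition probabilities.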
Online Learning and Planning in Partially Observable Domains without Prior Knowledge
How an agent can act optimally in stochastic, partially observable domains is a challenging problem. The standard approach is to first learn the domain model and then find the (near) optimal policy based on the learned model. However, learning the model offline often requires storing the entire training data and cannot utilize the data generated in the planning phase. Furthermore, current research usually assumes the learned model is accurate or presupposes knowledge of the nature of the unobservable part of the world. In this paper, for systems with discrete settings, we use Predictive State Representations (PSRs) to propose a model-based planning approach in which the learning and planning phases can both be executed online and no prior knowledge of the underlying system is required. Experimental results show that, compared to state-of-the-art approaches, our algorithm achieves a high level of performance with no prior knowledge provided, along with the theoretical advantages of PSRs. Source code is available at https://github.com/DMU-XMU/PSR-MCTS-Online.
15 Best Machine Learning Course in 2019 MLAIT
Below are the 15 best machine learning courses to accelerate your ML journey this year. The holy grail of online machine learning courses, Machine Learning by Stanford, is considered the best machine learning course by many. This course is prepared and maintained by Andrew Ng, a pioneering machine learning scientist who has led ML research projects for both Google and the Chinese giant Baidu. Although the course requires a paid subscription, you can ask for financial aid if you're a student. This online machine learning course from DataCamp is the best machine learning course with a primary emphasis on statistics, the de facto requirement for effective data science projects.
Tackling Climate Change with Machine Learning
Rolnick, David, Donti, Priya L., Kaack, Lynn H., Kochanski, Kelly, Lacoste, Alexandre, Sankaran, Kris, Ross, Andrew Slavin, Milojevic-Dupont, Nikola, Jaques, Natasha, Waldman-Brown, Anna, Luccioni, Alexandra, Maharaj, Tegan, Sherwin, Evan D., Mukkavilli, S. Karthik, Kording, Konrad P., Gomes, Carla, Ng, Andrew Y., Hassabis, Demis, Platt, John C., Creutzig, Felix, Chayes, Jennifer, Bengio, Yoshua
Climate change is one of the greatest challenges facing humanity, and we, as machine learning experts, may wonder how we can help. Here we describe how machine learning can be a powerful tool in reducing greenhouse gas emissions and helping society adapt to a changing climate. From smart grids to disaster management, we identify high impact problems where existing gaps can be filled by machine learning, in collaboration with other fields. Our recommendations encompass exciting research questions as well as promising business opportunities. We call on the machine learning community to join the global effort against climate change.
Combining Generative and Discriminative Models for Hybrid Inference
Satorras, Victor Garcia, Akata, Zeynep, Welling, Max
A graphical model is a structured representation of the data generating process. The traditional method to reason over random variables is to perform inference in this graphical model. However, in many cases the generating process is only a poor approximation of the much more complex true data generating process, leading to suboptimal estimation. The subtleties of the generative process are however captured in the data itself and we can "learn to infer", that is, learn a direct mapping from observations to explanatory latent variables. In this work we propose a hybrid model that combines graphical inference with a learned inverse model, which we structure as in a graph neural network, while the iterative algorithm as a whole is formulated as a recurrent neural network. By using cross-validation we can automatically balance the amount of work performed by graphical inference versus learned inference. We apply our ideas to the Kalman filter, a Gaussian hidden Markov model for time sequences, and show, among other things, that our model can estimate the trajectory of a noisy chaotic Lorenz Attractor much more accurately than either the learned or graphical inference run in isolation.
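For readers unfamiliar with the graphical-inference baseline mentioned in this abstract, the following is a minimal sketch of a scalar Kalman filter. It assumes a random-walk state model with Gaussian noise; the function name and parameters are illustrative, and the learned graph neural network component of the hybrid model is not shown.

```python
def kalman_filter_1d(observations, process_var, obs_var, mean0=0.0, var0=1.0):
    """Scalar Kalman filter for x_t = x_{t-1} + w_t, y_t = x_t + v_t.

    w_t ~ N(0, process_var) and v_t ~ N(0, obs_var) are assumed independent.
    Returns the filtered (mean, variance) after each observation.
    """
    mean, var = mean0, var0
    estimates = []
    for y in observations:
        # Predict: the random walk keeps the mean and inflates the variance.
        var = var + process_var
        # Update: blend prediction and observation using the Kalman gain.
        gain = var / (var + obs_var)
        mean = mean + gain * (y - mean)
        var = (1.0 - gain) * var
        estimates.append((mean, var))
    return estimates


# Noisy readings of a slowly drifting quantity (made-up numbers).
filtered = kalman_filter_1d([1.1, 0.9, 1.3, 1.2], process_var=0.01, obs_var=0.25)
```

When the assumed dynamics are only a poor approximation of the true process, these fixed predict and update rules are what the abstract proposes to complement with a learned inverse model.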
Using generative modelling to produce varied intonation for speech synthesis
Hodari, Zack, Watts, Oliver, King, Simon
Unlike human speakers, typical text-to-speech (TTS) systems are unable to produce multiple distinct renditions of a given sentence. This has previously been addressed by adding explicit external control. In contrast, generative models are able to capture a distribution over multiple renditions and thus produce varied renditions using sampling. Typical neural TTS models learn the average of the data because they minimise mean squared error. In the context of prosody, taking the average produces flatter, more boring speech: an "average prosody". A generative model that can synthesise multiple prosodies will, by design, not model average prosody. We use variational autoencoders (VAE) which explicitly place the most "average" data close to the mean of the Gaussian prior. We propose that by moving towards the tails of the prior distribution, the model will transition towards generating more idiosyncratic, varied renditions. Focusing here on intonation, we investigate the trade-off between naturalness and intonation variation and find that typical acoustic models can either be natural, or varied, but not both. However, sampling from the tails of the VAE prior produces much more varied intonation than the traditional approaches, whilst maintaining the same level of naturalness.
On the Optimality of Sparse Model-Based Planning for Markov Decision Processes
Agarwal, Alekh, Kakade, Sham, Yang, Lin F.
This work considers the sample complexity of obtaining an $\epsilon$-optimal policy in a discounted Markov Decision Process (MDP), given only access to a generative model. In this model, the learner accesses the underlying transition model via a sampling oracle that provides a sample of the next state, when given any state-action pair as input. In this work, we study the effectiveness of the most natural plug-in approach to model-based planning: we build the maximum likelihood estimate of the transition model in the MDP from observations and then find an optimal policy in this empirical MDP. We ask arguably the most basic and unresolved question in model-based planning: is the naive "plug-in" approach, non-asymptotically, minimax optimal in the quality of the policy it finds, given a fixed sample size? With access to a generative model, we resolve this question in the strongest possible sense: our main result shows that any high-accuracy solution in the plug-in model constructed with $N$ samples provides an $\epsilon$-optimal policy in the true underlying MDP. In comparison, all prior (non-asymptotically) minimax optimal results use model-free approaches, such as the Variance Reduced Q-value iteration algorithm (Sidford et al., 2018), while the best known model-based results (e.g., Azar et al., 2013) require larger sample sizes in their dependence on the planning horizon or the state space. Notably, we show that the model-based approach allows the use of any efficient planning algorithm in the empirical MDP, which simplifies the algorithm design as this approach does not tie the algorithm to the sampling procedure. The core of our analysis is a novel "absorbing MDP" construction to address the statistical dependency issues that arise in the analysis of model-based planning approaches, a construction which may be helpful more generally.
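The plug-in approach described in this abstract can be sketched as follows: query the sampling oracle a fixed number of times per state-action pair, form the count-based maximum likelihood estimate of the transition model, then plan in the empirical MDP. Value iteration is used as the planner here only as one possible choice; the toy MDP, function names, and sample sizes are assumptions for illustration.

```python
import random


def estimate_transitions(sample_next_state, n_states, n_actions, n_samples):
    """Count-based MLE of P(s' | s, a) from n_samples oracle calls per pair."""
    p_hat = [[[0.0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(n_samples):
                p_hat[s][a][sample_next_state(s, a)] += 1.0 / n_samples
    return p_hat


def value_iteration(p, rewards, gamma, n_iters=500):
    """Plan in a known MDP; returns the value estimate and a greedy policy."""
    n_states, n_actions = len(p), len(p[0])
    v = [0.0] * n_states
    for _ in range(n_iters):
        q = [[rewards[s][a] + gamma * sum(p[s][a][t] * v[t] for t in range(n_states))
              for a in range(n_actions)] for s in range(n_states)]
        v = [max(row) for row in q]
    # Greedy policy with respect to the Q-values from the final sweep.
    policy = [max(range(n_actions), key=lambda a: q[s][a]) for s in range(n_states)]
    return v, policy


# Hypothetical toy MDP: action 0 stays put, action 1 switches state w.p. 0.9.
rng = random.Random(0)


def oracle(s, a):
    if a == 0:
        return s
    return 1 - s if rng.random() < 0.9 else s


rewards = [[0.0, 0.0], [1.0, 1.0]]  # reward 1 whenever the agent is in state 1
p_hat = estimate_transitions(oracle, n_states=2, n_actions=2, n_samples=100)
values, policy = value_iteration(p_hat, rewards, gamma=0.9)
```

Because planning only sees `p_hat`, any other planner could replace `value_iteration` without touching the sampling step, which is the decoupling the abstract highlights.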