Goto

Collaborating Authors

 Bayesian Inference


A Bayesian Nonparametric Method for Clustering Imputation, and Forecasting in Multivariate Time Series

arXiv.org Machine Learning

This article proposes a Bayesian nonparametric method for forecasting, imputation, and clustering in sparsely observed, multivariate time series. The method is appropriate for jointly modeling hundreds of time series with widely varying, non-stationary dynamics. Given a collection of $N$ time series, the Bayesian model first partitions them into independent clusters using a Chinese restaurant process prior. Within a cluster, all time series are modeled jointly using a novel "temporally-coupled" extension of the Chinese restaurant process mixture. Markov chain Monte Carlo techniques are used to obtain samples from the posterior distribution, which are then used to form predictive inferences. We apply the technique to challenging prediction and imputation tasks using seasonal flu data from the US Center for Disease Control and Prevention, demonstrating competitive imputation performance and improved forecasting accuracy as compared to several state-of-the art baselines. We also show that the model discovers interpretable clusters in datasets with hundreds of time series using macroeconomic data from the Gapminder Foundation.


Bayesian Nonparametric Poisson-Process Allocation for Time-Sequence Modeling

arXiv.org Machine Learning

Analyzing the underlying structure of multiple time-sequences provides insight into the understanding of social networks and human activities. In this work, we present the Bayesian nonparametric Poisson process allocation (BaNPPA), a generative model to automatically infer the number of latent functions in temporal data. We model the intensity of each sequence as an infinite mixture of latent functions, each of which is the square of a function drawn from a Gaussian process. A technical challenge for the inference of such mixture models is the identifiability issue between coefficients and the scale of latent functions. We propose to cope with the identifiability issue by regulating the volume of each latent function and derive a variational inference algorithm that can scale well to large-scale data sets. Our algorithm is computationally efficient and scalable to large-scale datasets. Finally, we demonstrate the usefulness of the proposed Bayesian nonparametric model through experiments on both synthetic and real-world data sets.


Telstra builds 900 machine learning models for marketing overhaul

#artificialintelligence

Telstra has used open source machine learning technology to answer the age-old question that plagues every marketer: how effective is my ad spend? The telco wields one of the biggest marketing budgets in Australia, but that doesn't stop Telstra from wanting to track the performance of every dollar spent. The company previously faced a six-month lag to get visibility into the effectiveness of its marketing spend; that is now down to five weeks using new marketing mix modelling developed in partnership with Accenture, Deakin University and Servian. The telco previously used a traditional econometric model to assess the performance of its marketing spend, pulling together 800 variables โ€“ which took two-and-a-half months to assemble โ€“ and then modelling this using regression techniques. "Six months after the marketing period had ended I could tell the CMO [chief marketing officer] and the marketers how effective their marketing was... six months ago," Telstra's director of research, insights & analytics Liz Moore told the recent Big Data & Analytics Innovation Summit in Sydney.


Robust Maximum Likelihood Estimation of Sparse Vector Error Correction Model

arXiv.org Machine Learning

In econometrics and finance, the vector error correction model (VECM) is an important time series model for cointegration analysis, which is used to estimate the long-run equilibrium variable relationships. The traditional analysis and estimation methodologies assume the underlying Gaussian distribution but, in practice, heavy-tailed data and outliers can lead to the inapplicability of these methods. In this paper, we propose a robust model estimation method based on the Cauchy distribution to tackle this issue. In addition, sparse cointegration relations are considered to realize feature selection and dimension reduction. An efficient algorithm based on the majorization-minimization (MM) method is applied to solve the proposed nonconvex problem. The performance of this algorithm is shown through numerical simulations.


Coresets for Dependency Networks

arXiv.org Machine Learning

But how can we train graphical models on a massive data set? In this paper, we show how to construct coresets--compressed data sets which can be used as proxy for the original data and have provably bounded worst case error--for Gaussian dependency networks (DNs), i.e., cyclic directed graphical models over Gaussians, where the parents of each variable are its Markov blanket. Specifically, we prove that Gaussian DNs admit coresets of size independent of the size of the data set. Unfortunately, this does not extend to DNs over members of the exponential family in general. As we will prove, Poisson DNs do not admit small coresets. Despite this worst-case result, we will provide an argument why our coreset construction for DNs can still work well in practice on count data. To corroborate our theoretical results, we empirically evaluated the resulting Core DNs on real data sets. The results demonstrate significant gains over no or naive sub-sampling, even in the case of count data.


Bayesian Decision Theory Made Ridiculously Simple ยท Statistics @ Home

@machinelearnbot

So how do we determine the "best" decision? This requires that we first define some notion of what we want (what are we trying to do?). The formal object that we use to do this goes by many names depending on the field: I will refer to it as a Loss function (\(\mathcal{L}\)) but the same general concept may be alternatively called a cost function, a utility function, an acquisition function, or any number of different things. The crucial idea is that this is a function that allows us to quantify how bad/good a given decision (\(a\)) is given some information (\(\theta\)). What does it mean to quantify?


CarsonScott/CALM

#artificialintelligence

The following is an unsupervised machine-learning algorithm that recognizes and predicts input patterns in real-time. The algorithm is essentially a system of information-processing inspired by the cognitive functions of the mind. The system has a collection of interacting memory systems akin to the spychological concepts of short-term and long-term memory, which allows it to learn by observation and adapt based on the frequency of patterns and their relationships through time. An association value is a representation of conditional probability which is calculated between the current observation and the rest of the elements in the buffer, i.e. the previous N observations, then stored in the association matrix. Each elements current place in the buffer determines the strength of the delta applied to its association value, so more recent observation are more strongly associated with the current observations than those further in the buffer.


Learning Infinite RBMs with Frank-Wolfe

arXiv.org Machine Learning

In this work, we propose an infinite restricted Boltzmann machine (RBM), whose maximum likelihood estimation (MLE) corresponds to a constrained convex optimization. We consider the Frank-Wolfe algorithm to solve the program, which provides a sparse solution that can be interpreted as inserting a hidden unit at each iteration, so that the optimization process takes the form of a sequence of finite models of increasing complexity. As a side benefit, this can be used to easily and efficiently identify an appropriate number of hidden units during the optimization. The resulting model can also be used as an initialization for typical state-of-the-art RBM training algorithms such as contrastive divergence, leading to models with consistently higher test likelihood than random initialization.


Automated Scalable Bayesian Inference via Hilbert Coresets

arXiv.org Machine Learning

The automation of posterior inference in Bayesian data analysis has enabled experts and nonexperts alike to use more sophisticated models, engage in faster exploratory modeling and analysis, and ensure experimental reproducibility. However, standard automated posterior inference algorithms are not tractable at the scale of massive modern datasets, and modifications to make them so are typically model-specific, require expert tuning, and can break theoretical guarantees on inferential quality. Building on the Bayesian coresets framework, this work instead takes advantage of data redundancy to shrink the dataset itself as a preprocessing step, providing fully-automated, scalable Bayesian inference with theoretical guarantees. We begin with an intuitive reformulation of Bayesian coreset construction as sparse vector sum approximation, and demonstrate that its automation and performance-based shortcomings arise from the use of the supremum norm. To address these shortcomings we develop Hilbert coresets, i.e., Bayesian coresets constructed under a norm induced by an inner-product on the log-likelihood function space. We propose two Hilbert coreset construction algorithms---one based on importance sampling, and one based on the Frank-Wolfe algorithm---along with theoretical guarantees on approximation quality as a function of coreset size. Since the exact computation of the proposed inner-products is model-specific, we automate the construction with a random finite-dimensional projection of the log-likelihood functions. The resulting automated coreset construction algorithm is simple to implement, and experiments on a variety of models with real and synthetic datasets show that it provides high-quality posterior approximations and a significant reduction in the computational cost of inference.


Marginal sequential Monte Carlo for doubly intractable models

arXiv.org Machine Learning

Bayesian inference for models that have an intractable partition function is known as a doubly intractable problem, where standard Monte Carlo methods are not applicable. The past decade has seen the development of auxiliary variable Monte Carlo techniques (M{\o}ller et al., 2006; Murray et al., 2006) for tackling this problem; these approaches being members of the more general class of pseudo-marginal, or exact-approximate, Monte Carlo algorithms (Andrieu and Roberts, 2009), which make use of unbiased estimates of intractable posteriors. Everitt et al. (2017) investigated the use of exact-approximate importance sampling (IS) and sequential Monte Carlo (SMC) in doubly intractable problems, but focussed only on SMC algorithms that used data-point tempering. This paper describes SMC samplers that may use alternative sequences of distributions, and describes ways in which likelihood estimates may be improved adaptively as the algorithm progresses, building on ideas from Moores et al. (2015). This approach is compared with a number of alternative algorithms for doubly intractable problems, including approximate Bayesian computation (ABC), which we show is closely related to the method of M{\o}ller et al. (2006).