Directed Networks
On the Consistency of Graph-based Bayesian Learning and the Scalability of Sampling Algorithms
Trillos, Nicolas Garcia, Kaplan, Zachary, Samakhoana, Thabo, Sanz-Alonso, Daniel
A popular approach to semi-supervised learning proceeds by endowing the input data with a graph structure in order to extract geometric information and incorporate it into a Bayesian framework. We introduce new theory that gives appropriate scalings of graph parameters that provably lead to a well-defined limiting posterior as the size of the unlabeled data set grows. Furthermore, we show that these consistency results have profound algorithmic implications. When consistency holds, carefully designed graph-based Markov chain Monte Carlo algorithms are proved to have a uniform spectral gap, independent of the number of unlabeled inputs. Several numerical experiments corroborate both the statistical consistency and the algorithmic scalability established by the theory.
Finite-dimensional Gaussian approximation with linear inequality constraints
López-Lopera, Andrés F., Bachoc, François, Durrande, Nicolas, Roustant, Olivier
Introducing inequality constraints in Gaussian process (GP) models can lead to more realistic uncertainties in learning a great variety of real-world problems. We consider the finite-dimensional Gaussian approach from Maatouk and Bay (2017) which can satisfy inequality conditions everywhere (either boundedness, monotonicity or convexity). Our contributions are threefold. First, we extend their approach in order to deal with general sets of linear inequalities. Second, we explore several Markov Chain Monte Carlo (MCMC) techniques to approximate the posterior distribution. Third, we investigate theoretical and numerical properties of the constrained likelihood for covariance parameter estimation. According to experiments on both artificial and real data, our full framework together with a Hamiltonian Monte Carlo-based sampler provides efficient results on both data fitting and uncertainty quantification.
Getting Started with Particle Metropolis-Hastings for Inference in Nonlinear Dynamical Models
Dahlin, Johan, Schön, Thomas B.
This tutorial provides a gentle introduction to the particle Metropolis-Hastings (PMH) algorithm for parameter inference in nonlinear state-space models together with a software implementation in the statistical programming language R. We employ a step-by-step approach to develop an implementation of the PMH algorithm (and the particle filter within) together with the reader. This final implementation is also available as the package pmhtutorial in the CRAN repository. Throughout the tutorial, we provide some intuition as to how the algorithm operates and discuss some solutions to problems that might occur in practice. To illustrate the use of PMH, we consider parameter inference in a linear Gaussian state-space model with synthetic data and a nonlinear stochastic volatility model with real-world data.
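The PMH idea described above can be sketched in a few lines. The tutorial's own implementation is in R (package pmhtutorial); the version below is a minimal Python sketch, not the authors' code. It assumes a scalar linear Gaussian state-space model x_t = phi * x_{t-1} + sigma_v * v_t, y_t = x_t + sigma_e * e_t, a uniform prior on phi over (-1, 1), and a Gaussian random-walk proposal. All function names, parameter names, and default values are hypothetical choices made for illustration.

```python
import math
import random


def simulate(phi, sigma_v, sigma_e, n_steps, rng):
    """Simulate observations from the assumed linear Gaussian state-space model."""
    x, ys = 0.0, []
    for _ in range(n_steps):
        x = phi * x + sigma_v * rng.gauss(0.0, 1.0)
        ys.append(x + sigma_e * rng.gauss(0.0, 1.0))
    return ys


def particle_loglik(ys, phi, sigma_v, sigma_e, n_particles, rng):
    """Bootstrap particle filter estimate of the log-likelihood of ys given phi."""
    particles = [0.0] * n_particles
    loglik = 0.0
    log_norm = -0.5 * math.log(2.0 * math.pi * sigma_e ** 2)
    for y in ys:
        # Propagate every particle through the state dynamics.
        particles = [phi * p + sigma_v * rng.gauss(0.0, 1.0) for p in particles]
        # Weight by the Gaussian observation density, in log space for stability.
        logw = [log_norm - 0.5 * ((y - p) / sigma_e) ** 2 for p in particles]
        top = max(logw)
        w = [math.exp(lw - top) for lw in logw]
        loglik += top + math.log(sum(w) / n_particles)
        # Multinomial resampling in proportion to the weights.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return loglik


def pmh(ys, n_iters, step, n_particles, rng, phi_init=0.5, sigma_v=1.0, sigma_e=0.5):
    """Random-walk PMH for phi under an assumed uniform prior on (-1, 1)."""
    phi = phi_init
    ll = particle_loglik(ys, phi, sigma_v, sigma_e, n_particles, rng)
    chain = [phi]
    for _ in range(n_iters):
        proposal = phi + step * rng.gauss(0.0, 1.0)
        if abs(proposal) < 1.0:  # proposals outside the prior support are rejected
            ll_prop = particle_loglik(ys, proposal, sigma_v, sigma_e, n_particles, rng)
            # The noisy likelihood estimate replaces the exact likelihood in the ratio.
            accept_prob = math.exp(min(0.0, ll_prop - ll))
            if rng.random() < accept_prob:
                phi, ll = proposal, ll_prop
        chain.append(phi)
    return chain
```

To try it, simulate data with `ys = simulate(0.75, 1.0, 0.5, 100, random.Random(1))` and run `pmh(ys, 500, 0.1, 100, random.Random(2))`. Averaging the chain after discarding an initial burn-in gives a posterior mean estimate for phi. The particle count and step size would need tuning in practice.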
Coding up a Neural Network classifier from scratch - Towards Data Science - Medium
High-level deep learning libraries such as TensorFlow, Keras, and PyTorch do a wonderful job of making the life of a deep learning practitioner easier by hiding many of the tedious inner workings of neural networks. As great as this is for deep learning, it has the minor downside of leaving many newcomers with a weaker foundational understanding, which they must then pick up elsewhere. Our goal here is simply to provide a one-hidden-layer, fully connected neural network classifier written from scratch (no deep learning libraries) to help chip away at that mysterious black-box feeling you might have with neural networks. The provided neural network classifies a dataset describing geometrical properties of kernels belonging to three classes of wheat (you can easily replace this with your own custom dataset). An L2 loss function is assumed, and a sigmoid transfer function is used on every node in the hidden and output layers.
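As a rough illustration of the kind of network described above, here is a minimal sketch in plain Python: one hidden layer, sigmoid activations on hidden and output nodes, and an L2 loss trained by per-sample gradient descent. It is not the article's code. The function names, the layer sizes, the learning rate, and the tiny three-class toy dataset (standing in for the wheat data) are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_in, n_hidden, n_out, rng):
    """Each row holds one node's input weights; the last entry is its bias."""
    hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    output = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    return hidden, output


def forward(net, x):
    """Return hidden activations and output activations for one input vector."""
    hidden, output = net
    h = [sigmoid(sum(w * v for w, v in zip(row[:-1], x)) + row[-1]) for row in hidden]
    y = [sigmoid(sum(w * v for w, v in zip(row[:-1], h)) + row[-1]) for row in output]
    return h, y


def train_step(net, x, target, lr):
    """One gradient step on the L2 loss 0.5 * sum((y - t)^2); returns the loss before the step."""
    hidden, output = net
    h, y = forward(net, x)
    # Backpropagate: the sigmoid derivative is s * (1 - s).
    delta_out = [(yk - tk) * yk * (1.0 - yk) for yk, tk in zip(y, target)]
    delta_hid = [
        hj * (1.0 - hj) * sum(delta_out[k] * output[k][j] for k in range(len(output)))
        for j, hj in enumerate(h)
    ]
    for k, row in enumerate(output):
        for j in range(len(h)):
            row[j] -= lr * delta_out[k] * h[j]
        row[-1] -= lr * delta_out[k]
    for j, row in enumerate(hidden):
        for i in range(len(x)):
            row[i] -= lr * delta_hid[j] * x[i]
        row[-1] -= lr * delta_hid[j]
    return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, target))


def train(net, data, lr=0.5, epochs=300):
    """Run per-sample gradient descent and record the mean loss of each epoch."""
    history = []
    for _ in range(epochs):
        total = sum(train_step(net, x, t, lr) for x, t in data)
        history.append(total / len(data))
    return history


# Toy stand-in data: three classes in two dimensions with one-hot targets.
TOY_DATA = [
    ([0.1, 0.1], [1, 0, 0]), ([0.2, 0.0], [1, 0, 0]),
    ([0.9, 0.1], [0, 1, 0]), ([1.0, 0.2], [0, 1, 0]),
    ([0.1, 0.9], [0, 0, 1]), ([0.0, 1.0], [0, 0, 1]),
]
```

Calling `net = init_network(2, 4, 3, random.Random(0))` followed by `train(net, TOY_DATA)` returns the per-epoch mean loss. To predict a class, take the index of the largest output from `forward(net, x)`. Swapping in a real dataset only requires feature vectors paired with one-hot targets, ideally with features rescaled to a comparable range.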
Probabilistic Integration: A Role in Statistical Computation?
Briol, François-Xavier, Oates, Chris J., Girolami, Mark, Osborne, Michael A., Sejdinovic, Dino
A research frontier has emerged in scientific computation, wherein numerical error is regarded as a source of epistemic uncertainty that can be modelled. This raises several statistical challenges, including the design of statistical methods that enable the coherent propagation of probabilities through a (possibly deterministic) computational work-flow. This paper examines the case for probabilistic numerical methods in routine statistical computation. Our focus is on numerical integration, where a probabilistic integrator is equipped with a full distribution over its output that reflects the presence of an unknown numerical error. Our main technical contribution is to establish, for the first time, rates of posterior contraction for these methods. These show that probabilistic integrators can in principle enjoy the "best of both worlds", leveraging the sampling efficiency of Monte Carlo methods whilst providing a principled route to assess the impact of numerical error on scientific conclusions. Several substantial applications are provided for illustration and critical evaluation, including examples from statistical modelling, computer graphics and a computer model for an oil reservoir.
A Bayesian Nonparametric Method for Clustering, Imputation, and Forecasting in Multivariate Time Series
Saad, Feras A., Mansinghka, Vikash K.
This article proposes a Bayesian nonparametric method for forecasting, imputation, and clustering in sparsely observed, multivariate time series. The method is appropriate for jointly modeling hundreds of time series with widely varying, non-stationary dynamics. Given a collection of $N$ time series, the Bayesian model first partitions them into independent clusters using a Chinese restaurant process prior. Within a cluster, all time series are modeled jointly using a novel "temporally-coupled" extension of the Chinese restaurant process mixture. Markov chain Monte Carlo techniques are used to obtain samples from the posterior distribution, which are then used to form predictive inferences. We apply the technique to challenging prediction and imputation tasks using seasonal flu data from the US Centers for Disease Control and Prevention, demonstrating competitive imputation performance and improved forecasting accuracy as compared to several state-of-the-art baselines. We also show that the model discovers interpretable clusters in datasets with hundreds of time series using macroeconomic data from the Gapminder Foundation.
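To make the first modeling step concrete, the following is a small sketch of drawing a partition of $N$ items from a Chinese restaurant process prior. It illustrates only the prior over cluster assignments, not the paper's temporally-coupled mixture or its MCMC inference. The function name `sample_crp` and the concentration parameter `alpha` are naming assumptions for illustration.

```python
import random


def sample_crp(n_items, alpha, rng):
    """Draw cluster assignments for n_items from a Chinese restaurant process.

    Item i joins existing cluster k with probability counts[k] / (i + alpha)
    and opens a new cluster with probability alpha / (i + alpha).
    """
    assignments = []
    counts = []  # counts[k] is the number of items currently in cluster k
    for _ in range(n_items):
        # Unnormalized weights: existing cluster sizes, then alpha for a new cluster.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, counts
```

For example, `sample_crp(100, 1.0, random.Random(0))` returns a cluster label for each of 100 series together with the cluster sizes. Larger values of `alpha` tend to produce more clusters, so the number of clusters does not have to be fixed in advance.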
Bayesian Nonparametric Poisson-Process Allocation for Time-Sequence Modeling
Ding, Hongyi, Sato, Issei, Sugiyama, Masashi
Analyzing the underlying structure of multiple time-sequences provides insight into the understanding of social networks and human activities. In this work, we present the Bayesian nonparametric Poisson process allocation (BaNPPA), a generative model to automatically infer the number of latent functions in temporal data. We model the intensity of each sequence as an infinite mixture of latent functions, each of which is the square of a function drawn from a Gaussian process. A technical challenge for the inference of such mixture models is the identifiability issue between coefficients and the scale of latent functions. We propose to cope with the identifiability issue by regulating the volume of each latent function, and we derive a variational inference algorithm that is computationally efficient and scales well to large data sets. Finally, we demonstrate the usefulness of the proposed Bayesian nonparametric model through experiments on both synthetic and real-world data sets.
Telstra builds 900 machine learning models for marketing overhaul
Telstra has used open source machine learning technology to answer the age-old question that plagues every marketer: how effective is my ad spend? The telco wields one of the biggest marketing budgets in Australia, but that doesn't stop Telstra from wanting to track the performance of every dollar spent. The company previously faced a six-month lag to get visibility into the effectiveness of its marketing spend; that is now down to five weeks using new marketing mix modelling developed in partnership with Accenture, Deakin University and Servian. The telco previously used a traditional econometric model to assess the performance of its marketing spend, pulling together 800 variables (which took two-and-a-half months to assemble) and then modelling this using regression techniques. "Six months after the marketing period had ended I could tell the CMO [chief marketing officer] and the marketers how effective their marketing was... six months ago," Telstra's director of research, insights & analytics Liz Moore told the recent Big Data & Analytics Innovation Summit in Sydney.
On the challenges of learning with inference networks on sparse, high-dimensional data
Krishnan, Rahul G., Liang, Dawen, Hoffman, Matthew
We study parameter estimation in Nonlinear Factor Analysis (NFA) where the generative model is parameterized by a deep neural network. Recent work has focused on learning such models using inference (or recognition) networks; we identify a crucial problem when modeling large, sparse, high-dimensional datasets: underfitting. We study the extent of underfitting, highlighting that its severity increases with the sparsity of the data. We propose methods to tackle it via iterative optimization inspired by stochastic variational inference (Hoffman et al., 2013) and improvements in the sparse data representation used for inference. The proposed techniques drastically improve the ability of these powerful models to fit sparse data, achieving state-of-the-art results on a benchmark text-count dataset and excellent results on the task of top-N recommendation.
Robust Maximum Likelihood Estimation of Sparse Vector Error Correction Model
Zhao, Ziping, Palomar, Daniel P.
In econometrics and finance, the vector error correction model (VECM) is an important time series model for cointegration analysis, which is used to estimate long-run equilibrium relationships between variables. The traditional analysis and estimation methodologies assume an underlying Gaussian distribution but, in practice, heavy-tailed data and outliers can make these methods inapplicable. In this paper, we propose a robust model estimation method based on the Cauchy distribution to tackle this issue. In addition, sparse cointegration relations are considered to realize feature selection and dimension reduction. An efficient algorithm based on the majorization-minimization (MM) method is applied to solve the proposed nonconvex problem. The performance of this algorithm is shown through numerical simulations.
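The MM idea behind a Cauchy-based robust fit can be shown on a much simpler problem than the VECM. The sketch below is a toy scalar illustration, not the paper's algorithm: it estimates a Cauchy location parameter with a known scale and no sparsity penalty. Each log term of the negative log-likelihood is majorized by its tangent, which turns every iteration into a weighted least-squares update with a closed form. Function names, the fixed scale, and the sample data are assumptions for illustration.

```python
import math


def cauchy_neg_loglik(xs, mu, scale):
    """Negative log-likelihood of a Cauchy location model, up to an additive constant."""
    return sum(math.log(1.0 + ((x - mu) / scale) ** 2) for x in xs)


def mm_cauchy_location(xs, scale=1.0, n_iters=50):
    """Estimate the Cauchy location by majorization-minimization."""
    mu = sum(xs) / len(xs)  # start from the (non-robust) sample mean
    trace = [cauchy_neg_loglik(xs, mu, scale)]
    for _ in range(n_iters):
        # Majorize each concave log term by its tangent at the current estimate;
        # the resulting weights shrink the influence of points far from mu.
        weights = [1.0 / (1.0 + ((x - mu) / scale) ** 2) for x in xs]
        # The surrogate is weighted least squares, minimized by a weighted mean.
        mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        trace.append(cauchy_neg_loglik(xs, mu, scale))
    return mu, trace
```

On data such as `[0.9, 1.0, 1.1, 1.05, 0.95, 50.0]`, the sample mean is dragged toward the outlier at 50, while the MM iterates settle close to the bulk of the data near 1. The recorded objective values never increase from one iteration to the next, which is the defining property of an MM scheme.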