Uncertainty
Forecasting of commercial sales with large scale Gaussian Processes
Rivera, Rodrigo, Burnaev, Evgeny
This paper argues that there has not been enough discussion of applications of Gaussian Processes in the fast-moving consumer goods industry. Yet this technique can be important because it can, for example, provide automatic feature relevance determination, and the posterior mean can unlock insights into the data. Significant challenges are the large size and high dimensionality of commercial data at a point of sale. The study reviews approaches to Gaussian Process modeling for large data sets, evaluates their performance on commercial sales data, and shows the value of this type of model as a decision-making tool for management.
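As a hedged illustration of automatic relevance determination (ARD), the sketch below implements a squared-exponential kernel with one length-scale per input dimension. The abstract does not say which kernel the authors use; the function name `ard_se_kernel`, the feature values, and the length-scales are assumptions made for illustration only.

```python
import math

def ard_se_kernel(x, y, lengthscales, variance=1.0):
    """Squared-exponential kernel with one length-scale per feature (ARD)."""
    sq_dist = sum(((a - b) / l) ** 2 for a, b, l in zip(x, y, lengthscales))
    return variance * math.exp(-0.5 * sq_dist)

# Hypothetical point-of-sale features: (price, shelf position).
# Assumed length-scales: the huge second value effectively switches feature 2 off.
lengthscales = [1.0, 1e6]

base = ard_se_kernel([2.0, 1.0], [2.5, 1.0], lengthscales)
moved_irrelevant = ard_se_kernel([2.0, 1.0], [2.5, 9.0], lengthscales)
moved_relevant = ard_se_kernel([2.0, 1.0], [3.5, 1.0], lengthscales)

print(base, moved_irrelevant, moved_relevant)
```

A very large length-scale makes the kernel nearly insensitive to that feature, which is how a fitted ARD kernel can signal that a feature carries little predictive information.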
Statistical inference on random dot product graphs: a survey
Athreya, Avanti, Fishkind, Donniell E., Levin, Keith, Lyzinski, Vince, Park, Youngser, Qin, Yichen, Sussman, Daniel L., Tang, Minh, Vogelstein, Joshua T., Priebe, Carey E.
The random dot product graph (RDPG) is an independent-edge random graph that is analytically tractable and, simultaneously, either encompasses or can successfully approximate a wide range of random graphs, from relatively simple stochastic block models to complex latent position graphs. In this survey paper, we describe a comprehensive paradigm for statistical inference on random dot product graphs, a paradigm centered on spectral embeddings of adjacency and Laplacian matrices. We examine the analogues, in graph inference, of several canonical tenets of classical Euclidean inference: in particular, we summarize a body of existing results on the consistency and asymptotic normality of the adjacency and Laplacian spectral embeddings, and the role these spectral embeddings can play in the construction of single- and multi-sample hypothesis tests for graph data. We investigate several real-world applications, including community detection and classification in large social networks and the determination of functional and biologically relevant network properties from an exploratory data analysis of the Drosophila connectome. We outline requisite background and current open problems in spectral graph inference.
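A minimal sketch of the adjacency spectral embedding mentioned above, restricted to the simplest case of one-dimensional latent positions so that it needs only the standard library. The helper names, the graph size, the latent position range, and the use of power iteration are all assumptions for illustration, not the survey's implementation.

```python
import math
import random

def sample_rdpg(latent, rng):
    """Sample a symmetric, hollow adjacency matrix with edge probability x_i * x_j."""
    n = len(latent)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < latent[i] * latent[j]:
                adj[i][j] = adj[j][i] = 1.0
    return adj

def ase_rank_one(adj, iters=50):
    """Rank-1 embedding: sqrt(top eigenvalue) times the top eigenvector."""
    n = len(adj)
    v = [1.0 / math.sqrt(n)] * n              # unit-norm, positive start vector
    eigval = 0.0
    for _ in range(iters):
        w = [sum(a * b for a, b in zip(row, v)) for row in adj]
        eigval = sum(a * b for a, b in zip(w, v))   # Rayleigh quotient v^T A v
        norm = math.sqrt(sum(a * a for a in w))
        v = [a / norm for a in w]
    return [math.sqrt(eigval) * a for a in v]

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

rng = random.Random(0)
latent = [rng.uniform(0.3, 0.9) for _ in range(150)]   # assumed latent positions
adjacency = sample_rdpg(latent, rng)
estimate = ase_rank_one(adjacency)
print(round(pearson(latent, estimate), 3))
```

In higher dimensions the embedding is recoverable only up to an orthogonal transformation, and a full eigendecomposition would replace the power iteration used here.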
The Truth About Bayesian Priors and Overfitting
Have you ever thought about how strong a prior is compared to observed data? To make this question more tangible, I will take you through some simulation exercises. These are meant as food for thought and not necessarily as a recommendation. However, many of the considerations we will run through will be directly applicable to your everyday work of applying Bayesian methods to your specific domain. We will start out by creating some data generated from a known process.
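One simple way to get a feel for prior strength, sketched here with a conjugate Beta-Binomial model rather than the simulations from the article itself: a Beta(a, b) prior behaves like a + b pseudo-observations, so its pull on the posterior mean depends on how that count compares with the number of real observations. The function name and the data below are made up for illustration.

```python
def posterior_mean(prior_a, prior_b, successes, trials):
    """Posterior mean of a Bernoulli rate under a Beta(prior_a, prior_b) prior."""
    return (prior_a + successes) / (prior_a + prior_b + trials)

successes, trials = 8, 10   # made-up data: 8 successes in 10 trials

# Beta(1, 1) acts like 2 pseudo-observations, Beta(50, 50) like 100.
weak = posterior_mean(1, 1, successes, trials)
strong = posterior_mean(50, 50, successes, trials)

print(weak, strong)
```

With only ten observations the weak prior lets the estimate move close to the observed rate of 0.8, while the strong prior keeps it near 0.5.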
Gaussian Process Latent Force Models for Learning and Stochastic Control of Physical Systems
Särkkä, Simo, Álvarez, Mauricio A., Lawrence, Neil D.
This paper is concerned with estimation and stochastic control in physical systems which contain unknown input signals or forces. These unknown signals are modeled as Gaussian processes (GPs) in the sense in which GP models are used in machine learning. The resulting latent force models (LFMs) can be seen as hybrid models that contain a first-principles physical model part and a nonparametric GP model part. The aim of this paper is to collect and extend the statistical inference and learning methods for this kind of model, to provide new theoretical results for the models, and to extend the methodology and theory to stochastic control of LFMs. The generalizations of this kind of model to arbitrary differential equations are called latent force models (LFMs) [2]-[6] in the machine learning literature. In addition to the learning problem on LFMs, we also consider the problem of controlling the LFM using the control function $c(t)$. In particular, we consider the problem of optimal stochastic control design for LFMs. The present problem is also closely related to the so-called input estimation problem that has previously been addressed in the target tracking literature. The difference is that there is no concept of time in this equation, nor a possibility for controlling the equation.
Simo Särkkä is with the Department of Electrical Engineering and Automation (EEA), Aalto University, Rakentajanaukio 2c, 02150 Espoo, Finland (simo.sarkka@aalto.fi).
A. General problem formulation
The models considered in this article can be seen to belong to the following three classes:
1) Basic latent force models, which are ordinary differential equations (ODEs) driven by Gaussian input processes $u(t)$ and control inputs $c(t)$.
2) Dynamic partial and pseudo-differential equation (PDE) based models, which can generally be written in the form
$$\mathcal{L} f(x,t) = u(x,t) + c(x,t), \tag{7}$$
where $\mathcal{L}$ is a linear operator in space and time. The input Gaussian process $u(x,t)$ and control input $c(x,t)$ are also space-time processes. Typically, the operator has the form
$$\mathcal{L} = A_m \frac{d^m}{dt^m} + \cdots + A_1 \frac{d}{dt} + A_0, \tag{8}$$
where $A_0, \ldots, A_m$ are some spatial partial differential or pseudo-differential operators. Models of this kind can often also be written in the form of spatiotemporal state-space models
$$\frac{\partial f(x,t)}{\partial t} = A_f f(x,t) + B_f u(x,t) + M_f c(x,t), \tag{9}$$
which again is strictly more general than the model (8). For this kind of model there is no control problem per se, because there is no time dependence. These models do not naturally allow for a state-space representation either.
Mixtures and products in two graphical models
We compare two statistical models of three binary random variables. One is a mixture model and the other is a product of mixtures model called a restricted Boltzmann machine. Although the two models we study look different from their parametrizations, we show that they represent the same set of distributions on the interior of the probability simplex, and are equal up to closure. We give a semi-algebraic description of the model in terms of six binomial inequalities and obtain closed form expressions for the maximum likelihood estimates. We briefly discuss extensions to larger models.
Dependence Modeling in Ultra High Dimensions with Vine Copulas and the Graphical Lasso
Müller, Dominik, Czado, Claudia
To model high dimensional data, Gaussian methods are widely used since they remain tractable and yield parsimonious models by imposing strong assumptions on the data. Vine copulas are more flexible by combining arbitrary marginal distributions and (conditional) bivariate copulas. Yet, this adaptability is accompanied by sharply increasing computational effort as the dimension increases. The approach proposed in this paper overcomes this burden and makes the first step into ultra high dimensional non-Gaussian dependence modeling by using a divide-and-conquer approach. First, we apply Gaussian methods to split datasets into feasibly small subsets and second, apply parsimonious and flexible vine copulas thereon. Finally, we reconcile them into one joint model. We provide numerical results demonstrating the feasibility of our approach in moderate dimensions and showcase its ability to estimate ultra high dimensional non-Gaussian dependence models in thousands of dimensions.
AI – The Present in the Making - Dataconomy
For many people, the concept of Artificial Intelligence (AI) is a thing of the future. It is the technology that is yet to be introduced. But Professor Jon Oberlander disagrees. He was quick to point out that AI is not in the future; it is now in the making. He began by mentioning Alexa, Amazon's star product. It is an artificially intelligent personal assistant, made popular by Amazon Echo devices.
Optimal Learning for Sequential Decision Making for Expensive Cost Functions with Stochastic Binary Feedbacks
Wang, Yingfei, Wang, Chu, Powell, Warren
We consider the problem of sequentially making decisions that are rewarded by "successes" and "failures" which can be predicted through an unknown relationship that depends on a partially controllable vector of attributes for each instance. The learner takes an active role in selecting samples from the instance pool. The goal is to maximize the probability of success in either offline (training) or online (testing) phases. Our problem is motivated by real-world applications where observations are time-consuming and/or expensive. We develop a knowledge gradient policy using an online Bayesian linear classifier to guide the experiment by maximizing the expected value of information of labeling each alternative. We provide a finite-time analysis of the estimated error and show that the maximum likelihood estimator produced by the KG policy is consistent and asymptotically normal. We also show that the knowledge gradient policy is asymptotically optimal in an offline setting. This work further extends the knowledge gradient to the setting of contextual bandits. We report the results of a series of experiments that demonstrate its efficiency.
Measuring Sample Quality with Kernels
Gorham, Jackson, Mackey, Lester
Approximate Markov chain Monte Carlo (MCMC) offers the promise of more rapid sampling at the cost of more biased inference. Since standard MCMC diagnostics fail to detect these biases, researchers have developed computable Stein discrepancy measures that provably determine the convergence of a sample to its target distribution. This approach was recently combined with the theory of reproducing kernels to define a closed-form kernel Stein discrepancy (KSD) computable by summing kernel evaluations across pairs of sample points. We develop a theory of weak convergence for KSDs based on Stein's method, demonstrate that commonly used KSDs fail to detect non-convergence even for Gaussian targets, and show that kernels with slowly decaying tails provably determine convergence for a large class of target distributions. The resulting convergence-determining KSDs are suitable for comparing biased, exact, and deterministic sample sequences and simpler to compute and parallelize than alternative Stein discrepancies. We use our tools to compare biased samplers, select sampler hyperparameters, and improve upon existing KSD approaches to one-sample hypothesis testing and sample quality improvement.
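To make "summing kernel evaluations across pairs of sample points" concrete, here is a hedged one-dimensional sketch of a kernel Stein discrepancy. It assumes a standard normal target, the Langevin Stein operator, and an inverse multiquadric base kernel $(c^2 + (x - y)^2)^{\beta}$ with $c = 1$ and $\beta = -1/2$, chosen here as one example of a kernel with slowly decaying tails. The function names, sample size, and parameter values are illustrative assumptions, not the authors' code.

```python
import math
import random

def stein_kernel(x, y, c=1.0, beta=-0.5):
    """Langevin Stein kernel for target N(0, 1), base kernel (c^2 + (x - y)^2)^beta."""
    sx, sy = -x, -y                      # score of N(0, 1): d/dx log p(x) = -x
    r = x - y
    u = c * c + r * r
    k = u ** beta
    dk_dx = 2.0 * beta * r * u ** (beta - 1.0)
    dk_dy = -dk_dx
    d2k = (-2.0 * beta * u ** (beta - 1.0)
           - 4.0 * beta * (beta - 1.0) * r * r * u ** (beta - 2.0))
    return sx * sy * k + sx * dk_dy + sy * dk_dx + d2k

def ksd(sample):
    """V-statistic estimate: square root of the mean Stein kernel over all pairs."""
    n = len(sample)
    total = sum(stein_kernel(x, y) for x in sample for y in sample)
    return math.sqrt(max(total, 0.0)) / n

rng = random.Random(0)
exact = [rng.gauss(0.0, 1.0) for _ in range(200)]   # draws from the target
shifted = [x + 2.0 for x in exact]                  # deliberately biased sample

ksd_exact = ksd(exact)
ksd_shifted = ksd(shifted)
print(ksd_exact, ksd_shifted)
```

The biased sample should score a clearly larger discrepancy than the draws from the target. The double loop costs O(n^2) kernel evaluations, and each pair is independent of the others, which is what makes the sum straightforward to parallelize.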
High-Dimensional Dependency Structure Learning for Physical Processes
Golmohammadi, Jamal, Ebert-Uphoff, Imme, He, Sijie, Deng, Yi, Banerjee, Arindam
In this paper, we consider the use of structure learning methods for probabilistic graphical models to identify statistical dependencies in high-dimensional physical processes. Such processes are often synthetically characterized using PDEs (partial differential equations) and are observed in a variety of natural phenomena, including geoscience data capturing atmospheric and hydrological phenomena. Classical structure learning approaches such as the PC algorithm and variants are challenging to apply due to their high computational and sample requirements. Modern approaches, often based on sparse regression and variants, do come with finite sample guarantees, but are usually highly sensitive to the choice of hyper-parameters, e.g., parameter $\lambda$ for sparsity inducing constraint or regularization. In this paper, we present ACLIME-ADMM, an efficient two-step algorithm for adaptive structure learning, which estimates an edge specific parameter $\lambda_{ij}$ in the first step, and uses these parameters to learn the structure in the second step. Both steps of our algorithm use (inexact) ADMM to solve suitable linear programs, and all iterations can be done in closed form in an efficient block parallel manner. We compare ACLIME-ADMM with baselines on both synthetic data simulated by partial differential equations (PDEs) that model advection-diffusion processes, and real data (50 years) of daily global geopotential heights to study information flow in the atmosphere. ACLIME-ADMM is shown to be efficient, stable, and competitive, usually better than the baselines especially on difficult problems. On real data, ACLIME-ADMM recovers the underlying structure of global atmospheric circulation, including switches in wind directions at the equator and tropics entirely from the data.