
Uncertainty


Nesting Probabilistic Programs

arXiv.org Machine Learning

We formalize the notion of nesting probabilistic programming queries and investigate the resulting statistical implications. We demonstrate that query nesting allows the definition of models which could not otherwise be expressed, such as those involving agents reasoning about other agents, but that existing systems take approaches that lead to inconsistent estimates. We show how to correct this by delineating possible ways one might want to nest queries and asserting the respective conditions required for convergence. We further introduce, and prove the correctness of, a new online nested Monte Carlo estimation method that makes it substantially easier to ensure these conditions are met, thereby providing a simple framework for designing statistically correct inference engines.
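The convergence issue can be made concrete with a plain (non-online) nested Monte Carlo estimator for a target of the form E_y[f(y, E_{z|y}[g(y, z)])]. This is a minimal sketch, not the paper's online method. The toy model (y ~ N(0, 1), z | y ~ N(y, 1), g(y, z) = z, f(y, c) = c²) and the function names are hypothetical. Because f is nonlinear in the inner expectation, the estimator's expectation with M inner samples is 1 + 1/M instead of the true value 1. The inner sample size therefore has to grow along with the outer one.

```python
import random

def nested_mc(f, sample_y, sample_z_given_y, g, n_outer, m_inner, rng):
    """Plain nested Monte Carlo estimate of E_y[ f(y, E_{z|y}[g(y, z)]) ]."""
    total = 0.0
    for _ in range(n_outer):
        y = sample_y(rng)
        # Inner Monte Carlo estimate of the conditional expectation E[g(y, z) | y]
        inner = sum(g(y, sample_z_given_y(y, rng)) for _ in range(m_inner)) / m_inner
        # The nonlinear outer function is applied to a noisy inner estimate
        total += f(y, inner)
    return total / n_outer

def run(m_inner, n_outer=5000, seed=0):
    """Hypothetical toy model: y ~ N(0, 1), z | y ~ N(y, 1), g(y, z) = z, f(y, c) = c**2.
    The true value is E[y**2] = 1, but E[estimate] = 1 + 1/m_inner."""
    rng = random.Random(seed)
    return nested_mc(
        f=lambda y, c: c * c,
        sample_y=lambda r: r.gauss(0.0, 1.0),
        sample_z_given_y=lambda y, r: r.gauss(y, 1.0),
        g=lambda y, z: z,
        n_outer=n_outer,
        m_inner=m_inner,
        rng=rng,
    )

for m in (1, 10, 100):
    print(f"M = {m:3d}: estimate {run(m):.3f}, biased target {1 + 1 / m:.3f}")
```

Increasing only `n_outer` shrinks the variance but leaves the 1/M bias in place. This is one way that naively nesting queries can produce inconsistent estimates.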


A particle-based variational approach to Bayesian Non-negative Matrix Factorization

arXiv.org Machine Learning

Bayesian Non-negative Matrix Factorization (NMF) is a promising approach for understanding uncertainty and structure in matrix data. However, a large volume of applied work optimizes traditional non-Bayesian NMF objectives that fail to provide a principled understanding of the non-identifiability inherent in NMF, an issue ideally addressed by a Bayesian approach. Despite their suitability, current Bayesian NMF approaches have failed to gain popularity in applied settings: they sacrifice modeling flexibility for tractable computation, tend to get stuck in local modes, and require many thousands of samples for meaningful uncertainty estimates. We address these issues through a particle-based variational approach to Bayesian NMF that only requires the joint likelihood to be differentiable for tractability, uses a novel initialization technique to identify multiple modes in the posterior, and allows domain experts to inspect a "small" set of factorizations that faithfully represent the posterior. We introduce and employ a class of likelihood and prior distributions for NMF that formulate a Bayesian model using popular non-Bayesian NMF objectives. On several real datasets, we obtain better particle approximations to the Bayesian NMF posterior in less time than baselines and demonstrate the significant role that multimodality plays in NMF-related tasks.
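One of the popular non-Bayesian NMF objectives is the squared Frobenius loss ||X - WH||²_F. The minimal sketch below fits that loss with Lee and Seung's classic multiplicative updates. It is an illustrative assumption, not the paper's particle-based method. The sketch then demonstrates the non-identifiability described in the abstract. Rescaling the factors by any positive diagonal matrix D leaves the product WH unchanged, and so does the objective. The data matrix is hypothetical.

```python
import numpy as np

def nmf_multiplicative(X, rank, n_iter=2000, seed=0, eps=1e-10):
    """Classical (non-Bayesian) NMF: minimize ||X - W H||_F^2 over nonnegative W, H
    with Lee and Seung's multiplicative updates."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical data: an exactly rank-3 nonnegative matrix
rng = np.random.default_rng(1)
X = rng.random((20, 3)) @ rng.random((3, 15))
W, H = nmf_multiplicative(X, rank=3)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)

# Non-identifiability: a positive diagonal rescaling of the factors gives the same product
D = np.diag([2.0, 0.5, 3.0])
W2, H2 = W @ D, np.linalg.inv(D) @ H
print(f"relative error {rel_err:.4f}; same product: {np.allclose(W @ H, W2 @ H2)}")
```

A point estimate like this returns one member of a whole family of equally good factorizations. A Bayesian treatment aims to represent that family.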


Large-Scale Model Selection with Misspecification

arXiv.org Machine Learning

Model selection is crucial to high-dimensional learning and inference in contemporary big-data applications, where the goal is to pinpoint the best set of covariates among a sequence of candidate interpretable models. Most existing work implicitly assumes that the models are correctly specified or have fixed dimensionality. Yet both model misspecification and high dimensionality are prevalent in practice. In this paper, we exploit the framework of model selection principles in misspecified models that originated in Lv and Liu (2014) and investigate the asymptotic expansion of the Bayesian principle of model selection in the setting of high-dimensional misspecified models. With a natural choice of prior probabilities that encourages interpretability and incorporates Kullback-Leibler divergence, we suggest the high-dimensional generalized Bayesian information criterion with prior probability (HGBIC_p) for large-scale model selection with misspecification. Our new information criterion characterizes the impacts of both model misspecification and high dimensionality on model selection. We further establish the consistency of covariance contrast matrix estimation and the model selection consistency of HGBIC_p in ultra-high dimensions under some mild regularity conditions. The advantages of our new method are supported by numerical studies.


Edward – Home

@machinelearnbot

Edward is a Python library for probabilistic modeling, inference, and criticism. It is a testbed for fast experimentation and research with probabilistic models, ranging from classical hierarchical models on small data sets to complex deep probabilistic models on large data sets. Edward is built on TensorFlow. It enables features such as computational graphs, distributed training, CPU/GPU integration, automatic differentiation, and visualization with TensorBoard. Edward is led by Dustin Tran with guidance by David Blei.


Optimal Bipartite Network Clustering

arXiv.org Machine Learning

We consider the problem of bipartite community detection in networks, or more generally the network biclustering problem. We present a fast two-stage procedure based on spectral initialization followed by the application of a pseudo-likelihood classifier twice. Under mild regularity conditions, we establish the weak consistency of the procedure (i.e., the convergence of the misclassification rate to zero) under a general bipartite stochastic block model. We show that the procedure is optimal in the sense that it achieves the optimal convergence rate that is achievable by a biclustering oracle, adaptively over the whole class, up to constants. The optimal rate we obtain sharpens some of the existing results and generalizes others to a wide regime of average degree growth. As a special case, we recover the known exact recovery threshold in the $\log n$ regime of sparsity. To obtain the general consistency result, as part of the provable version of the algorithm, we introduce a sub-block partitioning scheme that is also computationally attractive, allowing for distributed implementation of the algorithm without sacrificing optimality. The provable version of the algorithm is derived from a general blueprint for pseudo-likelihood biclustering algorithms that employ simple EM-type updates. We show the effectiveness of this general class by numerical simulations.
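The first stage, spectral initialization, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's provable algorithm. It omits the pseudo-likelihood refinement and the sub-block partitioning, and it uses a hand-rolled Lloyd's k-means on a simulated, hypothetical two-by-two bipartite stochastic block model. Rows and columns are embedded using the leading singular vectors of the biadjacency matrix, and each side is clustered separately.

```python
import numpy as np

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns a cluster label for each row of `points`."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave a center in place if its cluster empties
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def spectral_init(A, k_rows, k_cols):
    """Embed rows and columns of the biadjacency matrix A with its leading
    singular vectors (scaled by the singular values), then cluster each side."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = max(k_rows, k_cols)
    row_labels = kmeans(U[:, :r] * s[:r], k_rows)
    col_labels = kmeans(Vt[:r].T * s[:r], k_cols)
    return row_labels, col_labels

def agreement(labels, truth):
    """Fraction correctly labeled, up to swapping the two labels (k = 2 only)."""
    acc = np.mean(labels == truth)
    return max(acc, 1.0 - acc)

# Hypothetical bipartite SBM: 2 row communities, 2 column communities
rng = np.random.default_rng(0)
n_rows, n_cols = 200, 200
z_rows = rng.integers(0, 2, n_rows)
z_cols = rng.integers(0, 2, n_cols)
P = np.array([[0.35, 0.05], [0.05, 0.35]])  # edge probability for each block pair
A = (rng.random((n_rows, n_cols)) < P[z_rows][:, z_cols]).astype(float)

row_labels, col_labels = spectral_init(A, 2, 2)
print(agreement(row_labels, z_rows), agreement(col_labels, z_cols))
```

In the full procedure, these labels serve only as the starting point for the pseudo-likelihood classifier.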


EEG machine learning with Higuchi fractal dimension and Sample Entropy as features for successful detection of depression

arXiv.org Machine Learning

Reliable diagnosis of depressive disorder is essential for both optimal treatment and the prevention of fatal outcomes. In this study, we aimed to elucidate the effectiveness of two non-linear measures, Higuchi Fractal Dimension (HFD) and Sample Entropy (SampEn), in detecting depressive disorders when applied to EEG. HFD and SampEn of EEG signals were used as features for seven machine learning algorithms (Multilayer Perceptron, Logistic Regression, Support Vector Machines with linear and polynomial kernels, Decision Tree, Random Forest, and Naive Bayes) to discriminate the EEG of healthy control subjects from that of patients diagnosed with depression. We confirmed earlier observations that both non-linear measures can discriminate the EEG signals of patients from those of healthy control subjects. The results suggest that good classification is possible even with a small number of principal components. Average accuracy across classifiers ranged from 90.24% to 97.56%. Of the two measures, SampEn performed better. Using HFD, SampEn, and a variety of machine learning techniques, we can accurately discriminate patients diagnosed with depression from controls, which can serve as a highly sensitive, clinically relevant marker for the diagnosis of depressive disorders.
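Below is a minimal pure-Python sketch of both features. The abstract does not state the parameters used. The embedding dimension m = 2, tolerance r = 0.2 times the (population) standard deviation, and k_max = 10 are common conventions assumed here, not the study's settings. Sample entropy follows the usual -ln(A/B) definition with self-matches excluded. Higuchi's method estimates the dimension as the slope of ln L(k) against ln(1/k).

```python
import math
import random
import statistics

def sample_entropy(x, m=2, r=None):
    """SampEn(m, r) = -ln(A / B): B (A) counts pairs of length-m (length-(m+1))
    templates within Chebyshev distance r, self-matches excluded."""
    x = list(x)
    n = len(x)
    if r is None:
        r = 0.2 * statistics.pstdev(x)  # assumed default: 0.2 x standard deviation

    def count_matches(length):
        # Same n - m starting points for both lengths so the counts are comparable
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        raise ValueError("no matching templates; SampEn is undefined for this r")
    return -math.log(a / b)

def higuchi_fd(x, k_max=10):
    """Higuchi fractal dimension: least-squares slope of ln L(k) against ln(1/k)."""
    x = list(x)
    n = len(x)
    log_inv_k, log_l = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):  # 0-based start offsets
            n_steps = (n - 1 - m) // k
            if n_steps < 1:
                continue
            dist = sum(abs(x[m + i * k] - x[m + (i - 1) * k]) for i in range(1, n_steps + 1))
            # Normalize for the number of steps taken, then divide by k
            lengths.append(dist * (n - 1) / (n_steps * k) / k)
        log_inv_k.append(math.log(1.0 / k))
        log_l.append(math.log(sum(lengths) / len(lengths)))
    mx = sum(log_inv_k) / len(log_inv_k)
    my = sum(log_l) / len(log_l)
    num = sum((u - mx) * (v - my) for u, v in zip(log_inv_k, log_l))
    den = sum((u - mx) ** 2 for u in log_inv_k)
    return num / den

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
sine = [math.sin(2 * math.pi * i / 20) for i in range(300)]
print(higuchi_fd(noise), sample_entropy(noise[:300]), sample_entropy(sine))
```

As sanity checks, a straight line has a Higuchi dimension of exactly 1, and white noise comes out close to 2. A regular sine wave has a much lower sample entropy than white noise. The O(N²) template comparison is fine for short epochs but would need vectorizing for long EEG recordings.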


Capturing Structure Implicitly from Time-Series having Limited Data

arXiv.org Machine Learning

Scientific fields such as insider-threat detection and highway-safety planning often lack sufficient amounts of time-series data to estimate statistical models for the purpose of scientific discovery. Moreover, the limited data available are quite noisy. This presents a major challenge when estimating time-series models that are robust to overfitting and have well-calibrated uncertainty estimates. Most of the current literature in these fields involves visually inspecting the time series for noticeable structure and hard-coding that structure into pre-specified parametric functions. This approach has two limitations. First, because such trends may not be easily noticeable in small data, it is difficult to explicitly incorporate expressive structure into the models during formulation. Second, it is difficult to know a priori the most appropriate functional form to use. To address these limitations, a nonparametric Bayesian approach was proposed to implicitly capture hidden structure from time series having limited data. The proposed model, a Gaussian process with a spectral mixture kernel, removes the need to pre-specify a functional form and hard-code trends, is robust to overfitting, and has well-calibrated uncertainty estimates.
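The spectral mixture kernel models the covariance as a sum of Gaussian components in the frequency domain. In one dimension, it reads k(τ) = Σ_q w_q exp(−2π²τ²v_q) cos(2πτμ_q), with weights w_q, spectral means μ_q, and spectral variances v_q. The minimal sketch below evaluates that kernel and computes a standard GP posterior with NumPy. The hyperparameter values are hand-picked assumptions for a hypothetical toy periodic series. The abstract does not describe how they are fit; in practice they are typically learned by maximizing the marginal likelihood.

```python
import numpy as np

def spectral_mixture_kernel(x1, x2, weights, means, variances):
    """1-D spectral mixture kernel:
    k(tau) = sum_q w_q * exp(-2 pi^2 tau^2 v_q) * cos(2 pi tau mu_q)."""
    tau = np.subtract.outer(np.asarray(x1, dtype=float), np.asarray(x2, dtype=float))
    k = np.zeros_like(tau)
    for w, mu, v in zip(weights, means, variances):
        k += w * np.exp(-2.0 * np.pi**2 * tau**2 * v) * np.cos(2.0 * np.pi * tau * mu)
    return k

def gp_posterior(x_train, y_train, x_test, noise_var, **kernel_params):
    """Standard GP regression posterior mean and variance via a Cholesky factorization."""
    K = spectral_mixture_kernel(x_train, x_train, **kernel_params) + noise_var * np.eye(len(x_train))
    K_s = spectral_mixture_kernel(x_test, x_train, **kernel_params)
    K_ss = spectral_mixture_kernel(x_test, x_test, **kernel_params)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    V = np.linalg.solve(L, K_s.T)
    return K_s @ alpha, np.diag(K_ss - V.T @ V)

# Hypothetical toy series: 30 noisy samples of a period-10 sine wave
rng = np.random.default_rng(0)
x_train = np.arange(30.0)
y_train = np.sin(2 * np.pi * x_train / 10) + 0.1 * rng.standard_normal(30)
x_test = np.arange(0.5, 29.0, 1.0)
# Assumed hyperparameters: one component at frequency 0.1, one smooth trend component
params = dict(weights=[1.0, 0.3], means=[0.1, 0.0], variances=[1e-4, 1e-3])
mean, var = gp_posterior(x_train, y_train, x_test, noise_var=0.01, **params)
print(np.round(mean[:5], 3), np.round(var[:5], 4))
```

No functional form such as "sine with period 10" is hard-coded here. The periodicity enters only through a spectral component, which is the property the abstract emphasizes.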


Sylvester Normalizing Flows for Variational Inference

arXiv.org Machine Learning

Variational inference relies on flexible approximate posterior distributions. Normalizing flows provide a general recipe to construct flexible variational posteriors. We introduce Sylvester normalizing flows, which can be seen as a generalization of planar flows. Sylvester normalizing flows remove the well-known single-unit bottleneck from planar flows, making a single transformation much more flexible. We compare the performance of Sylvester normalizing flows against planar flows and inverse autoregressive flows and demonstrate that they compare favorably on several datasets.
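The abstract does not give the parameterization. As we understand the orthogonal Sylvester flow, one step is z' = z + A h(Bz + b) with A = QR and B = R̃Qᵀ. Here, Q has orthonormal columns and R, R̃ are upper triangular M × M matrices. Sylvester's determinant identity, det(I_D + XY) = det(I_M + YX), then reduces the Jacobian log-determinant to a sum over diagonal entries. The minimal sketch below assumes tanh as the nonlinearity and uses random, hypothetical parameters. It checks that shortcut against a brute-force determinant. With M = 1, the update has the single-unit form of a planar flow.

```python
import numpy as np

def orthogonal_sylvester_flow(z, Q, R, R_tilde, b):
    """One step z' = z + Q R tanh(R_tilde Q^T z + b), with
    log|det dz'/dz| = sum_i log|1 + h'_i * R_tilde_ii * R_ii|."""
    pre = R_tilde @ Q.T @ z + b
    z_new = z + Q @ R @ np.tanh(pre)
    h_prime = 1.0 - np.tanh(pre) ** 2  # derivative of tanh
    log_det = np.sum(np.log(np.abs(1.0 + h_prime * np.diag(R_tilde) * np.diag(R))))
    return z_new, log_det

# Hypothetical parameters: D = 5 dimensions, M = 3 hidden units
rng = np.random.default_rng(0)
D, M = 5, 3
Q, _ = np.linalg.qr(rng.standard_normal((D, M)))  # D x M with orthonormal columns
R = np.triu(rng.standard_normal((M, M)))
R_tilde = np.triu(rng.standard_normal((M, M)))
b = rng.standard_normal(M)
z = rng.standard_normal(D)
z_new, log_det = orthogonal_sylvester_flow(z, Q, R, R_tilde, b)

# Brute-force check: Jacobian = I + Q R diag(h') R_tilde Q^T
h_prime = 1.0 - np.tanh(R_tilde @ Q.T @ z + b) ** 2
J = np.eye(D) + Q @ R @ np.diag(h_prime) @ R_tilde @ Q.T
_, log_abs_det = np.linalg.slogdet(J)
print(log_det, log_abs_det)
```

Once the pre-activations are known, the log-determinant term costs O(M) instead of the O(D³) of a dense determinant. That is what makes it cheap to use M > 1 hidden units per step.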


Development and analysis of a Bayesian water balance model for large lake systems

arXiv.org Machine Learning

Water balance models (WBMs) are often employed to understand regional hydrologic cycles over various time scales. Most WBMs, however, are physically based, and few employ state-of-the-art statistical methods to reconcile independent input measurement uncertainty and bias. Further, few WBMs exist for large lakes, and most large-lake WBMs perform additive accounting, with minimal consideration of input data uncertainty. Here, we introduce a framework for improving a previously developed large lake statistical water balance model (L2SWBM). Focusing on the water balances of Lakes Superior and Michigan-Huron, we demonstrate our new analytical framework by identifying, from 26 alternatives, the L2SWBMs that adequately close the water balance of the lakes, with satisfactory computation times compared with the prototype model. We expect our new framework will be used to develop water balance models for other lakes around the world.


Minimal I-MAP MCMC for Scalable Structure Discovery in Causal DAG Models

arXiv.org Machine Learning

Learning a Bayesian network (BN) from data can be useful for decision-making or discovering causal relationships. However, traditional methods often fail in modern applications, which exhibit a larger number of observed variables than data points. The resulting uncertainty about the underlying network, as well as the desire to incorporate prior information, motivates a Bayesian approach to learning the BN, but the highly combinatorial structure of BNs poses a striking challenge for inference. Current state-of-the-art methods such as order MCMC are faster than previous methods but prevent the use of many natural structural priors and still have running time exponential in the maximum indegree of the true directed acyclic graph (DAG) of the BN. Here we propose an alternative posterior approximation based on the observation that, if we incorporate empirical conditional independence tests, we can focus on a high-probability DAG associated with each order of the vertices. We show that our method allows the desired flexibility in prior specification, removes the dependence of running time on the maximum indegree, and yields provably good posterior approximations; in addition, we show that it achieves superior accuracy, scalability, and sampler mixing on several datasets.
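The core step is building a minimal I-MAP for a fixed vertex order from conditional independence tests. The abstract does not specify the test or the exact construction, so this minimal sketch makes three assumptions. It assumes Gaussian data and a Fisher z-test on partial correlations. It also uses the rule that u is a parent of v exactly when v and u are dependent given all of v's other predecessors, which holds for positive distributions. The MCMC over orders is not included. For illustration, the code is fed the exact correlation matrix of a hypothetical chain X0 → X1 → X2. With real data, one would pass `np.corrcoef(data, rowvar=False)` instead.

```python
import math
import numpy as np

def ci_test_independent(corr, n_samples, i, j, cond, alpha=0.01):
    """Fisher z-test of zero partial correlation between variables i and j given
    `cond` (assumes Gaussian data); returns True if independence is NOT rejected."""
    idx = [i, j] + list(cond)
    prec = np.linalg.inv(corr[np.ix_(idx, idx)])
    rho = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    rho = min(max(rho, -0.999999), 0.999999)  # keep the log finite
    stat = 0.5 * math.log((1 + rho) / (1 - rho)) * math.sqrt(n_samples - len(cond) - 3)
    p_value = math.erfc(abs(stat) / math.sqrt(2))  # two-sided normal p-value
    return p_value > alpha

def minimal_imap(corr, n_samples, order, alpha=0.01):
    """Minimal I-MAP for a fixed vertex order: u is a parent of v iff v and u are
    dependent given all of v's other predecessors in the order."""
    parents = {}
    for pos, v in enumerate(order):
        preds = order[:pos]
        parents[v] = [
            u for u in preds
            if not ci_test_independent(corr, n_samples, v, u, [w for w in preds if w != u], alpha)
        ]
    return parents

# Hypothetical linear-Gaussian chain X0 -> X1 -> X2 with unit noise variances;
# B[i, j] is the coefficient of X_j in the equation for X_i.
B = np.zeros((3, 3))
B[1, 0] = 0.8
B[2, 1] = 0.8
inv = np.linalg.inv(np.eye(3) - B)
cov = inv @ inv.T  # covariance of X = (I - B)^{-1} eps
sd = np.sqrt(np.diag(cov))
corr = cov / np.outer(sd, sd)

for order in ([0, 1, 2], [2, 1, 0], [0, 2, 1]):
    print(order, minimal_imap(corr, n_samples=1000, order=order))
```

Running this on different orders shows why the order matters. The orders (0, 1, 2) and (2, 1, 0) both yield a two-edge chain, while (0, 2, 1) yields a denser three-edge DAG.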