Directed Networks
Context-specific independence in graphical log-linear models
Nyman, Henrik, Pensar, Johan, Koski, Timo, Corander, Jukka
Log-linear models are the popular workhorses of analyzing contingency tables. A log-linear parameterization of an interaction model can be more expressive than a direct parameterization based on probabilities, leading to a powerful way of defining restrictions derived from marginal, conditional and context-specific independence. However, parameter estimation is often simpler under a direct parameterization, provided that the model enjoys certain decomposability properties. Here we introduce a cyclical projection algorithm for obtaining maximum likelihood estimates of log-linear parameters under an arbitrary context-specific graphical log-linear model, which needs not satisfy criteria of decomposability. We illustrate that lifting the restriction of decomposability makes the models more expressive, such that additional context-specific independencies embedded in real data can be identified. It is also shown how a context-specific graphical model can correspond to a non-hierarchical log-linear parameterization with a concise interpretation. This observation can pave way to further development of non-hierarchical log-linear models, which have been largely neglected due to their believed lack of interpretability.
Ambiguity-Driven Fuzzy C-Means Clustering: How to Detect Uncertain Clustered Records
Ghaffari, Meysam, Ghadiri, Nasser
As a well-known clustering algorithm, Fuzzy C-Means (FCM) allows each input sample to belong to more than one cluster, providing more flexibility than non-fuzzy clustering methods. However, the accuracy of FCM is subject to false detections caused by noisy records, weak feature selection and low certainty of the algorithm in some cases. The false detections are very important in some decision-making application domains like network security and medical diagnosis, where weak decisions based on such false detections may lead to catastrophic outcomes. They are mainly emerged from making decisions about a subset of records that do not provide enough evidence to make a good decision. In this paper, we propose a method for detecting such ambiguous records in FCM by introducing a certainty factor to decrease invalid detections. This approach enables us to send the detected ambiguous records to another discrimination method for a deeper investigation, thus increasing the accuracy by lowering the error rate. Most of the records are still processed quickly and with low error rate which prevents performance loss compared to similar hybrid methods. Experimental results of applying the proposed method on several datasets from different domains show a significant decrease in error rate as well as improved sensitivity of the algorithm.
Bayesian Discovery of Threat Networks
Smith, Steven T., Kao, Edward K., Senne, Kenneth D., Bernstein, Garrett, Philips, Scott
A novel unified Bayesian framework for network detection is developed, under which a detection algorithm is derived based on random walks on graphs. The algorithm detects threat networks using partial observations of their activity, and is proved to be optimum in the Neyman-Pearson sense. The algorithm is defined by a graph, at least one observation, and a diffusion model for threat. A link to well-known spectral detection methods is provided, and the equivalence of the random walk and harmonic solutions to the Bayesian formulation is proven. A general diffusion model is introduced that utilizes spatio-temporal relationships between vertices, and is used for a specific space-time formulation that leads to significant performance improvements on coordinated covert networks. This performance is demonstrated using a new hybrid mixed-membership blockmodel introduced to simulate random covert networks with realistic properties.
Variational Inference for Uncertainty on the Inputs of Gaussian Process Models
Damianou, Andreas C., Titsias, Michalis K., Lawrence, Neil D.
The Gaussian process latent variable model (GP-LVM) provides a flexible approach for non-linear dimensionality reduction that has been widely applied. However, the current approach for training GP-LVMs is based on maximum likelihood, where the latent projection variables are maximized over rather than integrated out. In this paper we present a Bayesian method for training GP-LVMs by introducing a non-standard variational inference framework that allows to approximately integrate out the latent variables and subsequently train a GP-LVM by maximizing an analytic lower bound on the exact marginal likelihood. We apply this method for learning a GP-LVM from iid observations and for learning non-linear dynamical systems where the observations are temporally correlated. We show that a benefit of the variational Bayesian procedure is its robustness to overfitting and its ability to automatically select the dimensionality of the nonlinear latent space. The resulting framework is generic, flexible and easy to extend for other purposes, such as Gaussian process regression with uncertain inputs and semi-supervised Gaussian processes. We demonstrate our method on synthetic data and standard machine learning benchmarks, as well as challenging real world datasets, including high resolution video data.
A Truncated EM Approach for Spike-and-Slab Sparse Coding
Sheikh, Abdul-Saboor, Shelton, Jacquelyn A., Lรผcke, Jรถrg
We study inference and learning based on a sparse coding model with `spike-and-slab' prior. As in standard sparse coding, the model used assumes independent latent sources that linearly combine to generate data points. However, instead of using a standard sparse prior such as a Laplace distribution, we study the application of a more flexible `spike-and-slab' distribution which models the absence or presence of a source's contribution independently of its strength if it contributes. We investigate two approaches to optimize the parameters of spike-and-slab sparse coding: a novel truncated EM approach and, for comparison, an approach based on standard factored variational distributions. The truncated approach can be regarded as a variational approach with truncated posteriors as variational distributions. In applications to source separation we find that both approaches improve the state-of-the-art in a number of standard benchmarks, which argues for the use of `spike-and-slab' priors for the corresponding data domains. Furthermore, we find that the truncated EM approach improves on the standard factored approach in source separation tasks$-$which hints to biases introduced by assuming posterior independence in the factored variational approach. Likewise, on a standard benchmark for image denoising, we find that the truncated EM approach improves on the factored variational approach. While the performance of the factored approach saturates with increasing numbers of hidden dimensions, the performance of the truncated approach improves the state-of-the-art for higher noise levels.
Crowd Labeling: a survey
Muhammadi, Jafar, Rabiee, Hamid Reza, Hosseini, Abbas
Recently, there has been a burst in the number of research projects on human computation via crowdsourcing. Multiple choice (or labeling) questions could be referred to as a common type of problem which is solved by this approach. As an application, crowd labeling is applied to find true labels for large machine learning datasets. Since crowds are not necessarily experts, the labels they provide are rather noisy and erroneous. This challenge is usually resolved by collecting multiple labels for each sample, and then aggregating them to estimate the true label. Although the mechanism leads to high-quality labels, it is not actually cost-effective. As a result, efforts are currently made to maximize the accuracy in estimating true labels, while fixing the number of acquired labels. This paper surveys methods to aggregate redundant crowd labels in order to estimate unknown true labels. It presents a unified statistical latent model where the differences among popular methods in the field correspond to different choices for the parameters of the model. Afterwards, algorithms to make inference on these models will be surveyed. Moreover, adaptive methods which iteratively collect labels based on the previously collected labels and estimated models will be discussed. In addition, this paper compares the distinguished methods, and provides guidelines for future work required to address the current open issues.
Inference of Cancer Progression Models with Biological Noise
Korsunsky, Ilya, Ramazzotti, Daniele, Caravagna, Giulio, Mishra, Bud
Many applications in translational medicine require the understanding of how diseases progress through the accumulation of persistent events. Specialized Bayesian networks called monotonic progression networks offer a statistical framework for modeling this sort of phenomenon. Current machine learning tools to reconstruct Bayesian networks from data are powerful but not suited to progression models. We combine the technological advances in machine learning with a rigorous philosophical theory of causation to produce Polaris, a scalable algorithm for learning progression networks that accounts for causal or biological noise as well as logical relations among genetic events, making the resulting models easy to interpret qualitatively. We tested Polaris on synthetically generated data and showed that it outperforms a widely used machine learning algorithm and approaches the performance of the competing special-purpose, albeit clairvoyant algorithm that is given a priori information about the model parameters. We also prove that under certain rather mild conditions, Polaris is guaranteed to converge for sufficiently large sample sizes. Finally, we applied Polaris to point mutation and copy number variation data in Prostate cancer from The Cancer Genome Atlas (TCGA) and found that there are likely three distinct progressions, one major androgen driven progression, one major non-androgen driven progression, and one novel minor androgen driven progression.
Joint Hierarchical Gaussian Process Model with Application to Forecast in Medical Monitoring
Duan, Leo L., Clancy, John P., Szczesniak, Rhonda D.
A novel extrapolation method is proposed for longitudinal forecasting. A hierarchical Gaussian process model is used to combine nonlinear population change and individual memory of the past to make prediction. The prediction error is minimized through the hierarchical design. The method is further extended to joint modeling of continuous measurements and survival events. The baseline hazard, covariate and joint effects are conveniently modeled in this hierarchical structure. The estimation and inference are implemented in fully Bayesian framework using the objective and shrinkage priors. In simulation studies, this model shows robustness in latent estimation, correlation detection and high accuracy in forecasting. The model is illustrated with medical monitoring data from cystic fibrosis (CF) patients. Estimation and forecasts are obtained in the measurement of lung function and records of acute respiratory events. Keyword: Extrapolation, Joint Model, Longitudinal Model, Hierarchical Gaussian Process, Cystic Fibrosis, Medical Monitoring
A new integral loss function for Bayesian optimization
Vazquez, Emmanuel, Bect, Julien
We consider the problem of maximizing a real-valued continuous function $f$ using a Bayesian approach. Since the early work of Jonas Mockus and Antanas \v{Z}ilinskas in the 70's, the problem of optimization is usually formulated by considering the loss function $\max f - M_n$ (where $M_n$ denotes the best function value observed after $n$ evaluations of $f$). This loss function puts emphasis on the value of the maximum, at the expense of the location of the maximizer. In the special case of a one-step Bayes-optimal strategy, it leads to the classical Expected Improvement (EI) sampling criterion. This is a special case of a Stepwise Uncertainty Reduction (SUR) strategy, where the risk associated to a certain uncertainty measure (here, the expected loss) on the quantity of interest is minimized at each step of the algorithm. In this article, assuming that $f$ is defined over a measure space $(\mathbb{X}, \lambda)$, we propose to consider instead the integral loss function $\int_{\mathbb{X}} (f - M_n)_{+}\, d\lambda$, and we show that this leads, in the case of a Gaussian process prior, to a new numerically tractable sampling criterion that we call $\rm EI^2$ (for Expected Integrated Expected Improvement). A numerical experiment illustrates that a SUR strategy based on this new sampling criterion reduces the error on both the value and the location of the maximizer faster than the EI-based strategy.
What Regularized Auto-Encoders Learn from the Data Generating Distribution
Alain, Guillaume, Bengio, Yoshua
What do auto-encoders learn about the underlying data generating distribution? Recent work suggests that some auto-encoder variants do a good job of capturing the local manifold structure of data. This paper clarifies some of these previous observations by showing that minimizing a particular form of regularized reconstruction error yields a reconstruction function that locally characterizes the shape of the data generating density. We show that the auto-encoder captures the score (derivative of the log-density with respect to the input). It contradicts previous interpretations of reconstruction error as an energy function. Unlike previous results, the theorems provided here are completely generic and do not depend on the parametrization of the auto-encoder: they show what the auto-encoder would tend to if given enough capacity and examples. These results are for a contractive training criterion we show to be similar to the denoising auto-encoder training criterion with small corruption noise, but with contraction applied on the whole reconstruction function rather than just encoder. Similarly to score matching, one can consider the proposed training criterion as a convenient alternative to maximum likelihood because it does not involve a partition function. Finally, we show how an approximate Metropolis-Hastings MCMC can be setup to recover samples from the estimated distribution, and this is confirmed in sampling experiments.