Directed Networks
Joint estimation of quantile planes over arbitrary predictor spaces
In spite of the recent surge of interest in quantile regression, joint estimation of linear quantile planes remains a great challenge in statistics and econometrics. We propose a novel parametrization that characterizes any collection of non-crossing quantile planes over arbitrarily shaped convex predictor domains in any dimension by means of unconstrained scalar, vector and function valued parameters. Statistical models based on this parametrization inherit a fast computation of the likelihood function, enabling penalized likelihood or Bayesian approaches to model fitting. We introduce a complete Bayesian methodology by using Gaussian process prior distributions on the function valued parameters and develop a robust and efficient Markov chain Monte Carlo parameter estimation. The resulting method is shown to offer posterior consistency under mild tail and regularity conditions. We present several illustrative examples where the new method is compared against existing approaches and is found to offer better accuracy, coverage and model fit.
Dependent Indian Buffet Process-based Sparse Nonparametric Nonnegative Matrix Factorization
Xuan, Junyu, Lu, Jie, Zhang, Guangquan, Da Xu, Richard Yi, Luo, Xiangfeng
Nonnegative Matrix Factorization (NMF) aims to factorize a matrix into two optimized nonnegative matrices appropriate for the intended applications. The method has been widely used for unsupervised learning tasks, including recommender systems (rating matrix of users by items) and document clustering (weighting matrix of papers by keywords). However, traditional NMF methods typically assume the number of latent factors (i.e., dimensionality of the loading matrices) to be fixed. This assumption makes them inflexible for many applications. In this paper, we propose a nonparametric NMF framework to mitigate this issue by using dependent Indian Buffet Processes (dIBP). In a nutshell, we apply a correlation function for the generation of two stick weights associated with each pair of columns of loading matrices, while still maintaining their respective marginal distribution specified by IBP. As a consequence, the generation of two loading matrices will be column-wise (indirectly) correlated. Under this same framework, two classes of correlation function are proposed (1) using Bivariate beta distribution and (2) using Copula function. Both methods allow us to adopt our work for various applications by flexibly choosing an appropriate parameter settings. Compared with the other state-of-the art approaches in this area, such as using Gaussian Process (GP)-based dIBP, our work is seen to be much more flexible in terms of allowing the two corresponding binary matrix columns to have greater variations in their non-zero entries. Our experiments on the real-world and synthetic datasets show that three proposed models perform well on the document clustering task comparing standard NMF without predefining the dimension for the factor matrices, and the Bivariate beta distribution-based and Copula-based models have better flexibility than the GP-based model.
Analysis of Microarray Data using Artificial Intelligence Based Techniques
The bioinformatics is an interdisciplinary area of study where one of the objectives is to deal with the analysis and interpretation of large sets of data generated from various large-scale biological experiments. The example of one such large-scale biological experiment is measuring the expression levels of tens of thousands of genes simultaneously under some environmental condition. Microarray is one of the essential technologies used by the biologist to measure genome-wide expression levels of genes in a particular organism. As microarrays technologies have become more prevalent, the challenges 1 associated with collecting, managing, and analyzing the data from each experiment have essentially increased. Robust laboratory protocols, improved understanding of the complex experimental design and falling prices of commercial platforms, all these have combined to drive the field to more complex experiments, generating huge amounts of data (Brazma and Vilo, 2000).
Multi-Step Stochastic ADMM in High Dimensions: Applications to Sparse Optimization and Noisy Matrix Decomposition
Sedghi, Hanie, Anandkumar, Anima, Jonckheere, Edmond
We propose an efficient ADMM method with guarantees for high-dimensional problems. We provide explicit bounds for the sparse optimization problem and the noisy matrix decomposition problem. For sparse optimization, we establish that the modified ADMM method has an optimal convergence rate of $\mathcal{O}(s\log d/T)$, where $s$ is the sparsity level, $d$ is the data dimension and $T$ is the number of steps. This matches with the minimax lower bounds for sparse estimation. For matrix decomposition into sparse and low rank components, we provide the first guarantees for any online method, and prove a convergence rate of $\tilde{\mathcal{O}}((s+r)\beta^2(p) /T) + \mathcal{O}(1/p)$ for a $p\times p$ matrix, where $s$ is the sparsity level, $r$ is the rank and $\Theta(\sqrt{p})\leq \beta(p)\leq \Theta(p)$. Our guarantees match the minimax lower bound with respect to $s,r$ and $T$. In addition, we match the minimax lower bound with respect to the matrix dimension $p$, i.e. $\beta(p)=\Theta(\sqrt{p})$, for many important statistical models including the independent noise model, the linear Bayesian network and the latent Gaussian graphical model under some conditions. Our ADMM method is based on epoch-based annealing and consists of inexpensive steps which involve projections on to simple norm balls. Experiments show that for both sparse optimization and matrix decomposition problems, our algorithm outperforms the state-of-the-art methods. In particular, we reach higher accuracy with same time complexity.
Sparse Approximate Inference for Spatio-Temporal Point Process Models
Cseke, Botond, Mangion, Andrew Zammit, Heskes, Tom, Sanguinetti, Guido
Spatio-temporal point process models play a central role in the analysis of spatially distributed systems in several disciplines. Yet, scalable inference remains computa- tionally challenging both due to the high resolution modelling generally required and the analytically intractable likelihood function. Here, we exploit the sparsity structure typical of (spatially) discretised log-Gaussian Cox process models by using approximate message-passing algorithms. The proposed algorithms scale well with the state dimension and the length of the temporal horizon with moderate loss in distributional accuracy. They hence provide a flexible and faster alternative to both non-linear filtering-smoothing type algorithms and to approaches that implement the Laplace method or expectation propagation on (block) sparse latent Gaussian models. We infer the parameters of the latent Gaussian model using a structured variational Bayes approach. We demonstrate the proposed framework on simulation studies with both Gaussian and point-process observations and use it to reconstruct the conflict intensity and dynamics in Afghanistan from the WikiLeaks Afghan War Diary.
Towards Data-Driven Autonomics in Data Centers
Sรฎrbu, Alina, Babaoglu, Ozalp
Continued reliance on human operators for managing data centers is a major impediment for them from ever reaching extreme dimensions. Large computer systems in general, and data centers in particular, will ultimately be managed using predictive computational and executable models obtained through data-science tools, and at that point, the intervention of humans will be limited to setting high-level goals and policies rather than performing low-level operations. Data-driven autonomics, where management and control are based on holistic predictive models that are built and updated using generated data, opens one possible path towards limiting the role of operators in data centers. In this paper, we present a data-science study of a public Google dataset collected in a 12K-node cluster with the goal of building and evaluating a predictive model for node failures. We use BigQuery, the big data SQL platform from the Google Cloud suite, to process massive amounts of data and generate a rich feature set characterizing machine state over time. We describe how an ensemble classifier can be built out of many Random Forest classifiers each trained on these features, to predict if machines will fail in a future 24-hour window. Our evaluation reveals that if we limit false positive rates to 5%, we can achieve true positive rates between 27% and 88% with precision varying between 50% and 72%. We discuss the practicality of including our predictive model as the central component of a data-driven autonomic manager and operating it on-line with live data streams (rather than off-line on data logs). All of the scripts used for BigQuery and classification analyses are publicly available from the authors' website.
D-MFVI: Distributed Mean Field Variational Inference using Bregman ADMM
Babagholami-Mohamadabadi, Behnam, Yoon, Sejong, Pavlovic, Vladimir
Bayesian models provide a framework for probabilistic modelling of complex datasets. However, many of such models are computationally demanding especially in the presence of large datasets. On the other hand, in sensor network applications, statistical (Bayesian) parameter estimation usually needs distributed algorithms, in which both data and computation are distributed across the nodes of the network. In this paper we propose a general framework for distributed Bayesian learning using Bregman Alternating Direction Method of Multipliers (B-ADMM). We demonstrate the utility of our framework, with Mean Field Variational Bayes (MFVB) as the primitive for distributed Matrix Factorization (MF) and distributed affine structure from motion (SfM).
Learning the intensity of time events with change-points
Alaya, Mokhtar Zahdi, Gaรฏffas, Stรฉphane, Guilloux, Agathe
We consider the problem of learning the inhomogeneous intensity of a counting process, under a sparse segmentation assumption. We introduce a weighted total-variation penalization, using data-driven weights that correctly scale the penalization along the observation interval. We prove that this leads to a sharp tuning of the convex relaxation of the segmentation prior, by stating oracle inequalities with fast rates of convergence, and consistency for change-points detection. This provides first theoretical guarantees for segmentation with a convex proxy beyond the standard i.i.d signal + white noise setting. We introduce a fast algorithm to solve this convex problem. Numerical experiments illustrate our approach on simulated and on a high-frequency genomics dataset.
Classical vs. Bayesian methods for linear system identification: point estimators and confidence sets
Romeres, D., Prando, G., Pillonetto, G., Chiuso, A.
This paper compares classical parametric methods with recently developed Bayesian methods for system identification. A Full Bayes solution is considered together with one of the standard approximations based on the Empirical Bayes paradigm. Results regarding point estimators for the impulse response as well as for confidence regions are reported.
Identification of stable models via nonparametric prediction error methods
Romeres, Diego, Pillonetto, Gianluigi, Chiuso, Alessandro
A new Bayesian approach to linear system identification has been proposed in a series of recent papers. The main idea is to frame linear system identification as predictor estimation in an infinite dimensional space, with the aid of regularization/Bayesian techniques. This approach guarantees the identification of stable predictors based on the prediction error minimization. Unluckily, the stability of the predictors does not guarantee the stability of the impulse response of the system. In this paper we propose and compare various techniques to address this issue. Simulations results comparing these techniques will be provided.