Statistical Learning
Learning population and subject-specific brain connectivity networks via Mixed Neighborhood Selection
Monti, Ricardo Pio, Anagnostopoulos, Christoforos, Montana, Giovanni
At the forefront of neuroscientific research is the study of functional connectivity, defined as the statistical dependencies across spatially remote brain regions [Friston, 1994, 2011]. While traditional neuroimaging studies focused on the roles of specific brain regions, there has recently been a significant shift towards understanding the connectivity across regions [Smith, 2012]. This shift has been partially catalyzed by recent advances in imaging techniques. In particular, the introduction of functional MRI (fMRI) has played a crucial role by providing a noninvasive mechanism through which to obtain whole-brain coverage of neuronal activity [Huettel, Song and McCarthy, 2004; Poldrack, Mumford and Nichols, 2011]. The focus of this work is estimating functional connectivity networks from fMRI data; however, the methodology presented can also be used in conjunction with other imaging modalities. From a statistical perspective, Gaussian graphical models (GGMs) are often employed to model functional connectivity [Smith et al., 2011; Varoquaux and Craddock, 2013]. In this manner, undirected connectivity networks can be inferred by studying the conditional independence structure across brain regions [Lauritzen, 1996].
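The link between a GGM and a connectivity network can be made concrete: under a GGM, regions i and j are conditionally independent given all other regions exactly when the (i, j) entry of the precision (inverse covariance) matrix is zero. The Python sketch below reads the implied network off a precision matrix via partial correlations. It does not implement Mixed Neighborhood Selection itself; the 4-region chain precision matrix and the helper names `partial_correlations` and `edges` are hypothetical illustrations.

```python
import numpy as np

def partial_correlations(precision):
    """Partial correlations rho_ij = -Theta_ij / sqrt(Theta_ii * Theta_jj) implied by a GGM."""
    d = np.sqrt(np.diag(precision))
    rho = -precision / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho

def edges(precision, tol=1e-8):
    """Undirected edges (i, j), i < j: pairs that are NOT conditionally independent."""
    rho = partial_correlations(precision)
    p = rho.shape[0]
    return [(i, j) for i in range(p) for j in range(i + 1, p) if abs(rho[i, j]) > tol]

# Hypothetical precision matrix for a 4-region chain network 0 - 1 - 2 - 3
theta = np.array([
    [2.0, -0.8, 0.0, 0.0],
    [-0.8, 2.0, -0.8, 0.0],
    [0.0, -0.8, 2.0, -0.8],
    [0.0, 0.0, -0.8, 2.0],
])
print(edges(theta))  # [(0, 1), (1, 2), (2, 3)]
```

The covariance matrix `np.linalg.inv(theta)` is dense here, so regions 0 and 3 are marginally correlated even though they share no edge; this is why the network is read from the precision matrix rather than from raw correlations.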
Holographic Embeddings of Knowledge Graphs
Nickel, Maximilian, Rosasco, Lorenzo, Poggio, Tomaso
Learning embeddings of entities and relations is an efficient and versatile method to perform machine learning on relational data such as knowledge graphs. In this work, we propose holographic embeddings (HolE) to learn compositional vector space representations of entire knowledge graphs. The proposed method is related to holographic models of associative memory in that it employs circular correlation to create compositional representations. By using correlation as the compositional operator, HolE can capture rich interactions while remaining efficient to compute, easy to train, and scalable to very large datasets. In extensive experiments, we show that holographic embeddings are able to outperform state-of-the-art methods for link prediction on knowledge graphs and relational learning benchmark datasets.
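Circular correlation, the compositional operator named above, is defined as $[\mathbf{a} \star \mathbf{b}]_k = \sum_{i=0}^{d-1} a_i\, b_{(i+k) \bmod d}$ and can be computed in $O(d \log d)$ through the FFT identity $\mathbf{a} \star \mathbf{b} = \mathcal{F}^{-1}(\overline{\mathcal{F}(\mathbf{a})} \odot \mathcal{F}(\mathbf{b}))$. The sketch below checks the FFT version against the direct sum and, assuming a HolE-style score $\sigma(\mathbf{r}_p^\top (\mathbf{e}_s \star \mathbf{e}_o))$ for a triple (s, p, o), turns it into a probability. The embeddings are random placeholders rather than trained ones, and the function names are hypothetical.

```python
import numpy as np

def circular_correlation(a, b):
    """[a * b]_k = sum_i a_i * b_{(i + k) mod d}, computed in O(d log d) via the FFT."""
    return np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)).real

def circular_correlation_naive(a, b):
    """O(d^2) reference implementation of the same definition."""
    d = len(a)
    return np.array([sum(a[i] * b[(i + k) % d] for i in range(d)) for k in range(d)])

def hole_score(r_p, e_s, e_o):
    """Assumed HolE-style probability that the triple (s, p, o) holds."""
    return 1.0 / (1.0 + np.exp(-(r_p @ circular_correlation(e_s, e_o))))

# Random placeholder embeddings of dimension d = 8 (not trained)
rng = np.random.default_rng(0)
e_s, e_o, r_p = rng.standard_normal((3, 8))
print(np.allclose(circular_correlation(e_s, e_o), circular_correlation_naive(e_s, e_o)))  # True
print(hole_score(r_p, e_s, e_o))
```

Because correlation is not commutative ($\mathbf{a} \star \mathbf{b} \neq \mathbf{b} \star \mathbf{a}$ in general), the same relation vector can assign different scores to (s, o) and (o, s), which is what allows asymmetric relations to be modeled.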
Nonparametric Reduced-Rank Regression for Multi-SNP, Multi-Trait Association Mapping
Valente, Ashlee, Ginsburg, Geoffrey, Engelhardt, Barbara E
Genome-wide association studies have proven to be essential for understanding the genetic basis of disease. However, many complex traits (personality traits, facial features, disease subtyping) are inherently high-dimensional, impeding simple approaches to association mapping. We developed a nonparametric Bayesian reduced-rank regression model for multi-SNP, multi-trait association mapping that does not require the rank of the linear subspace to be specified. We show in simulations and real data that our model shares strength over SNPs and over correlated traits, improving statistical power to identify genetic associations with an interpretable, SNP-supervised low-dimensional linear projection of the high-dimensional phenotype. On the HapMap phase 3 gene expression QTL study data, we identify pleiotropic expression QTLs that classical univariate tests are underpowered to find and that two-step approaches cannot recover. Our Python software, BERRRI, is publicly available on GitHub: https://github.com/ashlee1031/BERRRI.
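To give intuition for the reduced-rank constraint, the sketch below fits classical reduced-rank regression with a fixed, user-chosen rank: it projects the ordinary least-squares fitted values onto their top singular directions in trait space. This is a swapped-in frequentist baseline, not the nonparametric Bayesian model described above (which avoids fixing the rank). The simulated genotypes, effect sizes and noise level are hypothetical.

```python
import numpy as np

def reduced_rank_regression(X, Y, rank):
    """Classical reduced-rank regression with identity weighting (fixed rank)."""
    B_ols = np.linalg.lstsq(X, Y, rcond=None)[0]       # (n_snps, n_traits) OLS fit
    # Right singular vectors of the fitted values span the best rank-r trait subspace
    _, _, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    V_r = Vt[:rank].T                                  # (n_traits, rank)
    return B_ols @ V_r @ V_r.T                         # coefficients of rank <= `rank`

# Hypothetical simulation: 500 individuals, 20 SNPs coded 0/1/2, 15 traits, true rank 2
rng = np.random.default_rng(0)
n, n_snps, n_traits = 500, 20, 15
X = rng.binomial(2, 0.3, size=(n, n_snps)).astype(float)
B_true = rng.standard_normal((n_snps, 2)) @ rng.standard_normal((2, n_traits))
Y = X @ B_true + rng.standard_normal((n, n_traits))
B_hat = reduced_rank_regression(X, Y, rank=2)
rel_err = np.linalg.norm(B_hat - B_true) / np.linalg.norm(B_true)
print(np.linalg.matrix_rank(B_hat), round(float(rel_err), 3))
```

The low-rank factorization is what lets every SNP's effect be expressed through a shared low-dimensional projection of the traits; the Bayesian model in the abstract additionally infers how many such directions are needed.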
Explaining reviews and ratings with PACO: Poisson Additive Co-Clustering
Wu, Chao-Yuan, Beutel, Alex, Ahmed, Amr, Smola, Alexander J.
Understanding a user's motivations provides valuable information beyond the ability to recommend items. Quite often this can be accomplished by perusing both ratings and review texts, since it is in the latter that the reasoning behind specific preferences is explicitly expressed. Unfortunately, matrix factorization approaches to recommendation result in large, complex models that are difficult to interpret and give recommendations that are hard to clearly explain to users. In contrast, in this paper we attack this problem through succinct additive co-clustering. We devise a novel Bayesian technique for summing co-clusterings of Poisson distributions. With this technique, we propose a new Bayesian model for joint collaborative filtering of ratings and text reviews through a sum of simple co-clusterings. The simple structure of our model yields easily interpretable recommendations. Even with this simple, succinct structure, our model outperforms competitors in terms of predicting ratings with reviews.
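The additive structure can be illustrated with a toy version: each co-clustering k assigns users and items to clusters and contributes a nonnegative block rate, and an observed count is modeled as Poisson with the sum of these contributions. Because a sum of independent Poisson variables is again Poisson with the summed rate, each co-clustering can be read as generating its own share of the count. The parametrization, the function names and the example clusters below are assumptions for illustration; the paper's Bayesian inference procedure is not reproduced.

```python
import numpy as np
from math import lgamma

def additive_rate(user_clusters, item_clusters, block_rates):
    """lambda[u, i] = sum_k theta_k[c_k(u), d_k(i)], summed over K co-clusterings."""
    return sum(theta[np.ix_(c, d)] for c, d, theta in zip(user_clusters, item_clusters, block_rates))

def poisson_loglik(counts, rate):
    """Log-likelihood of a count matrix under independent Poisson(rate) entries."""
    log_fact = np.vectorize(lambda n: lgamma(n + 1))(counts)
    return float(np.sum(counts * np.log(rate) - rate - log_fact))

# Hypothetical example: 4 users, 3 items, two co-clusterings
user_clusters = [np.zeros(4, dtype=int), np.array([0, 0, 1, 1])]
item_clusters = [np.zeros(3, dtype=int), np.array([0, 1, 1])]
block_rates = [
    np.array([[3.0]]),                   # one global block: a shared baseline rate
    np.array([[0.5, 0.0], [0.0, 1.5]]),  # a 2 x 2 co-clustering adding local structure
]
rate = additive_rate(user_clusters, item_clusters, block_rates)
print(rate)
counts = np.array([[4, 3, 2], [3, 2, 4], [3, 5, 4], [2, 4, 6]])
print(poisson_loglik(counts, rate))
```

Each entry of `rate` decomposes into one term per co-clustering, which is what makes the resulting explanation readable: a user-item pair is "high" because of a specific block in a specific co-clustering.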
Iteratively reweighted adaptive lasso for conditional heteroscedastic time series with applications to AR-ARCH type processes
Shrinkage algorithms are of great importance in almost every area of statistics due to the increasing impact of big data. Time series analysis in particular benefits from efficient and rapid estimation techniques such as the lasso. However, current lasso-type estimators for autoregressive time series models still focus on models with homoscedastic residuals. Therefore, an iteratively reweighted adaptive lasso algorithm for the estimation of time series models under conditional heteroscedasticity is presented in a high-dimensional setting. The asymptotic behaviour of the resulting estimator is analysed. The proposed estimation procedure is found to perform substantially better than its homoscedastic counterpart. A special case of the algorithm is suitable for efficiently estimating multivariate AR-ARCH-type models. Extensions of the model, such as periodic AR-ARCH, threshold AR-ARCH and ARMA-GARCH, are discussed. Finally, simulation results and applications to electricity market data and returns of metal prices are presented.
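The reweighting idea can be sketched for a univariate AR(p) model with ARCH(1) errors: dividing each regression equation by an estimate of the conditional standard deviation $\sigma_t$ makes the residuals approximately homoscedastic, an adaptive lasso (penalty weights $1/|\hat\beta_j^{\mathrm{OLS}}|$) is then solved by coordinate descent, and the variance model is refitted from the new residuals. The specific choices here (ARCH(1) fitted by least squares on squared residuals, three outer iterations, the value of `lam`, and the simulated series) are assumptions for illustration, not the estimator analysed in the paper.

```python
import numpy as np

def lag_matrix(x, p):
    """Columns x_{t-1}, ..., x_{t-p} for t = p, ..., n-1, and the response x_t."""
    n = len(x)
    X = np.column_stack([x[p - k - 1 : n - k - 1] for k in range(p)])
    return X, x[p:]

def weighted_lasso(X, y, lam, w, n_iter=200):
    """Coordinate descent for (1/2n) ||y - X b||^2 + lam * sum_j w_j |b_j|."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                                     # residual y - X b
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]                      # partial residual without feature j
            z = X[:, j] @ r / n
            b[j] = np.sign(z) * max(abs(z) - lam * w[j], 0.0) / col_sq[j]  # soft-threshold
            r -= X[:, j] * b[j]
    return b

def reweighted_adaptive_lasso_ar(x, p, lam, n_outer=3):
    X, y = lag_matrix(x, p)
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    w = 1.0 / (np.abs(beta_ols) + 1e-8)             # adaptive-lasso penalty weights
    sigma = np.ones(len(y))                         # first pass: homoscedastic
    for _ in range(n_outer):
        # Divide each equation by sigma_t so the residuals are roughly homoscedastic
        beta = weighted_lasso(X / sigma[:, None], y / sigma, lam, w)
        e2 = (y - X @ beta) ** 2
        # Assumed ARCH(1) variance model sigma_t^2 = a0 + a1 * eps_{t-1}^2, fitted by OLS
        Z = np.column_stack([np.ones(len(e2) - 1), e2[:-1]])
        a0, a1 = np.maximum(np.linalg.lstsq(Z, e2[1:], rcond=None)[0], 1e-6)
        sigma = np.sqrt(np.concatenate([[e2.mean()], a0 + a1 * e2[:-1]]))
    return beta

# Hypothetical AR(1) series with ARCH(1) errors; lags 2-4 are irrelevant
rng = np.random.default_rng(1)
n = 2000
x = np.zeros(n)
eps_prev = 0.0
for t in range(1, n):
    eps_prev = np.sqrt(0.2 + 0.5 * eps_prev ** 2) * rng.standard_normal()
    x[t] = 0.5 * x[t - 1] + eps_prev
beta = reweighted_adaptive_lasso_ar(x, p=4, lam=0.02)
print(np.round(beta, 3))
```

The first outer iteration is the homoscedastic adaptive lasso; later iterations differ only in the observation weights $1/\sigma_t$, which is the sense in which the procedure is "iteratively reweighted".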
Regularized EM Algorithms: A Unified Framework and Statistical Guarantees
Yi, Xinyang, Caramanis, Constantine
Latent variable models are a fundamental modeling tool in machine learning applications, but they present significant computational and analytical challenges. The popular EM algorithm and its variants are much-used algorithmic tools, yet our rigorous understanding of their performance is highly incomplete. Recently, Balakrishnan et al. (2014) demonstrated that for an important class of problems, EM exhibits linear local convergence. In the high-dimensional setting, however, the M-step may not be well defined. We address precisely this setting through a unified treatment using regularization. While regularization for high-dimensional problems is by now well understood, the iterative EM algorithm requires a careful balance between making progress towards the solution and identifying the right structure (e.g., sparsity or low rank). In particular, regularizing the M-step using state-of-the-art high-dimensional prescriptions (e.g., à la Wainwright (2014)) is not guaranteed to provide this balance. Our algorithm and analysis are linked in a way that reveals the balance between optimization and statistical errors. We specialize our general framework to sparse Gaussian mixture models, high-dimensional mixed regression, and regression with missing variables, obtaining statistical guarantees for each of these examples.
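For the sparse Gaussian mixture example, a regularized EM iteration can be sketched as follows, assuming the symmetric model $x \sim \tfrac12 N(\beta, I) + \tfrac12 N(-\beta, I)$: the E-step weight is $w_i = 1/(1 + e^{-2\langle \beta, x_i\rangle})$, the M-step mean is $\frac1n \sum_i (2w_i - 1)x_i$, and the M-step output is soft-thresholded with a threshold that shrinks over the iterations. The geometric schedule (`lam0`, `kappa`, `lam_min`), the warm start near the truth and the simulated data are illustrative assumptions; the paper's regularization sequence and constants are not reproduced.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def regularized_em_sparse_gmm(X, beta0, lam0=0.5, lam_min=0.1, kappa=0.7, n_iter=30):
    """EM for x ~ 0.5 N(beta, I) + 0.5 N(-beta, I) with a soft-thresholded M-step."""
    beta, lam = beta0.copy(), lam0
    for _ in range(n_iter):
        # E-step: posterior probability that each x_i came from the +beta component
        w = 1.0 / (1.0 + np.exp(-2.0 * (X @ beta)))
        # M-step for the symmetric mixture mean, then shrinkage towards a sparse beta
        beta = soft_threshold(((2.0 * w - 1.0)[:, None] * X).mean(axis=0), lam)
        # Assumed schedule: threshold decays geometrically to a statistical floor
        lam = max(lam_min, kappa * lam)
    return beta

# Hypothetical data: n = 2000 points in p = 50 dimensions, 3 nonzero mean coordinates
rng = np.random.default_rng(0)
n, p, s = 2000, 50, 3
beta_true = np.zeros(p)
beta_true[:s] = 2.0
labels = rng.choice([-1.0, 1.0], size=n)
X = labels[:, None] * beta_true + rng.standard_normal((n, p))
beta0 = beta_true + 0.5 * rng.standard_normal(p)    # warm start near the truth
beta_hat = regularized_em_sparse_gmm(X, beta0)
print(np.flatnonzero(beta_hat), np.round(beta_hat[:s], 2))
```

A large early threshold removes spurious coordinates while the iterate is still far from the solution, and the decaying threshold then limits the bias on the true support; this is one simple way to trade optimization progress against statistical error.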
Feature Selection for Ridge Regression with Provable Guarantees
Paul, Saurabh, Drineas, Petros
We introduce single-set spectral sparsification as a deterministic sampling-based feature selection technique for regularized least-squares classification, which is the classification analogue of ridge regression. The method is unsupervised and gives worst-case guarantees on the generalization power of the classification function after feature selection, relative to the classification function obtained using all features. We also introduce leverage-score sampling as an unsupervised randomized feature selection method for ridge regression. We provide risk bounds for both single-set spectral sparsification and leverage-score sampling on ridge regression in the fixed design setting and show that the risk in the sampled space is comparable to the risk in the full-feature space. We perform experiments on synthetic and real-world datasets, namely a subset of the TechTC-300 datasets, to support our theory. Experimental results indicate that the proposed methods perform better than existing feature selection methods.
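Leverage-score sampling of features can be sketched as follows: compute the top-k right singular vectors $V_k$ of the $n \times d$ data matrix, use the squared row norms of $V_k$ as feature leverage scores (they sum to k), sample r features with probabilities proportional to those scores, rescale the sampled columns, and fit ridge regression on the reduced matrix. The rank parameter `k`, the $1/\sqrt{r p_j}$ rescaling and the random data are assumptions borrowed from standard randomized linear algebra, not necessarily the paper's exact settings; single-set spectral sparsification is not shown.

```python
import numpy as np

def feature_leverage_scores(X, k):
    """Squared row norms of the top-k right singular vectors: one score per feature."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return (Vt[:k] ** 2).sum(axis=0)                  # nonnegative, sums to k

def leverage_sample_features(X, k, r, seed=0):
    """Sample r features (with replacement) with probability proportional to leverage."""
    rng = np.random.default_rng(seed)
    probs = feature_leverage_scores(X, k) / k
    idx = rng.choice(X.shape[1], size=r, replace=True, p=probs)
    scale = 1.0 / np.sqrt(r * probs[idx])            # importance-sampling rescaling
    return X[:, idx] * scale, idx

def ridge_fit(X, y, lam):
    """Ridge regression weights (X^T X + lam * I)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Hypothetical data: 50 samples, 500 features, keep r = 100 sampled features
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 500))
y = X[:, :5].sum(axis=1) + 0.1 * rng.standard_normal(50)
X_sub, idx = leverage_sample_features(X, k=20, r=100)
w = ridge_fit(X_sub, y, lam=1.0)
print(X_sub.shape, w.shape)  # (50, 100) (100,)
```

The labels `y` are never used when choosing features, which is what makes the selection unsupervised; only the ridge fit on the reduced matrix sees them.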
Stochastic Expectation Propagation for Large Scale Gaussian Process Classification
Hernández-Lobato, Daniel, Hernández-Lobato, José Miguel, Li, Yingzhen, Bui, Thang, Turner, Richard E.
A method for large-scale Gaussian process classification based on expectation propagation (EP) has recently been proposed. It allows Gaussian process classifiers to be trained on very large datasets that were out of the reach of previous deployments of EP, and it has been shown to be competitive with related techniques based on stochastic variational inference. Nevertheless, unlike in variational methods, the memory resources it requires scale linearly with the dataset size. This is a severe limitation when the number of instances is very large. Here we show that this problem is avoided when stochastic EP is used to train the model.
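The memory saving comes from replacing the N per-datapoint EP site approximations with a single shared site f, so that the approximate posterior is $q(\theta) \propto p_0(\theta) f(\theta)^N$. Each update forms the cavity $q/f$, moment-matches the tilted distribution for one data point, and moves f a step of size 1/N towards the resulting site (in natural parameters). The sketch below applies this to a hypothetical one-parameter probit model rather than to a Gaussian process classifier, and compares the result with a brute-force grid posterior mean; the model, prior variance, epoch count and simulated data are all assumptions.

```python
import math
import numpy as np

def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def tilted_moments(a, m, v):
    """Mean and variance of theta under N(theta; m, v) * Phi(a * theta), normalized."""
    s = math.sqrt(1.0 + a * a * v)
    z = a * m / s
    ratio = norm_pdf(z) / norm_cdf(z)
    mean = m + v * a * ratio / s
    var = v - (v * a) ** 2 * ratio * (z + ratio) / (s * s)
    return mean, var

def sep_probit(a, prior_var=10.0, n_epochs=20, seed=0):
    """Stochastic EP for a posterior proportional to N(theta; 0, prior_var) * prod_i Phi(a_i * theta).

    One shared site f (natural parameters: precision, precision * mean) replaces
    the N per-datapoint sites of EP; the approximation is q = prior * f^N.
    """
    rng = np.random.default_rng(seed)
    N = len(a)
    f_prec, f_pm = 0.0, 0.0
    for _ in range(n_epochs):
        for i in rng.permutation(N):
            q_prec, q_pm = 1.0 / prior_var + N * f_prec, N * f_pm
            c_prec, c_pm = q_prec - f_prec, q_pm - f_pm                # cavity q / f
            mean, var = tilted_moments(a[i], c_pm / c_prec, 1.0 / c_prec)
            site_prec, site_pm = 1.0 / var - c_prec, mean / var - c_pm  # site for point i
            f_prec += (site_prec - f_prec) / N                         # f <- f^(1-1/N) * f_i^(1/N)
            f_pm += (site_pm - f_pm) / N
    q_prec = 1.0 / prior_var + N * f_prec
    return N * f_pm / q_prec, 1.0 / q_prec                             # posterior mean, variance

def grid_posterior_mean(a, prior_var=10.0):
    """Brute-force posterior mean on a grid, used only as a reference."""
    theta = np.linspace(-2.0, 6.0, 2001)
    log_post = -0.5 * theta ** 2 / prior_var
    for ai in a:
        log_post += np.log(np.vectorize(norm_cdf)(ai * theta))
    w = np.exp(log_post - log_post.max())
    return float((theta * w).sum() / w.sum())

# Hypothetical data: 200 points, x ~ N(0, 1), labels drawn with true theta = 1.5
rng = np.random.default_rng(0)
x = rng.standard_normal(200)
y = np.where(rng.uniform(size=200) < np.vectorize(norm_cdf)(1.5 * x), 1.0, -1.0)
a = y * x                                     # absorb labels: Phi(y_i * x_i * theta)
m_sep, v_sep = sep_probit(a)
print(round(float(m_sep), 3), round(grid_posterior_mean(a), 3))
```

Only the two numbers `f_prec` and `f_pm` are stored regardless of N, whereas standard EP would keep one pair per data point; that difference is the memory argument made in the abstract.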
Necessary and Sufficient Conditions and a Provably Efficient Algorithm for Separable Topic Discovery
Ding, Weicong, Ishwar, Prakash, Saligrama, Venkatesh
We develop necessary and sufficient conditions and a novel, provably consistent and efficient algorithm for discovering topics (latent factors) from observations (documents) that are realized from a probabilistic mixture of shared latent factors that have certain properties. Our focus is on the class of topic models in which each shared latent factor contains a novel word that is unique to that factor, a property that has come to be known as separability. Our algorithm is based on the key insight that the novel words correspond to the extreme points of the convex hull formed by the row-vectors of a suitably normalized word co-occurrence matrix. We leverage this geometric insight to establish polynomial computation and sample complexity bounds based on a few isotropic random projections of the rows of the normalized word co-occurrence matrix. Our proposed random-projections-based algorithm is naturally amenable to an efficient distributed implementation and is attractive for modern web-scale distributed data mining applications.
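The geometric insight can be illustrated on a population-level co-occurrence matrix: after row normalization, every non-novel word's row is a convex combination of the novel words' rows, so the maximum of any linear projection over all rows is attained at a novel word. The sketch below records which rows are extreme along random Gaussian directions. The toy separable model (10 words, 3 topics, one novel word per topic, and the topic co-occurrence matrix `R`) and the function name are hypothetical, and the finite-sample estimation and thresholding steps of the actual algorithm are omitted.

```python
import numpy as np

def find_novel_words(E, n_proj=500, seed=0):
    """Words whose normalized co-occurrence row is the maximizer along at least
    one random direction, i.e. candidate vertices of the convex hull."""
    rng = np.random.default_rng(seed)
    X = E / E.sum(axis=1, keepdims=True)            # row-normalize co-occurrences
    D = rng.standard_normal((X.shape[1], n_proj))   # isotropic random directions
    winners = np.argmax(X @ D, axis=0)              # most extreme word per direction
    return np.unique(winners)

# Hypothetical separable model: W = 10 words, K = 3 topics, words 0, 1, 2 are novel
rng = np.random.default_rng(1)
W, K = 10, 3
beta = rng.dirichlet(np.ones(K), size=W)            # word-by-topic weights
beta[:K] = np.eye(K)                                # word k appears only in topic k
beta /= beta.sum(axis=0, keepdims=True)             # columns become topic distributions
R = np.eye(K) + 0.1                                 # assumed topic co-occurrence matrix
E = beta @ R @ beta.T                               # expected word co-occurrence matrix
print(find_novel_words(E))
```

Each projection is a single matrix-vector product followed by an argmax, so different directions can be processed on different machines and their winners merged, which matches the distributed-implementation remark in the abstract.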
Learning with Group Invariant Features: A Kernel Perspective
Mroueh, Youssef, Voinea, Stephen, Poggio, Tomaso
In this paper we analyze a random feature map based on a recently introduced theory of invariance, I-theory. More specifically, a group-invariant signal signature is obtained through cumulative distributions of group-transformed random projections. Our analysis bridges invariant feature learning with kernel methods, as we show that this feature map defines an expected Haar-integration kernel that is invariant to the specified group action. We show how this non-linear random feature map approximates this group-invariant kernel uniformly on a set of $N$ points. Moreover, we show that it defines a function space that is dense in the equivalent Invariant Reproducing Kernel Hilbert Space. Finally, we quantify error rates for the convergence of empirical risk minimization, as well as the reduction in the sample complexity of a learning algorithm using such an invariant representation for signal classification, in a classical supervised learning setting.
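A minimal sketch of such a signature, assuming the group is the finite group of cyclic shifts acting on length-d signals: for each random template $t_k$, compute the projections $\langle g x, t_k\rangle$ for every shift g and evaluate their empirical CDF at a fixed grid of thresholds. Since shifting x only permutes its orbit, the resulting features are exactly shift-invariant. The group, the template distribution, the thresholds and the function name are illustrative assumptions.

```python
import numpy as np

def invariant_signature(x, templates, thresholds):
    """Cyclic-shift-invariant features: per template, the empirical CDF of the
    projections of all shifted copies of x, evaluated at fixed thresholds."""
    d = len(x)
    orbit = np.stack([np.roll(x, s) for s in range(d)])      # all group transforms of x
    proj = orbit @ templates.T                               # (d shifts, K templates)
    # Fraction of the orbit whose projection lies at or below each threshold
    cdf = (proj[:, :, None] <= thresholds[None, None, :]).mean(axis=0)
    return cdf.ravel()                                       # K * len(thresholds) features

rng = np.random.default_rng(0)
d, K = 16, 5
templates = rng.standard_normal((K, d)) / np.sqrt(d)         # random projection directions
thresholds = np.linspace(-2.0, 2.0, 9)
x = rng.standard_normal(d)
phi_x = invariant_signature(x, templates, thresholds)
phi_shifted = invariant_signature(np.roll(x, 5), templates, thresholds)
print(np.array_equal(phi_x, phi_shifted))  # True: the signature ignores cyclic shifts
```

Averaging over the whole orbit plays the role of the Haar integral for this finite group; inner products of such signatures give an empirical estimate of a shift-invariant kernel between signals.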