Performance Analysis
Nonparametric mixture of Gaussian graphical models
Graphical models have been widely used to investigate the complex dependence structure of high-dimensional data, and it is common to assume that the observed data follow a homogeneous graphical model. However, in real-world applications observations usually come from different sources and share heterogeneous hidden commonality. Thus, it is of great importance to estimate heterogeneous dependencies and discover subpopulations with certain commonality across the whole population. In this work, we introduce a novel regularized estimation scheme for learning a nonparametric mixture of Gaussian graphical models, which extends the methodology and applicability of Gaussian graphical models and mixture models. We propose a unified penalized likelihood approach to effectively estimate nonparametric functional parameters and heterogeneous graphical parameters. We further design an efficient generalized effective EM algorithm to address three significant challenges: high-dimensionality, non-convexity, and label switching. Theoretically, we study both the algorithmic convergence of our proposed algorithm and the asymptotic properties of our proposed estimators. Numerically, we demonstrate the performance of our method in simulation studies and a real application to estimate human brain functional connectivity from ADHD imaging data, where two heterogeneous conditional dependencies are explained through profiling demographic variables and supported by existing scientific findings.
Assessing binary classifiers using only positive and unlabeled data
Claesen, Marc, Davis, Jesse, De Smet, Frank, De Moor, Bart
Bart De Moor: Dept. of Electrical Engineering, STADIUS KU Leuven & iMinds Medical IT
Assessing the performance of a learned model is a crucial part of machine learning. However, in some domains only positive and unlabeled examples are available, which prohibits the use of most standard evaluation metrics. We propose an approach to estimate any metric based on contingency tables, including ROC and PR curves, using only positive and unlabeled data. Estimating these performance metrics is essentially reduced to estimating the fraction of (latent) positives in the unlabeled set, assuming known positives are a random sample of all positives. We provide theoretical bounds on the quality of our estimates, illustrate the importance of estimating the fraction of positives in the unlabeled set and demonstrate empirically that we are able to reliably estimate ROC and PR curves on real data.
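The reduction described in this abstract can be illustrated with a simplified point estimate. The sketch below is not the authors' bound-based procedure: it assumes the fraction of latent positives in the unlabeled set (called `beta` here, a hypothetical name) is already known, and that known positives are a random sample of all positives, so the true positive rate measured on them also applies to the latent positives. The function name and arguments are illustrative only.

```python
def estimate_rates(pos_scores, unl_scores, beta, threshold):
    """Point estimates of (TPR, FPR) at one score threshold.

    pos_scores: classifier scores of the known positives
    unl_scores: classifier scores of the unlabeled examples
    beta:       assumed fraction of latent positives among the unlabeled
    """
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    n_unl = len(unl_scores)
    n_neg = (1.0 - beta) * n_unl  # expected number of true negatives

    # Known positives are assumed to be a random sample of all positives,
    # so their hit rate estimates the TPR.
    tpr = sum(s >= threshold for s in pos_scores) / len(pos_scores)

    # Unlabeled examples above the threshold mix latent positives and
    # false positives; subtract the expected latent positives.
    unl_above = sum(s >= threshold for s in unl_scores)
    expected_latent_pos_above = tpr * beta * n_unl
    false_pos = min(max(unl_above - expected_latent_pos_above, 0.0), n_neg)

    return tpr, false_pos / n_neg


pos = [0.9, 0.8, 0.7, 0.2]
unl = [0.95, 0.6, 0.4, 0.3, 0.1, 0.85, 0.05, 0.15, 0.25, 0.35]
tpr, fpr = estimate_rates(pos, unl, beta=0.2, threshold=0.5)
print(tpr, fpr)
```

Sweeping `threshold` over the observed scores would trace an approximate ROC curve. The estimate is only as good as the assumed `beta`, which is why the abstract stresses estimating that fraction.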
Decoding index finger position from EEG using random forests
Weichwald, Sebastian, Meyer, Timm, Schölkopf, Bernhard, Ball, Tonio, Grosse-Wentrup, Moritz
While invasively recorded brain activity is known to provide detailed information on motor commands, it is an open question at what level of detail information about positions of body parts can be decoded from non-invasively acquired signals. In this work it is shown that index finger positions can be differentiated from non-invasive electroencephalographic (EEG) recordings in healthy human subjects. Using a leave-one-subject-out cross-validation procedure, a random forest distinguished different index finger positions on a numerical keyboard above chance-level accuracy. Among the different spectral features investigated, high $\beta$-power (20-30 Hz) over contralateral sensorimotor cortex carried most information about finger position. Thus, these findings indicate that finger position is in principle decodable from non-invasive features of brain activity that generalize across individuals.
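The leave-one-subject-out procedure mentioned above can be sketched as a plain index splitter. This is a minimal illustration, not the authors' pipeline: the function name and the subject labels are hypothetical, and the classifier (a random forest in the paper) is left out. Each fold holds out every trial of one subject, so test accuracy reflects generalization to an unseen individual.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) per fold.

    subject_ids: one subject label per trial, in trial order.
    """
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Hypothetical trial-to-subject assignment for five trials.
subjects = ["s1", "s1", "s2", "s3", "s2"]
for held_out, train_idx, test_idx in leave_one_subject_out(subjects):
    print(held_out, train_idx, test_idx)
```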
Cross-validation of matching correlation analysis by resampling matching weights
The strength of association between a pair of data vectors is represented by a nonnegative real number, called the matching weight. For dimensionality reduction, we consider a linear transformation of data vectors, and define a matching error as the weighted sum of squared distances between transformed vectors with respect to the matching weights. Given data vectors and matching weights, the optimal linear transformation minimizing the matching error is obtained by the spectral graph embedding of Yan et al. (2007). This method is a generalization of canonical correlation analysis, and will be called matching correlation analysis (MCA). In this paper, we consider a novel sampling scheme where the observed matching weights are randomly sampled from underlying true matching weights with small probability, whereas the data vectors are treated as constants. We then investigate cross-validation by resampling the matching weights. Our asymptotic theory shows that the cross-validation, if rescaled properly, computes an unbiased estimate of the matching error with respect to the true matching weights. Existing ideas of cross-validation for resampling data vectors, instead of resampling matching weights, are not applicable here. MCA can be used for data vectors from multiple domains with different dimensions via an embarrassingly simple idea of coding the data vectors. This method will be called cross-domain matching correlation analysis (CDMCA), and an interesting connection to the classical associative memory model of neural networks is also discussed.
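The matching error defined in this abstract can be written out directly. The sketch below assumes a symmetric weight matrix `W`, a transformation `y = A^T x`, and a factor of 1/2 so that each unordered pair is counted once. The abstract does not state the normalization, so that factor and all names here are assumptions for illustration.

```python
def transform(A, x):
    """Apply y = A^T x, where A is a p-by-k matrix given as a list of rows."""
    k = len(A[0])
    return [sum(A[r][c] * x[r] for r in range(len(x))) for c in range(k)]


def matching_error(X, W, A):
    """Weighted sum of squared distances between transformed vectors.

    X: list of n data vectors (each of length p)
    W: n-by-n symmetric matrix of nonnegative matching weights
    A: p-by-k linear transformation
    """
    Y = [transform(A, x) for x in X]
    total = 0.0
    for i in range(len(Y)):
        for j in range(len(Y)):
            if W[i][j]:
                sq_dist = sum((a - b) ** 2 for a, b in zip(Y[i], Y[j]))
                total += W[i][j] * sq_dist
    return 0.5 * total  # assumed normalization: count each pair once


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[0.0, 1.0, 0.5],
     [1.0, 0.0, 0.0],
     [0.5, 0.0, 0.0]]
A = [[1.0], [2.0]]  # projects 2-D vectors onto one dimension
print(matching_error(X, W, A))
```

Minimizing this quantity over `A` needs a scale constraint on the transformed vectors, since `A = 0` trivially gives zero error. The spectral graph embedding cited in the abstract handles that step, which this sketch does not attempt.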
Learning population and subject-specific brain connectivity networks via Mixed Neighborhood Selection
Monti, Ricardo Pio, Anagnostopoulos, Christoforos, Montana, Giovanni
At the forefront of neuroscientific research is the study of functional connectivity, defined as the statistical dependencies across spatially remote brain regions [Friston, 1994, 2011]. While traditional neuroimaging studies focused on the roles of specific brain regions, there has recently been a significant shift towards understanding the connectivity across regions [Smith, 2012]. This shift has been partially catalyzed by recent advances in imaging techniques. In particular, the introduction of functional MRI (fMRI) has played a crucial role by providing a noninvasive mechanism through which to obtain whole-brain coverage of neuronal activity [Huettel, Song and McCarthy, 2004, Poldrack, Mumford and Nichols, 2011]. The focus of this work is estimating functional connectivity networks from fMRI data; however, the methodology presented can also be used in conjunction with other imaging modalities. From a statistical perspective, Gaussian graphical models (GGMs) are often employed to model functional connectivity [Smith et al., 2011, Varoquaux and Craddock, 2013]. In this manner, undirected connectivity networks can be inferred by studying the conditional independence structures across brain regions [Lauritzen, 1996].
Feature Selection for Ridge Regression with Provable Guarantees
Paul, Saurabh, Drineas, Petros
We introduce single-set spectral sparsification as a deterministic sampling based feature selection technique for regularized least squares classification, which is the classification analogue to ridge regression. The method is unsupervised and gives worst-case guarantees of the generalization power of the classification function after feature selection with respect to the classification function obtained using all features. We also introduce leverage-score sampling as an unsupervised randomized feature selection method for ridge regression. We provide risk bounds for both single-set spectral sparsification and leverage-score sampling on ridge regression in the fixed design setting and show that the risk in the sampled space is comparable to the risk in the full-feature space. We perform experiments on synthetic and real-world datasets, namely a subset of TechTC-300 datasets, to support our theory. Experimental results indicate that the proposed methods perform better than the existing feature selection methods.
Learning with Group Invariant Features: A Kernel Perspective
Mroueh, Youssef, Voinea, Stephen, Poggio, Tomaso
We analyze in this paper a random feature map based on a recently introduced theory of invariance (I-theory). More specifically, a group invariant signal signature is obtained through cumulative distributions of group transformed random projections. Our analysis bridges invariant feature learning with kernel methods, as we show that this feature map defines an expected Haar integration kernel that is invariant to the specified group action. We show how this non-linear random feature map approximates this group invariant kernel uniformly on a set of $N$ points. Moreover, we show that it defines a function space that is dense in the equivalent Invariant Reproducing Kernel Hilbert Space. Finally, we quantify error rates of the convergence of the empirical risk minimization, as well as the reduction in the sample complexity of a learning algorithm using such an invariant representation for signal classification, in a classical supervised learning setting.
CrossCat: A Fully Bayesian Nonparametric Method for Analyzing Heterogeneous, High Dimensional Data
Mansinghka, Vikash, Shafto, Patrick, Jonas, Eric, Petschulat, Cap, Gasner, Max, Tenenbaum, Joshua B.
There is a widespread need for statistical methods that can analyze high-dimensional datasets without imposing restrictive or opaque modeling assumptions. This paper describes a domain-general data analysis method called CrossCat. CrossCat infers multiple non-overlapping views of the data, each consisting of a subset of the variables, and uses a separate nonparametric mixture to model each view. CrossCat is based on approximately Bayesian inference in a hierarchical, nonparametric model for data tables. This model consists of a Dirichlet process mixture over the columns of a data table in which each mixture component is itself an independent Dirichlet process mixture over the rows; the inner mixture components are simple parametric models whose form depends on the types of data in the table. CrossCat combines strengths of mixture modeling and Bayesian network structure learning. Like mixture modeling, CrossCat can model a broad class of distributions by positing latent variables, and produces representations that can be efficiently conditioned and sampled from for prediction. Like Bayesian networks, CrossCat represents the dependencies and independencies between variables, and thus remains accurate when there are multiple statistical signals. Inference is done via a scalable Gibbs sampling scheme; this paper shows that it works well in practice. This paper also includes empirical results on heterogeneous tabular data of up to 10 million cells, such as hospital cost and quality measures, voting records, unemployment rates, gene expression measurements, and images of handwritten digits. CrossCat infers structure that is consistent with accepted findings and common-sense knowledge in multiple domains and yields predictive accuracy competitive with generative, discriminative, and model-free alternatives.
Machine Learning Sentiment Prediction based on Hybrid Document Representation
Stalidis, Panagiotis, Giatsoglou, Maria, Diamantaras, Konstantinos, Sarigiannidis, George, Chatzisavvas, Konstantinos Ch.
Automated sentiment analysis and opinion mining is a complex process concerning the extraction of useful subjective information from text. The explosion of user-generated content on the Web, and especially the fact that millions of users express their opinions on products and services daily on blogs, wikis, social networks, message boards, etc., renders the reliable, automated extraction of sentiments and opinions from unstructured text crucial for several commercial applications. In this paper, we present a novel hybrid vectorization approach for textual resources that combines a weighted variant of the popular Word2Vec representation (based on Term Frequency-Inverse Document Frequency) with a Bag-of-Words representation and a vector of lexicon-based sentiment values. The proposed text representation approach is assessed through the application of several machine learning classification algorithms on a dataset that is used extensively in the literature for sentiment detection. The classification accuracy derived through the proposed hybrid vectorization approach is higher than when its individual components are used for text representation, and comparable with state-of-the-art sentiment detection methodologies.
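The hybrid representation described above concatenates three parts: a TF-IDF-weighted average of word embeddings, a Bag-of-Words count vector, and lexicon-based sentiment values. The sketch below is an illustration under stated assumptions rather than the authors' implementation: the toy two-dimensional `embeddings` stand in for trained Word2Vec vectors, the smoothed IDF formula is one common variant, the lexicon part is reduced to a single summed polarity score, and all names are hypothetical.

```python
import math
from collections import Counter


def idf_table(docs):
    """Inverse document frequency per token (log(N / df) + 1 variant)."""
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log(n_docs / df) + 1.0 for tok, df in doc_freq.items()}


def hybrid_vector(doc, embeddings, idf, vocab, lexicon):
    """Concatenate weighted embedding average, BoW counts and lexicon score."""
    counts = Counter(doc)
    dim = len(next(iter(embeddings.values())))

    # Part 1: TF-IDF-weighted average of the word embeddings.
    weighted = [0.0] * dim
    total_weight = 0.0
    for tok, tf in counts.items():
        if tok in embeddings and tok in idf:
            weight = tf * idf[tok]
            total_weight += weight
            for d in range(dim):
                weighted[d] += weight * embeddings[tok][d]
    if total_weight > 0.0:
        weighted = [v / total_weight for v in weighted]

    # Part 2: Bag-of-Words counts over a fixed vocabulary order.
    bow = [float(counts[tok]) for tok in vocab]

    # Part 3: a single lexicon polarity score (simplifying assumption).
    polarity = [sum(lexicon.get(tok, 0.0) for tok in doc)]

    return weighted + bow + polarity


docs = [["good", "movie"], ["bad", "movie"]]
embeddings = {"good": [1.0, 0.0], "bad": [-1.0, 0.0], "movie": [0.0, 1.0]}
lexicon = {"good": 1.0, "bad": -1.0}
vocab = sorted({tok for doc in docs for tok in doc})
idf = idf_table(docs)
print(hybrid_vector(docs[0], embeddings, idf, vocab, lexicon))
```

The resulting vector can be passed to any standard classifier. Concatenation keeps the parts separable, which makes it straightforward to compare the hybrid against each component used alone, as the abstract reports.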
Causal inference using invariant prediction: identification and confidence intervals
Peters, Jonas, Bühlmann, Peter, Meinshausen, Nicolai
What is the difference between a prediction that is made with a causal model and one made with a non-causal model? Suppose we intervene on the predictor variables or change the whole environment. The predictions from a causal model will in general work as well under interventions as for observational data. In contrast, predictions from a non-causal model can potentially be very wrong if we actively intervene on variables. Here, we propose to exploit this invariance of a prediction under a causal model for causal inference: given different experimental settings (for example various interventions) we collect all models that do show invariance in their predictive accuracy across settings and interventions. The causal model will be a member of this set of models with high probability. This approach yields valid confidence intervals for the causal relationships in quite general scenarios. We examine the example of structural equation models in more detail and provide sufficient assumptions under which the set of causal predictors becomes identifiable. We further investigate robustness properties of our approach under model misspecification and discuss possible extensions. The empirical properties are studied for various data sets, including large-scale gene perturbation experiments.