Goto

Collaborating Authors

 Regression


Logistic Regression for Single Trial EEG Classification

Neural Information Processing Systems

We propose a novel framework for the classification of single trial ElectroEncephaloGraphy (EEG), based on regularized logistic regression. Framed in this robust statistical framework no prior feature extraction or outlier removal is required. In the first case, the problem is convex and the logistic regression is optimal under a generative model. The latter case is shown to be related to the Common Spatial Pattern (CSP) algorithm, which is a popular technique in Brain Computer Interfacing. The regression coefficients can also be topographically mapped onto the scalp similarly to CSP pro jections, which allows neuro-physiological interpretation.


Predicting Brain States from fMRI Data: Incremental Functional Principal Component Regression

Neural Information Processing Systems

We propose a method for reconstruction of human brain states directly from functional neuroimaging data. The method extends the traditional multivariate regression analysis of discretized fMRI data to the domain of stochastic functional measurements, facilitating evaluation of brain responses to naturalistic stimuli and boosting the power of functional imaging. Population based incremental learning is used to search for spatially distributed voxel clusters, taking into account the variation in Haemodynamic lag across brain areas and among subjects by voxel-wise non-linear registration of stimuli to fMRI data. The method captures spatially distributed brain responses to naturalistic stimuli without attempting to localize function. Application of the method for prediction of naturalistic stimuli from new and unknown fMRI data shows that the approach is capable of identifying distributed clusters of brain locations that are highly predictive of a specific stimuli.


Online Linear Regression and Its Application to Model-Based Reinforcement Learning

Neural Information Processing Systems

We provide a provably efficient algorithm for learning Markov Decision Processes (MDPs) with continuous state and action spaces in the online setting. Specifically, we take a model-based approach and show that a special type of online linear regression allows us to learn MDPs with (possibly kernalized) linearly parameterized dynamics. This result builds on Kearns and Singh's work that provides a provably efficient algorithm for finite state MDPs. Our approach is not restricted to the linear setting, and is applicable to other classes of continuous MDPs.


FilterBoost: Regression and Classification on Large Datasets

Neural Information Processing Systems

We study boosting in the filtering setting, where the booster draws examples from an oracle instead of using a fixed training set and so may train efficiently on very large datasets. Our algorithm, which is based on a logistic regression technique proposed by Collins, Schapire, & Singer, requires fewer assumptions to achieve bounds equivalent to or better than previous work. Moreover, we give the first proof that the algorithm of Collins et al. is a strong PAC learner, albeit within the filtering setting. Our proofs demonstrate the algorithm's strong theoretical proper- ties for both classification and conditional probability estimation, and we validate these results through extensive experiments. Empirically, our algorithm proves more robust to noise and overfitting than batch boosters in conditional probability estimation and proves competitive in classification.


Privacy-preserving logistic regression

Neural Information Processing Systems

This paper addresses the important tradeoff between privacy and learnability, when designing algorithms for learning from private databases. First we apply an idea of Dwork et al. to design a specific privacy-preserving machine learning algorithm, logistic regression. This involves bounding the sensitivity of logistic regression, and perturbing the learned classifier with noise proportional to the sensitivity. Noting that the approach of Dwork et al. has limitations when applied to other machine learning algorithms, we then present another privacy-preserving logistic regression algorithm. The algorithm is based on solving a perturbed objective, and does not depend on the sensitivity.


High-dimensional support union recovery in multivariate regression

Neural Information Processing Systems

We study the behavior of block (cid:96)1/(cid:96)2 regularization for multivariate regression, where a K-dimensional response vector is regressed upon a fixed set of p co- variates. The problem of support union recovery is to recover the subset of covariates that are active in at least one of the regression problems. Study- ing this problem under high-dimensional scaling (where the problem parame- ters as well as sample size n tend to infinity simultaneously), our main result is to show that exact recovery is possible once the order parameter given by θ(cid:96)1/(cid:96)2(n, p, s): n/[2ψ(B) log(p s)] exceeds a critical threshold. Here n is the sample size, p is the ambient dimension of the regression model, s is the size of the union of supports, and ψ(B) is a sparsity-overlap function that measures a combination of the sparsities and overlaps of the K-regression coefficient vectors that constitute the model. This sparsity-overlap function reveals that block (cid:96)1/(cid:96)2 regularization for multivariate regression never harms performance relative to a naive (cid:96)1-approach, and can yield substantial improvements in sample complexity (up to a factor of K) when the regression vectors are suitably orthogonal rela- tive to the design.


The Infinite Hierarchical Factor Regression Model

Neural Information Processing Systems

We propose a nonparametric Bayesian factor regression model that accounts for uncertainty in the number of factors, and the relationship between factors. To accomplish this, we propose a sparse variant of the Indian Buffet Process and couple this with a hierarchical model over factors, based on Kingman's coalescent. We apply this model to two problems (factor analysis and factor regression) in gene-expression data analysis.


Heterogeneous multitask learning with joint sparsity constraints

Neural Information Processing Systems

Multitask learning addressed the problem of learning related tasks whose information can be shared each other. In this paper we consider the problem learning multiple related tasks where tasks consist of both continuous and discrete outputs from a common set of input variables that lie in a high-dimensional space. All of the tasks are related in the sense that they share the same set of relevant input variables, but the amount of influence of each input on different outputs may vary. We formulate this problem as a combination of linear regression and logistic regression and model the joint sparsity as L1/Linf and L1/L2-norm of the model parameters. Among several possible applications, our approach addresses an important open problem in genetic association mapping, where we are interested in discovering genetic markers that influence multiple correlated traits jointly.


Directed Regression

Neural Information Processing Systems

When used to guide decisions, linear regression analysis typically involves estimation of regression coefficients via ordinary least squares and their subsequent use to make decisions. When there are multiple response variables and features do not perfectly capture their relationships, it is beneficial to account for the decision objective when computing regression coefficients. Empirical optimization does so but sacrifices performance when features are well-chosen or training data are insufficient. We propose directed regression, an efficient algorithm that combines merits of ordinary least squares and empirical optimization. We demonstrate through a computational study that directed regression can generate significant performance gains over either alternative.


Block Variable Selection in Multivariate Regression and High-dimensional Causal Inference

Neural Information Processing Systems

We consider multivariate regression problems involving high-dimensional predictor and response spaces. To efficiently address such problems, we propose a variable selection method, Multivariate Group Orthogonal Matching Pursuit, which extends the standard Orthogonal Matching Pursuit technique to account for arbitrary sparsity patterns induced by domain-specific groupings over both input and output variables, while also taking advantage of the correlation that may exist between the multiple outputs. We illustrate the utility of this framework for inferring causal relationships over a collection of high-dimensional time series variables. When applied to time-evolving social media content, our models yield a new family of causality-based influence measures that may be seen as an alternative to PageRank. Theoretical guarantees, extensive simulations and empirical studies confirm the generality and value of our framework.