Regression
Generalized² Linear² Models
We introduce the Generalized2 Linear2 Model, a statistical estima(cid:173) tor which combines features of nonlinear regression and factor anal(cid:173) ysis. Here A and Bare low-rank matrices, while j, g, and h are link functions. They also include new and interesting special cases, one of which we describe below. We also present an iterative procedure which optimizes the parameters of a (GL)2M. This procedure reduces to well-known algorithms for some of the special cases listed above; for other special cases, it is new.
A Temporal Kernel-Based Model for Tracking Hand Movements from Neural Activities
We devise and experiment with a dynamical kernel-based system for tracking hand movements from neural activity. The state of the system corresponds to the hand location, velocity, and acceleration, while the system's input are the instantaneous spike rates. The system's state dy- namics is defined as a combination of a linear mapping from the previous estimated state and a kernel-based mapping tailored for modeling neural activities. In contrast to generative models, the activity-to-state mapping is learned using discriminative methods by minimizing a noise-robust loss function. We use this approach to predict hand trajectories on the basis of neural activity in motor cortex of behaving monkeys and find that the proposed approach is more accurate than both a static approach based on support vector regression and the Kalman filter. The paper focuses on the problem of tracking hand movements, which constitute smooth spatial trajectories, from spike trains of a neural population. We do so by devising a dynam- ical system which employs a tailored kernel for spike trains along with a linear mapping corresponding to the states' dynamics.
Computing regularization paths for learning multiple kernels
The problem of learning a sparse conic combination of kernel functions or kernel matrices for classification or regression can be achieved via the regularization by a block 1-norm [1]. In this paper, we present an al- gorithm that computes the entire regularization path for these problems. The path is obtained by using numerical continuation techniques, and involves a running time complexity that is a constant times the complex- ity of solving the problem for one value of the regularization parameter. Working in the setting of kernel linear regression and kernel logistic re- gression, we show empirically that the effect of the block 1-norm reg- ularization differs notably from the (non-block) 1-norm regularization commonly used for variable selection, and that the regularization path is of particular value in the block case.
Online Bounds for Bayesian Algorithms
We present a competitive analysis of Bayesian learning algorithms in the online learning setting and show that many simple Bayesian algorithms (such as Gaussian linear regression and Bayesian logistic regression) per- form favorably when compared, in retrospect, to the single best model in the model class. The analysis does not assume that the Bayesian algo- rithms' modeling assumptions are "correct," and our bounds hold even if the data is adversarially chosen. For Gaussian linear regression (us- ing logloss), our error bounds are comparable to the best bounds in the online learning literature, and we also provide a lower bound showing that Gaussian linear regression is optimal in a certain worst case sense. We also give bounds for some widely used maximum a posteriori (MAP) estimation algorithms, including regularized logistic regression.
From Lasso regression to Feature vector machine
Lasso regression tends to assign zero weights to most irrelevant or redundant features, and hence is a promising technique for feature selection. Its limitation, however, is that it only offers solutions to linear models. Kernel machines with feature scaling techniques have been studied for feature selection with non-linear models. However, such approaches require to solve hard non-convex optimization problems. This paper proposes a new approach named the Feature Vector Machine (FVM). It reformulates the standard Lasso regression into a form isomorphic to SVM, and this form can be easily extended for feature selection with non-linear models by introducing kernels defined on feature vectors.
Predicting EMG Data from M1 Neurons with Variational Bayesian Least Squares
An increasing number of projects in neuroscience requires the sta- tistical analysis of high dimensional data sets, as, for instance, in predicting behavior from neural firing or in operating artificial de- vices from brain recordings in brain-machine interfaces. Linear analysis techniques remain prevalent in such cases, but classical linear regression approaches are often numerically too fragile in high dimensions. In this paper, we address the question of whether EMG data collected from arm movements of monkeys can be faith- fully reconstructed with linear approaches from neural activity in primary motor cortex (M1). To achieve robust data analysis, we develop a full Bayesian approach to linear regression that auto- matically detects and excludes irrelevant features in the data, reg- ularizing against overfitting. In comparison with ordinary least squares, stepwise regression, partial least squares, LASSO regres- sion and a brute force combinatorial search for the most predictive input features in the data, we demonstrate that the new Bayesian method offers a superior mixture of characteristics in terms of reg- ularization against overfitting, computational efficiency and ease of use, demonstrating its potential as a drop-in replacement for other linear regression techniques.
Large-scale biophysical parameter estimation in single neurons via constrained linear regression
Our understanding of the input-output function of single cells has been substantially advanced by biophysically accurate multi-compartmental models. The large number of parameters needing hand tuning in these models has, however, somewhat hampered their applicability and interpretability. Here we propose a simple and well-founded method for automatic estimation of many of these key parameters: 1) the spatial distribution of channel densities on the cell's membrane; 2) the spatiotemporal pattern of synaptic input; 3) the channels' reversal potentials; 4) the intercompartmental conductances; and 5) the noise level in each compartment. We assume experimental access to: a) the spatiotemporal voltage signal in the dendrite (or some contiguous subpart thereof, e.g. The key observation is that, given data a)-c), all of the parameters 1)-4) may be simultaneously inferred by a version of constrained linear regression; this regression, in turn, is efficiently solved using standard algorithms, without any "local minima" problems despite the large number of parameters and complex dynamics.
Sparse Gaussian Processes using Pseudo-inputs
We present a new Gaussian process (GP) regression model whose covariance is parameterized by the the locations of M pseudo-input points, which we learn by a gradient based optimization. We take M N, where N is the number of real data points, and hence obtain a sparse regression method which has O(M 2 N) training cost and O(M 2) prediction cost per test case. We also find hyperparameters of the covariance function in the same joint optimization. The method can be viewed as a Bayesian regression model with particular input dependent noise. The method turns out to be closely related to several other sparse GP approaches, and we discuss the relation in detail.
Speakers optimize information density through syntactic reduction
If language users are rational, they might choose to structure their utterances so as to optimize communicative properties. In particular, information-theoretic and psycholinguistic considerations suggest that this may include maximizing the uniformity of information density in an utterance. We investigate this possibility in the context of syntactic reduction, where the speaker has the option of either marking a higher-order unit (a phrase) with an extra word, or leaving it unmarked. We demonstrate that speakers are more likely to reduce less information-dense phrases. In a second step, we combine a stochastic model of structured utterance production with a logistic-regression model of syntactic reduction to study which types of cues speakers employ when estimating the predictability of upcoming elements.
Sparse Multinomial Logistic Regression via Bayesian L1 Regularisation
Multinomial logistic regression provides the standard penalised maximum- likelihood solution to multi-class pattern recognition problems. More recently, the development of sparse multinomial logistic regression models has found ap- plication in text processing and microarray classification, where explicit identifi- cation of the most informative features is of value. In this paper, we propose a sparse multinomial logistic regression method, in which the sparsity arises from the use of a Laplace prior, but where the usual regularisation parameter is inte- grated out analytically. Evaluation over a range of benchmark datasets reveals this approach results in similar generalisation performance to that obtained using cross-validation, but at greatly reduced computational expense.