Kernel Hyperalignment

Neural Information Processing Systems

We offer a regularized, kernel extension of the multi-set, orthogonal Procrustes problem, or hyperalignment. Our new method, called Kernel Hyperalignment, expands the scope of hyperalignment to include nonlinear measures of similarity and enables the alignment of multiple datasets with a large number of base features. With direct application to fMRI data analysis, kernel hyperalignment is well-suited for multi-subject alignment of large ROIs, including the entire cortex. We conducted experiments using real-world, multi-subject fMRI data.
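For orientation, the classical two-dataset, linear orthogonal Procrustes problem that hyperalignment builds on can be sketched as below. This is only the unregularized, non-kernel building block under assumed names (`orthogonal_procrustes`, `A`, `B`), not the Kernel Hyperalignment method itself.

```python
import numpy as np

def orthogonal_procrustes(A, B):
    """Return the orthogonal matrix R minimizing ||A @ R - B||_F.

    A and B are (samples x features) arrays with matching shapes.
    The minimizer is U @ Vt, where U, Vt come from the SVD of A.T @ B.
    """
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

# Illustrative usage on synthetic data (not fMRI data):
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5))
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))  # a random orthogonal map
B = A @ Q                                     # B is an exact rotation of A
R = orthogonal_procrustes(A, B)               # recovers Q up to rounding
```

Note that the SVD here is taken over a features-by-features matrix, which is the cost that grows with the number of base features.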


Causal discovery with scale-mixture model for spatiotemporal variance dependencies

Neural Information Processing Systems

In conventional causal discovery, structural equation models (SEM) are directly applied to the observed variables, meaning that the causal effect can be represented as a function of the direct causes themselves. However, in many real-world problems, there are significant dependencies in the variances or energies, which indicates that causality may take place at the level of variances or energies. In this paper, we propose a probabilistic causal scale-mixture model with spatiotemporal variance dependencies to represent a specific type of generating mechanism of the observations. In particular, the causal mechanism, including contemporaneous and temporal causal relations in variances or energies, is represented by a Structural Vector AutoRegressive model (SVAR). We prove the identifiability of this model under the non-Gaussian assumption on the innovation processes. We also propose algorithms to estimate the involved parameters and discover the contemporaneous causal structure. Experiments on synthetic and real-world data are conducted to show the applicability of the proposed model and algorithms.


A lattice filter model of the visual pathway

Neural Information Processing Systems

Early stages of visual processing are thought to decorrelate, or whiten, the incoming temporally varying signals. Because the typical correlation time of natural stimuli, as well as the extent of temporal receptive fields of lateral geniculate nucleus (LGN) neurons, is much greater than neuronal time constants, such decorrelation must be done in stages combining contributions of multiple neurons. We propose to model temporal decorrelation in the visual pathway with the lattice filter, a signal processing device for stage-wise decorrelation of temporal signals. The stage-wise architecture of the lattice filter maps naturally onto the visual pathway (photoreceptors -> bipolar cells -> retinal ganglion cells -> LGN) and its filter weights can be learned using Hebbian rules in a stage-wise sequential manner. Moreover, predictions of neural activity from the lattice filter model are consistent with physiological measurements in LGN neurons and fruit fly second-order visual neurons. Therefore, the lattice filter model is a useful abstraction that may help unravel visual system function.
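The stage-wise decorrelation described above can be illustrated with a textbook lattice prediction-error filter. The sketch below is a minimal version under stated assumptions: each stage's reflection coefficient `k` is estimated by batch least squares rather than by the Hebbian rules the abstract mentions, the same `k` is used for the forward and backward errors (which presumes a stationary input), and the function names are hypothetical.

```python
import numpy as np

def lattice_stage(f_prev, b_prev):
    """One lattice stage: update forward (f) and backward (b) prediction errors."""
    # Delay the backward error by one sample (zero initial condition).
    b_delayed = np.concatenate(([0.0], b_prev[:-1]))
    # Least-squares coefficient that minimizes the energy of the new forward error.
    k = np.dot(f_prev, b_delayed) / np.dot(b_delayed, b_delayed)
    f = f_prev - k * b_delayed
    b = b_delayed - k * f_prev
    return f, b, k

def lattice_filter(x, n_stages):
    """Pass x through n_stages lattice stages; return the final forward error and coefficients."""
    f = b = np.asarray(x, dtype=float)
    coeffs = []
    for _ in range(n_stages):
        f, b, k = lattice_stage(f, b)
        coeffs.append(k)
    return f, coeffs
```

Each stage removes the part of the signal predictable from one further step into the past, so a first-order autoregressive input is already approximately whitened after a single stage.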


Waveform Driven Plasticity in BiFeO3 Memristive Devices: Model and Implementation

Neural Information Processing Systems

Memristive devices have recently been proposed as efficient implementations of plastic synapses in neuromorphic systems. The plasticity in these memristive devices, i.e., their resistance change, is defined by the applied waveforms. This behavior resembles biological synapses, whose plasticity is also triggered by mechanisms that are determined by local waveforms. However, learning in memristive devices has so far been approached mostly on a pragmatic technological level. The focus seems to be on finding any waveform that achieves spike-timing-dependent plasticity (STDP), without regard to the biological veracity of said waveforms or to further important forms of plasticity. Bridging this gap, we make use of a plasticity model driven by neuron waveforms that explains a large number of experimental observations and adapt it to the characteristics of the recently introduced BiFeO$_3$ memristive material. Based on this approach, we show STDP for the first time for this material, with learning window replication superior to previous memristor-based STDP implementations. We also demonstrate in measurements that it is possible to overlay short- and long-term plasticity at a memristive device in the form of the well-known triplet plasticity. To the best of our knowledge, this is the first implementation of triplet plasticity on any physical memristive device.


Nonparametric Reduced Rank Regression

Neural Information Processing Systems

We propose an approach to multivariate nonparametric regression that generalizes reduced rank regression for linear models. An additive model is estimated for each dimension of a $q$-dimensional response, with a shared $p$-dimensional predictor variable. To control the complexity of the model, we employ a functional form of the Ky-Fan or nuclear norm, resulting in a set of function estimates that have low rank. Backfitting algorithms are derived and justified using a nonparametric form of the nuclear norm subdifferential. Oracle inequalities on excess risk are derived that exhibit the scaling behavior of the procedure in the high dimensional setting. The methods are illustrated on gene expression data.


Bayesian Warped Gaussian Processes

Neural Information Processing Systems

Warped Gaussian processes (WGP) [1] model output observations in regression tasks as a parametric nonlinear transformation of a Gaussian process (GP). The use of this nonlinear transformation, which is included as part of the probabilistic model, was shown to enhance performance by providing a better prior model on several data sets. In order to learn its parameters, maximum likelihood was used. In this work we show that it is possible to use a non-parametric nonlinear transformation in WGP and variationally integrate it out. The resulting Bayesian WGP is then able to work in scenarios in which the maximum likelihood WGP failed: low-data regimes, data with censored values, classification, etc. We demonstrate the superior performance of Bayesian warped GPs on several real data sets.


Iterative Thresholding Algorithm for Sparse Inverse Covariance Estimation

Neural Information Processing Systems

Sparse graphical modelling/inverse covariance selection is an important problem in machine learning and has seen significant advances in recent years. A major focus has been on methods which perform model selection in high dimensions. To this end, numerous convex $\ell_1$ regularization approaches have been proposed in the literature. It is not clear, however, which of these methods are optimal in any well-defined sense. A major gap in this regard pertains to the rate of convergence of proposed optimization methods. To address this, an iterative thresholding algorithm for numerically solving the $\ell_1$-penalized maximum likelihood problem for sparse inverse covariance estimation is presented. The proximal gradient method considered in this paper is shown to converge at a linear rate, a result which is the first of its kind for numerically solving the sparse inverse covariance estimation problem. The convergence rate is provided in closed form, and is related to the condition number of the optimal point. Numerical results demonstrating the proven rate of convergence are presented.
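To make the idea of iterative thresholding concrete, here is a generic proximal gradient sketch for the $\ell_1$-penalized objective $-\log\det\Theta + \mathrm{tr}(S\Theta) + \lambda\|\Theta\|_1$. It is not the paper's exact algorithm: the simple step-halving rule, the penalty on diagonal entries, and all function names are assumptions made for illustration.

```python
import numpy as np

def soft_threshold(X, tau):
    """Elementwise soft-thresholding, the proximal operator of tau * ||X||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def objective(Theta, S, lam):
    """Penalized negative log-likelihood (the diagonal is penalized too, by assumption)."""
    _, logdet = np.linalg.slogdet(Theta)
    return -logdet + np.trace(S @ Theta) + lam * np.abs(Theta).sum()

def prox_grad_step(Theta, S, lam, step=1.0, max_halvings=60):
    """One proximal gradient step; the step is halved until the candidate is
    positive definite and does not increase the objective."""
    grad = S - np.linalg.inv(Theta)  # gradient of the smooth part
    current = objective(Theta, S, lam)
    for _ in range(max_halvings):
        cand = soft_threshold(Theta - step * grad, step * lam)
        if np.all(np.linalg.eigvalsh(cand) > 0) and objective(cand, S, lam) <= current:
            return cand
        step *= 0.5
    return Theta  # no acceptable step found; keep the current iterate

def sparse_inverse_covariance(S, lam, n_iter=100):
    """Run n_iter proximal gradient steps starting from the identity matrix."""
    Theta = np.eye(S.shape[0])
    for _ in range(n_iter):
        Theta = prox_grad_step(Theta, S, lam)
    return Theta
```

The acceptance test makes the objective non-increasing by construction; it says nothing about the linear convergence rate, which is the paper's contribution.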


Joint Modeling of a Matrix with Associated Text via Latent Binary Features

Neural Information Processing Systems

A new methodology is developed for joint analysis of a matrix and accompanying documents, with the documents associated with the matrix rows/columns. The documents are modeled with a focused topic model, inferring interpretable latent binary features for each document. A new matrix decomposition is developed, with latent binary features associated with the rows/columns, and with imposition of a low-rank constraint. The matrix decomposition and topic model are coupled by sharing the latent binary feature vectors associated with each. The model is applied to roll-call data, with the associated documents defined by the legislation. Advantages of the proposed model are demonstrated for prediction of votes on a new piece of legislation, based only on the observed text of legislation. The coupling of the text and legislation is also shown to yield insight into the properties of the matrix decomposition for roll-call data.


Privacy Aware Learning

Neural Information Processing Systems

We study statistical risk minimization problems under a version of privacy in which the data is kept confidential even from the learner. In this local privacy framework, we establish sharp upper and lower bounds on the convergence rates of statistical estimation procedures. As a consequence, we exhibit a precise tradeoff between the amount of privacy the data preserves and the utility, measured by convergence rate, of any statistical estimator.


Q-MKL: Matrix-induced Regularization in Multi-Kernel Learning with Applications to Neuroimaging

Neural Information Processing Systems

Multiple Kernel Learning (MKL) generalizes SVMs to the setting where one simultaneously trains a linear classifier and chooses an optimal combination of given base kernels. Model complexity is typically controlled using various norm regularizations on the vector of base kernel mixing coefficients. Existing methods, however, neither regularize nor exploit potentially useful information pertaining to how kernels in the input set 'interact'; that is, higher order kernel-pair relationships that can be easily obtained via unsupervised (similarity, geodesics), supervised (correlation in errors), or domain knowledge driven mechanisms (which features were used to construct the kernel?). We show that by substituting the norm penalty with an arbitrary quadratic function $Q \succeq 0$, one can impose a desired covariance structure on mixing coefficient selection, and use this as an inductive bias when learning the concept. This formulation significantly generalizes the widely used 1- and 2-norm MKL objectives. We explore the model's utility via experiments on a challenging Neuroimaging problem, where the goal is to predict a subject's conversion to Alzheimer's Disease (AD) by exploiting aggregate information from several distinct imaging modalities. Here, our new model outperforms the state of the art ($p$-values $\ll 10^{-3}$). We briefly discuss ramifications in terms of learning bounds (Rademacher complexity).
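
As a small worked illustration of how a quadratic penalty can contain the familiar norm penalties, write $\beta$ for the vector of mixing coefficients and $\mathbf{1}$ for the all-ones vector (both symbols are notation assumed here, not taken from the abstract, and the paper's exact formulation may differ):

$$\Omega_Q(\beta) = \beta^\top Q \beta, \qquad Q \succeq 0.$$

Choosing $Q = I$ gives $\Omega_Q(\beta) = \|\beta\|_2^2$. For nonnegative coefficients $\beta \ge 0$, choosing $Q = \mathbf{1}\mathbf{1}^\top$ gives $\Omega_Q(\beta) = (\mathbf{1}^\top \beta)^2 = \|\beta\|_1^2$. Off-diagonal entries of $Q$ then encode how pairs of kernels are penalized jointly.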