Industry
Minimax Probability Machine
Lanckriet, Gert, Ghaoui, Laurent E., Bhattacharyya, Chiranjib, Jordan, Michael I.
When constructing a classifier, the probability of correct classification of future data points should be maximized. In the current paper this desideratum is translated in a very direct way into an optimization problem, which is solved using methods from convex optimization. We also show how to exploit Mercer kernels in this setting to obtain nonlinear decision boundaries. A worst-case bound on the probability of misclassification of future data is obtained explicitly.
1 Introduction
Consider the problem of choosing a linear discriminant by minimizing the probabilities that data vectors fall on the wrong side of the boundary. One way to attempt to achieve this is via a generative approach in which one makes distributional assumptions about the class-conditional densities and thereby estimates and controls the relevant probabilities. The need to make distributional assumptions, however, casts doubt on the generality and validity of such an approach, and in discriminative solutions to classification problems it is common to attempt to dispense with class-conditional densities entirely.
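The optimization problem behind the minimax probability machine can be sketched numerically. This is a minimal sketch under stated assumptions, not the authors' implementation. Class means and covariances are replaced by plug-in sample estimates, and a small ridge term (our choice) keeps the covariances invertible. The convex problem, minimize sqrt(a^T Sx a) + sqrt(a^T Sy a) subject to a^T (mean_x - mean_y) = 1, is solved with a simple alternating least-squares scheme. The returned `alpha` is the worst-case probability of correct classification, kappa^2 / (1 + kappa^2), with kappa = 1 / (sqrt(a^T Sx a) + sqrt(a^T Sy a)). The function name `mpm_fit` and its parameters are hypothetical.

```python
import numpy as np


def mpm_fit(X, Y, n_iter=100, ridge=1e-6):
    """Fit a linear minimax probability machine a^T z = b (sketch).

    X, Y: arrays of shape (n_samples, dim) holding the two classes.
    Returns (a, b, alpha): z is assigned to class X when a @ z >= b, and
    alpha is the worst-case probability of correct classification.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    dim = X.shape[1]
    # Plug-in covariance estimates; the ridge term is an assumption for stability.
    Sx = np.cov(X, rowvar=False) + ridge * np.eye(dim)
    Sy = np.cov(Y, rowvar=False) + ridge * np.eye(dim)
    d = mx - my
    # Any feasible a can be written a = a0 + F u with a0^T d = 1 and F^T d = 0.
    a0 = d / (d @ d)
    _, _, Vt = np.linalg.svd(d[None, :])
    F = Vt[1:].T  # orthonormal basis of the null space of d^T
    beta = eta = 1.0
    a = a0
    for _ in range(n_iter):
        # For fixed beta, eta: minimize a^T Sx a / beta + a^T Sy a / eta over u.
        M = Sx / beta + Sy / eta
        u = np.linalg.solve(F.T @ M @ F, -F.T @ M @ a0)
        a = a0 + F @ u
        # For fixed a: the optimal beta, eta are the current projected std devs.
        beta = np.sqrt(a @ Sx @ a)
        eta = np.sqrt(a @ Sy @ a)
    kappa = 1.0 / (beta + eta)
    b = a @ mx - kappa * beta
    alpha = kappa**2 / (1.0 + kappa**2)
    return a, b, alpha
```

With the constraint a^T (mean_x - mean_y) = 1, this choice of `b` makes a^T mean_x - b = kappa * beta and b - a^T mean_y = kappa * eta. The threshold therefore sits between the projected means in proportion to each class's projected standard deviation.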
Online Learning with Kernels
Kivinen, Jyrki, Smola, Alex J., Williamson, Robert C.
We consider online learning in a Reproducing Kernel Hilbert Space. Our method is computationally efficient and leads to simple algorithms. In particular we derive update equations for classification, regression, and novelty detection. The inclusion of the ν-trick allows us to give a robust parameterization.
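The exact update equations are given in the paper. As a rough illustration, here is a minimal sketch of online kernel classification by stochastic gradient descent on a regularized hinge loss, assuming a Gaussian RBF kernel. Each step shrinks the existing expansion coefficients by (1 - eta * lam), which is the effect of the RKHS-norm regularizer. If the margin is violated, the step also adds the current point as a new expansion term with coefficient eta * y. The margin is held fixed here; the ν-trick, which adapts it, is omitted. The class name `OnlineKernelClassifier` and its parameters are our own.

```python
import numpy as np


class OnlineKernelClassifier:
    """Online kernel classifier trained by SGD on a regularized hinge loss (sketch).

    f(x) = sum_i coefs[i] * k(centers[i], x) with a Gaussian RBF kernel k.
    """

    def __init__(self, gamma=1.0, eta=0.5, lam=0.01, margin=1.0):
        self.gamma = gamma    # RBF width parameter (assumed kernel)
        self.eta = eta        # learning rate
        self.lam = lam        # regularization constant
        self.margin = margin  # fixed margin; the nu-trick would adapt it
        self.centers, self.coefs = [], []

    def kernel(self, x, z):
        return float(np.exp(-self.gamma * np.sum((x - z) ** 2)))

    def decision(self, x):
        return sum(c * self.kernel(s, x) for s, c in zip(self.centers, self.coefs))

    def update(self, x, y):
        """Process one example (x, y) with y in {-1, +1}; returns f(x) before the update."""
        x = np.asarray(x, dtype=float)
        f = self.decision(x)
        # Gradient of (lam / 2) * ||f||^2 shrinks every existing coefficient.
        self.coefs = [(1.0 - self.eta * self.lam) * c for c in self.coefs]
        # Hinge-loss gradient: add x as a new expansion term if the margin is violated.
        if y * f < self.margin:
            self.centers.append(x)
            self.coefs.append(self.eta * y)
        return f
```

Because old coefficients decay geometrically, a practical version could drop terms once they become negligible. That keeps the cost per step bounded, which is one way such online methods stay computationally cheap.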
Discriminative Direction for Kernel Classifiers
In many scientific and engineering applications, detecting and understanding differences between two groups of examples can be reduced to a classical problem of training a classifier for labeling new examples while making as few mistakes as possible. In the traditional classification setting, the resulting classifier is rarely analyzed in terms of the properties of the input data captured by the discriminative model. However, such analysis is crucial if we want to understand and visualize the detected differences. We propose an approach to interpretation of the statistical model in the original feature space that allows us to argue about the model in terms of the relevant changes to the input vectors. For each point in the input space, we define a discriminative direction to be the direction that moves the point towards the other class while introducing as little irrelevant change as possible with respect to the classifier function. We derive the discriminative direction for kernel-based classifiers, demonstrate the technique on several examples and briefly discuss its use in statistical shape analysis, an application that originally motivated this work.
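As a simplified illustration, the sketch below takes a Gaussian RBF kernel classifier f(x) = sum_i c_i k(x_i, x) + b. It computes the input-space gradient of f and normalizes it, pointing away from the class x currently belongs to. We assume that for this isotropic kernel the discriminative direction reduces to the gradient direction; for other kernels, consult the paper's derivation. The centers, coefficients and function names are hypothetical, and no classifier is trained here.

```python
import numpy as np


def rbf_decision(x, centers, coefs, b, gamma):
    """f(x) = sum_i coefs[i] * exp(-gamma * ||x - centers[i]||^2) + b."""
    d2 = np.sum((centers - x) ** 2, axis=1)
    return coefs @ np.exp(-gamma * d2) + b


def rbf_gradient(x, centers, coefs, gamma):
    """Analytic gradient of f with respect to the input x."""
    diff = x - centers                          # shape (n_centers, dim)
    k = np.exp(-gamma * np.sum(diff ** 2, axis=1))
    return -2.0 * gamma * (coefs * k) @ diff


def discriminative_direction(x, centers, coefs, b, gamma):
    """Unit vector moving x toward the other class (gradient assumption)."""
    g = rbf_gradient(x, centers, coefs, gamma)
    sign = 1.0 if rbf_decision(x, centers, coefs, b, gamma) > 0 else -1.0
    return -sign * g / np.linalg.norm(g)
```

The gradient is the direction of steepest change in f, so a small step along this vector changes the classifier output fastest per unit of input change.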
Product Analysis: Learning to Model Observations as Products of Hidden Variables
Frey, Brendan J., Kannan, Anitha, Jojic, Nebojsa
Factor analysis and principal components analysis can be used to model linear relationships between observed variables and linearly map high-dimensional data to a lower-dimensional hidden space. In factor analysis, the observations are modeled as a linear combination of normally distributed hidden variables. We describe a nonlinear generalization of factor analysis, called "product analysis", that models the observed variables as a linear combination of products of normally distributed hidden variables. Just as factor analysis can be viewed as unsupervised linear regression on unobserved, normally distributed hidden variables, product analysis can be viewed as unsupervised linear regression on products of unobserved, normally distributed hidden variables. The mapping between the data and the hidden space is nonlinear, so we use an approximate variational technique for inference and learning.
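To make the generative model concrete, the sketch below samples from a product-analysis-style model. It assumes, as our own simplification, that each product term multiplies exactly two of the normally distributed hidden variables and that observation noise is isotropic Gaussian. The paper's variational inference and learning are not shown. `Lambda`, `mu`, `pairs` and the function name are hypothetical.

```python
import numpy as np


def sample_product_analysis(Lambda, mu, pairs, noise_std, n_samples, rng):
    """Draw x = mu + Lambda @ s + noise with s_j = z[a_j] * z[b_j], z ~ N(0, I).

    Lambda: (n_obs, n_products) loading matrix; pairs: list of (a, b) index pairs.
    """
    n_hidden = 1 + max(max(p) for p in pairs)
    z = rng.standard_normal((n_samples, n_hidden))
    # Each product feature multiplies two Gaussian hidden variables.
    s = np.stack([z[:, a] * z[:, b] for a, b in pairs], axis=1)
    noise = noise_std * rng.standard_normal((n_samples, Lambda.shape[0]))
    x = mu + s @ Lambda.T + noise
    return x, z, s
```

Given s, the model is linear regression on the product features, as in the analogy above. However, s is a nonlinear function of z, so the posterior over z is no longer Gaussian.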
Spectral Kernel Methods for Clustering
Cristianini, Nello, Shawe-Taylor, John, Kandola, Jaz S.
In this paper we introduce new algorithms for unsupervised learning based on the use of a kernel matrix. All the information required by such algorithms is contained in the eigenvectors of the matrix or of closely related matrices. We use two different but related cost functions, the Alignment and the 'cut cost'. The first one is discussed in a companion paper [3]; the second one is based on graph-theoretic concepts. Both functions measure the level of clustering of a labeled dataset, or the correlation between data clusters and labels.
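Both ingredients can be shown on a small example. The sketch below does two things. It computes the empirical alignment <K1, K2>_F / sqrt(<K1, K1>_F <K2, K2>_F) between two kernel matrices. It also splits the data in two using the standard spectral relaxation of a cut-type cost: the sign pattern of the eigenvector of the graph Laplacian L = D - K with the second-smallest eigenvalue. Treat this as an illustration of the general idea, not a reproduction of the paper's algorithms. The RBF kernel, its width and the function names are assumptions.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """Gaussian kernel matrix K[i, j] = exp(-gamma * ||X[i] - X[j]||^2)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq)


def alignment(K1, K2):
    """Empirical alignment <K1, K2>_F / sqrt(<K1, K1>_F <K2, K2>_F)."""
    return np.sum(K1 * K2) / np.sqrt(np.sum(K1 * K1) * np.sum(K2 * K2))


def spectral_bipartition(K):
    """Split points by the sign of the Laplacian eigenvector with 2nd-smallest eigenvalue."""
    L = np.diag(K.sum(axis=1)) - K
    _, V = np.linalg.eigh(L)  # eigenvalues returned in ascending order
    fiedler = V[:, 1]
    return np.where(fiedler >= 0, 1, -1)
```

When two groups are far apart relative to the kernel width, the Laplacian is nearly block-diagonal. Its second eigenvector is then close to a two-valued indicator, so thresholding at zero recovers the groups. The alignment between K and the outer product of the resulting labels measures how well the kernel agrees with that split.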
Efficiency versus Convergence of Boolean Kernels for On-Line Learning Algorithms
Khardon, Roni, Roth, Dan, Servedio, Rocco A.
We study online learning in Boolean domains using kernels which capture feature expansions equivalent to using conjunctions over basic features. We demonstrate a tradeoff between the computational efficiency with which these kernels can be computed and the generalization ability of the resulting classifier. We first describe several kernel functions which capture either limited forms of conjunctions or all conjunctions. We show that these kernels can be used to efficiently run the Perceptron algorithm over an exponential number of conjunctions; however, we also prove that using such kernels the Perceptron algorithm can make an exponential number of mistakes even when learning simple functions. We also consider an analogous use of kernel functions to run the multiplicative-update Winnow algorithm over an expanded feature space of exponentially many conjunctions. While known upper bounds imply that Winnow can learn DNF formulae with a polynomial mistake bound in this setting, we prove that it is computationally hard to simulate Winnow's behavior for learning DNF over such a feature set, and thus that such kernel functions for Winnow are not efficiently computable.
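Conjunction kernels of this kind have simple closed forms, which can be checked by brute-force enumeration. For x, y in {0,1}^n, the number of monotone conjunctions (including the empty one) satisfied by both x and y is 2 raised to the number of positions where x_i = y_i = 1. The number of conjunctions over all literals satisfied by both is 2 raised to the number of positions where x_i = y_i. Each evaluation therefore costs O(n) even though the feature space is exponential. The sketch below runs a dual (kernel) Perceptron with the monotone kernel. It is a minimal illustration with hypothetical function names, and it does not reproduce the exponential mistake lower bound proved in the paper.

```python
import numpy as np


def monotone_conj_kernel(x, y):
    """Number of monotone conjunctions satisfied by both x and y: 2^(#shared 1s)."""
    return 2.0 ** int(np.sum(np.logical_and(x, y)))


def all_conj_kernel(x, y):
    """Number of conjunctions over literals satisfied by both x and y: 2^(#agreements)."""
    return 2.0 ** int(np.sum(np.asarray(x) == np.asarray(y)))


def kernel_perceptron(X, labels, kernel, max_epochs=100):
    """Dual Perceptron: alpha[i] counts the mistakes made on example i."""
    n = len(X)
    G = np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])
    alpha = np.zeros(n)
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(n):
            # Score of example i under the implicit weight vector sum_j alpha_j y_j phi(x_j).
            score = (alpha * labels) @ G[:, i]
            if labels[i] * score <= 0:
                alpha[i] += 1.0
                mistakes += 1
        if mistakes == 0:
            break
    return alpha, G
```

The empty conjunction is always satisfied, so it acts as a built-in bias feature. The Perceptron's mistake bound scales with max K(x, x), which can be as large as 2^n here.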
Generating velocity tuning by asymmetric recurrent connections
Xie, Xiaohui, Giese, Martin A.
Asymmetric lateral connections are one possible mechanism that can account for the direction selectivity of cortical neurons. We present a mathematical analysis for a class of these models. In contrast to earlier theoretical work that has relied on methods from linear systems theory, we study the network's nonlinear dynamic properties that arise when the threshold nonlinearity of the neurons is taken into account. We show that such networks have stimulus-locked traveling pulse solutions that are appropriate for modeling the responses of direction-selective cortical neurons. In addition, our analysis shows that outside a certain regime of stimulus speeds the stability of these solutions breaks down, giving rise to another class of solutions that are characterized by specific spatiotemporal periodicity. This predicts that, if direction selectivity in the cortex is mainly achieved by asymmetric lateral connections, lurching activity waves might be observable in ensembles of direction-selective cortical neurons within appropriate regimes of the stimulus speed.
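Networks of this class can be explored by direct simulation. The sketch below applies Euler's method to a threshold-linear rate network, tau du/dt = -u + W [u]_+ + h(t), on a ring of neurons. The lateral weights have a Gaussian profile shifted by `shift` positions, so W is asymmetric, and the network is driven by an input bump moving at a chosen speed. The threshold-linear nonlinearity, the Gaussian weight profile, all parameter values and all function names are hypothetical choices. The paper's analysis of traveling-pulse and lurching solutions is not reproduced, and which regime a given parameter set falls into has to be checked by running it.

```python
import numpy as np


def ring_distance(a, b, n):
    """Signed distance a - b on a ring of n units, in [-n/2, n/2)."""
    return (a - b + n / 2) % n - n / 2


def asymmetric_weights(n, strength, width, shift):
    """Hypothetical lateral weights W[i, j]: Gaussian in (i - j - shift) on the ring."""
    idx = np.arange(n)
    d = ring_distance(idx[:, None] - shift, idx[None, :], n)
    return strength * np.exp(-d**2 / (2 * width**2))


def moving_stimulus(n, n_steps, speed, width=2.0, amplitude=1.0):
    """Gaussian input bump whose centre advances by `speed` units per time step."""
    centres = (speed * np.arange(n_steps)) % n
    d = ring_distance(np.arange(n)[None, :], centres[:, None], n)
    return amplitude * np.exp(-d**2 / (2 * width**2))


def simulate(W, inputs, dt=0.1, tau=1.0):
    """Euler steps of tau du/dt = -u + W @ [u]_+ + h(t); returns rates [u]_+."""
    n_steps, n = inputs.shape
    u = np.zeros(n)
    rates = np.empty((n_steps, n))
    for t in range(n_steps):
        u = u + (dt / tau) * (-u + W @ np.maximum(u, 0.0) + inputs[t])
        rates[t] = np.maximum(u, 0.0)
    return rates
```

Comparing the peak of `rates` with the stimulus centre across several values of `speed` is one way to look for the stimulus-locked versus periodic regimes described above.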
Effective Size of Receptive Fields of Inferior Temporal Visual Cortex Neurons in Natural Scenes
Trappenberg, Thomas P., Rolls, Edmund T., Stringer, Simon M.
Inferior temporal cortex (IT) neurons have large receptive fields when a single effective object stimulus is shown against a blank background, but have much smaller receptive fields when the object is placed in a natural scene. Thus, translation invariant object recognition is reduced in natural scenes, and this may help object selection. We describe a model which accounts for this by competition within an attractor in which the neurons are tuned to different objects in the scene, and the fovea has a higher cortical magnification factor than the peripheral visual field. Furthermore, we show that top-down object bias can increase the receptive field size, facilitating object search in complex visual scenes, and providing a model of object-based attention. The model leads to the prediction that introduction of a second object into a scene with blank background will reduce the receptive field size to values that depend on the closeness of the second object to the target stimulus. We suggest that mechanisms of this type enable the output of IT to be primarily about one object, so that the areas that receive input from IT can select the object as a potential target for action.
Characterizing Neural Gain Control using Spike-triggered Covariance
Schwartz, Odelia, Chichilnisky, E.J., Simoncelli, Eero P.
Spike-triggered averaging techniques are effective for linear characterization of neural responses. But neurons exhibit important nonlinear behaviors, such as gain control, that are not captured by such analyses. We describe a spike-triggered covariance method for retrieving suppressive components of the gain control signal in a neuron. We demonstrate the method in simulation and on retinal ganglion cell data. Analysis of physiological data reveals significant suppressive axes and explains neural nonlinearities. This method should be applicable to other sensory areas and modalities.
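The spike-triggered covariance computation can be illustrated on simulated data. In the sketch below, a model neuron (our hypothetical choice) responds to Gaussian white noise s with rate (e·s)^2 / (1 + 4 (k·s)^2), an excitatory filter e divided by a suppressive filter k, and emits Poisson spikes. We compute the spike-weighted mean (STA) and covariance (STC) of the stimuli, then diagonalize the STC minus the prior (identity) covariance. Eigenvectors with eigenvalues below zero are candidate suppressive axes, and those above zero are candidate excitatory axes. The filters, the divisive form and all parameters are assumptions, and the significance testing needed for real data is not shown.

```python
import numpy as np


def spike_triggered_stats(stimuli, spikes):
    """Spike-count-weighted mean (STA) and covariance (STC) of the stimuli."""
    n_spikes = spikes.sum()
    sta = spikes @ stimuli / n_spikes
    centered = stimuli - sta
    stc = (centered * spikes[:, None]).T @ centered / (n_spikes - 1)
    return sta, stc


def simulate_gain_control_neuron(n_samples=200_000, dim=6, seed=0):
    """Hypothetical neuron: excitatory filter e, divisive suppressive filter k."""
    rng = np.random.default_rng(seed)
    stimuli = rng.standard_normal((n_samples, dim))  # white noise, prior covariance = I
    e, k = np.eye(dim)[0], np.eye(dim)[1]
    rate = (stimuli @ e) ** 2 / (1.0 + 4.0 * (stimuli @ k) ** 2)
    spikes = rng.poisson(rate)
    return stimuli, spikes, e, k


stimuli, spikes, e, k = simulate_gain_control_neuron()
sta, stc = spike_triggered_stats(stimuli, spikes)
evals, evecs = np.linalg.eigh(stc - np.eye(stimuli.shape[1]))
suppressive_axis = evecs[:, 0]   # most negative eigenvalue
excitatory_axis = evecs[:, -1]   # most positive eigenvalue
```

Because this rate is symmetric in s, the STA is close to zero and says nothing about either filter. The covariance analysis still recovers both axes, which is the kind of nonlinear behavior that averaging alone misses.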