Goto

Collaborating Authors

 Asia


Active learning for misspecified generalized linear models

Neural Information Processing Systems

Active learning refers to algorithmic frameworks aimed at selecting training data points in order to reduce the number of required training data points and/or improve the generalization performance of a learning method. In this paper, we present an asymptotic analysis of active learning for generalized linear models. Our analysis holds under the common practical situation of model misspecification, and is based on realistic assumptions regarding the nature of the sampling distributions, which are usually neither independent nor identical. We derive unbiased estimators of generalization performance, as well as estimators of expected reduction in generalization error after adding a new training data point, that allow us to optimize its sampling distribution through a convex optimization problem. Our analysis naturally leads to an algorithm for sequential active learning which is applicable for all tasks supported by generalized linear models (e.g., binary classification, multi-class classification, regression) and can be applied in nonlinear settings through the use of Mercer kernels.


Towards a general independent subspace analysis

Neural Information Processing Systems

The increasingly popular independent component analysis (ICA) may only be applied to data following the generative ICA model in order to guarantee algorithmindependent and theoretically valid results. Subspace ICA models generalize the assumption of component independence to independence between groups of components. They are attractive candidates for dimensionality reduction methods, however are currently limited by the assumption of equal group sizes or less general semi-parametric models. By introducing the concept of irreducible independent subspaces or components, we present a generalization to a parameter-free mixture model. Moreover, we relieve the condition of at-most-one-Gaussian by including previous results on non-Gaussian component analysis. After introducing this general model, we discuss joint block diagonalization with unknown block sizes, on which we base a simple extension of JADE to algorithmically perform the subspace analysis. Simulations confirm the feasibility of the algorithm.


A Probabilistic Algorithm Integrating Source Localization and Noise Suppression of MEG and EEG data

Neural Information Processing Systems

We have developed a novel algorithm for integrating source localization and noise suppression based on a probabilistic graphical model of stimulus-evoked MEG/EEG data. Our algorithm localizes multiple dipoles while suppressing noise sources with the computational complexity equivalent to a single dipole scan, and is therefore more efficient than traditional multidipole fitting procedures. In simulation, the algorithm can accurately localize and estimate the time course of several simultaneously-active dipoles, with rotating or fixed orientation, at noise levels typical for averaged MEG data. Furthermore, the algorithm is superior to beamforming techniques, which we show to be an approximation to our graphical model, in estimation of temporally correlated sources. Success of this algorithm for localizing auditory cortex in a tumor patient and for localizing an epileptic spike source are also demonstrated.


Multi-Instance Multi-Label Learning with Application to Scene Classification

Neural Information Processing Systems

In this paper, we formalize multi-instance multi-label learning, where each training example is associated with not only multiple instances but also multiple class labels. Such a problem can occur in many real-world tasks, e.g. an image usually contains multiple patches each of which can be described by a feature vector, and the image can belong to multiple categories since its semantics can be recognized in different ways. We analyze the relationship between multi-instance multi-label learning and the learning frameworks of traditional supervised learning, multiinstance learning and multi-label learning.


Learning with Hypergraphs: Clustering, Classification, and Embedding

Neural Information Processing Systems

We usually endow the investigated objects with pairwise relationships, which can be illustrated as graphs. In many real-world problems, however, relationships among the objects of our interest are more complex than pairwise. Naively squeezing the complex relationships into pairwise ones will inevitably lead to loss of information which can be expected valuable for our learning tasks however. Therefore we consider using hypergraphs instead to completely represent complex relationships among the objects of our interest, and thus the problem of learning with hypergraphs arises. Our main contribution in this paper is to generalize the powerful methodology of spectral clustering which originally operates on undirected graphs to hypergraphs, and further develop algorithms for hypergraph embedding and transductive classification on the basis of the spectral hypergraph clustering approach. Our experiments on a number of benchmarks showed the advantages of hypergraphs over usual graphs.


MLLE: Modified Locally Linear Embedding Using Multiple Weights

Neural Information Processing Systems

The locally linear embedding (LLE) is improved by introducing multiple linearly independent local weight vectors for each neighborhood. We characterize the reconstruction weights and show the existence of the linearly independent weight vectors at each neighborhood. The modified locally linear embedding (MLLE) proposed in this paper is much stable. It can retrieve the ideal embedding if MLLE is applied on data points sampled from an isometric manifold. MLLE is also compared with the local tangent space alignment (LTSA). Numerical examples are given that show the improvement and efficiency of MLLE.


Hyperparameter Learning for Graph Based Semi-supervised Learning Algorithms

Neural Information Processing Systems

Semi-supervised learning algorithms have been successfully applied in many applications with scarce labeled data, by utilizing the unlabeled data. One important category is graph based semi-supervised learning algorithms, for which the performance depends considerably on the quality of the graph, or its hyperparameters. In this paper, we deal with the less explored problem of learning the graphs. We propose a graph learning method for the harmonic energy minimization method; this is done by minimizing the leave-one-out prediction error on labeled data points. We use a gradient based method and designed an efficient algorithm which significantly accelerates the calculation of the gradient by applying the matrix inversion lemma and using careful pre-computation. Experimental results show that the graph learning method is effective in improving the performance of the classification algorithm.


Simplifying Mixture Models through Function Approximation

Neural Information Processing Systems

Finite mixture model is a powerful tool in many statistical learning problems. In this paper, we propose a general, structure-preserving approach to reduce its model complexity, which can bring significant computational benefits in many applications. The basic idea is to group the original mixture components into compact clusters, and then minimize an upper bound on the approximation error between the original and simplified models.



Nonnegative Sparse PCA

Neural Information Processing Systems

We describe a nonnegative variant of the "Sparse PCA" problem. The goal is to create a low dimensional representation from a collection of points which on the one hand maximizes the variance of the projected points and on the other uses only parts of the original coordinates, and thereby creating a sparse representation. What distinguishes our problem from other Sparse PCA formulations is that the projection involves only nonnegative weights of the original coordinates -- a desired quality in various fields, including economics, bioinformatics and computer vision. Adding nonnegativity contributes to sparseness, where it enforces a partitioning of the original coordinates among the new axes. We describe a simple yet efficient iterative coordinate-descent type of scheme which converges to a local optimum of our optimization criteria, giving good results on large real world datasets.