Goto

Collaborating Authors

 Asia


Efficient Subspace Segmentation via Quadratic Programming

AAAI Conferences

We explore in this paper efficient algorithmic solutions to robustsubspace segmentation. We propose the SSQP, namely SubspaceSegmentation via Quadratic Programming, to partition data drawnfrom multiple subspaces into multiple clusters. The basic idea ofSSQP is to express each datum as the linear combination of otherdata regularized by an overall term targeting zero reconstructioncoefficients over vectors from different subspaces. The derivedcoefficient matrix by solving a quadratic programming problem istaken as an affinity matrix, upon which spectral clustering isapplied to obtain the ultimate segmentation result. Similar tosparse subspace clustering (SCC) and low-rank representation (LRR),SSQP is robust to data noises as validated by experiments on toydata. Experiments on Hopkins 155 database show that SSQP can achievecompetitive accuracy as SCC and LRR in segmenting affine subspaces,while experimental results on the Extended Yale Face Database Bdemonstrate SSQP's superiority over SCC and LRR. Beyond segmentationaccuracy, all experiments show that SSQP is much faster than bothSSC and LRR in the practice of subspace segmentation.


Transfer Learning by Structural Analogy

AAAI Conferences

Transfer learning allows knowledge to be extracted from auxiliary domains and be used to enhance learning in a target domain. For transfer learning to be successful, it is critical to find the similarity between auxiliary and target domains, even when such mappings are not obvious. In this paper, we present a novel algorithm for finding the structural similarity between two domains, to enable transfer learning at a structured knowledge level. In particular, we address the problem of how to learn a non-trivial structural similarity mapping between two different domains when they are completely different on the representation level. This problem is challenging because we cannot directly compare features across domains. Our algorithm extracts the structural features within each domain and then maps the features into the Reproducing Kernel Hilbert Space (RKHS), such that the "structural dependencies" of features across domains can be estimated by kernel matrices of the features within each domain. By treating the analogues from both domains as equivalent, we can transfer knowledge to achieve a better understanding of the domains and improved performance for learning. We validate our approach on synthetic and real-world datasets.


Fast Newton-CG Method for Batch Learning of Conditional Random Fields

AAAI Conferences

We propose a fast batch learning method for linear-chain Conditional Random Fields (CRFs) based on Newton-CG methods. Newton-CG methods are a variant of Newton method for high-dimensional problems. They only require the Hessian-vector products instead of the full Hessian matrices. To speed up Newton-CG methods for the CRF learning, we derive a novel dynamic programming procedure for the Hessian-vector products of the CRF objective function. The proposed procedure can reuse the byproducts of the time-consuming gradient computation for the Hessian-vector products to drastically reduce the total computation time of the Newton-CG methods. In experiments with tasks in natural language processing, the proposed method outperforms a conventional quasi-Newton method. Remarkably, the proposed method is competitive with online learning algorithms that are fast but unstable.


Towards Maximizing the Area Under the ROC Curve for Multi-Class Classification Problems

AAAI Conferences

The Area Under the ROC Curve (AUC) metric has achieved a big success in binary classification problems since they measure the performance of classifiers without making any specific assumptions about the class distribution and misclassification costs. This is desirable because the class distribution and misclassification costs may be unknown during training process or even change in environment. MAUC, the extension of AUC to multi-class problems, has also attracted a lot of attention. However, despite the emergence of approaches for training classifiers with large AUC, little has been done for MAUC. This paper analyzes MAUC in-depth, and reveals that the maximization of MAUC can be achieved by decomposing the multi-class problem into a number of independent sub-problems. These sub-problems are formulated in the form of a โ€œlearning to rankโ€ problem, for which well-established methods already exist. Based on the analysis, a method that employs RankBoost algorithm as the sub-problem solver is proposed to achieve classification systems with maximum MAUC. Empirical studies have shown the advantages of the proposed method over other eight relevant methods. Due to the importance of MAUC to multi-class cost-sensitive learning and class imbalanced learning problems, the proposed method is a general technique for both problems. It can also be generalized to accommodate other learning algorithms as the sub-problem solvers.


Collaborative Usersโ€™ Brand Preference Mining across Multiple Domains from Implicit Feedbacks

AAAI Conferences

Advanced e-applications require comprehensive knowledge about their usersโ€™ preferences in order to provide accurate personalized services. In this paper, we propose to learn usersโ€™ preferences to product brands from their implicit feedbacks such as their searching and browsing behaviors in user Web browsing log data. The user brand preference learning problem is challenge since (1) the usersโ€™ implicit feedbacks are extremely sparse in various product domains; and (2) we can only observe positive feedbacks from usersโ€™ behaviors. In this paper, we propose a latent factor model to collaboratively mine usersโ€™ brand preferences across multiple domains simultaneously. By collective learning, the learning processes in all the domains are mutually enhanced and hence the problem of data scarcity in each single domain can be effectively addressed. On the other hand, we learn our model with an adaption of the Bayesian personalized ranking (BPR) optimization criterion which is a general learning framework for collaborative filtering from implicit feedbacks. Experiments with both synthetic and real world datasets show that our proposed model significantly outperforms the baselines.


Efficiently Learning a Distance Metric for Large Margin Nearest Neighbor Classification

AAAI Conferences

We concern the problem of learning a Mahalanobis distance metric for improving nearest neighbor classification. Our work is built upon the large margin nearest neighbor (LMNN) classification framework. Due to the semidefiniteness constraint in the optimization problem of LMNN, it is not scalable in terms of the dimensionality of the input data. The original LMNN solver partially alleviates this problem by adopting alternating projection methods instead of standard interior-point methods. Still, at each iteration, the computation complexity is at least O(D 3 ) (D is the dimension of input data). In this work, we propose a column generation based algorithm to solve the LMNN optimization problem much more efficiently. Our algorithm is much more scalable in tha tat each iteration, it does not need full eigen-decomposition. Instead, we only need to find the leading eigen value and its corresponding eigen vector, which is of O(D 2 ) complexity. Experiments show the efficiency and efficacy of our algorithms.


Sparse Group Restricted Boltzmann Machines

AAAI Conferences

Since learning in Boltzmann machines is typically quite slow, there is a need to restrict connections within hidden layers. However, theresulting states of hidden units exhibit statistical dependencies. Based on this observation, we propose using l1/l2 regularization upon the activation probabilities of hidden units in restricted Boltzmann machines to capture the local dependencies among hidden units. This regularization not only encourages hidden units of many groups to be inactive given observed data but also makes hidden units within a group compete with each other for modeling observed data. Thus, the l1/l2 regularization on RBMs yields sparsity at both the group and the hidden unit levels. We call RBMs trained with the regularizer sparse group RBMs (SGRBMs). The proposed SGRBMs are appliedto model patches of natural images, handwritten digits and OCR English letters. Then to emphasize that SGRBMs can learn more discriminative features we applied SGRBMs to pretrain deep networks for classification tasks. Furthermore, we illustrate the regularizer can also be applied to deep Boltzmann machines, which lead to sparse group deep Boltzmann machines. When adapted to the MNIST data set, a two-layer sparse group Boltzmann machine achieves an error rate of 0.84%, which is, to our knowledge, the best published result on the permutation-invariant version of the MNIST task.


Latent Semantic Learning by Efficient Sparse Coding with Hypergraph Regularization

AAAI Conferences

This paper presents a novel latent semantic learning algorithm for action recognition. Through efficient sparse coding, we can learn latent semantics (i.e. high-level features) from a large vocabulary of abundant mid-level features (i.e. visual keywords). More importantly, we can capture the manifold structure hidden among mid-level features by incorporating hypergraph regularization into sparse coding. The learnt latent semantics can further be readily used for action recognition by defining a histogram intersection kernel. Different from the traditional latent semantic analysis based on topic models, our sparse coding method with hypergraph regularization can exploit the manifold structure hidden among mid-level features for latent semantic learning, which results in compact but discriminative high-level features for action recognition. We have tested our method on the commonly used KTH action dataset and the unconstrained YouTube action dataset. The experimental results show the superior performance of our method.


Mean Field Inference in Dependency Networks: An Empirical Study

AAAI Conferences

Dependency networks are a compelling alternative to Bayesian networks for learning joint probability distributions from data and using them to compute probabilities. A dependency network consists of a set of conditional probability distributions, each representing the probability of a single variable given its Markov blanket. Running Gibbs sampling with these conditional distributions produces a joint distribution that can be used to answer queries, but suffers from the traditional slowness of sampling-based inference. In this paper, we observe that the mean field update equation can be applied to dependency networks, even though the conditional probability distributions may be inconsistent with each other. In experiments with learning and inference on 12 datasets, we demonstrate that mean field inference in dependency networks offers similar accuracy to Gibbs sampling but with orders of magnitude improvements in speed. Compared to Bayesian networks learned on the same data, dependency networks offer higher accuracy at greater amounts of evidence. Furthermore, mean field inference is consistently more accurate in dependency networks than in Bayesian networks learned on the same data.


Ordinal Regression via Manifold Learning

AAAI Conferences

Ordinal regression is an important research topic in machine learning. It aims to automatically determine the implied rating of a data item on a fixed, discrete rating scale. In this paper, we present a novel ordinal regression approach via manifold learning, which is capable of uncovering the embedded nonlinear structure of the data set according to the observations in the highdimensional feature space. By optimizing the order information of the observations and preserving the intrinsic geometry of the data set simultaneously, the proposed algorithm provides the faithful ordinal regression to the new coming data points. To offer more general solution to the data with natural tensor structure, we further introduce the multilinear extension of the proposed algorithm, which can support the ordinal regression of high order data like images. Experiments on various data sets validate the effectiveness of the proposed algorithm as well as its extension.