Unsupervised Feature Learning through Divergent Discriminative Feature Accumulation

AAAI Conferences

The increasing realization in recent years that artificial neural networks (ANNs) can learn many layers of features (Bengio et al. 2007; Hinton, Osindero, and Teh 2006; Marc'Aurelio, Boureau, and LeCun 2007; Cireşan et al. 2010) has reinvigorated the study of representation learning in ANNs (Bengio, Courville, and Vincent 2013). While the beginning of this renaissance focused on the sequential unsupervised training of individual layers one upon another (Bengio et al. 2007; Hinton, Osindero, and Teh 2006), the [...]

In particular, there is an alternative kind of discriminative learning that is unsupervised rather than supervised. In this proposed alternative approach, called divergent discriminative feature accumulation (DDFA), instead of searching for features constrained by the objective of solving the discriminative classification problem, a learning algorithm can instead attempt to collect as many features that discriminate strongly among training examples as possible, without regard to any particular classification problem.


Spectral Label Refinement for Noisy and Missing Text Labels

AAAI Conferences

With the recent growth of online content on the Web, there has been more user-generated data with noisy and missing labels, e.g., social tags and voted labels from Amazon Mechanical Turk. Most machine learning methods, which require accurate label sets, cannot be trusted when the label sets are unreliable. In this paper, we provide a text label refinement algorithm to adjust the labels for such datasets with noisy and missing labels. We assume that the label sets can be refined based on the labels held with a certain confidence and on the similarity between data being consistent with the labels. We propose a label smoothness ratio criterion to measure the smoothness of the labels and the consistency between labels and data. We demonstrate the effectiveness of the label refining algorithm on eight labeled document datasets, and validate that the results are useful for generating better labels.


SP-SVM: Large Margin Classifier for Data on Multiple Manifolds

AAAI Conferences

As one of the most important state-of-the-art classification techniques, the Support Vector Machine (SVM) has been widely adopted in many real-world applications, such as object detection, face recognition, and text categorization, due to its competitive practical performance and elegant theoretical interpretation. However, it treats all samples independently and ignores the fact that, in many real situations, especially when data are in a high dimensional space, samples typically lie on low dimensional manifolds of the feature space, and thus a sample can be related to its neighbors by being represented as a linear combination of other samples on the same manifold. This linear representation, which is usually sparse, reflects the structure of the underlying manifolds. It has been extensively explored in the recent literature and proven to be critical for classification performance. To benefit from both the underlying low dimensional manifold structure and the large margin classifier, this paper proposes a novel method called Sparsity Preserving Support Vector Machine (SP-SVM), which explicitly considers the sparse representation of samples while maximizing the margin between different classes. Consequently, SP-SVM inherits both the discriminative power of the support vector machine and the merits of sparsity. A set of experiments on real-world benchmark data sets shows that SP-SVM achieves significantly higher precision on recognition tasks than various competitive baselines, including the traditional SVM, the sparse representation based method, and the classical nearest neighbor classifier.


Doubly Robust Covariate Shift Correction

AAAI Conferences

Covariate shift correction allows one to perform supervised learning even when the distribution of the covariates on the training set does not match that on the test set. This is achieved by re-weighting observations. Such a strategy removes bias, potentially at the expense of greatly increased variance. We propose a simple strategy for removing bias while retaining small variance. It uses a biased, low variance estimate as a prior and corrects the final estimate relative to the prior. We prove that this yields an efficient estimator and demonstrate good experimental performance.
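The re-weighting step described above can be illustrated with plain importance weighting, where each training observation is weighted by the ratio of test to training covariate density. This is only a minimal sketch of the standard re-weighting baseline, not the doubly robust estimator the paper proposes; the discrete densities, the sample, and the function names are all hypothetical.

```python
def importance_weights(xs, p_train, p_test):
    """Weight each training covariate x by p_test(x) / p_train(x)."""
    return [p_test[x] / p_train[x] for x in xs]


def weighted_mean(values, weights):
    """Self-normalized weighted average of the values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical discrete covariate with known train/test distributions.
p_train = {"a": 0.8, "b": 0.2}
p_test = {"a": 0.5, "b": 0.5}

# Hypothetical training sample: covariates and per-example losses.
xs = ["a", "a", "a", "a", "b"]
losses = [1.0, 1.0, 1.0, 1.0, 3.0]

weights = importance_weights(xs, p_train, p_test)
unweighted = sum(losses) / len(losses)      # reflects the training distribution
reweighted = weighted_mean(losses, weights)  # reflects the test distribution
```

In this toy sample the rare covariate value "b" receives a large weight, which is exactly the situation in which the variance of the re-weighted estimate can grow.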


Leveraging Features and Networks for Probabilistic Tensor Decomposition

AAAI Conferences

We present a probabilistic model for tensor decomposition where one or more tensor modes may have side-information about the mode entities in the form of their features and/or their adjacency network. We consider a Bayesian approach based on the Canonical PARAFAC (CP) decomposition and enrich this single-layer decomposition approach with a two-layer decomposition. The second layer fits a factor model for each layer-one factor matrix and models the factor matrix via the mode entities' features and/or the network between the mode entities. The second-layer decomposition of each factor matrix also learns a binary latent representation for the entities of that mode, which can be useful in its own right. Our model can handle both continuous and binary tensor observations. Another appealing aspect of our model is the simplicity of the model inference, with easy-to-sample Gibbs updates. We demonstrate the results of our model on several benchmark datasets, consisting of both real and binary tensors.
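For readers unfamiliar with the CP decomposition that the first layer builds on, the sketch below reconstructs a three-mode tensor from one factor matrix per mode, using the usual sum over rank-one components. It does not cover the Bayesian treatment, the second layer, or the side-information; the factor values and the name `cp_reconstruct` are made up for illustration.

```python
def cp_reconstruct(A, B, C):
    """Rebuild X[i][j][k] = sum_r A[i][r] * B[j][r] * C[k][r]."""
    rank = len(A[0])
    return [
        [
            [sum(A[i][r] * B[j][r] * C[k][r] for r in range(rank))
             for k in range(len(C))]
            for j in range(len(B))
        ]
        for i in range(len(A))
    ]


# Hypothetical rank-2 factor matrices, one per tensor mode.
A = [[1, 0], [0, 1]]
B = [[1, 2], [3, 4]]
C = [[1, 1], [2, 0]]

X = cp_reconstruct(A, B, C)  # a 2 x 2 x 2 tensor
```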


Pareto Ensemble Pruning

AAAI Conferences

Ensemble learning, which trains and combines many base learners, is among the state-of-the-art learning techniques. Ensemble pruning removes some of the base learners of an ensemble, and has been shown to be able to further improve the generalization performance. However, the two goals of ensemble pruning, i.e., maximizing the generalization performance and minimizing the number of base learners, can conflict when pushed to the limit. Most previous ensemble pruning approaches solve objectives that mix the two goals. In this paper, motivated by recent theoretical advances in evolutionary optimization, we investigate solving the two goals explicitly in a bi-objective formulation and propose the PEP (Pareto Ensemble Pruning) approach. We show that PEP not only achieves significantly better performance than the state-of-the-art approaches but also has theoretical support.
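The bi-objective view can be made concrete with the Pareto dominance relation over (validation error, ensemble size) pairs. The sketch below only filters a fixed list of hypothetical candidate sub-ensembles down to the non-dominated ones; it is not the evolutionary search that PEP performs, and the candidate values are invented.

```python
def dominates(a, b):
    """a dominates b: no worse in both objectives, strictly better in one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical pruned ensembles as (validation error, number of base learners).
candidates = [(0.10, 8), (0.12, 5), (0.10, 6), (0.20, 2), (0.25, 3)]
front = pareto_front(candidates)
```

Here (0.10, 8) is dropped because (0.10, 6) matches its error with fewer learners, and (0.25, 3) is dropped because (0.20, 2) is better on both objectives.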


Adaptive Sampling with Optimal Cost for Class-Imbalance Learning

AAAI Conferences

Learning from imbalanced data sets, in which negative examples far outnumber positive examples, is one of the challenging problems in machine learning. The main problems of existing methods are: (1) the degree of re-sampling, a key factor greatly affecting performance, needs to be fixed in advance, and the optimal choice is difficult to make; (2) many useful negative samples are discarded in under-sampling; (3) the effectiveness of algorithm-level methods is limited because they use only the original training data for a single classifier. To address these issues, this paper proposes a novel approach of adaptive sampling with optimal cost for class-imbalance learning. The novelty of the proposed approach lies mainly in adaptively over-sampling the minority positive examples and under-sampling the majority negative examples, forming different sub-classifiers from different subsets of training data with the best cost ratio adaptively chosen, and combining these sub-classifiers according to their accuracy to create a strong classifier. It aims to make full use of the whole training data and improve the performance of the class-imbalance learning classifier. Experiments compare the proposed approach with 12 state-of-the-art methods on 16 challenging UCI data sets using 3 evaluation metrics, and the results show that the proposed approach can achieve superior performance in class-imbalance learning.
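As a point of reference for the sampling step, the sketch below draws one balanced training subset by random over-sampling of the positives (with replacement) and random under-sampling of the negatives (without replacement). This is plain, non-adaptive resampling with a fixed target size, so it shows the pre-fixed baseline rather than the adaptive scheme the paper proposes; the function name, data, and `n_per_class` parameter are hypothetical.

```python
import random


def resample(positives, negatives, n_per_class, seed=0):
    """Return (over-sampled positives, under-sampled negatives)."""
    rng = random.Random(seed)
    # Over-sample the minority class with replacement.
    over = [rng.choice(positives) for _ in range(n_per_class)]
    # Under-sample the majority class without replacement;
    # requires n_per_class <= len(negatives).
    under = rng.sample(negatives, n_per_class)
    return over, under


# Hypothetical imbalanced data: 3 positives, 20 negatives.
positives = ["p0", "p1", "p2"]
negatives = [f"n{i}" for i in range(20)]

over, under = resample(positives, negatives, n_per_class=6)
```

Repeating the call with different seeds yields different subsets, which is one simple way to train several sub-classifiers on different views of the data.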


Detecting and Tracking Concept Class Drift and Emergence in Non-Stationary Fast Data Streams

AAAI Conferences

As constant data feeds from social media, embedded sensors, and other sources proliferate, the capability to provide predictive concept labels to these data streams will become ever more important and lucrative. However, the dynamic, non-stationary nature and effectively infinite length of data streams pose additional challenges for stream data mining algorithms. The sparse quantity of training data also limits the use of algorithms that are heavily dependent on supervised training. To address all these issues, we propose an incremental semi-supervised method that provides general concept class label predictions and also tracks concept clusters within the feature space using a new online clustering algorithm. Each concept cluster contains an embedded stream classifier, creating a diverse ensemble for data instance classification within the generative model used for detecting emerging concepts in the stream. Unlike other recent novel class detection methods, our method goes beyond detection and continues to differentiate and track the emerging concepts. We show the effectiveness of our method on several synthetic and real world data sets, and we compare the results against other leading baseline methods.


Probabilistic Attributed Hashing

AAAI Conferences

Due to their simplicity and efficiency, many hashing methods have recently been developed for large-scale similarity search. Most of the existing hashing methods focus on mapping low-level features to binary codes, but neglect attributes that are commonly associated with data samples. Attribute data, such as image tags, product brands, and user profiles, can represent human recognition better than low-level features. However, attributes have specific characteristics, including high-dimensional, sparse, and categorical properties, which are hard to leverage in the existing hashing learning frameworks. In this paper, we propose a hashing learning framework, Probabilistic Attributed Hashing (PAH), to integrate attributes with low-level features. The connections between attributes and low-level features are built by sharing a common set of latent binary variables, i.e., hash codes, through which attributes and features can complement each other. Finally, we develop an efficient iterative learning algorithm, which is generally feasible for large-scale applications. Extensive experiments and a comparison study are conducted on two public datasets, i.e., DBLP and NUS-WIDE. The results clearly demonstrate that the proposed PAH method substantially outperforms the peer methods.
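To show why binary codes make similarity search cheap, the sketch below ranks items by Hamming distance between hash codes stored as integers. It assumes the codes have already been learned by some method; it does not implement PAH, and the 8-bit codes and item names are invented.

```python
def hamming(a, b):
    """Number of differing bits between two integer-encoded hash codes."""
    return bin(a ^ b).count("1")


def nearest(query, codes, k):
    """Return the k item names whose codes are closest to the query."""
    return sorted(codes, key=lambda name: hamming(query, codes[name]))[:k]


# Hypothetical 8-bit hash codes for four items.
codes = {
    "doc1": 0b10110010,
    "doc2": 0b10110011,
    "doc3": 0b01001101,
    "doc4": 0b10010010,
}
query = 0b10110110

top2 = nearest(query, codes, k=2)
```

Each comparison is a single XOR followed by a bit count, which is what keeps this kind of search practical at large scale.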


Learning Relational Sum-Product Networks

AAAI Conferences

Sum-product networks (SPNs) are a recently proposed deep architecture that guarantees tractable inference, even on certain high-treewidth models. SPNs are a propositional architecture, treating the instances as independent and identically distributed. In this paper, we introduce Relational Sum-Product Networks (RSPNs), a new tractable first-order probabilistic architecture. RSPNs generalize SPNs by modeling a set of instances jointly, allowing them to influence each other's probability distributions, as well as modeling probabilities of relations between objects. We also present LearnRSPN, the first algorithm for learning high-treewidth tractable statistical relational models. LearnRSPN is a recursive top-down structure learning algorithm for RSPNs, based on Gens and Domingos' LearnSPN algorithm for propositional SPN learning. We evaluate the algorithm on three datasets; the RSPN learning algorithm outperforms Markov Logic Networks in both running time and predictive accuracy.
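The tractability of a propositional SPN comes from evaluating it in a single bottom-up pass over sum and product nodes. The sketch below builds a tiny SPN over two binary variables and evaluates the probability of one complete assignment. It covers only the propositional case, not RSPNs or LearnRSPN; the structure, weights, and helper names are hypothetical.

```python
def leaf(var, p_true):
    """Bernoulli leaf: probability of the observed value of one variable."""
    return lambda x: p_true if x[var] else 1.0 - p_true


def product(*children):
    """Product node over children with disjoint variable scopes."""
    def evaluate(x):
        result = 1.0
        for child in children:
            result *= child(x)
        return result
    return evaluate


def weighted_sum(weighted_children):
    """Sum node: mixture of children, with weights that add up to 1."""
    return lambda x: sum(w * child(x) for w, child in weighted_children)


# Hypothetical SPN: a mixture of two fully factorized distributions over A, B.
spn = weighted_sum([
    (0.6, product(leaf("A", 0.9), leaf("B", 0.2))),
    (0.4, product(leaf("A", 0.1), leaf("B", 0.7))),
])

p = spn({"A": True, "B": False})  # 0.6*0.9*0.8 + 0.4*0.1*0.3
```

Because every sum node's weights add up to 1 and every product node combines disjoint variables, the values over all four assignments add up to 1, so the network defines a valid distribution.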