LIFT: Multi-Label Learning with Label-Specific Features

AAAI Conferences

Multi-label learning deals with the problem where each training example is represented by a single instance while associated with a set of class labels. For an unseen example, existing approaches determine the membership of each possible class label based on an identical feature set, i.e. the very same instance representation of the unseen example is employed in the discrimination processes of all labels. However, this commonly used strategy might be suboptimal, as different class labels usually carry specific characteristics of their own, and it could be beneficial to exploit different feature sets for the discrimination of different labels. Based on this reflection, we propose a new strategy for multi-label learning that leverages label-specific features, and present a simple yet effective algorithm named LIFT. Briefly, LIFT constructs features specific to each label by conducting clustering analysis on its positive and negative instances, and then performs training and testing by querying the clustering results. Extensive experiments across sixteen diversified data sets clearly validate the superiority of LIFT against other well-established multi-label learning algorithms.
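The idea of label-specific features can be illustrated with a minimal sketch. The abstract only says that clustering is run on each label's positive and negative instances and that the clustering results are then queried; the use of k-means, the distance-to-center feature mapping, and the `ratio` parameter below are assumptions made for illustration, and the helper names are hypothetical.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k centers as lists of floats."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def label_specific_mapping(X, Y, label, ratio=0.5):
    """Return a function mapping an instance to features specific to `label`.

    Assumption: the features are distances to cluster centers found
    separately among the label's positive and negative instances.
    """
    pos = [x for x, labels in zip(X, Y) if label in labels]
    neg = [x for x, labels in zip(X, Y) if label not in labels]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative instance")
    # Assumption: the same number of clusters on both sides, tied to the
    # smaller of the two groups.
    k = max(1, math.ceil(ratio * min(len(pos), len(neg))))
    centers = kmeans(pos, k) + kmeans(neg, k)
    return lambda x: [math.dist(x, c) for c in centers]

X = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
Y = [{"a"}, {"a"}, {"b"}, {"b"}]
phi_a = label_specific_mapping(X, Y, "a")
features = phi_a([0.0, 0.5])  # one distance per cluster center
```

A separate binary classifier per label would then be trained on the output of that label's mapping rather than on the shared original representation.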


Diversity Regularized Machine

AAAI Conferences

Ensemble methods, which train multiple learners for a task, are among the state-of-the-art learning approaches. The diversity of the component learners has been recognized as a key to a good ensemble, and existing ensemble methods try different ways to encourage diversity, mostly by heuristics. In this paper, we propose the diversity regularized machine (DRM) in a mathematical programming framework, which efficiently generates an ensemble of diverse support vector machines (SVMs). Theoretical analysis discloses that the diversity constraint used in DRM can lead to an effective reduction in its hypothesis space complexity, implying that diversity control in ensemble methods indeed plays the role of regularization, as in popular statistical learning approaches. Experiments show that DRM can significantly improve generalization ability and is superior to some state-of-the-art SVM ensemble methods.


Dealing with Concept Drift and Class Imbalance in Multi-Label Stream Classification

AAAI Conferences

Data streams containing objects that are (or can be) associated with more than one label at the same time are ubiquitous. In spite of its important applications, classification of streaming multi-label data is largely unexplored. Existing approaches try to tackle the problem by transferring traditional single-label stream classification practices to the multi-label domain. Nevertheless, they fail to consider some of the unique properties of the problem, such as within- and between-class imbalance and multiple concept drift. To deal with these challenges, this paper proposes a novel multi-label stream classification approach that employs two windows for each label, one for positive and one for negative examples. Instance-sharing is exploited for space efficiency, while a time-efficient instantiation based on the k-Nearest Neighbor algorithm is also proposed. Finally, a batch-incremental thresholding technique is proposed to further deal with the class imbalance problem. Results of an empirical comparison against two other methods on three real-world datasets are in favor of the proposed approach.
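The two-windows-per-label idea can be sketched as follows. This is a simplified illustration rather than the paper's implementation: the class name `TwoWindowKNN`, the fixed window size, and the plain majority vote are assumptions, and the batch-incremental thresholding step is omitted.

```python
import math
from collections import deque

class TwoWindowKNN:
    """Keeps one positive and one negative sliding window per label."""

    def __init__(self, labels, window_size=100, k=3):
        self.k = k
        self.pos = {l: deque(maxlen=window_size) for l in labels}
        self.neg = {l: deque(maxlen=window_size) for l in labels}

    def update(self, x, true_labels):
        # Instance sharing: every window stores a reference to the same tuple.
        x = tuple(x)
        for l in self.pos:
            window = self.pos[l] if l in true_labels else self.neg[l]
            window.append(x)  # the oldest entry drops out when the window is full

    def predict(self, x):
        predicted = set()
        for l in self.pos:
            pool = [(math.dist(x, p), 1) for p in self.pos[l]]
            pool += [(math.dist(x, n), 0) for n in self.neg[l]]
            nearest = sorted(pool)[: self.k]
            # Assumption: simple majority vote among the k nearest neighbours.
            if nearest and 2 * sum(v for _, v in nearest) > len(nearest):
                predicted.add(l)
        return predicted

model = TwoWindowKNN(["a", "b"], window_size=5, k=3)
for i in range(8):
    model.update([0.0, 0.1 * i], {"a"})
    model.update([10.0, 0.1 * i], {"b"})
near_a = model.predict([0.0, 0.0])
near_b = model.predict([10.0, 0.0])
```

Because positives and negatives of a label age out of separate windows, a burst of negative examples cannot push the rare positive examples out of memory.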


Learning to Rank Under Multiple Annotators

AAAI Conferences

Learning to rank has received great attention in recent years as it plays a crucial role in information retrieval. The existing concept of learning to rank assumes that each training sample is associated with an instance and a reliable label. However, in practice, this assumption does not necessarily hold true. This study focuses on learning to rank when each training instance is labeled by multiple annotators who may be unreliable. In such a scenario, no accurate labels can be obtained. This study proposes two learning approaches. The first simply estimates the ground truth and then learns a ranking model with it. The second is a maximum likelihood learning approach which estimates the ground truth and learns the ranking model iteratively. The two approaches have been tested on both synthetic and real-world data. The results reveal that the maximum likelihood approach significantly outperforms the first approach and achieves results comparable to those of a model learned from reliable labels. Furthermore, both approaches have been applied to ranking Web visual clutter.


Bayesian Policy Search with Policy Priors

AAAI Conferences

We consider the problem of learning to act in partially observable, continuous-state-and-action worlds where we have abstract prior knowledge about the structure of the optimal policy in the form of a distribution over policies. Using ideas from planning-as-inference reductions and Bayesian unsupervised learning, we cast Markov Chain Monte Carlo as a stochastic, hill-climbing policy search algorithm. Importantly, this algorithm's search bias is directly tied to the prior and its MCMC proposal kernels, which means we can draw on the full Bayesian toolbox to express the search bias, including nonparametric priors and structured, recursive processes like grammars over action sequences. Furthermore, we can reason about uncertainty in the search bias itself by constructing a hierarchical prior and reasoning about latent variables that determine the abstract structure of the policy. This yields an adaptive search algorithm: our algorithm learns to learn a structured policy efficiently. We show how inference over the latent variables in these policy priors enables intra- and intertask transfer of abstract knowledge. We demonstrate the flexibility of this approach by learning meta search biases, by constructing a nonparametric finite state controller to model memory, by discovering motor primitives using a simple grammar over primitive actions, and by combining all three.


Local and Structural Consistency for Multi-Manifold Clustering

AAAI Conferences

Data sets containing multi-manifold structures are ubiquitous in real-world tasks, and effective grouping of such data is an important yet challenging problem. Although there have been many studies on this problem, it is not clear how to design principled methods for the grouping of multiple hybrid manifolds. In this paper, we show that spectral methods are potentially helpful for hybrid-manifold clustering when the neighborhood graph is constructed to connect neighboring samples from the same manifold. However, traditional algorithms, which identify neighbors according to Euclidean distance, will easily connect samples belonging to different manifolds. To handle this drawback, we propose a new criterion, the local and structural consistency criterion, which considers the neighboring information as well as the structural information implied by the samples. Based on this criterion, we develop a simple yet effective algorithm, named Local and Structural Consistency (LSC), for clustering with multiple hybrid manifolds. Experiments show that LSC achieves promising performance.


Bi-Weighting Domain Adaptation for Cross-Language Text Classification

AAAI Conferences

Text classification is widely used in many real-world applications. To obtain satisfactory classification performance, most traditional data mining methods require large amounts of labeled data, which can be costly in terms of both time and human effort. In reality, there are plenty of such resources in English since it has the largest population in the Internet world, which is not true of many other languages. In this paper, we present a novel transfer learning approach to tackle cross-language text classification problems. We first align the feature spaces of both domains using an on-line translation service, which places the two feature spaces under the same coordinates. Although the feature sets in both domains are then the same, the distributions of the instances in the two domains are different, which violates the i.i.d. assumption made by most traditional machine learning methods. To address this issue, we propose an iterative feature and instance weighting (Bi-Weighting) method for domain adaptation. We empirically evaluate the effectiveness and efficiency of our approach. The experimental results show that our approach outperforms some baselines, including four transfer learning algorithms.


Fast Anomaly Detection for Streaming Data

AAAI Conferences

This paper introduces Streaming Half-Space-Trees (HS-Trees), a fast one-class anomaly detector for evolving data streams. It requires only normal data for training and works well when anomalous data are rare. The model features an ensemble of random HS-Trees, and the tree structure is constructed without any data. This makes the method highly efficient because it requires no model restructuring when adapting to evolving data streams. Our analysis shows that Streaming HS-Trees has constant amortised time complexity and constant memory requirement. When compared with a state-of-the-art method, our method performs favourably in terms of detection accuracy and runtime performance. Our experimental results also show that the detection performance of Streaming HS-Trees is not sensitive to its parameter settings.
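A rough sketch of how a tree can be built without data and still score anomalies is given below. It is a simplification under stated assumptions, not the paper's algorithm: the unit-cube feature range, the `size_limit` parameter, the mass-times-`2 ** depth` score, and all function names are illustrative choices, and the handling of evolving streams is left out.

```python
import random

class Node:
    def __init__(self, depth):
        self.depth = depth
        self.mass = 0       # number of reference points that passed through
        self.dim = None
        self.split = None
        self.left = None
        self.right = None

def build_tree(lo, hi, depth, max_depth, rng):
    """Halve a randomly chosen dimension at each level; no data is needed."""
    node = Node(depth)
    if depth < max_depth:
        node.dim = rng.randrange(len(lo))
        node.split = (lo[node.dim] + hi[node.dim]) / 2
        left_hi = list(hi)
        left_hi[node.dim] = node.split
        right_lo = list(lo)
        right_lo[node.dim] = node.split
        node.left = build_tree(lo, left_hi, depth + 1, max_depth, rng)
        node.right = build_tree(right_lo, hi, depth + 1, max_depth, rng)
    return node

def insert(node, x):
    """Record one normal point by incrementing mass along its path."""
    while node is not None:
        node.mass += 1
        if node.left is None:
            break
        node = node.left if x[node.dim] < node.split else node.right

def score(node, x, size_limit=2):
    """Assumed score: mass of the stopping node scaled by 2 ** depth.

    Low scores mean few normal points share the region, i.e. more anomalous.
    """
    while node.left is not None and node.mass > size_limit:
        node = node.left if x[node.dim] < node.split else node.right
    return node.mass * 2 ** node.depth

def ensemble_score(trees, x):
    return sum(score(t, x) for t in trees)

rng = random.Random(0)
trees = [build_tree([0.0, 0.0], [1.0, 1.0], 0, 4, rng) for _ in range(10)]
normal = [(0.2 + 0.01 * i, 0.2 + 0.01 * j) for i in range(10) for j in range(10)]
for t in trees:
    for p in normal:
        insert(t, p)
normal_score = ensemble_score(trees, (0.26, 0.26))
anomaly_score = ensemble_score(trees, (0.9, 0.9))
```

Since the splits never depend on the data, adapting to new data only requires updating the mass counters, which is consistent with the abstract's point that no model restructuring is needed.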


Angular Decomposition

AAAI Conferences

Dimensionality reduction plays a vital role in pattern recognition. However, for normalized vector data, existing methods do not utilize the fact that the data is normalized. In this paper, we propose to employ an Angular Decomposition of the normalized vector data which corresponds to embedding them on a unit surface. On graph data for similarity/kernel matrices with constant diagonal elements, we propose the Angular Decomposition of the similarity matrices which corresponds to embedding objects on a unit sphere. In these angular embeddings, the Euclidean distance is equivalent to the cosine similarity. Thus data structures best described in the cosine similarity and data structures best captured by the Euclidean distance can both be effectively detected in our angular embedding. We provide the theoretical analysis, derive the computational algorithm, and evaluate the angular embedding on several datasets. Experiments on data clustering demonstrate that our method can provide a more discriminative subspace.
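The stated equivalence between Euclidean distance and cosine similarity on a unit sphere follows from a short standard identity, written here for two unit-norm vectors $x$ and $y$ with angle $\theta_{xy}$ between them (the symbols are chosen for illustration and are not taken from the paper):

```latex
\|x - y\|^2 = \|x\|^2 + \|y\|^2 - 2\,x^\top y
            = 2 - 2\cos\theta_{xy},
\qquad \text{since } \|x\| = \|y\| = 1 .
```

The squared distance is therefore a decreasing function of the cosine similarity, so ranking neighbors by either quantity gives the same order.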


Active Online Classification Via Information Maximization

AAAI Conferences

We propose an online classification approach for co-occurrence data which is based on a simple information theoretic principle. We further show how to properly estimate the uncertainty associated with each prediction of our scheme and demonstrate how to exploit these uncertainty estimates in two ways: first, to abstain from highly uncertain predictions, and second, within an active learning framework, to preserve classification accuracy while substantially reducing training set size. Our method is highly efficient in terms of run-time and memory footprint requirements. Experimental results in the domain of text classification demonstrate that the classification accuracy of our method is superior or comparable to other state-of-the-art online classification algorithms.