Goto

Collaborating Authors

 Genre


Linear Discriminant Analysis: New Formulations and Overfit Analysis

AAAI Conferences

In this paper, we will present a unified view for LDA. We will (1) emphasize that standard LDA solutions are not unique, (2) propose several new LDA formulations: St-orthonormal LDA, Sw-orthonormal LDA and orthogonal LDA which have unique solutions, and (3) show that with St-orthonormal LDA and Sw-orthonormal LDA formulations, solutions to all four major LDA objective functions are identical. Furthermore, we perform an indepth analysis to show that the LDA sometimes performs poorly due to over-fitting, i.e., it picks up PCA dimensions with small eigenvalues. From this analysis, we propose a stable LDA which uses PCA first to reduce to a small PCA subspace and do LDA in the subspace.


Latent Semantic Learning by Efficient Sparse Coding with Hypergraph Regularization

AAAI Conferences

This paper presents a novel latent semantic learning algorithm for action recognition. Through efficient sparse coding, we can learn latent semantics (i.e. high-level features) from a large vocabulary of abundant mid-level features (i.e. visual keywords). More importantly, we can capture the manifold structure hidden among mid-level features by incorporating hypergraph regularization into sparse coding. The learnt latent semantics can further be readily used for action recognition by defining a histogram intersection kernel. Different from the traditional latent semantic analysis based on topic models, our sparse coding method with hypergraph regularization can exploit the manifold structure hidden among mid-level features for latent semantic learning, which results in compact but discriminative high-level features for action recognition. We have tested our method on the commonly used KTH action dataset and the unconstrained YouTube action dataset. The experimental results show the superior performance of our method.


Mean Field Inference in Dependency Networks: An Empirical Study

AAAI Conferences

Dependency networks are a compelling alternative to Bayesian networks for learning joint probability distributions from data and using them to compute probabilities. A dependency network consists of a set of conditional probability distributions, each representing the probability of a single variable given its Markov blanket. Running Gibbs sampling with these conditional distributions produces a joint distribution that can be used to answer queries, but suffers from the traditional slowness of sampling-based inference. In this paper, we observe that the mean field update equation can be applied to dependency networks, even though the conditional probability distributions may be inconsistent with each other. In experiments with learning and inference on 12 datasets, we demonstrate that mean field inference in dependency networks offers similar accuracy to Gibbs sampling but with orders of magnitude improvements in speed. Compared to Bayesian networks learned on the same data, dependency networks offer higher accuracy at greater amounts of evidence. Furthermore, mean field inference is consistently more accurate in dependency networks than in Bayesian networks learned on the same data.


Size Adaptive Selection of Most Informative Features

AAAI Conferences

In this paper, we propose a novel method to select the most informativesubset of features, which has little redundancy andvery strong discriminating power. Our proposed approach automaticallydetermines the optimal number of features and selectsthe best subset accordingly by maximizing the averagepairwise informativeness, thus has obvious advantage overtraditional filter methods. By relaxing the essential combinatorialoptimization problem into the standard quadratic programmingproblem, the most informative feature subset canbe obtained efficiently, and a strategy to dynamically computethe redundancy between feature pairs further greatly acceleratesour method through avoiding unnecessary computationsof mutual information. As shown by the extensive experiments,the proposed method can successfully select the mostinformative subset of features, and the obtained classificationresults significantly outperform the state-of-the-art results onmost test datasets.



OASIS: Online Active Semi-Supervised Learning

AAAI Conferences

We consider a learning setting of importance to large scale machine learning: potentially unlimited data arrives sequentially, but only a small fraction of it is labeled. The learner cannot store the data; it should learn from both labeled and unlabeled data, and it may also request labels for some of the unlabeled items. This setting is frequently encountered in real-world applications and has the characteristics of online, semi-supervised, and active learning. Yet previous learning models fail to consider these characteristics jointly. We present OASIS, a Bayesian model for this learning setting. The main contributions of the model include the novel integration of a semi-supervised likelihood function, a sequential Monte Carlo scheme for efficient online Bayesian updating, and a posterior-reduction criterion for active learning. Encouraging results on both synthetic and real-world optical character recognition data demonstrate the synergy of these characteristics in OASIS.


A Feasible Nonconvex Relaxation Approach to Feature Selection

AAAI Conferences

Variable selection problems are typically addressed under apenalized optimization framework. Nonconvex penalties such as the minimax concave plus (MCP) and smoothly clipped absolute deviation(SCAD), have been demonstrated to have the properties of sparsity practically and theoretically. In this paper we propose a new nonconvex penalty that we call exponential-type penalty. The exponential-type penalty is characterized by a positive parameter,which establishes a connection with the ell 0 and ell 1 penalties.We apply this new penalty to sparse supervised learning problems. To solve to resulting optimization problem, we resort to a reweighted ell 1 minimization method. Moreover, we devise an efficient method for the adaptive update of the tuning parameter. Our experimental results are encouraging. They show that the exponential-type penalty is competitive with MCP and SCAD.


Across-Model Collective Ensemble Classification

AAAI Conferences

Ensemble classification methods that independently construct component models (e.g., bagging) improve accuracy over single models by reducing the error due to variance. Some work has been done to extend ensemble techniques for classification in relational domains by taking relational data characteristics or multiple link types into account during model construction. However, since these approaches follow the conventional approach to ensemble learning, they improve performance by reducing the error due to variance in learning. We note however, that variance in inference can be an additional source of error in relational methods that use collective classification, since inferred values are propagated during inference. We propose a novel ensemble mechanism for collective classification that reduces ย both learning and inference variance, by incorporating prediction averaging into the collective inference process itself. We show that our proposed method significantly outperforms a straightforward relational ensemble baseline on both synthetic and real-world datasets.


Large Scale Spectral Clustering with Landmark-Based Representation

AAAI Conferences

Spectral clustering is one of the most popular clustering approaches. Despite its good performance, it is limited in its applicability to large-scale problems due to its high computational complexity. Recently, many approaches have been proposed to accelerate the spectral clustering. Unfortunately, these methods usually sacrifice quite a lot information of the original data, thus result in a degradation of performance. In this paper, we propose a novel approach, called Landmark-based Spectral Clustering (LSC), for large scale clustering problems. Specifically, we select $p\ (\ll n)$ representative data points as the landmarks and represent the original data points as the linear combinations of these landmarks. The spectral embedding of the data can then be efficiently computed with the landmark-based representation. The proposed algorithm scales linearly with the problem size. Extensive experiments show the effectiveness and efficiency of our approach comparing to the state-of-the-art methods.


A Nonparametric Bayesian Model of Multi-Level Category Learning

AAAI Conferences

Categories are often organized into hierarchical taxonomies, that is, tree structures where each node represents a labeled category, and a node's parent and children are, respectively, the category's supertype and subtypes. A natural question is whether it is possible to reconstruct category taxonomies in cases where we are not given explicit information about how categories are related to each other, but only a sample of observations of the members of each category. In this paper, we introduce a nonparametric Bayesian model of multi-level category learning, an extension of the hierarchical Dirichlet process (HDP) that we call the tree-HDP. We demonstrate the ability of the tree-HDP to reconstruct simulated datasets of artificial taxonomies, and show that it produces similar performance to human learners on a taxonomy inference task.