Goto

Collaborating Authors

 Statistical Learning


Fast Newton-CG Method for Batch Learning of Conditional Random Fields

AAAI Conferences

We propose a fast batch learning method for linear-chain Conditional Random Fields (CRFs) based on Newton-CG methods. Newton-CG methods are a variant of Newton method for high-dimensional problems. They only require the Hessian-vector products instead of the full Hessian matrices. To speed up Newton-CG methods for the CRF learning, we derive a novel dynamic programming procedure for the Hessian-vector products of the CRF objective function. The proposed procedure can reuse the byproducts of the time-consuming gradient computation for the Hessian-vector products to drastically reduce the total computation time of the Newton-CG methods. In experiments with tasks in natural language processing, the proposed method outperforms a conventional quasi-Newton method. Remarkably, the proposed method is competitive with online learning algorithms that are fast but unstable.


A Generalised Solution to the Out-of-Sample Extension Problem in Manifold Learning

AAAI Conferences

Manifold learning is a powerful tool for reducing the dimensionality of a dataset by finding a low-dimensional embedding that retains important geometric and topological features. In many applications it is desirable to add new samples to a previously learnt embedding, this process of adding new samples is known as the out-of-sample extension problem. Since many manifold learning algorithms do not naturally allow for new samples to be added we present an easy to implement generalized solution to the problem that can be used with any existing manifold learning algorithm. Our algorithm is based on simple geometric intuition about the local structure of a manifold and our results show that it can be effectively used to add new samples to a previously learnt embedding. We test our algorithm on both artificial and real world image data and show that our method significantly out performs existing out-of-sample extension strategies.


Optimal Rewards versus Leaf-Evaluation Heuristics in Planning Agents

AAAI Conferences

Planning agents often lack the computational resources needed to build full planning trees for their environments. Agent designers commonly overcome this finite-horizon approximation by applying an evaluation function at the leaf-states of the planning tree. Recent work has proposed an alternative approach for overcoming computational constraints on agent design: modify the reward function. In this work, we compare this reward design approach to the common leaf-evaluation heuristic approach for improving planning agents. We show that in many agents, the reward design approach strictly subsumes the leaf-evaluation approach, i.e., there exists a reward function for every leaf-evaluation heuristic that leads to equivalent behavior, but the converse is not true. We demonstrate that this generality leads to improved performance when an agent makes approximations in addition to the finite-horizon approximation. As part of our contribution, we extend PGRD, an online reward design algorithm, to develop reward design algorithms for Sparse Sampling and UCT, two algorithms capable of planning in large state spaces.


Efficiently Learning a Distance Metric for Large Margin Nearest Neighbor Classification

AAAI Conferences

We concern the problem of learning a Mahalanobis distance metric for improving nearest neighbor classification. Our work is built upon the large margin nearest neighbor (LMNN) classification framework. Due to the semidefiniteness constraint in the optimization problem of LMNN, it is not scalable in terms of the dimensionality of the input data. The original LMNN solver partially alleviates this problem by adopting alternating projection methods instead of standard interior-point methods. Still, at each iteration, the computation complexity is at least O(D 3 ) (D is the dimension of input data). In this work, we propose a column generation based algorithm to solve the LMNN optimization problem much more efficiently. Our algorithm is much more scalable in tha tat each iteration, it does not need full eigen-decomposition. Instead, we only need to find the leading eigen value and its corresponding eigen vector, which is of O(D 2 ) complexity. Experiments show the efficiency and efficacy of our algorithms.


Linear Discriminant Analysis: New Formulations and Overfit Analysis

AAAI Conferences

In this paper, we will present a unified view for LDA. We will (1) emphasize that standard LDA solutions are not unique, (2) propose several new LDA formulations: St-orthonormal LDA, Sw-orthonormal LDA and orthogonal LDA which have unique solutions, and (3) show that with St-orthonormal LDA and Sw-orthonormal LDA formulations, solutions to all four major LDA objective functions are identical. Furthermore, we perform an indepth analysis to show that the LDA sometimes performs poorly due to over-fitting, i.e., it picks up PCA dimensions with small eigenvalues. From this analysis, we propose a stable LDA which uses PCA first to reduce to a small PCA subspace and do LDA in the subspace.


Latent Semantic Learning by Efficient Sparse Coding with Hypergraph Regularization

AAAI Conferences

This paper presents a novel latent semantic learning algorithm for action recognition. Through efficient sparse coding, we can learn latent semantics (i.e. high-level features) from a large vocabulary of abundant mid-level features (i.e. visual keywords). More importantly, we can capture the manifold structure hidden among mid-level features by incorporating hypergraph regularization into sparse coding. The learnt latent semantics can further be readily used for action recognition by defining a histogram intersection kernel. Different from the traditional latent semantic analysis based on topic models, our sparse coding method with hypergraph regularization can exploit the manifold structure hidden among mid-level features for latent semantic learning, which results in compact but discriminative high-level features for action recognition. We have tested our method on the commonly used KTH action dataset and the unconstrained YouTube action dataset. The experimental results show the superior performance of our method.


Ordinal Regression via Manifold Learning

AAAI Conferences

Ordinal regression is an important research topic in machine learning. It aims to automatically determine the implied rating of a data item on a fixed, discrete rating scale. In this paper, we present a novel ordinal regression approach via manifold learning, which is capable of uncovering the embedded nonlinear structure of the data set according to the observations in the highdimensional feature space. By optimizing the order information of the observations and preserving the intrinsic geometry of the data set simultaneously, the proposed algorithm provides the faithful ordinal regression to the new coming data points. To offer more general solution to the data with natural tensor structure, we further introduce the multilinear extension of the proposed algorithm, which can support the ordinal regression of high order data like images. Experiments on various data sets validate the effectiveness of the proposed algorithm as well as its extension.


Size Adaptive Selection of Most Informative Features

AAAI Conferences

In this paper, we propose a novel method to select the most informativesubset of features, which has little redundancy andvery strong discriminating power. Our proposed approach automaticallydetermines the optimal number of features and selectsthe best subset accordingly by maximizing the averagepairwise informativeness, thus has obvious advantage overtraditional filter methods. By relaxing the essential combinatorialoptimization problem into the standard quadratic programmingproblem, the most informative feature subset canbe obtained efficiently, and a strategy to dynamically computethe redundancy between feature pairs further greatly acceleratesour method through avoiding unnecessary computationsof mutual information. As shown by the extensive experiments,the proposed method can successfully select the mostinformative subset of features, and the obtained classificationresults significantly outperform the state-of-the-art results onmost test datasets.


Improving Semi-Supervised Support Vector Machines Through Unlabeled Instances Selection

AAAI Conferences

Semi-supervised support vector machines (S3VMs) are a kind of popular approaches which try to improve learning performance by exploiting unlabeled data. Though S3VMs have been found helpful in many situations, they may degenerate performance and the resultant generalization ability may be even worse than using the labeled data only. In this paper, we try to reduce the chance of performance degeneration of S3VMs. Our basic idea is that, rather than exploiting all unlabeled data, the unlabeled instances should be selected such that only the ones which are very likely to be helpful are exploited, while some highly risky unlabeled instances are avoided. We propose the S3VM- us method by using hierarchical clustering to select the unlabeled instances. Experiments on a broad range of data sets over eighty-eight different settings show that the chance of performance degeneration of S3VM- us is much smaller than that of existing S3VMs.


Value Function Approximation in Reinforcement Learning Using the Fourier Basis

AAAI Conferences

We describe the Fourier basis, a linear value function approximation scheme based on the Fourier series. We empirically demonstrate that it performs well compared to radial basis functions and the polynomial basis, the two most popular fixed bases for linear value function approximation, and is competitive with learned proto-value functions.