Asia
Size Adaptive Selection of Most Informative Features
Liu, Si (Chinese Academy of Science) | Liu, Hairong (National University of Singapore) | Latecki, Longin Jan (Temple University) | Yan, Shuicheng (National University of Singapore) | Xu, Changsheng (China-Singapore Institute of Digital Media) | Lu, Hanqing (Chinese Academy of Science)
In this paper, we propose a novel method to select the most informativesubset of features, which has little redundancy andvery strong discriminating power. Our proposed approach automaticallydetermines the optimal number of features and selectsthe best subset accordingly by maximizing the averagepairwise informativeness, thus has obvious advantage overtraditional filter methods. By relaxing the essential combinatorialoptimization problem into the standard quadratic programmingproblem, the most informative feature subset canbe obtained efficiently, and a strategy to dynamically computethe redundancy between feature pairs further greatly acceleratesour method through avoiding unnecessary computationsof mutual information. As shown by the extensive experiments,the proposed method can successfully select the mostinformative subset of features, and the obtained classificationresults significantly outperform the state-of-the-art results onmost test datasets.
Improving Semi-Supervised Support Vector Machines Through Unlabeled Instances Selection
Li, Yu-Feng (Nanjing University, China) | Zhou, Zhi-Hua (Nanjing University, China)
Semi-supervised support vector machines (S3VMs) are a kind of popular approaches which try to improve learning performance by exploiting unlabeled data. Though S3VMs have been found helpful in many situations, they may degenerate performance and the resultant generalization ability may be even worse than using the labeled data only. In this paper, we try to reduce the chance of performance degeneration of S3VMs. Our basic idea is that, rather than exploiting all unlabeled data, the unlabeled instances should be selected such that only the ones which are very likely to be helpful are exploited, while some highly risky unlabeled instances are avoided. We propose the S3VM- us method by using hierarchical clustering to select the unlabeled instances. Experiments on a broad range of data sets over eighty-eight different settings show that the chance of performance degeneration of S3VM- us is much smaller than that of existing S3VMs.
Value Function Approximation in Reinforcement Learning Using the Fourier Basis
Konidaris, George (Massachusetts Institute of Technology) | Osentoski, Sarah (Brown University) | Thomas, Philip (University of Massachusetts Amherst)
We describe the Fourier basis, a linear value function approximation scheme based on the Fourier series. We empirically demonstrate that it performs well compared to radial basis functions and the polynomial basis, the two most popular fixed bases for linear value function approximation, and is competitive with learned proto-value functions.
Adaptive Large Margin Training for Multilabel Classification
Guo, Yuhong (Temple University) | Schuurmans, Dale (University of Alberta)
Multilabel classification is a central problem in many areas of data analysis, including text and multimedia categorization, where individual data objects need to be assigned multiple labels. A key challenge in these tasks is to learn a classifier that can properly exploit label correlations without requiring exponential enumeration of label subsets during training or testing. We investigate novel loss functions for multilabel training within a large margin framework---identifying a simple alternative that yields improved generalization while still allowing efficient training. We furthermore show how covariances between the label models can be learned simultaneously with the classification model itself, in a jointly convex formulation, without compromising scalability. The resulting combination yields state of the art accuracy in multilabel webpage classification.
OASIS: Online Active Semi-Supervised Learning
Goldberg, Andrew B. (Arcode Corporation) | Zhu, Xiaojin (University of Wisconsin-Madison) | Furger, Alex (University of Wisconsin-Madison) | Xu, Jun-Ming (University of Wisconsin-Madison)
We consider a learning setting of importance to large scale machine learning: potentially unlimited data arrives sequentially, but only a small fraction of it is labeled. The learner cannot store the data; it should learn from both labeled and unlabeled data, and it may also request labels for some of the unlabeled items. This setting is frequently encountered in real-world applications and has the characteristics of online, semi-supervised, and active learning. Yet previous learning models fail to consider these characteristics jointly. We present OASIS, a Bayesian model for this learning setting. The main contributions of the model include the novel integration of a semi-supervised likelihood function, a sequential Monte Carlo scheme for efficient online Bayesian updating, and a posterior-reduction criterion for active learning. Encouraging results on both synthetic and real-world optical character recognition data demonstrate the synergy of these characteristics in OASIS.
A Feasible Nonconvex Relaxation Approach to Feature Selection
Gao, Cuixia (Zhejiang University) | Wang, Naiyan (Zhejiang University) | Yu, Qi (Zhejiang University) | Zhang, Zhihua (Zhejiang University)
Variable selection problems are typically addressed under apenalized optimization framework. Nonconvex penalties such as the minimax concave plus (MCP) and smoothly clipped absolute deviation(SCAD), have been demonstrated to have the properties of sparsity practically and theoretically. In this paper we propose a new nonconvex penalty that we call exponential-type penalty. The exponential-type penalty is characterized by a positive parameter,which establishes a connection with the ell 0 and ell 1 penalties.We apply this new penalty to sparse supervised learning problems. To solve to resulting optimization problem, we resort to a reweighted ell 1 minimization method. Moreover, we devise an efficient method for the adaptive update of the tuning parameter. Our experimental results are encouraging. They show that the exponential-type penalty is competitive with MCP and SCAD.
Selective Transfer Between Learning Tasks Using Task-Based Boosting
Eaton, Eric (Bryn Mawr College) | desJardins, Marie (University of Maryland Baltimore County)
The success of transfer learning on a target task is highly dependent on the selected source data. Instance transfer methods reuse data from the source tasks to augment the training data for the target task. If poorly chosen, this source data may inhibit learning, resulting in negative transfer. The current most widely used algorithm for instance transfer, TrAdaBoost, performs poorly when given irrelevant source data. We present a novel task-based boosting technique for instance transfer that selectively chooses the source knowledge to transfer to the target task. Our approach performs boosting at both the instance level and the task level, assigning higher weight to those source tasks that show positive transferability to the target task, and adjusting the weights of individual instances within each source task via AdaBoost. We show that this combination of task- and instance-level boosting significantly improves transfer performance over existing instance transfer algorithms when given a mix of relevant and irrelevant source data, especially for small amounts of data on the target task.
Symmetric Graph Regularized Constraint Propagation
Fu, Zhenyong (City University of Hong Kong) | Lu, Zhiwu (Peking University) | Ip, Horace (City University of Hong Kong) | Peng, Yuxin (Peking University) | Lu, Hongtao (Shanghai Jiao Tong University)
This paper presents a novel symmetric graph regularization framework for pairwise constraint propagation. We first decompose the challenging problem of pairwise constraint propagation into a series of two-class label propagation subproblems and then deal with these subproblems by quadratic optimization with symmetric graph regularization. More importantly, we clearly show that pairwise constraint propagation is actually equivalent to solving a Lyapunov matrix equation, which is widely used in Control Theory as a standard continuous-time equation. Different from most previous constraint propagation methods that suffer from severe limitations, our method can directly be applied to multi-class problem and also can effectively exploit both must-link and cannot-link constraints. The propagated constraints are further used to adjust the similarity between data points so that they can be incorporated into subsequent clustering. The proposed method has been tested in clustering tasks on six real-life data sets and then shown to achieve significant improvements with respect to the state of the arts.
Large Scale Spectral Clustering with Landmark-Based Representation
Chen, Xinlei (Zhejiang University) | Cai, Deng (Zhejiang University)
Spectral clustering is one of the most popular clustering approaches. Despite its good performance, it is limited in its applicability to large-scale problems due to its high computational complexity. Recently, many approaches have been proposed to accelerate the spectral clustering. Unfortunately, these methods usually sacrifice quite a lot information of the original data, thus result in a degradation of performance. In this paper, we propose a novel approach, called Landmark-based Spectral Clustering (LSC), for large scale clustering problems. Specifically, we select $p\ (\ll n)$ representative data points as the landmarks and represent the original data points as the linear combinations of these landmarks. The spectral embedding of the data can then be efficiently computed with the landmark-based representation. The proposed algorithm scales linearly with the problem size. Extensive experiments show the effectiveness and efficiency of our approach comparing to the state-of-the-art methods.
A Nonparametric Bayesian Model of Multi-Level Category Learning
Canini, Kevin Robert (University of California, Berkeley) | Griffiths, Thomas L. (University of California, Berkeley)
Categories are often organized into hierarchical taxonomies, that is, tree structures where each node represents a labeled category, and a node's parent and children are, respectively, the category's supertype and subtypes. A natural question is whether it is possible to reconstruct category taxonomies in cases where we are not given explicit information about how categories are related to each other, but only a sample of observations of the members of each category. In this paper, we introduce a nonparametric Bayesian model of multi-level category learning, an extension of the hierarchical Dirichlet process (HDP) that we call the tree-HDP. We demonstrate the ability of the tree-HDP to reconstruct simulated datasets of artificial taxonomies, and show that it produces similar performance to human learners on a taxonomy inference task.