Asia
Two-Stage Sparse Representation for Robust Recognition on Large-Scale Database
He, Ran (Dalian University of Technology) | Hu, BaoGang (Chinese Academy of Sciences) | Zheng, Wei-Shi (Queen Mary University of London) | Guo, YanQing (Dalian University of Technology)
This paper proposes a novel robust sparse representation method, called the two-stage sparse representation (TSR), for robust recognition on a large-scale database. Based on the divide and conquer strategy, TSR divides the procedure of robust recognition into outlier detection stage and recognition stage. In the first stage, a weighted linear regression is used to learn a metric in which noise and outliers in image pixels are detected. In the second stage, based on the learnt metric, the large-scale dataset is firstly filtered into a small set according to the nearest neighbor criterion. Then a sparse representation is computed by the non-negative least squares technique. The sparse solution is unique and can be optimized efficiently. The extensive numerical experiments on several public databases demonstrate that the proposed TSR approach generally obtains better classification accuracy than the state of the art Sparse Representation Classification (SRC). At the same time, by using the TSR, a significant reduction of computational cost is reached by over fifty times in comparison with the SRC, which enables the TSR to be deployed more suitably for large-scale dataset.
Multi-Task Sparse Discriminant Analysis (MtSDA) with Overlapping Categories
Han, Yahong (Zhejiang University) | Wu, Fei (Zhejiang University) | Jia, Jinzhu (University of California, Berkeley) | Zhuang, Yueting (Zhejiang University) | Yu, Bin (University of California, Berkeley)
Multi-task learning aims at combining information across tasks to boost prediction performance, especially when the number of training samples is small and the number of predictors is very large. In this paper, we first extend the Sparse Discriminate Analysis (SDA) of Clemmensen et al.. We call this Multi-task Sparse Discriminate Analysis (MtSDA). MtSDA formulates multi-label prediction as a quadratic optimization problem whereas SDA obtains single labels via a nearest class mean rule. Second, we propose a class of equicorrelation matrices to use in MtSDA which includes the identity matrix. MtSDA with both matrices are compared with singletask learning (SVM and LDA+SVM) and multi-task learning (HSML). The comparisons are made on real data sets in terms of AUC and F-measure. The data results show that MtSDA outperforms other methods substantially almost all the time and in some cases MtSDA with the equicorrelation matrix substantially outperforms MtSDA with identity matrix.
A Topic Model for Linked Documents and Update Rules for its Estimation
Guo, Zhen (State University of New York at Binghamton) | Zhu, Shenghuo (NEC Laboratories America, Inc.) | Zhang, Zhongfei (State University of New York at Binghamton) | Chi, Yun (NEC Laboratories America, Inc.) | Gong, Yihong (NEC Laboratories America, Inc.)
The latent topic model plays an important role in the unsupervised learning from a corpus, which provides a probabilistic interpretation of the corpus in terms of the latent topic space. An underpinning assumption which most of the topic models are based on is that the documents are assumed to be independent of each other. However, this assumption does not hold true in reality and the relations among the documents are available in different ways, such as the citation relations among the research papers. To address this limitation, in this paper we present a Bernoulli Process Topic (BPT) model, where the interdependence among the documents is modeled by a random Bernoulli process. In the BPT model a document is modeled as a distribution over topics that is a mixture of the distributions associated with the related documents. Although BPT aims at obtaining a better document modeling by incorporating the relations among the documents, it could also be applied to many applications including detecting the topics from corpora and clustering the documents. We apply the BPT model to several document collections and the experimental comparisons against several state-of-the-art approaches demonstrate the promising performance.
Exact Algorithms and Experiments for Hierarchical Tree Clustering
Hartung, Sepp (University of Jena) | Guo, Jiong (Universität des Saarlandes) | Komusiewicz, Christian (University of Jena) | Niedermeier, Rolf (University of Jena) | Uhlmann, Johannes (University of Jena)
We perform new theoretical as well as first-time experimental studies for the NP-hard problem to find a closest ultrametric for given dissimilarity data on pairs. This is a central problem in the area of hierarchical clustering, where so far only polynomial-time approximation algorithms were known. In contrast, we develop efficient preprocessing algorithms (known as kernelization in parameterized algorithmics) with provable performance guarantees and a simple search tree algorithm. These are used to find optimal solutions. Our experiments with synthetic and biological data show the effectiveness of our algorithms and demonstrate that an approximation algorithm due to Ailon and Charikar [FOCS 2005] often gives (almost) optimal solutions.
Facial Age Estimation by Learning from Label Distributions
Geng, Xin (Monash University) | Smith-Miles, Kate (Monash University) | Zhou, Zhi-Hua (Nanjing University)
One of the main difficulties in facial age estimation is the lack of sufficient training data for many ages. Fortunately, the faces at close ages look similar since aging is a slow and smooth process. Inspired by this observation, in this paper, instead of considering each face image as an example with one label (age), we regard each face image as an example associated with a label distribution. The label distribution covers a number of class labels, representing the degree that each label describes the example. Through this way, in addition to the real age, one face image can also contribute to the learning of its adjacent ages. We propose an algorithm named IIS-LLD for learning from the label distributions, which is an iterative optimization process based on the maximum entropy model. Experimental results show the advantages of IIS-LLD over the traditional learning methods based on single-labeled data.
Learning Discriminative Piecewise Linear Models with Boundary Points
Gai, Kun (Tsinghua University) | Zhang, Changshui (Tsinghua University)
We introduce a new discriminative piecewise linear model for classification. A two-step method is developed to construct the model. In the first step, we sample some boundary points that lie between positive and negative data, as well as corresponding directions from negative data to positive data. The sampling result gives a discriminative nonparametric decision surface, which preserves enough information to correctly classify all training data. To simplify this surface, in the second step we propose a nonparametric approach for linear surface segmentation using Dirichlet process mixtures. The final result is a piecewise linear model, in which the number of linear surface pieces is automatically determined by the Bayesian inference according to data. Experiments on both synthetic and real data verify the effectiveness of the proposed model.
What if the Irresponsible Teachers Are Dominating?
Chen, Shuo (Tsinghua University) | Zhang, Jianwen (Tsinghua University) | Chen, Guangyun (Tsinghua University) | Zhang, Changshui (Tsinghua University)
As the Internet-based crowdsourcing services become more and more popular, learning from multiple teachers or sources has received more attention of the researchers in the machine learning area. In this setting, the learning system is dealing with samples and labels provided by multiple teachers, who in common cases, are non-expert. Their labeling styles and behaviors are usually diverse, some of which are even detrimental to the learning system. Thus, simply putting them together and utilizing the algorithms designed for single-teacher scenario would be not only improper, but also damaging. The problem calls for more specific methods. Our work focuses on a case where the teachers are composed of good ones and irresponsible ones. By irresponsible, we mean the teacher who takes the labeling task not seriously and label the sample at random without inspecting the sample itself. This behavior is quite common when the task is not attractive enough and the teacher just wants to finish it as soon as possible. Sometimes, the irresponsible teachers could take a considerable part among all the teachers. If we do not take out their effects, our learning system would be ruined with no doubt. In this paper, we propose a method for picking out the good teachers with promising experimental results. It works even when the irresponsible teachers are dominating in numbers.
G-Optimal Design with Laplacian Regularization
Chen, Chun (Zhejiang University) | Chen, Zhengguang (Zhejiang University) | Bu, Jiajun (Zhejiang University) | Wang, Can (Zhejiang University) | Zhang, Lijun (Zhejiang University) | Zhang, Cheng (China Disabled Persons')
In many real world applications, labeled data are usually expensive to get, while there may be a large amount of unlabeled data. To reduce the labeling cost, active learning attempts to discover the most informative data points for labeling. Recently, Optimal Experimental Design (OED) techniques have attracted an increasing amount of attention. OED is concerned with the design of experiments that minimizes variances of a parameterized model. Typical design criteria include D-, A-, and E-optimality. However, all these criteria are based on an ordinary linear regression model which aims to minimize the empirical error whereas the geometrical structure of the data space is not well respected. In this paper, we propose a novel optimal experimental design approach for active learning, called Laplacian G-Optimal Design (LapGOD), which considers both discriminating and geometrical structures. By using Laplacian Regularized Least Squares which incorporates manifold regularization into linear regression, our proposed algorithm selects those data points that minimizes the maximum variance of the predicted values on the data manifold. We also extend our algorithm to nonlinear case by using kernel trick. The experimental results on various image databases have shown that our proposed LapGOD active learning algorithm can significantly enhance the classification accuracy if the selected data points are used as training data.
Adaptive Transfer Learning
Cao, Bin (The Hong Kong University of Science and Technology) | Pan, Sinno Jialin (The Hong Kong University of Science and Technology) | Zhang, Yu (The Hong Kong University of Science and Technology) | Yeung, Dit-Yan (The Hong Kong University of Science and Technology) | Yang, Qiang (The Hong Kong University of Science and Technology)
Transfer learning aims at reusing the knowledge in some source tasks to improve the learning of a target task. Many transfer learning methods assume that the source tasks and the target task be related, even though many tasks are not related in reality. However, when two tasks are unrelated, the knowledge extracted from a source task may not help, and even hurt, the performance of a target task. Thus, how to avoid negative transfer and then ensure a "safe transfer" of knowledge is crucial in transfer learning. In this paper, we propose an Adaptive Transfer learning algorithm based on Gaussian Processes (AT-GP), which can be used to adapt the transfer learning schemes by automatically estimating the similarity between a source and a target task. The main contribution of our work is that we propose a new semi-parametric transfer kernel for transfer learning from a Bayesian perspective, and propose to learn the model with respect to the target task, rather than all tasks as in multi-task learning. We can formulate the transfer learning problem as a unified Gaussian Process (GP) model. The adaptive transfer ability of our approach is verified on both synthetic and real-world datasets.
Myopic Policies for Budgeted Optimization with Constrained Experiments
Azimi, Javad (Oregon State University) | Fern, Xiaoli (Oregon State University) | Fern, Alan (Oregon State University) | Burrows, Elizabeth (Oregon State University) | Chaplen, Frank (Oregon State University) | Fan, Yanzhen (Oregon State University) | Liu, Hong (Oregon State University) | Jaio, Jun (Portland State University) | Schaller, Rebecca (Portland State University)
Motivated by a real-world problem, we study a novel budgeted optimization problem where the goal is to optimize an unknown function f ( x ) given a budget. In our setting, it is not practical to request samples of f ( x ) at precise input values due to the formidable cost of precise experimental setup. Rather, we may request a constrained experiment, which is a subset r of the input space for which the experimenter returns x in r and f ( x ). Importantly, as the constraints become looser, the experimental cost decreases, but the uncertainty about the location x of the next observation increases. Our goal is to manage this trade-off by selecting a sequence of constrained experiments to best optimize f within the budget. We introduce cost-sensitive policies for selecting constrained experiments using both model-free and model-based approaches, inspired by policies for unconstrained settings. Experiments on synthetic functions and functions derived from real-world experimental data indicate that our policies outperform random selection, that the model-based policies are superior to model-free ones, and give insights into which policies are preferable overall.