Asia
Multi-Label Classification with Feature-Aware Non-Linear Label Space Transformation
Li, Xin (Temple University) | Guo, Yuhong (Temple University)
Multi-label classification with many classes has recently drawn a lot of attention. Existing methods address this problem by performing linear label space transformation to reduce the dimension of label space, and then conducting independent regression for each reduced label dimension. These methods however do not capture nonlinear correlations of the multiple labels and may lead to significant information loss in the process of label space reduction. In this paper, we first propose to exploit kernel canonical correlation analysis (KCCA) to capture nonlinear label correlation information and perform nonlinear label space reduction. Then we develop a novel label space reduction method that explicitly combines linear and nonlinear label space transformations based on CCA and KCCA respectively to address multi-label classification with many classes. The proposed method is a feature-aware label transformation method that promotes the label predictability in the transformed label space from the input features. We conduct experiments on a number of multi-label classification datasets. The proposed approach demonstrates good performance, comparing to a number of state-of-the-art label dimension reduction methods.
Data Sparseness in Linear SVM
Li, Xiang (University of Western Ontario and National University of Defense Technology) | Wang, Huaimin (National University of Defense Technology) | Gu, Bin (Nanjing University of Information Science Technologyย and University of Western Ontario) | Ling, Charles X. (University of Western Ontario)
Large sparse datasets are common in many real-world applications. Linear SVM has been shown to be very efficient for classifying such datasets. However, it is still unknown how data sparseness would affect its convergence behavior. To study this problem in a systematic manner, we propose a novel approach to generate large and sparse data from real-world datasets, using statistical inference and the data sampling process in the PAC framework. We first study the convergence behavior of linear SVM experimentally, and make several observations, useful for real-world applications. We then offer theoretical proofs for our observations by studying the Bayes risk and PAC bound. Our experiment and theoretic results are valuable for learning large sparse datasets with linear SVM.
Bayesian Active Learning for Posterior Estimation
Kandasamy, Kirthevasan (Carnegie Mellon University) | Schneider, Jeff (Carnegie Mellon University) | Poczos, Barnabas (Carnegie Mellon University)
This paper studies active posterior estimation in a Bayesian setting when the likelihood is expensive to evaluate. Existing techniques for posterior estimation are based on generating samples representative of the posterior. Such methods do not consider efficiency in terms of likelihood evaluations. In order to be query efficient we treat posterior estimation in an active regression framework. ย We propose two myopic query strategies to choose where to evaluate the likelihood and implement them using Gaussian processes. Via experiments on a series of synthetic and real examples we demonstrate that our approach is significantly more query efficient than existing techniques and other heuristics for posterior estimation.
Training-Time Optimization of a Budgeted Booster
Huang, Yi (University of Illinios at Chicago) | Powers, Brian (University of Illinios at Chicago) | Reyzin, Lev (University of Illinios at Chicago)
We consider the problem of feature-efficient prediction - a setting where features have costs and the learner is limited by a budget constraint on the total cost of the features it can examine in test time. We focus on solving this problem with boosting by optimizing the choice of base learners in the training phase and stopping the boosting process when the learner's budget runs out. We experimentally show that our method improves upon the boosting approach AdaBoostRS [Reyzin, 2011] and in many cases also outperforms the recent algorithm SpeedBoost [Grubb and Bagnell, 2012]. We provide a theoretical justication for our optimization method via the margin bound. We also experimentally show that our method outperforms pruned decision trees, a natural budgeted classifier.
Scalable Gaussian Process Regression Using Deep Neural Networks
Huang, Wenbing (Tsinghua University) | Zhao, Deli (HTC Research) | Sun, Fuchun (Tsinghua University) | Liu, Huaping (Tsinghua University) | Chang, Edward (HTC Research)
We propose a scalable Gaussian process model for regression by applying a deep neural network as the feature-mapping function. We first pre-train the deep neural network with a stacked denoising auto-encoder in an unsupervised way. Then, we perform a Bayesian linear regression on the top layer of the pre-trained deep network. The resulting model, Deep-Neural-Network-based Gaussian Process (DNN-GP), can learn much more meaningful representation of the data by the finite-dimensional but deep-layered feature-mapping function. Unlike standard Gaussian processes, our model scales well with the size of the training set due to the avoidance of kernel matrix inversion. Moreover, we present a mixture of DNN-GPs to further improve the regression performance. For the experiments on three representative large datasets, our proposed models significantly outperform the state-of-the-art algorithms of Gaussian process regression.
A New Simplex Sparse Learning Model to Measure Data Similarity for Clustering
Huang, Jin (University of Texas at Arlington) | Nie, Feiping (University of Texas at Arlington) | Huang, Heng (University of Texas at Arlington)
The Laplacian matrix of a graph can be used in many areas of mathematical research and has a physical interpretation in various theories. However, there are a few open issues in the Laplacian graph construction: (i) Selecting the appropriate scale of analysis, (ii) Selecting the appropriate number of neighbors, (iii) Handling multiscale data, and, (iv) Dealing with noise and outliers. In this paper, we propose that the affinity between pairs of samples could be computed using sparse representation with proper constraints. This parameter free setting automatically produces the Laplacian graph, leads to significant reduction in computation cost and robustness to the outliers and noise. We further provide an efficient algorithm to solve the difficult optimization problem based on improvement of existing algorithms. To demonstrate our motivation, we conduct spectral clustering experiments with benchmark methods. Empirical experiments on 9 data sets demonstrate the effectiveness of our method.
Identification of Time-Dependent Causal Model: A Gaussian Process Treatment
Huang, Biwei (Max Planck Institute for Intelligent Systems) | Zhang, Kun (University of Southern California) | Schรถlkopf, Bernhard (Max Planck Institute for Intelligent Systems)
Most approaches to causal discovery assume a fixed (or time-invariant) causal model; however, in practical situations, especially in neuroscience and economics, causal relations might be time-dependent for various reasons. This paper aims to identify the time-dependent causal relations from observational data. We consider general formulations for time-varying causal modeling on stochastic processes, which can also capture the causal influence from a certain type of unobserved confounders. ย We focus on two issues: one is whether such a causal model, including the causal direction, is identifiable from observational data; the other is how to estimate such a model in a principled way. We show that under appropriate assumptions, the causal structure is identifiable according to our formulated model. We then propose a principled way for its estimation by extending Gaussian Process regression, which enables an automatic way to learn how the causal model changes over time. Experimental results on both artificial and real data demonstrate the practical usefulness of time-dependent causal modeling and the effectiveness of the proposed approach for estimation.
Robust Subspace Segmentation by Simultaneously Learning Data Representations and Their Affinity Matrix
Guo, Xiaojie (Chinese Academy of Sciecnces)
The goal of subspace segmentation is to partition a set of data drawn from a union of subspace into their underlying subspaces. The performance of ย spectral clustering based approaches heavily depends on learned data affinity matrices, which are usually constructed either directly from the raw data or from their computed representations. In this paper, we propose a novel method to simultaneously learn the representations of data and the affinity matrix of representation in a unified optimization framework. A novel Augmented Lagrangian Multiplier based algorithm is designed to effectively and efficiently seek the optimal solution of the problem. The experimental results on both synthetic and real data demonstrate the efficacy of the proposed method and its superior performance over the state-of-the-art alternatives.
Online Robust Low Rank Matrix Recovery
Guo, Xiaojie (Chinese Academy of Sciences)
Low rank matrix recovery has shown its importance as a theoretic foundation in many areas of information processing. Its solutions are usually obtained in batch mode that requires to load all the data into memory during processing, and thus are hardly applicable on large scale data. Moreover, a fraction of data may be severely contaminated by outliers, which makes accurate recovery significantly more challenging. This paper proposes a novel online robust low rank matrix recovery method to address these difficulties. In particular, we first introduce an online algorithm to solve the problem of low rank matrix completion. Then we move on to low rank matrix recovery from observations with intensive outliers. The outlier support is robustly estimated from a perspective of mixture model. Experiments on both synthetic and real data are conducted to demonstrate the efficacy of our method and show its superior performance over the state-of-the-arts.
Bi-Parameter Space Partition for Cost-Sensitive SVM
Gu, Bin (Nanjing University of Information Science and Technology) | Sheng, Victor S. (University of Central Arkansas) | Li, Shuo (GE HealthCare)
Model selection is an important problem of cost-sensitive SVM (CS-SVM). Although using solution path to find global optimal parameters is a powerful method for model selection, it is a challenge to extend the framework to solve two regularization parameters of CS-SVM simultaneously. To overcome this challenge, we make three main steps in this paper. (i) A critical-regions-based bi-parameter space partition algorithm is proposed to present all piecewise linearities of CS-SVM. (ii) An invariant-regions-based bi-parameter space partition algorithm is further proposed to compute empirical errors for all parameter pairs. (iii) The global optimal solutions for K-fold cross validation are computed by superposing K invariant region based bi-parameter space partitions into one. The three steps constitute the model selection of CS-SVM which can find global optimal parameter pairs in K-fold cross validation. Experimental results on seven normal datsets and four imbalanced datasets, show that our proposed method has better generalization ability and than various kinds of grid search methods, however, with less running time.