Goto

Collaborating Authors

 Asia


Exact Recoverability of Robust PCA via Outlier Pursuit with Tight Recovery Bounds

AAAI Conferences

Subspace recovery from noisy or even corrupted data is critical for various applications in machine learning and data analysis. To detect outliers, Robust PCA (R PCA) via Outlier Pursuit was proposed and had found many successful applications. However, the current theoretical analysis on Outlier Pursuit only shows that it succeeds when the sparsity of the corruption matrix is of O(n/r), where n is the number of the samples and r is the rank of the intrinsic matrix which may be comparable to n. Moreover, the regularization parameter is suggested as 3/(7 squareroot gamma n}, where gamma is a parameter that is not known a priori. In this paper, with incoherence condition and proposed ambiguity condition we prove that Outlier Pursuit succeeds when the rank of the intrinsic matrix is of O(n log n) and the sparsity of the corruption matrix is of O(n). We further show that the orders of both bounds are tight. Thus R-PCA via Outlier Pursuit is able to recover intrinsic matrix of higher rank and identify much denser corruptions than what the existing results could predict. Moreover, we suggest that the regularization parameter be chosen as 1 squareroot{log n}, which is definite. Our analysis waives the necessity of tuning the regularization parameter and also significantly extends the working range of the Outlier Pursuit. Experiments on synthetic and real data verify our theories.


A Mathematical Programming-Based Approach to Determining Objective Functions from Qualitative and Subjective Comparisons

AAAI Conferences

The solutions or states of optimization problems or simulations are evaluated by using objective functions. The weights for these objective functions usually have to be estimated from experts' evaluations, which are likely to be qualitative and somewhat subjective. Although such estimation tasks are normally regarded as quite suitable for machine learning, we propose a mathematical programming-based method for better estimation. The key idea of our method is to use an ordinal scale for measuring paired differences of the objective values as well as the paired objective values. By using an ordinal scale, experts' qualitative and subjective evaluations can be appropriately expressed with simultaneous linear inequalities, and which can be handled by a mathematical programming solver. This allows us to extract more information from experts' evaluations compared to machine-learning-based algorithms, which increases the accuracy of our estimation. We show that our method outperforms machine-learning-based algorithms in a test of finding appropriate weights for an objective function.


Non-Linear Regression for Bag-of-Words Data via Gaussian Process Latent Variable Set Model

AAAI Conferences

Gaussian process (GP) regression is a widely used method for non-linear prediction.The performance of the GP regression depends on whether it can properly capture the covariance structure of target variables, which is represented by kernels between input data.However, when the input is represented as a set of features, e.g. bag-of-words, it is difficult to calculate desirable kernel values because the co-occurrence of different but relevant words cannot be reflected in the kernel calculation.To overcome this problem, we propose a Gaussian process latent variable set model (GP-LVSM), which is a non-linear regression model effective for bag-of-words data.With the GP-LVSM, a latent vector is associated with each word, and each document is represented as a distribution of the latent vectors for words appearing in the document. We efficiently represent the distributions by using the framework of kernel embeddings of distributions that can hold high-order moment information of distributions without need for explicit density estimation.By learning latent vectors so as to maximize the posterior probability, kernels that reflect relations between words are obtained, and also words are visualized in a low-dimensional space.In experiments using 25 item review datasets, we demonstrate the effectiveness of the GP-LVSM in prediction and visualization.


Dictionary Learning with Mutually Reinforcing Group-Graph Structures

AAAI Conferences

In this paper, we propose a novel dictionary learning method in the semi-supervised setting by dynamically coupling graph and group structures. To this end, samples are represented by sparse codes inheriting their graph structure while the labeled samples within the same class are represented with group sparsity, sharing the same atoms of the dictionary. Instead of statically combining graph and group structures, we take advantage of them in a mutually reinforcing way — in the dictionary learning phase, we introduce the unlabeled samples into groups by an entropy-based method and then update the corresponding local graph, resulting in a more structured and discriminative dictionary. We analyze the relationship between the two structures and prove the convergence of our proposed method. Focusing on image classification task, we evaluate our approach on several datasets and obtain superior performance compared with the state-of-the-art methods, especially in the case of only a few labeled samples and limited dictionary size.


Transfer Feature Representation via Multiple Kernel Learning

AAAI Conferences

Learning an appropriate feature representation across source and target domains is one of the most effective solutions to domain adaptation problems. Conventional cross-domain feature learning methods rely on the Reproducing Kernel Hilbert Space (RKHS) induced by a single kernel. Recently, Multiple Kernel Learning (MKL), which bases classifiers on combinations of kernels, has shown improved performance in the tasks without distribution difference between domains. In this paper, we generalize the framework of MKL for cross-domain feature learning and propose a novel Transfer Feature Representation (TFR) algorithm. TFR learns a convex combination of multiple kernels and a linear transformation in a single optimization which integrates the minimization of distribution difference with the preservation of discriminating power across domains. As a result, standard machine learning models trained in the source domain can be reused for the target domain data. After rewritten into a differentiable formulation, TFR can be optimized by a reduced gradient method and reaches the convergence. Experiments in two real-world applications verify the effectiveness of our proposed method.


Learning to Hash on Structured Data

AAAI Conferences

Hashing techniques have been widely applied for large scale similarity search problems due to the computational and memory efficiency.However, most existing hashing methods assume data examples are independently and identically distributed.But there often exists various additional dependency/structure information between data examplesin many real world applications. Ignoring this structure information may limit theperformance of existing hashing algorithms.This paper explores the research problemof learning to Hash on Structured Data (HSD) and formulates anovel framework that considers additional structure information.In particular, the hashing function is learned in a unified learning framework by simultaneously ensuring the structural consistency and preserving the similarities between data examples.An iterative gradient descent algorithm is designed as the optimization procedure. Furthermore, we improve the effectiveness of hashing function through orthogonal transformation by minimizing the quantization error.Experimentalresults on two datasets clearly demonstrate the advantages ofthe proposed method over several state-of-the-art hashing methods.


Convex Batch Mode Active Sampling via α-Relative Pearson Divergence

AAAI Conferences

Active learning is a machine learning technique that trains a classifier after selecting a subset from an unlabeled dataset for labeling and using the selected data for training. Recently, batch mode active learning, which selects a batch of samples to label in parallel, has attracted a lot of attention. Its challenge lies in the choice of criteria used for guiding the search of the optimal batch. In this paper, we propose a novel approach to selecting the optimal batch of queries by minimizing the α-relative Pearson divergence (RPE) between the labeled and the original datasets. This particular divergence is chosen since it can distinguish the optimal batch more easily than other measures especially when available candidates are similar. The proposed objective is a min-max optimization problem, and it is difficult to solve due to the involvement of both minimization and maximization. We find that the objective has an equivalent convex form, and thus a global optimal solution can be obtained. Then the subgradient method can be applied to solve the simplified convex problem. Our empirical studies on UCI datasets demonstrate the effectiveness of the proposed approach compared with the state-of-the-art batch mode active learning methods.


Gaussian Cardinality Restricted Boltzmann Machines

AAAI Conferences

Restricted Boltzmann Machine (RBM) has been applied to a wide variety of tasks due to its advantage in feature extraction. Implementing sparsity constraint in the activated hidden units of RBM is an important improvement on RBM. The sparsity constraints in the existing methods are usually specified by users and are independent of the input data. However, the input data could be heterogeneous in content and thus naturally demand elastic and adaptive settings of the sparsity constraints. To solve this problem, we proposed a generalized model with adaptive sparsity constraint, named Gaussian Cardinality Restricted Boltzmann Machines (GC-RBM). In this model, the thresholds of hidden unit activations are decided by the input data and a given Gaussian distribution on the pre-training phase. We provide a principled method to train the GC-RBM with Gaussian prior. Experimental results on two real world data sets justify the effectiveness of the proposed method and its superiority over CaRBM in terms of classification accuracy.


Optimizing the CVaR via Sampling

AAAI Conferences

Conditional Value at Risk (CVaR) is a prominent risk measure that is being used extensively in various domains. We develop a new formula for the gradient of the CVaR in the form of a conditional expectation. Based on this formula, we propose a novel sampling-based estimator for the gradient of the CVaR, in the spirit of the likelihood-ratio method. We analyze the bias of the estimator, and prove the convergence of a corresponding stochastic gradient descent algorithm to a local CVaR optimum. Our method allows to consider CVaR optimization in new domains. As an example, we consider a reinforcement learning application, and learn a risk-sensitive controller for the game of Tetris.


Unsupervised Feature Learning through Divergent Discriminative Feature Accumulation

AAAI Conferences

The increasing realization in recent years that artificial In particular, there is an alternative kind of discriminative neural networks (ANNs) can learn many layers of features learning that is unsupervised rather than supervised. In this (Bengio et al. 2007; Hinton, Osindero, and Teh 2006; proposed alternative approach, called divergent discriminative Marc'Aurelio, Boureau, and LeCun 2007; Cireşan et al. feature accumulation (DDFA), instead of searching for 2010) has reinvigorated the study of representation learning features constrained by the objective of solving the discriminative in ANNs (Bengio, Courville, and Vincent 2013). While classification problem, a learning algorithm can instead the beginning of this renaissance focused on the sequential attempt to collect as many features that discriminate unsupervised training of individual layers one upon another strongly among training examples as possible, without regard (Bengio et al. 2007; Hinton, Osindero, and Teh 2006), the to any particular classification problem.