Goto

Collaborating Authors

 Statistical Learning


Constrained NMF-Based Multi-View Clustering on Unmapped Data

AAAI Conferences

We use the disagreement between the Multi-view clustering gains increasing attention in the past views to guide the factorization of the matrices. The overall decade (Bickel and Scheffer 2004) (Kumar and III 2011) objective of our algorithm is to minimize the loss function of (Kumar, Rai, and III 2011) (Liu et al. 2013) (Blaschko and NMF in each view as well as the disagreement between each Lampert 2008) (Chaudhuri et al. 2009) (Tzortzis and Likas pair of views. Experimental results show that, with a small 2012). Most existing multi-view clustering algorithms require number of constraints, the proposed CMVNMF (Constrained that the data is completely mapped, i.e., every object Multi-View clustering based on NMF) algorithm gets good has representations in all the views, representations from different performance on unmapped data, and outperforms existing views representing a same object are exactly known, algorithms on partially mapped data and completely mapped and the representations of the same object have the same data.


Online Dictionary Learning on Symmetric Positive Definite Manifolds with Vision Applications

AAAI Conferences

Symmetric Positive Definite (SPD) matrices in the form of region covariances are considered rich descriptors for images and videos. Recent studies suggest that exploiting the Riemannian geometry of the SPD manifolds could lead to improved performances for vision applications. For tasks involving processing large-scale and dynamic data in computer vision, the underlying model is required to progressively and efficiently adapt itself to the new and unseen observations. Motivated by these requirements, this paper studies the problem of online dictionary learning on the SPD manifolds. We make use of the Stein divergence to recast the problem of online dictionary learning on the manifolds to a problem in Reproducing Kernel Hilbert Spaces, for which, we develop efficient algorithms by taking into account the geometric structure of the SPD manifolds. To our best knowledge, our work is the first study that provides a solution for online dictionary learning on the SPD manifolds. Empirical results on both large-scale image classification task and dynamic video processing tasks validate the superior performance of our approach as compared to several state-of-the-art algorithms.


Online Bandit Learning for a Special Class of Non-Convex Losses

AAAI Conferences

In online bandit learning, the learner aims to minimize a sequence of losses, while only observing the value of each loss at a single point. Although various algorithms and theories have been developed for online bandit learning, most of them are limited to convex losses. In this paper, we investigate the problem of online bandit learning with non-convex losses, and develop an efficient algorithm with formal theoretical guarantees. To be specific, we consider a class of losses which is a composition of a non-increasing scalar function and a linear function. This setting models a wide range of supervised learning applications such as online classification with a non-convex loss. Theoretical analysis shows that our algorithm achieves an O(poly(d)T2/3) regret bound when the variation of the loss function is small. To the best of our knowledge, this is the first work in online bandit learning that does not rely on convexity.


Non-Linear Regression for Bag-of-Words Data via Gaussian Process Latent Variable Set Model

AAAI Conferences

Gaussian process (GP) regression is a widely used method for non-linear prediction.The performance of the GP regression depends on whether it can properly capture the covariance structure of target variables, which is represented by kernels between input data.However, when the input is represented as a set of features, e.g. bag-of-words, it is difficult to calculate desirable kernel values because the co-occurrence of different but relevant words cannot be reflected in the kernel calculation.To overcome this problem, we propose a Gaussian process latent variable set model (GP-LVSM), which is a non-linear regression model effective for bag-of-words data.With the GP-LVSM, a latent vector is associated with each word, and each document is represented as a distribution of the latent vectors for words appearing in the document. We efficiently represent the distributions by using the framework of kernel embeddings of distributions that can hold high-order moment information of distributions without need for explicit density estimation.By learning latent vectors so as to maximize the posterior probability, kernels that reflect relations between words are obtained, and also words are visualized in a low-dimensional space.In experiments using 25 item review datasets, we demonstrate the effectiveness of the GP-LVSM in prediction and visualization.


Dictionary Learning with Mutually Reinforcing Group-Graph Structures

AAAI Conferences

In this paper, we propose a novel dictionary learning method in the semi-supervised setting by dynamically coupling graph and group structures. To this end, samples are represented by sparse codes inheriting their graph structure while the labeled samples within the same class are represented with group sparsity, sharing the same atoms of the dictionary. Instead of statically combining graph and group structures, we take advantage of them in a mutually reinforcing way โ€” in the dictionary learning phase, we introduce the unlabeled samples into groups by an entropy-based method and then update the corresponding local graph, resulting in a more structured and discriminative dictionary. We analyze the relationship between the two structures and prove the convergence of our proposed method. Focusing on image classification task, we evaluate our approach on several datasets and obtain superior performance compared with the state-of-the-art methods, especially in the case of only a few labeled samples and limited dictionary size.


Optimal Estimation of Multivariate ARMA Models

AAAI Conferences

A central problem in applied data analysis is time series In this paper, we develop a tractable approach to maximum modeling--estimating and forecasting a discrete-time likelihood parameter estimation for stochastic multivariate stochastic process--for which the autoregressive moving ARMA models. To efficiently compute a globally average (ARMA) and stochastic ARMA (Thiesson et al. optimal estimate, the problem is re-expressed as a regularized 2012) are fundamental models. An ARMA model describes loss minimization, which then allows recent algorithmic the behavior of a linear dynamical system under advances in sparse estimation to be applied (Shah et al. latent Gaussian perturbations (Brockwell and Davis 2002; 2012; Candes et al. 2011; Bach, Mairal, and Ponce 2008; Lรผtkepohl 2007), which affords intuitive modeling capability, Zhang et al. 2011; White et al. 2012). Although there has efficient forecasting algorithms, and a close relationship been recent progress in global estimation for ARMA, such to linear Gaussian state-space models (Katayama 2006, approaches have either been restricted to single-input singleoutput pp.5-6).


Transfer Feature Representation via Multiple Kernel Learning

AAAI Conferences

Learning an appropriate feature representation across source and target domains is one of the most effective solutions to domain adaptation problems. Conventional cross-domain feature learning methods rely on the Reproducing Kernel Hilbert Space (RKHS) induced by a single kernel. Recently, Multiple Kernel Learning (MKL), which bases classifiers on combinations of kernels, has shown improved performance in the tasks without distribution difference between domains. In this paper, we generalize the framework of MKL for cross-domain feature learning and propose a novel Transfer Feature Representation (TFR) algorithm. TFR learns a convex combination of multiple kernels and a linear transformation in a single optimization which integrates the minimization of distribution difference with the preservation of discriminating power across domains. As a result, standard machine learning models trained in the source domain can be reused for the target domain data. After rewritten into a differentiable formulation, TFR can be optimized by a reduced gradient method and reaches the convergence. Experiments in two real-world applications verify the effectiveness of our proposed method.


Learning to Hash on Structured Data

AAAI Conferences

Hashing techniques have been widely applied for large scale similarity search problems due to the computational and memory efficiency.However, most existing hashing methods assume data examples are independently and identically distributed.But there often exists various additional dependency/structure information between data examplesin many real world applications. Ignoring this structure information may limit theperformance of existing hashing algorithms.This paper explores the research problemof learning to Hash on Structured Data (HSD) and formulates anovel framework that considers additional structure information.In particular, the hashing function is learned in a unified learning framework by simultaneously ensuring the structural consistency and preserving the similarities between data examples.An iterative gradient descent algorithm is designed as the optimization procedure. Furthermore, we improve the effectiveness of hashing function through orthogonal transformation by minimizing the quantization error.Experimentalresults on two datasets clearly demonstrate the advantages ofthe proposed method over several state-of-the-art hashing methods.


Learning Robust Locality Preserving Projection via p-Order Minimization

AAAI Conferences

Locality preserving projection (LPP) is an effective dimensionality reduction method based on manifold learning, which is defined over the graph weighted squared L2-norm distances in the projected subspace. Since squared L2-norm distance is prone to outliers, it is desirable to develop a robust LPP method. In this paper, motivated by existing studies that improve the robustness of statistical learning models via L1-norm or not-squared L2-norm formulations, we propose a robust LPP (rLPP) formulation to minimize the p-th order of the L2-norm distances, which can better tolerate large outlying data samples because it suppress the introduced biased more than the L1-norm or not squared L2-norm minimizations. However, solving the formulated objective is very challenging because it not only non-smooth but also non-convex. As an important theoretical contribution of this work, we systematically derive an efficient iterative algorithm to solve the general p-th order L2-norm minimization problem, which, to the best of our knowledge, is solved for the first time in literature. Extensive empirical evaluations on the proposed rLPP method have been performed, in which our new method outperforms the related state-of-the-art methods in a variety of experimental settings and demonstrate its effectiveness in seeking better subspaces on both noiseless and noisy data.


Improving Multi-Step Prediction of Learned Time Series Models

AAAI Conferences

Most typical statistical and machine learning approaches to time series modeling optimize a single-step prediction error. In multiple-step simulation, the learned model is iteratively applied, feeding through the previous output as its new input. Any such predictor however, inevitably introduces errors, and these compounding errors change the input distribution for future prediction steps, breaking the train-test i.i.d assumption common in supervised learning. We present an approach that reuses training data to make a no-regret learner robust to errors made during multi-step prediction. Our insight is to formulate the problem as imitation learning; the training data serves as a "demonstrator" by providing corrections for the errors made during multi-step prediction. By this reduction of multi-step time series prediction to imitation learning, we establish theoretically a strong performance guarantee on the relation between training error and the multi-step prediction error. We present experimental results of our method, DaD, and show significant improvement over the traditional approach in two notably different domains, dynamic system modeling and video texture prediction.