Goto

Collaborating Authors

 Statistical Learning


Deep Linear Coding for Fast Graph Clustering

AAAI Conferences

Clustering has been one of the most critical unsupervised learning techniques that has been widely applied in data mining problems. As one of its branches, graph clustering enjoys its popularity due to its appealing performance and strong theoretical supports. However, the eigen-decomposition problems involved are computationally expensive. In this paper, we propose a deep structure with a linear coder as the building block for fast graph clustering, called Deep Linear Coding (DLC). Different from conventional coding schemes, we jointly learn the feature transform function and discriminative codings, and guarantee that the learned codes are robust in spite of local distortions. In addition, we use the proposed linear coders as the building blocks to formulate a deep structure to further refine features in a layerwise fashion. Extensive experiments on clustering tasks demonstrate that our method performs well in terms of both time complexity and clustering accuracy. On a large-scale benchmark dataset (580K), our method runs 1500 times faster than the original spectral clustering.


Scalable Probabilistic Tensor Factorization for Binary and Count Data

AAAI Conferences

Tensor factorization methods provide a useful way to extract latent factors from complex multirelational data, and also for predicting missing data. Developing tensor factorization methods for massive tensors, especially when the data are binary- or count-valued (which is true of most real-world tensors), however, remains a challenge. We develop a scalable probabilistic tensor factorization framework that enables us to perform efficient factorization of massive binary and count tensor data. The framework is based on (i) the Polya-Gamma augmentation strategy which makes the model fully locally conjugate and allows closed-form parameter updates when data are binary- or count-valued; and (ii) an efficient online Expectation Maximization algorithm, which allows processing data in small mini-batches, and facilitates handling massive tensor data. Moreover, various types of constraints on the factor matrices (e.g., sparsity, non-negativity) can be incorporated under the proposed framework, providing good interpretability, which can be useful for qualitative analyses of the results. We apply the proposed framework on analyzing several binary- and count-valued real-world data sets.


EigenGP: Gaussian Process Models with Adaptive Eigenfunctions

AAAI Conferences

Gaussian processes (GPs) provide a nonparametric representation of functions. However, classical GP inference suffers from high computational cost for big data. In this paper, we propose a new Bayesian approach, EigenGP, that learns both basis dictionary elements — eigenfunctions of a GP prior — and prior precisions in a sparse finite model. It is well known that, among all orthogonal basis functions, eigenfunctions can provide the most compact representation. Unlike other sparse Bayesian finite models where the basis function has a fixed form, our eigenfunctions live in a reproducing kernel Hilbert space as a finite linear combination of kernel functions. We learn the dictionary elements — eigenfunctions — and the prior precisions over these elements as well as all the other hyperparameters from data by maximizing the model marginal likelihood. We explore computational linear algebra to simplify the gradient computation significantly. Our experimental results demonstrate improved predictive performance of EigenGP over alternative sparse GP methods as well as relevance vector machines.


Image Feature Learning for Cold Start Problem in Display Advertising

AAAI Conferences

In online display advertising, state-of-the-art Click Through Rate(CTR) prediction algorithms rely heavily on historical information, and they work poorly on growing number of new ads without any historical information. This is known as the the cold start problem. For image ads, current state-of-the-art systems use handcrafted image features such as multimedia features and SIFT features to capture the attractiveness of ads. However, these handcrafted features are task dependent, inflexible and heuristic. In order to tackle the cold start problem in image display ads, we propose a new feature learning architecture to learn the most discriminative image features directly from raw pixels and user feedback in the target task. The proposed method is flexible and does not depend on human heuristic. Extensive experiments on a real world dataset with 47 billion records show that our feature learning method outperforms existing handcrafted features significantly, and it can extract discriminative and meaningful features.


Optimizing Locally Linear Classifiers with Supervised Anchor Point Learning

AAAI Conferences

Kernel SVM suffers from high computational complexity when dealing with large-scale nonlinear datasets. To address this issue, locally linear classifiers have been proposed for approximating nonlinear decision boundaries with locally linear functions using a local coding scheme. The effectiveness of such coding scheme depends heavily on the quality of anchor points chosen to produce the local codes. Existing methods usually involve a phase of unsupervised anchor point learning followed by supervised classifier learning. Thus, the anchor points and classifiers are obtained separately whereas the learned anchor points may not be optimal for the discriminative task. In this paper, we present a novel fully supervised approach for anchor point learning. A single optimization problem is formulated over both anchor point and classifier variables, optimizing the initial anchor points jointly with the classifiers to minimize the classification risk. Experimental results show that our method outperforms other competitive methods which employ unsupervised anchor point learning and achieves performance on par with the kernel SVM albeit with much improved efficiency.


Between Imitation and Intention Learning

AAAI Conferences

Research in learning from demonstration can generally be grouped into either imitation learning or intention learning. In imitation learning, the goal is to imitate the observed behavior of an expert and is typically achieved using supervised learning techniques. In intention learning, the goal is to learn the intention that motivated the expert's behavior and to use a planning algorithm to derive behavior. Imitation learning has the advantage of learning a direct mapping from states to actions, which bears a small computational cost. Intention learning has the advantage of behaving well in novel states, but may bear a large computational cost by relying on planning algorithms in complex tasks. In this work, we introduce receding horizon inverse reinforcement learning, in which the planning horizon induces a continuum between these two learning paradigms. We present empirical results on multiple domains that demonstrate that performing IRL with a small, but non-zero, receding planning horizon greatly decreases the computational cost of planning while maintaining superior generalization performance compared to imitation learning.


Multi-Task Multi-Dimensional Hawkes Processes for Modeling Event Sequences

AAAI Conferences

We propose a Multi-task Multi-dimensional Hawkes Process (MMHP) for modeling event sequences where there exist multiple triggering patterns within sequences and structures across sequences.MMHP is able to model the dynamics of multiple sequences jointly by imposing structural constraints and thus systematically uncover clustering structure among sequences.We propose an effective and robust optimization algorithm to learn MMHP models, which takes advantage of alternating direction method of multipliers (ADMM), majorization minimization and Euler-Lagrange equations.Our experimental results demonstrate that MMHP performs well on both synthetic and real data


Robust Kernel Dictionary Learning Using a Whole Sequence Convergent Algorithm

AAAI Conferences

Kernel sparse coding is an effective strategy to capturethe non-linear structure of data samples. However,how to learn a robust kernel dictionary remainsan open problem. In this paper, we propose a new optimization model to learn the robust kernel dictionary while isolating outliers in the training samples. This model is essentially based on the decomposition of the reconstruction error into small dense noises and large sparse outliers. The outliererror term is formulated as the product of the sample matrix in the feature space and a diagonal coefficient matrix. This facilitates the kernelized dictionary learning. To solve the non-convex optimization problem, we develop a whole sequence convergent algorithm which guarantees the obtained solution sequence is a Cauchy sequence. The experimental results show that the proposed robust kernel dictionary learning method provides significant performance improvement.


Regularizing Flat Latent Variables with Hierarchical Structures

AAAI Conferences

In this paper, we propose a stratified topic model (STM). Instead of directly modeling and inferring flat topics or hierarchically structured topics, we use the stratified relationships in topic hierarchies to regularize the flat topics. The topic structures are captured by a hierarchical clustering method and play as constraints during the learning process. We propose two theoretically sound and practical inference methods to solve the model. Experimental results with two real world data sets and various evaluation metrics demonstrate the effectiveness of the proposed model.


Density Corrected Sparse Recovery when R.I.P. Condition Is Broken

AAAI Conferences

Traditional methods which the features form cluster structures, as can be seen in often rely on R.I.P or its relaxed variants. However, many machine learning [Lehiste, 1976] and computer vision in real applications, features are often correlated problems [Lan et al., 2013; Lowe, 2004]. Due to the fact that to each other, which makes these assumptions many features extractors are similar to each others and they too strong to be useful. In this paper, we reflect the characteristics of the same image, vision features study the sparse recovery problem in which the feature are often correlated and have cluster structures. This correlation matrix is strictly non-R.I.P.. We prove that is even stronger in those systems that have thousands when features exhibit cluster structures, which often to millions of features [Lan et al., 2013; Gan et al., 2015a; happens in real applications, we are able to recover 2015b].