 Statistical Learning


Semantic Data Representation for Improving Tensor Factorization

AAAI Conferences

Predicting human activities is important for improving recommender systems or analyzing social relationships among users. Those human activities are usually represented as multi-object relationships (e.g. user's tagging activities for items or user's tweeting activities at some locations). Since multi-object relationships are naturally represented as a tensor, tensor factorization is becoming more important for predicting users' possible activities. However, its prediction accuracy is weak for ambiguous and/or sparsely observed objects. Our solution, Semantic data Representation for Tensor Factorization (SRTF), tackles these problems by incorporating semantics into tensor factorization based on the following ideas: (1) It first links objects to vocabularies/taxonomies and resolves the ambiguity caused by objects that can be used for multiple purposes. (2) It next links objects to composite classes that merge classes in different kinds of vocabularies/taxonomies (e.g. classes in vocabularies for movie genres and those for directors) to avoid low prediction accuracy caused by rough-grained semantics. (3) It then lifts sparsely observed objects into their classes to solve the sparsity problem for rarely observed objects. To the best of our knowledge, this is the first study that leverages semantics to inject expert knowledge into tensor factorization. Experiments show that SRTF achieves up to 10% higher accuracy than state-of-the-art methods.
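
Idea (3) above can be illustrated with a small sketch. This is only one possible reading of "lifting", not the SRTF algorithm itself: the function name `lift_sparse_items`, the `min_count` threshold, and the toy taxonomy are all assumptions made for illustration. Items observed fewer than `min_count` times contribute an extra class-level observation, so that rarely seen items share evidence through their class.

```python
from collections import Counter

def lift_sparse_items(observations, item_to_class, min_count=2):
    """Return extra class-level observations for rarely observed items.

    observations: list of (user, item, tag) triples, i.e. nonzero tensor entries.
    item_to_class: hypothetical mapping from an item to its taxonomy class.
    """
    counts = Counter(item for _, item, _ in observations)
    lifted = []
    for user, item, tag in observations:
        # Only items below the (assumed) sparsity threshold are lifted.
        if counts[item] < min_count and item in item_to_class:
            lifted.append((user, item_to_class[item], tag))
    return lifted

# Toy data: movie_b is observed only once, so it is lifted to its class.
observations = [
    ("u1", "movie_a", "funny"),
    ("u2", "movie_a", "funny"),
    ("u1", "movie_b", "dark"),
]
item_to_class = {"movie_a": "class:comedy", "movie_b": "class:thriller"}
lifted = lift_sparse_items(observations, item_to_class)
```

The lifted triples would be added alongside the original entries before factorization; how SRTF actually couples the two is not specified in the abstract.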


Convex Co-embedding

AAAI Conferences

We present a general framework for association learning, where entities are embedded in a common latent space to express relatedness by geometry -- an approach that underlies the state of the art for link prediction, relation learning, multi-label tagging, relevance retrieval and ranking. Although current approaches rely on local training applied to non-convex formulations, we demonstrate how general convex formulations can be achieved for entity embedding, both for standard multi-linear and prototype-distance models. We investigate an efficient optimization strategy that allows scaling. An experimental evaluation reveals the advantages of global training in different case studies.


Partial Multi-View Clustering

AAAI Conferences

Real data often have multiple modalities or come from multiple channels, and multi-view clustering provides a natural formulation for generating clusters from such data. Previous studies assumed that each example appears in all views, or at least that there is one view containing all examples. In real tasks, however, it is often the case that every view suffers from missing data, which results in many partial examples, i.e., examples with some views missing. In this paper, we present possibly the first study on partial multi-view clustering. Our proposed approach, PVC, works by establishing a latent subspace where the instances corresponding to the same example in different views are close to each other, and similar instances (belonging to different examples) in the same view are well grouped. Experiments on two-view data demonstrate the advantages of our proposed approach.


Wormhole Hamiltonian Monte Carlo

AAAI Conferences

In machine learning and statistics, probabilistic inference involving multimodal distributions is quite difficult. This is especially true in high dimensional problems, where most existing algorithms cannot easily move from one mode to another. To address this issue, we propose a novel Bayesian inference approach based on Markov Chain Monte Carlo. Our method can effectively sample from multimodal distributions, especially when the dimension is high and the modes are isolated. To this end, it exploits and modifies the Riemannian geometric properties of the target distribution to create wormholes connecting modes in order to facilitate moving between them. Further, our proposed method uses the regeneration technique in order to adapt the algorithm by identifying new modes and updating the network of wormholes without affecting the stationary distribution. To find new modes, as opposed to rediscovering those previously identified, we employ a novel mode searching algorithm that explores a residual energy function obtained by subtracting an approximate Gaussian mixture density (based on previously discovered modes) from the target density function.
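
The residual idea in the last sentence can be sketched in one dimension. This is a toy illustration under stated assumptions, not the paper's mode searching algorithm: the bimodal target, the grid search, and all function names are hypothetical. The sketch subtracts a Gaussian mixture built from the already known mode from the target density, then looks for the point where the remainder is largest.

```python
import math

def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def target_density(x):
    # Hypothetical bimodal target: equal mixture of N(-4, 1) and N(4, 1).
    return 0.5 * normal_pdf(x, -4.0, 1.0) + 0.5 * normal_pdf(x, 4.0, 1.0)

def known_mixture(x, modes):
    # modes: list of (weight, mean, std) for previously discovered modes.
    return sum(w * normal_pdf(x, m, s) for w, m, s in modes)

def residual(x, modes):
    # Target density minus the approximation built from known modes.
    return target_density(x) - known_mixture(x, modes)

def search_new_mode(modes, lo=-10.0, hi=10.0, steps=2001):
    # Simple grid search (an assumption; any optimizer could be used here).
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda x: residual(x, modes))

# Only the mode at -4 has been discovered so far.
known = [(0.5, -4.0, 1.0)]
candidate = search_new_mode(known)
```

Because the known mode is subtracted out, the search is drawn to the undiscovered mode near 4 instead of rediscovering the one at -4.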


Feature-Cost Sensitive Learning with Submodular Trees of Classifiers

AAAI Conferences

During the past decade, machine learning algorithms have become commonplace in large-scale real-world industrial applications. In these settings, the computation time to train and test machine learning algorithms is a key consideration. At training-time the algorithms must scale to very large data set sizes. At testing-time, the cost of feature extraction can dominate the CPU runtime. Recently, a promising method was proposed to account for the feature extraction cost at testing time, called Cost-sensitive Tree of Classifiers (CSTC). Although the CSTC problem is NP-hard, the authors suggest an approximation through a mixed-norm relaxation across many classifiers. This relaxation is slow to train and requires involved optimization hyperparameter tuning. We propose a different relaxation using approximate submodularity, called Approximately Submodular Tree of Classifiers (ASTC). ASTC is much simpler to implement, yields equivalent results but requires no optimization hyperparameter tuning and is up to two orders of magnitude faster to train.


Pairwise-Covariance Linear Discriminant Analysis

AAAI Conferences

In machine learning, linear discriminant analysis (LDA) is a popular dimension reduction method. In this paper, we first provide a new view of LDA from an information theory perspective. From this new perspective, we propose a new formulation of LDA, which uses the pairwise averaged class covariance instead of the globally averaged class covariance used in standard LDA. This pairwise (averaged) covariance describes data distribution more accurately. The new perspective also provides a natural way to properly weigh different pairwise distances, which emphasizes the pairs of classes with small distances, and this leads to the proposed pairwise covariance properly weighted LDA (pcLDA). The kernel version of pcLDA is presented to handle nonlinear projections. Efficient algorithms are presented to compute the proposed models.
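
The difference between the two covariance choices can be made concrete with a small sketch. It assumes, for illustration only, that the pairwise averaged covariance of classes i and j is the plain mean of their two class covariances; the abstract does not give the exact definition, and the helper names and toy data are hypothetical.

```python
from itertools import combinations

def mean_vec(points):
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]

def covariance(points):
    # Maximum-likelihood covariance (divides by n) of a list of vectors.
    mu = mean_vec(points)
    n, d = len(points), len(mu)
    return [[sum((p[a] - mu[a]) * (p[b] - mu[b]) for p in points) / n
             for b in range(d)] for a in range(d)]

def average(mats, weights):
    # Weighted average of equally sized square matrices.
    total = sum(weights)
    d = len(mats[0])
    return [[sum(w * m[a][b] for m, w in zip(mats, weights)) / total
             for b in range(d)] for a in range(d)]

# Toy classes with very different spreads.
classes = {
    "a": [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    "b": [[5.0, 5.0], [7.0, 5.0], [5.0, 7.0], [7.0, 7.0]],
    "c": [[10.0, 0.0], [14.0, 0.0], [10.0, 4.0], [14.0, 4.0]],
}
covs = {name: covariance(pts) for name, pts in classes.items()}

# Standard LDA: one covariance averaged over all classes.
global_cov = average(list(covs.values()), [len(p) for p in classes.values()])

# Pairwise variant (assumed form): one covariance per pair of classes.
pairwise_cov = {
    (i, j): average([covs[i], covs[j]], [1.0, 1.0])
    for i, j in combinations(sorted(classes), 2)
}
```

In this toy data the widely spread class "c" inflates the global covariance, while the pairwise covariance for ("a", "b") reflects only the two tight classes being compared.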


Non-Convex Feature Learning via Lp,inf Operator

AAAI Conferences

We present a feature selection method for solving a sparse regularization problem, which has a composite regularization of $\ell_p$ norm and $\ell_{\infty}$ norm. We use a proximal gradient method to solve this $\ell_{p,\infty}$ operator problem, where a simple but efficient algorithm is designed to minimize a relatively simple objective function, which contains a vector of $\ell_2$ norm and $\ell_\infty$ norm. The proposed method brings some insight for solving sparsity-favoring norms, and extensive experiments are conducted to characterize the effect of varying $p$ and to compare with other approaches on real world multi-class and multi-label datasets.


Monte Carlo Filtering Using Kernel Embedding of Distributions

AAAI Conferences

Recent advances in kernel methods have yielded a framework for representing probabilities using a reproducing kernel Hilbert space, called kernel embedding of distributions. In this paper, we propose a Monte Carlo filtering algorithm based on kernel embeddings. The proposed method is applied to state-space models where sampling from the transition model is possible, while the observation model is to be learned from training samples without assuming a parametric model. As a theoretical basis of the proposed method, we prove consistency of the Monte Carlo method combined with kernel embeddings. Experimental results on synthetic models and real vision-based robot localization confirm the effectiveness of the proposed approach.
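
As background for the term "kernel embedding of distributions", the sketch below evaluates an empirical kernel mean embedding from a (possibly weighted) sample and compares two samples through the squared maximum mean discrepancy. It is a minimal illustration, not the proposed filtering algorithm; the Gaussian kernel, the bandwidth, and the function names are assumptions.

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * bandwidth ** 2))

def embedding_at(point, samples, weights=None, bandwidth=1.0):
    # Empirical embedding sum_i w_i k(x_i, .) evaluated at `point`.
    # Uniform weights 1/n are used when none are given.
    if weights is None:
        weights = [1.0 / len(samples)] * len(samples)
    return sum(w * gaussian_kernel(s, point, bandwidth)
               for s, w in zip(samples, weights))

def mmd_squared(xs, ys, bandwidth=1.0):
    # Squared RKHS distance between the two empirical embeddings.
    def mean_k(a, b):
        total = sum(gaussian_kernel(p, q, bandwidth) for p in a for q in b)
        return total / (len(a) * len(b))
    return mean_k(xs, xs) - 2.0 * mean_k(xs, ys) + mean_k(ys, ys)

particles = [[0.0], [1.0], [2.0]]
shifted = [[5.0], [6.0], [7.0]]
same = mmd_squared(particles, particles)
different = mmd_squared(particles, shifted)
```

A weighted sample such as a particle set fits the same representation by passing its weights to `embedding_at`.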


Imitation Learning with Demonstrations and Shaping Rewards

AAAI Conferences

Imitation Learning (IL) is a popular approach for teaching behavior policies to agents by demonstrating the desired target policy. While the approach has led to many successes, IL often requires a large set of demonstrations to achieve robust learning, which can be expensive for the teacher. In this paper, we consider a novel approach to improve the learning efficiency of IL by providing a shaping reward function in addition to the usual demonstrations. Shaping rewards are numeric functions of states (and possibly actions) that are generally easily specified, and capture general principles of desired behavior, without necessarily completely specifying the behavior. Shaping rewards have been used extensively in reinforcement learning, but have seldom been considered for IL, though they are often easy to specify. Our main contribution is to propose an IL approach that learns from both shaping rewards and demonstrations. We demonstrate the effectiveness of the approach across several IL problems, even when the shaping reward is not fully consistent with the demonstrations.
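
To show what a shaping reward can look like in practice, here is a minimal sketch for a hypothetical grid world. The goal position, the move set, and the function names are assumptions for illustration; how the proposed approach combines such a reward with demonstrations is not described in the abstract and is not modeled here.

```python
GOAL = (3, 3)  # hypothetical goal cell
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def shaping_reward(state, action=None):
    # A numeric function of the state: closer to the goal is better.
    # It states a general principle without prescribing a full policy.
    x, y = state
    return -float(abs(GOAL[0] - x) + abs(GOAL[1] - y))

def step(state, action):
    # Deterministic toy dynamics on an unbounded grid.
    dx, dy = MOVES[action]
    return (state[0] + dx, state[1] + dy)

def rank_actions(state, actions):
    # Order actions by the shaping reward of the state they lead to.
    return sorted(actions, key=lambda a: shaping_reward(step(state, a)),
                  reverse=True)

preferred = rank_actions((0, 0), ["left", "up"])
```

The reward prefers "up" over "left" from the origin but does not distinguish "up" from "right", which is the sense in which it captures a principle without completely specifying the behavior.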


Intra-View and Inter-View Supervised Correlation Analysis for Multi-View Feature Learning

AAAI Conferences

Multi-view feature learning is an attractive research topic with great practical success. Canonical correlation analysis (CCA) has become an important technique in multi-view learning, since it can fully utilize the inter-view correlation. In this paper, we mainly study the CCA based multi-view supervised feature learning technique where the labels of training samples are known. Several supervised CCA based multi-view methods have been presented, which focus on investigating the supervised correlation across different views. However, they take no account of the intra-view correlation between samples. Researchers have also introduced discriminant analysis techniques into multi-view feature learning, such as multi-view discriminant analysis (MvDA), but these methods ignore the canonical correlation within each view and between all views. In this paper, we propose a novel multi-view feature learning approach based on intra-view and inter-view supervised correlation analysis (I2SCA), which can explore the useful correlation information of samples within each view and between all views. The objective function of I2SCA is designed to simultaneously extract the discriminatingly correlated features from both the inter-view and intra-view perspectives. It can obtain an analytical solution without iterative calculation. We also provide a kernelized extension of I2SCA to tackle the linearly inseparable problem in the original feature space. Four widely-used datasets are employed as test data. Experimental results demonstrate that our proposed approaches outperform several representative multi-view supervised feature learning methods.