Intra-View and Inter-View Supervised Correlation Analysis for Multi-View Feature Learning
Jing, Xiao-Yuan (Wuhan University) | Hu, Rui-Min (Wuhan University) | Zhu, Yang-Ping (Nanjing University of Posts and Telecommunications) | Wu, Shan-Shan (Nanjing University of Posts and Telecommunications) | Liang, Chao (Wuhan University) | Yang, Jing-Yu (Nanjing University of Science and Technology)
Multi-view feature learning is an attractive research topic with great practical success. Canonical correlation analysis (CCA) has become an important technique in multi-view learning, since it can fully utilize the inter-view correlation. In this paper, we mainly study CCA-based multi-view supervised feature learning, where the labels of training samples are known. Several supervised CCA-based multi-view methods have been presented, which focus on investigating the supervised correlation across different views. However, they take no account of the intra-view correlation between samples. Researchers have also introduced discriminant analysis into multi-view feature learning, as in multi-view discriminant analysis (MvDA), but these methods ignore the canonical correlation within each view and between all views. In this paper, we propose a novel multi-view feature learning approach based on intra-view and inter-view supervised correlation analysis (I2SCA), which can explore the useful correlation information of samples within each view and between all views. The objective function of I2SCA is designed to simultaneously extract the discriminatingly correlated features from both the inter-view and intra-view perspectives. It admits an analytical solution without iterative calculation. We also provide a kernelized extension of I2SCA to tackle the linearly inseparable problem in the original feature space. Four widely used datasets are employed as test data. Experimental results demonstrate that our proposed approaches outperform several representative multi-view supervised feature learning methods.
Fast Multi-Instance Multi-Label Learning
Huang, Sheng-Jun (Nanjing University) | Gao, Wei (Nanjing University) | Zhou, Zhi-Hua (Nanjing University)
In multi-instance multi-label learning (MIML), one object is represented by multiple instances and simultaneously associated with multiple labels. Existing MIML approaches have been found useful in many applications; however, most of them can only handle moderate-sized data. To efficiently handle large data sets, we propose the MIMLfast approach, which first constructs a low-dimensional subspace shared by all labels, and then trains label-specific linear models to optimize an approximated ranking loss via stochastic gradient descent. Although the MIML problem is complicated, MIMLfast is able to achieve excellent performance by exploiting label relations through the shared space and discovering sub-concepts for complicated labels. Experiments show that the performance of MIMLfast is highly competitive with state-of-the-art techniques, whereas its time cost is much lower; in particular, on a data set with 30K bags and 270K instances, where none of the existing approaches can return results within 24 hours, MIMLfast takes only 12 minutes. Moreover, our approach is able to identify the most representative instance for each label, thus providing a chance to understand the relation between input patterns and output semantics.
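The scoring side of the approach described above can be pictured with a small sketch. It assumes, beyond what the abstract states, that a bag's score for a label is the maximum over its instances and over that label's sub-concept models, computed after a shared linear projection. The names `W0`, `label_models`, and `bag_score` and all numeric values are hypothetical, and the stochastic gradient training of the ranking loss is omitted.

```python
def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))

def bag_score(bag, W0, sub_concepts):
    """Score one bag for one label.

    Each instance is projected into the shared subspace by W0, scored by
    every sub-concept model of the label, and the best value is kept.
    Returns (score, index of the highest-scoring instance).
    """
    best_score, best_idx = float("-inf"), -1
    for i, x in enumerate(bag):
        z = matvec(W0, x)                        # shared low-dimensional space
        s = max(dot(w, z) for w in sub_concepts)  # best sub-concept
        if s > best_score:
            best_score, best_idx = s, i
    return best_score, best_idx

# Hypothetical shared projection from 3 input dimensions to 2.
W0 = [[1.0, 0.0, 1.0],
      [0.0, 1.0, -1.0]]

# Hypothetical label-specific linear models (one row per sub-concept).
label_models = {
    "cat": [[1.0, 0.0], [0.0, 1.0]],
    "dog": [[-1.0, 0.5]],
}

# One bag with three instances.
bag = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]

scores = {label: bag_score(bag, W0, models)
          for label, models in label_models.items()}
print(scores)
```

Under this assumed scoring rule, the returned index is one way to read off a representative instance per label, which mirrors the interpretability point made at the end of the abstract.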
Deep Modeling of Group Preferences for Group-Based Recommendation
Hu, Liang (Shanghai Jiaotong University) | Cao, Jian (Shanghai Jiaotong University) | Xu, Guandong (University of Technology Sydney) | Cao, Longbing (University of Technology Sydney) | Gu, Zhiping (Shanghai Technical Institute of Electronics & Information) | Cao, Wei
Nowadays, most recommender systems (RSs) mainly aim to suggest appropriate items for individuals. Due to the social nature of human beings, group activities have become an integral part of our daily life, thus motivating the study of group RSs (GRSs). However, most existing GRS methods make recommendations by aggregating individual ratings or individual predictive results rather than by considering the collective features that govern user choices made within a group. As a result, such methods are highly sensitive to the data, and they often fail to learn group preferences when the data are slightly inconsistent with predefined aggregation assumptions. To this end, we devise a novel GRS approach which accommodates both individual choices and group decisions in a joint model. More specifically, we propose a deep-architecture model built with collective deep belief networks and dual-wing restricted Boltzmann machines. With such a deep model, we can use high-level features, which are induced from lower-level features, to represent group preference and so reduce this sensitivity to the data. Finally, experiments conducted on a real-world dataset demonstrate the superiority of our deep model over other state-of-the-art methods.
Encoding Tree Sparsity in Multi-Task Learning: A Probabilistic Framework
Han, Lei (Peking University) | Zhang, Yu (Hong Kong Baptist University) | Song, Guojie (Peking University) | Xie, Kunqing (Peking University)
Multi-task learning (MTL) seeks to improve generalization performance by sharing common information among multiple related tasks. A key assumption in most MTL algorithms is that all tasks are related, which, however, may not hold in many real-world applications. Existing techniques that attempt to address this issue aim to identify groups of related tasks using group sparsity. In this paper, we propose a probabilistic tree sparsity (PTS) model that uses a tree structure, instead of a group structure, to obtain the sparse solution. Specifically, each model coefficient in the learning model is decomposed into a product of multiple component coefficients, each of which corresponds to a node in the tree. Based on the decomposition, Gaussian and Cauchy distributions are placed on the component coefficients as priors to restrict the model complexity. We devise an efficient expectation maximization algorithm to learn the model parameters. Experiments conducted on both synthetic and real-world problems show the effectiveness of our model compared with state-of-the-art baselines.
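The product decomposition mentioned above can be illustrated with a toy sketch. It assumes that tasks sit at the leaves of the tree and that a task's coefficient for a feature is the product of the component coefficients on its leaf-to-root path. The tree, the node names, and the values are hypothetical, and the Gaussian and Cauchy priors and the expectation maximization procedure are not modeled here.

```python
from math import prod

# Hypothetical task tree, stored as node -> parent (None marks the root).
PARENT = {"root": None, "g1": "root", "g2": "root",
          "t1": "g1", "t2": "g1", "t3": "g2"}

def path_to_root(node):
    """Return the nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def coefficient(components, task):
    """Coefficient of one feature for `task`: the product of the
    component coefficients along the task's leaf-to-root path."""
    return prod(components[n] for n in path_to_root(task))

# Component coefficients of a single feature, one per tree node (made-up values).
components = {"root": 2.0, "g1": 0.0, "g2": 0.5,
              "t1": 3.0, "t2": 1.5, "t3": 4.0}

weights = {t: coefficient(components, t) for t in ("t1", "t2", "t3")}
print(weights)
```

Because the internal node `g1` has a zero component, the feature is switched off for both tasks below it (`t1` and `t2`), while `t3` keeps a nonzero coefficient. This is the sense in which a tree of components can encode sparsity shared by a subtree of tasks.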
Signed Laplacian Embedding for Supervised Dimension Reduction
Gong, Chen (Shanghai Jiao Tong University and University of Technology Sydney) | Tao, Dacheng (University of Technology Sydney) | Yang, Jie (Shanghai Jiao Tong University) | Fu, Keren (Shanghai Jiao Tong University)
Manifold learning is a powerful tool for solving nonlinear dimension reduction problems. By assuming that the high-dimensional data usually lie on a low-dimensional manifold, many algorithms have been proposed. However, most algorithms simply adopt the traditional graph Laplacian to encode the data locality, so the discriminative ability is limited and the embedding results are not always suitable for the subsequent classification. Instead, this paper deploys the signed graph Laplacian and proposes Signed Laplacian Embedding (SLE) for supervised dimension reduction. By exploring the label information, SLE comprehensively transfers the discrimination carried by the original data to the embedded low-dimensional space. Without perturbing the discrimination structure, SLE also retains the locality. Theoretically, we prove the immersion property by computing the rank of the projection, and relate SLE to existing algorithms within the patch alignment framework. Thorough empirical studies on synthetic and real datasets demonstrate the effectiveness of SLE.
ReLISH: Reliable Label Inference via Smoothness Hypothesis
Gong, Chen (Shanghai Jiao Tong University and University of Technology Sydney) | Tao, Dacheng (University of Technology Sydney) | Fu, Keren (Shanghai Jiao Tong University) | Yang, Jie (Shanghai Jiao Tong University)
The smoothness hypothesis is critical for graph-based semi-supervised learning. This paper defines local smoothness, based on which a new algorithm, Reliable Label Inference via Smoothness Hypothesis (ReLISH), is proposed. ReLISH produces smoother labels than some existing methods for both labeled and unlabeled examples. Theoretical analyses demonstrate good stability and generalizability of ReLISH. Using real-world datasets, our empirical analyses reveal that ReLISH is promising for both transductive and inductive tasks when compared with representative algorithms, including Harmonic Functions, Local and Global Consistency, Constraint Metric Learning, Linear Neighborhood Propagation, and Manifold Regularization.
Kernelized Bayesian Transfer Learning
Gönen, Mehmet (Sage Bionetworks) | Margolin, Adam A. (Sage Bionetworks)
Transfer learning considers related but distinct tasks defined on heterogeneous domains and tries to transfer knowledge between these tasks to improve generalization performance. It is particularly useful when we do not have a sufficient amount of labeled training data in some tasks, which may be very costly, laborious, or even infeasible to obtain. Instead, learning the tasks jointly enables us to effectively increase the amount of labeled training data. In this paper, we formulate a kernelized Bayesian transfer learning framework that is a principled combination of kernel-based dimensionality reduction models with task-specific projection matrices to find a shared subspace and a coupled classification model for all of the tasks in this subspace. Our two main contributions are: (i) two novel probabilistic models for binary and multiclass classification, and (ii) very efficient variational approximation procedures for these models. We illustrate the generalization performance of our algorithms on two different applications. In computer vision experiments, our method outperforms the state-of-the-art algorithms on nine out of 12 benchmark supervised domain adaptation experiments defined on two object recognition data sets. In cancer biology experiments, we use our algorithm to predict the mutation status of important cancer genes from gene expression profiles using two distinct cancer populations, namely, patient-derived primary tumor data and in-vitro-derived cancer cell line data. We show that we can increase our generalization performance on primary tumors using cell lines as an auxiliary data source.
Finding Median Point-Set Using Earth Mover's Distance
Ding, Hu (State University of New York at Buffalo) | Xu, Jinhui (State University of New York at Buffalo)
In this paper, we study a prototype learning problem, called Median Point-Set, whose objective is to construct a prototype for a set of given point-sets so as to minimize the total Earth Mover's Distance (EMD) between the prototype and the point-sets, where the EMD between two point-sets is measured under affine transformation. For this problem, we present the first purely geometric approach. Compared with existing graph-based approaches (e.g., median graph, shock graph), our approach has several unique advantages: (1) no encoding and decoding procedures are needed to map between objects and graphs, which avoids errors caused by information loss during the mappings; (2) staying only in the geometric domain makes our approach computationally more efficient and robust to noise. We evaluate the performance of our technique for prototype reconstruction on a random dataset and a benchmark dataset of handwritten Chinese characters. Experiments suggest that our technique considerably outperforms the existing graph-based methods.
Learning the Structure of Probabilistic Graphical Models with an Extended Cascading Indian Buffet Process
Dallaire, Patrick (Laval University) | Giguère, Philippe (Laval University) | Chaib-draa, Brahim (Laval University)
This paper presents an extension of the cascading Indian buffet process (CIBP) intended to learn arbitrary directed acyclic graph structures, as opposed to the CIBP, which is limited to purely layered structures. The extended cascading Indian buffet process (eCIBP) essentially consists of adding an extra sampling step to the CIBP to generate connections between non-consecutive layers. In the context of graphical model structure learning, the proposed approach allows learning structures with an unbounded number of hidden random variables and automatically selecting the model complexity. We evaluated the extended process on multivariate density estimation and structure identification tasks by measuring the structure complexity and predictive performance. The results suggest that the extension leads to simpler graphs without sacrificing predictive precision.
Learning with Augmented Class by Exploiting Unlabeled Data
Da, Qing (Nanjing University) | Yu, Yang (Nanjing University) | Zhou, Zhi-Hua (Nanjing University)
In many real-world applications of learning, the environment is open and changes gradually, which requires the learning system to be able to detect and adapt to the changes. Class-incremental learning (C-IL) is an important and practical problem in which data from unseen augmented classes are fed to the system, but it has not been studied well in the past. In C-IL, the system should beware of predicting instances from augmented classes as a seen class, and thus faces the challenge that no such instances were observed during the training stage. In this paper, we tackle the challenge by using unlabeled data, which can be cheaply collected in many real-world applications. We propose the LACU framework as well as the LACU-SVM approach to learn the concept of seen classes while incorporating the structure present in the unlabeled data, so that the misclassification risks among the seen classes as well as between the augmented and the seen classes are minimized simultaneously. Experiments on diverse datasets show the effectiveness of the proposed approach.